NP-Hardness of minimum expected coverage

PATTERN RECOGNITION LETTERS, v. 117, p. 45-51, 2019

Researchers: Lucas Henrique Sousa Mello, Flávio M. Varejão, Alexandre L. Rodrigues, Thomas W. Rauber

In multi-label learning, a single object can be associated with multiple labels simultaneously. When labels follow a random distribution, every labelling has a probability of occurrence, so any prediction has an expected error measured by a predefined loss function. From an exponential number of possible labellings, an algorithm should choose the prediction that minimizes the expected error; this is known as loss minimization. This work proves the NP-completeness, with respect to the number of labels, of a specific case of loss minimization for the Coverage loss function, from which it follows that the general case is NP-hard.
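
To make the notion of loss minimization concrete, the sketch below enumerates every ranking of a tiny label set and keeps the one with the lowest expected coverage. It is only an illustration under stated assumptions: the prediction is taken to be a ranking of the labels, coverage is taken to be the depth one must descend in the ranking to reach every relevant label (a common definition, not necessarily the exact variant analysed in the paper), and the function names and the toy distribution are hypothetical. The exhaustive search grows factorially with the number of labels, which is consistent with the hardness result above but does not demonstrate it.

```python
from itertools import permutations


def coverage(ranking, relevant):
    """Depth needed in `ranking` to include every relevant label (0 if none)."""
    if not relevant:
        return 0
    return max(ranking.index(label) for label in relevant) + 1


def expected_coverage(ranking, distribution):
    """Average coverage of `ranking` under a distribution over label sets."""
    return sum(prob * coverage(ranking, labelling)
               for labelling, prob in distribution.items())


def brute_force_minimizer(labels, distribution):
    """Try every ranking; feasible only for a handful of labels."""
    return min(permutations(labels),
               key=lambda ranking: expected_coverage(ranking, distribution))


# Hypothetical toy distribution over labellings (probabilities sum to 1).
labels = ("a", "b", "c")
distribution = {
    frozenset({"a"}): 0.5,
    frozenset({"b", "c"}): 0.3,
    frozenset(): 0.2,
}
best = brute_force_minimizer(labels, distribution)
best_value = expected_coverage(best, distribution)
```

In this toy case the best ranking places `a` first, because the labelling `{a}` carries the most probability mass and the labelling `{b, c}` costs the full depth of three regardless of where `a` sits.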


Reducing power companies billing costs via empirical bayes and seasonality remover


Researchers: Alexandre Loureiros Rodrigues, Lucas Martinuzzo, Flávio Miguel Varejão, Vítor E. Silva Souza, Thiago de Oliveira Santos

Billing errors increase the costs of power companies and lower their reliability as perceived by customers. Most of these errors are due to wrong readings that occur when employees of power companies visit customers to read electrical meters and issue the bills. To prevent such errors, prediction techniques calculate a predicted value for each customer based on their previous readings, plus a tolerance around this value, and send a bill to be inspected by analysts if the reading falls outside the established range. However, such analysis increases the personnel cost of the power company. In addition, wrongly printed bills can lead to lawsuits and fines that also affect the costs and reliability of the power company. The main focus of this work is to minimize personnel cost by reducing the number of correct readings sent unnecessarily for analysis, while protecting the power company's credibility by not increasing the number of bills with wrong values sent to clients in the process. The proposed solution uses Empirical Bayes methods along with a method that accounts for the seasonal behavior of customers. The methodology was applied to a dataset comprising 35,704,489 measurements from 1,330,989 different customers of a Brazilian power company. The results show that the new methodology decreased the number of correct bills sent for analysis without harming the reputation of the company.
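
The baseline screening rule described above (a predicted value plus a tolerance band) can be sketched as follows. This is not the Empirical Bayes and seasonality method proposed in the work; it is a minimal illustration that assumes the prediction is the mean of a customer's previous readings and the tolerance is a multiple `k` of their sample standard deviation. The function name, the parameter `k`, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def needs_inspection(previous_readings, new_reading, k=3.0):
    """Flag a reading that falls outside mean +/- k * standard deviation.

    Assumes at least two previous readings, as `stdev` requires.
    """
    predicted = mean(previous_readings)
    tolerance = k * stdev(previous_readings)
    return abs(new_reading - predicted) > tolerance


# Hypothetical consumption history for one customer.
history = [100, 110, 90, 105, 95]
inside_band = needs_inspection(history, 120)
outside_band = needs_inspection(history, 130)
```

A wider band (larger `k`) sends fewer correct readings to analysts but lets more wrong readings through, which is the trade-off the abstract describes.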


Combining classifiers with decision templates for automatic fault diagnosis of electrical submersible pumps


Researchers: Thiago de Oliveira Santos, Alexandre Loureiros Rodrigues, Vitor Freitas Rocha, Thomas W. Rauber, Flávio Miguel Varejão, Marcos Pellegrini Ribeiro

A machine learning approach to the automatic detection and diagnosis of faults in electrical submersible pump systems is presented. Several thousand vibration patterns were acquired from accelerometers distributed vertically along the string of motors, pumps and protectors. Intermediate features are extracted from the raw vibration signals originating from the set of accelerometers. Each pattern was labelled by a human expert to provide ground truth with respect to the different operation classes (normal, sensor fault, rubbing, unbalance or misalignment). A software framework is used to compare several classifier architectures (K-Nearest-Neighbor, Random Forest, Support Vector Machine, Naïve Bayes and Decision Trees) in a bias-aware performance evaluation. To boost classification performance, an ensemble of different versions of a classifier architecture is constructed using the Decision Templates fusion function. The robustness of the system with respect to the emergence of new faults (i.e., faults not handled so far) is corroborated by a systematic analysis methodology.
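
As a rough illustration of Decision Templates fusion, the sketch below follows the usual textbook formulation: each sample's decision profile is the matrix of soft outputs from all ensemble members (one row per classifier, one column per class), the template of a class is the mean profile of its training samples, and a new sample receives the class whose template is closest. The squared Euclidean distance, the function names, and the toy profiles are assumptions made for this example; the paper may use a different similarity measure or ensemble construction.

```python
def fit_decision_templates(profiles, labels):
    """Average the decision profiles of each class into one template."""
    grouped = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    templates = {}
    for label, members in grouped.items():
        count = len(members)
        rows, cols = len(members[0]), len(members[0][0])
        templates[label] = [
            [sum(m[i][j] for m in members) / count for j in range(cols)]
            for i in range(rows)
        ]
    return templates


def squared_distance(a, b):
    """Squared Euclidean distance between two equally shaped matrices."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def predict(templates, profile):
    """Return the class whose template is nearest to `profile`."""
    return min(templates,
               key=lambda label: squared_distance(templates[label], profile))


# Hypothetical profiles: 2 classifiers (rows) x 2 classes (columns).
train_profiles = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.2, 0.8], [0.3, 0.7]],
    [[0.1, 0.9], [0.4, 0.6]],
]
train_labels = [0, 0, 1, 1]
templates = fit_decision_templates(train_profiles, train_labels)
fused = predict(templates, [[0.6, 0.4], [0.9, 0.1]])
```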


Sampling approaches for applying DBSCAN to large datasets

PATTERN RECOGNITION LETTERS, v. 117, p. 90-96, 2018.

Researchers: Diego Luchi, Alexandre Loureiros Rodrigues, Flávio Miguel Varejão

DBSCAN is a classic clustering method for identifying clusters of different shapes and isolating noisy patterns. Despite these qualities, many articles in the literature address the scalability problem of DBSCAN. This work presents two methods to generate a good sample for the DBSCAN algorithm. The execution time decreases because fewer patterns are presented to DBSCAN. The first method is an improvement of Rough-DBSCAN and gave consistently better results. The second is a new heuristic called I-DBSCAN, capable of adapting and generating good results for all datasets without requiring any additional parameter.


Cascade Feature Selection and ELM for Automatic Fault Diagnosis of the Tennessee Eastman Process

Neurocomputing (Amsterdam), v. 1, p. 1, 2017

Researchers: Thomas W. Rauber, Francisco de Assis Boldt, Flávio Miguel Varejão

This work presents the concept of Cascade Feature Selection to combine feature selection methods. Fast and weak methods, like ranking, are placed at the top of the cascade to reduce the dimensionality of the initial feature set. Thus, strong and computationally demanding methods, placed at the bottom of the cascade, have to deal with fewer features. Three cascade combinations are tested with the Extreme Learning Machine as the underlying classification architecture. The Tennessee Eastman chemical process simulation software and one high-dimensional data set are used as sources of the benchmark data. Experimental results suggest that the cascade arrangement can produce smaller final feature subsets, in less time and with higher classification performance, than feature selection based on a Genetic Algorithm. Many works in the literature have proposed mixed methods with specific combination strategies. The main contribution of this work is a concept able to combine any existing method using a single strategy. Provided that the Cascade Feature Selection requirements are fulfilled, the combinations might reduce the time to select features or increase the classification performance of the classifiers trained with the selected features.
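
The cascade idea can be sketched as a two-stage pipeline: a cheap ranking filter first trims the feature set, and a costlier search then runs only on the survivors. The choice of greedy forward selection for the second stage, the function names, the per-feature scores, and the toy evaluation function are all assumptions made for illustration; they are not the specific combinations tested in the paper, and the evaluation function stands in for training and validating a classifier such as the Extreme Learning Machine.

```python
def rank_filter(scores, keep):
    """Stage 1 (fast, weak): keep the `keep` highest-scoring features."""
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def forward_selection(candidates, evaluate):
    """Stage 2 (slow, strong): greedily add the feature that helps most."""
    selected, best = [], float("-inf")
    improved = True
    while improved:
        improved = False
        choice = None
        for feature in candidates:
            if feature in selected:
                continue
            score = evaluate(selected + [feature])
            if score > best:
                best, choice, improved = score, feature, True
        if improved:
            selected.append(choice)
    return selected, best


def cascade(scores, keep, evaluate):
    """Run the cheap filter, then the expensive search on its output."""
    return forward_selection(rank_filter(scores, keep), evaluate)


# Hypothetical ranking scores and a toy stand-in for classifier accuracy.
scores = {"f1": 0.9, "f2": 0.8, "f3": 0.1, "f4": 0.7}
useful = {"f1", "f4"}


def evaluate(subset):
    # Reward useful features, penalise subset size.
    return len(set(subset) & useful) - 0.1 * len(subset)


survivors = rank_filter(scores, 3)
selected, best_score = cascade(scores, 3, evaluate)
```

Here the filter discards `f3` before the wrapper ever evaluates it, so the expensive stage makes fewer calls to `evaluate` than it would on the full feature set.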


Heterogeneous Feature Models and Feature Selection Applied to Bearing Fault Diagnosis

IEEE Transactions on Industrial Electronics, v. 62, p. 637-646, 2015.

Researchers: Thomas W. Rauber, Francisco de Assis Boldt, Flávio Miguel Varejão

Distinct feature extraction methods are used simultaneously to describe bearing faults. This approach produces a large number of heterogeneous features that add discriminative information but, at the same time, introduce irrelevant and redundant information. A subsequent feature selection phase filters out the most discriminative features. The feature models are based on the complex envelope spectrum, statistical time- and frequency-domain parameters, and wavelet packet analysis. Feature selection is achieved by a conventional greedy search of the feature space. For the final fault diagnosis, the k-nearest neighbor classifier, a feedforward net, and a support vector machine are used. Performance criteria are the estimated error rate and the area under the receiver operating characteristic curve (AUC-ROC). Experimental results are shown for the Case Western Reserve University Bearing Data. The main contribution of this paper is the strategy of using several different feature models in a single pool, together with feature selection, to optimize the fault diagnosis system. Moreover, robust performance estimation techniques not usually encountered in the context of engineering are employed.


Performance Analysis of Extreme Learning Machine for Automatic Diagnosis of Electrical Submersible Pump Conditions.

INDIN 2014 - 12th IEEE International Conference on Industrial Informatics, 2014, Porto Alegre, RS. Proc. of 12th IEEE International Conference on Industrial Informatics, 2014.

Researchers: Francisco de Assis Boldt, Thomas W. Rauber, Flávio Miguel Varejão, Marcos Pellegrini Ribeiro


Automatic diagnosis of submersible motor pump conditions in offshore oil exploration

IECON 2013 - The 39th Annual Conference of the IEEE Industrial Electronics Society, 2013, Vienna. Proc. of the 39th Annual Conference of the IEEE Industrial Electronics Society, 2013.

Researchers: Alexandre Loureiros Rodrigues, Fábio Fabris, Flávio Miguel Varejão, Thomas W. Rauber, Marcos Pellegrini Ribeiro


Computational Intelligence for Automatic Diagnosis of Submersible Motor Pump Conditions in Offshore Oil Exploration

IEEE International Conference on Electronics, Circuits, and Systems, 2013, Abu Dhabi. Proc. of IEEE International Conference on Electronics, Circuits, and Systems, 2013.

Researchers: Thomas W. Rauber, Francisco de Assis Boldt, Flávio Miguel Varejão


Feature Extraction and Selection for Automatic Fault Diagnosis of Rotating Machinery

ENIAC 2013 - Encontro Nacional de Inteligência Artificial e Computacional, 2013, Fortaleza, CE. Proc. of Encontro Nacional de Inteligência Artificial e Computacional, 2013.

Researchers: Francisco de Assis Boldt, Thomas W. Rauber, Flávio Miguel Varejão