Data Mining

Empirical Software Engineering Laboratory

Position: 
Graduate Research Assistant
Employment Date: 
July, 2002 to July, 2004

Software quality analysis by combining multiple projects and learners

This paper continues to explore noise filtering using learners. Four classification scenarios were investigated. The first scenario applies the classical approach: training one classifier on a single fit dataset and predicting the test dataset. The second is a popular method in data mining: a classifier is built from the predictions of multiple learners induced on the same dataset. The third uses the predictions of the same learner induced on multiple fit datasets (multi-dataset classifier). Finally, the most generic approach combines the predictions of multiple learners built on multiple fit datasets and applied to the dataset we want to predict. This technique is referred to as a multi-learner multi-dataset classifier.
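The most generic scenario can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the paper: it assumes the predictions are combined by simple majority vote, and the two toy learners, the project datasets, and all function names are hypothetical. Each instance is a pair of (features, label), where label 1 means fault-prone.

```python
from collections import Counter
from statistics import mean

def threshold_learner(feature):
    """Toy learner (hypothetical): cut one feature midway between the class means."""
    def fit(dataset):
        pos = mean(x[feature] for x, y in dataset if y == 1)
        neg = mean(x[feature] for x, y in dataset if y == 0)
        cut = (pos + neg) / 2
        # Predict class 1 when x falls on the same side of the cut as the class-1 mean.
        return lambda x: int((x[feature] > cut) == (pos > neg))
    return fit

def nearest_neighbor_learner():
    """Toy learner (hypothetical): 1-nearest neighbor with squared Euclidean distance."""
    def fit(dataset):
        def predict(x):
            _, label = min(
                dataset,
                key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
            )
            return label
        return predict
    return fit

def multi_learner_multi_dataset(learners, fit_datasets, test_inputs):
    """Build one model per (learner, fit dataset) pair and combine them by majority vote."""
    models = [fit(dataset) for fit in learners for dataset in fit_datasets]
    predictions = []
    for x in test_inputs:
        votes = Counter(model(x) for model in models)
        # Assumption: ties go to the label that received its first vote earliest.
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Hypothetical fit datasets: (lines of code, complexity) -> fault-prone (1) or not (0).
project_a = [((10, 1), 0), ((20, 2), 0), ((200, 15), 1), ((300, 20), 1)]
project_b = [((15, 2), 0), ((30, 3), 0), ((250, 18), 1), ((400, 25), 1)]
project_c = [((5, 1), 0), ((40, 4), 0), ((180, 12), 1), ((350, 30), 1)]

learners = [threshold_learner(0), threshold_learner(1), nearest_neighbor_learner()]
predictions = multi_learner_multi_dataset(
    learners, [project_a, project_b, project_c], [(12, 1), (320, 22)]
)
print(predictions)  # [0, 1]
```

The other three scenarios fall out as special cases of the same function: one learner with one fit dataset, several learners with one fit dataset, or one learner with several fit datasets.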

To our knowledge, this empirical work is one of the largest in terms of both scale and scope: 119 (17 × 7) base classification models were built, and more than 700 vectors of base estimates were generated. This paper was published in the Software Quality Journal. You can find more information on Springer and on the ACM portal.

Improving software quality prediction by noise filtering techniques

Dr. Khoshgoftaar and I outline our two filtering techniques, the multiple-partitioning filter and the iterative-partitioning filter. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and the unfiltered training datasets. A case study is performed on software measurement data provided by NASA. The data is available through the Metrics Data Program (MDP) and includes software measurement data and associated error data collected at the subroutine level.

This article was published in the Journal of Computer Science and Technology (JCST) in Volume 22, Issue 3 in 2006. Details of the paper can be found on the publisher's web site.

Quality Problem in Software Measurement Data

After publishing a couple of articles on the quality of software metrics, the Empirical Software Engineering Laboratory was invited to contribute a chapter to the special issue on Quality Software Development of the book Advances in Computers, published by Elsevier.

I was the first author of the chapter, which presents our results on quality in software metrics, noise filtering, and partitioning algorithms. The book has recently been added to Google Books.

Noise elimination with partitioning filter for software quality estimation

Dr. Khoshgoftaar and I present two filtering techniques, the multiple-partitioning filter and the iterative-partitioning filter.

This article was published in a special issue on Data Mining Applications of the International Journal of Computer Applications in Technology (IJCAT) in Volume 27, Issue 4 in 2006. Details are available on the ACM portal or on the site of the publisher.

Evaluating noise elimination techniques for software quality estimation

Dr. Khoshgoftaar and I present our two main filtering techniques, the multiple-partitioning filter and the iterative-partitioning filter, compared head-to-head with a new comparison algorithm. Our study demonstrates that with a conservative filtering approach, using several different base learners can improve the efficiency of the filtering schemes.
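To make the idea of a conservative filter with several base learners concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the article: it assumes the training data is split into partitions, every base learner is trained on every partition, and an instance is flagged as noisy when at least min_votes of the resulting models misclassify it. The toy learners, the data, and all names are hypothetical.

```python
from statistics import mean

def threshold_learner(feature):
    """Toy base learner (hypothetical): cut one feature midway between the class means."""
    def fit(dataset):
        pos = mean(x[feature] for x, y in dataset if y == 1)
        neg = mean(x[feature] for x, y in dataset if y == 0)
        cut = (pos + neg) / 2
        return lambda x: int((x[feature] > cut) == (pos > neg))
    return fit

def nearest_neighbor_learner():
    """Toy base learner (hypothetical): 1-nearest neighbor with squared distance."""
    def fit(dataset):
        def predict(x):
            _, label = min(
                dataset,
                key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
            )
            return label
        return predict
    return fit

def partitioning_filter(dataset, learners, n_parts, min_votes):
    """Return the indices of instances misclassified by at least min_votes models.

    Assumption: one model is trained per (partition, learner) pair, so there are
    n_parts * len(learners) models in total.
    """
    partitions = [dataset[i::n_parts] for i in range(n_parts)]
    models = [fit(part) for part in partitions for fit in learners]
    noisy = []
    for index, (x, y) in enumerate(dataset):
        errors = sum(model(x) != y for model in models)
        if errors >= min_votes:
            noisy.append(index)
    return noisy

# Hypothetical data: (complexity,) -> fault-prone (1) or not (0).
# The last instance is deliberately mislabeled: low complexity, labeled fault-prone.
data = [
    ((1,), 0), ((2,), 0), ((3,), 0), ((4,), 0),
    ((20,), 1), ((21,), 1), ((22,), 1), ((23,), 1),
    ((2.5,), 1),
]
learners = [threshold_learner(0), nearest_neighbor_learner()]

# 2 partitions x 2 learners = 4 models.
majority = partitioning_filter(data, learners, n_parts=2, min_votes=3)
consensus = partitioning_filter(data, learners, n_parts=2, min_votes=4)
print(majority)   # [8]
print(consensus)  # []
```

In this toy run the majority scheme flags the mislabeled instance, while the consensus scheme, the most conservative setting, flags nothing because the nearest-neighbor model trained on the instance's own partition memorizes its label. Raising min_votes trades missed noise for fewer clean instances removed.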

This article was published in the Intelligent Data Analysis Journal (IDA) in Volume 9, Issue 5 in 2005. Details are available on the ACM portal or on the site of the publisher.

Generating Multiple Noise Elimination Filters with the Ensemble-Partitioning Filter

This is the first paper in which Dr. Khoshgoftaar and I present the Ensemble Filter, which filters noise using machine learners. We show that this filter can be specialized into the classification, ensemble, multiple-partitioning, or iterative-partitioning filter. A case study of software metrics data from a high-assurance software project analyzes the similarities between the filters obtained by specializing the Ensemble Filter.

This paper was published as part of the Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration (IRI04). You can find more information on IEEE.

Slides from the Advanced Data Mining course

I have made available the slides from my two-hour presentation for the Advanced Data Mining and Machine Learning course (COP 6579). The presentation covers the concepts used to implement noise elimination with the Ensemble-Partitioning Filter.