Generating Multiple Noise Elimination Filters with the Ensemble-Partitioning Filter
This is the first paper where Dr. Khoshgoftaar and I present the Ensemble-Partitioning Filter, which uses machine learners to filter noise from data. We show that this filter can be specialized into the classification, ensemble, multiple-partitioning, or iterative-partitioning filter. A case study of software metrics data from a high-assurance software project analyzes the similarities between the filters obtained by specializing the Ensemble-Partitioning Filter.
This paper was published as part of the Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration (IRI04). More information is available from IEEE.
We present the ensemble-partitioning filter, a generalization of several common filtering techniques from the literature. Filtering the training dataset, i.e., removing noisy data, can improve the accuracy of the induced data mining learners. Tuning the few parameters of the ensemble-partitioning filter allows it to be adapted to a given data mining problem; for example, it can be specialized into the classification, ensemble, multiple-partitioning, or iterative-partitioning filter. The predictions of the filtering experts are combined such that if an instance is misclassified by a certain number of experts or learners, it is identified as noisy. The conservativeness of the ensemble-partitioning filter depends on the filtering level and the number of filtering iterations. A case study of software metrics data from a high-assurance software project analyzes the similarities between the filters obtained by specializing the ensemble-partitioning filter. We show that over 25% of the time, the filters at different levels of conservativeness agree on labeling instances as noisy. In addition, the classification filter has the lowest agreement with the other filters.
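To give a feel for the kind of scheme the abstract describes, here is a minimal sketch of a partition-and-vote noise filter, assuming NumPy and scikit-learn. The function and parameter names (ensemble_partitioning_filter, n_partitions, filtering_level, n_iterations) and the choice of base learners are my own illustrative choices, not the paper's exact formulation.

```python
# A rough sketch of an ensemble-partitioning noise filter (illustrative, not the
# paper's exact algorithm): split the data into partitions, train several
# "expert" learners on each partition, and let the experts vote on which
# instances look noisy.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB


def ensemble_partitioning_filter(X, y, base_learners=None, n_partitions=5,
                                 filtering_level=None, n_iterations=1,
                                 random_state=0):
    """Return a boolean mask marking the instances retained as (likely) clean."""
    if base_learners is None:
        # The "experts": any set of inductive learners could be plugged in here.
        base_learners = [DecisionTreeClassifier(random_state=random_state),
                         GaussianNB()]
    rng = np.random.RandomState(random_state)
    keep = np.ones(len(y), dtype=bool)

    for _ in range(n_iterations):
        idx = np.where(keep)[0]
        if len(idx) < n_partitions:
            break
        # Split the currently retained data into disjoint partitions.
        parts = np.array_split(rng.permutation(idx), n_partitions)
        votes = np.zeros(len(y), dtype=int)   # misclassification votes per instance
        n_experts = 0
        for part in parts:
            for learner in base_learners:
                # Each expert is trained on one partition...
                model = clone(learner).fit(X[part], y[part])
                # ...and votes against every retained instance it misclassifies.
                votes[idx] += (model.predict(X[idx]) != y[idx]).astype(int)
                n_experts += 1
        # Filtering level: how many experts must agree before an instance is
        # labeled noisy (simple majority by default; n_experts would be consensus).
        level = filtering_level or (n_experts // 2 + 1)
        noisy = votes >= level
        if not noisy[idx].any():
            break                              # nothing removed: stop iterating
        keep &= ~noisy

    return keep
```

In this sketch, a lower filtering level or more iterations removes more instances (a more aggressive filter), while requiring near-consensus among the experts and a single pass keeps the filtering conservative; the cleaned training set would then be X[mask], y[mask] for whatever learner is induced afterwards.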