Abraham Chan, Arpan Gujarati, Karthik Pattabiraman and Sathish Gopalakrishnan, To appear in the Proceedings of the ACM International Symposium on Applied Computing (SAC), 2025. Safe, Secure, and Robust AI Track. (Acceptance Rate: TBD) [ PDF (coming soon) | Talk ]
Abstract: Supervised machine learning (ML) is used in many safety-critical applications, such as self-driving cars and medical imaging. Unfortunately, many training datasets have been found to contain faults. The accuracy of an individual model trained on a faulty dataset can degrade significantly. In contrast, ensembles, which combine multiple models through simple majority voting, retain accuracy despite training data faults because of their classification diversity, and are thus more resilient. However, there are many ways to generate ML ensembles, and their accuracy can differ significantly. This creates a large search space, making it challenging to find ensembles that maximize accuracy despite training data faults. We identify three ways of generating diverse ML models, and present D-semble, a technique that uses Genetic Algorithms and diversity to efficiently search for resilient ensembles. We evaluate D-semble by measuring the balanced accuracies and F1-scores of the ensembles it finds. Compared with bagging, greedy search, random selection, and the best individual model, ensembles found by D-semble are on average 9%, 16%, 28%, and 32% more resilient, respectively.
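The abstract describes a Genetic Algorithm that searches over candidate ensembles combined by majority voting. The sketch below is a rough illustration only, not the authors' D-semble implementation (which additionally exploits model diversity): it evolves binary membership masks over a pool of precomputed model predictions and scores each candidate ensemble by balanced accuracy on a held-out validation set. All names and parameters are hypothetical.

```python
# Illustrative sketch of GA-based ensemble search (assumed setup, not D-semble itself).
# Candidate models are represented only by their precomputed validation predictions.
import numpy as np

rng = np.random.default_rng(0)

def majority_vote(preds):
    # preds: (n_models, n_samples) integer labels -> (n_samples,) voted labels
    n_classes = preds.max() + 1
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)

def balanced_accuracy(y_true, y_pred):
    # Mean per-class recall.
    classes = np.unique(y_true)
    return np.mean([(y_pred[y_true == c] == c).mean() for c in classes])

def fitness(mask, all_preds, y_val):
    # Fitness of a binary membership mask: balanced accuracy of the voted ensemble.
    if mask.sum() == 0:
        return 0.0
    return balanced_accuracy(y_val, majority_vote(all_preds[mask.astype(bool)]))

def ga_search(all_preds, y_val, pop_size=20, generations=30, p_mut=0.1):
    # Evolve binary masks with tournament selection, single-point crossover,
    # and bit-flip mutation; return the best ensemble found.
    n_models = all_preds.shape[0]
    pop = rng.integers(0, 2, size=(pop_size, n_models))
    for _ in range(generations):
        scores = np.array([fitness(ind, all_preds, y_val) for ind in pop])
        new_pop = []
        for _ in range(pop_size):
            i, j = rng.integers(0, pop_size, 2)
            a = pop[i] if scores[i] >= scores[j] else pop[j]
            k, m = rng.integers(0, pop_size, 2)
            b = pop[k] if scores[k] >= scores[m] else pop[m]
            cut = rng.integers(1, n_models)
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_models) < p_mut
            new_pop.append(np.where(flip, 1 - child, child))
        pop = np.array(new_pop)
    scores = np.array([fitness(ind, all_preds, y_val) for ind in pop])
    return pop[scores.argmax()], scores.max()

# Example usage: 10 candidate models, 200 validation samples, 3 classes.
all_preds = rng.integers(0, 3, size=(10, 200))
y_val = rng.integers(0, 3, size=200)
best_mask, best_score = ga_search(all_preds, y_val)
print("selected models:", np.flatnonzero(best_mask), "balanced acc:", round(best_score, 3))
```

Note that this sketch uses only validation balanced accuracy as the fitness function; per the abstract, D-semble also incorporates diversity into its search, which is not modeled here.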