A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data Segmentation

Yarabolu, Vennela; Waghmare, Govind; Gupta, Sonia; Asthana, Siddhartha

doi:10.1145/3703323.3703337

Abstract: In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift, a phenomenon where discrepancies between historical training data and future test data lead to significant performance degradation and operational inefficiencies. Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data, and focus primarily on either covariate drift or concept drift. These methods suffer from high resource demands, an inability to manage all types of drift effectively, and neglect of the valuable context that historical data can provide. We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness. This paper introduces a framework that combines the strengths of data-centric approaches with adaptive management of both covariate and concept drift in a scalable and efficient manner. Our framework uses data segmentation techniques to identify optimal data batches that accurately reflect test data patterns. These batches are then used to train the models that serve the test data, ensuring that the models remain relevant and accurate over time. By leveraging both data segmentation and scalable drift management, our solution maintains robust model accuracy and operational efficiency in large-scale ML deployments. It also minimizes resource consumption and computational overhead by selecting and using only relevant data subsets, leading to significant cost savings. Experimental results on classification tasks using real-world and synthetic datasets show that our approach improves model accuracy while reducing operational costs and latency. This practical solution overcomes inefficiencies in current methods, providing a robust, adaptable, and scalable approach.
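The abstract does not spell out how "optimal data batches that accurately reflect test data patterns" are identified. As a rough illustration only, the sketch below ranks historical batches by how closely their feature distributions match a recent test sample, using an averaged per-feature two-sample Kolmogorov-Smirnov (KS) statistic. This is a swapped-in, generic technique, not the authors' method; the function names (ks_statistic, batch_drift_score, select_batches), the choice of KS distance, and the top-k selection rule are all hypothetical assumptions made for this example.

```python
import numpy as np


def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(a)
    b = np.sort(b)
    # Evaluate both empirical CDFs at every observed value.
    values = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, values, side="right") / a.size
    cdf_b = np.searchsorted(b, values, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def batch_drift_score(batch: np.ndarray, reference: np.ndarray) -> float:
    """Hypothetical score: mean per-feature KS statistic between a batch and a test sample."""
    return float(np.mean([ks_statistic(batch[:, j], reference[:, j])
                          for j in range(batch.shape[1])]))


def select_batches(batches, test_sample, k):
    """Hypothetical selector: indices of the k batches closest to test_sample,
    plus every batch's score (lower score = closer to the test distribution)."""
    scores = [batch_drift_score(b, test_sample) for b in batches]
    closest = np.argsort(scores, kind="stable")[:k]
    return sorted(int(i) for i in closest), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic historical batches whose feature means drift over time.
    means = (0.0, 1.0, 3.0, 3.2, 6.0)
    batches = [rng.normal(loc=m, scale=1.0, size=(500, 3)) for m in means]
    # Recent unlabeled test data centred near 3.1.
    test_sample = rng.normal(loc=3.1, scale=1.0, size=(300, 3))
    chosen, scores = select_batches(batches, test_sample, k=2)
    print("selected batches:", chosen)
    print("scores:", [round(s, 3) for s in scores])
```

In this synthetic setup, the two batches centred at 3.0 and 3.2 should rank first, and only they would be passed on for training. Note that comparing feature marginals only addresses covariate drift; handling concept drift, as the paper claims to do, would additionally require labels or model-error signals, which this sketch does not model.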

Comments:	Accepted in CODS-COMAD 2024
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2411.15616 [cs.LG]
	(or arXiv:2411.15616v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.15616
Related DOI:	https://doi.org/10.1145/3703323.3703337
