SCALABLE ANOMALY DETECTION USING STREAM MINING TECHNIQUES ON BIG DATA FRAMEWORKS
Abstract
The explosive growth of data generated on social media platforms such as Twitter presents both opportunities and challenges for real-time anomaly detection. Traditional approaches struggle to scale with the velocity, volume, and variety of such data. This paper proposes a scalable anomaly detection framework that applies stream mining techniques on top of Apache Spark and its machine learning library, MLlib. The system is designed to process high-throughput tweet streams in real time, detect anomalous patterns, and compare the performance of several anomaly detection algorithms, including Streaming K-Means and Isolation Forests.
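The abstract does not specify how Streaming K-Means turns clusters into anomaly decisions. One common approach, assumed here and not taken from the paper, is to update the centers on each micro-batch and then flag any point whose distance to its nearest center exceeds a threshold. The sketch below is plain Python rather than the MLlib API, but its center update follows the decayed rule documented for MLlib's StreamingKMeans: c' = (c·n·a + x̄·m) / (n·a + m), n' = n·a + m. The class name `StreamingKMeansSketch`, the `is_anomaly` rule, the `threshold` value, and the two-dimensional feature vectors are all hypothetical, since the abstract does not say how tweets are featurized. Isolation Forests are not covered here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamingKMeansSketch:
    """Hypothetical mini-batch k-means with a decayed (forgetful) center update.

    Center update mirrors the rule documented for Spark MLlib's StreamingKMeans:
    c' = (c * n * a + x_bar * m) / (n * a + m),  n' = n * a + m.
    """

    def __init__(self, centers: List[Vector], decay: float = 1.0) -> None:
        self.centers: List[List[float]] = [list(c) for c in centers]
        self.weights: List[float] = [0.0] * len(centers)
        self.decay = decay  # a in [0, 1]; 1.0 keeps all history, 0.0 keeps only the latest batch

    def nearest(self, x: Vector) -> Tuple[int, float]:
        """Index of the closest center and the distance to it."""
        dists = [euclidean(x, c) for c in self.centers]
        j = min(range(len(dists)), key=dists.__getitem__)
        return j, dists[j]

    def update(self, batch: List[Vector]) -> None:
        """Assign each point of a micro-batch, then move centers toward the batch means."""
        dim = len(self.centers[0])
        sums = [[0.0] * dim for _ in self.centers]
        counts = [0] * len(self.centers)
        for x in batch:
            j, _ = self.nearest(x)
            counts[j] += 1
            for i, v in enumerate(x):
                sums[j][i] += v
        # Discount the accumulated weight of every cluster before adding the new batch.
        self.weights = [w * self.decay for w in self.weights]
        for j, m in enumerate(counts):
            if m == 0:
                continue
            new_weight = self.weights[j] + m
            lam = m / new_weight  # share of the new center contributed by this batch
            mean = [s / m for s in sums[j]]
            self.centers[j] = [(1 - lam) * c + lam * mu for c, mu in zip(self.centers[j], mean)]
            self.weights[j] = new_weight

    def is_anomaly(self, x: Vector, threshold: float) -> bool:
        """Hypothetical rule: flag a point farther than `threshold` from every center."""
        return self.nearest(x)[1] > threshold


# Usage with made-up 2-D feature vectors (e.g. per-window tweet statistics).
model = StreamingKMeansSketch([(0.0, 0.0), (10.0, 10.0)], decay=0.9)
model.update([(0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)])
print(model.centers)                         # [[0.5, 0.5], [10.5, 10.5]]
print(model.is_anomaly((50.0, 50.0), 3.0))   # True
print(model.is_anomaly((0.6, 0.4), 3.0))     # False
```

A decay factor below 1.0 lets older batches fade, so the centers can track drifting "normal" behavior in the stream; the trade-off is that a sustained anomaly can eventually be absorbed into a cluster and stop being flagged.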