Detecting Patterns of Anomalies

Technical Report | Accession Number: ADA501930

Abstract:

An anomaly is an observation that does not conform to expected normal behavior. With the ever-increasing amount of data being collected, automatic surveillance systems are becoming more popular and increasingly use data mining methods to detect patterns of anomalies. The diverse nature of real-world datasets and the difficulty of obtaining labeled training data make it challenging to develop a universal framework for anomaly detection. We focus on a key feature of most real-world scenarios: multiple anomalous records are usually generated by a common anomalous process. In this thesis we develop methods that exploit the similarity between records in these groups, or patterns, of anomalies to improve detection. We also investigate new methods for detecting individual anomalous records, which we then incorporate into the group detection methods. A recurring feature of our methods is combinatorial search over some space, e.g., over all subsets of attributes or over all subsets of records. We use a variety of computational speedups and approximation techniques to make these methods scalable to large datasets. Since most of our motivating problems involve datasets with categorical or symbolic values, we focus on categorical datasets. Beyond this, we make few assumptions about the data, and our methods are general and applicable to a wide variety of domains. Additionally, we investigate anomaly pattern detection in data structured by space and time. Our method generalizes the popular spatio-temporal scan statistic to learn and detect specific, time-varying spatial patterns in the data. Finally, we present an efficient and easily interpretable technique for anomaly detection in multivariate time series data. We evaluate our methods on a variety of real-world datasets containing both real and synthetic anomalies.
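The abstract mentions combinatorial search over subsets of records and the spatio-temporal scan statistic without giving details. As a rough illustration of that idea only, the following sketch implements a simple expectation-based Poisson scan: it searches every contiguous interval of locations on a 1-D line, combined with every window of the most recent time steps, and scores each candidate group by a log-likelihood ratio of observed counts against expected baselines. The function names (`ebp_score`, `scan`), the 1-D spatial layout, and the assumption that baselines are known in advance are all hypothetical choices for this sketch; it is not the method developed in the thesis, which generalizes the scan statistic to learn time-varying spatial patterns.

```python
import math
from typing import List, Tuple


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio for one candidate region.

    Returns 0 unless the observed count exceeds its expected baseline.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def scan(
    counts: List[List[float]],
    baselines: List[List[float]],
    max_window: int,
) -> Tuple[float, Tuple[int, int, int]]:
    """Brute-force scan over contiguous location intervals [lo, hi] and
    trailing time windows of length w (hypothetical 1-D layout).

    counts[i][t] and baselines[i][t] are the observed and expected counts
    at location i and time step t. Returns (best_score, (lo, hi, w)).
    """
    n_loc = len(counts)
    n_t = len(counts[0])
    best: Tuple[float, Tuple[int, int, int]] = (0.0, (-1, -1, 0))
    for w in range(1, min(max_window, n_t) + 1):
        # Aggregate each location over the last w time steps.
        c_loc = [sum(row[n_t - w:]) for row in counts]
        b_loc = [sum(row[n_t - w:]) for row in baselines]
        for lo in range(n_loc):
            c = b = 0.0
            # Extend the interval one location at a time, reusing running sums.
            for hi in range(lo, n_loc):
                c += c_loc[hi]
                b += b_loc[hi]
                s = ebp_score(c, b)
                if s > best[0]:
                    best = (s, (lo, hi, w))
    return best
```

For example, with five locations and four time steps, a baseline of 2 everywhere, and counts of 6 at locations 2 and 3 during the last two time steps, the scan should return the region `(2, 3, 2)`. Exhaustive search like this costs O(W·N²) score evaluations for N locations and W windows, which is why practical scan methods rely on the kind of speedups and approximations the abstract refers to.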

DOCUMENT & CONTEXTUAL SUMMARY

Distribution: Approved For Public Release
Distribution Statement: Approved For Public Release; Distribution Is Unlimited.

RECORD

Collection: TR