Accession Number : ADA580209


Title :   Fast Anomaly Discovery Given Duplicates


Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE


Personal Author(s) : Lee, Jay-Yoon ; Kang, U ; Koutra, Danai ; Faloutsos, Christos


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a580209.pdf


Report Date : Dec 2012


Pagination or Media Count : 17


Abstract : Given a large cloud of multi-dimensional points and an off-the-shelf outlier detection method, why does it take a week to finish? After careful analysis, we discovered that duplicate points create subtle issues that the literature has ignored: if dmax is the multiplicity of the most over-plotted point, typical algorithms are quadratic in dmax. For graph-related outlier detection, all the satellites of a 'star' node have identical features and thus create over-plotting, with dmax being the highest degree; because of power-law degree distributions, dmax may be huge for real graph data. We propose several ways to eliminate the problem, report wall-clock times and time savings, and show that our methods give either exact results or highly accurate approximate ones.
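
The abstract does not spell out the proposed methods, so the following is only an illustrative sketch of one generic way to avoid the quadratic cost on duplicates, not the report's algorithm. It assumes a k-nearest-neighbor distance as the outlier score, collapses identical points into (point, multiplicity) pairs, and then works on distinct points only. The function names `collapse_duplicates` and `kth_neighbor_distance` are hypothetical.

```python
from collections import Counter
import math


def collapse_duplicates(points):
    """Map each distinct point (as a tuple) to its multiplicity."""
    return Counter(tuple(p) for p in points)


def kth_neighbor_distance(counts, k):
    """Distance from each distinct point to its k-th nearest neighbor.

    Works on distinct points only and treats multiplicities as weights,
    so a point duplicated dmax times is processed once, not dmax times.
    This is an assumed scoring rule chosen for illustration.
    """
    uniques = list(counts)
    result = {}
    for u in uniques:
        # The other copies of u are neighbors at distance 0.
        remaining = k - (counts[u] - 1)
        if remaining <= 0:
            result[u] = 0.0
            continue
        # Brute-force scan over the other distinct points, nearest first.
        others = sorted((math.dist(u, v), counts[v]) for v in uniques if v != u)
        dist_k = math.inf  # stays infinite if fewer than k neighbors exist
        for d, c in others:
            remaining -= c
            if remaining <= 0:
                dist_k = d
                break
        result[u] = dist_k
    return result


# Five copies of one point, plus two isolated points.
points = [(0, 0)] * 5 + [(1, 0), (10, 0)]
scores = kth_neighbor_distance(collapse_duplicates(points), k=2)
```

In this sketch the cost depends on the number of distinct points rather than on the total number of points, which is the effect the abstract attributes to handling duplicates explicitly. The per-point scan is still brute force; a spatial index could replace it.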


Descriptors :   *COMPUTER SCIENCE , ANOMALIES , DETECTION , GRAPHS


Subject Categories : Computer Programming and Software


Distribution Statement : APPROVED FOR PUBLIC RELEASE