Log Analysis Using Splunk Hadoop Connect
Abstract:
The purpose of this research is to use Splunk and Hadoop to perform timestamp analysis on computer logs. Splunk is a commercial data analytics tool. Hadoop is a system for large-scale distributed storage and processing. This research ingested computer logs drawn from two kinds of forensic data in the Real Data Corpus to establish a baseline and find anomalies. We analyzed timestamps and Event IDs in more than two thousand logs across hundreds of drives. Additionally, we used packet captures from the Center for Applied Internet Data Analysis to test Hadoop's ability to store data and transfer it between the Hadoop Distributed File System and Splunk. We used Splunk Hadoop Connect for data transfer between a Splunk server and a Hadoop cluster. Splunk effectively identified and represented statistical anomalies in the log files. These anomalies could reveal misconfiguration, security concerns, or unusual but harmless traffic. Using Splunk Hadoop Connect, Splunk could also easily transfer data to relatively inexpensive commodity servers.