Accession Number : ADA577443


Title : Shark: Fast Data Analysis Using Coarse-grained Distributed Memory


Descriptive Note : Technical rept.


Corporate Author : CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Personal Author(s) : Engle, Clifford


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a577443.pdf


Report Date : 01 May 2013


Pagination or Media Count : 32


Abstract : Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets. This report gives a complete overview of the development of Shark, including design decisions, performance details, and comparison with existing data warehousing solutions. It demonstrates some of Shark's distinguishing features, including its in-memory columnar caching and its unified machine learning interface.


Descriptors : *LEARNING MACHINES , *MEMORY DEVICES , DATA PROCESSING , DECISION MAKING , EXPERIMENTAL DATA


Subject Categories : Psychology
      Computer Programming and Software


Distribution Statement : APPROVED FOR PUBLIC RELEASE