Accession Number:

ADA624306

Title:

Resource-Efficient Data-Intensive System Designs for High Performance and Capacity

Descriptive Note:

Doctoral thesis

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE

Personal Author(s):

Report Date:

2015-09-01

Pagination or Media Count:

140.0

Abstract:

Data-intensive systems are a critical building block for todays large-scale Internet services. These systems have enabled high throughput and capacity, reaching billions of requests per second for trillions of items in a single storage cluster. However many systems are surprisingly inefficient for instance, memcached, a widely-used in-memory key-value store system, handles 1-2 million requests per second on a modern server node, whereas an optimized software system could achieve over 70 million requests per second using the same hardware. Reducing such inefficiencies can improve the cost effectiveness of the systems significantly. This dissertation shows that by leveraging modern hardware and exploiting workload characteristics, data-intensive storage systems that process a large amount of fine-grained data can achieve an order of magnitude higher performance and capacity than prior systems that are built for generic hardware and workloads. As examples we present SILT and MICA, which are resource-efficient key-value stores for flash and memory. SILT provides flash-speed query processing and 5.7X higher capacity than the previous state-of-the-art system. It employs new memory-efficient indexing schemes including ECT that requires only 2.5 bits per item in memory, and a system cost model built upon new accurate and fast analytic primitives to find workload-specific system configurations. MICA offers 4X higher throughput over the network than previous in-memory key-value store systems by performing efficient parallel request processing on multi-core processors and low-overhead request direction with modern network interface cards, and by using new key-value data structures designed for specific workload types.

Subject Categories:

  • Computer Programming and Software
  • Computer Hardware
  • Computer Systems Management and Standards
  • Radio Communications

Distribution Statement:

APPROVED FOR PUBLIC RELEASE