Accession Number:

ADA597352

Title:

Scale-Independent Relational Query Processing

Descriptive Note:

Technical rept.

Corporate Author:

CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES

Personal Author(s):

Report Date:

2013-10-04

Pagination or Media Count:

146.0

Abstract:

An increasingly common pattern is for newly-released web applications to succumb to a Success Disaster. In this scenario, overloaded database machines and resultant high response times destroy a previously good user experience, just as a site is becoming popular. Unfortunately, the data independence provided by a traditional relational database system while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under simple declarative expressions. The disconnect between expression complexity and runtime cost often leads developers to mistrust the suitability of relational database systems for their web applications in the long term. As a result, developers of these applications are increasingly abandoning relational systems in favor of imperative code written against distributed NoSQL keyvalue stores, losing the many benefits of data independence in the process. While some claim that scalability issues are inherent in the use of the relational model this thesis challenges that notion by extending standard data independence with the notion of scale independence. In contrast to traditional relational databases, a scale-independent system is capable of providing predictable response time for all of the queries in an application even as the amount of data grows by orders of magnitude. This predictability is achieved by compile-time enforcement of strict upper bounds on the number of operations that will be performed for all queries. Coupled with a service level objective SLO compliance prediction model and a scalable storage architecture, these upper bounds make it easy for developers to write success-tolerant applications that support an arbitrarily large number of users while still providing acceptable performance. Statically bounding the amount of work required to execute a query can be easy for some queries, such as those that perform a lookup of a single record by primary key.

Subject Categories:

  • Computer Programming and Software

Distribution Statement:

APPROVED FOR PUBLIC RELEASE