Accession Number:

ADA584739

Title:

Algorithms for Large-Scale Astronomical Problems

Descriptive Note:

Doctoral thesis

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE

Personal Author(s):

Report Date:

2013-08-01

Pagination or Media Count:

124

Abstract:

Modern astronomical datasets keep growing: they already include billions of celestial objects and take up terabytes of disk space. Meanwhile, many astronomical applications do not scale well to such large amounts of data, which raises the following question: how can we use modern computer science techniques to help astronomers better analyze large datasets? To answer this question, we applied various computer science techniques to provide fast, scalable solutions to several astronomical problems. First, we developed algorithms that work better with big data. We found that for some astronomical problems, the information a user requires for each query covers only a small proportion of the input dataset. We therefore carefully organized the data layout on disk to answer user queries quickly; the resulting technique handles datasets with billions of entries on a single desktop computer. Second, we made use of database techniques to store and retrieve data. We designed table schemas and query processing functions to maximize performance on large datasets, and database features such as indexing and sorting further reduce the processing time of user queries. Third, we processed large datasets using modern distributed computing frameworks. We considered frameworks widely used in the astronomy world, such as the Message Passing Interface (MPI), as well as emerging frameworks such as MapReduce. The resulting implementations scale well to tens of billions of objects on hundreds of compute cores. During our research, we also noticed that modern computer hardware helps solve some of the sub-problems we encountered. One example is the use of Solid-State Drives (SSDs), whose random access times are shorter than those of regular hard disk drives. Another is the use of Graphics Processing Units (GPUs), which, under the right circumstances, can achieve a higher level of parallelism than ordinary CPU clusters.

Subject Categories:

  • Astronomy
  • Numerical Mathematics
  • Computer Systems

Distribution Statement:

APPROVED FOR PUBLIC RELEASE