Accession Number : ADA584739


Title :   Algorithms for Large-Scale Astronomical Problems


Descriptive Note : Doctoral thesis


Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE


Personal Author(s) : Fu, Bin


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a584739.pdf


Report Date : Aug 2013


Pagination or Media Count : 124


Abstract : Modern astronomical datasets keep growing: they already include billions of celestial objects and take up terabytes of disk space. Meanwhile, many astronomical applications do not scale well to such large amounts of data, which raises the following question: How can we use modern computer science techniques to help astronomers better analyze large datasets? To answer this question, we applied various computer science techniques to provide fast, scalable solutions to the following astronomical problems. We developed algorithms to better work with big data. We found that for some astronomical problems, the information that users require at any one time covers only a small proportion of the input dataset. We therefore carefully organized the data layout on disk to answer user queries quickly, and the resulting technique uses only one desktop computer to handle datasets with billions of data entries. We made use of database techniques to store and retrieve data. We designed table schemas and query processing functions to maximize their performance on large datasets. Database features such as indexing and sorting further reduce the processing time of user queries. We processed large datasets using modern distributed computing frameworks. We considered frameworks widely used in astronomy, such as the Message Passing Interface (MPI), as well as emerging frameworks such as MapReduce. The developed implementations scale well to tens of billions of objects on hundreds of compute cores. During our research, we noticed that modern computer hardware helps solve some of the sub-problems we encountered. One example is the use of Solid-State Drives (SSDs), whose random access time is shorter than that of regular hard disk drives. Another example is the use of Graphics Processing Units (GPUs), which, under the right circumstances, can achieve a higher level of parallelism than ordinary CPU clusters.


Descriptors :   *ALGORITHMS , *ASTRONOMY , *DATA BASES , *DISTRIBUTED COMPUTING , *INFORMATION RETRIEVAL , ACCESS TIME , ASTRONOMERS , ASTRONOMICAL BODIES , CLUSTERING , COMPUTERS , DISTRIBUTED DATA PROCESSING , INPUT , INTERFACES , INTERROGATION , MESSAGE PROCESSING , METHODOLOGY , MINICOMPUTERS , PARALLEL PROCESSING , PROCESSING , PROCESSING EQUIPMENT , RANDOM ACCESS COMPUTER STORAGE , SCALE , SCALING FACTOR , SOLID STATE ELECTRONICS , SOLUTIONS(GENERAL) , SORTING , TABLES(DATA) , THESES , TIME , USER NEEDS


Subject Categories : Astronomy
      Numerical Mathematics
      Computer Systems


Distribution Statement : APPROVED FOR PUBLIC RELEASE