Accession Number : AD1038470

Title :   Working with and Visualizing Big Data Efficiently with Python for the DARPA XDATA Program

Descriptive Note : Technical Report,01 Oct 2012,01 Mar 2017

Corporate Author : Continuum Analytics, Inc. Austin United States

Personal Author(s) : Oliphant,Travis ; Wang,Peter ; Seibert,Stan ; Rocklin,Matthew ; Van de Ven,Bryan ; Sparra,Hunt

Report Date : 01 Aug 2017

Pagination or Media Count : 43

Abstract : Research performed under the XDATA program focused on computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, metadata) and unstructured (e.g., text, documents, message traffic). Several open-source projects that have seen community and industry adoption grew out of this effort:
- Blaze: a collection of packages for describing, accessing, and manipulating disparate data sources and types.
- Numba: a just-in-time function compiler for Python, based on the LLVM compiler project, that lets researchers run their Python code at near-native speeds on CPUs and GPUs.
- Dask: parallelizes generic Python code and extends NumPy, Pandas, and Scikit-learn with parallel variants.
- Bokeh: creates interactive web applications from Python without requiring knowledge of JavaScript, CSS, or HTML.
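To illustrate the idea behind Dask's parallelization of generic Python: Dask's documentation represents a computation as a task graph, a plain dict mapping keys to tasks, where a task is a tuple whose first element is a callable and whose remaining elements are arguments (which may themselves be keys). The sketch that follows is a toy, sequential evaluator for such a graph, written under that assumption. The names `simple_get`, `_execute`, `_is_key`, and `inc` are hypothetical and are not part of Dask's API; a real Dask scheduler would additionally run independent tasks in parallel and detect cycles.

```python
from operator import add


def inc(i):
    """Hypothetical example task: add one."""
    return i + 1


def _is_key(graph, obj):
    """Return True if obj is a key of the graph (unhashable objects never are)."""
    try:
        return obj in graph
    except TypeError:
        return False


def _execute(graph, task, cache):
    """Evaluate one task, recursively resolving any keys it refers to."""
    # A task is a non-empty tuple whose first element is callable.
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(_execute(graph, arg, cache) for arg in args))
    # Lists of arguments are evaluated element by element.
    if isinstance(task, list):
        return [_execute(graph, t, cache) for t in task]
    # A reference to another key is resolved (and memoized) via simple_get.
    if _is_key(graph, task):
        return simple_get(graph, task, cache)
    # Anything else is a literal value.
    return task


def simple_get(graph, key, cache=None):
    """Toy, sequential evaluator for a Dask-style task graph (hypothetical helper).

    Each key is computed at most once thanks to the cache; cyclic graphs are
    not detected and would recurse forever.
    """
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _execute(graph, graph[key], cache)
    return cache[key]


# 'y' depends on 'x', and 'z' depends on 'y'.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
}

print(simple_get(graph, "z"))  # 12
```

Because the graph is ordinary data, a scheduler can inspect it before running anything and dispatch tasks with no mutual dependencies to separate threads, processes, or machines; this toy version deliberately evaluates everything in one thread to keep the dependency resolution easy to follow.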

Descriptors :   SOFTWARE TOOLS , PYTHON PROGRAMMING LANGUAGE , DATA VISUALIZATION , computer program documentation , machine learning , workload , algorithms , high performance computing , databases , information systems , web applications , computer programming , CLUSTERING

Subject Categories : Computer Programming and Software

Distribution Statement : APPROVED FOR PUBLIC RELEASE