Accession Number : AD1038470


Title : Working with and Visualizing Big Data Efficiently with Python for the DARPA XDATA Program


Descriptive Note : Technical Report, 01 Oct 2012, 01 Mar 2017


Corporate Author : Continuum Analytics, Inc. Austin United States


Personal Author(s) : Oliphant,Travis ; Wang,Peter ; Seibert,Stan ; Rocklin,Matthew ; Van de Ven,Bryan ; Sparra,Hunt


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/1038470.pdf


Report Date : 01 Aug 2017


Pagination or Media Count : 43


Abstract : Research performed under the XDATA program focused on computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, metadata) and unstructured (e.g., text, documents, message traffic). Several open source projects that have seen community and industry adoption grew out of this effort. - Blaze: A collection of packages for describing, accessing, and manipulating disparate data sources and types. - Numba: A just-in-time function compiler for Python, based on the LLVM compiler project, that allows researchers to run their Python code at near-native speeds on CPUs and GPUs. - Dask: Parallelizes generic Python and extends NumPy, Pandas, and Scikit-learn with parallel variants. - Bokeh: Creates interactive web applications from Python without requiring knowledge of JavaScript, CSS, or HTML.


Descriptors : SOFTWARE TOOLS , PYTHON PROGRAMMING LANGUAGE , DATA VISUALIZATION , computer program documentation , machine learning , workload , algorithms , high performance computing , databases , information systems , web applications , computer programming , CLUSTERING


Subject Categories : Computer Programming and Software


Distribution Statement : APPROVED FOR PUBLIC RELEASE