An Adaptive Pipeline from Scientific Data to Models
Abstract:
Under the Defense Advanced Research Projects Agency's Synergistic Discovery and Design (SD2) program, the Duke Team researched and developed a broad range of data-driven techniques for scientific discovery and robust design. The team was composed of scientists from Duke, Rutgers, Montana State, and Florida Atlantic Universities, as well as Geometric Data Analytics and Netrias, Inc. It demonstrated the feasibility of these techniques on the program's challenge problems: Yeast States, Novel Chassis, Protein Stability, and Perovskite. The team developed an adaptable computational pipeline, comprising approaches, methods, and tooling, that learns the structure and function of regulatory networks from time-series data for model construction. This enables the use of high-fidelity models and simulations that account for process perturbations and component interactions in robust scientific discovery and system design. In addition to their innovative Dynamic Signatures Generated by Regulatory Networks (DSGRN) framework, the team developed tools for automating data pre-processing, normalization, quality control, scientific extraction, data aggregation, and data analysis. These tools accelerated the design-build-test-learn loop and were transitioned to the National Institutes of Health's Accelerating COVID-19 Therapeutic Interventions and Vaccines program. It became clear that the foundry-style, high-throughput laboratories charged with experimentation and data collection lacked the capabilities to produce some data types and to perform developmental protocols. Duke Team members therefore stepped in with their own wet-lab capabilities to fill this gap in data collection. This work enabled new analyses that integrate data from both high-throughput and benchtop laboratories. It also substantially extended the capabilities of the SD2 architecture through an innovative extension of Aquarium for benchtop use.