Charles River Analytics Inc., Cambridge, United States
Subject matter experts (SMEs) attempting to solve real-world analytic problems face several challenges because they lack the applied mathematics, statistics, and machine learning skills that data scientists possess. The goal of our TA2 effort under the DARPA D3M Program was to bridge this gap by using novel methods and automation to enable SMEs to act as their own data scientists. Our effort was designed to fuse data-driven and knowledge-driven approaches to produce a virtual data scientist we call Eve. To translate domain-expert intent into formal representations of learning problems, we built a problem representation system that deterministically converts TA3 inputs into computer-interpretable mathematical expressions. To efficiently search for and compose the sequences of machine learning steps that make up learning plans, we built a Monte Carlo Discrepancy Search approach that explores the vast space of possible plans by efficiently modifying and testing prior related and/or successful plans. Further, we enriched these plans by incorporating data preparation models, treating data preprocessing functions as operators to be planned in-line with learning operators.
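The idea of searching by modifying prior plans can be illustrated with a minimal Python sketch. It is not the Eve implementation: the operator names, the `toy_score` function, and every parameter are hypothetical stand-ins, and the loop is a generic randomized discrepancy-style search in which a seed plan is perturbed in a bounded number of positions ("discrepancies") and an improved candidate replaces the incumbent. Preprocessing steps and the learner sit in the same plan, mirroring the in-line treatment of data preparation operators described above.

```python
import random

# Hypothetical operator catalogs; the names are illustrative, not from the source.
PREP_OPS = ["impute_mean", "scale", "one_hot", "none"]
LEARN_OPS = ["logistic_regression", "random_forest", "gradient_boosting"]


def toy_score(plan):
    """Stand-in for training and validating a plan (assumption: higher is better)."""
    weights = {
        "impute_mean": 0.1, "scale": 0.2, "one_hot": 0.15, "none": 0.0,
        "logistic_regression": 0.5, "random_forest": 0.6, "gradient_boosting": 0.7,
    }
    return sum(weights[op] for op in plan)


def perturb(plan, n_discrepancies, rng):
    """Return a copy of plan that differs from it in at most n_discrepancies slots."""
    new_plan = list(plan)
    slots = rng.sample(range(len(plan)), k=min(n_discrepancies, len(plan)))
    for i in slots:
        # The last slot holds the learner; earlier slots hold preprocessing steps.
        catalog = LEARN_OPS if i == len(plan) - 1 else PREP_OPS
        new_plan[i] = rng.choice(catalog)
    return new_plan


def discrepancy_search(seed_plan, score, max_discrepancies=2, samples=50, seed=0):
    """Sample plans near the incumbent, allowing more discrepancies in each round."""
    rng = random.Random(seed)
    best_plan, best_score = list(seed_plan), score(seed_plan)
    for k in range(1, max_discrepancies + 1):
        for _ in range(samples):
            candidate = perturb(best_plan, k, rng)
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_plan, best_score = candidate, candidate_score
    return best_plan, best_score


if __name__ == "__main__":
    prior_plan = ["none", "none", "logistic_regression"]  # a previously used plan
    print(discrepancy_search(prior_plan, toy_score))
```

In a real system the scoring function would fit and validate each pipeline on data, which is the expensive step; starting from a prior plan and limiting the number of discrepancies keeps the number of such evaluations small.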