The RADAR Test Methodology: Evaluating a Multi-Task Machine Learning System with Humans in the Loop

Steinfeld, Aaron; Bennett, Rachael; Cunningham, Kyle; Lahut, Matt; Quinones, Pablo-alejandro; Wexler, Django; Siewiorek, Dan; Cohen, Paul; Fitzgerald, Julie; Hansson, Othar

Active / Technical Report | Accession Number: ADA457300

Abstract:

The RADAR (Reflective Agents with Distributed Adaptive Reasoning) project involves a collection of machine learning research thrusts that are integrated into a cognitive personal assistant. Progress is examined with a test developed to measure the impact of learning when the system is used by a human user. Three conditions (conventional tools, RADAR without learning, and RADAR with learning) are evaluated in a large-scale, between-subjects study. This paper describes the RADAR Test with a focus on test design, test harness development, experiment execution, and analysis. Results for version 1.1 of RADAR illustrate the measurement and diagnostic capability of the test. General lessons on such efforts are also discussed.

Author Organization(s):

CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE

Descriptive Note:

Technical rept.

Supplementary Note:

Prepared in cooperation with the University of Southern California; JSF Consulting; Thinkbank, Inc.; IET, Inc.; and SRI International. Sponsored in part by the Department of Interior, National Business Center. The original document contains color images. DOI: 10.21236/ADA457300

Pagination:

0025

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution:

Approved For Public Release

Distribution Statement:

Approved For Public Release; Distribution Is Unlimited.

RECORD

Collection: TR

Identifying Numbers

Report Number(s):

CMU-CS-06-125, CMU-HCII-06-102

Monitor Series:

DARPA

Subject Terms

Joint Capability Areas:

JCA_5_Command and Control; JCA_5.5.2_Task; JCA_5.5_Direct; JCA_5.3_Planning; JCA_8_Building Partnerships; JCA_8.1_Communicate; JCA_8.1.3_Influence Adversary and Competitor Audiences; JCA_5.3.1_Analyze Problem; JCA_6_Net Centric; JCA_1_Force Support; JCA_6.2.3_Core Enterprise Services; JCA_1.2_Force Preparation; JCA_5.2.2_Develop Knowledge and Situational Awareness; JCA_5.2_Understand; JCA_6.2_Enterprise Services; JCA_1.2.7_Experimentation; JCA_6.1_Information Transport; JCA_1.3_Human Capital Management; JCA_1.3.2_Personnel Management; JCA_1.2.3_Educating; JCA_1.2.1_Training

Modernization Areas:

Autonomy

Communities of Interest:

Autonomy

Descriptor(s):

*LEARNING MACHINES, TEST METHODS, ARTIFICIAL INTELLIGENCE, MAN MACHINE SYSTEMS, MAN COMPUTER INTERFACE

Field(s)/Group(s):

Cybernetics, Test Facilities, Equipment and Methods, Human Factors Engineering and Man Machine Systems

Keyword(s):

RADAR(REFLECTIVE AGENTS WITH DISTRIBUTED ADAPTIVE REASONING) PROJECT, LITW(LEARNING IN THE WILD), HUMAN SUBJECT EXPERIMENTS, MULTI AGENT SYSTEMS

Report Date:

2006 Oct 01

Creation Date:

2006 Dec 06