Accession Number:

AD1062208

Title:

Provenance and Processing of an Inuktitut-English Parallel Corpus Part 1: Inuktitut Data Preparation and Factored Data Format

Descriptive Note:

Technical Report

Corporate Author:

US Army Research Laboratory, Adelphi, United States

Personal Author(s):

Report Date:

2018-10-19

Pagination or Media Count:

76

Abstract:

We describe the Nunavut Hansard, a parallel English-Inuktitut corpus derived from Nunavut legislative proceedings, and the processing carried out to prepare the data for morphological analysis and downstream machine translation experiments. We provide all of the scripts and code used to process the data.

Subject Categories:

  • Linguistics
  • Computer Programming and Software

Distribution Statement:

APPROVED FOR PUBLIC RELEASE