Sequential Learning with Very Non-uniformly Sampled Time Series

Technical Report | Accession Number: AD1167434

Abstract:

Time series classification applies supervised machine learning techniques to temporally ordered data in order to classify new sequential data. It has grown in popularity as access to time series data has increased in recent years, and such problems appear across a wide spectrum of applications, including audio recordings, medical signals, and weather prediction. It is generally assumed that the data are sampled uniformly, or close to uniformly, in time. However, there are important applications where this is not the case. This project examined a very non-uniformly sampled time series dataset with a three-label classification task. The dataset was also quite large and required very high-dimensional features. These considerations encouraged the use of sequential learning techniques, that is, machine learning models that take sequences of data as input or produce them as output. The goal of this project was to identify pre-processing techniques and approaches for generating sequences that would be helpful for this classification task. If successful, the results could give insight into similar sequential learning problems. The data were first standardized over the entire dataset. The data as given had large gaps in time containing no samples, called dead zones, which were artificially filled by a process of interpolation and zero-mean padding. A relative time encoding feature was also created to help the predictor interpret the amount of time between bursts of data. Decimation was performed to maintain the sequence length of a window while increasing the duration of time that it represented. A jointly optimal predictor was determined as (D, N, P, S) = (8, 644616, 250, S/8), where D represents the decimation factor, N represents the number of sequences used in training, P represents the window length, and S represents the stride.
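The pre-processing steps named in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names, the synthetic two-burst signal, and the small values of D, P, and S are all assumptions chosen for illustration. Dead-zone filling is left out because the abstract does not describe how the interpolation and padding were carried out, and decimation is shown as plain subsampling, which may differ from the method actually used. Computing the relative time feature after decimation is likewise an assumption about the ordering of steps.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance scaling using statistics of the whole array."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def relative_time(t):
    """Time elapsed since the previous sample (0 for the first sample)."""
    return np.diff(t, prepend=t[0])

def decimate(x, d):
    """Keep every d-th sample (plain subsampling, no anti-alias filter)."""
    return x[::d]

def make_windows(x, p, s):
    """Slice x into windows of length p, advancing s samples per window."""
    starts = range(0, len(x) - p + 1, s)
    return np.stack([x[i:i + p] for i in starts])

# Hypothetical data: two bursts of 40 samples separated by a long dead zone.
rng = np.random.default_rng(0)
t = np.concatenate([np.arange(40) * 0.01, 100.0 + np.arange(40) * 0.01])
v = rng.normal(size=t.size)

D, P, S = 2, 10, 5                      # illustrative values, not the report's
v_std = standardize(v)                  # standardize over the entire dataset
t_d, v_d = decimate(t, D), decimate(v_std, D)
dt = relative_time(t_d)                 # large value marks the gap between bursts
features = np.column_stack([v_d, dt])   # shape (samples, 2)
windows = make_windows(features, P, S)  # shape (num_windows, P, 2)

print(windows.shape)
```

With uniform sampling inside a burst, a window of P decimated samples spans roughly D times as much time as a window of P original samples, which is the trade-off the abstract describes: the sequence length stays fixed while the duration it represents grows.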

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution Code: A - Approved For Public Release
Distribution Statement: Public Release

RECORD

Collection: TRECMS