Sequential Learning with Very Non-uniformly Sampled Time Series

Gregg, Michael

Active / Technical Report | Accession Number: AD1167434

Abstract:

Time series classification problems implement supervised machine learning techniques to analyze temporally ordered data and classify new sequential data. Time series classification has grown in popularity as access to time series data has increased in recent years, and the problems have appeared across a wide spectrum of applications such as audio recordings, medical signals, and weather prediction. Generally, an assumption is made that the temporal ordering is uniformly or close to uniformly sampled. However, there are important applications where this is not the case. This project looked at a dataset that was a very non-uniformly sampled time series with the task of classification of three labels. The dataset was also quite large and required very high dimensional features. These considerations encouraged the use of sequential learning techniques. Sequential learning refers to machine learning models that have sequences of data as the input or output. The goal of this project was to identify pre-processing techniques and approaches for generating sequences that would be helpful for this classification task. If successful, the results could help give insights to similar sequential learning problems. The data were first standardized over the entire dataset. The data as given had large gaps of time where no samples resided, called dead zones, that were artificially filled in by a process of interpolating and zero-mean padding. A relative time encoding feature was also created to help the predictor interpret the amount of time between bursts of data. Decimation was performed to maintain the sequence length for a window while simultaneously increasing the duration of time that it represented. A jointly optimal predictor was determined as (D, N, P, S) = (8, 644616, 250, S/8) where D represents the decimation factor, N represents the number of sequences used in training, P represents the window length, and S represents the stride.
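
The preprocessing steps named in the abstract (standardization, relative time encoding, decimation, and windowing with a stride) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the function names `standardize`, `relative_time`, and `make_windows` are hypothetical, decimation is approximated by plain subsampling with no anti-alias filter, and the dead-zone interpolation and zero-mean padding step is omitted because the abstract does not specify how it was done.

```python
import numpy as np


def standardize(x):
    """Scale each feature to zero mean and unit variance over the whole dataset."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    # Assumption: constant features are left centered rather than divided by zero.
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std


def relative_time(t):
    """Time elapsed since the previous sample (0 for the first sample).

    Large values mark the gaps between bursts of data.
    """
    return np.diff(t, prepend=t[0])


def make_windows(x, decimation, window_len, stride):
    """Keep every `decimation`-th sample, then cut fixed-length windows.

    Windows hold `window_len` decimated samples and start every `stride`
    decimated samples. Assumption: plain subsampling stands in for decimation.
    """
    decimated = x[::decimation]
    starts = range(0, len(decimated) - window_len + 1, stride)
    if len(starts) == 0:
        # Not enough samples for a single window: return an empty batch.
        return np.empty((0, window_len) + decimated.shape[1:])
    return np.stack([decimated[s:s + window_len] for s in starts])


if __name__ == "__main__":
    timestamps = np.array([0.0, 1.0, 2.0, 50.0, 51.0])
    print(relative_time(timestamps))  # gap of 48 marks a dead zone

    features = standardize(np.arange(100.0).reshape(100, 1))
    windows = make_windows(features, decimation=2, window_len=10, stride=5)
    print(windows.shape)
```

Under this sketch, the reported D = 8 and P = 250 would keep each window at 250 samples while stretching it across roughly 8 × 250 = 2000 original sample positions, which is the trade-off the abstract attributes to decimation.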

Author(s):

Gregg, Michael

Author Organization(s):

Ohio State University

Funding Organization(s):

AFRL/RYWE, WPAFB, OH

Document Type:

Technical Report/Master's Thesis

Publication Date:

2022 Apr 12

Pagination:

24

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution Code:

A - Approved For Public Release

Distribution Statement: Public Release

RECORD

Collection: TRECMS

Identifying Numbers

Report Number(s):

AFRL-RY-WP-TR-2022-0146

Subject Terms

Modernization Areas:

Autonomy

Communities of Interest:

Autonomy

Descriptor(s):

air force, machine learning, air force facilities, supervised machine learning, classification, learning, universities, weather forecasting, sequences, united states, governments

Keyword(s):

time series classification, sequential learning, nonuniform sampling, decimation factors, scenario-sequences, time encoding, durations, window, stride, data preprocessing, sequence generation, zero padding, joint optimization, stride experiments

Subject Categories:

Mathematical and Computer Sciences

Creation Date:

2022 Apr 26

Update Date:

2022 Jun 10