Speech Analysis/Synthesis Based on a Sinusoidal Representation.

McAulay, R. J.; Quatieri, T. F.

Active / Technical Report | Accession Number: ADA157023

Abstract:

A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of birth and death of the underlying sine waves. For a given frequency track, a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise, the perceptual characteristics of the speech as well as the noise are maintained. Finally, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs, including two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds.
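
The peak-picking and cubic phase interpolation steps named in the abstract can be illustrated with a minimal Python sketch. This is not code from the report. The function names (pick_peaks, cubic_phase_coeffs, phase_at, synthesize_track) are hypothetical, the cubic coefficients follow the form commonly cited for this method, and the linear amplitude ramp across the frame is an assumption, since the abstract says only that the sine-wave generator is amplitude modulated.

```python
import math


def pick_peaks(magnitudes, floor=0.0):
    """Return indices of local maxima in a magnitude spectrum (simple peak picking)."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        m = magnitudes[k]
        if m > floor and m > magnitudes[k - 1] and m >= magnitudes[k + 1]:
            peaks.append(k)
    return peaks


def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Coefficients of theta(t) = theta0 + omega0*t + alpha*t**2 + beta*t**3 on [0, T].

    The cubic matches the measured frequency at both frame boundaries and the
    measured phase modulo 2*pi. The integer M selects the phase unwrapping
    that gives the smoothest track.
    """
    M = round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2)
              / (2 * math.pi))
    d = theta1 + 2 * math.pi * M - theta0 - omega0 * T  # unwrapped phase residual
    dw = omega1 - omega0                                # frequency change over the frame
    alpha = 3 * d / T**2 - dw / T
    beta = -2 * d / T**3 + dw / T**2
    return alpha, beta, M


def phase_at(t, theta0, omega0, alpha, beta):
    """Evaluate the interpolated phase at time t (in samples)."""
    return theta0 + omega0 * t + alpha * t**2 + beta * t**3


def synthesize_track(a0, a1, theta0, omega0, theta1, omega1, T):
    """One frame of a single sine-wave track.

    Assumption: the amplitude is interpolated linearly from a0 to a1.
    """
    alpha, beta, _ = cubic_phase_coeffs(theta0, omega0, theta1, omega1, T)
    return [
        (a0 + (a1 - a0) * n / T) * math.cos(phase_at(n, theta0, omega0, alpha, beta))
        for n in range(T)
    ]
```

In a full system, pick_peaks would presumably be applied to the short-time Fourier transform magnitude of each frame, matched peaks in adjacent frames would form tracks, and the outputs of synthesize_track for all tracks would be summed to give the output waveform. The birth and death logic for tracks is not sketched here.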

Author(s):

McAulay, R. J.; Quatieri, T. F.

Author Organization(s):

MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB

Descriptive Note:

Technical rept.,

Pagination:

0046

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution:

Approved For Public Release

RECORD

Collection: TR

Identifying Numbers

Report Number(s):

TR-693, ESD-TR-84-330

Contract/Grant Number(s):

F19628-85-C-0002

Monitor Series:

TR-84-330

Subject Terms

Communities of Interest:

No COI(s) Identified

Descriptor(s):

*SPEECH ANALYSIS, *SPEECH, *SINE WAVES, *WAVE ANALYZERS, *WAVEFORM GENERATORS, ALGORITHMS, PEAK VALUES, WAVEFORMS, SHORT RANGE(TIME), AMPLITUDE, AMPLITUDE MODULATION, MARINE BIOLOGICAL NOISE, AUDITORY PERCEPTION, PHASE(ELECTRONICS)

Field(s)/Group(s):

Voice Communications

Keyword(s):

*Speech synthesis, PE63735F, PE33401F

Report Date:

1985 May 17