An Analysis of Duplex Sequence‑to‑Sequence Learning for Speech Chain

Architecture of the proposed Duplex Speech Chain model

Abstract

This work explores the use of reversible neural network layers to build a duplex speech chain model that makes effective use of the bidirectional supervision signals available in parallel datasets. Existing methods that exploit bidirectional supervision fall mainly into two groups: general multi-task learning and cycle consistency. Both use bidirectional supervision signals, but each has its own limitations. To address these limitations and build a duplex model for both directions of the speech task, speech synthesis and speech recognition, we propose reversible modules and operations that handle the length discrepancy between text and speech. The proposed model is the first duplex sequence-to-sequence model that addresses both speech synthesis and speech recognition. This work also introduces reversible neural networks to speech-related tasks. Finally, we analyze how the use of bidirectional supervision signals affects the performance of the duplex model.
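
For readers unfamiliar with reversible layers, the sketch below illustrates the general idea with a generic additive coupling layer. It is not the architecture proposed in this work: the names `shift`, `forward`, and `inverse`, the toy tanh shift function, and the vector size are all assumptions chosen for illustration, and the sketch ignores the text and speech length discrepancy that the proposed modules are designed to handle. It only shows how one set of parameters can define a mapping that runs exactly in both directions.

```python
import math
import random

def shift(x1, weights):
    """Toy shift function (hypothetical): one tanh unit per output value.
    Any function of x1 works here; it never needs to be inverted."""
    return [math.tanh(sum(w * v for w, v in zip(row, x1))) for row in weights]

def forward(x, weights):
    """Forward direction: keep the first half, shift the second half."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return x1 + [b + s for b, s in zip(x2, shift(x1, weights))]

def inverse(y, weights):
    """Reverse direction: recompute the same shift and subtract it."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    return y1 + [b - s for b, s in zip(y2, shift(y1, weights))]

random.seed(0)
dim = 8  # illustrative feature size, not taken from the thesis
# One weight row per element of the second half, one weight per element of the first half.
weights = [[random.gauss(0, 0.5) for _ in range(dim // 2)]
           for _ in range(dim - dim // 2)]

x = [random.gauss(0, 1) for _ in range(dim)]
y = forward(x, weights)
x_rec = inverse(y, weights)

# Largest reconstruction error; expected to be at floating-point rounding level.
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(max_err)
```

Because the forward and reverse directions share the same weights, a supervision signal applied in either direction updates the same parameters, which is the property a duplex model relies on.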

Type
Publication
National Taiwan University Theses and Dissertations Repository
陳柏文 Bo-Wen Chen
Graduate Researcher

My research focuses on machine learning and digital speech processing, specifically in the areas of Text to Speech (TTS) and Automatic Speech Recognition (ASR).