RNfinity

Physics Maths Engineering

Refining a deep learning-based formant tracker using linear prediction methods

rnfinity

info@rnfinity.com


Sudarsana Reddy Kadiri

Department of Information and Communications Engineering

sudarsana.kadiri@aalto.fi


Paavo Alku

Department of Information and Communications Engineering

info@rnfinity.com


  Peer Reviewed


© attribution CC-BY

731 Views

Added on 2023-05-10

DOI: https://doi.org/10.1016/j.csl.2023.101515

Abstract

In this study, formant tracking is investigated by refining the formants tracked by an existing data-driven tracker, DeepFormants, using the formants estimated in a model-driven manner by linear prediction (LP)-based methods. As LP-based formant estimation methods, conventional covariance analysis (LP-COV) and the recently proposed quasi-closed phase forward–backward (QCP-FB) analysis are used. In the proposed refinement approach, the contours of the three lowest formants are first predicted by the data-driven DeepFormants tracker, and the predicted formants are replaced frame-wise with local spectral peaks shown by the model-driven LP-based methods. The refinement procedure can be plugged into the DeepFormants tracker with no need for any new data learning. Two refined DeepFormants trackers were compared with the original DeepFormants and with five known traditional trackers using the popular vocal tract resonance (VTR) corpus. The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis. In addition, by tracking formants using VTR speech that was corrupted by additive noise, the study showed that the refined DeepFormants trackers were more resilient to noise than the reference trackers. In general, these results suggest that LP-based model-driven approaches, which have traditionally been used in formant estimation, can be combined with a modern data-driven tracker easily with no further training to improve the tracker’s performance.

Key Questions

What is formant tracking, and why is it important?

Formant tracking is the estimation, frame by frame, of the resonant frequencies of the vocal tract (formants) from a speech signal over time. Formants are important cues in speech production and recognition, and formant tracking is widely used in speech processing, linguistics, and voice analysis.
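For background (standard linear prediction theory, not a result of the paper): LP-based estimators model the vocal tract as an all-pole filter, and each formant corresponds to a resonance of that model. A complex-conjugate pole pair with radius $r_i$ and angle $\theta_i$ maps to a formant frequency $F_i$ and bandwidth $B_i$, where $f_s$ is the sampling rate:

```latex
H(z) = \frac{G}{A(z)} = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}}, \qquad
z_i = r_i e^{\pm j\theta_i}, \qquad
F_i = \frac{\theta_i}{2\pi} f_s, \qquad
B_i = -\frac{\ln r_i}{\pi} f_s
```

For example, with $f_s = 8000$ Hz, a pole pair at $\theta_i = \pi/8$ gives $F_i = 500$ Hz. The local peaks of the LP spectrum $G/|A(e^{j\omega})|$ lie close to these pole frequencies when the resonances are narrow and well separated.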

What is DeepFormants, and how does it work?

DeepFormants is a data-driven formant tracking tool that uses deep learning to predict formant frequencies. It provides accurate formant contours but can be further refined using model-driven methods like linear prediction (LP).

How does the hybrid approach improve formant tracking?

The hybrid approach combines the strengths of the data-driven DeepFormants tracker and model-driven LP-based methods. DeepFormants first predicts the contours of the three lowest formants, and each predicted formant is then replaced, frame by frame, with a local spectral peak of the LP model. This improves accuracy without retraining DeepFormants.
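A minimal NumPy sketch of the frame-wise replacement step, assuming the data-driven tracker outputs F1 to F3 for each frame and an LP method has already produced a list of spectral peak frequencies per frame. The nearest-peak rule and the optional `max_shift_hz` limit are assumptions for illustration; the paper's exact peak-selection rule is not described on this page.

```python
import numpy as np


def refine_formants(predicted, lp_peaks, max_shift_hz=None):
    """Replace each predicted formant with the nearest LP spectral peak.

    predicted: array of shape (n_frames, n_formants), e.g. F1-F3 in Hz
        from a data-driven tracker.
    lp_peaks: sequence of 1-D arrays, one per frame, holding the LP
        spectral peak frequencies (Hz) found in that frame.
    max_shift_hz: hypothetical limit; if the nearest peak is farther away,
        the predicted value is kept. None disables the limit.
    """
    predicted = np.asarray(predicted, dtype=float)
    if len(lp_peaks) != predicted.shape[0]:
        raise ValueError("need one peak list per frame")
    refined = predicted.copy()
    for t, peaks in enumerate(lp_peaks):
        peaks = np.asarray(peaks, dtype=float)
        if peaks.size == 0:
            continue  # no LP peak in this frame: keep the prediction
        for i, f in enumerate(predicted[t]):
            nearest = peaks[np.argmin(np.abs(peaks - f))]
            if max_shift_hz is None or abs(nearest - f) <= max_shift_hz:
                refined[t, i] = nearest
    return refined


pred = [[480, 1550, 2400], [500, 1500, 2500]]
peaks = [[510, 1490, 2480, 3300], []]
print(refine_formants(pred, peaks))
```

With a pure nearest-peak rule, two predicted formants can snap to the same LP peak when the peaks are sparse; a practical tracker needs a rule for that case, which this sketch leaves out.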

What are LP-COV and QCP-FB methods?

LP-COV (linear prediction with covariance analysis) and QCP-FB (quasi-closed phase forward-backward analysis) are model-driven LP methods for estimating formants. QCP-FB, the more recently proposed of the two, gave the best results when used to refine DeepFormants' predictions.
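A minimal NumPy sketch of LP-COV style formant estimation: covariance-method LP on one frame, followed by picking the local maxima of the LP magnitude envelope. The model order, FFT size, and the synthetic all-pole test signal (`synth_all_pole`, a hypothetical helper) are assumptions; the paper's pre-emphasis, windowing, and order settings are not given on this page, and QCP-FB is not covered by this sketch.

```python
import numpy as np


def lp_covariance(frame, order):
    """Covariance-method LP. Minimizes sum_{n=p}^{N-1} e[n]^2, where
    e[n] = x[n] + sum_{k=1}^{p} a_k x[n-k]. Returns [1, a_1, ..., a_p]."""
    x = np.asarray(frame, dtype=float)
    n = np.arange(order, x.size)
    # Column k-1 holds the delayed samples x[n-k], k = 1..order
    X = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)
    phi = X.T @ X  # covariance matrix
    rhs = -X.T @ x[n]
    a = np.linalg.solve(phi, rhs)
    return np.concatenate(([1.0], a))


def lp_spectral_peaks(a, fs, n_fft=8192):
    """Frequencies (Hz) of the local maxima of the LP envelope 1/|A(e^{jw})|."""
    env_db = -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)))
    freqs = np.arange(env_db.size) * fs / n_fft
    inner = env_db[1:-1]
    idx = np.nonzero((inner > env_db[:-2]) & (inner > env_db[2:]))[0] + 1
    return freqs[idx]


def synth_all_pole(formants, bandwidths, fs, n):
    """Hypothetical test signal: impulse response of an all-pole filter with
    one pole pair per (formant, bandwidth). Returns (a, x)."""
    a = np.array([1.0])
    for f, b in zip(formants, bandwidths):
        r = np.exp(-np.pi * b / fs)  # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs  # pole angle from frequency
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    x = np.zeros(n)
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k in range(1, a.size):
            if t >= k:
                acc -= a[k] * x[t - k]
        x[t] = acc
    return a, x


fs = 8000
a_true, x = synth_all_pole([500, 1500, 2500], [80, 100, 120], fs, 300)
a_est = lp_covariance(x, order=6)
print(np.round(lp_spectral_peaks(a_est, fs)))
```

Because the test signal is an exact order-6 all-pole impulse response and the covariance analysis starts after the excitation sample, the estimated coefficients match the true ones up to rounding, and the envelope peaks fall close to 500, 1500, and 2500 Hz. Real speech is not exactly all-pole, so the estimates are only approximations there.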

How does the refined DeepFormants compare to traditional trackers?

On the VTR corpus, the data-driven DeepFormants trackers outperformed five traditional formant trackers, and the best performance was obtained by refining DeepFormants with QCP-FB. The refined trackers were also more resilient to additive noise than the reference trackers, which matters in real-world applications where speech quality may vary.
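Comparing trackers requires an error measure against the VTR reference tracks. This page does not name the metric, so the sketch below assumes a per-formant mean absolute error in Hz, optionally restricted to a frame mask; it is illustrative, not the paper's evaluation code.

```python
import numpy as np


def formant_mae(estimated, reference, mask=None):
    """Mean absolute error in Hz for each formant column.

    estimated, reference: arrays of shape (n_frames, n_formants) in Hz.
    mask: optional boolean array of shape (n_frames,) selecting the frames
        to score (a hypothetical choice, e.g. voiced frames only).
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    if est.shape != ref.shape:
        raise ValueError("estimated and reference must have the same shape")
    if mask is not None:
        keep = np.asarray(mask, dtype=bool)
        est, ref = est[keep], ref[keep]
    return np.mean(np.abs(est - ref), axis=0)
```

For example, estimates `[[500, 1500, 2500], [520, 1480, 2450]]` scored against references `[[510, 1500, 2400], [500, 1500, 2500]]` give per-formant errors of 15, 10, and 75 Hz.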

What is the VTR corpus, and why is it used in this study?

The VTR (Vocal Tract Resonance) corpus is a popular database for evaluating formant trackers because it provides reference formant tracks for its speech utterances. In this study, noisy test conditions were created by corrupting VTR speech with additive noise, which made it possible to measure both accuracy and noise resilience.

How does the hybrid approach handle noisy speech?

Both refined DeepFormants trackers were more resilient to additive noise than the reference trackers when formants were tracked from VTR speech corrupted by noise. This robustness is important for real-world speech processing, where recordings are rarely clean.
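Noise robustness is tested by corrupting VTR speech with additive noise. A minimal sketch of mixing a noise signal into speech at a target signal-to-noise ratio follows; the noise types and SNR levels used in the paper are not stated on this page, and the global (whole-utterance) SNR definition is an assumption.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Powers are global means of squared samples over the whole utterance.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: speech.size]
    if noise.size < speech.size:
        raise ValueError("noise must be at least as long as speech")
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # After scaling, the noise power equals p_speech / 10**(snr_db / 10)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Global SNR is the simplest definition; segmental SNR or an active speech level measurement would scale the noise differently for utterances with long pauses.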

What are the benefits of combining data-driven and model-driven methods?

Combining these methods leverages the strengths of both: data-driven methods provide robust predictions, while model-driven methods offer precise local adjustments. This hybrid approach improves accuracy without requiring additional training or data.

Can this approach be applied to other speech processing tasks?

The study evaluates formant tracking only. Because formants are used in tasks such as speech recognition, speaker identification, and voice analysis, more accurate and noise-robust formant tracks could benefit those tasks, but such applications were not tested in the study.

What makes QCP-FB better than other LP-based methods?

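QCP-FB combines two ideas. Quasi-closed phase (QCP) analysis is a weighted LP method: its temporal weighting function, derived from glottal closure instants, de-emphasizes the samples around the main excitation of each glottal cycle, which reduces the influence of the glottal source on the vocal tract estimate. Forward-backward (FB) analysis fits one set of predictor coefficients by minimizing both the forward and the backward prediction errors. In this study, refining DeepFormants with QCP-FB gave the best formant tracking performance.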
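A minimal NumPy sketch of generic weighted forward-backward LP, which minimizes the weighted sum of squared forward and backward prediction errors with one shared coefficient vector. The weight vector `w` is left to the caller: in QCP analysis it would be derived from glottal closure instants, which this sketch does not estimate, and how QCP-FB applies the weight to the backward error may differ from the choice made here. With `w` set to all ones, it reduces to unweighted forward-backward LP.

```python
import numpy as np


def weighted_fb_lp(x, order, w):
    """Generic weighted forward-backward LP (sketch, not the paper's code).

    Minimizes sum_n w[n] * (ef[n]**2 + eb[n]**2) for n = p..N-1, where
        ef[n] = x[n]   + sum_{k=1}^{p} a_k x[n-k]     (forward error)
        eb[n] = x[n-p] + sum_{k=1}^{p} a_k x[n-p+k]   (backward error)
    and one coefficient vector a is shared by both. Returns [1, a_1..a_p].
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    p = order
    n = np.arange(p, x.size)
    Xf = np.stack([x[n - k] for k in range(1, p + 1)], axis=1)
    Xb = np.stack([x[n - p + k] for k in range(1, p + 1)], axis=1)
    yf, yb = x[n], x[n - p]
    wn = w[n]  # one weight per error sample, shared by both errors
    # Normal equations of the weighted least-squares problem
    phi = Xf.T @ (wn[:, None] * Xf) + Xb.T @ (wn[:, None] * Xb)
    rhs = -(Xf.T @ (wn * yf) + Xb.T @ (wn * yb))
    a = np.linalg.solve(phi, rhs)  # fails if the weights zero out too much data
    return np.concatenate(([1.0], a))
```

Formant frequencies can then be read from the angles of the roots of the returned polynomial (for example with `np.roots`) or from the peaks of its LP envelope.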

How can researchers use this hybrid approach in their work?

Researchers can integrate the refined DeepFormants tracker into their speech processing pipelines to improve formant tracking accuracy. The approach is easy to implement, as it requires no additional training or data.

What are the practical applications of improved formant tracking?

Improved formant tracking can enhance applications like speech synthesis, voice pathology detection, and linguistic research. It is also valuable for developing robust speech recognition systems in noisy environments.


ARTICLE USAGE


Article usage: May-2023 to Jun-2025
Show by month Manuscript Video Summary
2025 June 97 97
2025 May 83 83
2025 April 54 54
2025 March 79 79
2025 February 51 51
2025 January 38 38
2024 December 47 47
2024 November 55 55
2024 October 43 43
2024 September 41 41
2024 August 30 30
2024 July 31 31
2024 June 23 23
2024 May 28 28
2024 April 25 25
2024 March 6 6
Total 731 731