Ultrasound
This page collects the entries in the current SSI review database that carry the `modality:` tag `modality:ultrasound`.
The list below includes every paper page that currently carries this modality tag.
Papers
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
Strong full-text-backed evidence that most of the gain comes from fast input alignment, not from inventing a new SSI stack.
Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training
Strong SSI paper that improves silent speech reconstruction by generating pseudo acoustic targets and using domain adversarial training to address domain mismatch; validated on the TaL dataset, where it shows substantial WER and MOS gains over TaLNet.
Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks
An empirically supported, incremental advancement showing that hybrid 3D-CNN plus ConvLSTM models modestly outperform prior ultrasound tongue video SSI architectures in mel-spectrogram regression accuracy and model efficiency on single-speaker data.
Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input
Helpful side information, not standalone SSI.
Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces
The ultrasound-based x-vector speaker embedding is highly effective for speaker recognition, achieving under 1% error on unseen speakers, but its integration yields only a marginal improvement in multi-speaker ultrasound-to-speech synthesis accuracy.
Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks
Preprocessing paper, narrow but legitimate.
Improving Neural Silent Speech Interface Models by Adversarial Training
A clean, well-executed incremental advance using GAN loss to modestly improve articulatory-to-acoustic mapping from ultrasound, validated objectively on two single-speaker corpora.
3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces
Temporal context helps, but the evidence is a single-speaker vocoder-parameter study.
Convolutional Neural Network-Based Age Estimation Using B-Mode Ultrasound Tongue Image
Real signal, wrong target for SSI.
Ultra2Speech -- A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images
Strong ultrasound SSI paper with unusually clear quantitative gains.
Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder
The key advancement is continuous F0 tracking via CNNs, which yields lower pitch error and a slight naturalness improvement over discontinuous F0 pipelines in ultrasound SSI.
Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces
The paper advances ultrasound silent speech interfaces by compressing ultrasound images using an autoencoder bottleneck prior to spectral parameter prediction, resulting in improved accuracy and more natural synthesized speech with smaller models.
Denoising convolutional autoencoder based B-mode ultrasound tongue image feature extraction
The denoising convolutional autoencoder (DCAE) provides cleaner, more robust ultrasound tongue features, leading to improved silent speech recognition and outperforming prior feature extraction strategies.
SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks
A solid proof of concept that reconstructs speech audio from ultrasound to control unmodified smart speakers, offering useful system-design insight despite prototype limitations in latency, hardware bulk, and speaker dependency.
Updating the silent speech challenge benchmark with deep learning
Benchmark update with a real, reproducible WER gain.
Contour-based 3D tongue motion visualization using ultrasound image sequences
Useful tongue-modeling tool, not a recognizer.