Wearable Audio-Visual Enhanced Speech-recognition System (WAVESS): A Conceptual Model with Lipreading Capabilities

Keywords

Multimodal Speech Recognition
Wearable Technology
Audio-Visual Fusion
Lipreading Systems
Human–Computer Interaction

How to Cite

Adeoye, A., Olaye, E., & Onwuegbuzie, U. I. (2025). Wearable Audio-Visual Enhanced Speech-recognition System (WAVESS): A Conceptual Model with Lipreading Capabilities. Tech-Sphere Journal for Pure and Applied Sciences, 2(1). https://doi.org/10.5281/zenodo.17205329

Abstract

Speech recognition technologies have evolved significantly from early rule-based systems to modern deep learning models; however, conventional audio-only approaches remain constrained by noise interference, diverse accents, and speech impairments, limiting their robustness in real-world applications. Recent research highlights the value of multimodal systems that combine auditory and visual cues, with lipreading offering complementary information where audio signals alone may fail. This study proposes the Wearable Audio-Visual Enhanced Speech-recognition System (WAVESS), a conceptual model implemented in the form of smart glasses equipped with a microphone array and a mini-camera that captures lip movements. The system processes audio and video inputs through dedicated preprocessing pipelines (noise reduction and Mel-Frequency Cepstral Coefficient (MFCC) extraction for audio; lip region detection and feature extraction for video) before fusing them in a real-time multimodal recognition engine. The fused representation enhances recognition accuracy, adaptability, and resilience in challenging conditions such as noisy environments, hearing impairment contexts, and human–machine interaction scenarios. The model also incorporates connectivity features for wireless or edge-based computation and provides multimodal feedback through augmented reality overlays, audio, or haptic signals. WAVESS demonstrates the comparative advantage of wearable, multimodal systems in accessibility, communication, education, and security applications while addressing scalability and ethical considerations. The conceptual framework establishes a foundation for future prototyping, dataset expansion, and real-world deployment in advancing robust speech recognition research.
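The fusion step described above can be illustrated with a minimal late-fusion sketch. This is not the authors' implementation; it assumes hypothetical per-word score vectors already produced by separate audio (MFCC-based) and visual (lipreading) recognizers, and a signal-to-noise-ratio estimate used to shift weight toward the visual stream as conditions get noisier:

```python
import numpy as np

def fuse_scores(audio_scores, visual_scores, snr_db,
                snr_floor=0.0, snr_ceil=30.0):
    """Weighted late fusion of audio and visual recognition scores.

    A hypothetical illustration: the audio-stream weight grows linearly
    with the estimated SNR (in dB) between snr_floor and snr_ceil, but is
    clipped to [0.1, 0.9] so neither modality is ever fully discarded.
    """
    w_audio = np.clip((snr_db - snr_floor) / (snr_ceil - snr_floor), 0.1, 0.9)
    return w_audio * np.asarray(audio_scores) + (1.0 - w_audio) * np.asarray(visual_scores)

# In quiet conditions the audio stream dominates; in heavy noise the
# lipreading stream decides the recognized word.
quiet = fuse_scores(audio_scores=[0.9, 0.1], visual_scores=[0.2, 0.8], snr_db=25.0)
noisy = fuse_scores(audio_scores=[0.9, 0.1], visual_scores=[0.2, 0.8], snr_db=0.0)
```

Here `quiet` picks the first candidate word (audio-driven) while `noisy` picks the second (vision-driven), which is the complementary behavior the abstract attributes to the multimodal engine.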

https://doi.org/10.5281/zenodo.17205329

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright (c) 2025 Tech-Sphere Journal for Pure and Applied Sciences
