Wearable Audio-Visual Enhanced Speech-recognition System (WAVESS): A Conceptual Model with Lipreading Capabilities

Keywords

Multimodal Speech Recognition
Wearable Technology
Audio-Visual Fusion
Lipreading Systems
Human–Computer Interaction

How to Cite

Adeoye, A., Olaye, E., & Onwuegbuzie, U. I. (2025). Wearable Audio-Visual Enhanced Speech-recognition System (WAVESS): A Conceptual Model with Lipreading Capabilities. Tech-Sphere Journal for Pure and Applied Sciences, 2(1). https://doi.org/10.5281/zenodo.17205329

Abstract

Speech recognition technologies have evolved significantly from early rule-based systems to modern deep learning models; however, conventional audio-only approaches remain constrained by noise interference, diverse accents, and speech impairments, limiting their robustness in real-world applications. Recent research highlights the value of multimodal systems that combine auditory and visual cues, with lipreading offering complementary information where audio signals alone may fail. This study proposes the Wearable Audio-Visual Enhanced Speech-recognition System (WAVESS), a conceptual model implemented in the form of smart glasses equipped with a microphone array and a mini-camera that captures lip movements. The system processes audio and video inputs through dedicated preprocessing pipelines (noise reduction and Mel-Frequency Cepstral Coefficient (MFCC) extraction for audio; lip region detection and feature extraction for video) before fusing them in a real-time multimodal recognition engine. The fused representation enhances recognition accuracy, adaptability, and resilience in challenging conditions such as noisy environments, hearing impairment contexts, and human–machine interaction scenarios. The model also incorporates connectivity features for wireless or edge-based computation and provides multimodal feedback through augmented reality overlays, audio, or haptic signals. WAVESS demonstrates the comparative advantage of wearable, multimodal systems in accessibility, communication, education, and security applications while addressing scalability and ethical considerations. The conceptual framework establishes a foundation for future prototyping, dataset expansion, and real-world deployment in advancing robust speech recognition research.
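The fusion step described above can be illustrated with a minimal late-fusion sketch. This is not the authors' implementation; it assumes hypothetical per-word score vectors already produced by separate audio (MFCC-based) and visual (lipreading) recognizers, and a signal-to-noise-ratio estimate used to shift weight toward the visual stream as conditions get noisier:

```python
import numpy as np

def fuse_scores(audio_scores, visual_scores, snr_db,
                snr_floor=0.0, snr_ceil=30.0):
    """Weighted late fusion of audio and visual recognition scores.

    A hypothetical illustration: the audio-stream weight grows linearly
    with the estimated SNR (in dB) between snr_floor and snr_ceil, but is
    clipped to [0.1, 0.9] so neither modality is ever fully discarded.
    """
    w_audio = np.clip((snr_db - snr_floor) / (snr_ceil - snr_floor), 0.1, 0.9)
    return w_audio * np.asarray(audio_scores) + (1.0 - w_audio) * np.asarray(visual_scores)

# In quiet conditions the audio stream dominates; in heavy noise the
# lipreading stream decides the recognized word.
quiet = fuse_scores(audio_scores=[0.9, 0.1], visual_scores=[0.2, 0.8], snr_db=25.0)
noisy = fuse_scores(audio_scores=[0.9, 0.1], visual_scores=[0.2, 0.8], snr_db=0.0)
```

Here `quiet` picks the first candidate word (audio-driven) while `noisy` picks the second (vision-driven), which is the complementary behavior the abstract attributes to the multimodal engine.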

https://doi.org/10.5281/zenodo.17205329

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright (c) 2025 Tech-Sphere Journal for Pure and Applied Sciences
