Journal

JJEM: Special Edition 2 (December 2024)

Published On:

2024-12-08

Topic

An Efficient Technique for Identification of Speech using Neural Network

Authors

Niranjan V S, Prashanth A

Abstract

Speaker recognition is the task of identifying people by the distinguishing characteristics of their voices. Recent progress in speaker recognition has been driven by neural networks and deep learning. Traditional methods relied on feature extraction, for example Mel-Frequency Cepstral Coefficients (MFCCs), together with the Gaussian Mixture Model (GMM) and the Universal Background Model (UBM), but they did not mimic human speech. Deep neural networks, such as convolutional and recurrent networks, have moved toward end-to-end solutions that learn directly from raw audio data. End-to-end models have therefore been introduced to improve efficiency and speed. Refinements such as convolutional neural networks (CNNs) with Batch Normalization (BN) and Connectionist Temporal Classification (CTC) have improved both the models and their training. This paper covers the application of BN in voice recognition through the design of an efficient end-to-end CNN system.

Keywords

Mel-Frequency Cepstral Coefficients (MFCCs), Connectionist Temporal Classification (CTC), Batch Normalization (BN).