DNN for Speaker Verification

The code is implemented in TensorFlow. It provides an implementation of Speaker Verification (SV) using 3D convolutional neural networks, following the SV protocol. A 3D convolutional architecture is used to build the speaker model so that speech-related and temporal information in the speakers' utterances is captured simultaneously. The 3D Convolutional Neural Network (3D-CNN) is applied to text-independent speaker verification in three phases:

1. Development: a CNN is trained to classify speakers at the utterance level.
2. Enrollment: the trained network is used to directly create a speaker model for each speaker from the extracted features.
3. Evaluation: the features extracted from the test utterance are compared to the stored speaker model to verify the claimed identity.
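The enrollment and evaluation phases described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`enroll_speaker`, `cosine_score`, `verify`), the averaging of per-utterance features, the use of cosine similarity, and the `threshold` value are all assumptions made for the example. The feature vectors are assumed to have already been extracted by the trained network.

```python
import numpy as np


def enroll_speaker(utterance_features):
    """Build a speaker model from per-utterance feature vectors.

    Assumption: the model is the mean of the feature vectors; the
    project may pool features differently.
    """
    feats = np.asarray(utterance_features, dtype=np.float64)
    return feats.mean(axis=0)


def cosine_score(speaker_model, test_features):
    """Cosine similarity between a stored model and test-utterance features."""
    a = np.asarray(speaker_model, dtype=np.float64)
    b = np.asarray(test_features, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A zero vector carries no direction, so treat it as no match.
        return 0.0
    return float(np.dot(a, b) / denom)


def verify(speaker_model, test_features, threshold=0.5):
    """Accept the claimed identity if the score reaches the threshold.

    The threshold here is a hypothetical placeholder; in practice it
    would be tuned on held-out data.
    """
    return cosine_score(speaker_model, test_features) >= threshold


if __name__ == "__main__":
    # Toy 2-D vectors standing in for network-extracted features.
    model = enroll_speaker([[1.0, 0.0], [1.0, 0.0]])
    print(verify(model, [1.0, 0.0]))  # same direction as the model
    print(verify(model, [0.0, 1.0]))  # orthogonal to the model
```

Averaging keeps the speaker model the same size as a single feature vector, so the evaluation step reduces to one vector comparison per claimed identity.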