ASR system using Transformer neural networks from scratch
This project was originally built for DOT.
It is our own ASR built with Transformer neural networks for DOT. The main problem was the dataset: it contains only about 500 hours of training audio (LibriSpeech clean 360 and clean 100). You can see the model on Kaggle:
https://www.kaggle.com/code/bernardoolisan/new-asr-dot?scriptVersionId=93010436
To build a model like Google's ASR, you need more than 10,000 hours of training audio. There is a dataset called GigaSpeech that contains more than 10,000 hours, but it is more than 1 TB in size, so you will need a more powerful PC, and a single epoch will take 12 hours or more.
https://github.com/SpeechColab/GigaSpeech
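To get a feel for what that means in practice, here is a rough back-of-the-envelope sketch. It only uses the approximate figures quoted in this README (about 500 hours, about 10,000 hours, at least 12 hours per epoch). The helper `training_days` and the choice of 100 epochs (the same count as the checkpoint in this repo) are assumptions for illustration, not measurements.

```python
# Back-of-the-envelope estimate; all figures are the rough ones quoted in this README.
LIBRISPEECH_HOURS = 500    # approx. audio used here (LibriSpeech clean 360 + clean 100)
GIGASPEECH_HOURS = 10_000  # approx. size of GigaSpeech
HOURS_PER_EPOCH = 12       # lower bound quoted above for one GigaSpeech epoch


def training_days(epochs: int, hours_per_epoch: float = HOURS_PER_EPOCH) -> float:
    """Wall-clock days needed for `epochs` epochs at a fixed time per epoch."""
    return epochs * hours_per_epoch / 24


# How much more audio GigaSpeech offers compared to what was used here.
data_ratio = GIGASPEECH_HOURS / LIBRISPEECH_HOURS

print(f"GigaSpeech has about {data_ratio:.0f}x more audio")
print(f"100 epochs would take at least {training_days(100):.0f} days")
```

Real training time depends heavily on the hardware, batch size and model size, so treat these numbers as an order-of-magnitude guide only.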
Still, the model works perfectly in the end :)
The model.model file only contains 100 epochs of training, with a loss of approximately 0.65.
The tutorial is on Medium: https://medium.com/@bernardoolisan/asr-speech-recognition-creation-with-tranformers-nn-30a8b8af1b6e
Go clap for that post!
