DeepSpeech is an open-source speech-to-text engine by Mozilla. Speech synthesis and speech-to-text are fun to try out, and I read that it could run with ease on a single core of a Raspberry Pi 4, so I decided to give it a try.
The Raspberry Pi version uses Google’s TensorFlow Lite to run an implementation of Baidu’s DeepSpeech architecture.
Installing it on a Raspberry Pi 4 with the Buster distribution was not straightforward. First I read the instructions on the GitHub page and tried to download and install the git version, but I ran into problems. It was taking ages and I ran into the famous `wheels` problem.
Failed building wheel for scipy
After tweaking and retrying a few times, I gave up on the GitHub version and tried the instructions here, but that was also a bumpy road. Success waits at the end, though.
Let’s go: how to install DeepSpeech on the RPi 4
Create a dev directory:
mkdir dev
cd dev
Create a Python virtual environment:
python3 -m venv deepspeech-train-venv
Activate the virtual environment (you are already inside dev, so the path is relative to it):
source deepspeech-train-venv/bin/activate
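A quick way to confirm that the activation worked is to ask Python itself. This is a minimal sketch of my own, not part of the instructions I followed; `in_virtualenv` is a hypothetical helper name. It relies on the fact that inside a `venv` environment `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the system-wide installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("virtual environment active:", sys.prefix)
    else:
        print("no virtual environment active; run the activate script first")
```

If this reports no active environment, `pip install` would end up in the system Python instead of the venv.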
Create the deepspeech directory
mkdir deepspeech
cd deepspeech
Install deepspeech
pip install deepspeech
Download pre-trained English model
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
tar xvf deepspeech-0.6.0-models.tar.gz
Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
tar xvf audio-0.6.0.tar.gz
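Before feeding a WAV file to the engine, it can help to look at its format. The sketch below uses only Python's standard `wave` module. The helper names `describe_wav` and `matches_expected_format` are mine, and the expected format (mono, 16-bit, 16 kHz) is an assumption based on my understanding of what the 0.6.0 English model was trained on, so treat it as a default to adjust rather than a fact from the instructions.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz, duration_s)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return channels, sample_width, sample_rate, duration


def matches_expected_format(path, expected_rate=16000):
    """Check for mono, 16-bit audio at the expected rate (assumed 16 kHz)."""
    channels, sample_width, sample_rate, _ = describe_wav(path)
    return channels == 1 and sample_width == 2 and sample_rate == expected_rate
```

Calling `describe_wav("audio/4507-16021-0012.wav")` from the deepspeech directory would show the properties of one of the downloaded samples.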
Done. Time to run it. Well, eh, I tried to run the example from the instruction page:
deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
Errors!?! I installed a missing dependency:
sudo apt install libatlas3-base
Still errors
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
So I checked whether I had numpy installed:
pip install numpy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: numpy in /home/pi/dev/deepspeech-train-venv/lib/python3.7/site-packages (1.15.4)
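Note that pip calling the requirement "already satisfied" only says the package is installed, not that it imports cleanly. A more telling check is to actually import it. Here is a minimal sketch using the standard `importlib` module; `probe_module` is a hypothetical helper name of mine.

```python
import importlib


def probe_module(name):
    """Try to import a module; return (ok, detail).

    detail is the module's __version__ (or "unknown") on success,
    or the error message on failure.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, str(exc)
    return True, str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    ok, detail = probe_module("numpy")
    print(f"numpy: {'ok' if ok else 'FAILED'} ({detail})")
```

Run inside the activated virtual environment, this shows either the numpy version or the import error message, without needing the full `deepspeech` command.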
I decided to update numpy:
pip install --upgrade numpy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting numpy
Using cached https://www.piwheels.org/simple/numpy/numpy-1.18.0-cp37-cp37m-linux_armv7l.whl
tensorboard 2.0.2 has requirement setuptools>=41.0.0, but you'll have setuptools 40.8.0 which is incompatible.
Installing collected packages: numpy
Found existing installation: numpy 1.15.4
Uninstalling numpy-1.15.4:
Successfully uninstalled numpy-1.15.4
Successfully installed numpy-1.18.0
The output complained that setuptools was too old, so I decided to update setuptools too:
pip install --upgrade setuptools
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting setuptools
Using cached https://files.pythonhosted.org/packages/f9/d3/955738b20d3832dfa3cd3d9b07e29a8162edb480bf988332f5e6e48ca444/setuptools-44.0.0-py2.py3-none-any.whl
Installing collected packages: setuptools
Found existing installation: setuptools 40.8.0
Uninstalling setuptools-40.8.0:
Successfully uninstalled setuptools-40.8.0
Successfully installed setuptools-44.0.0
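The setuptools complaint during the numpy upgrade boils down to a version comparison: 40.8.0 is lower than the required 41.0.0, while 44.0.0 is fine. A rough sketch of that comparison for plain dotted versions follows. The helper names are mine, and this is a simplification: pip itself follows the fuller PEP 440 rules, which also cover pre-releases and other suffixes that this sketch does not handle.

```python
def parse_version(text):
    """Turn a plain dotted version such as '40.8.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True when installed >= minimum for plain numeric versions."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '41' compares equal to '41.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Version numbers taken from the pip output in this post.
    print(meets_minimum("40.8.0", "41.0.0"))  # False: setuptools before the upgrade
    print(meets_minimum("44.0.0", "41.0.0"))  # True: setuptools after the upgrade
```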
I tried to run the example on the instruction page again
# Transcribe an audio file
deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
Another error
Loading model from file deepspeech-0.6.0-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
ERROR: Model provided has model identifier '='+;', should be 'TFL3'
That didn’t work. The Raspberry Pi build runs on TensorFlow Lite, so I needed to switch the model file to the `tflite` one:
deepspeech --model deepspeech-0.6.0-models/output_graph.tflite --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
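For what it's worth, the `TFL3` in that error message is the four-byte file identifier that TensorFlow Lite model files carry at byte offset 4 (a FlatBuffers convention). Below is a minimal sketch for checking a file before handing it to `deepspeech`. `has_tflite_identifier` is a hypothetical helper of mine, not something from the DeepSpeech package.

```python
def has_tflite_identifier(path):
    """Return True if bytes 4..7 of the file spell 'TFL3'."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    # FlatBuffers files start with a 4-byte root offset, followed by an
    # optional 4-byte file identifier; TensorFlow Lite uses b"TFL3".
    return len(header) == 8 and header[4:8] == b"TFL3"
```

With the files from the model archive, I would expect this to return True for `output_graph.tflite` and False for `output_graph.pbmm`, which fits the error message.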
Success in the end!
Loading model from file deepspeech-0.6.0-models/output_graph.tflite
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
INFO: Initialized TensorFlow Lite runtime.
Loaded model in 0.0019s.
Running inference.
why should one hault on the way
Inference took 4.091s for 2.735s audio file.
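That last line is enough to work out the real-time factor, meaning inference time divided by audio length. This small sketch parses the summary line exactly as it appears in the output; the regular expression and the `real_time_factor` name are my own, so a different client version might need a different pattern.

```python
import re

# Matches the summary line printed by the deepspeech command-line client,
# e.g. "Inference took 4.091s for 2.735s audio file."
INFERENCE_LINE = re.compile(r"Inference took ([0-9.]+)s for ([0-9.]+)s audio file")


def real_time_factor(line):
    """Return inference_time / audio_length, or None if the line does not match."""
    match = INFERENCE_LINE.search(line)
    if match is None:
        return None
    inference_seconds, audio_seconds = (float(g) for g in match.groups())
    return inference_seconds / audio_seconds


if __name__ == "__main__":
    rtf = real_time_factor("Inference took 4.091s for 2.735s audio file.")
    print(f"real-time factor: {rtf:.2f}")  # prints "real-time factor: 1.50"
```

A value above 1 means transcription takes longer than the audio lasts. In this run the Pi needed roughly 1.5 seconds of compute per second of speech.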
Then I played the audio file:
aplay audio/4507-16021-0012.wav
I must say DeepSpeech is much smarter than me; I couldn’t understand it:
why should one hault on the way
BTW, good question. Now I need another engine to answer that!
Way to go, folks.