This project shows how to build an ASR (automatic speech recognition) service based on Kaldi, the well-known speech recognition toolkit.
Currently, two server-side programs are provided: audio-server-online2-nnet2
(out of date) and audio-server-online2-nnet3.
They are built on Kaldi's online2 decoding with nnet2/nnet3 (chain) acoustic models, and each can serve multiple TCP connections at the same time.
More features useful for a practical system, as well as a client-side demo, will be added later.
To start, you need to install Kaldi from the main Kaldi repository on GitHub;
make ext
may also be required. Then build audio-server-online2-nnet2
(out of date) and audio-server-online2-nnet3
against Kaldi.
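The run commands below start by sourcing a path.sh script. A minimal sketch of what it might contain is shown here; the KALDI_ROOT value and the onlinebin location are assumptions, so point them at your own Kaldi checkout and at wherever the server binaries were built.

```shell
# Hypothetical path.sh -- adjust KALDI_ROOT and the binary
# directories to match your own build layout.
export KALDI_ROOT=/opt/kaldi
# Put the server binaries and common Kaldi tools on PATH.
export PATH=$PWD:$KALDI_ROOT/src/onlinebin:$KALDI_ROOT/src/bin:$PATH
# Kaldi tools expect the C locale for consistent sorting.
export LC_ALL=C
```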
To test, you can use any online2 nnet2/nnet3 (chain) model trained with Kaldi, such as the Api.ai Kaldi Speech Recognition Model or the Librispeech TDNN models with silence probability.
- . ./path.sh
- nohup audio-server-online2-nnet3 --acoustic-scale=1.0 --beam=11.0 --frame-subsampling-factor=3 --lattice-beam=4.0 --max-active=5000 --config=exp/api.ai-model/conf/online.conf --word-symbol-table=exp/api.ai-model/words.txt data/lang_nosp/phones/align_lexicon.int exp/api.ai-model/final.mdl exp/api.ai-model/HCLG.fst &
- nohup audio-server-online2-nnet2 --config=exp/nnet2_online/nnet_ms_a_online/conf/online_nnet2_decoding.conf --word-symbol-table=data/lang/words.txt data/lang/phones/align_lexicon.int exp/nnet2_online/nnet_ms_a_online/final.mdl exp/nnet2_online/nnet_ms_a_online/graph_test/HCLG.fst &
- online-audio-client localhost 5010 'scp:data/test_clean_example/wav.scp'
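The 'scp:...' argument to the client is a Kaldi script (.scp) file mapping each utterance ID to an audio file, one pair per line. A hypothetical example follows; the IDs and WAV paths are made up, so point them at real audio files at the sample rate your model expects.

```shell
# Build a toy wav.scp (hypothetical utterance IDs and paths).
mkdir -p data/test_example
cat > data/test_example/wav.scp <<'EOF'
utt-0001 /data/audio/utt-0001.wav
utt-0002 /data/audio/utt-0002.wav
EOF
```

You could then pass 'scp:data/test_example/wav.scp' to online-audio-client in place of the path used above.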