How do you get started with adding voice commands to a computer system?

Question

Let's suppose you want to add support for voice commands to a Linux distro. For simplicity's sake, let's say you want to be able to tell the computer (with a terminal running): "Create XY directory", and in response the directory XY is created in the current directory.

How do you implement such a feature? Will a software developer first need to train a system on lots of people pronouncing "Create directory" phrases, and then perform inference in production? Are some corporations/start-ups already providing trained models for natural language to computer interaction? How do you get started with these sorts of tasks these days?

And of course, for accessibility purposes, text-based interaction remains unchanged.

Thanks!
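With the pretrained models suggested in the answers, you typically don't need to collect your own training data: a speech-to-text model turns audio into text, and a small, ordinary program maps that text to actions. The command-matching half might look like the Python sketch below. The patterns, the names normalize and handle, and the convention of joining multi-word names with underscores are all hypothetical choices; because handle takes plain text, typed input can be passed to it too, so text-based interaction keeps working.

```python
import os
import re
from typing import Optional

# Hypothetical command grammar: each pattern is matched against the cleaned-up
# transcript and captures the directory name. Add more patterns as needed.
MKDIR_PATTERNS = [
    # "make a directory called notes", "create directory XY"
    re.compile(r"^(?:create|make) (?:a )?directory (?:called |named )?(?P<name>[\w\- ]+)$", re.IGNORECASE),
    # "create XY directory"
    re.compile(r"^(?:create|make) (?P<name>[\w\- ]+?) directory$", re.IGNORECASE),
]


def normalize(transcript: str) -> str:
    """Drop punctuation a speech-to-text model may add and collapse whitespace."""
    text = re.sub(r"[^\w\s\-]", "", transcript)
    return " ".join(text.split())


def handle(transcript: str, cwd: str = ".") -> Optional[str]:
    """Run the matching command; return the created path, or None if nothing matched."""
    text = normalize(transcript)
    for pattern in MKDIR_PATTERNS:
        match = pattern.match(text)
        # Skip "create a directory" style utterances that carry no real name.
        if match and match.group("name").lower() not in ("a", "the"):
            # Spoken multi-word names become underscore-separated directory names.
            name = "_".join(match.group("name").split())
            path = os.path.join(cwd, name)
            os.makedirs(path, exist_ok=True)
            return path
    return None
```

For example, handle(" Create XY directory.") creates ./XY and returns its path, while an unrecognized phrase returns None so the caller can ask the user to repeat.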

smoldesu · Accepted Answer

Use Whisper! It's an open-source speech-to-text model that comes in several sizes (from tiny up to large), and the smaller ones are a great way to get your feet wet with AI libraries. It's very accurate and easy to get working; I recommend it over pretty much everything else.
https://github.com/openai/whisper
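A minimal sketch of the Whisper step, assuming the openai-whisper package (pip install -U openai-whisper) and ffmpeg are installed, and that the spoken command has already been recorded to an audio file. whisper.load_model and model.transcribe are the calls shown in the Whisper README; the "base" model size, the language="en" option and the clean_transcript helper are illustrative choices, not requirements.

```python
import re


def clean_transcript(text: str) -> str:
    """Whisper output usually includes capitalization, punctuation and a leading
    space; strip punctuation and extra whitespace before matching commands."""
    text = re.sub(r"[^\w\s\-]", "", text)
    return " ".join(text.split())


def transcribe(audio_path: str, model_name: str = "base") -> str:
    # Imported lazily so clean_transcript() works without Whisper installed.
    import whisper  # assumes: pip install -U openai-whisper, plus ffmpeg on PATH

    model = whisper.load_model(model_name)
    # Fixing the language skips Whisper's language detection step.
    result = model.transcribe(audio_path, language="en")
    return clean_transcript(result["text"])
```

For example, transcribe("command.wav") on a hypothetical recording of someone saying "Create XY directory" would be expected to return something close to "Create XY directory", ready to be matched against your command patterns. Smaller models load and run faster; larger ones are generally more accurate.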

daanzu · Answer

https://github.com/dictation-toolbox/dragonfly
https://github.com/daanzu/kaldi-active-grammar
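Dragonfly takes a different approach from free-form transcription: you declare the exact phrases you want recognized, and kaldi-active-grammar provides an offline Kaldi-based engine for it. Restricting recognition to a small command grammar generally makes command recognition faster and more reliable. Below is a sketch assuming dragonfly2 is installed with its Kaldi backend and a Kaldi model sits in the engine's default model directory (see the kaldi-active-grammar README for setup); the rule name, grammar name and create_directory handler are hypothetical.

```python
import os


def create_directory(name):
    # Dragonfly passes the Dictation extra named "name"; str() gives its words.
    # Multi-word names are joined with underscores (a hypothetical convention).
    path = "_".join(str(name).split())
    os.makedirs(path, exist_ok=True)
    return path


def main():
    # Imported here so create_directory() can be used without dragonfly installed.
    from dragonfly import Dictation, Function, Grammar, MappingRule, get_engine

    # Assumes kaldi-active-grammar and a Kaldi model in the default model directory.
    engine = get_engine("kaldi")
    engine.connect()

    rule = MappingRule(
        name="directory_commands",
        # "<name>" is a slot filled by the Dictation extra declared below.
        mapping={"create <name> directory": Function(create_directory)},
        extras=[Dictation("name")],
    )
    grammar = Grammar("shell commands")
    grammar.add_rule(rule)
    grammar.load()

    # Blocks and listens on the microphone, dispatching recognized commands.
    engine.do_recognition()
```

Calling main() loads the grammar and starts listening; saying "create notes directory" should then call create_directory with the dictated words. Dictation may render a name like "XY" as separate letters, so you may need a spelling rule or custom vocabulary for short names.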
