HACKER Q&A
📣 calebjosue

How do you get started with adding voice commands to a computer system?


Let's suppose you want to add support for voice commands to a Linux distro.

For simplicity's sake, let's say that, with a terminal open, you want to be able to tell the computer "Create XY directory", and in response the directory XY is created in the current working directory.
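Once speech has been turned into text, the "Create XY directory" example reduces to matching a phrase and calling mkdir. Below is a minimal sketch of that step, assuming the transcript arrives as a plain string. The phrase patterns and the helper names `parse_create_directory` and `handle_command` are hypothetical; a real system would likely need many more phrasings or a proper intent parser.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical phrase patterns (case-insensitive); each captures the directory name.
_PATTERNS = [
    # "create directory XY", "create a folder called XY", "create a directory named XY"
    re.compile(r"^create (?:a )?(?:directory|folder) (?:called |named )?(?P<name>[\w.-]+)$", re.IGNORECASE),
    # "create XY directory" (the lookahead stops "create a directory" from yielding the name "a")
    re.compile(r"^create (?!(?:a|an|the) )(?P<name>[\w.-]+) (?:directory|folder)$", re.IGNORECASE),
]


def parse_create_directory(text: str) -> Optional[str]:
    """Return the directory name if `text` is a "create directory" command, else None."""
    # Speech-to-text output often has surrounding spaces and a trailing period.
    cleaned = text.strip().rstrip(".!?")
    for pattern in _PATTERNS:
        match = pattern.match(cleaned)
        if match:
            name = match.group("name")
            # Refuse names that refer to the current or parent directory.
            return None if name in {".", ".."} else name
    return None


def handle_command(text: str, cwd: Path) -> Optional[Path]:
    """Create the requested directory inside `cwd`; return its path, or None if not a command."""
    name = parse_create_directory(text)
    if name is None:
        return None
    target = cwd / name
    target.mkdir(exist_ok=True)  # no error if it already exists
    return target
```

Because the handler only sees text, the same function could serve typed commands as well as transcribed ones.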

How do you implement such a feature?

Does a software developer first need to train a model on recordings of lots of people saying "Create directory" phrases, and then run inference in production?

Are any companies or start-ups already providing pretrained models for natural-language interaction with computers?

How do you get started with this sort of task these days?

And of course, for accessibility purposes, text-based interaction would remain available and unchanged.

Thanks!


  👤 smoldesu Accepted Answer ✓
Use Whisper! It's an open-source, pretrained speech-to-text model that comes in several sizes (the smaller ones are fairly lightweight), and it's great for getting your feet wet with AI libraries. It's extremely accurate and easy to get working; I recommend it over pretty much everything else.

https://github.com/openai/whisper
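A minimal sketch of the transcription step, assuming the `openai-whisper` package from the repository above is installed (`pip install -U openai-whisper`) along with ffmpeg, and that the spoken command has already been recorded to an audio file. Recording from a live microphone would need a separate audio library and is not shown. The helper names `clean_transcript` and `transcribe` and the default model size `"base"` are illustrative choices, not part of Whisper itself.

```python
import sys


def clean_transcript(text: str) -> str:
    """Trim surrounding whitespace and trailing sentence punctuation,
    e.g. " Create XY directory." -> "Create XY directory"."""
    return text.strip().rstrip(".!?,").strip()


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Run Whisper on an audio file and return the cleaned transcript."""
    # Imported lazily so clean_transcript() works even without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # smaller models such as "tiny" load faster
    result = model.transcribe(audio_path)   # a dict; the full transcript is under "text"
    return clean_transcript(result["text"])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python voice_command.py command.wav
    print(transcribe(sys.argv[1]))
```

Whisper only produces text. Turning that text into an action such as creating a directory is still up to your own code, which also means typed input can go through exactly the same handler.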