HACKER Q&A
📣 VictorPenJust

AI that allows you to make phone calls in a language you don't speak?


Imagine this: You're trying to book a table at a sushi restaurant in Tokyo over the phone, but you don’t speak Japanese. With this hypothetical software, you could make the call in English, and the software would translate your speech and synthesize it as Japanese in real time. So you would speak in English, and the restaurant would hear everything in Japanese.
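The idea above is a cascade: speech recognition, then machine translation, then speech synthesis. Here is a minimal sketch of how those stages could be chained, assuming a recognizer that emits English words one at a time. `relay_call`, `translate`, and `synthesize` are hypothetical names, and the lambdas in the usage are stand-ins for real models. Buffering until sentence-final punctuation is just one simple way to give the translator full-sentence context, and it trades latency for accuracy.

```python
from typing import Callable, Iterable, Iterator

# Sentence-final marks for English and Japanese (an assumption; real systems
# usually rely on the recognizer's own end-of-utterance detection).
SENTENCE_END = (".", "?", "!", "。", "？", "！")


def relay_call(
    recognized_words: Iterable[str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Buffer recognized words until a sentence ends, then translate the whole
    sentence and synthesize it. Yields one audio chunk per sentence."""
    buffer: list[str] = []
    for word in recognized_words:
        buffer.append(word)
        if word.endswith(SENTENCE_END):
            sentence = " ".join(buffer)  # English source, so space-joined
            buffer.clear()
            yield synthesize(translate(sentence))
    if buffer:  # flush a trailing fragment when the caller stops talking
        yield synthesize(translate(" ".join(buffer)))


if __name__ == "__main__":
    words = ["I'd", "like", "a", "table", "for", "two.", "At", "seven?"]
    fake_translate = lambda s: s.upper()    # hypothetical stand-in for an MT model
    fake_tts = lambda s: s.encode("utf-8")  # hypothetical stand-in for a TTS engine
    for chunk in relay_call(words, fake_translate, fake_tts):
        print(chunk)
```

Nothing reaches the restaurant until a whole sentence has been heard, which is where the delay discussed in the replies comes from.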

This idea came to me during my travels in Japan and China, where English is not commonly spoken in many places. It's incredibly challenging to navigate without knowing the language. We often had to rely on our hotel's front desk to assist us with reservations and contacting support services.

I can envision other applications for this technology, such as call centers, international business calls, and meetings.

What do you guys think? Thanks in advance!


  👤 drtgh Accepted Answer ✓
About six years ago, I saw a tourist ordering in a restaurant by typing on his smartphone, translating the text, and playing it with a classic text-to-speech function while showing the screen to the waitress. It worked pretty well for him. Since then, I've seen more and more people doing it, mostly with common phrases.

The problem would be a conversation between two people. Automated text translation today is not reliable: it can even produce the opposite meaning, and it misses nuance, so it needs active supervision for now (and for the next few years).

With voice, a time delay is needed to acquire sentence context even when the source and target languages share structures, and the delay becomes mandatory when their structures differ. A general delay would also be advisable so that the two parties don't hear two voices at the same time. I'm not sure real-time translation like the one we see in Star Trek can be done (at least voice to voice).

Important note: I would not recommend popularizing the synthesis of our personal voices. Variations of some reference voice models would be much better.


👤 dustincoates
Samsung has already announced that live translation of calls will be coming to their next phones:

> AI Live Translate Call will soon give users with the latest Galaxy AI phone a personal translator whenever they need it. Because it’s integrated into the native call feature, the hassle of having to use third-party apps is gone. Audio and text translations will appear in real-time as you speak, making calling someone who speaks another language about as simple as turning on closed captions when you stream a show. Because it’s on-device Galaxy AI, you can trust that no matter the scenario, private conversations never leave your phone.

https://news.samsung.com/global/a-new-era-of-galaxy-ai-is-co...


👤 rantallion
> and the software would translate ... in real-time

Depending on the pair of languages being translated between, isn't this literally impossible? The ordering of sentence parts is not the same in all languages (coincidentally, Japanese and English are a perfect example of how different grammar can be), so you often have to wait until you've heard the whole sentence before you can parse it and translate it into another language.

Given the above issue, how is what you're envisioning any better than just using Google Translate?
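The word-order point above can be made concrete with a small sketch that computes, for each translated word, how many source words must be heard before it can be spoken. It assumes a one-to-one word alignment and that output is spoken strictly in order; `emission_lags` is a hypothetical helper, and real systems align phrases rather than single words, so treat this as an illustration only.

```python
def emission_lags(target_order: list[int]) -> list[int]:
    """Return, for each output word in speaking order, how many source words
    must have been heard before that output word can be spoken.

    target_order[i] is the index of the source word aligned to output word i.
    Because output is spoken in order, an output word also waits for every
    source word needed by the output words before it.
    """
    lags: list[int] = []
    furthest_needed = -1
    for source_index in target_order:
        furthest_needed = max(furthest_needed, source_index)
        lags.append(furthest_needed + 1)
    return lags


if __name__ == "__main__":
    # Japanese source "watashi wa / sushi o / tabemashita" (subject, object, verb).
    # English output "I ate sushi" needs source words in the order 0, 2, 1.
    print(emission_lags([0, 2, 1]))  # [1, 3, 3]: "ate" waits for the final verb
    # Same word order on both sides: each output word can be spoken as soon
    # as its aligned source word is heard.
    print(emission_lags([0, 1, 2]))  # [1, 2, 3]
```

In the Japanese-to-English case, the second English word already requires the entire Japanese sentence, which is exactly the wait described above.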


👤 bdhcuidbebe
> This idea came to me during my travels in Japan and China, where English is not commonly spoken in many places.

I got the same idea, but from Douglas Adams ;)


👤 umtksa
You can do that with Whisper: https://github.com/openai/whisper. There is even FastWhisper, which runs like a charm on the CPU of my old 2012 iMac: https://github.com/FamousDirector/FastWhisper

👤 gorbypark
I’ve been keeping my eye on the Seamless M4T streaming project from Meta. Although I haven’t gotten it to run locally yet (mostly due to lack of time), I think it has the potential to enable things like real-time phone calls. My end goal is to have system-level, real-time translated transcriptions (for video conferences, etc.).

👤 rahimnathwani
Pixel Buds and a Pixel phone let you do this for face-to-face conversations, but I don't know how well it works: https://www.technologyreview.com/technology/babel-fish-earbu...

👤 sargstuff
Not a wicked googly Jiminy Cricket concept. Searching the web for the terms lilliputing "pocket translator" shows quite a few options.

  https://itranslate.com/features/camera-translations

  https://blog.google/products/translate/see-world-in-your-lan...

  https://translate.google.com/about/

👤 behnamoh
Didn't Google showcase this feature a couple of years ago? Of course, it's Google, so who knows whether it ever made it into production.

👤 CodeNest
Samsung will beat Apple to it. Good for them.

👤 barbariangrunge
This will be the worst. We get enough spam calls as it is. Soon we’ll no longer be able to use “broken English” as a clue that a call is a scam, and even hearing our parents' voices won’t be evidence that it’s a real call.

👤 karmakaze
I was about to comment that this isn't AI, then realized that it's a pointless distinction now. Whether something is AGI will still be meaningful to know once we get there; until then, all AI/ML tech will simply keep being called AI, computer, agent/assistant, or whatever.