I'm currently researching unique, cutting-edge, and honestly kick-ass user experiences that just work with AI. This can be a feature or service, from open source to mainstream.
Keen to see the best
That is the level of experience AI needs to get to. Not buttons that basically say: "Use AI!!" but features so fully integrated and smooth that you don't even think about whether or not AI is behind it... it just does what it does when it needs to do it.
(And I know, my anecdote wasn't about LLMs, but that is kinda the point.)
https://www.soundslice.com/sheet-music-scanner/
Personally I think the user experience is interesting because we show you very specific questions for low-confidence decisions. Some example screenshots are on that link above. Over time, the number of manual questions has gone down, as our models have gotten smarter about the (seemingly endless!) edge cases in music notation.
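The underlying pattern is just confidence gating: apply the model's guess silently when it's confident, and surface one targeted question when it isn't. A rough sketch (the types and threshold are invented for illustration, not our actual code):

    // Confidence-gated question, illustration only: the threshold and
    // types below are invented, not Soundslice's actual code.
    interface OmrDecision {
      description: string; // e.g. "Is this mark a staccato dot or part of a repeat sign?"
      bestGuess: string;
      confidence: number;  // model's probability for its best guess, 0..1
    }

    const ASK_THRESHOLD = 0.9; // below this, ask the human

    function resolve(d: OmrDecision, askUser: (q: string) => string): string {
      if (d.confidence >= ASK_THRESHOLD) {
        return d.bestGuess; // confident enough: apply silently
      }
      // Low confidence: surface one very specific question rather than
      // dumping the whole ambiguity on the user.
      return askUser(d.description);
    }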
Once music is uploaded and scanned, you can use our bespoke notation editor to make any edits. The original image is tightly integrated into our editor, so you can cross-reference.
This is my first production ML product after 20 years of being a web developer. I wrote up some general thoughts here: http://www.holovaty.com/writing/machine-learning-thoughts/
[1] https://github.com/bramses/quoordinates
[2] https://github.com/bramses/commonplace-bot
Not so much kick-ass, but still works nicely: https://github.com/mlang/tracktales -- My MPD track announcer with support for describing album art...
Compare this to yesterday's adventure with another service (my package got lost), where the bot couldn't decipher what a WRITTEN "my package got lost" or "where is my delivery" meant.
I used the spoken interface with ChatGPT 4 a lot a few months ago after it was released on the iPhone app, and it was pretty immersive. The latency was a bit long, though, and even when prompted to reply briefly the bot tended to ramble on, often with numbered lists, which sound awkward in speech.
For the past couple of weeks, I’ve been experimenting with Inflection AI’s Pi. Its voices are very natural—the American female voice I use even has vocal fry [1]—and the latency is short. It will talk about serious topics (sometimes with numbered lists), but it seems prompted mainly for friendly conversation. It calls me by my name and remembers our previous conversations. I can easily see people becoming emotionally attached to bots like that.
A man named Chris Cappetta has created some open-source software for talking with Claude 3. His conversations with the bot about AI are pretty remarkable [2, 3].
The current spoken interfaces all seem to run what the user says through a speech-to-text converter, so the bot does not perceive pronunciation, intonation, hesitation, etc. After multimodal models that can hear and respond to the speaker’s tone become available, the experience will become even stickier.
[1] https://en.wikipedia.org/wiki/Vocal_fry_register
There is also AI in video editing apps: fast autofocus on faces, face detection, and modifiers that follow you. It's really incredible and intuitive enough that many people use it (too much, maybe).
You also have to take into consideration that, since the chat history is sent with every new message, the price of a conversation grows ~n^2 with the number of messages. So do you send the whole codebase? Or do you let the AI run commands like ls and cat to read the files it needs? Do you keep a file in the directory with a quick history of what's already done and what still needs to be done?
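A back-of-the-envelope sketch of that growth, with made-up per-token prices and message sizes:

    // Why conversation cost grows ~n^2: the i-th request re-sends the
    // entire history. Both constants are made up, purely for illustration.
    const PRICE_PER_TOKEN = 0.00001; // hypothetical $ per input token
    const TOKENS_PER_MESSAGE = 500;  // hypothetical average message size

    function conversationCost(messages: number): number {
      let cost = 0;
      for (let i = 1; i <= messages; i++) {
        // The i-th request carries all i messages so far.
        cost += i * TOKENS_PER_MESSAGE * PRICE_PER_TOKEN;
      }
      return cost; // total = price * tokens * n(n+1)/2, i.e. O(n^2)
    }

    console.log(conversationCost(10).toFixed(2));  // $0.28
    console.log(conversationCost(100).toFixed(2)); // $25.25: 10x messages, ~90x cost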
Another thing I find interesting is how microservices become a natural choice vs. monolithic apps when building with AI, again due to the limits of the context window. You focus on thinking through all the components and their APIs, and then let the AI build each component. If a component can be built in isolation, without any knowledge of the other components, so much the better.
Also, it quickly becomes obvious that a fully autonomous builder doesn't make any practical sense. A real person still needs to look at the progress and give guidance. Not even because the AI can't do it (it probably can), but because your own understanding of what you're building changes over time. So it should be semi-automatic, with a real person able to change course at any moment.
How do you build the autonomous loop?
One thing I find useful is to let the AI write tests first, and then run those automatically on each new chat message. TypeScript types also help catch broken code early. In that case an automatic message is sent: "Hey, you broke the tests. Here are the error messages. Go ahead and fix those." The operator doesn't have to get involved until it's fixed.
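A minimal sketch of that loop, assuming a stubbed callModel() in place of your actual LLM API and a project whose tests run via npm test:

    import { execSync } from "node:child_process";

    // Stub: wire this up to whatever LLM API you actually use.
    async function callModel(message: string): Promise<string> {
      return ""; // placeholder for illustration
    }

    // Run the test suite; return the error output on failure, null on success.
    function runTests(): string | null {
      try {
        execSync("npm test", { stdio: "pipe" });
        return null;
      } catch (err: any) {
        return String(err.stdout ?? "") + String(err.stderr ?? "");
      }
    }

    // After each AI edit, loop until the suite is green again. The
    // operator only gets pinged if it's still red after maxAttempts.
    async function autoFixLoop(maxAttempts = 5): Promise<boolean> {
      for (let i = 0; i < maxAttempts; i++) {
        const failures = runTests();
        if (failures === null) return true; // green: hand control back
        await callModel(
          "Hey, you broke the tests. Here are the error messages:\n" +
          failures + "\nGo ahead and fix those."
        );
      }
      return false; // still broken: escalate to the operator
    }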
Another loop can be built with the ability to send screenshots. At any moment the system can send a screenshot to the AI and ask whether it's good enough and whether it wants to make any changes. That also improves the quality.
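And a sketch of the screenshot loop, assuming Playwright for capture and OpenAI's vision-capable chat API (the model name and prompt are just examples):

    import { chromium } from "playwright";
    import OpenAI from "openai";

    const client = new OpenAI();

    // Capture the running app and ask a vision model to critique it.
    async function critiqueScreenshot(url: string): Promise<string> {
      const browser = await chromium.launch();
      const page = await browser.newPage();
      await page.goto(url);
      const png = await page.screenshot();
      await browser.close();

      const response = await client.chat.completions.create({
        model: "gpt-4o",
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: "Is this page good enough? What would you change?" },
              {
                type: "image_url",
                image_url: { url: "data:image/png;base64," + png.toString("base64") },
              },
            ],
          },
        ],
      });
      return response.choices[0].message.content ?? "";
    }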
Well, you get the idea. It's an interesting problem to ponder.
The signup and setup flow is quite lengthy, because they need to implement email flows, abandoned-cart reminders, SMS flows, and push messaging, all of which of course need to be highly customized. All of this is needed just to unlock some of the tool's basic features.
I was surprised and delighted to begin setting up an email series, only to discover it had already scanned my website and used AI to write the content of all the messages to match our tone and messaging.
Highly impressive and it makes getting it up and running super fast.
[1] https://youtu.be/CcHevgjAnV0?feature=shared&t=1374
[2] https://axle.ai/
Whisper transcribes conversations from audio files. Hello Transcribe is a third-party GUI wrapper that uses Whisper under the hood to produce subtitle files with timestamps.
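If you'd rather script it than use the GUI, roughly the same output is a few lines against OpenAI's hosted Whisper endpoint (Hello Transcribe runs Whisper locally; the hosted API merely stands in for it here, and the file names are made up):

    import fs from "node:fs";
    import OpenAI from "openai";

    const client = new OpenAI();

    // Transcribe an audio file into timestamped SRT subtitles.
    async function transcribeToSrt(path: string): Promise<string> {
      const srt = await client.audio.transcriptions.create({
        file: fs.createReadStream(path),
        model: "whisper-1",
        response_format: "srt",
      });
      // With response_format "srt" the body is plain SRT text.
      return srt as unknown as string;
    }

    transcribeToSrt("meeting.mp3").then((s) => fs.writeFileSync("meeting.srt", s));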
Does not distinguish between speakers though
I wish that application considered other use cases, as I pretty much never want subtitle files and don't expect to start.
There are some techniques to distinguish between speakers, but I haven't seen anyone put the combination together in a nice GPU-leveraging app.
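The usual recipe is: segment the audio, compute a speaker embedding per segment, then cluster the embeddings. A toy sketch of the clustering step, with tiny made-up vectors (a real pipeline would get embeddings from a speaker-embedding model such as ECAPA-TDNN or pyannote):

    // Toy speaker clustering: greedily assign each segment to an existing
    // speaker if its embedding is similar enough, else start a new speaker.
    function cosine(a: number[], b: number[]): number {
      const dot = a.reduce((s, x, i) => s + x * b[i], 0);
      const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      return dot / (norm(a) * norm(b));
    }

    function assignSpeakers(embeddings: number[][], threshold = 0.85): number[] {
      const centroids: number[][] = [];
      return embeddings.map((e) => {
        for (let s = 0; s < centroids.length; s++) {
          if (cosine(e, centroids[s]) >= threshold) return s; // same voice
        }
        centroids.push(e); // new speaker
        return centroids.length - 1;
      });
    }

    // Two "voices": segments 0 and 2 match, segment 1 is someone else.
    console.log(assignSpeakers([[1, 0.1], [0, 1], [0.9, 0.2]])); // [0, 1, 0]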
3D model generation is also a good example; it can save you tons of time.
AI is everywhere, but most of the time you simply don't notice it, since it's so well integrated or it's driving backend logic: anti-fraud systems, etc.
Recently I got to drive a rental Corolla with lane-keep assist and smart-ish cruise control, which could sorta keep the car on the road by itself. It was definitely a fun toy for a controls engineer, kind of like riding a narcoleptic horse with ADHD, but still on the cusp of being a net positive.
a. Personal assistants like Siri, Alexa, and Google Assistant.
b. AI photo editing tools that allow users to quickly enhance photos with filters, touch-ups, and automated suggestions.
c. AI tools that augment creative work like automated image generation, video editing assistants, writing aids, and more.
d. Accessibility tools using AI for tasks like vision assistance, transcription, translation, and more.
e. Personalized digital assistants created by Anthropic to be helpful, harmless, and honest.
Free website for AI stem separation that runs client-side with WebAssembly
So, my list:
* Spotify's weekly picks used to be pretty good at recommending new music, although it's actually got worse in the last 2-3 years.
* AI filtering out things like fraudulent transactions and virus-laden web pages. They're a long way from 100%, but they've got better, even as the challenge has got bigger.
* Some games have started making good use of AI - Red Dead Redemption 2 is probably the best I've played. Makes the in-game world feel a bit more dynamic, rather than the same procedural world.
* Google Maps does a lot with AI, redirecting based on traffic. It doesn't go out of its way to tell you how clever it's being, so it's hard to spot. But 10 years ago, I used to get stuck in a lot more traffic than I do now.
* ChatGPT is awesome, even if the hype cycle is now turning and we're all eye-rolling at it. I've conversed with it to improve my understanding of all sorts of topics, and it is amazing.
AI works better when it's invisible and feels like magic, rather than being yelled at you.