If you have any feedback/questions, I would love to hear them! I hope this kicks off a generation of new, interesting devices. If you aren't familiar with WebRTC, it can do some magical things. Check out WebRTC for the Curious[1] -- I would love to talk about all the cool things it can do as well.
* Making a toy. I have had a lot of fun putting a silly/sarcastic voice in toys. My 4-year-old thinks it is VERY funny.
* Smart Speaker/Assistant. I want to put one in each room. If I am in the kitchen, it has a prompt to assist with recipes.
I have A LOT more I want to do in the future. The microcontrollers I was using can't do video yet, BUT the ESP32 line does have newer chips that can. When I pull that in, I can do smart cameras -- then it gets really fun :)
But yeah, once I figured out that this enables streaming speech-to-speech applications on embedded devices, it became easy to think up use cases.
Beyond that, while its primary vision does seem to be speech-to-speech interfaces, it could easily be stretched to do things like sending a templatized text prompt constructed from toggle states, sensor readings, etc., and (optimistically) asking for a structured response that could control lights or servos or whatever.
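To make that concrete, here is a minimal sketch of the idea: fill a text template from device state, ask for a JSON-only reply, and clamp whatever comes back before driving hardware. The state names, template wording, and response schema below are all made up for illustration; they are not from the actual project.

```python
import json

def build_prompt(toggles: dict, sensors: dict) -> str:
    """Fill a text template from toggle states and sensor readings."""
    toggle_desc = ", ".join(f"{n}={'on' if v else 'off'}" for n, v in toggles.items())
    sensor_desc = ", ".join(f"{n}={v}" for n, v in sensors.items())
    return (
        "Device state -- toggles: " + toggle_desc + "; sensors: " + sensor_desc + ". "
        'Reply ONLY with JSON like {"light": "on" or "off", "servo_angle": 0-180}.'
    )

def apply_response(raw: str) -> dict:
    """Parse the (hoped-for) structured reply and clamp values to safe ranges."""
    action = json.loads(raw)
    return {
        "light": action.get("light", "off"),
        "servo_angle": max(0, min(180, int(action.get("servo_angle", 90)))),
    }

prompt = build_prompt({"motion": True}, {"lux": 12})
# Pretend the model answered; a real device would send `prompt` over the wire.
action = apply_response('{"light": "on", "servo_angle": 45}')
```

The clamping step matters more than it looks: anything driving a servo should treat the model's output as untrusted input.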
Generally, this looks like a very early-stage hobby project (the code practices fall short of my expectations for good embedded work, it would be better presented as a library than as an application, the README needs a lot of work, etc.), but something more sophisticated isn't too far out of reach.
Even though the README isn't completely done, give it a chance -- I bet you can have fun with it :)
Really -- in any physical place where people are easily overwhelmed, having something like that would be really nice.
With some work, you could probably even run RAG on the questions and answer esoteric things like where the food court is in an airport or where the ATM is in a hotel.
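The retrieval step doesn't have to be fancy to get started. Here is a toy sketch: keyword overlap over a handful of venue facts, stuffed into the prompt as context. The facts and the scoring are invented for illustration; a real deployment would use embeddings and a vector store.

```python
# Hypothetical venue knowledge base -- replace with your own facts.
FACTS = [
    "The food court is on level 2, past security near gate B.",
    "The ATM is in the hotel lobby, next to the front desk.",
    "Checkout time is 11am; late checkout can be requested at the desk.",
]

def retrieve(question: str, k: int = 1) -> list:
    """Rank facts by crude word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(FACTS, key=lambda f: -len(q_words & set(f.lower().split())))
    return ranked[:k]

def augmented_prompt(question: str) -> str:
    """Prepend the best-matching fact(s) as context for the model."""
    context = " ".join(retrieve(question))
    return f"Context: {context}\nQuestion: {question}\nAnswer from the context only."

p = augmented_prompt("where is the ATM")
```

Even this crude version is enough to answer "where is the ATM" from the right fact, which is the whole trick behind RAG.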
Even if you trust OpenAI's models more than your trained, certified, and insured pharmacist -- the pharmacists, their regulators, and their insurers sure won't!
They've got a century of sunk costs to consider (and maybe even some valid concern over the answers a model might give on their behalf...)
Don't expect anything like that in a traditional regulated medical setting any time soon.
I think it would help to have a FreeRTOS example or, if you want to go really crazy, a Zephyr integration! The AI and microcontroller combination would be a lot of fun to work on -- what a cool niche!
I love my Airthings. I don't know if it's actionable, but it would be cool to see what conclusions would come out of sending CO2 and radon readings in. It could make understanding your home a lot easier.
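A sketch of what "sending readings in" could look like: summarize the numbers locally against well-known rules of thumb, then hand the summary to the model as part of the prompt. The thresholds below are commonly cited guidance (roughly 1000 ppm CO2 for ventilation, the US EPA's 4 pCi/L radon action level), not anything Airthings-specific, and the prompt wording is invented.

```python
def summarize_air(co2_ppm: float, radon_pci_l: float) -> str:
    """Turn raw air-quality readings into a prompt with a local pre-summary."""
    notes = []
    if co2_ppm > 1000:  # rough "stuffy room" threshold
        notes.append(f"CO2 is {co2_ppm:.0f} ppm; consider ventilating.")
    if radon_pci_l >= 4.0:  # US EPA action level
        notes.append(f"Radon is {radon_pci_l:.1f} pCi/L, above the EPA action level.")
    if not notes:
        notes.append("Readings look fine.")
    return (
        "Here are my latest home air readings: "
        f"CO2={co2_ppm:.0f} ppm, radon={radon_pci_l:.1f} pCi/L. "
        "Local summary: " + " ".join(notes) + " What should I do?"
    )

prompt = summarize_air(1350, 1.2)
```

Doing the threshold check locally means the device can still flag obvious problems even when the model (or the network) is unavailable.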
I have talked with incredibly creative developers that are hampered by domain knowledge requirements. I hope to see an explosion of cool projects if we get this right :)