The PineBuds are designed and sold as an open firmware platform to allow software experimentation, so there's nothing wrong and no economic failure going on here. Having a powerful general-purpose microcontroller to experiment with is a design goal of the product.
That said, ANC Bluetooth earbuds are not trivial products. Doing ANC properly is very complicated: it's much harder than taking the input from a microphone, inverting the signal, and feeding it into the output. There's a lot of computation that needs to be done continuously.
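For a sense of why, here's a toy sketch of the usual adaptive-filter approach (LMS-style; real ANC uses filtered-x LMS and models the speaker-to-ear path, which this omits). None of this is the PineBuds' actual firmware, and the tap count and step size are made up:

    #define TAPS 64
    static float w[TAPS];      /* adaptive filter weights          */
    static float x_hist[TAPS]; /* history of reference-mic samples */

    /* ref: outside (reference) mic sample; err: what the error mic near the
     * ear still hears. Returns the anti-noise sample for the speaker. */
    float anc_step(float ref, float err, float mu)
    {
        /* shift the new reference sample into the history buffer */
        for (int i = TAPS - 1; i > 0; i--)
            x_hist[i] = x_hist[i - 1];
        x_hist[0] = ref;

        /* FIR filter: estimate of the noise reaching the ear */
        float y = 0.0f;
        for (int i = 0; i < TAPS; i++)
            y += w[i] * x_hist[i];

        /* LMS update: nudge the weights toward whatever the error mic still hears */
        for (int i = 0; i < TAPS; i++)
            w[i] += mu * err * x_hist[i];

        return -y; /* play it back inverted */
    }

Even this toy version is well over a hundred multiply-accumulates per sample; at a 48 kHz sample rate that's several million per second, before you've touched Bluetooth, codecs, or transparency mode.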
Using a powerful microcontroller isn’t a failure, it’s a benefit of having advanced semiconductor processes. Basically anything small and power efficient on a modern process will have no problem running at tens of MHz speeds. You want modern processes for the battery efficiency and you get speed as a bonus.
The speed isn't wasted, either. A higher clock means lower latency, and in a battery-powered device an MCU running at 48 MHz may seem excessive until you realize that the faster it finishes each unit of work, the sooner it can go back to sleep. It's not always about raw power.
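The usual shape of that, as a generic Cortex-M sketch (not this particular firmware), is a "race to sleep" loop:

    #include <stdbool.h>

    extern volatile bool work_pending;      /* set by an ISR when a buffer is ready  */
    extern void process_audio_block(void);

    void main_loop(void)
    {
        for (;;) {
            __asm volatile ("cpsid i");     /* mask IRQs while deciding to sleep     */
            if (!work_pending)
                __asm volatile ("wfi");     /* halt the core; a pending IRQ wakes it */
            __asm volatile ("cpsie i");     /* any pended ISR runs here              */

            while (work_pending) {
                work_pending = false;
                process_audio_block();      /* run flat out, then go back to sleep   */
            }
        }
    }

The faster process_audio_block() runs, the more of each buffer period the core spends halted in WFI, which is where the battery savings come from.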
Modern earbuds are complicated. Having a general purpose MCU to allow software updates is much better than trying to get the entire wireless stack, noise cancellation, and everything else completely perfect before spinning out a custom ASIC.
We're very fortunate to have all of this at our disposal. The grumbling about putting powerful microcontrollers into small things ignores how hard it is to make a bug-free custom ASIC and break even on it, relative to spending $0.10 per unit with a proven microcontroller manufacturer operating at scale.
If that's all your evidence is, don't you dare go near any scientific papers.
But it is important to note that a lot of what people decry as "AI Generated" is really the fact that someone is adhering to what have been best practices in publishing arguments for some time.
They really aren't. Every material that goes into every chip needs to be sourced from various mines around the world, shipped to factories to be assembled, then the end goods need to be shipped again around the world to be sold or directly dumped.
High power, low power, it all has negative environmental impact.
it's not 'just sand'.
Doing the work in software allows for updates and bug fixes, which are more likely to prevent piles of hardware from going into the landfill (in some cases before they even reach customers’ hands).
You might be wondering how on earth a more advanced chip can end up being cheaper. It may surprise you, but not all manufacturing cost is material cost. If you have to design a bespoke chip for your earbuds, you now need to hire chip designers, go through the whole design and testing process, get someone to fabricate your bespoke chip in smaller quantities (which can easily end up more expensive than the more powerful mass-manufactured chips), teach your programmers how to program the new chip, and so on. The material savings (which are questionable: are you sure you can make your bespoke chip more efficiently than the mass-manufactured ones?) are easily outweighed by business costs in other parts of the manufacturing process.
Hardware is cheap and small enough that we can run doom on an earbud, and I’m supposed to think this is a bad thing?
It's an absolutely bonkers amount of hardware scaling that has happened since Doom was released. Yes, this is tremendous overkill, but the crazy part is that it fits into an earpiece.
Why don't you compare it to, say, a PDP-11, a VAX-11/780, or a Cray-1 supercomputer?
NASA used a lot of supercomputers here on Earth prior to mission start.
More than anything, it was designed to be small and use little power.
But these little ARM Cortex-M4F parts we're comparing it to are also designed for embedded, possibly hard-real-time operation. And the dominant factors in the playback experience through earbuds are response time and jitter.
If the AGC could get a capsule to the Moon doing hard real-time tasks (and spilling low-priority tasks as necessary), a single STM32F405 with a Cortex-M4F could do it better.
Actually, my team is going to fly an STM32F030 for minimal power management tasks-- but still hard real-time-- on a small satellite. Cortex-M0. It fits in 25 milliwatts vs the AGC's 55 W. We're clocked slow, but still exceed the throughput of the AGC by ~200-300x. Funnily enough, the amount of RAM is about the same as the AGC :D It's 70 cents in quantity, but we have to pay three whole dollars at quantity 1.
> NASA used a lot of supercomputers here on Earth prior to mission start.
Fine, let's compare to the CDC 6600, the fastest computer of the late '60s. An M4F at 300 MHz is a couple hundred single-precision megaflops; the CDC 6600 was something like 3 not-quite-double-precision megaflops. The hacky "double-single precision" techniques have comparable precision-- figure that's probably about 10x slower on average, so each M4F could do about 20 CDC-6600-equivalent megaflops, or is roughly 5-10x faster. The amount of RAM is about the same on this earbud.
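Spelling that back-of-envelope out in one line (same rough figures as above, nothing more precise than that):

    \[
      \frac{200\text{--}300\ \text{SP MFLOPS (M4F @ 300 MHz)}}{10\ \text{(double-single penalty)}}
      \approx 20\text{--}30\ \text{equivalent MFLOPS}
      \approx (5\text{--}10) \times 3\ \text{MFLOPS (CDC 6600)}
    \]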
His 486-25 -- if a DX model with the FPU -- was probably roughly twice as fast as the 6600 and probably had 4x the RAM, and used 2 orders of magnitude less power and massed 3 orders of magnitude less.
Control flow, integer math, etc., would be much faster than that.
Just a few more pennies gets you a microcontroller with a double-precision FPU, like a Cortex-M7 with the FPv5-D16, which at 300 MHz is good for maybe 60 double-precision megaflops-- compared to the 6600, 20x faster and with more precision.
It looks like NASA had 5 360/75's plus a 360/91 by the end, plus a few other computers.
The biggest 360/75's (I don't know that NASA had the highest spec model for all 5) were probably roughly 1/10th of a 486-100 plus 1 megabyte of RAM. The 360/91 that they had at the end was maybe 1/3rd of a 486-100 plus up to 6 megabytes of RAM.
Those computers alone would be about 85% of a 486-100. Everything else was comparatively small. And, of course, you need to include the benefits from getting results on individual jobs much faster, even if sustained max throughput is about the same. So all of NASA, by the late 60's, probably fits into one relatively large 486DX4-100.
Incidentally, one random bit of my family lore; my dad was an IBM man and knew a lot about 360's and OS/360. He received a call one evening from NASA during Apollo 13 asking for advice about how they could get a little bit more out of their machines. My mom was miffed about dinner being interrupted until she understood why :D
And, of course, most of Cerebras' costs are NRE and things like getting heat out of that wafer and power into it.
Also, a process that is good at making logic isn't necessarily good at making DRAM. Yes, eDRAM exists, but most designs don't put DRAM on the same die as logic and instead stack it or put it off-chip.
Almost all of these single-die microcontrollers have flash + SRAM, and almost all microprocessor cache designs are SRAM (with some designs using off-die L3 DRAM)-- for these reasons.
>The whole point of putting memory close is to increase performance and bandwidth, and DRAM is fundamentally latent.
When the access patterns are well established and understood, as in the case of transformers, you can mitigate latency with prefetch (you could even have a very beefed-up prefetch pipeline, knowing that you're targeting transformers), while putting memory on the same chip gives you a huge number of data lines and thus huge bandwidth.
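The idea, as a hand-wavy double-buffering sketch (generic code, not anyone's actual pipeline; prefetch_tile/wait_tile stand in for whatever async copy engine the hardware provides):

    #include <stddef.h>

    #define TILE_WORDS 4096

    extern void prefetch_tile(float *dst, size_t tile_idx);  /* start async copy from DRAM  */
    extern void wait_tile(float *dst);                       /* block until that copy lands */
    extern void matmul_tile(const float *weights);           /* the actual compute          */

    void run_layer(size_t n_tiles)
    {
        static float buf[2][TILE_WORDS];

        if (n_tiles == 0)
            return;

        prefetch_tile(buf[0], 0);                        /* prime the pipeline */
        for (size_t t = 0; t < n_tiles; t++) {
            if (t + 1 < n_tiles)
                prefetch_tile(buf[(t + 1) & 1], t + 1);  /* fetch the next tile...         */
            wait_tile(buf[t & 1]);
            matmul_tile(buf[t & 1]);                     /* ...while computing on this one */
        }
    }

As long as matmul_tile takes at least as long as the fetch, the DRAM latency is paid once up front and then hidden entirely behind compute.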
Would it be a little better with on-wafer distributed DRAM and sophisticated prefetch? Sure, but it wouldn't match SRAM, and you'd end up with a lot more interconnect and associated logic. And, of course, there's no clear path to run on a leading logic process and embed DRAM cells.
In turn, you have to batch for inference on an H200, whereas Cerebras can get full performance with very small batch sizes.
I bought a Kodak camera in 2000 (640x480 resolution) and even that could run Doom, way back when. Actually playable, with sound and everything.
Here's an even older one running it: https://m.youtube.com/watch?v=k-AnvqiKzjY
FPGAs are not cost efficient at all for something like this.
MCUs are so cheap that you’d never get to a cheaper solution by building out a team to iterate on custom hardware until it was bug free and ready to scale. You’d basically be reinventing the MCU that can be bought for $0.10, but with tens of millions of dollars of engineering and without economies of scale that the MCU companies have.
Where are you imagining the cost savings coming from? Custom anything is almost always vastly more expensive than using a standardised product.
An earbud that does ANC, supports multiple different audio standards including low-battery standby, is somewhat resistant to interference, and can send and receive over many meters. That's awesome for the price. That it has enough processing power to run a 33-year-old game... well, that's just technological progression.
A single modern smartphone has more compute than all the global compute of 1980 combined.
(imagine the lunar lander computer being an earbud ha)
A single AirPod would be about 10^4 times as powerful as the entire lunar lander guidance system.
Or to put it another way: a single AirPod would outcompute the entire Soviet Union's space program.
You're literally just wasting sand. We've perfected the process to the point where it's inexpensive to produce tiny, cheap chips that pack more power than a 386 computer. It makes little difference if it's 1,000 transistors or 1,000,000. It gets more complicated on the cutting edge, but this ain't it. These chips are probably 90 nm or 40 nm, technology that's 15-20 years old and basically the off-ramp for older-generation chip fabs that can no longer crank out cutting-edge CPUs or GPUs.
Building specialized hardware for stuff like that costs a lot more than writing software that uses just the portions you need. It requires deeper expertise, testing is more expensive and slower, etc.
It's also a triumph of the previous generation of programmers to be able to make interesting games that took so little compute.
We've got a long way to go.
On every other axis, though, it's likely a very clear win: reusable chips mean cheaper units, which often translates into real resource savings (in the extreme case, it may save an entire additional factory for the custom chips, saving untold energy and effort).
The RAM costs a little bit, but if you want to do firmware updates in a friendly way, etc., you need some RAM to stage them.
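The pattern is roughly this (a generic OTA staging sketch; every function name here is a hypothetical stand-in, not a specific SDK): receive the image into a RAM buffer, verify it, and only then touch flash, so a dropped connection can't leave the device half-flashed.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    #define MAX_IMAGE_SIZE (256 * 1024)

    static uint8_t staging[MAX_IMAGE_SIZE];   /* the RAM in question */

    extern size_t   recv_chunk(uint8_t *dst, size_t max_len);         /* from the radio stack */
    extern uint32_t crc32(const uint8_t *data, size_t len);           /* integrity check      */
    extern bool     flash_write_image(const uint8_t *img, size_t len);

    bool stage_and_apply_update(size_t image_len, uint32_t expected_crc)
    {
        if (image_len > MAX_IMAGE_SIZE)
            return false;

        size_t received = 0;
        while (received < image_len) {
            size_t n = recv_chunk(staging + received, image_len - received);
            if (n == 0)
                return false;                 /* link dropped: flash untouched  */
            received += n;
        }

        if (crc32(staging, image_len) != expected_crc)
            return false;                     /* corrupt image: flash untouched */

        return flash_write_image(staging, image_len);
    }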
Now ... I played the game when I was young. It was addictive. I don't think it was a good game but it was addictive. And somewhat simple.
So what is the problem then? Well ... games have gotten a lot bigger and often more complicated. Trying to port them to small platforms is close to impossible. This makes me sad. I think the industry, excluding indie tech/startups, totally lost focus here. The games that are now in vogue do not interest me at all. Sometimes they have interesting ideas - I liked Little Nightmares here - but they are huge and very different from the older games. And often much more boring, too.
One of my favourite DOS games was Master of Orion 1, for instance. I could, despite its numerous flaws, play that again and again and again. Master of Orion 2 was not bad either, but it was nowhere near as addictive, and the gameplay was also more convoluted and slower.
(Sometimes semi-new games are also ok, such as Warcraft 3. I am not saying ALL new games are bad, but it seems as if games have been kind of dumbed down to be more like a video to watch, with fairly few interactive elements as you watch it. That's IMO not really a game. And just grinding XP for the big bad wolf, so you can scale to the next level and deal out more damage as your HP grows ... that's not really playing either. That's just wasting your time.)
The value of being small for most users almost doesn't exist. If you have bandwidth limits then yeah download size is important but most don't.
So the only meaningful change optimizations make is "will it run well enough" and "does it fit on my disk".
Put more plainly "if it works at all it doesn't matter" is how most consumers (probably correctly) treat performance optimizations/installation size.
The sacrifices you talk about were made at the explicit request of consumers. Games have to be "long enough", and the difference between enough game loop and grinding is a matter of taste. Games have to be "pretty", and for better or worse, stylized art takes effort and is a matter of taste (see Wind Waker), while fancy high-res lighting engines are generally recognized as good.
I will say, though, that while being made by indies means they are optimized terribly, the number of stylized short games is phenomenally high; it can just be hard to find them.
Especially since it is difficult for an hour-or-two game to be as impactful as a movie of similar length, so they tend not to be brought up as frequently.
Filesize matters, especially to people with limited bandwidth and data caps. The increasing cost of SSDs only makes this situation more hardware constrained.
I wonder what his feelings are in this age of AI.
Just speculation on my part of course.
Also, “Masters of Doom” is such a good book. I recommend it for anyone who wants to peek behind the scenes of how Carmack, Romero, and id Software built Doom (and Wolf3D, etc.).
But a very cool link, thanks for sharing! :)
No touch controls though, it just plays the intro loop
Also, with DOOM running on all these things now, is it still impossible to get it to run well on a 386?
Does this mean you can run a Doom instance on each bud? Is it viable to make a distributed app to use the computing power of both buds at once?
Using them for distributed computation though? interesting use of free will xD
Left ear for the right eye and vice versa
"VR Doom has been ported to an earbud(s)" ;)
But, this probably makes more sense.
- Society continues to produce more and more powerful devices.
- More and more of these devices begin running Doom.
- When this reaches the saturation point, society becomes Doom.