Are chiplets enough to save Moore’s Law? (68 points, 1 year ago, 73 comments) https://news.ycombinator.com/item?id=36371651
Intel enters a new era of chiplets (128 points, 2 years ago, 164 comments) https://news.ycombinator.com/item?id=32619824
U.S. focuses on invigorating ‘chiplets’ to stay cutting-edge in tech (85 points, 2 years ago, 48 comments) https://news.ycombinator.com/item?id=35917361
Intel, AMD, and other industry heavyweights create a new standard for chiplets (10 points, 3 years ago) https://news.ycombinator.com/item?id=30533548
Intel Brings Chiplets to Data Center CPUs (49 points, 3 years ago, 50 comments) https://news.ycombinator.com/item?id=28250076
Deep Dive: Nvidia Inference Research Chip Scales to 32 Chiplets (47 points, 5 years ago) https://news.ycombinator.com/item?id=20343550
Chiplets Gaining Steam (4 points, 7 years ago) https://news.ycombinator.com/item?id=15636572
Tiny Chiplets: A New Level of Micro Manufacturing (36 points, 12 years ago, 12 comments) https://news.ycombinator.com/item?id=5517695
Managing your own threads and always assuming the worst-case scenario (i.e., that they're all a microsecond away from each other) will give you the most defensive solution as the architecture changes over time.
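To make that concrete, here's a minimal sketch of what I mean (my own example, not anyone's production code; assumes Linux/glibc, since pthread_setaffinity_np is a GNU extension): pin each worker to one core and give it purely private state, so correctness never depends on how close any two cores happen to be.

    /* Sketch: pin workers to fixed CPUs, keep all state per-thread,
     * combine results only after joining. Treats every other core as
     * "far away" regardless of actual topology. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define NWORKERS 4

    struct worker {
        int cpu;          /* core this worker is pinned to */
        long partial_sum; /* private result: no cross-thread sharing */
    };

    static void *run(void *arg)
    {
        struct worker *w = arg;

        /* Pin the thread so the scheduler can't silently move it
         * across a chiplet/NUMA boundary mid-run. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(w->cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        /* Work only on private data. */
        for (long i = 0; i < 1000000; i++)
            w->partial_sum += i % 7;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        struct worker w[NWORKERS];

        for (int i = 0; i < NWORKERS; i++) {
            w[i].cpu = i;   /* naive mapping; real code would read the topology */
            w[i].partial_sum = 0;
            pthread_create(&tid[i], NULL, run, &w[i]);
        }

        long total = 0;
        for (int i = 0; i < NWORKERS; i++) {
            pthread_join(tid[i], NULL);
            total += w[i].partial_sum; /* combine only after workers finish */
        }
        printf("total = %ld\n", total);
        return 0;
    }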
I really don't know the best answer. Another NUMA node implies some necessary additional complexity. It would seem the wonder of 192 cores in a socket comes with tradeoffs that the software people now need to compensate for. You can't get all that magic for free.
The nuance is that even if you "hide" the extra NUMA tier and have oversized lanes connecting everything, some cores are still farther away from others in time. It doesn't matter how it shows up in device manager; the physical constraint is still there.
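You can see that directly if you measure it. Rough sketch (mine, assumes Linux/glibc and a C11 compiler): pin two threads to two cores chosen on the command line and time a flag bouncing between them. Run it for different core pairs and the "hidden" tier shows up as different round-trip times even when the OS reports a single node.

    /* Sketch: core-to-core round-trip latency via an atomic flag. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ROUNDS 200000

    static atomic_int flag = 0; /* 0: ping's turn, 1: pong's turn */

    static void pin(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *pong(void *arg)
    {
        pin(*(int *)arg);
        for (int i = 0; i < ROUNDS; i++) {
            while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
                ;                               /* wait for ping */
            atomic_store_explicit(&flag, 0, memory_order_release);
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int cpu_a = argc > 1 ? atoi(argv[1]) : 0;  /* the two cores to test */
        int cpu_b = argc > 2 ? atoi(argv[2]) : 1;

        pthread_t t;
        pthread_create(&t, NULL, pong, &cpu_b);
        pin(cpu_a);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {
            atomic_store_explicit(&flag, 1, memory_order_release);
            while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
                ;                               /* wait for pong */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        pthread_join(t, NULL);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("cores %d<->%d: %.0f ns per round trip\n",
               cpu_a, cpu_b, ns / ROUNDS);
        return 0;
    }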
Threadripper and EPYC are totally different. They actually have NUMA topologies, where different dies are wired such that they have more distance to traverse to reach a given memory module, and accessing the farther ones involves some negotiation with a routing component.
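On Linux you can see those distances without any special tools. This little sketch (mine; the paths are the standard sysfs ones) prints the kernel's NUMA distance table, where by convention 10 means local and larger numbers mean more hops through that routing component.

    /* Sketch: dump the NUMA distance matrix exposed under
     * /sys/devices/system/node/nodeN/distance. */
    #include <stdio.h>

    int main(void)
    {
        for (int node = 0; ; node++) {
            char path[64];
            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/distance", node);

            FILE *f = fopen(path, "r");
            if (!f)
                break; /* no more nodes */

            char line[256];
            if (fgets(line, sizeof(line), f))
                printf("node%d: %s", node, line); /* distances to each node */
            fclose(f);
        }
        return 0;
    }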
In later generations of Zen-based EPYC CPUs, there's a central IO die that holds the memory controllers and SerDes blocks, plus small core-complex dies (CCDs) that contain essentially just CPU cores and caches. That split allows a different process node or manufacturing technology to be chosen for each die's specific job. It also allows much more modular configurations of core counts and frequencies while still providing a large number of memory controllers and SerDes lanes, to address different market needs.
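You can actually see that layout from software. Quick sketch (mine; assumes Linux sysfs, and that cache index3 is the L3, which is the usual case but not guaranteed): it prints which CPUs share each core's L3, and on a chiplet EPYC those groups roughly trace out the individual compute dies, because the L3 sits on the compute chiplet rather than on the central IO die.

    /* Sketch: group CPUs by shared last-level cache via sysfs. */
    #include <stdio.h>

    int main(void)
    {
        for (int cpu = 0; ; cpu++) {
            char path[128];
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                     cpu);

            FILE *f = fopen(path, "r");
            if (!f)
                break; /* no more CPUs (or no L3 index exposed here) */

            char shared[256];
            if (fgets(shared, sizeof(shared), f))
                printf("cpu%d shares its L3 with cpus %s", cpu, shared);
            fclose(f);
        }
        return 0;
    }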
(There are times when I'd rather ask HN than an AI.)
Is it kind of like taking a step backward: going back to the ancient classic circuit board with multiple chips and shrinking that design as-is, but without putting everything on one die?