I'm surprised the microcode ROM and format haven't been dumped already. Is anyone working on this?

EDIT: The later Atom processors were dumped [1][2]; are there any similarities?

[1] https://x.com/_markel___/status/1262697756805795841

[2] https://github.com/chip-red-pill/glm-ucode

EDIT 2: Some Pentium Pro disassembly work: https://pbx.sh/pentiumii-part2/

kens · 1 day ago
There are some people working on the 386 microcode. Dumping the Pentium microcode ROM from the die photos would be straightforward (but tedious). The hard part is to figure out what all the bits mean.
Any ideas if the mask ROM is scrambled? Apparently the P6 doesn't have a direct mask ROM : microcode relationship.

https://github.com/peterbjornx/p6tools

kens · 1 day ago
The Pentium's ROM appears to be slightly scrambled (see footnote 6 in my article). ROMs are often a bit permuted for electrical reasons. For example, instead of columns ordered ABABABAB..., they will be ordered ABBAABBA... and then the A and B select lines can be shared by two columns. But the columns in the Pentium appear to be permuted in an irregular way. I'm not sure if this was for obfuscation or if automated layout software decided this was better.
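
To make the column ordering concrete, here is a minimal Python sketch of undoing a regular ABBAABBA layout, plus the general case where the permutation has to be supplied as a table. The function names and the `perm` table are hypothetical; the Pentium's actual irregular permutation is not given here and would have to be recovered from the die.

```python
def unmirror_columns(row):
    """Reorder one physical ROM row from ABBA ABBA ... to ABAB ABAB ..."""
    if len(row) % 2:
        raise ValueError("expected an even number of columns")
    out = []
    for i in range(0, len(row), 2):
        pair = row[i:i + 2]
        # Every second pair is laid out mirrored (B, A), so flip it back.
        out.extend(pair[::-1] if (i // 2) % 2 else pair)
    return out


def apply_permutation(row, perm):
    """General case: logical column j is read from physical column perm[j].

    For an irregular layout, perm is a table worked out by hand
    (hypothetical here); there is no formula to generate it.
    """
    return [row[p] for p in perm]


if __name__ == "__main__":
    print("".join(unmirror_columns(list("ABBAABBA"))))
```

The regular case needs only a rule; an irregular layout needs the full table, which is part of what makes the dump tedious.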
I'm curious whether the register you see near the microcode ROM is hooked up to MSRs -- it could be a read or write buffer.

https://www.cs.cmu.edu/~ralf/papers/highmsr.html

> To the left of the MAR is a 32-bit register that is apparently unrelated to the microcode ROM, although I haven't determined its function.

kens · 1 day ago
That register could be a Model-Specific Register; I haven't looked at it closely enough to see what it does. The Pentium is very complicated with 3.1 million transistors, so my reverse-engineering of it is essentially bits and pieces here and there.
> are there any similarities?

Don't know about the format, but if you look through old ITJ articles[^1], it seems like the "direct access" interface for reading out different memories exists on older Pentium parts too. Presumably, if it were possible to dump over JTAG, it would be at least a little bit similar to what Peter/Mark have already looked at on newer parts.

[^1]: https://www.intel.com/content/dam/www/public/us/en/documents...

kens · 1 day ago
Author here for your Pentium questions :-)
Hi Ken,

Nice article. While reading it, I remembered that some time ago I watched the Oral History of Gary Davidian, and he was quite a bit involved with microcoding. If I were you, I would try asking him whether he could give you some ideas about where to find more information on how microcode works and how it was developed.

Here are links to that interview, if you have time to watch it. It's in two parts.

- https://www.youtube.com/watch?v=l_Go9D1kLNU

- https://www.youtube.com/watch?v=MVEKt_H3FsI

Cheers,

:-) riku

This is the Gary Davidian of the Classic MacOS PPC nanokernel, no? I wish I'd had as much fun at work as he's had in his career.
Also the Gary Davidian of Intel vs NEC fame!

https://thechipletter.substack.com/p/intel-vs-nec-the-case-o...

> If you have enough time, you can extract the bits from the ROM by examining the silicon and seeing where transistors are present.

I'm curious if this is a better way than somehow scanning the ROM electronically? Asking based on my very shallow understanding of how ROM works in this situation, although I did read the bit about M1, M2, and M3 lines/contacts.

[edit: I also read about the testing circuitry, that "runs through each address," but it's unclear if this is an auto feature running without being asked at startup, or if there is some way to tap into / intercept this functionality from outside.]

kens · 1 day ago
You could put microprobes on the die and read out the ROM contents electrically, but that would be difficult and would need specialized equipment. Reading out the ROM visually is much easier, and there is software that can interpret images if they are clear enough, e.g. maskromtool: https://github.com/travisgoodspeed/maskromtool

The Pentium's built-in self test is somewhat documented: you pull the INIT pin high while the RESET pin goes low to trigger the test. You can also execute the RUNBIST instruction through boundary scan. I don't think this helps you get the ROM data; the test just reports pass/fail.

Can I add my own instruction set extensions to the original x86 isa as implemented by the 8086 without permission from Intel and / or AMD as long as I'm not copying any x86 instruction set extensions?
Any patents on the 8086 have long expired, and so have the ones from the last century. As Ken says, the microcode is copyrighted but you don't need to use that to make a compatible version.
kens · 1 day ago
I don't know the legal details here but I think you can do whatever you want as long as you're not violating any patents (good luck). Also, Intel claims a copyright on the mnemonics for 8080 and 8086 assembly language. Microcode is also protected by copyright.
Isn't the Pentium's microcode upgradable? Or is that only in later chips?

These fixed transistors imply no upgradability.

kens · 1 day ago
Microcode updates were first implemented in the Pentium Pro. When the original Pentium had the infamous FDIV bug, the only fix was for Intel to replace the processors at a cost of $475 million.
Good thing win95 came out soon after and filled everyone’s coffers from new equipment buying
Why wasn’t the Pentium’s successor the Sexium?
Or the Hexium.

The CPU serial number debacle of the 90s would have been even funnier with more overt mark of the beast references.

kens · 1 day ago
Ha ha. Internally, the successor to the Pentium (P5) had the codename P6, but it was called the Pentium Pro externally rather than anything six-related.

Instead, Intel decided to go with an incomprehensible system of naming: Pentium Overdrive, Pentium MMX, Pentium Pro, Pentium II, Pentium III, Pentium III Xeon, Pentium D, Pentium M, Pentium Extreme Edition, etc. Good luck trying to figure out the ordering of these processors.

ssl-3 · 1 day ago
Intel's bad naming is still shooting them in the foot today. For a company that butters their bread by selling new products, they're doing a spectacularly bad job of letting people know what the new hotness is.

I hear things like "What do you mean it's slow? It's an i7!" or "It can't be slow -- it's a Xeon!" from too many people in the wild.

To them, the first number is the important one. What they see is that it is still an i7 and therefore they think it must still be (relatively) fast, even if their second-gen i7-2600 is demonstrably pretty slow.

I tried once to explain how Intel's numbering system has worked to a friend. I failed pretty miserably. I even used a whiteboard. I couldn't convey what needed to be conveyed in order to explain why his computer (an i7) wasn't keeping up with the tasks he gave to it.

But I can convey the problem simply enough in this crowd, here on HN: What's faster, a "Core i3-9100" or a "Core i7-2600"?

(At least with 286, 386, 486, and Pentium, the nomenclature was much more digestible.)

> What's faster, a "Core i3-9100" or a "Core i7-2600"?

One has 4 threads, the other has 8; and the difference between 6 generations is actually not that big, especially if you start talking about overclocking, cooling, and thermal throttling.

> At least with 286, 386, 486, and Pentium, the nomenclature was much more digestible

Those were all single-core, but still, you could ask "what's faster, a 486SX-16 or a 386DX-33?" (The answer may surprise you. Sorry, couldn't resist...):

https://dependency-injection.com/the-slowest-486-vs-fastest-...

We could in theory get rid of the "iX" at the front. There is no i7-10100F CPU, only an i3-10100F, so if you say "I have an Intel 10100F CPU", I know it's worse than a 10400F.
Bob Colwell gave an interview on the Pentium Pro, the first "out of order" Intel x86.

His observations on the Itanium make me gasp.

https://www.sigmicro.org/media/oralhistories/colwell.pdf

https://news.ycombinator.com/item?id=38459128

'I said, wait I am sorry to derail this meeting. But how would you use a simulator if you don't have a compiler? He said, well that's true we don't have a compiler yet, so I hand assembled my simulations. I asked "How did you do thousands of line of code that way?" He said “No, I did 30 lines of code”. Flabbergasted, I said, "You're predicting the entire future of this architecture on 30 lines of hand generated code?" [chuckle], I said it just like that, I did not mean to be insulting but I was just thunderstruck. Andy Grove piped up and said "we are not here right now to reconsider the future of this effort, so let’s move on".'

Colwell is (more formally) the author of The Pentium Chronicles which I plan to read someday.

https://www.amazon.com/Pentium-Chronicles-Robert-P-Colwell/d...

E5200.
I imagine the same reason why we had the Macintosh II and IIx, but not the SE and SEx (instead SE/30)...
Fascinating deep dive into the Pentium microcode ROM circuitry! It's incredible to see the clever tricks Intel used, like the pseudo-random counter, to pack so much logic into such a constrained space. Articles like this give us a rare glimpse into the unsung engineering heroics behind these landmark chips.
I would love to know how multiplication and division work in modern chips to have such a low cycle count compared to addition, since in theory addition is linear in the number of bits but multiplication and division are quadratic, or loglinear for large inputs. Part of that is solved by surface area rather than time, I guess, but that's already true for the adders with their carry logic.
kens · 12 hours ago
I'm working on the multiplication circuit in the Pentium; I've done a partial writeup: https://www.righto.com/2025/03/pentium-multiplier-adder-reve... The short answer is that multiplication uses a large tree of adders so it can add up all the long-multiplication terms (the partial products) at once. It also uses base-8 for the multiplier to reduce the number of terms. The adders are 4:2 carry-save compressors that take four numbers as inputs and produce two numbers as outputs.
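
As a rough illustration of the idea (not the Pentium's actual circuit), the Python sketch below splits the multiplier into base-8 digits to form the partial products, then reduces them with 4:2 carry-save compressors until two numbers remain. The assumptions are mine: inputs are non-negative integers, the digits are plain unsigned 0..7 rather than a signed recoding, and all function names are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save adder: three numbers in, two out, same total.

    No carry ripples across bit positions; each bit is computed locally.
    """
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress_4_2(a, b, c, d):
    """4:2 compressor built from two 3:2 stages: a+b+c+d == s+cy."""
    s, cy = csa(a, b, c)
    return csa(s, cy, d)


def base8_terms(x, y):
    """One partial product per base-8 digit of y (unsigned digits 0..7).

    Hardware would select a precomputed multiple of x; Python's `*` on a
    single digit stands in for that selection here.
    """
    terms, shift = [], 0
    while y:
        terms.append((x * (y & 7)) << shift)
        y >>= 3
        shift += 3
    return terms


def multiply(x, y):
    """Reduce the partial products with 4:2 compressors, then add once."""
    terms = base8_terms(x, y)
    while len(terms) > 2:
        reduced = []
        for i in range(0, len(terms), 4):
            group = terms[i:i + 4]
            if len(group) < 3:
                reduced.extend(group)           # leftovers pass through
            else:
                group += [0] * (4 - len(group))  # pad a group of three
                reduced.extend(compress_4_2(*group))
        terms = reduced
    return sum(terms)  # the single carry-propagating addition


if __name__ == "__main__":
    print(multiply(12345, 6789) == 12345 * 6789)
```

The point of the structure is that only the final `sum` needs carries to propagate; every compressor level has a fixed delay regardless of word width.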

I also wrote about the Pentium's division circuitry and the infamous FDIV bug: https://www.righto.com/2024/12/this-die-photo-of-pentium-sho... The short answer is that the Pentium used base-4 SRT division, similar to long division but generating two bits of result per cycle. It used a lookup table to determine the two quotient bits; an error in this table resulted in the bug.
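
To show what "two bits of result per cycle" means, here is a deliberately simplified Python sketch of base-4 digit-by-digit division on non-negative integers. It is not SRT: it picks each quotient digit (0..3) by direct comparison against the divisor, whereas SRT selects the digit from a lookup table using only a few top bits of the partial remainder and divisor. The function name and the `bits` parameter are hypothetical.

```python
def divide_radix4(n, d, bits=32):
    """Base-4 restoring division: one quotient digit (two bits) per step.

    Simplification: the digit is found by comparison, not by the
    table lookup that SRT division uses.
    """
    if d <= 0 or n < 0:
        raise ValueError("expects n >= 0 and d > 0")
    if bits % 2 or n >> bits:
        raise ValueError("bits must be even and large enough to hold n")
    q = r = 0
    for shift in range(bits - 2, -2, -2):
        # Bring down the next two dividend bits, like one long-division step.
        r = (r << 2) | ((n >> shift) & 3)
        digit = 0
        while digit < 3 and r >= (digit + 1) * d:
            digit += 1
        r -= digit * d
        q = (q << 2) | digit
    return q, r


if __name__ == "__main__":
    print(divide_radix4(1000, 7))
```

A 32-bit quotient takes 16 steps here instead of 32, which is the benefit of the higher radix; the cost is a harder digit-selection problem at each step, which is where the table comes in.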

I remember reading somewhere--memory is hazy--that at least division uses a partial look up table, kinda like how you'd do it in 6502 assembly back in the day. E.g., if you have to multiply something by 5, and you can get the range of inputs down to something reasonable, then you can just have a table of x*5 for that range and just look it up.

Also I'm not sure multiplication/division are quadratic if your algorithm is not "add X to itself Y times." Look at this for 6502 16-bit multiply - https://www.llx.com/Neil/a2/mult.html - it's dependent on the bit width, not the value of the multiplier/cand. Of course this is for integers, not floating point.
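
For reference, the shift-and-add approach can be sketched in Python as below. This is a generic illustration, not a transcription of the linked 6502 routine; the function name and the step counter are mine. The loop runs once per multiplier bit regardless of the operand values.

```python
def shift_add_multiply(a, b, width=16):
    """Unsigned shift-and-add multiply; returns (product, loop_steps)."""
    if not 0 <= b < (1 << width):
        raise ValueError("multiplier does not fit in the given width")
    product = 0
    steps = 0
    for _ in range(width):
        if b & 1:          # low multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1            # next bit position is worth twice as much
        b >>= 1
        steps += 1
    return product, steps


if __name__ == "__main__":
    print(shift_add_multiply(300, 200))
```

Note that each of the `width` iterations may perform an addition that is itself linear in the word size, so the total bit-level work still grows quadratically with the width even though the loop count is linear.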
