Nerdsnipe confirmed :)

Claude Opus came up with this script:

https://pastebin.com/ntE50PkZ

It produces a somewhat-readable PDF (first page at least) with this text output:

https://pastebin.com/SADsJZHd

(I used the cleaned output at https://pastebin.com/UXRAJdKJ mentioned in a comment by Joe on the blog page)

> It produces a somewhat-readable PDF (first page at least) with this text output

Any chance you could share a screenshot / re-export it as a (normalized) PDF? I’m curious about what’s in there, but all of my readers refuse to open it.

pests · 5 hours ago
So it was a public event attended by 450 people:

https://www.mountsinai.org/about/newsroom/2012/dubin-breast-...

https://www.businessinsider.com/dubin-breast-center-benefit-...

Even names match up, but oddly the date is different.

Your links are for the inaugural (first) ball in December 2011; OP's text referred to a second annual ball in December 2012.
pests · 3 hours ago
You are right, my first link is incorrect, but the second does seem to be from 2012.
DUBIN BREAST CENTER SECOND ANNUAL BENEFIT MONDAY, DECEMBER 10, 2012 HONORING ELISA PORT, MD, FACS AND THE RUTTENBERG FAMILY HOST CYNTHIA MCFADDEN SPECIAL MUSICAL PERFORMANCES CAROLINE JONES, K'NAAN, HALEY REINHART, THALIA, EMILY WARREN MANDARIN ORIENTAL 7:00PM COCKTAILS LOBBY LOUNGE 8:00PM DINNER AND ENTERTAINMENT MANDARIN BALLROOM FESTIVE ATTIRE
looks like we have it. in the end it's pretty mundane...
Which begs the question why was it censored?
> it’s safe to say that Pam Bondi’s DoJ did not put its best and brightest on this

Or worse. She did.

There are a few messaging conversations between FBI agents early on that are kind of interesting. It would be very interesting to see the ones about the releases. I sometimes wonder if some of it was malicious compliance... i.e., do a shitty job so the info gets out before it gets re-redacted... we can hope...
I mean, the internet is finding all her mistakes for her. She is actually doing alright with this. Crowdsource everything, fix the mistakes. lol.
This would be funnier if it wasn’t child porn being unredacted by our government
Weren’t. Subjunctive mood.
I can't believe what we've become.
Every second of my political consciousness in the United States has been acutely tinged with the awareness that a bunch of people, across most of the political spectrum live in a constant state of denial. Denial of personal responsibility or culpability. Denial of cognitive dissonance. Denial of any distinct, self-informed morals. Denial of anything but a fear of others. Denial of anything that makes them fearful or uncomfortable or might invite confrontation.

Ever since I started doing debate and FX/DX in high school, well, let's just say I never thought that the majority of the 2A folks would be worth a damn when tyranny really came knocking. Fear of the other as a form of manipulation, and a distraction from class consciousness, has been their literal raison d'être since decades before I was born.

I guess I was shocked that the President being a convicted rapist and documented child predator would be a bridge too far. But then we re-elected him.

I believe it. We voted for this. We do nothing in the face of zero actual justice. This is exactly as good as we deserve. And best of all, it certainly doesn't stop here. This is what they chose to not redact. When we know they spent enormous tax-payer hundreds-of-people hours redacting the documents.

I don't think it's even conspiratorial to say they left stuff in, so they could use it as justification for not releasing the other HALF of the files that haven't been released, even overly censored.

We deserve this, and the much worse that our apathy has invited.

I will certainly feel less confident ridiculing conspiracy theories.

I’d never believe Bill Gates would secretly slip antibiotics into his wife’s cocktail to treat an STI he got from a Russian prostitute on a convicted pedophile’s estate.

But here we are.

I wish I could believe in more conspiracy theories. At least then I might believe there was some sort of master plan, that some individual or group had some image of a better world (to them) and that the world was being steered somewhere.

Unfortunately no, it just seems to be greed, incompetence, and incompetent greed. At least when a tank drives over a protestor somebody gets to be on the side of the tank. When the bus goes off a cliff because the driver sold the steering wheel everybody dies.

> become

the mascot of 4chan was literally pedobear, what time frame are you referring to?

Become?
I wonder if this could be intentional. If the datasets are contaminated with CSAM, anybody with a copy is liable to be arrested for possession.

More likely it's just an oversight, but it could also be CYA for dragging their feet, like "you rushed us, and look at these victims you've retraumatized". There are software solutions to find nudity and they're quite effective.

Or it's distraction. Leave nudity in to use up attention that should be turning to analysis of what's been redacted.

There's redaction to protect victims and there's redaction to protect specific co-conspirators in Epstein's spy ring

The issue is that mistakes can't really be fixed: once they are discovered, it doesn't matter if they are eventually redacted.
Let's see her sued for leaking PII. Here in Europe, she'd be mincemeat.
ISL · 7 hours ago
The US administration is, at present, regularly violating the law and ignoring court orders. Indeed, these very releases are patently in violation of multiple federal laws -- they're simultaneously insufficiently-responsive to meet the requirements of the law requiring the release of the files and fall afoul of CSAM laws by being incompletely redacted.

The challenge, as we're all experiencing together, is that the law is not inherently self-enforcing.

Can you provide a couple examples of the laws they're violating?
ISL · 6 hours ago
As noted above:

https://www.govinfo.gov/content/pkg/PLAW-119publ38/pdf/PLAW-... : the Attorney General was to have produced the entirety of the Epstein files, with very narrowly-enumerated redactions, in December. She has not done so.

Furthermore, there are numerous allegations that the documents that have been released contain CSAM, which (referencing the PDF above) may fall afoul of 18 U.S.C. 2252–2252A.

In addition, one need only glance at the action in US courts to see egregious violations of the Constitution and valid court orders playing out daily.

https://www.documentcloud.org/documents/26513988-trorder0128...

https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...

Allegations aren't evidence. Has the Administration actually been found guilty of violating the law, if that is even possible?
Yes, the Abrego Garcia and Öztürk detentions are two very newsworthy cases that have actually reached the point of a final judgement in the district courts, as opposed to "merely" preliminary injunctions against the government.

(It's also worth noting that almost none of the government's appeals to their losses in preliminary injunctions have been on the merits as to whether or not their actions were legal, but rather on the grounds of "no one should be allowed to challenge our actions," which has also been a fairly losing argument for everybody except SCOTUS.)

>if that is even possible

yes.... any administration can be found guilty of violating law, and should be dealt with accordingly.

Evidence is evidence - of which there are enormous amounts.
Are you expecting the administration to prosecute itself?
That's why there is separation of powers or ought to be.
How about court orders?

https://www.cbsnews.com/minnesota/news/ice-violations-judge-...

> "ICE has likely violated more court orders in January 2026 than some federal agencies have violated in their entire existence," Schiltz said, adding that he counted 96 court orders that ICE has violated in 74 cases.

https://www.cbsnews.com/news/frustrations-from-judge-prosecu...

Allegations are not evidence.
"Allegations" from the exact judges whose orders aren't being enacted? The orders in question are pretty simple: release this guy. Don't take this guy out of state. It's pretty clear when they're not being followed. This guy is not a slouch:

https://www.politico.com/news/2026/01/27/patrick-schiltz-jud...

https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...

https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...

Did you notice that one article I linked involved a DoJ lawyer admitting that she couldn't convince ICE to obey court orders that she was trying to transmit to them? That's beyond an allegation and into admission. How is that not evidence?

More on these ignored court orders:

https://www.mprnews.org/story/2026/01/28/ice-illegally-detai...

They illegally fired the IGs responsible for whistleblowers and fraud in every department; https://www.nycbar.org/press-releases/firings-of-inspectors-...

They illegally withheld funds (impoundment) from congressionally authorized/mandated expenditures and relied on pocket rescissions to defund programs they didn't like: https://www.cbpp.org/research/federal-budget/pocket-rescissi...

They keep illegally appointing unqualified hacks as US attorney in defiance of the mandate they're approved by the Senate (Essayli, Habba, Halligan, Sarcone, Chattah) - judges have found at least five of the appointments illegal. As one example: https://www.politico.com/news/2025/10/28/judge-los-angeles-t...

They've repeatedly violated court orders to either return immigrant detainees or release them. "This is one of dozens of court orders with which respondents have failed to comply in recent weeks.": https://www.cnn.com/2026/01/27/politics/patrick-schiltz-judg...

The EPA illegally convened a secret panel of climate deniers to issue a sham report in order to repeal the endangerment finding: https://www.nytimes.com/2026/01/30/climate/energy-department...

His targeting and shakedowns of universities, law firms, and media companies are transparently illegal jawboning.

Everything about the tariffs is obviously illegal which he confirms every time he opens his mouth since he's relying on 'national security' justifications to issue them without Congress and he keeps insisting they're punishment for some random perceived slight.

His illegal firing of Federal workers without the notice required: https://www.npr.org/2025/09/25/nx-s1-5544317/federal-probati...

Some sillier things like renaming the Kennedy Center -- the law that established it literally said that it couldn't be renamed without Congress -- so Trump firing everyone on the board and then appointing a bunch of his flunkies to vote for the name change doesn't cut it. https://beatty.house.gov/sites/evo-subsites/beatty.house.gov...

It's a literal onslaught of illegality so I can't tell if you haven't read a news article since 2025 or if you're trolling.

k33n · 1 hour ago
Nothing you mentioned above is illegal btw
There are more than enough credible reports of CSAM in the Epstein Files dump - more than enough for me to not go and download even a single file of them myself, simply because German law does not care about why you are in the possession of CSAM, even if you took the picture yourself.

The legal situation regarding CSAM is very strict no matter which country, and I better hope no one here will actually be dumb enough to provide actual links.

Yeah - they'll take these lessons learned for future batches of releases.
Tesseract supports being trained for specific fonts; that would probably be a good starting point.

https://pretius.com/blog/ocr-tesseract-training-data

It decodes to binary pdf and there are only so many valid encodings. So this is how I would solve it.

1. Get an open source pdf decoder

2. Decode bytes up to first ambiguous char

3. See if next bits are valid with a 1, if not it’s an l

4. Might need to backtrack if both 1 and l were valid

By being able to quickly try each char in the middle of the decoding process you cut out the start time. This makes it feasible to test all permutations automatically and linearly
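The steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: `recover` and `looks_plausible` are made-up names, and the plausibility check (printable ASCII only) is a stand-in for a real PDF-aware validator, which would have to understand binary streams and compression.

```python
import base64

# OCR confuses these two glyphs, so either one on the page may be either one in reality.
AMBIGUOUS = {"1": "1l", "l": "1l"}

def looks_plausible(chunk: bytes) -> bool:
    """Hypothetical check: accept only printable ASCII and common whitespace."""
    return all(b in (9, 10, 13) or 32 <= b < 127 for b in chunk)

def recover(ocr_text: str, is_valid=looks_plausible) -> list[str]:
    """Return every reading of ocr_text whose decoded 3-byte groups all pass is_valid."""
    results: list[str] = []
    chosen: list[str] = []

    def walk(pos: int) -> None:
        # Each complete 4-character Base64 group decodes independently, so test it at once.
        if pos and pos % 4 == 0:
            group = "".join(chosen[pos - 4:pos])
            if not is_valid(base64.b64decode(group)):
                return  # dead end, backtrack to the last ambiguous character
        if pos == len(ocr_text):
            results.append("".join(chosen))
            return
        # Unambiguous characters have exactly one candidate: themselves.
        for candidate in AMBIGUOUS.get(ocr_text[pos], ocr_text[pos]):
            chosen.append(candidate)
            walk(pos + 1)
            chosen.pop()

    walk(0)
    return results
```

The recursion goes one level deep per character, so a full page would need an iterative rewrite or a higher recursion limit. The sketch also assumes the input length is a multiple of four and contains only Base64 alphabet characters.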

Sounds like a job for afl
You might need to backtrack a lot more, due to the intermediate compression step?
This is one of those things that seems like a nerd snipe but would be more easily accomplished through brute forcing it. Just get 76 people to manually type out one page each, you'd be done before the blog post was written.
Or one person types 76 pages. This is a thing people used to do, not all that infrequently. Or maybe you have one friend who will help–cool, you just cut the time in half.
Typing 76 pages is easy when it's words in a language you understand. WPM is going to be incredibly slow when you actually have to read every character. On top of that, no spaces and no spellcheck so hopefully you didn't miss a character.
Seems like a job for an LLM
You think compelling 76 people to honestly and accurately transcribe files is something that's easy and quick to accomplish.
Non-engineers are perfectly willing to volunteer their time to do drudgery. It's one of my opseng career's distinguishing specialties: I'll do drudgery rather than code when appropriate, rather than avoiding it or sulking about it (as was a common response at work for some number of decades!). Learned that lesson when I was 18 from an internship (where I completely failed to deliver any work product due to trying to code around the work). It's part of why I'm going into accounting: apparently having the stamina for dreary work is rare?!

Also look up double/triple data-entry systems, where you have multiple people enter the data and then flag and resolve differences. Won't protect you from your staff banding together to fuck you over with maliciously bad data, but it's incredibly effective to ensure people were Actually Working Their Blocks under healthy circumstances.
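A toy version of that multi-entry check, assuming each typist returns a transcription of the same length (real systems would align the texts first). The function name `reconcile` is made up for illustration.

```python
from collections import Counter

def reconcile(entries: list[str]) -> tuple[str, list[int]]:
    """Merge independent transcriptions; return provisional text and disputed positions."""
    if len({len(e) for e in entries}) != 1:
        raise ValueError("transcriptions must have equal length in this sketch")
    merged: list[str] = []
    disputed: list[int] = []
    for index, chars in enumerate(zip(*entries)):
        best, count = Counter(chars).most_common(1)[0]
        merged.append(best)              # most common reading wins provisionally
        if count != len(entries):        # anything short of unanimous gets flagged
            disputed.append(index)
    return "".join(merged), disputed
```

Flagged positions are the ones a human would go back and re-check against the scan.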

> Just get 76 people

I consider myself fairly normal in this regard, but I don't have 76 friends to ask to do this, so I don't know how I'd go about doing this. Post an ad on craigslist? Fiverr? Seems like a lot to manage.

First, build a fanbase by streaming on Twitch.
Amazon Mechanical Turk?
You can use the justice.gov search box to find several different copies of that same email.

The copy linked in the post:

https://www.justice.gov/epstein/files/DataSet%209/EFTA004004...

Three more copies:

https://www.justice.gov/epstein/files/DataSet%2010/EFTA02153...

https://www.justice.gov/epstein/files/DataSet%2010/EFTA02154...

https://www.justice.gov/epstein/files/DataSet%2010/EFTA02154...

Perhaps having several different versions might make it easier.

Also, I found a different base64 encoding with a different font here:

https://www.justice.gov/epstein/files/DataSet%209/EFTA007755...

This doesn't solve the "1 & l" problem for the pdf you are looking at, but it could be useful anyway.

And this might be a copy of the original pdf:

https://www.justice.gov/epstein/files/DataSet%2011/EFTA02702...

Given how much of a hot mess PDFs are in general, it seems like it would behoove the government to just develop a new, actually safe format to standardize around for government releases and make it open source.

Unlike every other PDF format that has been attempted, the federal government doesn't have to worry about adoption.

Ekaros · 14 minutes ago
I give any new document format 3 to 5 years until it ends up with similar mess. And that is if it starts well designed and limited.
You’re thinking about this as a nerd.

It’s not a tools problem, it’s a problem of malicious compliance and contempt for the law.

Even the previous justice departments struggled with PDFs. The way they handled it was scrubbing all possible metadata and uploading it as images.

For example, when the Mueller reports were released with redactions, they had no searchable text or metadata because they were worried about this exact kind of data leak.

However, vast troves of unsearchable text are not a huge win for transparency.

PDFs are just a garbage format and even good administrations struggle.

JPEG?
That's not really comparable - It needs to be editable and searchable.
Lossy
PNG
Why not just try every permutation of (1,l)? Let’s see, 76 pages, approx 69 lines per page, say there’s one instance of [1l] per line, that’s only… uh… 2^5244 possibilities…

Hmm. Anyone got some spare CPU time?
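For scale, the back-of-the-envelope estimate above can be checked in a few lines. The page and line counts are the commenter's figures, and one ambiguous character per line is their guess.

```python
pages, lines_per_page = 76, 69
ambiguous = pages * lines_per_page   # one [1l] per line, as estimated above
combos = 2 ** ambiguous              # each ambiguous character has two readings
digits = len(str(combos))            # size of that number in decimal digits

print(f"{ambiguous} ambiguous characters, candidate count has {digits} digits")
```

Blind enumeration is clearly out; any workable approach has to prune candidates as it goes.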

It should be much easier than that. You should be able to serially test if each edit decodes to a sane PDF structure, reducing the cost similar to how you can crack passwords when the server doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so that makes it even easier given built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish.
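On the checksum point: Flate streams in PDFs are normally zlib-wrapped, and a zlib stream ends with an Adler-32 checksum of the uncompressed data. A minimal sketch of using that as an oracle follows, assuming the candidate byte strings for one stream have already been produced from the competing Base64 readings. `valid_readings` is a hypothetical helper, not part of any PDF tool.

```python
import zlib

def valid_readings(candidates: list[bytes]) -> list[bytes]:
    """Keep only the candidates that inflate cleanly, checksum included."""
    good = []
    for raw in candidates:
        try:
            good.append(zlib.decompress(raw))
        except zlib.error:
            pass  # corrupt deflate data or Adler-32 mismatch: wrong reading
    return good
```

This only confirms a whole stream after the fact. Pruning mid-stream, as the comment suggests, would need instrumentation inside the decoder.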
I wonder if you could leverage some of the fuzzing frameworks that tools like Jepsen rely on. I’m sure there’s got to be one for PDF generation.
On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible.
Easy, just start a crypto currency (Epsteincoin?) based on solving these base64 scans and you'll have all the compute you could ever want just lining up
DUBIN BREAST CENTER SECOND ANNUAL BENEFIT MONDAY, DECEMBER 10, 2012 HONORING ELISA PORT, MD, FACS AND THE RUTTENBERG FAMILY HOST CYNTHIA MCFADDEN SPECIAL MUSICAL PERFORMANCES CAROLINE JONES, K'NAAN, HALEY REINHART, THALIA, EMILY WARREN MANDARIN ORIENTAL 7:00PM COCKTAILS LOBBY LOUNGE 8:00PM DINNER AND ENTERTAINMENT MANDARIN BALLROOM FESTIVE ATTIRE
Some pics from the event. Doppelgänger in the background?: https://web.archive.org/web/20121215131412/https://thaliadiv...
This proves my paranoia that you should print and rescan redactions. That or do screenshots of the pdf redacted and convert back to a pdf
How would that help in this case?
this would not have helped here
I wonder if jmail (https://www.jmail.world/) has worked on this?

I tried to find the message in this blog post, but couldn't. (don't see how to search by date).

pdftoppm and Ghostscript (invoked via ImageMagick) re-rasterize full pages to generate their output. That's why it was slow. Even worse with a Q16 build of ImageMagick. Better to extract the scanned page images directly with pdfimages or mutool.

Followup: pdfimages is 13x faster than pdftoppm
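A small wrapper for the direct-extraction route, assuming poppler's `pdfimages` is installed and on the PATH. The helper names and file paths are placeholders.

```python
import shutil
import subprocess
from pathlib import Path

def pdfimages_command(pdf: Path, out_prefix: Path) -> list[str]:
    """Build the argv; -all writes each embedded image in its native format."""
    return ["pdfimages", "-all", str(pdf), str(out_prefix)]

def extract_images(pdf: Path, out_prefix: Path) -> None:
    """Dump the embedded page scans without re-rasterizing anything."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages (poppler-utils) not found on PATH")
    subprocess.run(pdfimages_command(pdf, out_prefix), check=True)
```

Because nothing is re-rendered, the output is the scan exactly as embedded, which is also what you want to feed to OCR.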

My non-political take about this gift that keeps on giving is that PDF might seem great for the end user who is just expected to read or print the file they are given, but the technology actually sucks.

PDF is basically a prettifying layer on top of the older PostScript (PS) that brings a whole lot of baggage. The moment you try to do what should be simple stuff, like editing lines, merging pages, or changing the resolution of the images, it starts giving you a lot of headaches.

I used to have a few scripts around to fight some of its quirks from when I was writing my thesis and had to work daily with it. But well, it was still an improvement over Word.

nubg · 7 hours ago
Wait would this give us the unredacted PDFs?
That's the idea yeah. There are other people actively working on this. You can follow vx-underground on twitter. They're tracking it.
poyu · 7 hours ago
I think it's the PDF files that were attached to the emails, since they're base64 encoded.
Love this, absolutely looking forward to some results.
Bummer that it's not December - the https://www.reddit.com/r/adventofcode/ crowd would love this puzzle
If only Base64 had used a checksum.
"had used"? Base64 is still in very common use, specifically embedded within JSON and in "data URLs" on the Web.
"had" in the sense of when it was designed and introduced as a standard
here's another few to decode,

https://www.justice.gov/epstein/files/DataSet%2010/EFTA01804...

https://www.justice.gov/epstein/files/DataSet%209/EFTA007755...

https://www.justice.gov/epstein/files/DataSet%209/EFTA004349...

and then this one, judging by the name of the file (hanna something) and content of the email:

"Here is my girl, sweet sparkling Hanna=E2=80=A6! I am sure she is on Skype "

maybe more sinister (so be careful, I have no idea what the laws are if you uncover you know what Trump and Epstein were into)...

https://www.justice.gov/epstein/files/DataSet%2011/EFTA02715...

[Above is probably a legit modeling CV for HANNA BOUVENG, based on, https://www.justice.gov/epstein/files/DataSet%209/EFTA011204..., but still creepy, and doesn't seem like there's evidence of her being a victim]

this one has a better font, might be a simple copy&paste job
I've checked for copy and paste; there are so many character flaws that their OCR must have sucked really bad. I may try with deepseekOCR or something. I mean, the database would probably be more searchable if someone ran every file through a better OCR.
On one hand, the DOJ gets shit because it was taking too long to produce the documents, and then on another, they get shit because there are mistakes in the redacting because there are 3 million pages of documents.
What they are redacting is pretty questionable though. Entire pages being suspiciously redacted with no explanation (which they are supposed to provide). This is just my opinion, but I think it's pretty hard to defend them as making an honest and best effort here. Remember they all lied about and changed their story on the Epstein "files" several times now (by all I mean Bondi, Patel, Bongino, and Trump).

It's really really hard to give them the benefit of the doubt at this point.

My favorite is that sometimes they redact the word "don't". Not only does it totally change the meaning of whatever sentence it's in, the conspiracy theory is that they had a Big Dumb Regex for redacting /Don\W+T/i to remove Trump references
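For what it's worth, the pattern quoted above really does behave that way. A quick check follows; whether any redaction tool actually used it is pure speculation, and `would_redact` is just an illustrative name.

```python
import re

# The pattern from the comment above, compiled case-insensitively as its /i flag implies.
PATTERN = re.compile(r"Don\W+T", re.IGNORECASE)

def would_redact(text: str) -> bool:
    """True if the pattern matches anywhere in text."""
    return PATTERN.search(text) is not None

for sample in ["I don't know", "Don T.", "Donald Trump", "abandon the plan"]:
    print(f"{sample!r}: {would_redact(sample)}")
```

It hits "don't" and even "abandon the", because `\W+` accepts the apostrophe or the space, yet it misses "Donald Trump" since the character after "Don" is a letter.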
"On the one hand the chef gets shit for taking too long, and then on another for undercooked, badly plated dishes."

Incompetence is incompetence.

Considering the justice to document ratio that's kind of on them regardless.
I took a stab at training Tesseract and holy jeebus is their CLI awful. Just an insanely complicated configuration procedure.
> …but good luck getting that to work once you get to the flate-compressed sections of the PDF.

A dynamic programming type approach might still be helpful. One version or other of the character might produce invalid flate data while the other is valid, or might give an implausible result.

Time to flex those Leetcode skills.
Honestly, this is something that should've been kept private, until each and every single one of the files is out in the open. Sure, mistakes are being made, but if you blast them onto the internet, they WILL eventually get fixed.

Cool article, however.

This one is irresistible to play with. Indeed a nerd snipe.
I doubt the PDF would be very interesting. There are enough clues in the human-readable parts: it's an invite to a benefit event in New York (filename calls it DBC12) that's scheduled on December 10, 2012, 8pm... Good old-fashioned searching could probably uncover what DBC12 was, although maybe not, it probably wasn't a public event.

The recipient is also named in there...

There's potentially a lot of files attached and printed out in this fashion.

The search on the DOJ website (which we shouldn't trust), given the query: "Content-Type: application/pdf; name=", yields maybe a half dozen or so similarly printed BASE64 attachments.

There's probably lots of images as well attached in the same way (probably mostly junk). I deleted all my archived copies recently once I learned about how not-quite-redacted they were. I will leave that exercise to someone else.

There are 70 results that come out when searching for "application/pdf" on the DOJ website
OK, but if the solution is to brute-force them, there's probably a need to choose which files to focus on.

Of course there are other content types, e.g. searching for "Content-Type: image/jpeg" gets hits as well. But only a few of them actually have the base64 data; mostly there are just the MIME headers. Looking for "/9j/" (which is Base64 for FF D8 FF, the header of JPEG files) is imprecise, since the Trumpian justice.gov website ignores "/" and matches case-insensitively, but there are 4 or 5 base64'ed JPEG images in there.
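The "/9j/" trick generalizes: any file signature maps to a fixed Base64 prefix, as long as you only encode whole 3-byte groups. A small sketch, where `b64_prefix` is a made-up helper name:

```python
import base64

def b64_prefix(magic: bytes) -> str:
    """Base64 text that any attachment starting with `magic` must begin with."""
    usable = len(magic) - len(magic) % 3   # trailing partial groups depend on later bytes
    return base64.b64encode(magic[:usable]).decode("ascii")

print(b64_prefix(b"\xff\xd8\xff"))  # JPEG start-of-image marker
print(b64_prefix(b"%PDF-1"))        # start of a PDF header
```

So searching for "JVBERi0x" should surface Base64-encoded PDFs the same way "/9j/" surfaces JPEGs, subject to the same search-box quirks.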

I also saw that the page is vulnerable to code injection: somehow garbage in one search result preview was OCRed as "<s [lots of garbage]>", and the rest of the search results were struck through because "<s>" is the HTML tag that does that.
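The usual fix for that class of bug is to escape untrusted text before putting it into the page. A generic illustration follows; `render_preview` is hypothetical and says nothing about how the actual site is built.

```python
import html

def render_preview(ocr_snippet: str) -> str:
    """Wrap an untrusted OCR snippet in a list item, escaping any markup in it."""
    return f"<li>{html.escape(ocr_snippet)}</li>"

print(render_preview("<s lots of garbage>"))
```

With escaping, the stray "<s" shows up as literal text instead of striking through everything after it.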

I'm only here to shout out fish shell, a shell finally designed for the modern world of the 90s
Are there archives of this? I have no doubt that after this post goes viral some of these files might go “missing”. Having a large number of conspiracies validated has led me to firmly plant my aluminum hat.
With the whole Epstein stuff, hand on heart, what man would not live like him if you had the opportunity? It's like a dream. A role model, can I get some movies or shows to immerse in?
it's really all about the blackmail