https://github.com/mmastrac/gibbet-hill
Top and bottom halves of the page in the repo here:
https://github.com/mmastrac/gibbet-hill/blob/main/scan-1.png https://github.com/mmastrac/gibbet-hill/blob/main/scan-2.png
EDIT: If you have access to a multi-modal LLM, the rough transcription + the column scan and the instruction to "OCR this text, keep linebreaks" gives a _very good_ result.
EDIT 2: Rough draft, needs some proofreading and corrections:
(I've transcribed various things over the years, including Sonia Greene's Alcestis [1] and Holtzman & Kershenblatt's "Castlequest" source code [2], so I know it doesn't take much except quick fingers and sufficient motivation. :))
[1] https://quuxplusone.github.io/blog/2022/10/22/alcestis/
[2] https://quuxplusone.github.io/blog/2021/03/09/castlequest/
EDIT: ...and as I was writing that, you seem to have finished your transcription. :)
LLMs are increasingly becoming cheaper and more accessible than humans with a baseline of literacy.
I've done transcription in the past myself (did two books for Standard Ebooks with some from-scratch transcription and lots of editing) and I know the pain. I've always found it easier to fix up OCR than to type the whole thing by hand, because my error rate with eyeball transcription is higher.
If you want to tackle the proofing passes, I'm happy to add you to the repo :)
<https://woodsfae.tumblr.com/post/764918993659330560/gibbet-h...>
That's like bringing a knife to a gun fight, my friend. Never underestimate the power of a committed Tumblr user.
https://github.com/mmastrac/gibbet-hill/blob/main/col-1-a.pn...
Tip: Load the pngs into Preview, hit "Auto Levels," and crank up "Sharpness" on each one. Looks pretty good!
I realise I've used vague terms in that sentence, even setting aside the tricky question of what makes the things often described as great works "greater" than things that are looked down on, but might be much more popular.
I once read a great foreword to a novel lamenting the loss of "good bad books", citing Dracula as an example. It was by a famous author (as I remember), but I can't remember, and can't find, the foreword or the novel I'm thinking of.
By the way, for anyone who is thinking of reading Dracula's Guest, it is likely it was intended as a first chapter of Dracula, but was cut.
My wife has read most of his stuff. I know because I buy it for her. She says aside from Dracula, most of it is not great.
It's one of those firsts that established a genre.
I know Stoker didn't invent vampires, but they came into Western English-speaking culture through his Dracula.
Thematically it is of course a dog's breakfast of repressed sexuality, homosexuality, etc., all of which were taboo topics in the Victorian age. Which is precisely why the story works so well. And even today it still works. If you can get over the Victorian-era biases, it's a surprisingly fresh and modern story. Which is why modern takes on the story are still interesting.
And of course, these topics are still playing a role. Just look at the current election round in the US where things like abortion and gay rights are still being challenged. And it's not just the US where these topics are used by populist politicians to gain votes.
There's also a sub-theme of overly secular modern men who don't believe in superstition (Jonathan Harker doesn't believe in vampires in the beginning) needing to get in better touch with Christianity to defeat Dracula. It also features a rejection of secular psychiatry to defeat what turns out not to be "mental illness", way before The Exorcist did it.
Sadly I had really bought into the vampire chic trend when Coppola's Dracula came out in the early 90s. I had my dentist create some fangs for me to wear. More than one woman formally requested me to bite them on the neck. I dressed for goth clubs, more or less like an Anne Rice vampire (another thoroughly gay mythos).
It wasn't until Stephenie Meyer claimed vampires for the Latter-Day Saints movement that those Twilight sparkling dudes could be considered thoroughly hetero.
Go full screen and go to page 2; it starts at about the middle.
Serendipitous.
It's interesting how much you can find just by reading old newspapers and magazines. Everybody just reads Wikipedia now, even journalists, so it's become basically the sole source of truth. If it's not in there, it's not on record. But if you scratch the surface just a little bit, you find tidbits that are only mentioned in places like ancient newsgroup threads, or sometimes not mentioned at all on the public web. Libraries, man!
I also wondered if he was tipped off.
And why bother trying to deceive when one could build a reputation for creating truly good fan fiction based on real source material?
Just because tech can be used to abuse trust doesn't mean it will be the most interesting and commonly recognized thing to do with it.
"3 million lost works of Shakespeare found"
These aren’t things historians have had hundreds of years to document, but several thousand or more people were in this space long before I was, looking at it more intently than I ever could, and I still come across things from time to time that weren’t known to exist.
Likewise, in the past month I’ve spent an unfortunate amount of time reading laws and board bylaws, and it doesn’t take long to find long-forgotten rules that are being actively violated. Even outside of code, documentation is hard.
> And then there's also a kind of notion that everything is there online when in point of fact lots of information about the past still only exists in archives
<https://conversationswithtyler.com/episodes/alan-taylor/>
* of the audio version, that is; at that timestamp in the YouTube video, they're discussing the question "How will large language models change historical research"—interviewee's response: he doesn't know
There are many details to the craft that are hinted at in a variety of formats (YouTube videos, blog entries, etc.), but the clear truths are not clearly stated anywhere. These are stored in the minds and practices of artisans.
Without some specific clues, a real historian would not be looking for Bram Stoker stories in an 1890 issue of the Daily Express Dublin Edition. He'd be skimming through the archives of many of the newspapers & magazines published in an era and geographic region, cataloging authors & stories & poems. "Success" would be just compiling a well-done catalog. 15 minutes of fame in the popular press could equally well result from finding some unknown early work by James Joyce, or Winston Churchill, or George Bernard Shaw, or Oscar Wilde, or Yeats, or ...
OP got incredibly lucky though that the author's name was included in the original publication - things like this (i.e. contributions to newspapers or magazines) were often published under obscure pseudonyms, initials, puzzling hints like "By the author of Such-and-such" or no author indication at all.
If UK copyright applied, then the story would have entered the public domain in 1932. The term of copyright for published works at the time was 7 years after the author's death, or 42 years from publication, whichever was longer.
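The 1932 figure can be checked with a quick sketch of that rule. This assumes the 1890 publication year mentioned elsewhere in the thread and Stoker's death in 1912 (not stated in the thread); the function name is made up for illustration, and it works at whole-year granularity only.

```python
def uk_copyright_expiry(publication_year: int, death_year: int) -> int:
    """Year the term ends under the rule quoted above:
    the later of (death + 7 years) and (publication + 42 years)."""
    return max(death_year + 7, publication_year + 42)


# Assumed inputs: published 1890, author died 1912.
print(uk_copyright_expiry(publication_year=1890, death_year=1912))
```

Here the 42-year branch wins (1890 + 42 is later than 1912 + 7), which is where 1932 comes from.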
Is that one API call or some out of control process slinging 100s of requests?
Must have been a ton of data, as their most expensive model (Opus) seems to cost $15 per million input tokens. I guess if you just set it to use an entire project as the input, you'll hit 1m input tokens quickly.
I wasn’t working in an enormous file, but I meant to highlight a block and accidentally highlighted the entire file and asked it to do something that made no sense in that context. It did its best to do something with the situation and eventually ran out of steam, haha. It’s possible that multiple requests needed to be made, or I was around the 200k context window.
Previous to this I’m fairly sure most of my requests cost fractions of pennies. My credit takes ages to decrease by any meaningful amount, or at least it did until last night. It’s normally an extremely cost-effective tool for me.
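For a rough sense of scale, here is a back-of-the-envelope sketch using only the numbers quoted in this subthread: $15 per million input tokens and a roughly 200k-token context window. The rate is taken from the comment above, not from a price list, and output tokens are ignored, so treat it as an estimate.

```python
# Assumption: input price as quoted in the thread, in USD per million tokens.
OPUS_INPUT_USD_PER_MILLION = 15.0


def input_cost_usd(tokens: int, usd_per_million: float = OPUS_INPUT_USD_PER_MILLION) -> float:
    """Estimated cost of sending `tokens` input tokens at a flat per-million rate."""
    return tokens / 1_000_000 * usd_per_million


# A small request, one full ~200k context window, and a million tokens.
for tokens in (1_000, 200_000, 1_000_000):
    print(f"{tokens:>9,} input tokens -> ${input_cost_usd(tokens):.3f}")
```

On these assumptions a single full-context request costs a few dollars, so a handful of retries over a whole file adds up much faster than typical small edits.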
IA should be an archivist organization first and foremost and abandon the idea of making books, movies, and music publicly available. That's just painting a target on their back and risking their goal of preserving a snapshot of our time.
The wayback machine is great, though, and they should keep doing that.
Imagine if we had found Bram Stoker's work, and it was also encrypted mumbo jumbo that is now useless to us. We'll likely never know what we lost.
I mentally replace "an amateur" with "a talented and passionate"
For me, amateur just doesn't mean the insult that it meant when I was a youngster.
For example, you can be a professional, but do things "pro bono" (for free or for public good) or "pro lucro" (for money).
Another word of classical origin with a striking difference is the meaning of the word "pathetisch" in German, which means "(exaggeratedly) passionate", which corresponds more or less to the meaning of the Ancient Greek word "pathetikos".
It doesn't mean anything in Latin. It means "lover" in French. Possibly a version of French from before "love" changed to "aimer".
"Amateurish" or "amateurishly" feel damning, like assertions about a certain absence of quality or attention to detail.
Describing someone as a "total amateur" feels a bit like calling them a hack.
This needs a separate word or concept.
Amateur historian could never be an insult, because it's actually better to have a real career in something substantial, and do the history stuff on the side as a hobby.