An amateur historian has discovered a long-lost short story by Bram Stoker

258
94
lermontov
15 hours ago
bbc.com

mmastrac
·
13 hours ago
·
[ - ]

I started a quick transcription here -- not enough time to complete more than half the first column, but some scans and very rough OCR are here if anyone is interested in contributing:

https://github.com/mmastrac/gibbet-hill

Top and bottom halves of the page in the repo here:

https://github.com/mmastrac/gibbet-hill/blob/main/scan-1.png https://github.com/mmastrac/gibbet-hill/blob/main/scan-2.png

EDIT: If you have access to a multi-modal LLM, the rough transcription + the column scan and the instruction to "OCR this text, keep linebreaks" gives a _very good_ result.

EDIT 2: Rough draft, needs some proofreading and corrections:

https://github.com/mmastrac/gibbet-hill/blob/main/story.md

quuxplusone
·
12 hours ago
·
[ - ]

Seems like you don't need an LLM, you just need a human who (1) likes reading Stoker and (2) touch-types. :) I'd volunteer, if I didn't think I'd be duplicating effort at this point.

(I've transcribed various things over the years, including Sonia Greene's Alcestis [1] and Holtzman & Kershenblatt's "Castlequest" source code [2], so I know it doesn't take much except quick fingers and sufficient motivation. :))

[1] https://quuxplusone.github.io/blog/2022/10/22/alcestis/

[2] https://quuxplusone.github.io/blog/2021/03/09/castlequest/

EDIT: ...and as I was writing that, you seem to have finished your transcription. :)

eru
·
39 minutes ago
·
[ - ]

> Seems like you don't need an LLM, you just need a human who (1) likes reading Stoker and (2) touch-types.

LLMs are increasingly becoming cheaper and more accessible than humans with a baseline of literacy.

mmastrac
·
12 hours ago
·
[ - ]

I finished a very rough, tesseract + LLM transcription, but it absolutely needs editing passes.

I've done transcription in the past myself (did two books for standard ebooks with some from-scratch transcription and lots of editing) and I know the pain. I've always found it easier to fix up OCR than type the whole thing by hand because I've found my error rate of eyeball transcription to be higher.

If you want to tackle the proofing passes, I'm happy to add you to the repo :)

wahnfrieden
·
10 hours ago
·
[ - ]

Use LiveText API. Much much better accuracy than Tesseract. You can rent access to it.

cxr
·
10 hours ago
·
[ - ]

Too late. You have already been scooped by, of course, tumblr:

<https://woodsfae.tumblr.com/post/764918993659330560/gibbet-h...>

oliyoung
·
9 hours ago
·
[ - ]

A battle of a Tumblr user named Woodsfae versus advanced LLM transcribing new goth literature?

That's like bringing a knife to a gun fight, my friend. Never underestimate the power of a committed Tumblr user.

drivers99
·
8 hours ago
·
[ - ]

In the scan, where it says "and shortly came to the edge of the Punchbowl and easted my eyes on its beauty", OP changed "easted" to "cast" and the Tumblr one says "easted[sic]" ([sic] is theirs). I wonder if it's supposed to be "feasted".

simonw
·
13 hours ago
·
[ - ]

I tried extracting the content using Google Gemini 1.5 Pro 002 using https://aistudio.google.com/ - the first page (scan-2) worked fantastically well, the second page not so much. Here's what I got so far: https://gist.github.com/simonw/ba87f507ef5c11d3335959c055533...

mmastrac
·
13 hours ago
·
[ - ]

I cropped the columns out into six files -- it might have an easier time with these:

https://github.com/mmastrac/gibbet-hill/blob/main/col-1-a.pn...

reaperducer
·
12 hours ago
·
[ - ]

…and my wife's Halloween present has been printed.

Tip: Load the pngs into Preview, hit "Auto Levels," and crank up "Sharpness" on each one. Looks pretty good!

qmr
·
12 hours ago
·
[ - ]

[dead]

1317
·
11 hours ago
·
[ - ]

You would probably want to get the Project Gutenberg people onto it.

mNovak
·
8 hours ago
·
[ - ]

I went ahead and made a post over at the PG proofreaders site (pgdp.net) to make them aware.

staticman2
·
12 hours ago
·
[ - ]

I remember reading somewhere (I think it was in an annotated edition of Dracula, or maybe it was a journal article) that Bram Stoker wrote a large number of novels but everything he wrote other than Dracula was awful. Per Wikipedia he wrote 14 books; supposedly he was only able to write one good one.

red369
·
7 hours ago
·
[ - ]

It seems that often even Dracula is viewed as a "good bad book". Not high quality literature, but great to read.

I realise I've used vague terms in that sentence, even setting aside the tricky question of what makes the things often described as great works "greater" than things that are looked down on, but might be much more popular.

I once read a great foreword to a novel lamenting the loss of "good bad books", citing Dracula as an example. It was by a famous author (as I remember), but I can't remember, and can't find, the foreword or the novel I'm thinking of.

nu11ptr
·
10 hours ago
·
[ - ]

Not a novel, but the short story "Dracula's Guest" I thought was quite good. I was sad it was so short.

red369
·
7 hours ago
·
[ - ]

In addition to the Dracula's Guest short story, I actually liked quite a few of the other stories in the book Dracula's Guest.

By the way, for anyone who is thinking of reading Dracula's Guest, it is likely it was intended as a first chapter of Dracula, but was cut.

reaperducer
·
12 hours ago
·
[ - ]

I suspect you're getting downvoted by people who haven't actually read anything by Stoker.

My wife has read most of his stuff. I know because I buy it for her. She says aside from Dracula, most of it is not great.

timeinput
·
11 hours ago
·
[ - ]

For me it feels like Stoker's Dracula is only so popular because it's where all the tropes come from, not because it's particularly well written, or something like that.

It's one of those firsts that established a genre.

I know Stoker didn't invent vampires, but they came into Western English-speaking culture through his Dracula.

jillesvangurp
·
39 minutes ago
·
[ - ]

I think that's classic chicken and egg. I've read it a few times over the years. It's a good story and fairly well written. And it obviously inspired movies and countless other works early on (as early as the 1920s). I don't think it has really been surpassed by other books or authors. Though there certainly have been some good ones. It both defines and leads the genre. Despite generations of literary critics trying to deconstruct and dismiss it, nobody has really done a better vampire story.

Thematically it is of course a dog's breakfast of repressed sexuality, homosexuality, etc. All of which were taboo topics in the Victorian age. Which is precisely why the story works so well. And even today it still works. If you can get over the Victorian-era biases, it's a surprisingly fresh and modern story. Which is why modern takes on the story are still interesting.

And of course, these topics are still playing a role. Just look at the current election round in the US where things like abortion and gay rights are still being challenged. And it's not just the US where these topics are used by populist politicians to gain votes.

nu11ptr
·
10 hours ago
·
[ - ]

I am not a literary critic, but I very much enjoyed Dracula. When I read it, I did not know there were claims he wasn't a good writer, so I had no bias, I simply liked it quite a bit.

staticman2
·
6 hours ago
·
[ - ]

I don't know, Stoker does some interesting stuff in Dracula. Essentially it's this Victorian hysterical story about extramarital or premarital sex "ruining" women (in this case, essentially turning them into undead monsters as an extended metaphor for a woman's reputation being ruined in Victorian society) as the cuckolded Victorian gentlemen look on in horror until they figure out the source of the trouble: a no-good foreigner.

There's also a sub-theme of the too secular modern men who don't believe in superstition (Jonathan Harker doesn't believe in vampires in the beginning) needing to get in better touch with Christianity to defeat Dracula- and features a rejection of secular psychiatry to defeat what turns out to not be "mental illness" way before The Exorcist did it.

AStonesThrow
·
4 hours ago
·
[ - ]

It took me nearly 50 years before I learned how thoroughly gay Dracula really was. It was, of course, replete with coded references, since it couldn't overtly depict homosexuality; so it was with Oscar Wilde's work, such as The Importance of Being Earnest.

Sadly I had really bought into the vampire chic trend when Coppola's Dracula came out in the early 90s. I had my dentist create some fangs for me to wear. More than one woman formally requested me to bite them on the neck. I dressed for goth clubs, more or less like an Anne Rice vampire (another thoroughly gay mythos).

It wasn't until Stephenie Meyer claimed vampires for the Latter-Day Saints movement that those Twilight sparkling dudes could be considered thoroughly hetero.

fauria
·
13 hours ago
·
[ - ]

Brian Cleary will be discussing his findings next Saturday in Dublin, as part of the Bram Stoker Festival: https://bramstokerfestival.com/en/events/an-extraordinary-br...

politelemon
·
14 hours ago
·
[ - ]

You can read it here: https://catalogue.nli.ie/Record/vtls000924296

Go full screen and go to page 2; it starts at about the middle.

sleepytimetea
·
8 hours ago
·
[ - ]

The discovery happened because the amateur historian suffered a sudden loss of hearing and took leave from his job to go browse the archives in Dublin. He came across a special Christmas supplement to the regular newspaper from 1890 and decided to just browse it for fun?

Serendipitous.

karaterobot
·
7 hours ago
·
[ - ]

Well, he lived in Dublin, where Stoker lived, and I'd bet the library he visited had a special Stoker collection that might have attracted a fan. There's also the fact that Stoker's mother Charlotte helped open state schools for deaf children; so there may have been some connection there, too. But yeah, on such strange coincidences many discoveries rest.

It's interesting how much you can find just by reading old newspapers and magazines. Everybody just reads Wikipedia now, even journalists, so it's become basically the sole source of truth. If it's not in there, it's not on record. But if you scratch the surface just a little bit, you find tidbits that are only mentioned on places like ancient newsgroup threads, or sometimes not mentioned at all on the public web. Libraries, man!

bredren
·
7 hours ago
·
[ - ]

The funds from publishing are also going to an org focused on those who suffer hearing loss, which is for Stoker’s mother, who was a hearing loss campaigner.

I also wondered if he was tipped off.

ndileas
·
13 hours ago
·
[ - ]

I don't mean to disparage this particular instance at all, as it seems pretty great. But I wonder if the rise of LLMs is going to make scams that sound a lot like this much easier in the future. I think at the moment it's hard to make something really sound like a particular author without a lot of work, but that will probably change in the future.

3D30497420
·
36 minutes ago
·
[ - ]

I imagine it will be a lot like other pieces of art where the provenance is really critical. If it cannot be traced properly back to the originator, then it will always be viewed as dubious. With that said, as a person largely ignorant of the field, I'd wager this is probably true now, irrespective of the rise of LLMs.

bredren
·
12 hours ago
·
[ - ]

Sure, people can do scams but it will be way more interesting to apply them to finding stuff like this. Up through now, literary treasures and open secrets are sitting out waiting to be recognized.

And why bother with trying to deceive when one could build reputation for creating truly good fan fiction based off real source material.

Just because tech can be used to abuse trust doesn't mean it will be the most interesting and commonly recognized thing to do with it.

booleandilemma
·
9 hours ago
·
[ - ]

I can see it now:

"3 million lost works of Shakespeare found"

intalentive
·
14 hours ago
·
[ - ]

"Honey, come look! I've found some information all the world's top historians missed."

jonhohle
·
13 hours ago
·
[ - ]

I’ve found that it’s not uncommon for an interested individual to find details that have not been documented or “found” by others. I collect video games and have found variants of popular games that have been otherwise undocumented on any list or archive that I was aware of. I’ve found audio recordings from the 90s that seemingly have no recorded history on the internet.

These aren’t things historians have had hundreds of years to document, but several thousand or more people were in this space long before I was, looking at it more intently than I ever could, and I still come across things from time to time that weren’t known to exist.

Likewise, in the past month I’ve spent an unfortunate amount of time reading laws and board bylaws and it doesn’t take long to find long forgotten rules that are being actively violated. Even outside of code, documentation is hard.

cxr
·
12 hours ago
·
[ - ]

Tyler Cowen recently interviewed a historian (Alan Taylor), and they approached this subject near the end of the episode—how much the job of a historian still involves browsing undigitized material sitting on a shelf in a cold room somewhere. Around 3215 seconds* in:

> And then there's also a kind of notion that everything is there online when in point of fact lots of information about the past still only exists in archives

<https://conversationswithtyler.com/episodes/alan-taylor/>

* of the audio version, that is; at that timestamp in the YouTube video, they're discussing the question "How will large language models change historical research"—interviewee's response: he doesn't know

bredren
·
12 hours ago
·
[ - ]

This happens often when going down the rabbit hole on a niche project. For example, repair and restoration of Persian rugs.

There are many details to the craft that are hinted at in a variety of formats (YouTube videos, blog entries, etc.), but the clear truths are not clearly stated anywhere. These are stored in the minds and practices of artisans.

bell-cot
·
13 hours ago
·
[ - ]

"missed" might be taken to imply that one or more of them had ever bothered to look.

SketchySeaBeast
·
12 hours ago
·
[ - ]

Well, even if people were looking, this sort of thing is a lot of right place and right time.

bell-cot
·
11 hours ago
·
[ - ]

Try skimming the Wikipedia articles on some major authors of that era, to get a sense for how much short (or serialized) fiction & poetry was routinely published in newspapers and magazines back then.

Without some specific clues, a real historian would not be looking for Bram Stoker stories in an 1890 issue of the Daily Express Dublin Edition. He'd be skimming through the archives of many of the newspapers & magazines published in an era and geographic region, cataloging authors & stories & poems. "Success" would be just compiling a well-done catalog. 15 minutes of fame in the popular press could equally well result from finding some unknown early work by James Joyce, or Winston Churchill, or George Bernard Shaw, or Oscar Wilde, or Yeats, or ...

sam345
·
7 hours ago
·
[ - ]

Stoker's Dracula is one of my all-time favorite books. I may now just read it again. My only regret in reading it the last time was reading the modern take in the foreword.

Exoristos
·
5 hours ago
·
[ - ]

Modern liberal-arts academics is a process of taking what you love and learning how to hate it.

nu11ptr
·
14 hours ago
·
[ - ]

How would copyright law apply here? Would this fall into the public domain immediately? I read that Irish law is that it would be "70 years from date first made available to the public". Since published in a newspaper, I would assume this would be public domain now. Correct?

zozbot234
·
12 hours ago
·
[ - ]

If this was an unpublished manuscript, rights of first publication would apply and it might be covered by a kind of copyright that would vary depending on the country. Since this was "rediscovered" after first being unambiguously published back in the 1890s, it's pretty clearly in the public domain.

OP got incredibly lucky though that the author's name was included in the original publication - things like this (i.e. contributions to newspapers or magazines) were often published under obscure pseudonyms, initials, puzzling hints like "By the author of Such-and-such" or no author indication at all.

papercrane
·
10 hours ago
·
[ - ]

I _think_ UK copyright law would matter here, since at the time the story was published (1890) Ireland was part of the UK (Ireland gained independence in 1921.)

If UK copyright applied, then the story would have entered public domain in 1932. The term of copyright for published works at the time was 7 years after the author's death, or 42 years, whichever was longer.
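
To make the arithmetic explicit, here is a rough sketch of that "whichever was longer" rule. The function name and parameters are made up for illustration, and Stoker's death year (1912) is my addition, not something stated in this thread.

```python
# Sketch of the term rule described above: copyright runs until the later of
# (author's death + 7 years) and (first publication + 42 years).
# Assumption: Stoker's death year, 1912, is supplied here and is not from the thread.

def copyright_expiry_year(publication_year: int, death_year: int,
                          years_after_death: int = 7,
                          years_after_publication: int = 42) -> int:
    """Return the year in which the longer of the two terms ends."""
    death_based = death_year + years_after_death
    publication_based = publication_year + years_after_publication
    return max(death_based, publication_based)


expiry = copyright_expiry_year(publication_year=1890, death_year=1912)
print(expiry)
```

Under these inputs the publication-based term (1890 + 42) outlasts the death-based one (1912 + 7), which is where the 1932 figure comes from.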

cortesoft
·
13 hours ago
·
[ - ]

Yes, it's public domain

Rebelgecko
·
10 hours ago
·
[ - ]

Funnily enough there was a reddit post from around the time the manuscript was discovered (but before it was announced) asking a similar question

nuz
·
14 hours ago
·
[ - ]

Seems like a non-pessimistic idea of something LLMs could help us out with: mass analysis of old texts for new finds like this. If this one exists, surely there are many more just a mass analysis away.

steve_adams_86
·
14 hours ago
·
[ - ]

I accidentally got Zed to parse way more code than I intended last night and it cost close to $2 on the anthropic API. All I can think is how incredibly expensive it would be to feed an LLM text in hopes of making those connections. I don’t think you’re wrong, though. This is the territory where their ability to find patterns can feel pretty magical. It would cost many, many, many $2 though

diggan
·
14 hours ago
·
[ - ]

> I accidentally got Zed to parse way more code than I intended last night and it cost close to $2 on the anthropic API

Is that one API call or some out of control process slinging 100s of requests?

Must have been a ton of data, as their most expensive model (Opus) seems to be $15 per million input tokens. I guess if you just set it to use an entire project as the input, you'll hit 1M input tokens quickly.
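
Back-of-the-envelope version of that, as a sketch only. It assumes the $15 per million input tokens figure above, ignores output tokens entirely (they are priced separately), and the helper names are hypothetical. Which model Zed actually used isn't stated.

```python
# Rough input-token cost arithmetic, assuming a flat per-million-token price.
# Assumption: only input tokens are counted; output tokens are ignored.

def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given rate."""
    return tokens * price_per_million_usd / 1_000_000


def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> float:
    """How many input tokens a budget buys at the given rate."""
    return budget_usd * 1_000_000 / price_per_million_usd


# How many input tokens would $2 buy at $15 per million?
print(round(tokens_for_budget(2.0, 15.0)))

# What would a full 200k-token request cost at that rate?
print(input_cost_usd(200_000, 15.0))
```

So at that rate $2 is on the order of 133k input tokens, and a single request that filled a 200k context would be about $3 on its own.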

steve_adams_86
·
13 hours ago
·
[ - ]

Come to think of it, I’m not sure how Zed performs LLM requests with the inline assistant.

I wasn’t working in an enormous file, but I meant to highlight a block and accidentally highlighted the entire file and asked it to do something that made no sense in that context. It did its best to do something with the situation and eventually ran out of steam, haha. It’s possible that multiple requests needed to be made, or I was around the 200k context window.

Previous to this I’m fairly sure most of my requests cost fractions of pennies. My credit takes ages to decrease by any meaningful amount, or at least it did until last night. It’s normally an extremely cost-effective tool for me.

pcthrowaway
·
14 hours ago
·
[ - ]

This is a pretty good case for just using a local model. Even if it's 50% worse than Anthropic or whatever the gap is now between open models and proprietary state of the art, it's still likely 'good enough' to categorize a story in an old newspaper as missing from an author's known bibliography.

steve_adams_86
·
12 hours ago
·
[ - ]

Good point. I use llama3.1 for a lot of small tasks and rarely feel like I need to use Claude instead. It’s fine. I’m even running the model a (big) step down from 70b, because I’ve only got 32GB of ram. It’s a solid model that probably costs me next to nothing to run.

hyperbrainer
·
14 hours ago
·
[ - ]

The problem with copyright is going to be a big hurdle though.

diggan
·
14 hours ago
·
[ - ]

Why? Old texts would be out of copyright, and even if they weren't, as long as you're not publishing the source material or anything containing the source material (or anything that can verbatim output the source), it seems you'd be in the clear.

hyperbrainer
·
12 hours ago
·
[ - ]

You are right! I forgot about this completely.

ebiester
·
14 hours ago
·
[ - ]

If we go to the era of public domain, there is no worry about copyright.

Mistletoe
·
14 hours ago
·
[ - ]

I’m concerned things like this will just be gone forever in the digital era. Paper and film are great storage mediums. I know this was on a screen but would it have still existed if it wasn’t on paper first?

stavros
·
14 hours ago
·
[ - ]

Hard disks are great storage mediums when we don't purposely set fire to them to preserve the profits of large corporations. The Internet Archive is perfectly capable of preserving things, unless copyright holders manage to shut them down for short-term profit.

echelon
·
13 hours ago
·
[ - ]

IA shouldn't try to wage war against copyright. They should leave that to other entities.

IA should be an archivist organization first and foremost and abandon the idea of making books, movies, and music publicly available. That's just painting a target on their back and risking their goal of preserving a snapshot of our time.

The wayback machine is great, though, and they should keep doing that.

bongodongobob
·
13 hours ago
·
[ - ]

What are you referring to here? Hopefully not the secure destruction of hard disks.

stavros
·
12 hours ago
·
[ - ]

The law's preference for 120 years of copyright instead of the preservation of culture. IA should be state-funded.

bongodongobob
·
9 hours ago
·
[ - ]

How does copyright relate to burning hard drives?

freedomben
·
14 hours ago
·
[ - ]

Agreed, and I think it's important to note that paper doesn't have any sort of DRM encumbrance on it. I seriously think that at some point in the next few decades, the "pirates" who right now are hated and prosecuted vigorously by all the "rightsholders" may turn out to be venerable heroes for having preserved the creations.

Imagine if we had found Bram Stoker's work, and it was also encrypted mumbo jumbo that is now useless to us. We'll likely never know what we lost.

mock-possum
·
14 hours ago
·
[ - ]

All this, and yet no link to read it?

alanbernstein
·
14 hours ago
·
[ - ]

It's 134 years old but hasn't been published as a book yet, so surely it requires 100 years of copyright protection starting today!

gwbas1c
·
14 hours ago
·
[ - ]

https://news.ycombinator.com/item?id=41905844

unit149
·
11 hours ago
·
[ - ]

Used to be that writers were paid by the word and novels were serialized.

busyant
·
14 hours ago
·
[ - ]

It's funny (ironic?), but when I read "an amateur {insert occupation} has"

I mentally replace "an amateur" with "a talented and passionate"

For me, amateur just doesn't mean the insult that it meant when I was a youngster.

rahimnathwani
·
14 hours ago
·
[ - ]

The word 'amateur' originates from the Latin word for 'lover'.

zanellato19
·
14 hours ago
·
[ - ]

Thank you! I've been using this word in Portuguese (amador) and it's so _so_ clear in that language; even so, I hadn't realized. Amar -> Amador (the one who loves it). Quite clearly.

bombcar
·
13 hours ago
·
[ - ]

Exactly, and "professional" means they do it for money.

otherme123
·
13 hours ago
·
[ - ]

The point is that "amateur" means literally "lover" in Latin. While "professional" means "for money" today, in Latin it meant "to profess a vow to do it with high standards".

For example, you can be a professional, but do things "pro bono" (for free or for public good) or "pro lucro" (for money).

retrac
·
13 hours ago
·
[ - ]

"Vocation" has undergone a similar shift; originally it meant a calling, or a summons.

RandomThoughts3
·
13 hours ago
·
[ - ]

It still does.

thrwaway1337
·
13 hours ago
·
[ - ]

Just don't go looking for the etymology of "vanilla"

Archelaos
·
12 hours ago
·
[ - ]

"Doing something to a high standard" is still the main meaning of the word "professionell" in German. So someone can make something "unprofessionell" for money or "professionell" without payment.

Another word of classical origin with a striking difference is the meaning of the word "pathetisch" in German, which means "(exaggeratedly) passionate", which corresponds more or less to the meaning of the Ancient Greek word "pathetikos".

thaumasiotes
·
4 hours ago
·
[ - ]

> The point is that "amateur" means literally "lover" in Latin.

It doesn't mean anything in Latin. It means "lover" in French. Possibly a version of French from before "love" changed to "aimer".

echelon
·
13 hours ago
·
[ - ]

But amateur has taken on a negative connotation in the common vernacular.

"Amateurish" or "amateurishly" feel damning, like assertions about a certain absence of quality or attention to detail.

Describing someone as a "total amateur" feels a bit like calling them a hack.

This needs a separate word or concept.

RandomThoughts3
·
13 hours ago
·
[ - ]

Dilettante already exists to mean someone who doesn’t do something with seriousness, and amateur doesn’t carry the same connotation as amateurish anyway, so you don’t really need a new word.

idiotlogical
·
13 hours ago
·
[ - ]

The term 'nerd' needs to complete its rehabilitation like 'geek' has over the last 20 years. It's the most concise term I can think of when describing someone who is enthusiastic, focused, and knowledgeable on a subject. I think it's a badge of honor.

PsylentKnight
·
13 hours ago
·
[ - ]

There's "aficionado", though that feels a little pretentious

adamc
·
13 hours ago
·
[ - ]

We could try reclaiming the word.

cortesoft
·
14 hours ago
·
[ - ]

I have never thought of it as an insult, just meaning they don't do it for money.

qingcharles
·
13 hours ago
·
[ - ]

For me, amateur generally just translates as "not paid for his services."

kazinator
·
13 hours ago
·
[ - ]

Yeah but it's often intended as an insult. Especially as the adjective amateurish, or phrases like the work of an amateur.

Amateur historian could never be an insult, because it's actually better to have a real career in something substantial, and do the history stuff on the side as a hobby.

boilerupnc
·
14 hours ago
·
[ - ]

Evolving Wikipedia Entry on the Story "Gibbet Hill" [0]. Plot Summary described on the page.

[0] https://en.wikipedia.org/wiki/Gibbet_Hill_(short_story)

javajosh
·
14 hours ago
·
[ - ]

Does the name "Bram Stoker" not carry any weight?

adrianmonk
·
14 hours ago
·
[ - ]

Yes, but "Dracula author" carries more, and headlines aim to reach as many people as possible.

WCSTombs
·
14 hours ago
·
[ - ]

For some reason his name is in the page's <head> but not in the article's title.

chachacharge
·
14 hours ago
·
[ - ]

Pro search tip: it's Stoker, not Stroker

slothtrop
·
13 hours ago
·
[ - ]

porn parody potential there

dang
·
14 hours ago
·
[ - ]

It does here!

slothtrop
·
14 hours ago
·
[ - ]

Insofar as he's associated with "that Dracula story and movie", yes.

hshshshshsh
·
11 hours ago
·
[ - ]

I don't know why people get obsessed over things like this. Finding significance in something because it's written by an entity whose name is popular makes no sense.