Internet Archive breached again through stolen access tokens

sirolimus
·
1 day ago
·
[ - ]

It’s incredibly sad to see threat actors attack something as altruistic as an internet library. Truly demoralizing to see such degeneracy.

userbinator
·
1 day ago
·
[ - ]

When there are plenty of people who are steeped in the dogma of Imaginary Property, and whose lives depend on it, it's not too surprising.

boplicity
·
1 day ago
·
[ - ]

FYI: "Money" is imaginary property. Not sure you want to call people supporting "imaginary property" dogmatic. It's what our society is built on.

userbinator
·
1 day ago
·
[ - ]

Money is not imaginary. You can touch and interact with it.

gruez
·
1 day ago
·
[ - ]

That's like saying movies aren't imaginary either because there's blu-rays. Even if we take that point at face value though, the vast majority of money is imaginary, only existing on ledgers. When the fed "prints money", it's just adjusting an entry on a database somewhere.

billy99k
·
16 hours ago
·
[ - ]

It's paper and ink that we've given a perceived value to.

SideQuark
·
15 hours ago
·
[ - ]

The vast majority of money is not physical, by likely 2 orders of magnitude, so you cannot touch it. It's a value in a ledger that can be moved electronically (or copied by hand).

https://en.wikipedia.org/wiki/Money_supply

Thorrez
·
18 hours ago
·
[ - ]

Was the hacker motivated by IP? The article appears to say no.

ddq
·
15 hours ago
·
[ - ]

I/P not IP, as in Israel/Palestine.

sim7c00
·
1 day ago
·
[ - ]

anything with tons of traffic going to it is a target. it has nothing to do with what the entity does, more with what potential reach it has. criminal behaviour is what it is. people pulling loads of visitors need to properly secure their shit, to prevent their their customers becoming their victims.

croes
·
1 day ago
·
[ - ]

Seems like the actor did it only for the street credit and the second breach is only a reminder that IA didn’t properly fixed it after the first breach.

Could be worse.

A4ET8a8uTh0
·
1 day ago
·
[ - ]

Not defending attacker, because I see IA as common good. That said one of the messages from this particular instance reads almost as if they were trying to help by pointing out issues that IA clearly missed:

"Whether you were trying to ask a general question, or requesting the removal of your site from the Wayback Machine your data is now in the hands of some random guy. If not me, it'd be someone else."

I am starting to wonder if the chorus of 'maybe one org should not be responsible for all this; it is genuinely too important' has a point.

jeffwask
·
16 hours ago
·
[ - ]

> I am starting to wonder if the chorus of 'maybe one org should not be responsible for all this; it is genuinely too important' has a point.

I agree this probably needs to be run more professionally but I think the "chorus" is missing the key fact that no one has stepped up to pay for or build an alternate and driving this one to insolvency just leaves us poorer.

A4ET8a8uTh0
·
15 hours ago
·
[ - ]

I think this is why I am kinda debating what personally I can do about it ( in a reasonably efficient way ). And I admit I am on the fence. I try to donate funds to some worthy causes ( like EFF ) every so often and IA might be getting some offers of help now that is in the spotlight ( and can I reasonably compare to someone steeped in the subject? likely no ).

I do advocate for some variant of digital prepping in my social circle, but the response has been similar to my talk about privacy. The ones that do care, have already taken steps or are in the process of some sort of solution for their use-case. Those people do not need convincing or even much help.. they mostly simply know what they want and/or need.

As for a more systemic solution.. I honestly don't know. HN itself seems pretty divided and I can absolutely understand why.

All that said, I think I agree with you. There is no real alternative now so IA needs our support at the very least until we can figure out how to preserve what was already gathered. I said the following on this forum before. Wikipedia is not hurting for money, but IA, being in legal crosshairs and whatnot, likely will.

luckylion
·
1 day ago
·
[ - ]

A different framing is: be grateful that it's these types of people breaching IA and being vocal about it & asking IA to fix their systems. Others might just nuke them, or subtly alter content, or do whatever else bad thing you can think of.

They're providing a public service by pointing out that a massive organization controlling a lot of PII doesn't care about security at all.

codezero
·
1 day ago
·
[ - ]

There are many state actors that attack targets of opportunity just to cause chaos and asymmetric financial costs.

xyst
·
1 day ago
·
[ - ]

Blame bad leadership.

callc
·
1 day ago
·
[ - ]

Is there a reason to blame the victim, rather than the attackers?

I’m asking seriously - did IA do shitty things that make them a worthy cause for politically/ideologically motivated hacking?

lolinder
·
1 day ago
·
[ - ]

I imagine they're referring to the fact that the leadership showed extremely bad judgement in deciding to pick a battle with the major publishing companies that everyone knew they would lose before it even began [0].

I don't think that justifies blaming the victim here, and from what I can see the attacker doesn't seem to be motivated by anything other than funsies, but I absolutely lost a lot of faith in their leadership when they pulled the NEL nonsense. The IA is too valuable for them to act like a young activist org—there's too much for us to lose at this point. They need to hold the ground they've won and leave the activism to other organizations.

[0] https://www.wired.com/story/internet-archive-loses-hachette-...

jampekka
·
1 day ago
·
[ - ]

> there's too much for us to lose at this point

Feeling entitled?

lolinder
·
1 day ago
·
[ - ]

"Us" means all of humankind for hopefully many generations to come. It's not about my personal entitlement, it's that the IA serves a vital role for humanity (one which they fought hard to make permissible).

jampekka
·
22 hours ago
·
[ - ]

If it's so important to us, perhaps we should support it then?

The discussion around IA nowadays seems a lot like random users ranting at open source maintainers in Github issues.

mrguyorama
·
13 hours ago
·
[ - ]

IA is sooooo important we can't even undo the clusterfuck that is the disney-bought copyright system.

The black woman on the bus refusing to give up her seat was also 100% legally obviously in the wrong. IA lost not because what they were doing was morally wrong, but because each and every one of us continually refuses to agitate for the kind of change that would benefit the world.

If you want the public to have a library, you must enshrine that library's right to exist and operate in law, or it will never survive legal challenges from IP holders. Physical libraries would never be allowed to exist in modern America, not without 100 years of precedence of the first sale doctrine. You can bet your ass disney would have tried to kill such a thing. Freely watch our movies? No chance.

IntelMiner
·
1 day ago
·
[ - ]

Only if you don't care about history

jampekka
·
22 hours ago
·
[ - ]

Books are not part of history?

kleiba
·
1 day ago
·
[ - ]

People with solid info sec knowledge: this is a good opportunity to offer your expertise pro-bono for a good cause!

kyleyeats
·
1 day ago
·
[ - ]

They're buried in these offers right now.

op00to
·
1 day ago
·
[ - ]

I wonder how many offers are legitimate.

TZubiri
·
1 day ago
·
[ - ]

An org amidst an attack might not be the most open to giving credentials and access to strangers.

·
1 day ago
·
[ - ]

knowitnone
·
1 day ago
·
[ - ]

[flagged]

TZubiri
·
1 day ago
·
[ - ]

To one (or two) attackers, it can always be worse.

·
1 day ago
·
[ - ]

xyst
·
1 day ago
·
[ - ]

[flagged]

·
1 day ago
·
[ - ]

·
1 day ago
·
[ - ]

5jh5j56
·
1 day ago
·
[ - ]

At this point they should consider a rewirte from scratch. I bet they are running a tech stack from 1992.

trompetenaccoun
·
1 day ago
·
[ - ]

We need archives built on decentralized storage. Don't get me wrong, I really like and support the work Internet Archive is doing, but preserving history is too important to entrust it solely to singular entities, which means singular points of failure.

jdiff
·
1 day ago
·
[ - ]

This seems to get brought at least once in the comments for every one of these articles that pops up.

The IA has tried distributing their stores, but nowhere near enough people actually put their storage where their mouths are.

jonny_eh
·
1 day ago
·
[ - ]

Nearly every entry in the library has a torrent file (which is a distributed storage system), but with the index pages down, they're not accessible.

bastawhiz
·
19 hours ago
·
[ - ]

I can hardly find a healthy torrent for an obscure feature film that I care about. How am I supposed to find a healthy torrent for a random web page from the aughts?

johnisgood
·
20 hours ago
·
[ - ]

If we want it to be distributed across laymen, we need something easier than opening torrent files (or inputting magnet URI) over a thousand times. Perhaps https://github.com/ipfs/in-web-browsers?

highwaylights
·
22 hours ago
·
[ - ]

You're correct, but even then you've still the problem of storage - the torrents are only useful (and there's a lot of them) if a sustainable number of seeds remain available.

Wheatman
·
18 hours ago
·
[ - ]

How abiut torrenting a collection of websites in one collection?

You can distribute less popular websites with more used ones to avoid losing it? And Torrents are good with transfering large files in my experience.

baby_souffle
·
17 hours ago
·
[ - ]

> You can distribute less popular websites with more used ones to avoid losing it?

So long as this distributed protocol has the concept of individual files, there _will_ be clients out there that allow the user to select `popular-site.archive.tar.gz` and not `less-popular.tar.gz` for download.

And what one person doesn't download... they can't seed back. Distributed stuff is really good for low cost, high scale distribution of in-demand content. It's _terrible_ for long term reliability/availability, though.

armada651
·
16 hours ago
·
[ - ]

That is fundamentally the problem, no one wants to donate storage to host stuff they're not interested in.

mrguyorama
·
13 hours ago
·
[ - ]

More concretely, nobody wants to donate anything. They just want it to exist. Charity has never been a functional solution to normal coordination problems. We have centuries of evidence of this.

giancarlostoro
·
18 hours ago
·
[ - ]

Maybe there needs to be a torrentable offline-first HTML file (only goes online to tell you if there's a new torrent whatsoever with more files), that lets you look through for more torrents (Magnet links are really tiny).

I miss when TPB used to have a CSV of all their magnet links, their new UI is trash. I can't even find anything like the old days, pretty much TPB is a dying old relic.

sumtechguy
·
19 hours ago
·
[ - ]

The problem with their torrents is they are usually broken. Lots of complaints on them being broken. But no one fixing it.

HappMacDonald
·
1 day ago
·
[ - ]

They're not using DHT?

jdiff
·
21 hours ago
·
[ - ]

They're not talking about peer discovery, they're talking about .torrent file discovery.

Capricorn2481
·
15 hours ago
·
[ - ]

It's not abnormal for their torrents to be missing most of their direct downloads on the same page

rolandog
·
16 hours ago
·
[ - ]

Perhaps a naïve question, but hasn't this problem been solved by the FreeNet Project (now HyphaNet) [0]? (the re-write — current FreeNet — was previously called Locutus, IIRC [1]).

Side note: As an outsider, and someone who hasn't tried either version of FreeNet in more than almost 2 decades, was this kind of a schism like the Python 2 vs. Python 3 kerfuffle? Is there more to it?

[0]: https://www.hyphanet.org/

[1]: https://freenet.org/

sanity
·
11 hours ago
·
[ - ]

Hi, Freenet's FAQ explains the renaming/rebranding here: [1]

Neither version of Freenet is designed for long-term archiving of large amounts of data so it probably isn't ideally suited to replacing archive.org, but we are planning to build decentralized alternatives to services like wikipedia on top of Freenet.

[1] https://freenet.org/faq/#why-was-freenet-rearchitected-and-r...

zelphirkalt
·
1 day ago
·
[ - ]

Perhaps one idea is to let people choose what they want to protect. This way people wanting to support it can have their mission.

card_zero
·
1 day ago
·
[ - ]

I want it to protect all sorts of random obscure documents, mostly kind of crappy, that I can't predict in advance, so I can pursue my hobby of answering random obscure questions. For instance:

* What is a "bird famine", and did one happen in 1880?

* Did any astrologer ever claim that the constellations "remember" the areas of the sky, and hence zodiac signs, that they belonged to in ancient times before precession shifted them around?

* Who first said "psychology is pulling habits out of rats", and in what context? (That one's on Wikiquote now, but only because I put it there after research on IA.)

Or consider the recently rediscovered Bram Stoker short story. That was found in an actual library, but only because the library kept copies of old Irish newspapers instead of lining cupboards with them.

The necessary documents to answer highly specific questions are very boring, and nobody has any reason to like them.

oxygen_crisis
·
1 day ago
·
[ - ]

You could let users choose what to mirror, and one of those choices could be a big bucket of all the least available stuff, for pure preservationists who don't want to focus on particular segments of the data.

Sort of like the bittorrent algorithm that favors retrieving and sharing the least-available chunks if you haven't assigned any priority to certain parts.

deafpolygon
·
1 day ago
·
[ - ]

My favorite question is: whether or not Bowser took the princess to another castle.

card_zero
·
1 day ago
·
[ - ]

Since the IA had a collection of emulators (some of them running online*), and old ROMs and floppies and such, it could probably help with that one too.

* Strictly speaking, running in-browser, but that sounded like "Bowser" so I wrote online instead.

dawnerd
·
1 day ago
·
[ - ]

You already can, they have torrents for everything.

diggan
·
1 day ago
·
[ - ]

> they have torrents for everything

Including the index itself? That would be awesome.

tourmalinetaco
·
1 day ago
·
[ - ]

Their torrents suck and IME don’t update to changes in the archive.

addandsubtract
·
1 day ago
·
[ - ]

Aren't torrents terrible at handling updates in general? If you want to make a change to the data, or even just add our remove data, you have to create a new torrent and somehow get people to update their torrent and data as well.

macawfish
·
1 day ago
·
[ - ]

There's a mutable torrent extension (BEP-46) but unfortunately I don't think it's widely supported. I think IPFS/IPNS is the more likely direction.

tourmalinetaco
·
1 day ago
·
[ - ]

Which IA has moved into and hasn’t found much luck in, unfortunately.

jpk
·
1 day ago
·
[ - ]

How come?

rbanffy
·
23 hours ago
·
[ - ]

Torrents are immutable in principle, which is good for preserving things. A new version of a set of files should be a new torrent.

diggan
·
20 hours ago
·
[ - ]

> Torrents are immutable in principle

In practice, that's mostly how they're being used.

But the protocol does support mutation. The BEP describing the behavior even has archive.org as an example...

> The intention is to allow publishers to serve content that might change over time in a more decentralized fashion. Consumers interested in the publisher's content only need to know their public key + optional salt. For instance, entities like Archive.org could publish their database dumps, and benefit from not having to maintain a central HTTP feed server to notify consumers about updates.

http://www.bittorrent.org/beps/bep_0046.html

account42
·
16 hours ago
·
[ - ]

Specs are nice but does any client actually implement this?

zelphirkalt
·
22 hours ago
·
[ - ]

How would preservationists go about automatically updating the torrent and data they seed? Or would they need to manually regularly check, if they are still seeding the up-to-date content?

vundercind
·
1 day ago
·
[ - ]

This is accurate, their torrent-generating system is basically broken to the point of being useless.

creer
·
1 day ago
·
[ - ]

And it's guaranteed not to happen if the efforts don't continue.

acdha
·
1 day ago
·
[ - ]

You could say the same thing about perpetual motion. Being realistic about why past efforts have failed is key to doing better in the future: for example, people won’t mirror content which could get them in trouble and most people want to feel some kind of benefit or thanks. People should be thinking about how to change dynamics like those rather than burning out volunteers trying more ideas which don’t change the underlying game.

creer
·
1 day ago
·
[ - ]

There are certainly research questions and cost questions and practicality and subsetting and whatnot. Addressed by some ideas and not by others.

What there isn't is a currently maintained and advertised client and plan. That I can find. Clunky or not, incomplete or not.

There are other systems that have a rough plan for duplication and local copy and backup. You can easily contribute to them, run them, or make local copies. But not IA. (I mean you can try and cook up your own duplication method. And you can use a personal solution to mirror locally everything you visit and such.) No duplication or backup client or plan. No sister mirrored institution that you might fund. Nothing.

WarOnPrivacy
·
1 day ago
·
[ - ]

> nowhere near enough people actually put their storage where their mouths are.

Typically because most people who have the upload, don't know that they can. And if they come to the notion on their own, they won't know how.

If they put the notion to a search engine, the keywords they come up with probably don't return the needed ELI5 page.

As in: How do I [?] for the Internet Archive?, most folks won't know what [?] needs to be.

TZubiri
·
1 day ago
·
[ - ]

This is literally torrents. Just give up

WarOnPrivacy
·
1 day ago
·
[ - ]

> This is literally torrents. Just give up

Most casual visitors to IA don't know that. Which is the point.

Giving up is for others.

briandear
·
1 day ago
·
[ - ]

The problem with torrents is they have a bad reputation since people use it to steal and redistribute other people’s content without their consent.

ycombinatrix
·
1 day ago
·
[ - ]

The problem with websites is they have a bad reputation since people use it to steal and redistribute other people’s content without their consent.

thwarted
·
1 day ago
·
[ - ]

The problem with file transfer is they have a bad reputation since people use it to [insert illegal or immoral activity here].

Then rename it from "torrent" to something else.

TZubiri
·
1 day ago
·
[ - ]

I'm not sure what the argumentative line is here. But file uploading and downloading needs to have accountability for hosting, which p2p obscures.

The bad reputation is inherent to the tech, not a random quirk.

komali2
·
1 day ago
·
[ - ]

It doesn't really, you can host a server off a raw IP.

Downloading from example.com is just peer to peer with someone big. There's lots of hosting providers and DNS providers that are happy to host illegal-in-some-places content.

TZubiri
·
17 hours ago
·
[ - ]

Incorrect.

The protocols for downloading from example.com are assymettrical client server architectures, not symmetrical decentralized peer to peer.

card_zero
·
1 day ago
·
[ - ]

Is there any form of torrent where you can do a full text search? That, to me, is the more important problem with torrents.

TZubiri
·
1 day ago
·
[ - ]

But internet archive doesn't do this? It's a key based search (url keys)

card_zero
·
1 day ago
·
[ - ]

Internet archive allows full text search of books, newspapers, etc.. Or anyway it did, before being breached.

TZubiri
·
13 hours ago
·
[ - ]

It does transcribe books (through imperfect OCR) so I guess that's possible. Never relied on it as I search by title and author.

But anyways not the case for the wayback product which is the unique core to IA.

card_zero
·
12 hours ago
·
[ - ]

That's not unique, not a product, and not the part I use most.

Well, OK, maybe other webpage archives don't work as well, I haven't tried them, but there are others. And they're newer, so don't have such extensive historical pages.

Large numbers of Wikipedia references (which relied on IA to prevent link rot) must be completely broken now.

tourmalinetaco
·
1 day ago
·
[ - ]

Torrents have a bad reputation due to malicious executables, I have never met someone who genuinely saw piracy as stealing, only as dangerous. In fact, stealing as a definition cannot cover digital piracy, as stealing is to take something away, and to take is to possess something physically. The correct term is copying, because you are duplicating files. And that’s not even getting into the cultural protection piracy affords in today’s DRM and license-filled world.

master-lincoln
·
19 hours ago
·
[ - ]

What does this have to do with torrents? If you get an executable from the internet it is widely known not to execute it if not trusted. You can get malicious executables from websites too.

If this is what people think we need to work on education...

tourmalinetaco
·
13 hours ago
·
[ - ]

Piracy also is not unique to torrents, and yet that was what GP used.

The average person, in my experience, can barely work a non-cellphone filesystem and actively stresses when a terminal is in front of them, especially for a brief moment. Education went out the window a decade ago.

ranger_danger
·
1 day ago
·
[ - ]

To me this is like saying you shouldn't use a knife because they are also used by criminals.

John_Cena
·
1 day ago
·
[ - ]

This kind of talk is simply modern politik-speak. I can't stand it and the people who fall for their deception. Stretch the truth to disarm the constituents

jonhohle
·
1 day ago
·
[ - ]

In what way? Torrents are used all over for content delivery. Battle.net uses a proprietary version of BitTorrent. It’s now owned by Microsoft. There’s many more legitimate uses as commented by many others.

Criminals using tools does not make the tools criminal.

TZubiri
·
1 day ago
·
[ - ]

It's a matter of numbers, if tens of thousands of criminals use tech X, and it has few genuine uses, it's going to be restricted.

This has precedent in illegal drug categorization, it's not just about the damage, but its ratio of noxious to helpful use.

master-lincoln
·
19 hours ago
·
[ - ]

This precedent is problematic I think. It seems like the populist way of addressing issues. Always just following the biggest outcry instead of the symptoms. Just because there are currently more illegitimate users for a a thing we shouldn't prevent legitimate uses I think. The ratio might just be skewed because in the legitimate world you grow your audience with marketing investing tons of capital, while for illegitimate use cases, the marketing is often just word of mouth because of features.

MichaelZuo
·
18 hours ago
·
[ - ]

That's how most countries work, if enough people raise a big enough fuss, it's restricted like dangerous drugs.

jonhohle
·
8 hours ago
·
[ - ]

Literally millions of people use it (whether they know it or not).

Societies should criminalize behavior and then (shocker!) enforce the laws! Let tools be tools.

Kerb_
·
1 day ago
·
[ - ]

That precedent was and still is legally used to federally regulate marijuana harsher than fentanyl, a precedent I strongly disagree with, so you'll have to forgive me for believing that the degree to which something causes harm matters more than the amount of misuse

TZubiri
·
17 hours ago
·
[ - ]

Marihuana ruins millions of young minds.

·
15 hours ago
·
[ - ]

AlienRobot
·
1 day ago
·
[ - ]

Give it a good reputation then.

What are some legal torrent trackers?

seam_carver
·
1 day ago
·
[ - ]

Humble Bundle. Various Linux iso

unleaded
·
1 day ago
·
[ - ]

archive.org to name one

boomboomsubban
·
1 day ago
·
[ - ]

That's debatable. Most of their torrents are for things under copyright, though any other decentralized archive would have the same problem.

tourmalinetaco
·
1 day ago
·
[ - ]

That’s a copyright problem. 99% of things made in the last 100 years fall under copyright.

ranger_danger
·
1 day ago
·
[ - ]

Except when their own employees publicly tell people not to worry about copyright and just upload stuff anyway, they make it their own problem.

trod123
·
1 day ago
·
[ - ]

and a good number of things that were going to pass into copyright were further extended to 2053.

ranger_danger
·
1 day ago
·
[ - ]

What is your definition of a legal torrent tracker? I was not aware there were even any illegal ones.

mikae1
·
1 day ago
·
[ - ]

> I was not aware there were even any illegal ones.

Depends on the jurisdiction. Remember what happened in the The Pirate Bay trial?

ranger_danger
·
1 day ago
·
[ - ]

My understanding is that that court case did not show that operating a torrent tracker is illegal, but specifically operating a (any) service with the explicit intent of violating copyright... huge difference IMO.

To me that's not even related to it being a torrent tracker, just that they were "aiding and abetting" copyright infringement.

TZubiri
·
1 day ago
·
[ - ]

Ok. But what is the case law in hosting illegal content? Sure you may operate a torrent, but if your client is distributing child porn, in my view, you bear responsibility.

defrost
·
1 day ago
·
[ - ]

I'm backing ranger_danger here.

In Law the technicalities matter.

Trackers generally do not host any content, just hashcodes and (sometimes) meta data descriptions of content.

If "your" (ie let's say _you_ TZubiri) client is distributing child pornography content because you have a partially downloaded CP file then that's on _you_ and not on the tracker.

The "tracker" has unique hashcode signatures of tens of millions of torrents - it literaly just puts clients (such as the one that you might be running yourself on your machine in the example above) in touch with other clients who are "just asking" about the same unique hashcode signature.

Some tracker affiliated websites (eg: TPB) might host searchable indexes of metadata associated with specific torrents (and still not host the torrents themselves) but "pure" trackers can literally operate with zero knowledge of any content - just arrange handshakes between clients looking for matching hashes - whether that's UbuntuLatest or DonkeyNotKong

TZubiri
·
1 day ago
·
[ - ]

We agree in that if my client distributes illegal content, I am responsible, at least in part.

On the other hand I also believe that a tracker that hosts hashes of illegal content, provides search facilities for and facilitates their download, is responsible, in a big way. That's my personal opinion and I think it's backed in cases like the pirate bay and sci hub.

That 0 knowledge tracker is interesting, my first reaction is that it's going to end up in very nasty places like Tor, onion, etc..

defrost
·
1 day ago
·
[ - ]

> That 0 knowledge tracker is interesting,

Most actual trackers are zero knowledge.

A tracker (bit of central software that handles 100+ thousand connections/second) is not a "torrent site" such as TPB, EZTV, etc.

A tracker handshakes torrent clients and introduces peers to each other, it has no idea nor needs an idea that "SomeName 1080p DSPN" maps to D23F5C5AAE3D5C361476108C97557F200327718A

All it needs is to store IP addresses that are interested in that hash and to pass handfuls of interested IP addresses to other interested parties (and some other bookkeeping).

From an actual tracker PoV the content is irrelevant and there's no means of telling one thing from another other than size - it's how trackers have operated for 20+ years now.

Here are some actual tracker addresses and ports

    udp://tracker.opentrackr.org:1337/announce
    udp://p4p.arenabg.com:1337/announce
    udp://tracker.torrent.eu.org:451/announce
    udp://tracker.dler.org:6969/announce
    udp://open.stealth.si:80/announce
    udp://ipv4.tracker.harry.lu:80/announce
    https://opentracker.i2p.rocks:443/announce

Here's the bittorrent protocol: http://bittorrent.org/beps/bep_0052.html

Trackers can hand out .torrent files if asked (bencoded dictionaries that describe filenames, sizes, checksums, directory structures of a torrents contents) but they don't have to; mostly they hand out peer lists of other clients .. peers can also answer requests for .torrent files.

A .torrent file isn't enough to determine illegal content.

Pornography can be contained in files labelled "BeautifulSunset.mkv" and Rick Astley parody videos can frequently be found in files labelled "DirtyFilthyRepubicanFootTappingNudeAfrica.avi"

Given that it's not clear how trackers could effectively filter by content that never actually traverses their servers.

TZubiri
·
1 day ago
·
[ - ]

Oh ok, it seems to be a misconception of mine then.

Mathematically a tracker would offer a function that given a hash, it returns you a list of peers with that file.

While a "torrent site" like TPB or SH, would offer a search mechanism, whereby they would host an index, content hashes and english descriptors, along with a search engine.

A user would then need to first use the "torrent site" to enter their search terms, and find the hash, then they would need to give the hash to a tracker, which would return the list of peers?

Is that right?

In any case, each party in the transaction shares liability. If we were analyzing a drug case or a people trafficking case, each distributor, wholesaler or retailer would bear liability and face criminal charges. A legal defense of the type "I just connected buyers with sellers I never exchanged the drug" would not have much chance of succeding, although it is a common method to obstruct justice by complicating evidence gathering. (One member collects the money, the other gives the drugs.)

defrost
·
1 day ago
·
[ - ]

> A user would then need to first use the "torrent site" to enter their search terms, and find the hash, then they would need to give the hash to a tracker, which would return the list of peers?

> Is that right?

More or less.

> In any case, each party in the transaction shares liability.

That's exactly right Bob. Just as a telephone exchange shares liability for connecting drug sellers to drug buyers when given a phone number.

Clearly the telephone exchange should know by the number that the parties intend to discuss sharing child pornography rather than public access to free to air documentaries.

How do you propose that a telephone exchange vet phone numbers to ensure drugs are not discussed?

Bear in mind that in the case of a tracker the 'call' is NOT routed through the exchange.

With a proper telephone exchange the call data (voices) pass through the exchange equipment, with a tracker no actual file content passes through the trackers hardware.

The tracker, given a number, tells interested parties about each other .. they then talk directly to each other; be it about The Sky at Night -s2024e07- 2024-10-07 Question Time or about Debbie Does Donkeys.

Also keep in mind that trackers juggle a vast volume of connections of which a very small amount would be (say) child abuse related.

TZubiri
·
15 hours ago
·
[ - ]

Interesting. That's a good point.

I'll restate the principle of good usage to bad usage ratio, telephone providers are a well established service with millions of legitimate users and uses. Furthermore they are a recognized service in law, they are regulated, and they can comply with law enforcement.

They are closer to the ISP, which according to my theory has some liability as well.

It's just a matter of the liability being small and the service to society being useful and necessary.

To take a spin to a similar but newer tech, consider crypto. My position is that its legality and liability for illegal usage of users (considering that of exchanges and online wallets, since the network is often not a legal entity) will depend on the ratio of legitimate to ilegitimate use that will be given to it.

There's definitely a second system effect, were undesirables go to the second system, so it might be a semantical difference unrelated to the technical protocols. Maybe if one system came first, or if by chance it were the most popular, the tables would be turned.

But I feel more strongly that there's design features that make law compliance, traceability and accountability difficult. In the case of trackers perhaps the microservice/object is a simple key-value store, but it is semantically associated with other protocols which have 'noxious' features described above AND are semantically associates with illegal material.

defrost
·
6 hours ago
·
[ - ]

> I'll restate the principle of good usage to bad usage ratio, telephone providers are a well established service with millions of legitimate users and uses

Ditto trackers.

Have a look at the graphs here: https://opentrackr.org/

Over 10 million torrents tracked daily, on the order of 300 thousand connections per second, handshaking between some 200 million peers per week.

That's material from the Internet Archive, software releases, pooled filesharing, legitimate content sharing via embedded clients that use torrents to share load, and a lot of TV and movies that have variable copyright status

( One of the largest TV|Movie sharing sites for decades recent closed down after the sole operator stopped bearing the cost and didn't want to take on dubious revenue sources; that was housed in a country that had no copyright agreements with the US or UK and was entirely legal on its home soil.

Another "club" MVGroup only rip documentaries that are "free to air" in the US, the UK, Japan, Australia, etc. and in 20 years of publicaly sharing publicaly funded content haven't had any real issues )

> the ISP, which according to my theory has some liability as well.

The world's a big place.

The US MPA (Motion Picture Association - the big five) backed an Australian mini-me group AFACT (Australian Federation Against Copyright Theft) to establish ISP liability in a G20 country as a beach head bit of legislation.

That did not go well: Roadshow Films Pty Ltd v iiNet Ltd decided in the High Court of Australia (2012) https://en.wikipedia.org/wiki/Roadshow_Films_Pty_Ltd_v_iiNet...

    The alliance of 34 companies unsuccessfully claimed that iiNet authorised primary copyright infringement by failing to take reasonable steps to prevent its customers from downloading and sharing infringing copies of films and television programs using BitTorrent.

That was a three strikes total face plant:

    The trial court delivered judgment on 4 February 2010, dismissing the application and awarding costs to iiNet.

    An appeal to the Full Court of the Federal Court was dismissed.

    A subsequent appeal to the High Court was unanimously dismissed on 20 April 2012.

It set a legal precedent:

    This case is important in copyright law of Australia because it tests copyright law changes required in the Australia–United States Free Trade Agreement, and set a precedent for future law suits about the responsibility of Australian Internet service providers with regards to copyright infringement via their services.

It's also now part of Crown Law .. ie. not directly part of the core British Law body, but a recognised bit of Commonwealth High Court Law that can be referenced for consideration in the UK, Canada, etc.

> but it is semantically associated with other protocols which have 'noxious' features described above AND are semantically associates with illegal material.

Gosh, semantics hey. Some people feel in their waters that this is a protocol used by criminals and must therefore by banned or policed into non existance?

Is that a legal argument?

GoblinSlayer
·
14 hours ago
·
[ - ]

Are you sure open.stealth.si is a zero knowledge tracker? Some trackers reject unregistered torrents.

defrost
·
6 hours ago
·
[ - ]

The list I gave was of some public trackers, I made no claim that they were zero knowledge trackers, I simply made a statement that trackers needn't be aware of .torrent file manifests in order to share peer lists.

I also indicated above that having knowledge of .torrent manifests is problematic as that doesn't provide real actual knowledge of file contents just knowledge of file names ... LatestActionMovie.mkv might be a rootkit virus and HappyBunnyRabbits.avi might be the worst most exploitative underage pornography you can think of.

Some trackers are also private and require membership keys to access.

I was skating a lot as TZubiri seems unaware of many of the actual details and legitimate use cases, existing law, etc.

ranger_danger
·
1 day ago
·
[ - ]

I don't think TPB ever hosted any copyrighted content, even indirectly by its users. Torrent peers do not ever send any file contents through the tracker.

AlienRobot
·
1 day ago
·
[ - ]

A tracker that only tracks legal torrents, e.g. free software, OCRemix content, etc.

boomboomsubban
·
1 day ago
·
[ - ]

https://linuxtracker.org/ http://www.publicdomaintorrents.info/ https://ocremix.org/torrents

TZubiri
·
1 day ago
·
[ - ]

How would you keep the definition of legality without a centralizing authority?

AlienRobot
·
1 day ago
·
[ - ]

A tracker is a centralized authority.

master-lincoln
·
19 hours ago
·
[ - ]

But legality doesn't have a central authority. What is illegal in one jurisdiction is ok in another

AlienRobot
·
14 hours ago
·
[ - ]

Just track things that are legal everywhere or in most jurisdictions then.

ranger_danger
·
1 day ago
·
[ - ]

I don't see how that would be enforceable. Policy perhaps, but it would be impossible to absolutely prevent it from being used for that purpose IMO.

immibis
·
1 day ago
·
[ - ]

Keep in mind the IA archives a lot of garbage. If it could be more focused it would be more likely to work.

Blackthorn
·
1 day ago
·
[ - ]

The IA only works because it archives everything. You don't know what you need until you need it.

Spooky23
·
1 day ago
·
[ - ]

Archives generally purposefully don’t have a strong editorial streak. My trash is your treasure.

immibis
·
19 hours ago
·
[ - ]

They have to if they don't want to use infinite space.

db48x
·
1 day ago
·
[ - ]

The attempts have actually been focused on specific types of content, such as historical videos.

unleaded
·
1 day ago
·
[ - ]

personally I love all the random crap on IA!

MattPalmer1086
·
1 day ago
·
[ - ]

Lots of Copies Keeps Stuff Safe

https://www.lockss.org/

This is a brilliant system relying on a randomised consensus protocol. I wanted to do my info sec dissertation on it, but its security model is extremely well thought out. There wasn't anything I felt I could add to it.

ChadNauseam
·
1 day ago
·
[ - ]

I wish IPFS wasn't so wasteful with respect to storage. I tried pinning a 200mb PDF on IPFS and doing so ended up taking almost a gigabyte of disk space altogether. It's also relatively slow. However its implementation of global deduplication is super cool – it means that I can host 5 pages and you can host 50, and any overlap between them means we can both help one another keep them available even if we don't know about one another beforehand.

For a large-scale archival project, it might not be ideal. Maybe something based on erasure coding would be better. Do you know how LOCKSS compares?

diggan
·
1 day ago
·
[ - ]

> I tried pinning a 200mb PDF on IPFS and doing so ended up taking almost a gigabyte of disk space altogether

Was that any file in particular? I just tried it myself with a 257mb PDF (as reported by `ls -lrth`) and doesn't seem to add that much overhead:

    $ du -sh ~/.ipfs
    84K     /home/user/.ipfs

    $ ipfs add ~/Downloads/large\ PDF\ File.pdf
    added QmSvbEgCuRNZpkKyQm6nA5vz5RTHW1nxb6MJdR4cZUrnDj large PDF File.pdf
     256.58 MiB / 256.58 MiB [============] 100.00%

    $ du -sh ~/.ipfs
    264M    /home/user/.ipfs

TZubiri
·
1 day ago
·
[ - ]

High Costs Makes Lots of Copies Unfeasible

MattPalmer1086
·
1 day ago
·
[ - ]

That was actually one of the key constraints in the LOCKSS system, since it was designed to be run by libraries that don't have big budgets.

The design is really very good.

Kinrany
·
1 day ago
·
[ - ]

Is there a high level explanation of the model?

__MatrixMan__
·
1 day ago
·
[ - ]

To make the web distributed-archive-friendly I think we need to start referencing things by hash and not by a path which some server has implied it will serve consistently but which actually shows you different data at different times for a million different reasons.

If different data always gets a different reference, it's easy to know if you have enough backups of it. If the same name gets you a pile of snapshots taken under different conditions, it's hard to be sure which of those are the thing that we'd want to back up for that particular name.

Cheer2171
·
1 day ago
·
[ - ]

Done. It is called IPFS. The IA already supports it.

https://github.com/internetarchive/dweb-archive/blob/master/...

Groxx
·
1 day ago
·
[ - ]

Which has a rather lengthy section explaining why it's currently a failed experiment: https://github.com/internetarchive/dweb-archive/blob/master/...

(this doc is 5-6 years old though, and I'm not sure what may have changed since then)

In my own (toy-scale) IPFS experiments a couple years ago it has been rather usable, but also the software has been utterly insane for operators and users, and if I were IA I would only consider it if I budgeted for a from-scratch rewrite (of the stuff in use). Nearly uncontrollable and unintrospectable and high resource use for no apparent reason.

majorchord
·
1 day ago
·
[ - ]

IPFS has shown that the protocol is fundamentally broken at the level of growth they want to achieve and it is already extremely slow as it is. It often takes several minutes to locate a single file.

diggan
·
1 day ago
·
[ - ]

The beauty is that IA could offer their own distribution of IPFS that uses their own DHT for example, and they could allow only public read access to it. This would solve the slow part of finding a file, for IA specifically. Then the actual transfers tend to be pretty quick with IPFS.

What's the point of using IPFS then? Others can still spread the file elsewhere and verify it's the correct one, by using the exact same ID of the file, although on two different networks. The beauty of content-addressing I guess.

acdha
·
1 day ago
·
[ - ]

That isn’t solving the problem, it’s just giving them more of it to work on. IA has enough material that I’d be surprised if they didn’t hit IPFS’s design limits on their own, and they’d likely need to change the design in ways which would be hard to get upstream.

BlueTemplar
·
1 day ago
·
[ - ]

Several minutes sounds more than fine for this purpose ?

Especially if it's about having an Internet Archive backup.

Aachen
·
1 day ago
·
[ - ]

I think the point is that it's already slow at the current amount of data, let alone when you stuff dozens more PB into it

__MatrixMan__
·
1 day ago
·
[ - ]

Right, what I'm saying is that now we need to get the rest of the web (or at least the parts we want to keep) on board.

jonhohle
·
1 day ago
·
[ - ]

There was a startup called Space Monkey that sold NAS drives where you got a portion of the space and the rest was used for copies of other people’s content (encrypted). The idea was you could lose your device, plug in a new one and restore from the cloud. They ended up folding before any of their resilience claims could be tested (at least by me).

Would be people be willing to buy an IA box that hosted a shard of random content along with the things they wanted themselves?

mbirth
·
1 day ago
·
[ - ]

Does anyone remember wua.la? It worked similar in that you offered local disk space in exchange for cloud storage. It was later bought by LaCie and killed off shortly after.

gruez
·
1 day ago
·
[ - ]

What happens when the user base explodes (eg. due to this event), and a few months layer they all get bored and drop out?

scirob
·
21 hours ago
·
[ - ]

LigGen / SciHub are model citizens in this regard. Use the best tech for the job . Torrents + IPFS + simple http mirrors all over . While their data is big its not as big as Archive.org https://libgen.is/repository_torrent/ so I guess one still needs the funding to go to these decentralized nodes and they are there mostly for resiliancy

TZubiri
·
17 hours ago
·
[ - ]

If so, why do they keep jumping from one dubious domain and TLD from another?

skeaker
·
12 hours ago
·
[ - ]

That's mentioned in the comment you're replying to as "simple HTTP mirrors." Those are vulnerable to takedowns but are kept up for the sake of easy accessibility to the public that doesn't know how to access any of the other options. Their availability has no real bearing on the availability of the other options beyond the tech savviness of the general public.

stavros
·
1 day ago
·
[ - ]

I designed a system where you could say "donate this spare 2 TB of my disk space to the Internet Archive" and the IA would push 2 TB of data to you. This system also has the property that it can be reconstructed if the IA (or whatever provider) goes away.

Unfortunately, when I talked to a few archival teams (including the IA) about whether they'd be interested in using it, I either got no response or a negative one.

whywhywhywhy
·
22 hours ago
·
[ - ]

Because the incentive will be archiving things they believe should be archived, you need the process to begin with what urls do you want to be archiving, then people will be incentivized for archiving the juicy stuff IA is used for and you just throw some stuff they didn't ask to archive in the remit of them storing what they want.

stavros
·
19 hours ago
·
[ - ]

I don't think that's necessarily true, I have a spare TB that I'd be glad to donate to the IA to store whatever they want in.

TZubiri
·
17 hours ago
·
[ - ]

Hosting something at a volunteer's drive, without any guarantees, is pretty useless. They can cease hosting, or have disk damage, and you lose data

stavros
·
16 hours ago
·
[ - ]

You're essentially saying that having an extra copy of data is equally as reliable as not having an extra copy of data. I would encourage you to think about this a bit more.

TZubiri
·
16 hours ago
·
[ - ]

For these parameters, yes.

If you have a raid, then you have 2 copies with like 99.99% availability and 5 mean time years to failure.

With a volunteer drive you have like ?% availability and ?% years to failure? You can't depend on it.

Also the average value of data is very low, you don't want to be making many copies of for no reason.

stavros
·
15 hours ago
·
[ - ]

That would mean that even with a million volunteer drives storing a file, you still wouldn't be able to depend on them, which is plainly wrong.

> Also the average value of data is very low, you don't want to be making many copies of for no reason.

The reason is that the value of that data is high to the archivist, since they want to preserve it.

TZubiri
·
13 hours ago
·
[ - ]

A million is out of the parameters of the case.

Realistically you won't get enough volunteer-storage to cover one IA. And even if you did, it wouldn't satisfy the mission requirements, which is to store reliably for decades all of the data.

stavros
·
13 hours ago
·
[ - ]

This isn't meant to be storage for IA, it's meant to be a distributed backup.

TZubiri
·
12 hours ago
·
[ - ]

Ah my bad, so it's not a replacement of IA. In that case it makes sense

stavros
·
10 hours ago
·
[ - ]

Yes, the idea is that this is a replacement for the torrents they make public. In case the IA goes away, we'll have this distributed dataset to fall back on.

TZubiri
·
9 hours ago
·
[ - ]

An archive of an archive

boramalper
·
21 hours ago
·
[ - ]

Is this open source or do you have any design docs? I love the idea and would love to learn more about it.

stavros
·
21 hours ago
·
[ - ]

The idea is that it'll be open source, I have a rough design doc here:

https://docs.google.com/document/d/1qKgIjUTef-I-BLWjn4sEIbYo...

I'll write up a more detailed article on it, though, it'll be good to at least have the doc public somewhere.

4gotunameagain
·
1 day ago
·
[ - ]

Why reinvent the wheel ?

There are so many proven distributed archiving systems, a lot of which are mentioned in these comments.

stavros
·
21 hours ago
·
[ - ]

What system of these allows me to donate a bunch of disk space to a provider of my choosing, without thinking about it afterwards?

oytis
·
1 day ago
·
[ - ]

We'll need to find even more people willing to expose themselves to legal threats and cyberattacks then.

trompetenaccoun
·
1 day ago
·
[ - ]

The legal side is a big issue, true. The simplest and best workaround that I'm aware of is how the Arweave network handles it. They leave it up to the individual what parts of the data they want to host, but they're financially incentivized to take on rare data that others aren't hosting, because the rarer it is the more they get rewarded. Since it's decentralized and globally distributed, if something is risky to host in one jurisdiction, people in another can take that job and vice versa. The data also can not be altered after it's uploaded, and that's verifiable through hashes and sampling. Main downside in its current form is that decentralized storage isn't as fast as having central servers. And the experience can vary of course, depending on the host you connect to.

As for technical attacks, I'm not an expert but I'd assume it's more difficult for bad actors to bring down decentralized networks. Has the BitTorrent network ever gone offline because it was hacked for example? That seems like it would be extremely hard to do, not even the movie industry managed to take them down.

Aachen
·
1 day ago
·
[ - ]

> decentralized storage isn't as fast as having central servers.

With the 30-second "time to first byte" speed we all know and love from IA, I'm pretty sure it'd only get faster when you're the only person accessing an obscure document on a random person's shoebox in Korea as compared to trying to fetch it from a centralised server that has a few thousand other clients to attend to simultaneously

jmb99
·
1 day ago
·
[ - ]

> decentralized storage isn't as fast as having central servers.

Depending on scale that’s not necessarily true. I find even today there are many services that cannot keep up with my residential fiber connection (3Gbps symmetrical), whereas torrents frequently can. IA in particular is notoriously slow when downloading from their servers, and even taking into account DHT time torrents can be much faster.

Now if all of their PBs of data were cached in a CDN, yeah that’s probably faster than any decentralized solution. But that will take a heck of a lot more money to maintain than I think is possible for IA.

Aachen
·
1 day ago
·
[ - ]

I collect, archive, and host data. Haven't gotten any threats or attacks. Not one. The average r/selfhosted user hiding their personal OwnCloud behind the DDoS maffia seems more afraid than one needs to be even for hosting all sorts of things publicly. I guess this fearmongering comes from tech news about breaches and DDoS attacks on organisations, similar to regular news impacting your regular worldview regardless of how it's actually going in the world or how things personally affect you

trod123
·
1 day ago
·
[ - ]

Its not a problem until it suddenly is, and by the time it becomes a problem its too late. Its not fear mongering, its risk management and the laws are draconian and fail fundamental basis for a "rule of law", we have a "rule by law".

Aachen
·
16 hours ago
·
[ - ]

And that's how you get preppers: it's a remote problem until it's too late, let's prepare for every eventuality before it's suddenly too late!

Risk management is a balance, not fearmongering as you say. That's why I'd rather use advice from people with daily experience than look at the newsworthy experiences ("nothing happened today, again; regular security patches working fine" you'll never see) and conclude you'd attract threats and cyber attacks just by hosting backup copies of parts of the Internet Archive

NelsonMinar
·
1 day ago
·
[ - ]

Is anyone using ArchiveBox regularly? It's a self-hosted archiving solution. Not the ambitious decentralized system I think this comment is thinking of but a practical way for someone to run an archive for themselves. https://archivebox.io/

jumpingscript
·
23 hours ago
·
[ - ]

I am self-hosting ArchiveBox through yunohost, for the odd blog article I come across and like. Not a heavy user per se, but it's doing its thing reliably.

bigiain
·
1 day ago
·
[ - ]

@nikisweeting the dev of archivebox was active in a thread about out here last week.

https://news.ycombinator.com/item?id=41860909

I'd never heard of it, but their responses to question and comments in that thread were really really good (and I now have "install and configure archivebox on the media server" on my upcoming weekend projects list).

sersi
·
16 hours ago
·
[ - ]

If there was a way, I could dedicate 10 TB to IA and have the system automatically use those 10TB to store whatever is best I'd love that.

Right now there are torrents and I do keep any torrents I download from IA in my client for years but torrents means I only get to contribute by sharing the things I downloaded in the past.

hacknewslogin
·
11 hours ago
·
[ - ]

This sounds like a task that could be taken care of by public libraries. That, and supporting Tor. I just don't see it happening anytime soon, at least in the US.

TechSquidTV
·
1 day ago
·
[ - ]

This has really shown that the be true. I am stuck in a situation right now where I have some lost media I want to upload but they have been down for over a week. I plan to create a torrent in the meantime but that means relying on my personal network connection for the vast majority of downloads up front. I looked into CloudFlare R2, not terrible but not free either.

I was looking into using R2 as a web seed for the torrent but I don't _really_ want to spend much to upload content that is going to get "stolen" and reuploaded by content farms anyway you know?

tourmalinetaco
·
1 day ago
·
[ - ]

Why not subscribe to a seedbox? They’re about $5/2TB/mo. It protects your IP, you can buy for only the month, and since seedboxes are hosted in DMCA-resistant data centers you can download riskier torrents lightning fast, meaning you’re not just spending money for others, you can get something out of it too.

bigiain
·
1 day ago
·
[ - ]

Any hints or recommendations on how to find a decent seedbox vendor? (working email in profile if you'd rather not name any in public)

tourmalinetaco
·
1 day ago
·
[ - ]

I’ve only used r/Seedboxes on Reddit, and that’s yet to fail me. The specific one I mentioned is EvoSeedBox’s $5/mo tier with 130GB HDD + 2TB bandwidth which is all I’ve needed so far.

gruez
·
1 day ago
·
[ - ]

2TB of bandwidth or storage?

tourmalinetaco
·
1 day ago
·
[ - ]

Bandwidth, though some provide multi-TB storage (I assume you pay out the nose however).

Cheer2171
·
1 day ago
·
[ - ]

You say this as if the IA is not already deeply invested in the DWeb movement. If you go to a DWeb event in the Bay Area, there is a good chance it will be held at the IA.

sschueller
·
1 day ago
·
[ - ]

Yes, I was quite shocked when I found out that all their DCs are within driving distance.

absence5875
·
1 day ago
·
[ - ]

[dead]

patcon
·
1 day ago
·
[ - ]

The internet archive shepherded the early https://getdweb.net/ community, and works with groups like IPFS, so they're well aware and offering operational support to decentralized storage projects. This has been going since at least 2016 when I was involved in some projects involving environmental data archiving during the Trump transition

johndhi
·
1 day ago
·
[ - ]

Ipfs

delfinom
·
1 day ago
·
[ - ]

Yea so, who pays for the decentralized storage long term? What happens when someone storing decentralized data decides to exit? Will data be copied to multiple places, who is going to pay for doubling, tripling or more the storage costs for backups?

Centralized entities emerge to absorb costs because nobody else can do it as efficiently alone.

jmb99
·
1 day ago
·
[ - ]

At the moment, IA stores everything, and I imagine that most people are picturing a scenario where the decentralized data is in addition to IA's current servers. At least, that's the easiest bootstrapping path.

>What happens when someone storing decentralized data decides to exit?

They exit, and they no longer store decentralized data. At the very least, IA would still have their copy(s), and that data can be spread to other decentralized nodes once it has been determined (through timeouts, etc) that the person has exited.

> Will data be copied to multiple places[...]?

Ideally, yes. It is fairly trivial to determine the reliability of each member (uptime + hash checks), and reliable members (a few nines of uptime and hash matches) can be trusted to store data with fewer copies while unreliable members can store data with more copies. Could also balance that idea with data that's in higher demand, by storing hot data lots of times on less reliable members while storing cold data on more reliable members.

> who pays for the decentralized storage long term? [...] who is going to pay for doubling, tripling or more the storage costs for backups?

This is unanswered for pretty much any decentralized storage project, and is probably the only important question left. There are people who would likely contribute to some degree without a financial incentive, but ideally there would be some sort of reward. This in theory could be a good use for crypto, but I'd be concerned about the possible perverse incentives and the general disdain the average person has for crypto these days. Funding in general could come from donations received by IA, whatever excess they have beyond their operating costs and reserve requirements - likely would be nowhere near enough to make something like this "financially viable" (i.e. profitable) but it might be enough to convince people who were on the fence to chip in few hundred GB and some bandwidth. This is an open question though, and probably the main reason no decentralized storage project has really taken off.

rootsudo
·
1 day ago
·
[ - ]

I watched a hbo series about this once, I think it was called Pied Piper.

·
16 hours ago
·
[ - ]

throwaway984393
·
1 day ago
·
[ - ]

[dead]

sksxihve
·
1 day ago
·
[ - ]

There's no real financial incentive for people to archive the data as a singular entity so even less for a distributed collection. Also it's probably easier to fund a single entity sufficiently so they can have security/code audits than a bunch of entities all trying to work together.

riiii
·
1 day ago
·
[ - ]

Some people are motivated by more than just financial incentive.

sksxihve
·
1 day ago
·
[ - ]

That's true, but something like archiving the internet is very costly, IA has an annual budget in the tens of millions.

trompetenaccoun
·
1 day ago
·
[ - ]

Yes, it's a good point. Though they could take that money and reward people for hosting the data as well, couldn't they? They don't have to be in charge of hosting.

sksxihve
·
1 day ago
·
[ - ]

Yes, they could, that's not much different than a single company distributing the archive to multiple storage centers though. My original comment was about it being more cost effective for a single company to do that than coordinating with a bunch of disjoint entities.

trompetenaccoun
·
1 day ago
·
[ - ]

Our digital memory shouldn't be in the hands of a small number of organizations in my view. You're right about cost effectiveness. There are pros and cons to both but it's not just external threats that have to be considered.

History has always gotten rewritten throughout time. If you have a giant library it's easier for bad actors to gain influence and alter certain books, or remove them. This isn't just theoretical, under external pressure IA has already removed sites from its archive for copyright and political reasons.

There are also threats that are generally not even considered because they happen with rare frequency, but when they happen they're devastating. The library of Alexandria was burned by Julius Caesar during a war. Likewise, if all your servers are in one country that geographic risk, they can get destroyed in the event of a war or such. No one expects this to happen today in the US, but archives should be robust long term, for decades, ideally even centuries.

delfinom
·
1 day ago
·
[ - ]

>Our digital memory shouldn't be in the hands of a small number of organizations in my view.

I would wager at least 95% of "digital memory" archived is just absolute garbage from SEO spam to just some small websites holding no actual value.

The true digital memory of the world is almost entirely behind the walls of reddit, twitter, facebook, and very few other sites. The internet landscape has changed massively from the 90s and 2000s.

sprkwd
·
21 hours ago
·
[ - ]

We are currently in the middle of an information dark age and not many people have realised this yet.

·
1 day ago
·
[ - ]

BlueTemplar
·
1 day ago
·
[ - ]

So, about $0.01 per person per year ?

We are talking about an (almost) worldwide archive after all.

steeeeeve
·
10 hours ago
·
[ - ]

To everyone who wants a better alternative to IA, who thinks they have a different solution, who thinks it should be run by a different organization, etc.

Nobody has ever stopped a competitive alternative from existing. Feel free to give it a shot. You have a head start with all the work that they've done and shared.

rbanffy
·
23 hours ago
·
[ - ]

What kind of vandal attacks a library? We really need to find the people responsible.

Shank
·
18 hours ago
·
[ - ]

These kinds of internet attacks happen all the time to children who host Minecraft servers, small businesses, amateur programmers, etc. Finding the responsible party is often difficult if not impossible, and usually the FBI and other powers-that-be have bigger fish-to-fry (i.e., drug traffickers, cyber criminals who focus on extortion, etc) and are unable to devote resources to finding these types of attackers.

The sad reality is that a lot of people are unfairly attacked on the internet and many go unpunished due to lack of investigative focus, resources, etc.

gomizari
·
5 hours ago
·
[ - ]

Who profits from this attack? Who forced to remove books from IA?

gweinberg
·
1 day ago
·
[ - ]

Does anyone know who is targeting the Internet Archive, and why? I get the impression the attacks are too sophisticated for it to just be vandal punks.

ogurechny
·
1 day ago
·
[ - ]

I get the impression it's just pissing into the salt shaker. Internet Archive is obviously held together by duct tape (okay, okay, strong and durable duct tape) and personal willpower. Moreover, its main mission is spreading data, not hiding it from others to generate revenue.

Those who don't get the salt shaker bit, here's the original of the ancient wisdom:

https://web.archive.org/web/20060619131835/http://xelios.liv...

Choose any translation:

https://malaya-zemlya.livejournal.com/697779.html

https://personal-view.com/talks/discussion/25915/humor-hacke...

https://www.linkedin.com/pulse/hacker-restaurant-alexander-s...

exadeci
·
1 hour ago
·
[ - ]

Maybe some WhiteHat hackers telling IA about all their security issues being ignored so they hacked them to force them to do something about it as nothing seems that have been badly damaged

lolinder
·
1 day ago
·
[ - ]

> I get the impression the attacks are too sophisticated for it to just be vandal punks.

What gives that impression? Everything I've seen about the attacker's messaging says "vandal punk(s)" to me, and nothing in what I've seen of the IA's systems screams Fort Knox. It wouldn't surprise me if they actually had a pretty lax approach to security on the assumption that there's very little reason to target them.

dokyun
·
1 day ago
·
[ - ]

The group that claimed to be responsible for the first hack was said to be Russian-based, anti-U.S., pro-Palestine, and their reasoning for the attack was because of IA's violation of copyright....

I think you should draw your own more informed conclusions, but it smells a lot like feds to me.

MathMonkeyMan
·
1 day ago
·
[ - ]

What do Palestine, Russia, and the U.S. have to do with the Internet Archive? The Internet Archive is a supremely boring target politically.

71bw
·
21 hours ago
·
[ - ]

>supremely boring target politically.

Oh how wrong you are.

There is nothing boring in a target that can be used to validate others' lies and potential hypocrisy, changes in their policies etc.

The wayback machine itself serves as a truly priceless way to go back through someone's public life.

shiroiushi
·
1 day ago
·
[ - ]

He's pointing out that it doesn't make any sense: why would someone pro-Russian and anti-US care about violating western IP? In reality, it's the opposite: Russia is happy to help with that because they think it helps weaken the west.

small_scombrus
·
1 day ago
·
[ - ]

That's the point they're making. It's such a seeming non-sequitur that people are suspicious and coming up with fun theories.

elihu
·
19 hours ago
·
[ - ]

It seems plausible to me that someone is just trying to exacerbate political/social conflict. Russia has been known to do such things, and they often take multiple sides just to sow chaos. Self-identifying as Palestinian activists would be an effective way to get more people mad that otherwise wouldn't care.

This is also right before a major U.S. election, and that might not be a coincidence. Someone might be trying to get Trump elected by drawing attention to the Israel-Gaza conflict, a topic that isn't exactly a winning issue for Harris.

That's just one possibility of many though. I mean, might just be regular assholes that happen to also be pro-Palestinian. Or pro-Hamas, which isn't necessarily the same thing.

whywhywhywhy
·
22 hours ago
·
[ - ]

> The Internet Archive is a supremely boring target politically.

I mean it's where we go to prove a politician said something after they deleted it or where a government changes the wording of something...

I'd argue it's one of the juicer political targets if you're actually wanting to do something.

steffanA
·
18 hours ago
·
[ - ]

The people who DDoSed Internet Archive never claimed to be behind the breach. That was some media companies who misreported this.

The breach happened over a week before the DDoS attack, according to Troy Hunt.

Stop looking for conspiracy theories.

polytely
·
1 day ago
·
[ - ]

With the amount of comments calling for a leadership change my tinfoilhat theory is that this is a concerted effort to get a leadership change.

steffanA
·
18 hours ago
·
[ - ]

Nothing sophisticated about this attack.

·
1 day ago
·
[ - ]

xyst
·
1 day ago
·
[ - ]

Is it sophisticated if IA leaves the door wide open? I blame shit leadership.

jrm4
·
1 day ago
·
[ - ]

It strikes me as reasonable to assume (or at least strongly bet on) -- I'm not sure of the right phrase for it -- but like a mercenary type operation on behalf of some larger old media company?

There's just too much "means, motive and opportunity" there.

_fat_santa
·
1 day ago
·
[ - ]

I don't know what their funding model looks like but if they have some cash I'd say hiring a security team would be on top of the list of things to invest in.

brendoelfrendo
·
1 day ago
·
[ - ]

I believe that, at this point in time at least, IA's funding model consists of sweating profusely while awaiting a colossal legal judgement.

notmysql_
·
1 day ago
·
[ - ]

I sent them a resume almost a year ago, and got nothing back in response until yesterday. Looks like they are going through their backlog right now to find more hands.

TZubiri
·
1 day ago
·
[ - ]

Interesting, for a security position?

notmysql_
·
1 day ago
·
[ - ]

It was a while ago, I think it was for their general position option, though I did talk about sec experience in it

myself248
·
1 day ago
·
[ - ]

I'd like to imagine a world where every lawyer, when their case is helped by a Wayback Machine snapshot of something, flips a few bucks to IA. They could afford a world-class admin team in no time flat.

thaumasiotes
·
1 day ago
·
[ - ]

That's a terrible solution. The Wayback Machine takes down their snapshots at the request of whoever controls the domain. That's not archival.

If the state of a webpage in the past matters to you, you need a record that won't cease to exist when your opposition asks it to. This is the concept behind perma.cc.

db48x
·
1 day ago
·
[ - ]

No, they don’t delete the archived content. When the domain’s robots.txt file bans spidering, then the Wayback Machine _hides_ the content archived at that domain. It is still stored and maintained, but it isn’t distributed via the website. The content will be unhidden if the robots.txt file stops banning spiders, or if an appropriate request is made.

speerer
·
1 day ago
·
[ - ]

In some cases they do appear to delete, on request.

edit: "Other types of removal requests may also be sent to info@archive.org. Please provide as clear an explanation as possible as to what you are requesting be removed for us to better understand your reason for making the request.", https://help.archive.org/help/how-do-i-request-to-remove-som...

db48x
·
1 day ago
·
[ - ]

Nope. Nothing is deleted, just hidden.

rascul
·
1 day ago
·
[ - ]

How do you know?

db48x
·
1 day ago
·
[ - ]

I worked there for a short while.

bombcar
·
1 day ago
·
[ - ]

So if the Internet Archive accidentally archived child porn, they wouldn’t delete it?

I suspect they DO delete some things.

db48x
·
1 day ago
·
[ - ]

Don't be asinine; of course there are exceptions. But the general rule is that nothing is deleted. Even if you have a fancy expensive lawyer send them a C&D letter asking them to delete something or else, they’ll just hide it. You can’t tell the difference from the outside. In fact there are monitoring alarms that are triggered if something _is_ deleted.

thimabi
·
1 day ago
·
[ - ]

Claiming to have deleted something while just having hidden from public view… that’s basically begging content owners to sue and very easily win damages.

db48x
·
1 day ago
·
[ - ]

Copyright only regulates the distribution of copies of copyrighted works. Possessing copies and distributing copies to other people are two different things.

If you were photocopying a textbook and giving it to your classmates, the publisher could have their lawyer send you a Cease and Desist letter telling you to stop (or else). But if they told you to burn your copy of the textbook then they would be overreaching, and everyone would laugh at them when you took that story to the papers.

Legal reasoning from made‐up examples is generally a bad idea, but I think you can safely reason from that one.

I’m not privy to the actual communications in these cases, but I suspect that instead of replying back with “we deleted the content from the Archive”, they instead say something anodyne like “the content is no longer available via the Wayback Machine”. Smart lawyers will notice the difference, but then a smart lawyer wouldn’t have expected anything else.

jpc0
·
19 hours ago
·
[ - ]

I'm not going to look up legal precedent, hire a lawyer if you want that.

You are wrong, copyright specifically prohibits copying, not distribution. They can get a cease and desist that requests you destroy property and they ca get a court order backing that which will put you into contempt of court if you fail to do so.

Proving damages is easier with distribution, but that is a civil matter not a criminal matter.

db48x
·
18 hours ago
·
[ - ]

They’re not making copies either.

jpc0
·
9 hours ago
·
[ - ]

So they aren't making copies? How then do they have an archive of internet resources if not by copying said resources?

You do realise the "downloading" is implicitly a copy.

If you want to actually have a civil discussion then you need to make some reasonable argument than "They're not making copies either."

Sounds like whatever role you played at IA when you were there didn't give you any actual insight into what happens in operation and you simply tried to prove your point with an appeal to authority instead of backing it with facts and reason.

db48x
·
8 hours ago
·
[ - ]

Ahem. We are discussing items which have been hidden. A hidden item which users of the archive cannot access is not being copied. It’s just sitting there on a drive. Occasionally an automated process comes along and computes its hash to make sure there hasn’t been any bitrot. There’s no warehouse of undistributed copies of the thing that a court can order the Archive to destroy.

thaumasiotes
·
22 hours ago
·
[ - ]

> Legal reasoning from made‐up examples is generally a bad idea

What? That's the only way to do legal reasoning, and as an obvious consequence it's how both lawyers and judges do it.

db48x
·
20 hours ago
·
[ - ]

I would be better to quote the actual text of the law than to make up a silly hypothetical on the spot, but that would be more work.

Even better would be to quote from some case where a judge has applied the law to actual events.

null0pointer
·
1 day ago
·
[ - ]

What’s the reasoning behind hiding content upon request? Doesn’t that defeat the purpose of archival?

My intuition would say there are 3 cases when content ceases to become available at the original site:

- The host becomes unable to host the content for some reason (bankruptcy, death, etc.) in which case I assume the archive persists.

- The host is externally required to remove the content (copyright, etc.) in which case I assume IA would face the same external pressure? But I’m not sure on that.

- The host/owner has a change of heart about publishing the content. This borders more on IA acting as reputation management on the part of the original host/owner. Personally I think this is hardest to defend but also probably the least common case. In this case I’d think it’s most often to hide something the original host doesn’t want the public finding out later, but that also seems to make it more valuable to be publicly available in the archive. Plus, from a historian/journalist perspective, it’s valuable to be able to track how things change over time, and hiding this from the public prevents that. Though to be honest I’m kind of in two minds here because on the other hand I’m generally of the opinion that people can grow and change, and we shouldn’t hold people to account for opinions they published a decade ago, for example. I’m also generally in favor of the right to be forgotten.

Would appreciate your thoughts here.

db48x
·
1 day ago
·
[ - ]

It’s all about copyright. Copyright law in the US gives a monopoly on distribution of copies of things (hand‐waving because the definitions are hard, basically artistic works) to their author. Of course authors usually delegate that right to their publisher for practical and financial reasons. There are some fair use exceptions, but this basically makes it illegal for anyone else to make and distribute copies of the author’s work. Again, hand‐waving because I don't want to have to write a dissertation.

When IA shows you what a website looked like in the past, they are reproducing a copyrighted work and distributing it to you. In some cases, perhaps many, this is fair use. IA cannot really know ahead of time which viewers would be exercising their fair use rights and which would not. Instead, IA just makes everything available without trying to guess whether the access would fall under fair use or not. That means that many times, possibly most of the time, IA is technically breaking the law by illegally distributing copies of copyrighted works.

But _owning_ a copy of a copyrighted work is never prohibited by copyright. It doesn’t matter how you got the copy either.

Therefore, pretty much any time someone asks for something to be hidden or removed on copyright grounds, they go ahead and hide it. They don’t bother to delete it though, because copyright doesn’t require them to. If a copyright holder asks for it to be deleted then they are overreaching, and should know that any sane person would object. But as far as I am aware IA doesn’t actually bother to object in writing; they just hide the content and move on.

This means that researchers can visit the archive in person and request permission to see those copies. For example if you are studying the history of artistic techniques in video games using emulated software on IA, you might eventually notice that all the games from one major publisher are missing (except iirc the original Donkey Kong, because they don’t actually own the copyright on that one). You could then journey to the Archive in person to see the missing material and fill in the gaps in your history. Or you could just ignore them entirely out of spite. This is no different than viewing rare books held by any library, or viewing unexhibited artifacts held by a museum, etc

null0pointer
·
22 hours ago
·
[ - ]

Thanks for the detailed response, very informative. This sounds similar to DMCA takedown requests, though I’m not knowledgable enough to know the distinction. It’s a shame that to view hidden archives one needs to visit the archive in person, but I guess if IA were to respond to email requests for such archives they would be guilty of breaking the same distribution rule. The major difference between the rare books or museum examples and content on IA is that the digital artifacts are infinitely reproducible and transportable so the physical visit required to view them seems totally unnecessary on its face.

It’s a shame that to be able to run an above-board _Internet_ Archive one needs to bend to the whim of anachronistic copyright law and forego all the benefits of the internet in the first place. This seems like it would inevitably mean that any _internet_ archive that is truly accessible over the _internet_ would be forced to operate illegally in a similar manner to SciHub.

I know I hold a rather strong opinion regarding copyright law (I’m not looking to debate it here as I know others hold different opinions which is totally fine), but IMHO copyright law has been a major blight on humanity at large and especially the internet. Major reform is in order at the very least, if not total abolishment.

db48x
·
8 hours ago
·
[ - ]

Yea, it’s pretty weird. There’s no technical reason for it, merely a legal one.

Raed667
·
1 day ago
·
[ - ]

They do delete entire domains from the archive upon request & proof of ownership.

db48x
·
1 day ago
·
[ - ]

Again, no they don’t. They just hide them.

DoctorOetker
·
20 hours ago
·
[ - ]

That distinction becomes nearly moot in lots of cases:

* it prevents victims from performing discovery (gathering evidence) before starting a trial or confiding to an expensive lawyer whose loyalty may turn out to systematically lie with the perpetrators or highest bidders.

* it prevents people who requested a snapshot (and thus know a specific URL with relevant knowledge) from proving their version of events to acquaintances, say during or after a court case in the event their lawyers just spin a random story instead of submitting the evidence as requested, since disloyal lawyer will have informed counterparty and counterparties will have requested "removal" of the page at IA, resulting in psychological isolation of the victim since victim can no longer point to the pages with direct and or supporting evidence.

Anyone with even basic understanding of cryptographic hashes and signatures would understand that:

1) for a tech-savvy entity (which an internet archival entity automatically is expected to be)

2) in the face changing norms and values (regardless of static or changing laws: throughout history violations were systematically turned a blind eye to)

3) given the shameless nature of certain entities, self-describing their criminal behavior on their commercial webpages

Any person understanding above 3 points concludes that such an archival company can impossibly assume some imaginary "middle ground" between:

A) Defender of truth and evidence, freedom fighter, human rights activist, so that humanity can learn from mistakes and crimes

B) Status quomplicit opressor of evidence

Because any imaginary hypothetical "middle ground" entity would quickly be inundated by legal requests for companies hiding their suddenly permanently visible crimes, and simultaneously for reinstantiations by victims pleading public access to the evidence.

Once we know its either A or B, and recalling point "tech savvy" (point 1), we can summarily conclude that a class A archival entity would helpfully assist victims as follows: don't just provide easy page archival buttons, but also provide attestations: say zip files of the pages, with an autogenerated legalese PDF, with hashes of the page and the date of observation, cryptographically signed by the IA. This way a victim can prove to police, lawyers, judges, or in case those locally work against them, prove to friends, family, ... that the IA did in fact see the information and evidence.

I leave it to the reader to locate the attestation package zips for these pages, in order to ascertain that the IA is a class A organization, and not a class B one.

wl
·
19 hours ago
·
[ - ]

They ignore robots.txt these days.

db48x
·
7 hours ago
·
[ - ]

Hmm. I’ll have to try to remember that :)

·
1 day ago
·
[ - ]

myself248
·
1 day ago
·
[ - ]

Ooo, excellent. Yes, hiding items is imperfect, but I understood that it was legally required or something. (IANAL and IDFK, TBH) I wonder how perma.cc gets around that.

berdario
·
1 day ago
·
[ - ]

I'm afraid that it just hasn't been tested in court yet.

I haven't read this paper yet, but...

https://www.tesble.com/10.1080/0270319x.2021.1886785

from the abstract:

> The article concludes that Perma.cc's archival use is neither firmly grounded in existing fair use nor library exemptions; that Perma.cc, its "registrar" library, institutional affiliates, and its contributors have some (at least theoretical) exposure to risk

It seems that the article is about copyright, but of course there are several other reasons that might justify takedown of content stored on perma.cc:

- Right to be forgotten... perma.cc might be able to ignore it, but could this lead to perma.cc being blocked by european ISPs

- ITAR stuff

- content published by entities recognized by $GOVERNMENT as terrorist organizations

- revenge porn

- CSAM

myself248
·
12 hours ago
·
[ - ]

So, precisely the same constraints that IA operates under, just perma.cc isn't big enough yet to have been forced to comply with them?

I'll hold my breath.

immibis
·
1 day ago
·
[ - ]

Most likely by breaking the law.

speerer
·
1 day ago
·
[ - ]

That's correct, but only for present evidence - what about the past evidence, that you didn't know you needed until it was too late? IA is broad enough to cover the past five times out of ten.

·
1 day ago
·
[ - ]

RcouF1uZ4gsC
·
1 day ago
·
[ - ]

The Library of Congress should be archiving the Internet and it should have the budget required to do so.

This is in line with its mission as the "Library of Congress". Being able to have an accurate record of what was on the Internet at a specific point in time would be helpful when discussing legislation or potential regulation involving the internet.

awkwardpotato
·
1 day ago
·
[ - ]

The Library of Congress does currently archive limited collections of the internet[0]. They have a blog post[1] breaking down the effort, currently it's 8 full time staff with a team of part time members. According to Wikipedia[2], it's built on Heritrix and Wayback which are both developed by the Internet Archive (blog post also mentions "Wayback software"). Current archives are available at: http://webarchive.loc.gov/

[0] https://www.loc.gov/programs/web-archiving/about-this-progra...

[1] https://blogs.loc.gov/thesignal/2023/08/the-web-archiving-te...

[2] https://en.m.wikipedia.org/wiki/List_of_Web_archiving_initia...

tokai
·
1 day ago
·
[ - ]

As awkwardpotato write they do. Many national libraries all over the word treat the internet as covered by their requirements of legal deposit, and crawl their respective TLD.

amai
·
18 hours ago
·
[ - ]

Somebody is trying hard to change the history of the internet.

wkat4242
·
1 day ago
·
[ - ]

Ouch. Once can happen, twice in a row...

fallingknife
·
1 day ago
·
[ - ]

Once makes the second time more likely. Shows you are a soft target.

udev4096
·
1 day ago
·
[ - ]

Is it the same email spoofing attack vector of zendesk which was disclosed last week?

steffanA
·
1 day ago
·
[ - ]

Article says API token was stolen in original breach.

·
1 day ago
·
[ - ]

nonamepcbrand1
·
1 day ago
·
[ - ]

Waiting for trufflehog and gitguardian vendors to come up with article, tweets on how their tools would have stopped this incident :sweatsmile:

nchmy
·
1 day ago
·
[ - ]

It's Matt Mullenweg trying to erase the vast records of his deranged megalomania.

butz
·
1 day ago
·
[ - ]

Is there any way IA could be mirrored in read-only mode, while security concerns are addressed?

trod123
·
1 day ago
·
[ - ]

Depends on the topology, my guess would be no though. Generally speaking, a compromise requires a lot of non-public work to be done in a very short time period. If they don't know how they were initially compromised (and you can't take attacker's word on things), simply throwing up another copy isn't going to fix the issue and often eggs them on to continue.

You basically have to re-perimeterize your topology with known good working security, and re-examine trusted relationships starting with a core group of servers and services, and then expanding outwards, ensuring proper segmentation along the way. Its a lot easier with validated zero trust configurations, but even then its a real pain (especially when there is a hidden flaw in your zero-trust config somewhere) and its very heavy on labor. Servers and services also need to ensure they have not deviated from their initial known desired states.

Some bad guys set traps in the data/services as timebombs, that either cross-polinate, or re-compromise later. There are quite a lot of malicious ****s out there.

·
1 day ago
·
[ - ]

999900000999
·
1 day ago
·
[ - ]

Do any organizations have a mirror of this?

Even if it's not publicly available...

excerionsforte
·
18 hours ago
·
[ - ]

How do you donate to them?

256_
·
16 hours ago
·
[ - ]

Qui bono?

anthk
·
1 day ago
·
[ - ]

The Internet Archive had legal gems such as the Jamendo Album Collection, a huge CC haven. Yes, most of it under NC licenses, but for non-commercial streaming radio with podcasts, these have been invaluable.

Do you know Nanowar? They began there.

Also, as commercal music has been deliberately dumbed down for the masses (in paper, not by cheap talking), discovering Jamendo and Magnatune in late 00's has been like crossing a parallel universe.

bowsamic
·
21 hours ago
·
[ - ]

Honestly I'm totally on the side of the hackers in all this. The IA is the most important thing on the internet and the fact it has such bad security is absolutely inexcusable. Thank you to the hackers for bringing attention to this

TheFreim
·
1 day ago
·
[ - ]

> "It's dispiriting to see that even after being made aware of the breach weeks ago, IA has still not done the due diligence of rotating many of the API keys that were exposed in their gitlab secrets," reads an email from the threat actor.

This is quite embarrassing. One of the first things you do when breached at this level is to rotate your keys. I seriously hope that they make some systemic changes, it seems that there were a variety of different bad security practices.

ghostly_s
·
1 day ago
·
[ - ]

IA is in bad need of a leadership change. The content of the archive is immensely valuable (largely thanks to volunteers) but the decisions and priorities of the org have been far off base for years.

washadjeffmad
·
20 hours ago
·
[ - ]

Hot take, but the intersection of people with sufficient LIS / archival experience to run the place and who can live under constant legal peril without capitulating to adversarial interests is probably, what, a hundred in the world?

I'd say they need support. They didn't abandon or pervert their mission, they relied on people they trusted who weren't equipped to also handle security. If your house were broken into, I wouldn't start a neighborhood petition for you to move out, because you didn't cause it.

They may be in a rut, but short of you or someone else building an IA replacement that settles all of your concerns and commiting to it for twenty five years with no serious compromises, you're probably punching a little above your weight on the topic.

fngjdflmdflg
·
1 day ago
·
[ - ]

Do you have any examples?

wkat4242
·
1 day ago
·
[ - ]

Putting the organisation at risk by playing chicken with large publishing corporations. Trying to stretch fair use a little too far so they had to go to court.

soygem
·
1 day ago
·
[ - ]

[flagged]

fngjdflmdflg
·
1 day ago
·
[ - ]

I don't believe IA itself takes down pages that kiwifarms archives/links to. Rather they get a request to take it down and comply with it (correct me if I'm wrong here). I think IA is actually in a tough spot on this issue because they might be able to be sued eg. for defamation if they don't take down pages with personal info after a request to do so is made. Lastly, I doubt any new leadership would be less harsh on kiwifarms.

dazhengca
·
1 day ago
·
[ - ]

There was no illegal content on kiwi farms. Even then, I’d say taking down a single page by request is understandable. However, they surrendered to the mob and chose to stop archiving the entire site. This was to censor any criticism of the people involved, but as a result, we lost all of the other information on the rest of the site as well. It’s clear this organization cannot handle pressure, and is relying on people treating it kindly.

shkkmo
·
1 day ago
·
[ - ]

They chose to stop serving archives of a site that had started explicitly using tham as a distribution mechanism to get around much a much broader attempt to censor them.

I'm curious what other information on that site you think was valuable to have available to the general public? Nothing has been lost in terms of historical data, it's only the immediate disemmination that has been slowed.

I'm really trying to understand why I should disagree with the IA's choice here. The IA is an archival service, not a distribution platform and it is not their job to help you distribute content that other people find objectionable. Their job is to make and keep an archive of internet content so that we don't lose the historical record. Blocking unrestricted public access to some of that content doesn't harm that mission and can even support it.

tylerchilds
·
1 day ago
·
[ - ]

the funny thing about the internet archive is that anyone else on this planet could do exactly what they are doing, but they consistently choose not to.

kiwifarms could spin up their own infrastructure, serve their own content for the world, but it turns out technology is a social problem more than a technical problem.

anyone that wants to stand up and be the digital backbone of “kiwi farms” can, but only the internet archive gets flack for not volunteering to be the literal kiwi farm.

for example, the pirate bay goes offline all the time, but it turns out the people that use it, care enough to keep it online themselves.

wkat4242
·
1 day ago
·
[ - ]

That's something I completely support. There's a limit and that site crosses it.

superkuh
·
1 day ago
·
[ - ]

It's the least worst option. Remember when that happened with Mozilla? Now they're an ad company. Take the bad (some bad mis-steps re:multiple lending during the pandemic, not rotating keys immediately after a hack) with the good (staying true to the human centric mission and not the money flows).

ranger_danger
·
1 day ago
·
[ - ]

[flagged]

absence5875
·
1 day ago
·
[ - ]

[dead]

echelon
·
1 day ago
·
[ - ]

I support archival of films, books, and music, but those items need to be write-only until copyright expires. The purpose of the Internet Archive is to achieve a wide-reaching, comprehensive archival, not provide easy and free read access to commercial works.

Website caches can be handled differently, but bulk collection of commercial works can't have this same public access treatment. It's crazy to think this wouldn't be a huge liability.

Battling for copyright changes is valiant, but orthogonal. And the IA by trying to do both puts its main charter--archival--at risk.

The IA should let some other entity fight for copyright changes.

I say this as an IA proponent and donor.

withinboredom
·
1 day ago
·
[ - ]

I'd agree with you if you live in a country where you can walk into your local library and read these for "free." For people who live where there may not even be a library, your argument makes no sense except to make the publishers richer. They typically price some of these books at "library prices" so normal people won't be able to afford them, but libraries will.

sieabahlpark
·
1 day ago
·
[ - ]

Copyright is copyright. If you don't like the idea of a publisher owning the rights to content they published doesn't mean you have a right to their content. Let alone worldwide distribution of that content.

What makes you feel entitled to the content of the publisher before the copyright expires? Do you feel that you deserve access to everything because you've deemed the concept of ownership around book publishing immoral?

You can't just take a digital copy of a physical book and give it to everyone worldwide. That isn't your choice or decision to make nor is it ethical to ascribe malice to simply retaining distribution rights to content they own.

"Make publishers richer", it's actually just honoring the concept of ownership...

withinboredom
·
1 day ago
·
[ - ]

I don’t like the idea of infinite ownership, which is the current problem of copyright. The public may never be able to own these ideas and build off of them. Further, just because you own something in one country doesn’t mean you can own it in another country. For a physical example, you can’t own a gun in the US and take it to Australia.

If publishers didn’t engage in tactics like “library pricing” and preventing people from actually purchasing the books, I might feel differently. Right now, I see this archiving stuff as a Robin Hood story (which fwiw, every version of this story you may have seen/heard is probably still copyrighted) and I hope the publishers die or are replaced.

echelon
·
17 hours ago
·
[ - ]

Protesting copyright and enabling pirate-like access to materials is orthogonal to archiving the world's data and recording our history.

Internet Archive should focus on its mission of archival. Let other groups figure out copyright.

By taking on both tasks, IA risks everything and could stumble in its goal to be an archivist platform. We need an entity dedicated to recording history. IA is that. They're just biting off way too much to chew and making powerful enemies along the way.

withinboredom
·
14 hours ago
·
[ - ]

And what, we are just supposed to trust that they're actually archiving these things instead of relaxing on a beach somewhere pretending they are? If they were to only focus on archiving, nobody would know if anything has actually been archived for nearly 100 years; after we are all dead.

By making the archive public, sure, we have a bit of a "piracy" issue. However, we can also verify they are actually archiving the things they say they are, point out mistakes, and ask them to remove things from the archive.

absence5875
·
1 day ago
·
[ - ]

> but bulk collection of commercial works can't have this same public access treatment

And it doesn't.

echelon
·
1 day ago
·
[ - ]

The Internet Archive Lending Library did. And there are music, movie, and video game ROMs found throughout the user uploads.

IA should collect these materials, but they shouldn't be playing fast and loose by letting everyone have access to them. That's essentially providing the same services as the Pirate Bay under the guise of archivism.

This puts IA at extreme legal risk. Their mission is too important to play such games.

giantrobot
·
1 day ago
·
[ - ]

> I support archival of films, books, and music, but those items need to be write-only until copyright expires.

Which means no one alive today would ever be able to see them out of copyright. It also requires an unfounded belief that major copyright owning companies won't extend copyright lengths beyond current lengths which are effectively "forever".

sieabahlpark
·
1 day ago
·
[ - ]

[dead]

galleywest200
·
1 day ago
·
[ - ]

>"It's dispiriting to see that even after being made aware of the breach weeks ago..."

These people are not dispirited whatsoever, if anything they are half-cocked that these script kiddies found an easy target.

chrisrhoden
·
1 day ago
·
[ - ]

The words came from a message written by the people you are calling script kiddies, rather than being editorializing by bleepingcomputer, as you seem to believe.

compootr
·
1 day ago
·
[ - ]

script kiddie or blackhat hacker is irrelevant. IA has shit security practices, and that's a fact regardless of who figures that out

EasyMark
·
1 day ago
·
[ - ]

I highly doubt they are script kiddies. More than likely they are state actors or mercenaries of state actors attempting to bring down the free transmittal of information between regular folks. IA evidently has not so good security and wikipedia must be doing pretty well I guess? I can’t recall the last time one of these attacks worked on Wiki.

luckylion
·
1 day ago
·
[ - ]

Why would they publicly call them out and lay open the way they breached them if they were "attempting to bring down the free transmittal of information between regular folks"?

They could have done much worse but they chose not to and instead made it public. Which state actor does that?

Aachen
·
1 day ago
·
[ - ]

Subtitling: half clocked means not fully prepared

tgsovlerkhgsel
·
1 day ago
·
[ - ]

There are many "first things" you need to do if breached, and good luck identifying and doing them all in a timely fashion if you're a small organization, likely heavily relying on volunteers and without a formal security response team...

absence5875
·
1 day ago
·
[ - ]

[dead]

·
1 day ago
·
[ - ]

badlibrarian
·
1 day ago
·
[ - ]

Restating my love for Internet Archive and my plea to put a grownup in charge of the thing.

Washington Post: The organization has “industry standard” security systems, Kahle said, but he added that, until this year, the group had largely stayed out of the crosshairs of cybercriminals. Kahle said he’d opted not to prioritize additional investments in cybersecurity out of the Internet Archive’s limited budget of around $20 million to $30 million a year.

https://archive.ph/XzmN2

semicolon_storm
·
1 day ago
·
[ - ]

In security, industry standard seems to be about the same as military grade: the cheapest possible option that still checks all the boxes for SOC.

EasyMark
·
1 day ago
·
[ - ]

Military grade has different meanings. I’ve worked in the electronics industry a long time and will say with confidence that the pcbs and chips we sent to the military were our best. Higher temperature ranges, much more thorough environmental testing, many more thermal and humidity cycles, lots more vibration testing. However we also sell them for 5-10x our regular prices but in much lower quantities. It’s a failed meme in many instances as the internet uses it though.

incahoots
·
1 day ago
·
[ - ]

Basically, whatever the liability insurance wants for you to be in compliance, than that’s the standard.

Spivak
·
1 day ago
·
[ - ]

Hot take, this is the way it should be. If you want better security then you update the requirements to get your certification.

Security by its very nature has a problem of knowing when to stop. There's always better security for an ever increasing amount of money and companies don't sign off on budgets of infinity dollars and projects of indefinite length. If you want security at all you have bound the cost and have well-defined stopping points.

And since 5 security experts in a room will have 10 different opinions on what those stopping points should be— what constitutes "good-enough" they only become meaningful when there's industry wide agreement on them.

abadpoli
·
1 day ago
·
[ - ]

There never will be an adequate industry-wide certification. There is no universal “good enough” or “when to stop” for security. What constitutes “good enough” is entirely dependent on what you are protecting and who you are protecting it from, which changes from system to system and changes from day to day.

The budget that it takes to protect against a script kiddy is a tiny fraction of the budget it takes to protect from a professional hacker group, which is a fraction of what it takes to protect from nation state-funded trolls. You can correctly decide that your security is “good enough” one day, but all it takes is a single random news story or internet comment to put a target on your back from someone more powerful, and suddenly that “good enough” isn’t good enough anymore.

The Internet Archive might have been making the correct decision all this time to invest in things that further its mission rather than burning extra money on security, and it seems their security for a long time was “good enough”… until it wasn’t.

db48x
·
1 day ago
·
[ - ]

Yep. And worse, now matter how much you pay for security it is still possible for someone to make a mistake and publish a credential somewhere public.

goodpoint
·
1 day ago
·
[ - ]

> since 5 security experts in a room will have 10 different opinions

If that happens you need to seriously rethink your hiring process.

gjsman-1000
·
1 day ago
·
[ - ]

This ^

We can’t all have the latest EPYC processors with the latest bug fixes using Secure Enclaves and homomorphic encryption for processing user data while using remote attestation of code running within multiple layers of virtualization. With, of course, that code also being written in Rust, running on a certified microkernel, and only updatable when at least 4 of 6 programmers, 1 from each continent, unite their signing keys stored on HSMs to sign the next release. All of that code is open source, by the way, and has a ratio of 10 auditors per programmer with 100% code coverage and 0 external dependencies.

Then watch as a kid fakes a subpoena using a hacked police account and your lawyers, who receive dozens every day, fall for it.

gjsman-1000
·
1 day ago
·
[ - ]

[flagged]

evilduck
·
1 day ago
·
[ - ]

No, it’s your demeanor that is unbecoming and not worth engaging with. Villianizing your poor behavior not successfully baiting people into replying as you want is childish too. Take a breather.

codezero
·
1 day ago
·
[ - ]

[dead]

mmooss
·
1 day ago
·
[ - ]

A non-grownup analysis is to criticize a decision in hindsight. If Internet Archive shifted funds to security, it would mean cutting something from its mission. Given their history, it makes sense IMHO to spend on the mission and take the risk. As long as they have backups, a little downtime won't hurt them - it's not a bank or a hospital.

badlibrarian
·
20 hours ago
·
[ - ]

Downtime aside, best practices for running a library generally include not leaking usernames, email addresses, and eight years of front desk correspondence.

They sell paid services to universities and governments, so downtime isn't a great look either.

> it's not a bank

They tried that too. Didn't go well.

https://ncua.gov/newsroom/press-release/2016/internet-archiv...

mmooss
·
13 hours ago
·
[ - ]

> best practices for running a library generally include not leaking usernames, email addresses, and eight years of front desk correspondence

That's incorrect IMHO: You are describing outcomes; practices are about procedures. In particular, necessary to the understanding and use of best practices is that do not guarantee outcomes.

Any serious management balances risks, which includes the inevitability, though unpredictable, of negative outcomes. It's impossible to prevent them - not NASA, airlines, surgeons, etc, can prevent them all, and they accept that.

It's a waste of resources to spend more preventing them than you lose overall. Best practices do not provide perfect outcomes; they provide the most reduced trade-offs in risk and cost.

·
1 day ago
·
[ - ]

pessimizer
·
1 day ago
·
[ - ]

The Internet Archive has a management problem. They seem to be more comfortable disrupting libraries than managing an online, publicly accessible database of disputed, disorganized material.

Despite all of the positive self-talk, I don't know if they realize how important they are, or how easy it would be for them to find good help and advice if their management were transparent and everything was debated in public. That may have protected it to some extent; as a counterexample, Wikipedia has been extremely fragile due to its transparency and accessibility to everyone. With IA being driven by its creator's ideology, maybe that ideology should be formalized and set in stone as bylaws, and the torch passed to people openly debating how IA should be run, its operations, and what it should be taking on.

I don't mean they should be run by the random set of Confucian-style libertarian aphorisms that is running the credibility of Wikipedia into the ground, but Debian is a good model to follow. Or maybe do better than both?

mrweasel
·
1 day ago
·
[ - ]

> Debian is a good model to follow.

While I have no idea how Debian is actually funded I'd agree. One issue might be that The Internet Archive actually need to have people on staff, not sure if Debian has that requirement. You're not going to get people to man scanner or VHS players 8 hours a day without pay, at least not at this scale.

The Internet Archive needs a better funding strategy that asking for money on their own site. People aren't visiting them frequently enough for that to work. They need a fundraising team, and a good one.

Finding managers are probably even worse. They can't get a normal CEO type person, because they aren't a company and the type of people who apply to or are attracted to running non-profit, server the community, don't be evil organisation are frequently bat-shit crazy.

chambers
·
13 hours ago
·
[ - ]

I believe the monastic model would fit best. Tireless, thankless work for the greater good. Maintaining old records for decades and centuries for the sake of beyond ourselves. I imagine people who join up would eschew both profit maximizing (for profit) and moral adventurism (non profit/Internet Archive). a real vocation, not a career.

Sadly, SQlite is the only software organization I know of that has this spirit.

badlibrarian
·
1 day ago
·
[ - ]

Don't forget the time Brewster tried to run a bank -- Internet Archive Federal Credit Union. Or that the physical archives are stored on an active fault line and unlikely to receive prompt support during an emergency. Or that, when someone told him that archives are often stored in salt mines he replied, "cool, where can I buy one?"

·
1 day ago
·
[ - ]

avazhi
·
1 day ago
·
[ - ]

https://www.wired.com/story/internet-archive-memory-wayback-...

I appreciate their ethos and I've used the site many times (and donated!), but clearly it's at the point where Kahle et al just aren't equipped either personally (as a matter of technical expertise) or collectively (they are just a handful of people) to be dealing with what are probably in many cases nation-state attacks. Kahle's attitude towards (and misunderstanding of) copyright law is IMO proof that he shouldn't be running things, because his legal gambles (gambles that a first year law student could have predicted would fail spectacularly) have put IA at long term risk (see: Napster). And this information coming out over the past few weeks about their technical incompetence is arguably worse, because the tech side of things are what he and his team are actually supposed to be good at.

It's true that Google and Microsoft and others should be propping up the IA financially but that isn't going to solve the IA's lack of technical expertise or its delusional hippie ethos.

kmeisthax
·
1 day ago
·
[ - ]

> Confucian-style libertarian aphorisms that is running the credibility of Wikipedia

Can you elaborate? I'm aware of Wikipedia having very particular rules and lots of very territorial editors, but I'm not sure how this runs their credibility into the ground aside from pissing off the far right when they come in with an agenda to push.

bn-l
·
1 day ago
·
[ - ]

With everything that’s going on, it’s highly suspicious that this is happening right after they upset some very rich rent seekers.

amai
·
18 hours ago
·
[ - ]

> they upset some very rich rent seekers

Who are "they"? And who are the "very rich rent seekers"?

·
1 day ago
·
[ - ]

·
16 hours ago
·
[ - ]

karlgkk
·
1 day ago
·
[ - ]

> With everything that’s going on, it’s highly suspicious that this is happening right after they upset some very rich rent seekers.

Absolutely moronic and unbased implication. The “rent-seekers” won their case and have zero interest in being implicated in dumb palace-intrigue style hacking. I mean, fuck those guys, but to bring up allegations like that is big stupid.

knighthack
·
1 day ago
·
[ - ]

> Absolutely moronic and unbased implication. The "rent-seekers" won their case and have zero interest in being implicated in dumb palace-intrigue style hacking. I mean, fuck those guys, but to bring up allegations like that is big stupid.

That makes no sense.

The fact that they won their case gives even greater cause in ensuring that what they want goes through. Doesn't mean they have to be classy about it, or that Internet-based means of sabotage are impossible implications (given that the IA literally is about putting things up on the Internet that some want to be taken down).

lolinder
·
18 hours ago
·
[ - ]

> The fact that they won their case gives even greater cause in ensuring that what they want goes through.

Which is why they will continue their attack through the court system until they get everything they want, up to and including shutting down the archive for good. There's zero reason for them to risk being implicated in a crime when their opponent is already down for the count.

ianeigorndua
·
1 day ago
·
[ - ]

You don’t think you’re being a bit harsh here?

meowface
·
1 day ago
·
[ - ]

Conspiracy theorists exhaust many.

alexey-salmin
·
1 day ago
·
[ - ]

A genuine question to commenters asking to "put a grownup in charge of the thing" and saying that "Kahle shouldn't be running things": he built the thing, why exactly he can't run it the way he sees fit?

et-al
·
1 day ago
·
[ - ]

He is. But at the cost of the greater good.

Most of us care mainly about the Wayback Machine and archiving webpages; not borrowing books still under copyright and fighting publishers.

TZubiri
·
1 day ago
·
[ - ]

Speak for yourself, the internet archive successfully increased its scope and made creative contributions to case law (although it lost at the appeals court)

badlibrarian
·
21 hours ago
·
[ - ]

Internet Archive certainly made creative arguments, all of which were soundly rejected under Summary Judgment. This had the opposite effect on the future we both want.

Under discovery in the case, it turned out that Internet Archive didn't keep accurate records of what they loaned out either. Another example of sloppy engineering that directly impacts their core mission.

The fate of the organization now rests on the outcome of other lawsuits. In one, Internet Archive argues that they are allowed to digitize and publish Frank Sinatra records because the pops and crackles on them makes it Fair Use.

If they did all this cleanly under a different LLC, I'd sit back and enjoy the show. But they didn't.

·
16 hours ago
·
[ - ]

carapace
·
1 day ago
·
[ - ]

> the greater good

(Hot Fuzz reference. https://www.youtube.com/watch?v=oQzrR6nOkYg )

pvg
·
1 day ago
·
[ - ]

A good place to direct that question might be in a reply to the person who made that comment.

Suzuran
·
18 hours ago
·
[ - ]

He can run it the way he sees, but the way he sees is intentionally self-defeating; Instead of prioritizing keeping the archive as viable and complete as possible, he favors an all-or-nothing approach where either the Archive has accumulated and made obsolete all other knowledge bases in the world, or the Archive has been destroyed utterly by external threats. He would rather be burned to death as the last curator of the library at Alexandria than try to save anything.

·
1 day ago
·
[ - ]

MarcoZavala
·
1 day ago
·
[ - ]

[dead]

throwaway984393
·
1 day ago
·
[ - ]

[dead]

black_13
·
1 day ago
·
[ - ]

[dead]

Ajedi32
·
17 hours ago
·
[ - ]

[flagged]

·
16 hours ago
·
[ - ]