They said it didn’t matter, because the volume of new data flowing in was growing so fast that the old data was just a drop in the bucket.
One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.
I expect rather than deleting stuff, they'll just crank up the compression on storage of videos that are deemed "low value."
Searching hn.algolia.com will yield numerous examples.
https://news.ycombinator.com/item?id=23758547
https://bsky.app/profile/sinevibes.bsky.social/post/3lhazuyn...
The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.
It might really just be cheaper to keep buying new HDDs.
They allow search by timestamp; I’m sure YouTube can write an algo to find videos with zero or <=1 views.
The YouTube Shorts thing is buggy as shit: it'll just stop working a lot of the time and won't load a video. Sometimes you have to go back and forth a few times to get it to load. It'll often desync the comments from the video, so you're seeing comments from a different video. Sometimes the sound from one short plays over the visuals of another.
It only checks for notifications when you open the website from a new tab, so if you want to see if you have any notifications you have to open youtube in a new tab. Refreshing doesn't work.
Seems like all the competent developers have left.
OTOH I'm 100.0% sure that Google has a plan, has been expecting this for years, and in particular has prior experience from free Gmail accounts being used for storage.
Hmmm, isn't the "free-ness" of YouTube because they were determined to outspend and outlast any potential competitors (i.e. supported by the Search business), in order to create a monopoly for then extracting $$$ from?
I'm kind of expecting the extracting part is only getting started. :(
Honestly, if you aren't taking full advantage of workarounds like this within the constraints of the law, you're basically losing money. Like not spending your entire per diem budget when on a business trip.
Which do you think has more value to me? (a) I save some money by exploiting the storage loophole. (b) A cultural repository of cat videos, animated mathematics explainers, and long video essays continues to be available to (some parts of) humanity (for the near future).
Anyway, in this situation it's less that YouTube is providing us a service and more that it's captured a treasure trove of our cultural output and sold it back to us. Siphoning back as much value as we can is ethical. If YouTube goes away, we'll replace it - PeerTube or other federated options are viable. The loss of the corpus of videos would be sad but not catastrophic - some of it is backed up. I have ~5TB of YouTube backed up, most of it smaller channels.
I agree generally with you that the word "value" is over-encompassing to the point of absurdity, though. Instrumental value is equated with moral worth, personal attachment, and the distribution of scarcity. Too many concepts for one word.
Exactly which countries could they buy?
Let me guess: you haven’t actually asked Gemini.
None of us, in the original discussion threads, knew of it being done before then IIRC.
> Encoding: Files are chunked, encoded with fountain codes, and embedded into video frames
Wouldn't YouTube just compress/re-encode your video and ruin your data (assuming you want bit-by-bit accurate recovery)?
If you have some redundancy to counter this, wouldn't it be super inefficient?
(Admittedly, I've never heard of "fountain codes", which is probably crucial to understanding how it works.)
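For what it's worth, the core idea of a fountain code is that the encoder emits an effectively unlimited stream of symbols, and any sufficiently large subset of intact symbols reconstructs the original data - so frames that the re-encode mangles are simply discarded rather than repaired, which is what the redundancy is for. A toy LT-code-style sketch in Python (my own illustration, not the linked tool's actual encoder; the chunk size and uniform degree distribution are made up - real codes use a robust soliton distribution):

    # Toy LT-code-style encoder: each output symbol XORs a random subset of
    # source chunks. Any ~N+epsilon intact symbols let a decoder recover the
    # N chunks, so symbols lost to re-encoding are simply dropped.
    import os, random

    CHUNK = 1024  # hypothetical chunk size

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(chunks, n_symbols, seed=42):
        rng = random.Random(seed)  # decoder re-derives the chunk choices from the seed
        symbols = []
        for _ in range(n_symbols):
            degree = rng.randint(1, len(chunks))           # toy degree distribution
            idxs = rng.sample(range(len(chunks)), degree)
            payload = chunks[idxs[0]]
            for i in idxs[1:]:
                payload = xor(payload, chunks[i])
            symbols.append((idxs, payload))
        return symbols

    data = os.urandom(10 * CHUNK)
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    symbols = encode(chunks, n_symbols=len(chunks) * 3 // 2)  # 50% overhead
    print(len(chunks), "chunks ->", len(symbols), "symbols")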
It only supports 32k parts in total (in practice that means 16k source parts and 16k parity parts).
Let's take 100GB of data (relatively large, but within the realm of what someone might want to protect); that means each part will be ~6MB in size. You might think that, since you also created 100GB of parity data (6MB * 16384 parity parts), you're well protected. You're wrong.
Now let's say there are 20,000 random bit errors over that 100GB. Not a lot of errors, but guess what: par will not be able to protect you (assuming those 20,000 errors are spread over more than the 16,384 source blocks it precalculated). So at the simplest level, 20KB of errors can be unrecoverable.
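To make that arithmetic concrete (same numbers as above, nothing measured - a back-of-envelope sketch in Python):

    GB = 10**9
    source_bytes    = 100 * GB
    source_blocks   = 16_384      # par2's practical per-set limit
    recovery_blocks = 16_384

    print(source_bytes / source_blocks / 1e6)    # ~6.1 MB per source block

    damaged_blocks = 20_000   # worst case: every bit error hits a different block
    print(damaged_blocks <= recovery_blocks)     # False -> unrecoverable, because
                                                 # par2 repairs whole blocks, not bits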
par2 was created for usenet when a) the binaries being posted weren't so large, b) the article parts being posted weren't so large, and c) the error model it was trying to protect against was whole articles not coming through (or, equivalently, arriving with errors). In the olden days of usenet binary posting you would see many "part repost requests"; those basically disappeared with the introduction of par (then quickly par2). It fails badly with many other error models.
we can't have nice things