FFmpeg has issued a DMCA takedown on GitHub

524
181
merlindru
1 day ago
twitter.com

merlindru
·
1 day ago
·
[ - ]

The repo in question incorporated FFmpeg code while claiming their code is Apache 2.0-licensed over 1.5 years ago[1]

This is not allowed under the LGPL, which mandates dynamic linking against the library. They copy-pasted FFmpeg code into their repo instead.

[1] https://x.com/HermanChen1982/status/1761230920563233137

LeoWattenberg
·
1 day ago
·
[ - ]

Copy pasting code is allowed under LGPL, but doing so while removing license headers and attribution of code snippets would not be.

GuB-42
·
10 hours ago
·
[ - ]

Only if the code you copy pasted the LGPL part into is licenced under a compatible license, and Apache is not.

The simplest way to comply while keeping your incompatible license is to isolate the LGPL part into a dynamic library. There are other ways, but this is by far the most common.

chii
·
18 hours ago
·
[ - ]

Copy/pasting, or using some other mechanism of digital duplication, is irrelevant. The problem is removing the existing license and essentially _re-licensing_ the code without authority, no matter what mechanism is used to include it.

merlindru
·
10 hours ago
·
[ - ]

this is accurate and how i should have phrased it. i should not have mentioned dynamic linking; you're right it's not relevant

thank you!

compsciphd
·
4 hours ago
·
[ - ]

this is fundamentally false. The LGPL was created because of static linking where GPLd code would be in the distributed binary.

It's an open question (the FSF has their opinion, but it has never been adjudicated) if the GPL impacts dynamic linking.

One could argue that if one ships GPL dynamic libraries together with one's non-GPLd code, one is shipping an aggregate and hence the GPL infects the whole, but it's more complicated to say if one ships a non-GPLd binary that runs on Debian, Red Hat et al. and uses GPLd libraries that they ship.

ajross
·
1 day ago
·
[ - ]

That's not it. The LGPL doesn't require dynamic linking, just that any distributed artifacts be able to be used with derived versions of the LGPL code. Distributing buildable source under Apache 2.0 would surely qualify too.

The problem here isn't a technical violation of the LGPL, it's that Rockchip doesn't own the copyright to FFMPEG and simply doesn't have the legal authority to release it under any license other than the LGPL. What they should have done is put their modified FFMPEG code into a forked project, clearly label it with an LGPL LICENSE file, and link against that.

FpUser
·
1 day ago
·
[ - ]

How does

"Distributing buildable source under Apache 2.0 would surely qualify too"

reconcile with

"doesn't own the copyright to FFMPEG and simply doesn't have the legal authority to release it under any license other than the LGPL"

dtech
·
1 day ago
·
[ - ]

You can distribute your own code under Apache along with FFMpeg under LGPL in one download

8note
·
1 day ago
·
[ - ]

If they licensed their own code under Apache 2.0 as buildable with the LGPL FFmpeg code, without relicensing FFmpeg itself as Apache.

rvnx
·
1 day ago
·
[ - ]

Could there have been other or better moves, such as sending a reminder?

I think the devs of that Chinese company seemed to immediately acknowledge the attribution.

Now the OSS community loses the OSS code of IloveRockchip, and FFmpeg wins practically nothing, except recognition on a single file (that devs from Rockchip actually publicly acknowledged, though in a clumsy way) but loses in reputation and loses a commercial fork (and potential partner).

Blackthorn
·
1 day ago
·
[ - ]

How do you partner with someone who has so much contempt for you they ignore the license you've given them and, when called on it, simply ignore you?

PunchyHamster
·
1 day ago
·
[ - ]

They had ample warning and ignored the license. What are you even on about?

rvnx
·
1 day ago
·
[ - ]

[flagged]

akerl_
·
1 day ago
·
[ - ]

The amount of armchair quarterbacking here is wild.

rvnx
·
1 day ago
·
[ - ]

Then waiting to see how they addressed these points, what approaches were taken, and why?

Here spent time to think and document all the IRC chats, the Twitter thread, the attitude of the SoC manufacturer, etc.

There has to be a backstory to suddenly come after 1.5 years for an issue that could have been solved in 10 minutes.

kelnos
·
1 day ago
·
[ - ]

Then why didn't Rockchip solve it in 10 minutes?

rvnx
·
1 day ago
·
[ - ]

Bad decision and risk/reward calculation for sure. If it's code that is core to your stuff, and it is GPL'd, it's (technically) very tricky to solve.

But here, as FFmpeg is LGPL and we talk about one single file, there is even less work to do in order to fix that.

justinclift
·
14 hours ago
·
[ - ]

Yeah, Rockchip seems to have screwed up badly, but as per the GitHub DMCA notice:

https://github.com/github/dmca/blob/master/2025/12/2025-12-1...

> ... the offending repository maintainers were informed of the problem almost 2 years ago ([private]), and did nothing to resolve it. Worse, their last comment ([private]) suggests they do not intend to resolve it at all.

Seems like the reporter gave them a lot of time to fix the problem, then when it became obvious (to them) that it was never going to be fixed they took an appropriate next step.

kelnos
·
1 day ago
·
[ - ]

That's bullshit. The FFmpeg devs were well within their rights to even send a DMCA takedown notice, immediately, without asking nicely first.

This is what big corporations do to the little guys, so we owe big corporations absolutely nothing more.

They gave Rockchip a year and a half to fix it. It is the responsibility of Rockchip to take care of it once they were originally notified, and the FFmpeg developers have no responsibility to babysit the Rockchip folks while they fulfill their legal obligations.

Fnoord
·
1 day ago
·
[ - ]

Yeah. This is like waiting 90 days before releasing a full disclosure on a vulnerability, and then hearing the complaint "you could have contacted us and given us time, we only had 90 days." Gaslighting 101. Those 90 days give all those with a lot of resources who are sitting on zero-days (such as Cellebrite) time to play for free.

Blackthorn
·
1 day ago
·
[ - ]

Deadline and reminders? They aren't teachers and Rockchip isn't a student, they are the victims here and Rockchip is the one at fault. Let's stop literally victim blaming them for how they responded.

rvnx
·
1 day ago
·
[ - ]

To be clear: Rockchip is at fault, 100%. I would sue (and obv DMCA) any company who takes my code and refuses to attribute it.

If you immediately escalate to [DMCA / court] because they refuse to fix, then that's very fair, but suddenly like 2 years after silence (if, and only if that was the case, because maybe they spoke outside of Twitter/X), then it's odd.

akerl_
·
1 day ago
·
[ - ]

Maybe spend less time policing how other people are allowed to act, especially when you’re speculating wildly about the presence or content of communications

rvnx
·
1 day ago
·
[ - ]

It's a call to push the devs to freely say what happened in the background, there are many hints at that "I wonder if...?" "What could have happened that it escalated?" "Why there were no public reminders, what happened in the back", etc, etc, nothing much, these questions are deliberately open.

akerl_
·
1 day ago
·
[ - ]

Oh. Being rude and suggesting the devs made (in your opinion) a mistake based on your guess at their actions is not going to be an effective way to get them to elaborate on their legal strategy.

Also it’s rude, which is reason enough not to do it.

michaelmrose
·
1 day ago
·
[ - ]

In the adult world you don't get any warnings when you break the law.

superb_dev
·
1 day ago
·
[ - ]

We are not going to lose anything. If it’s got a strong enough community then someone will publish a fork with the problem fixed.

windexh8er
·
1 day ago
·
[ - ]

Your original comment had this at the end...

> - Rockchip's code is gone > - FFmpeg gets nothing back > - Community loses whatever improvements existed > - Rockchip becomes an adversary, not a partner

This is all conjecture which is probably why you deleted it.

Their code isn't gone (unless they're managing their code in all the wrong ways), FFmpeg sends a message to a for-profit violation of their code, the community gets to see the ignorance Rockchip puts into the open source partnership landscape and finally... If Rockchip becomes an adversary of one of the most popular and notable OSS that they take advantage of, again, for profit then fuck Rockchip. They're not anything here other than a violator of a license and they've had plenty of warning and time to fix.

ksec
·
21 hours ago
·
[ - ]

The OP deleted that sentence and I don't think it should have been flagged and hidden from others, so I have vouched for it. I understand a lot of people disagree with it and may downvote it, but that is different from flagging. (I have upvoted it just in case.)

He offers a perspective from a Chinese POV, so I think it is worth people reading it. (Not that I agree with it in any shape or form.)

rvnx
·
14 hours ago
·
[ - ]

The sentence is actually just in the comment below: https://news.ycombinator.com/item?id=46396107

You are right, and the FFmpeg devs are also 100% right and I perfectly understand that.

In fact I like the idea to push the big corps and strongly enforce devs' rights.

I think earlier enforcement would have been beneficial here, just that dropping a bomb after 1 year of silence and no reminder (and we still don't know if that was the case), is a bit unpredictable, so I wanted to raise that question

ygombinador
·
12 hours ago
·
[ - ]

There hasn't been a year of silence. Multiple people from the community have continued bugging Rockchip to address the matter in a public issue on the now-gone GitHub repo. The idea of a potential DMCA claim was also brought up.

All they could say was "we are too busy with the other 1000s chips we have, we will delay this indefinitely".

Ridiculous.

michaelmrose
·
1 day ago
·
[ - ]

If you have to hound them to stop breaking the law, they were already an adversary. The easiest way to comply would be to simply follow the license, in which case everyone wins.

kevin_thibedeau
·
1 day ago
·
[ - ]

"In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License."

They should be covered as an aggregation, provided the LGPL was intact.

ajross
·
23 hours ago
·
[ - ]

The contention is that the ffmpeg code was "cut and pasted" without attribution and without preserving the license (e.g. the LGPLv2 LICENSE file). Obviously I can't check this because I don't have a clone and the repository is now blocked behind the DMCA enforcement. But at least Github/Microsoft seem to agree that there was a violation.

nhinck3
·
22 hours ago
·
[ - ]

Microsoft/Github have no say in enforcement of a DMCA claim.

ajross
·
22 hours ago
·
[ - ]

Uh... the repo has literally been taken down by GitHub: https://github.com/rockchip-linux/mpp

Not sure what you're trying to say here. DMCA takedown enforcement is 100% the responsibility of the Online Service Providers per statute. It's the mechanism by which they receive safe harbor from liability for hosting infringing content.

nhinck3
·
22 hours ago
·
[ - ]

Yes, but Microsoft/Github do not make any determination about the validity of the claim.

Once a valid (from a process perspective) claim is submitted, the provider is required to take the claimed content down for 10 days. From there the counter claim and court processes can go back and forth.

ranger_danger
·
1 day ago
·
[ - ]

LGPL does not mandate dynamic linking, it mandates the ability to re-assemble a new version of the library. This might mean distributing object files in the case of a statically-linked application, but it is still allowed by the license.

merlindru
·
10 hours ago
·
[ - ]

this is accurate - thank you for the correction

a_void_sky
·
1 day ago
·
[ - ]

they waited for more than 1.5 years and they did not forget

mystraline
·
1 day ago
·
[ - ]

They were given 1.5 YEARS of lead time. And FLOSS should treat commercial entities the same way they treat us.

Seriously, if we copied their code in violation of its license, how many hours would pass before a DMCA notice?

FLOSS should be dictatorial in application of the license. After all, it's basically free to use and remix as long as you follow the easy rules. I'm also of the view that Android phone creators should be providing source fully, and that their phones should be confiscated on import for copyright violations.

But I've seen FLOSS devs be like "let's be nice". Tit for tat is the best game theory so far. Time to use it.

helterskelter
·
1 day ago
·
[ - ]

My understanding is that the GPL doesn't have fucktons of precedent behind it in court. You bet the house on a big case and lose, the precedent will stick with GPL and may even weaken all copyleft licenses.

Also, it's better to gently apply pressure and set a track record of violators taking corrective measures so when you end up in court one day you've got a list of people and corporate entities which do comply because they believed that the terms were clear enough, which would lend weight to your argument.

Saying this as a GPL hardliner myself.

Conan_Kudo
·
13 hours ago
·
[ - ]

It definitely does have precedent in multiple jurisdictions. Heck, SFC just won against Vizio enforcing the GPL's terms in the US, and there have been previous wins in France and Germany.

LeoWattenberg
·
13 hours ago
·
[ - ]

Most licenses, EULAs, contracts and so on don't have much precedent in court. There's no reason to believe that GPL would fold once subjected to sufficiently crafty lawyers.

fl0id
·
16 hours ago
·
[ - ]

AFAIK it has enough precedent (also depending a bit on jurisdiction, but you only need one), but the interpretations of what the license should cover differ. For example, if you wanted to argue that driver devs would have to open-source their firmware blobs or their proprietary driver loaded by a kernel shim, you will have a tough time and probably lose.

dzhiurgis
·
1 day ago
·
[ - ]

What happens when you want to mix two libraries with different licences?

koolba
·
1 day ago
·
[ - ]

If you own one of them, mix in LGPL code, and publish it, the result is entirely LGPL.

If you don’t own it and cannot legally relicense part as LGPL, you’re not allowed to publish it.

Just because you can merge someone else’s code does not mean you’re legally allowed to do so.

eqvinox
·
1 day ago
·
[ - ]

This is not correct; you're simply required to follow all applicable licenses at the same time. This may or may not be possible, but is in practice quite commonly done.

Nevermark
·
18 hours ago
·
[ - ]

> Just because you can merge someone else’s code does not mean you’re legally allowed to do so.

> This may or may not be possible

I am not sure what you are saying, that is different from the comment you replied to.

abigail95
·
15 hours ago
·
[ - ]

Completely depends on how much you've "mixed in", and facts specific to that individual work.

Fair use doesn't get thrown out the window because GPL authors have a certain worldview.

Second, there are a lot of non-copyrightable components to source code - if you can't copyright it - you certainly can't GPL it. These can be copied freely by anyone at any time.

kelnos
·
1 day ago
·
[ - ]

You determine if the licenses are compatible first. If they are, you're fine, as long as you fulfill the terms of both licenses.

If they aren't compatible, then you can't use them together, so you have to find something else, or build the functionality yourself.

Hendrikto
·
1 day ago
·
[ - ]

Some licenses, like LGPL, have provisions for this, some just forbid it.

In the specific ffmpeg case, you are allowed to dynamically link against it from a project with an incompatible license.

wmf
·
1 day ago
·
[ - ]

You should keep them in different directories and have the appropriate license for each directory. You can have a top-level LICENSE file explaining the situation.
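A rough sketch of how that per-directory layout could be checked automatically. Everything here is an assumption for illustration: the directory names are hypothetical, and it presumes each file carries an `SPDX-License-Identifier:` header line, which the repository discussed in this thread may not have used.

```python
import re

# Hypothetical layout: directory prefix -> SPDX identifier expected there.
LICENSE_BY_DIR = {
    "third_party/ffmpeg/": "LGPL-2.1-or-later",
    "src/": "Apache-2.0",
}

# Matches a header such as "SPDX-License-Identifier: Apache-2.0".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def expected_license(path):
    """Return the SPDX identifier expected for a path, or None if no rule applies."""
    for prefix, spdx in LICENSE_BY_DIR.items():
        if path.startswith(prefix):
            return spdx
    return None


def find_mismatches(files):
    """Given {path: file text}, return (path, expected, found) for bad headers.

    `found` is None when the file has no SPDX header at all.
    """
    problems = []
    for path, text in sorted(files.items()):
        want = expected_license(path)
        if want is None:
            continue  # no rule for this directory, nothing to check
        match = SPDX_RE.search(text)
        found = match.group(1) if match else None
        if found != want:
            problems.append((path, want, found))
    return problems


if __name__ == "__main__":
    demo = {
        "src/main.c": "/* SPDX-License-Identifier: Apache-2.0 */\n",
        "third_party/ffmpeg/h264.c": "/* SPDX-License-Identifier: Apache-2.0 */\n",
    }
    for problem in find_mismatches(demo):
        print(problem)
```

A check like this only flags headers that disagree with the directory's declared license. It says nothing about whether the licenses involved are actually compatible.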

LeFantome
·
1 day ago
·
[ - ]

This depends on the licenses.

Copyleft licenses are designed to prevent you mixing code as the licenses are generally incompatible with mixing.

More permissive licenses will generally allow you to mix licenses. This is why you can ship permissive code in a proprietary code base.

As for linking, “weak copyleft” licenses allow you to link but not to “mix” code. This is essentially the point of the LGPL.

patmorgan23
·
1 day ago
·
[ - ]

You dynamically link against it

doctorpangloss
·
20 hours ago
·
[ - ]

I like FFmpeg, I hate doing the whole whataboutism thing, especially because FFmpeg is plainly in the right here, but... listen, FFmpeg as a product is a bunch of license violations too. Something something patents, something something, "doctrine of unclean hands." I worry that HN downvotes people who are trying to address the bigger picture here, because the net result of a lack of nuance always winds up being, "Okay, the real winners are Google and Apple."

saghm
·
18 hours ago
·
[ - ]

With the exception of the Apache license, most major licenses don't cover patents. I have no idea about proprietary licenses if that's what you're talking about here, but it's a bit unclear, so it might help to go into more details than "something something" if you're intending to make a compelling case.

Conan_Kudo
·
13 hours ago
·
[ - ]

The GNU v3 licenses all cover software patents too, and parts of FFmpeg are under those licenses too (though I guess not the code copied and subject to this takedown, which is LGPLv2.1+).

webstrand
·
19 hours ago
·
[ - ]

What licenses are they violating?

reedciccio
·
13 hours ago
·
[ - ]

They don't violate licenses; otherwise they would have been blown out of existence a long time ago, drowned in lawsuit after lawsuit.

doctorpangloss
·
9 hours ago
·
[ - ]

The ones that Microsoft, Apple and Google pay for: the codec licenses. FFmpeg believes that only end users need licenses for codecs. That is not only their belief, but it is not the belief of Microsoft, Apple and Google. It is true in the sense of the status quo, but then LGPL violations are a status quo too. So you can see how it’s a bad idea for FFmpeg to make a stink about licenses.

kortilla
·
20 hours ago
·
[ - ]

Independent implementations of an algorithm are a different dimension from software licensing. So the “unclean hands” argument doesn’t hold water here.

Patents != copyright

7bit
·
14 hours ago
·
[ - ]

> I hate doing the whole whataboutism thing [...], but...

... yet, you did it anyway, without going into detail or providing any clue where these violations (as you claim) are.

If there's any substance to what you say, provide some details and proof, so it can be a constructive discussion, rather than just noise.

amszmidt
·
1 day ago
·
[ - ]

Incorporating compatible code, under different license is perfectly OK and each work can have different license, while the whole combined work is under the terms of another.

I'm honestly quite confused what FFmpeg is objecting to here, if ILoveRockchip wrote code, under a compatible license (which Apache 2.0 is wrt. LGPLv2+ which FFmpeg is licensed under) -- then that seems perfectly fine.

The repository in question is of course gone. Is it that ILoveRockchip claims that they wrote code that was written by FFmpeg? That is bad, and unrelated to any license terms, or license compatibility ... just outright plagiarism.

papercrane
·
1 day ago
·
[ - ]

The DMCA notice is available here: https://github.com/github/dmca/blob/master/2025/12/2025-12-1...

The notice has a list of files and says that they were copied from ffmpeg, with the original copyright notice removed, their own added, and the files licensed under the more permissive Apache license.

amszmidt
·
1 day ago
·
[ - ]

Thanks for the link; sadly none of the links to the repo can be viewed to see what exactly occurred.

To those downvoting, curious why? Many of the links are not viewable, since GitHub hides them, so any discussion becomes quite tricky.

progval
·
1 day ago
·
[ - ]

You can find an archive of the links' targets at https://archive.softwareheritage.org/swh:1:dir:5861f19187336...

justinclift
·
14 hours ago
·
[ - ]

Interestingly, the repo has a LICENSES folder that contains the text of the licenses used in the repo:

https://archive.softwareheritage.org/browse/directory/ed4b20...

And yep, they only included the most openly permissive ones there (APL2 and MIT), completely skipping everything else. Ugh.

nottorp
·
13 hours ago
·
[ - ]

Maybe because if the ffmpeg people say they have a reason and they've waited 1 year and a half for compliance, we trust them more than whoever relicensed their code without permission.

kstrauser
·
1 day ago
·
[ - ]

I didn't downvote. I suspect people did because it sounded like you were defending ILoveRockchip's actions, based on either 1) not understanding what they did, and/or 2) not having access to the facts. People get snippy about abusing Free Software.

tkel
·
21 hours ago
·
[ - ]

Here is the github thread which had some recent discussion before the repo was taken down. Rockchip said something like it would be too much work to fix the licenses for all the new chips they are creating.

https://web.archive.org/web/20251103193914/https://github.co...

rendaw
·
20 hours ago
·
[ - ]

The archive isn't very good for the thread. Is that what they said though? From the comments I see they said "we're working on it, but we're busy with XYZ" and then months passed.

Archive won't load the remaining 3 items for me.

tkel
·
16 hours ago
·
[ - ]

Yes it's not loading on the archive but I was following the thread closely, it was said recently within the last few months which prompted further discussion and this action by ffmpeg.

mlrtime
·
13 hours ago
·
[ - ]

100% my assumptions but:

They were never really working on it, they thought it would just go away. Maybe some junior/senior dev took a shot at it, gave a project plan and it went nowhere.

bfrog
·
1 day ago
·
[ - ]

I wonder how this will work with AI stuff generating code without any source or attribution. It’s not like the LLMs make this stuff up out of thin air; it comes from source material.

observationist
·
1 day ago
·
[ - ]

Best case scenario is it nukes the whole concept of software patents and the whole ugly industry of copyright hoarding. The idea that perpetual rent-seeking is a natural extension and intended outcome of the legal concepts of copyrights and patents is bizarre.

LeFantome
·
1 day ago
·
[ - ]

I cannot imagine it somehow impacts patents.

The “perpetual” part is the issue but “rent seeking” is the entire reason that copyright and patents exist to begin with.

zmgsabst
·
22 hours ago
·
[ - ]

No — providing funding to promote creation and discovery is why those exist; granting a temporary monopoly is the mechanism meant to accomplish that goal.

This sounds pedantic, but it’s important to not mistake the means for an end:

> To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;

https://constitution.congress.gov/browse/article-1/section-8...

conartist6
·
11 hours ago
·
[ - ]

I wish I could get defensive protection against the millions and millions of bad software patents out there. But there is no such thing.

Did you know that Facebook owns the patent on autocompletes? Yahoo owned it and Facebook bought it from them as kind of a privately owned nuclear weapon to create a doctrine of mutually assured destruction with other companies who own nuclear-weapons-grade patents.

Of course the penalty for violating a patent is much worse if you know you are doing it, so companies are very much not eager to have the additional liability that comes with their employees being aware that every autocomplete is a violation of patent law.

benced
·
20 hours ago
·
[ - ]

I don't think anyone really disputes what should be done when an LLM violates copyright in a way that would be a violation if a human did it.

Questions about LLMs are primarily about whether it's legal for them to do something that would be legal for a human to do and secondarily about the technical feasibility of policing them at all.

userbinator
·
1 day ago
·
[ - ]

Everything is a derivative work.

EagnaIonat
·
19 hours ago
·
[ - ]

> I wonder how this will work with AI stuff generating code without any source or attribution.

It's already fixed. Anything you make with AI cannot be protected in any way (UK gives some leeway on certain types of creations).

So if it mimics code from ffmpeg for example, then ffmpeg wins.

ranger_danger
·
1 day ago
·
[ - ]

Everything humans make up also comes from source material.

The real (legal) question in either case, is how much is actually copied, and how obvious is it.

beardbound
·
1 day ago
·
[ - ]

I mostly agree with you, but if a human straight up copies work under copyright they’re violating the law. Seems like an LLM should be held to the same standard unless they should be even less beholden to the law than people.

It’s also incredibly hard to tell if an LLM copied something since you can’t ask it in court and it probably can’t even tell you if it did.

ranger_danger
·
1 day ago
·
[ - ]

From what I have seen, the (US) courts seem to make a distinction between 100% machine-automated output with no manual prompting at all, versus a human giving it specific instructions on what to generate. (And yes I realize everything a computer does requires prior instruction of some kind.)

But the issue with copyright I think comes from the distribution of a (potentially derivative or transformative in the legal sense) work, which I would say is typically done manually by a human to some extent, so I think they would be on the hook for any potential violations in that case, possibly even if they cannot actually produce sources themselves since it was LLM-generated.

But the legal test always seems to come back to what I said before, simply "how much was copied, and how obvious is it?" which is going to be up to the subjective interpretation of each judge of every case.

viccis
·
20 hours ago
·
[ - ]

Not true. That's philosophical skepticism and was roundly refuted long ago.

alienbaby
·
1 day ago
·
[ - ]

Llm's do not verbatim disgorge chunks of the code they were trained on.

perryprog
·
1 day ago
·
[ - ]

I think it's probably less frequent nowadays, but it very much does happen. This still-active lawsuit[0] was made in response to LLMs generating verbatim chunks of code that they were trained on.[1]

[0] https://githubcopilotlitigation.com [1] https://www.theverge.com/2022/11/8/23446821/microsoft-openai...

AshamedCaptain
·
1 day ago
·
[ - ]

You can still very trivially get entire chunks of code from Copilot including even literal author names (simply by prodding with a doxygen tag).

neilv
·
1 day ago
·
[ - ]

They do, and, early on, Microsoft (and perhaps others) put in some checks to try to hide that.

idle_zealot
·
1 day ago
·
[ - ]

Surely they do sometimes?

kelseyfrog
·
1 day ago
·
[ - ]

A 26-sided die reproduces chunks of source code. What's the dividing line?

AshamedCaptain
·
1 day ago
·
[ - ]

This is a multi-terabyte sized die that is not at all random AND has most definitely copied the source code in question to begin with.

kelseyfrog
·
1 day ago
·
[ - ]

The die is certainly not multi-terabyte. A more realistic number would be 32k-sided to 50k-sided if we want to go with a pretty average token vocabulary size.

Really, it comes down to encoding. Arbitrarily short utf-8 encoded strings can be generated using a coin flip.

Dylan16807
·
1 day ago
·
[ - ]

The number of sides has nothing to do with the data within. It's not random and sometimes it repeats things in an obviously non-chance way.

kelseyfrog
·
23 hours ago
·
[ - ]

Of course, it's random and by chance - tokens are literally sampled from a predicted probability distribution. If you mean chance=uniform probability you have to articulate that.

It's trivially true that arbitrarily short reconstructions can be reproduced by virtually any random process and reconstruction length scales with the similarity in output distribution to that of the target. This really shouldn't be controversial.

My point is that matching sequence length and distributional similarity are both quantifiable. Where do you draw the line?

Dylan16807
·
23 hours ago
·
[ - ]

> Of course, it's random and by chance - tokens are literally sampled from a predicted probability distribution.

Picking randomly out of a non-random distribution doesn't give you a random result.

And you don't have to use randomness to pick tokens.

> If you mean chance=uniform probability you have to articulate that.

Don't be a pain. This isn't about uniform distribution versus other generic distribution. This is about the very elaborate calculations that exist on a per-token basis specifically to make the next token plausible and exclude the vast majority of tokens.

> My point is that matching sequence length and distributional similarity are both quantifiable. Where do you draw the line?

Any reasonable line has examples that cross it from many models. Very long segments that can be reproduced. Because many models were trained in a way that overfits certain pieces of code and basically causes them to be memorized.
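The two ways of picking a token mentioned above can be sketched in a few lines. This is a toy illustration only: the three-token distribution and its probabilities are made up, and real models score tens of thousands of tokens at each step.

```python
import random


def greedy_pick(probs):
    """Deterministic choice: always return the most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    """Random choice, weighted by the predicted probabilities."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution (made-up numbers).
next_token_probs = {"//": 0.90, "/*": 0.09, "#": 0.01}

if __name__ == "__main__":
    rng = random.Random(0)
    print(greedy_pick(next_token_probs))
    print([sample_pick(next_token_probs, rng) for _ in range(10)])
```

Greedy picking involves no randomness at all, and even weighted sampling returns the dominant token most of the time when the distribution is this peaked.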

kelseyfrog
·
23 hours ago
·
[ - ]

> Very long segments that can be reproduced

Right, and very short segments can also be reproduced. Let's say that "//" is an arbitrarily short segment that matches some source code. This is trivially true. I could write "//" on a coin and half the time it's going to land "//". Let's agree that's a lower bound.

I don't even disagree that there is an upper bound. Surely reproducing a repo in its entirety is a match.

So there must exist a line between the two that divides too short and too long.

Again, by what basis do you draw a line between a 1 token reproduction and a 1,000 token reproduction? 5, 10, 20, 50? How is it justified? Purely "reasonableness"?

Dylan16807
·
22 hours ago
·
[ - ]

Why do you want me to pick a number so bad?

There are very very long examples that are clearly memorization.

Like, if a model was trained on all the code in the world except that specific example, the chance of it producing that snippet is less than a billionth of a billionth of a percent. But that snippet got fed in so many times it gets treated like a standard idiom and memorized in full.

Is that a clear enough threshold for you?

I don't know where the exact line is, but I know it's somewhere inside this big ballpark, and there are examples that go past the entire ballpark.

I don't care where specifically the bound is.
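For a sense of scale, here is a back-of-the-envelope sketch. It assumes tokens are drawn uniformly at random, which is the coin or die picture and deliberately not how a trained model behaves, and it reuses the 32k vocabulary figure quoted upthread. Reading "a billionth of a billionth of a percent" as 1e-20 is also an assumption.

```python
def chance_of_exact_match(vocab_size, length):
    """Probability that `length` uniform random draws hit one specific sequence."""
    return float(vocab_size) ** -length


def tokens_needed(vocab_size, threshold):
    """Smallest sequence length whose uniform-chance match falls below threshold."""
    length = 1
    while chance_of_exact_match(vocab_size, length) >= threshold:
        length += 1
    return length


if __name__ == "__main__":
    threshold = 1e-20  # assumed reading of "a billionth of a billionth of a percent"
    print(tokens_needed(2, threshold))       # coin flips
    print(tokens_needed(32_000, threshold))  # 32k-token vocabulary
```

Under these assumptions, pure chance drops below that threshold after only a handful of tokens from a 32k vocabulary, so a long verbatim match is hard to attribute to dice.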

kelseyfrog
·
22 hours ago
·
[ - ]

Ok, 1 it is then.

Dylan16807
·
22 hours ago
·
[ - ]

That is not good faith, my dude.

kelseyfrog
·
21 hours ago
·
[ - ]

It sounds like you do care where it is then, no?

Dylan16807
·
19 hours ago
·
[ - ]

I care that it's within the ballpark I spent considerable detail explaining. I don't care where inside the ballpark it is.

You gave an exaggerated upper limit, so extreme there's no ambiguity, of "entire repo".

I gave my own exaggerated upper limit, so extreme there's no ambiguity. And mine has examples of it actually happening. Incidents so extreme they're clear violations.

Maybe an analogy will help: The point at which a collection of sand grains becomes a heap is ambiguous. But when we have documented incidents involving a kilogram or more of sand in a conical shape, we can skip refining the threshold and simply declare that yes heaps are real. Incidents of major LLMs copying code, in a way that is full-on memorization and not just recreating things via chance and general code knowledge, are real.

You're the only person I've seen ever imply that true copying incidents are a statistical illusion, akin to a random die. Normally the debate is over how often and impactful they are, who is going to be held responsible, and what to do about them.

kelseyfrog
·
3 hours ago
·
[ - ]

To recap, the original statement was, "Llm's do not verbatim disgorge chunks of the code they were trained on." We obviously both disagree with it.

While you keep trying to drag this toward an upper bound, I'm trying to illustrate that a coin with "//" reproduces a chunk of code. Again. I don't see much of a disagreement on that point either. What I continue to fail to elicit from you is the salient difference between the two.

I'm trying to find a scissor that distills your vibes into a consistent rule, and each time it's rebutted like I'm trying to make an argument. If your system doesn't have consistency, just say so.

Dylan16807
·
36 minutes ago
·
[ - ]

I have a consistent rule. The rule is that if an LLM meets the threshold I set then it definitely violated copyright, and if it doesn't meet the threshold then we need more investigation.

We have proof of LLMs going over the threshold. So that answers the question.

Your illustrations are all in the "needs more investigation" area and they don't affect the conclusion.

We both agree that 1 token by itself is fine, and that some number is too many.

So why do you keep asking about that, as if it makes my argument inconsistent in some way? We both say the same thing!

We don't need to know the exact cutoff, or calculate how it varies. We only need to find violators that are over the cutoff.

How about you tell me what you want me to say? Do you want me to say my system is inconsistent? It's not. Having an area where the answer is unclear means the system is not able to answer every question, but it doesn't need to answer every question.

If you're accusing me of using "vibes" in a way that ruins things, then I counter that no I give nice specific and super-rare probabilities that are no more "vibes" based than your suggestion of an entire repo.

> What I continue to fail to elicit from you is the salient difference between the two.

Between what, "//" and the threshold I said?

The salient difference between the two is that one is too short to be copyright infringement and the other is so long and specific that it's definitely copyright infringement (when the source is an existing file under copyright without permission to copy). What more do you want?

Just like 1 grain of sand is definitely not a heap and 1kg of sand is definitely a heap.

If you ask me about 2, 3, 20 tokens my answer is I don't care and it doesn't matter and don't pretend it's relevant to the question of whether LLMs have been infringing copyright or not ("verbatim disgorge chunks").

afiori
·
1 day ago
·
[ - ]

IIRC at one point it was 6 lines of code.

bobsmooth
·
1 day ago
·
[ - ]

ChatGPT has given me code with comments so specific I found the original 6-year-old GitHub repo.

DANmode
·
20 hours ago
·
[ - ]

How long ago was this?

Doesn’t invalidate your story in the slightest - I just know they’ve gotta be chasing this, specifically, like it’s life or death.

Mathnerd314
·
19 hours ago
·
[ - ]

Takedown letter: https://github.com/github/dmca/blob/master/2025/12/2025-12-1...

habibur
·
1 day ago
·
[ - ]

LGPL allows compiling the whole of ffmpeg into a so or lib and then dynamically linking from there for your closed source code. That's the main difference between LGPL and GPL.

But if you change or add something in building ffmpeg.so that should be GPLed.

Apparently they copied some files from ffmpeg, mixed them with their proprietary code, and compiled it as a whole. That's the problem here.

larschdk
·
1 day ago
·
[ - ]

Copyright law defines derivative work by substantial similarity and dependence, not by technical mechanisms like linking. Technical mechanisms such as linking are not copyright concepts.

Dynamic linking is a condition for LGPL compliance, but it is not sufficient. Dynamic linking does not automatically prevent a combined work from being a derived work.

F3nd0
·
1 day ago
·
[ - ]

> Dynamic linking is a condition for LGPL compliance

No, it isn’t. The condition says to allow your users to make and use their own modifications to the part of the software which falls under the LGPL. Dynamic linking is only a convenient way of allowing this, not a requirement.
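
A minimal sketch of the mechanism being discussed, not a statement about license compliance. It assumes a shared `libavutil` is installed on the system; the helper name `loaded_avutil_version` is hypothetical, while `avutil_version()` is the library's own exported function. Because the library is resolved at run time, a user can drop in their own modified build and the calling program picks it up without being rebuilt.

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple


def loaded_avutil_version() -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, micro) of whichever libavutil the loader finds.

    Returns None if no shared libavutil is available on this machine.
    """
    path = ctypes.util.find_library("avutil")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        func = lib.avutil_version  # exported as: unsigned avutil_version(void)
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_uint
    func.argtypes = []
    packed = func()
    # The version is packed as (major << 16) | (minor << 8) | micro.
    return (packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF)


if __name__ == "__main__":
    print(loaded_avutil_version())
```

Dynamic loading is only one convenient way to let users substitute the LGPL part; the sketch shows why it is the common choice.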

firesteelrain
·
1 day ago
·
[ - ]

Not familiar with Rockchip. Plenty of searches come up with cases of people incorporating ffmpeg into Rockchip projects. I still see the license files and headers. What is different with this DMCA takedown?

https://github.com/nyanmisaka/ffmpeg-rockchip

hogrug
·
1 day ago
·
[ - ]

The one you reference doesn't look like it misrepresents the licenses, i.e. if you use it to make your own derivative, you would expect to have to share modifications you make to the LGPL code.

nottorp
·
13 hours ago
·
[ - ]

That's because you can incorporate LGPL code into any project as long as you respect the license.

Rockchip is a hardware platform, what hardware the code runs on isn't relevant here.

firesteelrain
·
12 hours ago
·
[ - ]

Since the link to the actual code is DMCA’d in Rockchip’s repository, anyone have a fork so we can form an opinion independently?

nottorp
·
6 hours ago
·
[ - ]

So the comments explaining the situation from the HN community aren’t good enough?

firesteelrain
·
4 hours ago
·
[ - ]

If the repository were still accessible, the only thing I’d be looking for is whether the LGPL portions were modified or statically linked. Absent that, the license-based explanation makes sense.

ThePowerOfFuet
·
1 day ago
·
[ - ]

https://xcancel.com/FFmpeg/status/2004599109559496984

cmrdporcupine
·
1 day ago
·
[ - ]

Alright, love it. Who do I donate to?

wmf
·
1 day ago
·
[ - ]

https://www.ffmpeg.org/donations.html

jalapenos
·
16 hours ago
·
[ - ]

Would be nice to live in a world where the stupidity of "intellectual property" and "copyright" had already been laughed out of the room. But alas we live in this one, the moral and legal equivalent of walking through a park covered in trash.

jimjimwii
·
10 hours ago
·
[ - ]

You largely have Bill Gates to thank. Code is math and anyone who insists otherwise is a person of questionable morals, intelligence, or both.

cryptica
·
1 day ago
·
[ - ]

The law doesn't seem to work anymore. There are so many cases where someone can do illegal stuff in plain sight and nothing can be done about it. Not everyone has tens of thousands or hundreds of thousands of dollars to spare to get a lawyer. By the time you manage to save up the money, you realize that this system is absolutely crooked and that you don't trust it to obtain justice anyway even with the lawyers and even if you are legally in the right.

The law exists mostly to oppress. It's exactly the argument that gun proponents make "Only the good guys obey gun laws, so only the bad guys have guns."

All the good guys are losing following the law, all the bad guys are winning by violating the law. Frankly, at this stage, they write the laws.

iinnPP
·
1 day ago
·
[ - ]

I recently had to deal with a ministry in Canada, where a worker who had been there for 20 years failed even a basic test of competence in reading comprehension. Then there were multiple issues with the OPC (Office of the Privacy Commissioner) failing entirely on a basic issue.

Another example exists in Ontario's tenant laws constantly being criticized as enabling bad tenant behavior, but reading the statute full of many month delays for landlords and 2 day notices for tenants paints a more realistic picture.

In fact, one such landlord lied, admitted to lying, and then had their lie influence the decision in their favor, despite it being known to be false, by their own word. The appeal mentioned discretion of the adjudicator.

Not sure how long that can go on before a collapse, but I can't imagine it's very long.

martin-t
·
1 day ago
·
[ - ]

Incompetence is a taboo. It shouldn't be.

I think it should be perfectly OK to make value judgements of other people, and if they are backed by evidence, make them publicly and make them have consequences for that person's position.

iinnPP
·
1 day ago
·
[ - ]

A recent review of one of Canada's Federal Institutions showed the correct advice was given 17% of the time[0]. 83% failure rate. Not a soul has been fired unless something changed recently.

I do agree however with your assessment because any (additional) accountability would improve matters.

[0] https://globalnews.ca/news/11487484/cra-tax-service-calls-au...

cmrdporcupine
·
1 day ago
·
[ - ]

So here we have the good guys using the law and ... at least temporarily ... winning... so what's your point?

cryptica
·
1 day ago
·
[ - ]

You can see it everywhere. In this case, the fact that it took 2 years. And of course now that FFmpeg is getting more exposure in the media due to their association with AI hype, now they finally get 'fair' legal treatment... I don't call that winning. I see this over and over. Same thing all over the west.

I remember Rowan Atkinson (the UK actor) made a speech about this effect a couple of years ago and never heard about it since but definitely feeling it more and more... No exposure, no money, no legal representation. And at the same time we are being gaslit about our privilege.

martey
·
1 day ago
·
[ - ]

> In this case, the fact that it took 2 years. And of course now that FFmpeg is getting more exposure in the media due to their association with AI hype, now they finally get 'fair' legal treatment... I don't call that winning.

It took 2 years because FFmpeg waited 2 years to send a DMCA notice to Github, not because of delays in the legal system. I think you are conflating different unrelated issues here.

booleandilemma
·
1 day ago
·
[ - ]

I can't help but look at it as the sign of an empire in decline. It's only a matter of time before more people realize what you're saying (particularly the last two paragraphs) and the system falls apart.

nikitalita
·
1 day ago
·
[ - ]

someone post an archive link, I can't read that

nticompass
·
1 day ago
·
[ - ]

Does this work for you? https://xcancel.com/FFmpeg/status/2004599109559496984

darkamaul
·
1 day ago
·
[ - ]

Xlssid

antonvs
·
1 day ago
·
[ - ]

Is working around accessing an embargoed site really any better than just accessing it directly? Morally, what's the difference?

If everyone just actively boycotted that site, it would become irrelevant overnight. Anything else is simply condoning its continued existence. Don't kid yourself.

perryprog
·
1 day ago
·
[ - ]

The issue is that you need an account to view the replies, not that there's a moral opposition to visiting the website (though it could be that too).

antonvs
·
1 day ago
·
[ - ]

[flagged]

JCattheATM
·
1 day ago
·
[ - ]

The two are not mutually exclusive. Sometimes content is posted on a site people don't want to support, so making a copy of it and viewing/sharing the copy is preferable.

JCattheATM
·
1 day ago
·
[ - ]

What's stopping you from making the archive link yourself?

Tom1380
·
1 day ago
·
[ - ]

I'd like to learn but I can't find a guide on how to do it. Could you please share one?

stavros
·
1 day ago
·
[ - ]

Go to https://archive.is and paste the URL into the box.

booleandilemma
·
1 day ago
·
[ - ]

[flagged]

ThePowerOfFuet
·
1 day ago
·
[ - ]

https://xcancel.com/FFmpeg/status/2004599109559496984

LargoLasskhyfv
·
1 day ago
·
[ - ]

Clash of cultures. https://en.wikipedia.org/wiki/Shanzhai#Regulation vs. the 鬼子鬼佬老外 (Chinese slang terms for foreigners)

kstrauser
·
1 day ago
·
[ - ]

What's the clash? Is the claim that China lacks the notion of copyright, or that they don't care about the rule of law?

LargoLasskhyfv
·
23 hours ago
·
[ - ]

No, they have that. It's just not universally applied, or rather with 'some flexibility' instead :-)

It remains to be seen how much 'pull' FFmpeg has against the 'push' of Rockchip.

anonnon
·
1 day ago
·
[ - ]

Not a fan of the CCP, but GH in general has a big problem with users not understanding and respecting licensing and passing off others' code as their own, sometimes unintentionally, often not.

meindnoch
·
1 day ago
·
[ - ]

[flagged]

pico303
·
1 day ago
·
[ - ]

Maybe I’m not smart enough to grasp all these flowery words, but is this suggesting if I spend a few years writing some code, you should get to copy it for your own interests and without compensating me as long as your sales and marketing is better than mine?

I don’t think Rockchip learned from the FFmpeg code. They simply copied it outright without attribution.

agumonkey
·
1 day ago
·
[ - ]

I think both of you are right. But OP may be thinking of the larger picture. A bit like 'move fast and break things', that sort of thing where you blur the lines when it's valuable enough. Not that I agree with this ethical stance, but surely there's some sclerotic aspect of being too stiff on rules. It's a weird balance.

Nextgrid
·
1 day ago
·
[ - ]

> if I spend a few years writing some code, you should get to copy it for your own interests

If you publish the code, there's an argument to be made that yes, others should freely use it: if you could (or did) monetize the code yourself you wouldn't publish it. If you didn't, or failed trying to monetize it, maybe it's better for society if everyone else also gets to try?

LeFantome
·
21 hours ago
·
[ - ]

I do not even like the GPL, but there are forms of exchange other than monetary.

The license outlines the conditions of use. An argument could be made that ignoring the license means you are not paying the price specified.

array_key_first
·
18 hours ago
·
[ - ]

Right, but what incentives are we really pushing here?

If the only way to make any amount of money or, at least, not be stolen from, is to keep everything internal and be protectionist, then where is the progress?

So much of the modern world is built on open source. Do we really want every company and their mom recreating the world from scratch just so they don't get fucked over? Would things like the iPhone even exist in such a world?

adev_
·
1 day ago
·
[ - ]

> The LGPL is a product of a very specific moment: European legalism meeting American corporate compromise

While I tend to agree with the general message of the post, this specific point does not make any sense.

The LGPL and the GPL are 100% American products. They originally came out of the American academic world with the explicit goal of twisting the arm of the (American) copyright system for ideological reasons.

That has zero relation to any European legalism.

nacozarina
·
1 day ago
·
[ - ]

re-framing this as a PRC vs West thing seems forced and weird

IncreasePosts
·
1 day ago
·
[ - ]

Why does China vigorously prosecute Chinese nationals when they pirate Chinese software?

martin-t
·
1 day ago
·
[ - ]

So progress is always good, no matter how many people's work you exploit without their consent? You have a nice car, can I just take it and use it myself? Why is code any different? Is slavery OK too?

A much more interesting problem is how to create prosperity without throwing people under the bus - with everybody who contributed profiting proportionally to their contribution.

LeFantome
·
21 hours ago
·
[ - ]

Capitalism has its downsides but one thing that it does better than all previous known systems is efficiently allocate resources that result in productivity. That is, it is the most efficient system we know.

Investment that does not result in utility for the investor leads to reduced investment. This is true regardless of whether the “investment” is money or talent.

Your suggestion that a system that allows people to ignore the price creators demand for their creations will be more efficient has been refuted over and over again throughout history.

jacquesm
·
1 day ago
·
[ - ]

Except of course for that one little detail where Chinese companies take out minor improvement patents to kick the door shut on open source projects that they build on top of.

alfiedotwtf
·
1 day ago
·
[ - ]

Software licensing is just another form of property rights, and property rights is what society uses to incentivise civility.

wiml
·
21 hours ago
·
[ - ]

> Software licensing is just another form of property rights

That's a pretty substantial assertion and without much to back it up.

The framing of copyright as basically the same as ownership of chattel or land is a propaganda campaign.

eithed
·
1 day ago
·
[ - ]

I guess who cares about civility if you're the last man standing.

Also - that word: civility. We're animals driven by self-interest. What should civility even mean here

throwaway150
·
1 day ago
·
[ - ]

> We're animals driven by self-interest. What should civility even mean here

That self-interest has led to cooperation between humans. Humans have evolved to work together, cooperate, form social bonds, and friendships because doing so improves survival and wellbeing over the long run. Civility is part of that toolkit. It is not a denial of self-interest. Civility is part of that self-interest.

alfiedotwtf
·
3 hours ago
·
[ - ]

Thank you, this is what I was trying to say… there are incentives to cooperate, even though individually we can be selfishly evil.

noodletheworld
·
1 day ago
·
[ - ]

> There's a pattern here that's bigger than FFmpeg

Why are you turning this into a discussion about China?

It's not about China.

It's about stealing.

It's not a complex, or Western, concept.

only-one1701
·
1 day ago
·
[ - ]

ChatGPT write this bro?

iLoveOncall
·
1 day ago
·
[ - ]

This perfectly summarizes my feeling about software licenses.

I've always found it beyond ridiculous. Either you post your code in public and you accept it'll be used by others, without any enforceable restriction, or you don't. It's as simple as that.

The rest is self-importance from bitter old men.

Telaneo
·
1 day ago
·
[ - ]

> I've always found it beyond ridiculous. Either you post your code in public and you accept it'll be used by others, without any enforceable restriction, or you don't. It's as simple as that.

If we can have this, but for everything, so films, books, TV, music and everything else, I'd agree. This however is not the world we live in. The amount of culture we could have from people remixing the past 50 years worth of culture would be incredible. Instead, we're stuck with the same stuff we were over 70 years ago.

The amount of progress we could make in software is probably on a similar level, but the problem is the same as it is with the cultural artefacts. So instead we're stuck in a world where money makes right, since you need money to uphold the laws intended to protect Intellectual Property™. I can't blame ffmpeg for working within the rules of the system, even if the system sucks.

dzaima
·
1 day ago
·
[ - ]

Or even just, have this also apply to the code produced by those using my code. But while that's not the case, copyleft licenses (especially GPL (not LGPL)) are a way to force it to be the case to at least limited extent.

iLoveOncall
·
1 day ago
·
[ - ]

Code is not culture, nor art. I'm not sure why you'd want to compare them.

Telaneo
·
1 day ago
·
[ - ]

I want more high quality code and I want more high quality culture. Both have one major obstacle in the way and is at the core of this post, my comment and yours: Copyright. I fail to see why we should make exceptions to copyright for the sake of code, but not for the sake of culture.

imska
·
1 day ago
·
[ - ]

Awesome comment. Thank you.

> Declining civilizations obsess over rules. Rising ones obsess over outcomes.

Heard that in a very different context. Care to mention what you are referring to? How do you know?

lofaszvanitt
·
22 hours ago
·
[ - ]

FOSS is and always was a scam, in order to feed tons of code to LLMs and kick coders in the balls, so they could not monetize their work. And no one cares about the licenses; everyone steals and robs whatever is within arm's reach.

dheera
·
1 day ago
·
[ - ]

Time to create a decentralized, blockchain-based GitHub (GitCoin?) and have every commit be a transaction on the chain. Nothing would ever be takedownable.

zzo38computer
·
1 day ago
·
[ - ]

Git already has a blockchain; what you will need to do next is to make copies of the objects of the repositories on other servers as well. (However, I don't know if the blockchain includes tags on git (it seems to me that it might not but I don't know enough about it), although it does include objects. Fossil includes tags in the blockchain as well as files, commits, etc.)
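
A minimal sketch of the hash chaining referred to above, assuming the default SHA-1 object format (Git can also use SHA-256). `git_object_id` follows Git's documented object hashing (type, size, NUL byte, then content); `toy_commit` is a hypothetical, simplified commit body that omits the author and committer lines real commits carry.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def toy_commit(tree_id: str, parent_id: Optional[str], message: str) -> str:
    """Simplified commit (hypothetical: no author/committer lines).

    The parent's id is part of the hashed body, so changing any ancestor
    changes every descendant id. That is the chain property.
    """
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += ["", message]
    return git_object_id("commit", "\n".join(lines).encode() + b"\n")


if __name__ == "__main__":
    empty_blob = git_object_id("blob", b"")
    first = toy_commit(empty_blob, None, "first")
    second = toy_commit(empty_blob, first, "second")
    print(empty_blob, first, second)
```

The chain only makes history tamper-evident; keeping it available after a takedown still depends on other servers holding copies of the objects, as the comment says.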

LeoWattenberg
·
1 day ago
·
[ - ]

I mean, torrenting is decentralised and not technically takedownable. But it was entirely possible to make it legally painful for the people involved in it, as seen with e.g. The Pirate Bay, Megaupload, or the entire cease-and-desist letter industry built around individual torrenting users.

Intentional noncompliance with copyright law can get you quite a distance, but there's a lot of money involved, so if you ever catch the wrong kind of attention, usually by being too successful, you tend to get smacked.

JCattheATM
·
1 day ago
·
[ - ]

> I mean, torrenting is decentralised and not technically takedownable.

It's fairly trivial to block torrent traffic.

michaelmrose
·
1 day ago
·
[ - ]

The cost would be incredible even for just a pointer to distributed file storage

odie5533
·
1 day ago
·
[ - ]

Github stores about 19 PB. That would cost about $20k a year on Filecoin. Filecoin currently has more supply than demand because it's speculation-driven right now.
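
As a quick check of what those numbers imply (both figures are the commenter's and are not verified here; decimal units are assumed, so 1 PB = 1,000 TB):

```python
# Figures taken from the comment above; treat them as unverified assumptions.
GITHUB_STORAGE_PB = 19
YEARLY_COST_USD = 20_000
TB_PER_PB = 1_000  # decimal units assumed


def implied_usd_per_tb_year(storage_pb: float, yearly_cost_usd: float) -> float:
    """Price per terabyte per year implied by a total size and yearly cost."""
    return yearly_cost_usd / (storage_pb * TB_PER_PB)


if __name__ == "__main__":
    price = implied_usd_per_tb_year(GITHUB_STORAGE_PB, YEARLY_COST_USD)
    print(f"about ${price:.2f} per TB per year")
```

That works out to roughly a dollar per terabyte per year, which is consistent with the remark that supply currently exceeds demand.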

dheera
·
1 day ago
·
[ - ]

There wouldn't be an org maintaining it. You would just buy $100 worth of GitCoin and that would be enough for 10000 commits, or something like that.

cmrdporcupine
·
1 day ago
·
[ - ]

Yes, using blockchain to defraud the GPL.

Checks out sufficiently dystopian, yep.

If you could work some gratuitous LLM in there, we could be a little closer to torment-nexus territory. Keep working at it.

waste_monk
·
21 hours ago
·
[ - ]

Ooh, how about instead of being able to author a commit message, you're forced to let an LLM write it for you based on the diff since last commit. And that the LLM runs distributed on the blockchain, so it's monstrously slow, and has to be paid for with a 'gas' analogue so there's huge transaction fees as well.

That's the most techbro-brained idea I could come up with.

ycombinatrix
·
1 day ago
·
[ - ]

git already uses a blockchain lol

asdfsa1314
·
21 hours ago
·
[ - ]

Chinese companies copy code, which lets popular device companies use FFmpeg. It has its pros and cons.