HuggingFace Text Generation License No Longer Open-Source (github.com/huggingface)
84 points by bratao on July 29, 2023 | 55 comments


Don't see any ill-will here. They changed license fairly early without soliciting years of contributions from others (like, only a few months old and probably a handful of contributions from public from what I can skim). They don't call it open-source any more and the new license doesn't contain the words "open", "free" or "source" in its name ("Hugging Face Optimized Inference License").

People make mistakes when choosing a license and should be OK if they course-correct fairly quickly.


Doesn't make it okay just because X months have elapsed & suddenly they were getting huge traction


I am more inclined to agree with the FSF in that open source should not unduly limit how I use a tool. If I can no longer embed it in a product I sell, it is source available, but not open source.

Regardless, if the library is worth anything (I am not familiar), I would suspect the pre 1.0 version to be forked and sucked up by AWS/Azure/etc similar to ElasticSearch.


Seems there's a fork already under the previous Apache-2.0 license by a non-hyperscaler user https://github.com/Preemo-Inc/text-generation-inference https://github.com/huggingface/text-generation-inference/iss...


The EU has a bill on the table that would make open source authors liable for patching bugs and exploits in commercial software.

If that bill goes through, this argument is nullified at the global scale, and it behooves open source authors to create commercial licensing agreements.

Open Source simply means Open Source. Everything after the fact is up to the license and bad actors will always ignore such.


It most certainly does not nullify the argument at a global scale. Non EU based developers are not beholden to EU regulations in any way.


Can you link to the bill?



Open source is a misnomer. It should have been libre source.

(then this discussion wouldn't exist)


I am less inclined to agree with the FSF in that bourgeoisie who think they can hoover up innovations from the commons to build empires and oppress the masses should at least have to pay for the privilege. Thankfully I know I'm not alone in that appraisal, and the Gilded Age of Free and Open Source is coming to a close, as evidenced by projects like this getting a clue and so many robber barons being incensed that they have to actually pay for things they took for granted.


Nobody put a gun to their head and made them share their work. If they get indignant that people profited off of their backs then maybe sharing of the work was the wrong approach.

If they wanted to profit from their ideas, they could have started with selling the product from the start.


"If they get indignant that people profited off of their backs then maybe sharing of the work was the wrong approach."

No, no and no. I can want to publish the project to the community but at the same time want to restrict corporations from using it. It goes against the OSI open-source definition but.... we should maybe think about how much that definition is fit for purpose then.

This is like saying "if you don't want someone to steal your work, just don't publish it"


Then choose a license like AGPL and no corporation will touch your code. If you choose an Apache 2 license or similar, don't be shocked when people use it as the terms intend.

Personally, I will continue to release any code I open source as Apache 2, and if I intend to sell it, code will not be open source. You are free to think I am "wrong", but fortunately we need not agree with each other.


I don't have to intend to sell it in order to want to restrict others from doing so, though. Even if I want to allow others to sell the product or something which contains it, I can want to restrict them to certain use cases. (see "no more than 700M MAU" in LLaMa)

AGPL is not a full solution to this because of the SaaS loophole.

Huggingface is entirely in the right here - there is zero reason for them to effectively subsidise their competitors who will obviously not contribute back in any meaningful way, but at the same time, want to publish the source, either for altruistic reasons or for publicity reasons.


> AGPL is not a full solution to this because the SaaS loophole.

"SaaS loophole" is the GPL. AGPL is the thing that closes it.


No it doesn’t. Anyone can take AGPL code and wrap it up as a SaaS. AGPL forces you to publish changes even as a SaaS which GPL does not. But that doesn’t close the AWS problem that Elastic and others faced.


The "loophole" is that you can modify and use the software without contributing back your changes. This has bad economics because then someone can take the work you've done, add 10% more effort to make it 10% better, and outcompete anyone else because their thing is 10% better than the free thing even though the original authors did 90% of the work and are still carrying most of the maintenance burden.

People using free software for free is not a loophole.


Source available / do whatever you want until 100M MAU seems like a great way to go. Lightweight users get lots of freedom and autonomy, big users have to pay to support the ecosystem.


And I am free to agree that authors can relicense their works under open source licenses that are inconvenient to the bourgeoisie. Such is life.


The only thing really missing here is a common community-focused anti-corporate license with a catchy name.


By that same logic: If you didn't want to run the risk of them relicensing their work going forward... you shouldn't have used it at all.


The FSF's answer to that is to use AGPLv3, which trips copyleft on network access.

The technicalities of actually using AGPLv3 are... well, if your only goal is to keep large businesses from touching your code, sure. But you can also do that with the weird funky postmodern licenses that technically confer no rights and could easily be used to set up copyleft trolling. If you just want rules that are fair and easy to comply with, this license has... problems. To make a long story short, AGPLv3 only makes sense for web development in interpreted languages that make it easy to list out the code of the site. In literally any other context[0] you're creating traps to sue people with.

MongoDB wanted to extend AGPLv3 even further with a requirement to release all supporting software - e.g. management UIs and things like that. OSI wouldn't put their blessing on it. Amazon just reimplemented the MongoDB API rather than complying with SSPL.

No, I don't think we should be entertaining the notion of API copyright. Hell, FOSS Patents thought Oracle was in the right purely because "oh it'd let Oracle assert GPL", even though it would completely pull the foundation out of the commons.

If you want to be even stricter, there's also the OpenWatcom license, which trips the copyleft whenever you use the software. So no private forking. This is actually less of a compliance headache than AGPLv3 - you don't need to allow source code download over the network, you just have to publish your modifications somewhere. The FSF refuses to touch it because they consider private forking to be a human right.

The problem with stopping "the bourgeoisie" from hoovering up all the innovation from the commons is that doing so takes the software out of the commons. It's worse than AGPLv3, SSPL, or OpenWatcom. Anyone who is trying to do this is not trying to protect the commons, they're trying to join the bourgeoisie. Because saying that you're not allowed to host the software for others, period, is the language of the proprietary world. Proprietary software licensing is so lucrative specifically because of usage restrictions - it lets you look into each user's wallet and extract the maximum amount of money from it.

[0] Hector Martin found AGPLv3 on the source code for an Ethernet PHY - how the hell do you put a "view source" link on an Ethernet packet?

He's also pointed out that if you just link to a Git repo as your compliance, that's not good enough, because you need to have the specific source code currently executing be accessible. You can't point to a commit inside of itself.[1] The original idea was that AGPLv3 source would have quines in them that listed all the source code, and then AGPLv3 would prohibit removing the quines, but you really can only do that in interpreted languages (e.g. PHP, Python, or Ruby).
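For anyone unfamiliar with the term, a quine is a program that can reproduce its own source text. Here is a minimal Python sketch of that idea; the names `TEMPLATE` and `view_source` are made up for illustration, and nothing here is a claim about what AGPLv3 actually requires or how any real project complies. The comments in the block are not part of the reproduced text, so the self-reproduction applies to the comment-free core only.

```python
# The template holds the program text, with a {!r} placeholder where the
# template's own repr() will be substituted.
TEMPLATE = 'TEMPLATE = {!r}\n\ndef view_source():\n    return TEMPLATE.format(TEMPLATE)\n'


def view_source():
    # Fill the placeholder with the template itself to rebuild the program text.
    return TEMPLATE.format(TEMPLATE)


if __name__ == "__main__":
    # A hypothetical "view source" endpoint could return this string.
    print(view_source())
```

Running the printed text as a program gives a `view_source()` that returns that same text again, which is the fixed-point property the comment relies on.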

[1] Cryptographic hash algorithms would be no good if you could do this
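A small sketch of why footnote [1] holds, using Python's `hashlib`. It hashes plain text with SHA-1 only because that is Git's default hash; it does not reproduce how Git computes commit IDs, and the helper names `digest` and `stamp` are hypothetical. The point is that writing a digest into a file changes the file, so the recorded digest can only describe the earlier, unstamped content.

```python
import hashlib


def digest(text: str) -> str:
    """Hex SHA-1 of the text (SHA-1 chosen only because Git defaults to it)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def stamp(text: str) -> str:
    """Append a comment line recording the digest of the text before stamping."""
    return text + "# built from: " + digest(text) + "\n"


original = "print('hello')\n"
stamped = stamp(original)

# Pull the recorded digest back out of the last line of the stamped text.
recorded = stamped.rstrip("\n").rsplit(" ", 1)[1]

# The recorded digest describes the unstamped text, not the file containing it.
matches_original = recorded == digest(original)
matches_stamped = recorded == digest(stamped)
```

Making `matches_stamped` true would mean finding text that contains its own digest, which is exactly what a cryptographic hash is designed to make infeasible.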


It's sad that most organizations see open source as cheaper COGS rather than a way to solve their own problems more efficiently and improve critical infrastructure they rely on.

And so the only options are for HuggingFace to eat the cost of R&D, possibly to improve a direct competitor, or to limit commercial use in this way.


Isn't this what AGPL is for?


Yes, but then people just don't use it and/or build alternatives. Not to mention that even if it's irrational, some people just do not like GPL stuff.


Aren't the "some people" the exact people you're trying to get not to use it, i.e. direct competitors or SaaS companies who give nothing back?

And nothing stops you from selling them a separate license, in exchange for money.


Perhaps I'm not being clear. People (myself included) steer clear of anything with "GPL" in it as much as we possibly can, whether the reason is rational or irrational. The reasons don't matter - it's just what happens. And so saying something like "you can sell a separate license" doesn't change that dynamic.


I am unsure what you want. To make the source available for group X, but denied to group Y?

- X presumably being someone without any potential profitability path

- Y someone who can make money from it?

I can get the rationale of not wanting Amazon and Microsoft to profit from your work, but I am unsure how you thread this licensing needle without a GPL-similar license from the start. Either you put it out in the open without restrictions (and eat the potential consequences), encumber it in such a way that no corporation wants to touch it, or you never share it.


> People (myself included) steer clear of anything with "GPL" in it as much as we possibly can, whether the reason is rational or irrational. The reasons don't matter - it's just what happens.

Okay, so then irrational people get to shoot themselves in the foot by not using quality free software. What is the problem? Presumably those people will be out-competed by ones who make more sensible decisions, or at least make less money.

> And so saying something like "you can sell a separate license" doesn't change that dynamic.

But it does though? If you don't want to use software licensed under the GPL then you can go pay them money to get a different license. Which they like because they get your money. Getting money from irrational people is one of the most profitable things in the world.

And if you're neither paying them nor contributing back changes in the spirit of the GPL, why should they care if you don't use it?


I understand your position here. You're trying to apply a logical argument to something that is often emotional in nature. These things don't square, I'm sorry to say. GPL has a reputation that, for better or for worse, means people will avoid it.


It’s still open source. The only new limitation is that you can’t monetize the model itself, which is fine. They have to make money.

You can still:

+ use it for personal use

+ use it as part of a commercial project

+ sell a hosted service of <v1.0

You cannot:

+ wrap an API around the library (v1.0+) and sell that, without a license from HF.

This is less restrictive in practice than some of the extreme Open Source copyleft licenses. It’s fine.


It is not open source since it violates the very definition of open source [1]. They (and you) are free to call whatever this license is something else and I am sure there are many great terms, but it is greatly dishonest to use a term others have worked hard to define for nearly thirty years while not adhering to the definition.

[1]: https://en.wikipedia.org/wiki/The_Open_Source_Definition

This reminds me of that language model coming out of the PRC about a year ago that claimed to be open, yet it turned out that it was only the code that was open and not the model itself. Which is fine (your labour on your terms), but use a different term in that case as I can assure you most people have an interest in code and weights, not just the former.


That is not open source. That is OSI's proprietary definition that postdates the notion of open source by almost a decade, and represents the third attempt by corporate america to appropriate free software for their own interests. Open source has always meant that the source is available. I would advise you learn about what open source[1] actually is and ignore the corporate trolls who continue to try to rewrite history and the dictionary to their benefit.

---

[1] https://www.arp242.net/open-source.html


I do not buy Martin Tournoij’s argument as it essentially boils down to “no one owns a language” and “there is some prior use that I could find”. Term formation is – as Martin rightfully points out – messy and taking this first-mention approach poorly reflects how consensus around meaning forms among us humans.

My take is (I think) pragmatic. I know for a fact that both OSI and FSF had a desire to delineate terms for their own agendas (both of which have been stated in the open). Both worked hard over decades to explore the implications of their definitions and we as a society also worked with them to form some kind of consensus of what these terms mean. Heck, it is not like Free Software is not problematic as a term due to the ambiguity (and prior usage…) of the word free and FSF has discussed this at great length. Likewise, Open Source is of course flawed as a term in its own way, but it is not useful to start muddying the waters around the term at this point as there is a consensus in place that allows us to have a discussion about Open Source and its flaws to begin with.

The Ethical Source movement, for example, objects to both Open Source and Free Software licenses, while still adhering to freely sharing code. Great! I happen to disagree with them, but I think their reasoning is fine and they decided to invent their own (obviously faulty in some ways) term to delineate themselves from those they disagree with. Similarly, I wish this anti-corporate, source-available, whatever proto-movement that keeps popping up would get together, write a manifesto, and come up with a name rather than dive into discussions such as this and try to somehow argue that when the great majority of us used Open Source as a term for the last twenty or so years what we actually had in mind included licenses which restricted fields of use and discriminated against persons and/or groups. Sorry, I do not buy that, not even for a second.

As something mildly tangential, I find it confusing that members of this proto-movement seem to both support venture capital-fueled start-ups such as Hugging Face and at the same time frequently use Marxist-esque terms. Heck, I have also seen attacks on Open Source authors from people using alt-right terms. Maybe there is more than one new movement out there? Regardless, I look forward to seeing what term they end up using and reading their reasoning and definition when they finally stop arguing semantics.


The consensus by those outside the milieu of bourgeoisie apologists is that 'open source' really does mean 'the source is available', full stop, nothing else added. That some people have trouble understanding this is not my problem. It is also not my problem that I refuse to be labelled so I can be conveniently dismissed. I've got history on my side, and people obviously are so sick to death of corporations that having someone stand up and state the obvious is refreshing to them.

In light of this, maybe you should consider the possibility that you're wrong and you're backing the wrong team.


The term Open Source is ambiguous and has always been rejected by the original proponents of Free (as in Freedom) software as a corporation-derived attempt to lessen the ‘Freedom’ bit. So defending the term Open Source is probably not the hill to die on.


There are far more than two camps and FSF was born out of a culture that existed in the 60s and 70s of which the FSF is not a perfect representation either. In fact, several communities carrying that 60/70s spirit and using permissive licenses also disagree with the OSI mission and reject licenses that OSI approves (Apache 2.0 in particular). I do however think it is worth fighting against the erosion of a term that is widely accepted.


The specific term Open Source (which is what I was deliberately mentioning - rather than the notion of free or open software) is from the late 90s and was intentionally introduced to make free or open software more palatable to corporations. It itself is a deliberate watering down of the notion of free software, so it is funny to see it being defended like this.


> It itself is a deliberate watering down of the notion of free software, so it is funny to see it being defended like this.

No, because as I pointed out there are more than two camps and there are some of us that find even what OSI approves too restrictive. Just like OSI does not own the culture of open and free software, neither does FSF. Thus you cannot claim that OSI watered down and corrupted the “true” spirit of free and open software. What they did was to formalise and create (some say appropriate) a term to serve their purposes and what they formalised inarguably captures a subset of what was and is open and free software. Regardless of whether we agree with their agenda or not, I do think it is fair to let them have the term they defined as it makes it easier to have a discourse around software licensing without all of a sudden including source-available licenses and the like.


Note that huggingface never calls it “open source”.


Thank you, I was aware of this but looking at my initial comment I did not make this clear and they do deserve credit for communicating honestly and clearly, especially when so many others do not.


> They (and you) are free to call whatever this license is something else

I am also free to call it open source.


Indeed, you're free to call a pig a horse.


"‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean–neither more nor less.’

‘The question is,’ said Alice, ‘whether you can make words mean different things–that’s all.’

‘The question is,’ said Humpty Dumpty, ‘which is to be master–that’s all’"

-- Humpty Dumpty to Alice, _Through the Looking Glass_

See https://www.arp242.net/open-source.html for an actual historical definition of open source, free of corporate propaganda.


Mr. Porse is the most famous Porse of all.


True, but surely you wouldn't intentionally label yourself ignorant? Also, wouldn't you consider willfully spreading false information rather insidious?


It's not false information. OSI doesn't own a trademark for "open source".


Evolution my dear brother. Evolution.


Rather, it is the dilution of the meaning of open source because true and pure open source such as GPL or Apache is too powerful for some people to accept.


TIL that giving innovations away to the bourgeoisie is 'powerful'. This is obviously some bizarre sado-masochistic definition of 'powerful' I was not previously familiar with involving whips, chains, and dragons.


Hahahah. I love some good ole sarcasm.


More like regression than evolution.


There's a gold rush going on right now. And everything AI-related is the gold. Hence non-open-source is now totally-open-source-don't-you-say-a-bad-thing-about-it-source! See: OpenAI.

Actually it's now totally OK to loooove megacorps and their lovely and caring ways to do things, after all it's totally a GOOD thing only a few select megacorps can do cutting edge training otherwise some closet nazi may build a hitler bot in his basement!


I'm not sure you understand what the Open Source movement was pushing for in the first place.

Exploitation of labour was, very literally, the main argument ESR put forward in favour of the Bazaar model -- and it is the reason Open Source as a term exists.


It's been totally OK to love corporations for about two decades running... ever since some clown told everyone that they should provide them with free labor (which is their idea of what 'free software' really means).



