* an AI model is not a backup of the contents of all of the books in the sense that it would preserve their contents such that it might, e.g., be useful for future generations
* Meta has (allegedly) been unfairly benefiting / profiting off of the copyrighted work of others by illegally reproducing copies of their work. Not just in the AI model sense[1], but actually (allegedly) downloading them directly from pirate repositories in a way that isn't straightforwardly fair use and even uploading some amount of this pirate data in return.
I feel like the parent commenter may have been making the typical argument for preservation of copyrighted materials, and I'm amenable to it... when it's regular people or non-profits doing that work, in a way that doesn't allow them to benefit unfairly or profit off of the hard work of others (or would be connected to such a process in some way).
Plaintiffs allege that Meta didn't just do all this, but also talked about how wrong it was and how to mitigate the seeding so they might upload as little as possible. So no matter how you slice it they allegedly 1) knew they were doing something at least a little bit wrong and 2) took steps to prevent the process that might otherwise have preserved the copied materials for the public interest.
And I feel like you probably knew all this, but maybe I'm missing something.
1: the typical argument wherein the model wouldn't exist without the ingested data, a lot of that data is still in there, it is of course a derivative work, and the question is really how derivative it is and what part of the work they can claim as their own contribution
And punishing them in the normal manner will be an incredibly small slap on the wrist, and do absolutely nothing to help us find out what will play out in court regarding a fair-use defense on training AI with copyrighted material.
Isn't there a "fruit of the poisonous tree" kind of thing? Sounds to me quite similar to the situation where you murder your parent and get to keep the inheritance, even if you are convicted of the murder. Inheriting stuff isn't illegal, yet I think most jurisdictions would not allow you to keep it in this case.
There should be a problem with stuff obtained through illegal means, even if having that stuff is in principle legal. In this case, copyrighted material.
Obviously they would argue that having the data is only a consequence of the download part, and that part is legal. What I see is that these situations are always complicated, and if you're rich enough, you get to litigate the complications and come out with a slap on the wrist or maybe even clean hands, while if you are an ordinary citizen, you can't afford to delve into the complexities and get punished.
These days I'm starting to give up on the whole concept of the legal system being fair. They're not even pretending anymore.
There's such a mass of possible works that losing access to any particular one hardly constrains anyone: if you could cast a magic spell preventing anyone from distributing or accessing your particular work and then burned it, the spell would have essentially no effect-- no one would notice it and no one would be harmed.
As long as discussion of a work that has been published is not impeded, the public is not harmed even by these life-plus-50-years copyrights, other than in that they are accumulated by certain companies who themselves become problems.
When someone decides to use someone's work without compensation, he is still robbed, even though the owner is not deprived of the work itself. But it's not a theft of goods, it's theft of service. The copyright infringer isn't the guy who steals your phone; it's the guy you have done some work for but who refuses to pay.
With this view you can also believe, without hypocrisy, that what the LLM firms are doing is wrong while what Swartz did was not, since the authors in question weren't deprived of any royalties or payments due to them, thanks to the publishing model for scientific works.
> But it's not a theft of goods, it's theft of service.
What service? If somebody washes your windshield without you asking, it isn't a theft of service to not pay them. A theft of service arises from entering into an agreement and then failing to pay as stipulated in that agreement.
Copyright isn't an agreement you can choose whether to participate in. Copyright is a legal enforcement system that imposes legal liability even on those who don't use it. You may not see this legal liability as "harm", but it absolutely is. Arguing that copyright extends to training is arguing for a dramatic increase in the scope and power of this legal enforcement system.
But the thing distinguishing this from the windshield case is that there are so many possible texts that in choosing their particular text, you are choosing the work they've done.
You can think of choosing somebody's particular text as the way of contracting him. Just as it isn't a restriction of your freedom of speech that going into a restaurant and ordering a meal creates a contract to pay, so it isn't a restriction of your freedom of speech when you choose to seek out and repeat somebody's very particular text.
Why Harry Potter, when you have any of hundreds of millions of stories of a similar sort that you could easily write yourself? When you choose that one, you choose it because it's already been prepared by somebody else, just as you choose a restaurant because they've done the work and have food ready for you. By choosing the one that's already written, you accept that the author has done work for you.
> so it isn't a restriction of your freedom of speech when you choose to seek out and repeat somebody's very particular text.
I hadn't made that claim, but I will now that you've brought it up. Art operates as part of a discussion, and the reference to and re-use of prior art is a key part of how that happens. There are sooo many cases of copyright being used to limit freedom of expression that this really isn't disputable. Copyright clearly restricts speech.
> By choosing the one that's already written you accept that the author has done work for you.
No I don't, at least not in a sense that's different from the shoulders of all the people that author learned from, and so on. Cultural works exist and take on roles in our cultural semiology, our memes, our language, without our choice. You can choose not to engage with a work, but you can't choose which works will be culturally relevant or not.
When you publish something, it becomes part of our shared culture, and no one has an inalienable right to own that. The limited rights we granted to encourage commercial creativity have already snowballed out of control, and now people are blithely buying into another dramatic expansion of them.
But we're talking about extremely direct copying. Actual computerized copying, typically verbatim.
Doing things relating to discussion of a work are typically permitted, but you have no reason to use anybody's particular work other than to make use of the work he did in creating it.
> But we're talking about extremely direct copying. Actual computerized copying, typically verbatim.
Copyright doesn't just extend to "literal direct copying". When you claim copyright doesn't harm anyone, you can't ignore all the other types of activity it prohibits.
> Doing things relating to discussion of a work are typically permitted,
Only if you limit the meaning of "discussion" so much that it no longer includes the process of making art.
> but you have no reason to use anybody's particular work other than to make use of the work he did in creating it.
Did you not read my comment? I already explained the reason. Creative works become part of our culture; you can't choose which works will do that, you can only choose to participate in that culture or not.
Copyright is a social system for artificially limiting access to our shared culture and thus also limits participation in that culture.
I understand the value of a limited copyright system, but anyone that claims that our copyright system doesn't cause harm or cost us anything isn't being realistic. Copyright duration should be far more limited and we need significant reforms to the DMCA. Personally, I think even all non-commercial distribution should be legal as copyright should only grant commercial rights.
I recommend Jason Furman's work for the practical mechanisms by which unnatural monopolies can be broken up and natural monopolies can be regulated.
On the other hand, if you are interested in why the status quo isn't an accident and what we would need to do to accomplish those things, I recommend reading Reid Hoffman's book "Blitzscaling" side-by-side with Jacob Hacker and Paul Pierson's book "Winner-Take-All Politics": you can see the same dynamics presented in two very different lights.
Okay, they would be smaller, but you said "big corporations should not be able to exist" and they would already be a big corporation with just search--they started this way.
Or, just to follow it through, let's say "WidgetBoss LLC" makes a new Widget that every single human has to have, they become the biggest company ever by making one widget. What will you do to make them smaller? Why?
I have a big problem with Google & Meta, and I can understand arguments about those companies. But not just "big companies" as a generality.
But that's how everyone speaks now. "Literally every billionaire is evil and exploiting blah blah blah"
I'm not sure if you're in good faith, but I will assume that you are.
> "Literally every billionaire is evil and exploiting blah blah blah"
Nope. Not every billionaire is evil and exploiting blah blah blah. But nobody deserves to be a billionaire, period.
> let's say "WidgetBoss LLC" makes a new Widget that every single human has to have, they become the biggest company ever by making one widget
Which hasn't happened because, obviously, it is not possible to become the biggest company ever by making something trivial.
It is not possible to promote your product by putting it at the top of the search results if you don't own the search engine.
It is not possible to get statistics about popular products in your webstore, copy them and put them at the top of the search results if you can't own both the webstore and the products.
It is not possible to force everybody to use your email provider in order to use their smartphone if you don't own both the email provider and the smartphone OS.
The issue is that there's an asymmetry between buyer and seller for books, because a buyer doesn't know the contents until they buy the book. Reviews can help, but not if the reviews are fake/AI-generated. In this case, these books are profitable even if only a few people buy them, as the marginal cost of creating such a book is close to zero.
This is starting to get pretty circular. The AI was trained on copyrighted data, so we can make a hypothesis that it would not exist - or would exist in a diminished state - without the copyright infringement. Now, the AI is being used to flood AI bookstores with cheaply produced books, many of which are bad, but are still competing against human authors.
The benefits are not clear: why should an "author" who doesn't want to bother writing a book of their own get to steal the words of people who aren't lazy slackers?
It's as much stealing as piracy is stealing, ie none at all. If you disagree, you and I (along with probably many others in this thread) have a fundamental axiomatic incompatibility that no amount of discussion can resolve.
It is not theft in the property sense, but it is theft of labor.
If a company interviewed me, had me solve a problem, didn't hire me or pay me in any way and then used the code I wrote in their production software, that would be theft.
That is the equivalent of what authors claiming they wrote AI books are doing. That they've fooled themselves into thinking the computer "wrote" the book, erasing all the humans whose labor they've appropriated, in my opinion makes it worse, not better. They are lying to the reader and themselves, and both are bad.
Stealing is not the right word perhaps, but it is bad, and this should be obvious. Because if you take the limit of these arguments as they approach infinity, it all falls apart.
For piracy, take switch games. Okay, pirating Mario isn't stealing. Suppose everyone pirates Mario. Then there's no reason to buy Mario. Then Nintendo files bankruptcy. Then some people go hungry, maybe a few die. Then you don't have a switch anymore. Then there's no more Mario games left to pirate.
If something is OK if only very, very few people do it, then it's probably not good at all. Everyone recycling? Good! Everyone reducing their beef consumption? Good! ... everyone pirating...? Society collapses and we all die, and I'm only being a tad hyperbolic.
In a vacuum, making an AI book is whatever. In the context of humanity and pushing this to its limits, we can't even begin to comprehend the consequences. I'm talking crimes against humanity beyond your wildest dreams. If you don't know what I'm talking about, you haven't thought long enough and creatively enough.
> Because if you take the limit of these arguments as they approach infinity, it all falls apart.
Not everyone is a Kantian, which is the moral philosophy you are describing: the categorical imperative. See this [0] for a list of criticisms of said philosophy.
> In a vacuum making an AI book is whatever. In the context of humanity and pushing this to its limits, we can't even begin to comprehend the consequences. I'm talking crimes against humanity beyond your wildest dreams. If you don't know what I'm talking about, you haven't thought long enough and creatively enough.
Not really a valid argument. Again, it's circular reasoning with a lot of empty claims and no actual reasoning: why exactly is it bad? Just saying "you haven't thought long enough and creatively enough" does not cut it in any serious discussion. The burden of substantiating your own claim is on you, not the reader, because (to take your own Kantian argument) anyone you're debating could simply terminate the conversation by accusing you of not thinking about the problem deeply enough, meaning that no one actually learns anything at all when everyone shifts the burden of proof to everyone else.
It is, because the quote you quoted is in reference to what I said above.
I explained real consequences of pirating. Companies have gone under, individuals have been driven to suicide. This HAS happened.
It's logically consistent that if we do that, but increase the scale, then the harm will be proportionally increased.
You might disagree. Personally, I don't understand how. Really, I don't. My fundamental understanding of humanity is that each innovation will be pushed to its limits. To make the most money, to do it as fast as possible, and in turn to harm the most people, if it is harmful. It is not in the nature of humanity to do something halfway when there's no friction to doing more.
This reality of humanity permeates our culture and societies. That's why the US government has checks and balances. Could the US government remain a democracy without them? Of course. We may have an infinite stream of benevolent leaders.
From my perspective, that is naive. And, certainly, the founding fathers agreed with me. That is one example - but look around you, and you will see this mentality permeates everything we do as a species.
> Stealing is not the right word perhaps, but it is bad, and this should be obvious.
Many people say things that they don't like "should be obvious"ly bad. If you can't say why, that's almost always because it actually isn't.
Have a look at almost any human rights push for examples.
.
> For piracy, take switch games.
It's a bad metaphor.
With piracy, someone is taking a thing that was on the market for money, and using it without paying for it. They are selling something that belongs to other people. The creator loses potential income.
Here, nobody is actually doing that. The correct metaphor is a library. A creator is going and using content to learn to do other creation, then creating and selling novel things. The original creators aren't out money at all.
Every time this has gone to court, the courts have calmly explained that for this to be theft, first something has to get stolen.
.
> If something is OK if only very, very few people do it
This is okay no matter how many people do it.
The reason that people feel the need to set up these complex explanatory metaphors based on "well under these circumstances" is that they can't give a straight answer what's bad here. Just talk about who actually gets harmed, in clear unambiguous detail.
Watch how easy it is with real crimes.
Murder is bad because someone dies without wanting to.
Burglary is bad because objects someone owns are taken, because someone loses home safety, and because there's a risk of violence
Fraud is bad because someone gets cheated after being lied to.
Then you try that here. AI is bad because some rich people I don't like got a bunch of content together and trained a piece of software to make new content and even though nobody is having anything taken away from them it's theft, and even though nobody's IP is being abused it's copyright infringement, and even though nobody's losing any money or opportunities this is bad somehow and that should be obvious, and ignore the 60 million people who can now be artists because I saw this guy on twitter who yelled a lot
Like. Be serious
This has been through international courts almost 200 times at this point. This has been through American courts more than 70 times, but we're also bound by all the rest thanks to the Berne conventions.
Every. Single. Court. Case. Has. Said. This. Is. Fine. In. Every. Single. Country.
Zero exceptions. On the entire planet for five years and counting, every single court has said "well no, this is explicitly fine."
Matthew Butterick, the lawyer that got a bunch of Hollywood people led by Sarah Silverman to try to sue over this? The judge didn't just throw out his lawsuit. He threatened to end Butterick's career for lying to the celebrities.
That's the position you're taking right now.
We've had these laws in place since the 1700s, thanks to collage. They've been hard ratified in the United States for 150 years thanks to libraries.
This is just silly. "Recycling is good and eating other things is good, but let's try piracy, and by the way, I'm just sort of asserting this, there's nothing to support any of this."
For the record, the courts have been clear: there is no piracy occurring here. Piracy would be if Meta gave you the book collection.
.
> In the context of humanity and pushing this to its limits, we can't even begin to comprehend the consequences.
That's nice. This same non-statement is used to push back against medicine, gender theory, nuclear power, yadda yadda.
The human race is not going to stop doing things because you choose to declare it incomprehensible.
.
> I'm talking crimes against humanity beyond your wildest dreams.
Yeah, we're actually discussing Midjourney, here.
You can't put a description to any of these crimes against humanity. This is just melodrama.
.
> If you don't know what I'm talking about,
I don't, and neither do you.
"I'm talking really big stuff! If you don't know what it is, you didn't think hard enough."
Yeah, sure. Can you give even one credible example of Midjourney committing, and I quote, "crimes against humanity beyond your wildest dreams?"
Like. You're seriously trying to say that a picture making robot is about to get dragged in front of the Hague?
Sometimes I wonder if anti-AI people even realize how silly they sound to others
Okay. AI books make books 1 million times faster, let's say. Arbitrary, pick any number.
If I, a consumer, want a book, I am therefore 1 million times more likely to pick an AI book. Finding a "real" book takes insurmountable effort. This is the "needle in a haystack" I mentioned earlier.
The result is obvious - creators lose potential money. And yes, it is actually obvious. If it isn't, reread it a few times.
To be perfectly and abundantly clear because I think you're purposefully misunderstanding me - I know AI is not piracy. I know that. It's, like, the second sentence I wrote. I said those words explicitly.
I am arguing that while it is not piracy, the harm it creates is identical in form to piracy. In your words, "creators lose potential income". If that is the standard, you must agree with me.
> how silly they sound to others
I'm not silly, you're just naive and fundamentally misunderstand how our societies work.
Capitalism is founded on one very big assumption. It is the jenga block keeping everything together.
Everyone must work. You don't work, you die. Nobody works, everyone dies.
Up until now, this assumption has been sound. The "edge cases", like children and disabled people, we've been able to bandaid with money we pool from everyone - what you know as taxes.
But consider what happens if this fundamental assumption no longer holds true. Products need consumers as much as consumers need products - it's a circular relationship. To make things you need money, to make money you must sell things, to buy things you must have money, and to have money you must make things. If you outsource the making of things, there's no money - period. For anyone. Everyone dies. Or, more likely, the country collapses into a socialist revolution. Depending on what country this is, the level of bloodiness varies.
This has happened in the past already, with much more primitive technologies. FDR, in his capitalist genius, very narrowly prevented the US from falling into the socialist revolution with some aforementioned bandaid solutions - what we call "The New Deal". The scale at which we're talking about now is much larger, and the consequences more drastic. I am not confident another "New Deal" can be constructed, let alone implemented. And, I'm not confident it would prevent the death spiral. Again, we cut it very, very close last time.
> If they start out-competing humans, is that bad?
Not inherently, but it depends on what you mean by out-competing. Social media outcompeted books and now everyone's addicted and mental illness is more rampant than ever. IMO, a net negative for society. AI books may very well win out through sheer spam but is that good for us?
> Nobody has responded to me with anything about how authors are harmed
i imagine if books can be published to some e-book provider through an API to extract a few dollars per book generated (multiplied by hundreds), then eventually it'll be borderline impossible to discover an actual author's book. breaking through for newbie writers will be even harder because of all of the noise. it'll be up to providers like Amazon to limit it, but then we're reliant on the benevolence of a corporation, and most act in self-interest, and if that means AI slop pervading every corner of the e-book market, then that's what we'll have.
kind of reminds me of solana memecoins and how there are hundreds generated everyday because it's a simple script to launch one. memecoins/slop has certainly lowered the trust in crypto. can definitely draw some parallels here.
> Nobody has responded to me with anything about how authors are harmed
The same way good law-abiding folk are harmed when heroin is introduced to the community. Then those people won't be able to lend you a cup of sugar, and may well start causing problems.
AI books take off and are easy to digest, and before long your user base is quite literally too stupid to buy and read your book even if they wanted to.
And, for the record, it's trivial to "out compete" books or anything else. You just cheat. For AI, that means making 1000 books that lie for every one real book. Can you find a needle in a haystack? You can cheat by making things addictive, by overwhelming with supply, by straight up lying, by forcing people to use it... there's really a lot of ways to "outcompete".
> It feels more like we just want to punish people, particularly rich people, particularly if they get away with stuff we're afraid to try.
If by "afraid to try" you mean "know to be morally reprehensible" and if by "punish people" you mean "punish people (who do things that we know to be morally reprehensible)", then sure.
But... you might just be describing the backbone of human society since, I don't know, ever? Hm, maybe there's a reason we have that perspective. No, it must just be silly :P
I just explained how it's morally reprehensible. The argument is right there, above the quote you chose to quote. Neat trick, but I'm sorry, a retort that does not make.
You didn't explain anything about why it is so, you just said it is, hence why I said it's your opinion. If you can't explain why, in more concrete terms, then there is no reason to believe your argument.
I just explained how AI books are able to cheat - they make more, faster, cheaper, and win based not on quality, never on quality, but rather by overwhelming. Such a strategy is morally reprehensible. It's like selling water by putting extra salt in everything.
Consumers are limited by humanity. We are all just meat sacks at the end of the day. We cannot, and will not, sift through 1 billion books to find the one singular one written by a person. We will die before then. But, even on a smaller scale - we have other problems. We have to work, we have family. Consumers cannot dedicate perfect intelligence to every single decision. This, by the way, is why free market analogies fall apart in practice, and why consumers buy what is essentially rat poison and ingest it when healthier, cheaper options are available. We are flawed by our blood.
We can run a business banking on the stupidity of consumers, sure. We can use deceit, yes. To me, this is morally reprehensible. You may disagree, but I expect an argument as to why.
> I just explained how AI books are able to cheat - they make more, faster, cheaper, and win based not on quality, never on quality, but rather by overwhelming. Such a strategy is morally reprehensible.
Okay, I fundamentally disagree with your premises, your analogies to water and banking (and even your other comment about piracy [0], as I have not seen any evidence of piracy leading directly to "suicides," as you say, and piracy has instead actually benefited many companies [1]), and therefore your conclusions. So I don't think we can have a productive conversation without me spending a lot of time explaining why I don't equate AI production with morality at all, and why I don't see AI writing billions of books as having anything to do with morals.
That is why I said it is your opinion, versus mine which is different. Therefore I will save both our time by not spending more of it on this discussion.
You're of course allowed to disagree, but past a certain point you're yelling at clouds and people might think you're insane.
It's very simple logic, and it doesn't require your understanding to be true. Piracy is good for companies? Really? That's... your legitimate position?
If nobody is paying for anything how does a company operate? That's not a rhetorical question. Is it fairy dust? Perhaps magical pixies keep the lights running?
If you don't have explanations for even the simplest of problems with your position, your position isn't worth listening to.
Again, you're a Kantian and I'm not. Your arguments do not sway those who aren't, as I said, they are fundamentally different moral philosophies. If you cannot produce even the evidence of harm as you previously stated (please, link me suicide news reports directly caused by piracy, as you claimed) then "your position isn't worth listening to" either.
Does it make a difference? What I'm saying is plainly true and undeniable. I'll break it down, perhaps a bit slower this time so you can keep up.
You must agree companies require money to operate. No money, no company. You must also agree that piracy OR any action which takes money away from a company results in less money. In addition, you must agree every individual will take whichever action costs them the least amount of money.
Okay. Do you see where I'm going? Following these very simple rules, the result is that there is no money left for companies, and they go under.
Whether that's bad or not is, technically, debatable. Whether that's how it works or not, isn't.
I grow tired of having to explain very simple logic to bumbling idiots. Of course you're not a bumbling idiot. Rather, you're someone with a belief and a delusion. Meaning, you will simply ignore any and all reality to maintain your belief, even if, right before your very eyes, it is refuted. I don't know why people act this way. Maybe there's some medication that can help with that.
People might say I'm a prophet, maybe some kind of psychic. Really, I'm just a guy with, like, a quarter of a brain. We can often "see into the future" if we just rub some brain cells and put two and two together.
Until you can find a way around these rules, perhaps some alternative economic system which has not been invented, there is nothing for you to refute. Not that you've been trying at all; your entire "argument" has been "erm, I disagree". Which, by the way, is not an argument. It's more of a statement, and one which is embarrassing to say out loud when you don't have anything to back it up with.
And, to be clear, this is well past the land of morality. I'm operating in a much simpler framework here. Even if you're under the belief everyone is perfect, or some people are perfect and some aren't, or whatever other moral beliefs - that doesn't change the rules and therefore doesn't change the result.
Logic can seem very consistent in a vacuum but again, because you can't find a single statistic to support your claims about piracy (while I already cited some evidence for my side), you cling to what you think is true, not what is empirically studied to be true. Your bloviating in as many paragraphs doesn't really mean anything, so unless you can cite something meaningful, I'm done with this nearly two week old conversation.
I think the concern goes to the point of copyright to begin with, which is to incentivize people to create things. Will the inclusion of copyrighted works in llm training (further) erode that incentive? Maybe, and I think that's a shame if so. But I also don't really think it's the primary threat to the incentive structure in publishing.
Copyright was invented by publishers (the printing guild) to ensure that the capitalists who own the printing presses could profit from artificial monopolies. It decreases the works produced, on purpose, in order to subsidize publishing.
If society decides we no longer want to subsidize publishers with artificial monopolies, we should start with legalizing human creativity. Instead we're letting computers break the law with mediocre output while continuing to keep humans from doing the same thing.
LLMs are serving as intellectual property laundering machines, funneling all the value of human creativity to a couple of capitalists. This infringement of intellectual property is just the more pure manifestation of copyright, keeping any of us from benefitting from our labor.
Few companies can amass such quantities of knowledge and leverage it all for their own, very private profits. This is unprecedented centralization of power for a very select few. Do we actually want that? If not, why not block this until we're sure it's a net positive for most people?
Because they expect not to have to open-source future models. It's easy to open stuff as long as doing so strengthens your position and prevents competition from emerging.
Ask Google about Android and what they now choose to release as part of AOSP vs Play Services.
He left out part of the quote, which is misattributed as well. Wikipedia:
> This quotation is often incorrectly attributed to Francis M. Wilhoit:
> Conservatism consists of exactly one proposition, to wit: There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.
> However, it was actually a 2018 blog response by 59-year-old Ohio composer Frank Wilhoit, years after Francis Wilhoit's death.