I think there are 2 claims which this article conflates. I'll jokingly call them "Weak Whitney" and "Strong Whitney".
"Weak Whitney" is the claim that this very terse style is comprehensible given sufficient study. I find this plausible.
"Strong Whitney" is the claim that in many circumstances this style is _better_ than a more normal style, with full variable names, whitespace, etc. I am much less persuaded by that claim. When the article says things like "Note that d probably stands for “dimension” or perhaps “depth”; I’m unsure on this point." I'm like, yes, congratulations, that is exactly the point of using more descriptive names and/or comments.
I don't think we have enough information to conclude exactly what happened. But my read is the researcher was looking for demo.filevine.com and found margolis.filevine.com instead. The implication is that many other customers may have been vulnerable in the same way.
Ah, I see now that I read too quickly - the "open demo environment" clearly refers to the idea that the vendor (Filevine) would have a live demo, NOT that each client would want an open playground demo account linked to a subset of their own data (which would be utterly insane).
Lawyers can and will send cease and desist letters to people whether or not there is any legal basis for it. Often the threat of a lawsuit, even a meritless one, is enough to keep people quiet.
ArXiv has always had a moderation step, and the moderators are unable to keep up with the volume of submissions. Accepting these reviews without moderation would be a change to the current process, not "just like arXiv has always worked".
Unfortunately, (this kind of) AI doesn't accelerate review. (That's before you get into the ease of producing adversarial inputs: a moderation system not susceptible to these could be wired up backwards as a generation system that produces worthwhile research output, and we don't have one of those.)
I'm skeptical: use two different AIs which don't share the same weaknesses, plus a random sample of manual reviews, plus blacklisting users who submit adversarial inputs for X years as a deterrent.
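The scheme described above can be sketched as a small pipeline. This is a hypothetical illustration, not a real moderation system: the two model calls are keyword stand-ins for "two different AIs with different weaknesses", and the 5% audit rate is an assumed number.

```python
import random

# Hypothetical sketch of the proposed pipeline: two independent
# classifiers, a random sample of manual audits, and a ban list.
# Both "model" functions are stand-ins for real AI reviewers.

MANUAL_AUDIT_RATE = 0.05  # assumption: spot-check 5% of accepted submissions
banned = set()

def model_a_flags(text):
    # Stand-in for the first AI reviewer.
    return "ignore previous instructions" in text.lower()

def model_b_flags(text):
    # Stand-in for the second, independently trained AI reviewer.
    return "disregard your system prompt" in text.lower()

def moderate(user, text):
    if user in banned:
        return "rejected: banned"
    if model_a_flags(text) or model_b_flags(text):
        banned.add(user)  # adversarial input -> X-year blacklist
        return "rejected: adversarial"
    if random.random() < MANUAL_AUDIT_RATE:
        return "accepted: queued for manual spot-check"
    return "accepted"
```

The point of using two models is that an adversarial input tuned against one classifier's blind spots is less likely to slip past both, and the random manual sample gives an estimate of the residual error rate.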
But how do you know an input is adversarial? There are other issues: verdicts are arbitrary, the false positive rate means you'd need manual review of all the rejects (unless you wanted to reject something like 5% of genuine research), you need the appeals process to exist and you can't automate that, so bad actors can still flood your bureaucracy even if you do implement an automated review process…
I'm not on the moderation bandwagon to begin with per the above, but if an organization invents a bunch of fake reasons that they find convincing, then any system they come up with is going to have its flaws. Ultimately, the goal is to make cooperation easy and defection costly.
> But how do you know an input is adversarial?
Prompt injection and jailbreaking attempts are pretty clear. I don't think anything else is particularly concerning.
> the false positive rate means you'd need manual review of all the rejects (unless you wanted to reject something like 5% of genuine research)
Not all rejects, just those who submit an appeal. There are a few options, but ultimately appeals require some stakes, such as:
1. Every appeal carries a receipt for a monetary donation to arXiv that's refunded only if the appeal succeeds.
2. Appeal failures trigger the ban hammer with exponentially increasing durations, e.g. 1 month, 3 months, 9 months, 27 months, etc.
Bad actors either respond to deterrence or get filtered out while funding the review process itself.
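The escalation in option 2 is just a geometric progression with a tripling factor; a minimal sketch (the factor of 3 is taken from the 1/3/9/27 sequence above):

```python
def ban_months(failed_appeals: int) -> int:
    """Ban duration after the nth failed appeal: 1, 3, 9, 27, ... months.

    Tripling factor as in the comment above; any base > 1 would give
    the same "repeat offenders get filtered out" dynamic.
    """
    if failed_appeals < 1:
        raise ValueError("failed_appeals must be >= 1")
    return 3 ** (failed_appeals - 1)
```

Under this rule a fourth failed appeal already costs 27 months, so persistent bad actors are effectively removed after a handful of attempts.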
> I don't think anything else is particularly concerning.
You can always generate slop that passes an anti-slop filter, if the anti-slop filter uses the same technology as the slop generator. Side-effects may include: making it exceptionally difficult for humans to distinguish between adversarial slop and legitimate papers. See also: generative adversarial networks.
> Not all rejects, just those that submit an appeal.
So, drastically altering the culture around how the arXiv works. You have correctly observed that "appeals require some stakes" under your system, but the arXiv isn't designed that way – and for good reason. An appeal is either "I think you made a procedural error" or "the valid procedural reasons no longer apply": adding penalties for using the appeals system creates a chilling effect, skewing the metrics that people need to gain insight as to whether a problem exists.
Look at the article numbers. Year, month, and then a 5-digit code. It is not expected that more than 100k articles will be submitted in a given month, across all categories. If the arXiv ever needs a system that scales in the way yours does, with such sloppy tolerances, then it'll be so different to what it is today that it should probably have a different name.
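For readers unfamiliar with the numbering: since 2015, arXiv identifiers have the form YYMM.NNNNN, where the 5-digit sequence number caps each month at under 100k articles. A quick parser makes the structure explicit:

```python
import re

# arXiv identifiers (2015-onward scheme) look like "2401.12345":
# two-digit year, two-digit month, then a 5-digit per-month sequence
# number - i.e. fewer than 100,000 submissions expected per month.

def parse_arxiv_id(arxiv_id: str):
    m = re.fullmatch(r"(\d{2})(\d{2})\.(\d{5})", arxiv_id)
    if not m:
        raise ValueError(f"not a post-2014 arXiv id: {arxiv_id}")
    yy, mm, seq = m.groups()
    return 2000 + int(yy), int(mm), int(seq)
```

So the identifier scheme itself encodes the scale assumption the comment is pointing at: a flood of machine-generated submissions would break more than just the moderation queue.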
If we were to add stakes, I think "revoke endorsement, requiring a new set of endorsers" would be sufficient. (arXiv endorsers already need to fend off cranks, so I don't think this would significantly impact them.) Exponential banhammer isn't the right tool for this kind of job, and I think we certainly shouldn't be getting the financial system involved (see the famous paper A Fine is a Price by Uri Gneezy and Aldo Rustichini: https://rady.ucsd.edu/_files/faculty-research/uri-gneezy/fin...).
"As soon as profit can be made" is exactly what the article is warning about. This is exactly the "Human + AI" combination.
Within your lifetime (it's probably already happened) you will be denied something you care about (medical care, a job, citizenship, parole) by an AI which has been granted the agency to do so in order to make more profit.
Not exactly the same, but https://en.wikipedia.org/wiki/Very-long-baseline_interferome... is a technique used in radio astronomy which uses two receivers a long way apart and a very accurate clock to approximate having a radio telescope the size of the distance between them. By putting one receiver on a satellite, measurements have been made with a separation of 300,000km.
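The rule of thumb for an interferometer's angular resolution is theta ≈ wavelength / baseline. A quick illustrative calculation for the 300,000 km space baseline mentioned above (the 1.3 cm observing wavelength here is my own example choice, a band commonly used in radio VLBI):

```python
import math

# Rough diffraction-limited angular resolution of an interferometer:
#   theta ≈ wavelength / baseline   (in radians)
# converted below to microarcseconds for readability.

def angular_resolution_uas(wavelength_m: float, baseline_m: float) -> float:
    radians = wavelength_m / baseline_m
    return radians * (180 / math.pi) * 3600 * 1e6  # rad -> microarcseconds

# Illustrative: 1.3 cm wavelength, 300,000 km space-to-Earth baseline.
res = angular_resolution_uas(0.013, 3.0e8)
```

That works out to roughly 9 microarcseconds, which is why a single receiver on a distant satellite so dramatically outperforms any Earth-bound telescope pair.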