More

emmender2 · on June 5, 2024

I was waiting for: "but humans do that too" and bingo.

on another note: an entire paper written on one prompt - is this the state of research these days ?

finally: a giant group of data-entry technicians are likely entering these exceptions into the training dataset at openai.

gwbas1c · on June 5, 2024

> an entire paper written on one prompt - is this the state of research these days

Years ago I attended an AI meetup where the organizer would discuss the differences between reductionist and wholistic research.

This is reductionist research.

In this case, reducing the paper to one prompt is to allow for objective comparison of models.

Otherwise, without a reductionist approach, how does one perform such an experiment and ensure that they are comparing apples to apples?

emmender2 · on June 5, 2024

what I would like to see is a parameterized class of prompts which can never be solved by the LLMs even when a finite number of them are manually added to the dataset.

gwbas1c · on June 5, 2024

Well, should we consider ChatGPT a Turing machine? Otherwise, I think an answer like that either requires significantly more research; insight; or general knowledge about how LLMs work.

IE, you're getting into areas that are analogous to Turing's theories. I don't think he came up with those theories overnight.

emmender2 · on June 5, 2024

yawn, we need ai bust to get these folks to see sense.

until then, they have a free pass to get away with such scare-mongering bs.

ben_w · on June 5, 2024

What would you need to see, short of the actual end of the world, to take the risk seriously?

I don't have to be in a car crash or get shot in the head to know this is a bad thing, and nobody sane is going to bother causing the end of the world just to convince you it's possible.

c1sc0 · on June 5, 2024

Maybe a small-scale demonstration then? Seems to have worked so far for nuclear weapons proliferation.

ben_w · on June 5, 2024

> Maybe a small-scale demonstration then?

What, precisely, does that even mean?

Critics (I initially said "you", but rereading this is ambiguous) clearly don't accept anything that currently exists as such a demonstration: not the models which are superhuman at strategy games; not the automation actually used by the real militaries despite dangerous flaws (whose bugs have resulted in NATO early warning systems being triggered by the moon and Soviet ones by the sun, or planes nose-diving because of numerical underflow); not the use of LLMs to automate propaganda; not Cambrige Analytica; not the lack of controls that resulted in the UN determining that Facebook bore some responsibility for the (ongoing) genocide in Myanmar; not the examples given in the safety report on GPT-4 prior to release showing how it was totally willing to explain how to make chemical weapons; not the report the other year where a drug safety system was turned into a chemical weapon discovery tool by deliberately flipping the sign of the reward function; and not the OpenAI report on maximal misalignment in their own models caused by flipping the sign of a reward function by accident.

What is the smallest "small scale" demonstration that people who currently laugh at the idea of the possibility of a problem, won't ignore?

c1sc0 · on June 5, 2024

Flipping the sign on a reward function to protect us from the next pandemic should fit the bill. If we ignore that I guess we deserve it?

ben_w · on June 5, 2024

> Flipping the sign on a reward function to protect us from the next pandemic should fit the bill

I don't understand, are you suggesting flipping the reward function of reproductive fitness itself, in vivo, of DNA/RNA?

And how is "protect" supposed to demonstrate danger? That's like saying "ACAB protestors are dumb, I'll only believe the police are evil when they catch a gunman"?

emmender2 · on May 25, 2024

did data-herald not find usecases or user problems to solve using its tech ?

are any startups applying LLMs profitable at all ? or is it just a mirage - ie, in the real world, startups are not able to solve users problems well using LLMs.

emmender2 · on May 17, 2024

When you tell them it is a fake, they will believe more strongly that it is real.

Dont we all live in the joyful bubble of beliefs many of which have no basis ?

emmender2 · on April 5, 2024

Researchers are trying their damndest to build a "reasoning" layer using LLMs as the foundation. But, they need to go back to the drawing-board and understand from first principles what it means to reason. For this in my view, they need to go back to epistemology (and refer to Peirce and logicians like him).

emmender2 · on March 27, 2024

this proves that all llm models converge to a certain point when trained on the same data. ie, there is really no differentiation between one model or the other.

Claims about out-performance on tasks are just that, claims. the next iteration of llama or mixtral will converge.

LLMs seem to evolve like linux/windows or ios/android with not much differentiation in the foundation models.

jobigoud · on March 27, 2024

It's even possible they converge when trained on different data, if they are learning some underlying representation. There was recent research on face generation where they trained two models by splitting one training set in two without overlap, and got the two models to generate similar faces for similar conditioning, even though each model hadn't seen anything that the other model had.

IshKebab · on March 27, 2024

That sounds unsurprising? Like if you take any set of numbers, randomly split it in two, then calculate the average of each half... it's not surprising that they'll be almost the same.

If you took two different training sets then it would be more surprising.

Or am I misunderstanding what you mean?

MajimasEyepatch · on March 27, 2024

It doesn't really matter whether you do this experiment with two training sets created independently or one training set split in half. As long as both are representative of the underlying population, you would get roughly the same results. In the case of human faces, as long as the faces are drawn from roughly similar population distributions (age, race, sex), you'll get similar results. There's only so much variation in human faces.

If the populations are different, then you'll just get two models that have representations of the two different populations. For example, if you trained a model on a sample of all old people and separately on a sample of all young people, obviously those would not be expected to converge, because they're not drawing from the same population.

But that experiment of splitting one training set in half does tell you something: the model is building some sort of representation of the underlying distribution, not just overfitting and spitting out chunks of copy-pasted faces stitched together.

evrial · on March 29, 2024

That's explanation of central limit theorem in statistics. And any language is mostly statistics and models are good at statistical guessing of the next word or token.

taneq · on March 27, 2024

If not are sampled from the same population then they’re not really independent, even if they’re totally disjoint.

evrial · on March 29, 2024

They are sourced mostly from the same population and crawled from everything can be crawled.

Tubbe · on March 27, 2024

Got a link for that? Sounds super interesting

d_burfoot · on March 27, 2024

https://en.wikipedia.org/wiki/Theory_of_forms

bobbylarrybobby · on March 27, 2024

I mean, faces are faces, right? If the training data set is large and representative I don't see why any two (representative) halves of the data would lead to significantly different models.

arcticfox · on March 27, 2024

I think that's the point; language is language.

If there's some fundamental limit of what type of intelligence the current breed of LLMs can extract from language, at some point it doesn't matter how good or expansive the content of the training set is. Maybe we are finally starting to hit an architectural limit at this point.

dumbfounder · on March 27, 2024

But information is not information. They may be able to talk in the same style, but not about the same things.

swalsh · on March 27, 2024

The models are commodities, and the API's are even similar enough that there is zero stickiness. I can swap one model for another, and usually not have to change anything about my prompts or rag pipelines.

For startups, the lesson here is don't be in the business of building models. Be in the business of using models. The cost of using AI will probably continue to trend lower for the foreseeable future... but you can build a moat in the business layer.

spxneo · on March 27, 2024

Excellent comment. Shows good awareness of economic forces at play here.

We are just going to use whatever LLM is best fast/cheap and the giants are in an arms race to deliver just that.

But only two companies in this epic techno-cold war have an economic moat but the other moat is breaking down inside the moat of the other company. The moat inside the moat cannot run without the parent moat.

rayval · on March 27, 2024

Intriguing comment that I don't quite follow. Can you please elaborate?

stolsvik · on March 28, 2024

Probably OpenAI running on Azure. But it was still convoluted.

stri8ed · on March 27, 2024

Or be in the business of building infrastructure for AI inference.

cheselnut · on March 27, 2024

Is this not the same argument? There are like 20 startups and cloud providers all focused on AI inference. I'd think application layer receives the most value accretion in the next 10 years vs AI inference. Curious what others think

sparks1970 · on March 27, 2024

Or be in the business of selling .ai domain names.

sroussey · on March 27, 2024

Embeddings are not interchangeable. However, you can setup your system to have multiple embeddings from different providers for the same content.

jimmySixDOF · on March 27, 2024

There are people who make the case for custom fine tuned embedding models built to match your specific types of data and associations. Whatever you use internally it gets converted to the foundation model of choice's formats by their tools on the edge. Still Embeddings and the chunking strategies feeding into them are both way too underappreciated parts of the whole pipeline.

swalsh · on March 27, 2024

Embeddings are indeed sticky, I was referring to the LLM model itself.

esafak · on March 28, 2024

That's not what investors believe. They believe that due to training costs there will be a handful of winners who will reap all the benefits, especially if one of them achieves AGI. You can tell by looking at what they've invested most in: foundation models.

phillipcarter · on March 28, 2024

I don't think I agree with that. For my work at least, the only model I can swap with OpenAI and get similar results is Claude. None of the open models come even close to producing good outputs for the same prompt.

n2d4 · on March 27, 2024

There's at least an argument to be made that this is because all the models are heavily trained on GPT-4 outputs (or whatever the SOTA happens to be during training). All those models are, in a way, a product of inbreeding.

fragmede · on March 27, 2024

But is it the kind of inbreeding that gets you Downs, or the kwisatz haderach?

batshit_beaver · on March 28, 2024

pram · on March 27, 2024

Consider the bulldog: https://youtube.com/watch?v=hUgmkCgMWbg

sumo43 · on March 27, 2024

Maybe true for instruct, but pretraining datasets do not usually contain GPT-4 outputs. So the base model does not rely on GPT-4 in any way.

mnemoni_c · on March 27, 2024

Yea it feels like transformer LLMs are in or getting closer to diminishing returns. Will need some new breakthrough, likely entirely new approach, to get to AGI levels

Tubbe · on March 27, 2024

Yeah, we need radically different architecture in terms of the neural networks, and/or added capabilities such as function calling and RAG to improve the current sota

mattsan · on March 27, 2024

can't wait for LLMs to dispatch field agent robots who search for answers in the real world thats not online /s

htrp · on March 27, 2024

skynet would like a word

throwaway74432 · on March 27, 2024

LLMs are a commodity

https://www.investopedia.com/terms/c/commodity.asp

paxys · on March 27, 2024

Maybe, but that classification by itself doesn't mean anything. Gold is a commodity, but having it is still very desirable and valuable.

Even if all LLMs were open source and publicly available, the GPUs to run them, technical know how to maintain the entire system, fine tuning, the APIs and app ecosystem around them etc. would still give the top players a massive edge.

throwaway74432 · on March 27, 2024

Of course realizing that a resource is a commodity means something. It means you can form better predictions of where the market is heading, as it evolves and settles. For example, people are starting to realize that these LLMs are converging on fungible. That can be communicated by the "commodity" classification.

YetAnotherNick · on March 27, 2024

Even in the most liberal interpretation of prove, it doesn't do that. GPT-4 was trained before OpenAI has any special data or deal with microsoft or the product market fit. Yet, no model has beaten it in a year. And google, microsoft, meta definitely have better data and more compute.

gerash · on March 27, 2024

The evaluations are not comprehensive either. All of them are improving and you can't expect any of them to hit 100% on the metrics (a la. bayes error rate). It gets increasingly difficult to move the metrics as they get better.

falcor84 · on March 27, 2024

> this proves that all llm models converge to a certain point when trained on the same data

They are also all trained to do well on the same evals, right? So doesn't it just boil down to neural nets being universal function approximators?

bevekspldnw · on March 27, 2024

The big thing for locally hosted is inference efficiency and speed. Mistral wears that crown by a good margin.

crooked-v · on March 27, 2024

Of course, part of this is that a lot of LLMs are now being trained on data that is itself LLM-generated...

emmender2 · on March 19, 2024

the parable also has a "separating perception from reality" flavor for the science types.

that is, the market is perception of what is out there, reinforced by the herd mentality. the reality is what actually exists.

eventually perception and reality tend to converge.

emmender2 · on March 15, 2024

thinking step-by-step requires 100% accuracy in each step. If you are 95% accurate in each step, after the 10th step, the accuracy of the reasoning chain drops to 59%. this is the fundamental problem with llm for reasoning.

reasoning requires deterministic symbolic manipulation for accuracy. only then it can be composed into long chains.

throwuwu · on March 15, 2024

You’ve never made a mistake in your reasoning?

Tongue in cheek but this has been considered and has resulted in experiments like tree of thought and various check your work and testing approaches. Thinking step by step is really just another way of saying make a plan or use an algorithm and when humans do either they need to periodically re-evaluate what they’ve done so far and ensure it’s correct.

The trick is training the model to do this as a matter of course and to learn which tool to apply at the right time which is what the paper is about wrt interspersed thoughts.

trenchgun · on March 15, 2024

>reasoning requires deterministic symbolic manipulation for accuracy

No, that is automation. Automated reasoning is a thing, indeed. And I can kind of see a world where there is a system which uses LLM for creative thinking, augmented with automated reasoning systems (think datalog, egg, SMT-solver, probabilistic model checking etc).

hesdeadjim · on March 15, 2024

I dream of a world where the majority of humans could come close to 59% after attempting a ten step logical process.

emmender2 · on March 16, 2024

wut

the average theorem in euclids' elements (written 2000 years back) would have a reasoning chain of at least 10 steps.

all of the mathematical machinery humans build need 100% accuracy in each step

emmender2 · on March 17, 2024

all human knowledge is created by a small number of people. most of us just regurgitate and use it.

think euclid, galileo, newton, maxwell, etc...

and all human knowledge is mathematical in nature (galileo said this).

what is meant here is that, facts and events in the world we perceive can be compressed into small models which are mathematical in nature and allow a deductive method.

human genius comprises of coming up with these models. This process is described by Peirce (and Kant before him) ie, inventing concepts and relations between them to comprise models of the world we live in.

imagine compressing all observed motion into a few equations of physics. or compress all electromagnetic phenomena into a few equations. and then use this machinery to make things happen.

imagine if we feed a lot of perceived motion data into a giant black-box (which could be a neural net) - and out comes a small model of that data comprising newton's equations (and similarly maxwellian equations).

But, this giant knowledge edifice is built on solid foundations of mathematical reasoning (newton said this).

human genius is to invent a mathematical language to describe imaginary worlds precisely, and then a scientific method to apply that language to model the real world.

emmender2 · on March 10, 2024

there are facts,events,narratives and there is knowledge.

knowledge consists of models of the world we have constructed and learnt, which abstract patterns of facts.

facts,narratives make for banter with friends (bonding) but knowledge helps with action (decision).

when reading, demarcate narratives from models, and/or layout the facts against known mental models. this may point to deficits in mental models, or missing models altogether.

most of my reading unfortunately is mindless soaking up of pointless narratives.

emmender2 · on March 5, 2024

i see many startups deeply understanding end-customer-workflows and usecases, and then experimenting with how LLM may improve that.

customer-service, code-assist, call-center are a few areas which show early promise wherein customers are willing to pay for the added value. outside of these areas, i am yet to see breakthrough applications for which people are willing to pay. let me know if this is mistaken.