
>Sorry, it won’t. This might even be peak GPT. Training data comes from human content, and currently there is decades worth of pure human content available. But new content will come in slowly, and it will probably take decades just to double the amount of training data we have today.

One word: AlphaZero. DeepMind ran out of human Go games to study, but it turned out that self-play was dramatically better. Your argument only holds if a) there's a linear relationship between the amount of training data and the quality of a model and b) GPT is close to maximally efficient in converting training data into useful weights. Both of these premises are demonstrably false.

GPT-4 is, in the scheme of what's possible, an incredibly primitive model that uses training data very inefficiently. In spite of that, a dumb brute force architecture still managed to vastly exceed everyone's expectations and advance the SOTA by a huge leap.



In Go, or similarly chess, the AI can play a stupendous number of games against itself and get accurate feedback for every single game. Everything needed to create your own training set is there just from knowing the rules. But outside of such games, how does an AI create its own training data when there is no function to tell you how well you are doing? This might be a dumb question; I don't have any idea how LLMs work.


One such function is “what happens next?” which may work as well in the real world as on textual training data. Certainly it’s part of how human babies learn, via schemas.
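To make that concrete, here is a toy version of the "what happens next?" objective: a bigram lookup table built from raw text. The point is that the text supplies its own labels, so no external grading function is needed; the tiny corpus and counting model below are stand-ins for a real tokenizer and neural network.

    # "What happens next?" as a self-supervised signal: the next token
    # in the raw text is the label, so no human annotation is required.
    from collections import Counter, defaultdict

    text = "the cat sat on the mat because the cat was tired"
    tokens = text.split()

    # Count, for each token, what tends to follow it.
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1

    # Predict what comes after the word "the".
    print(following["the"].most_common())  # [('cat', 2), ('mat', 1)]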


Creating something is much harder than verifying it.

A simple setup for improving coding skills is the following (a code sketch follows the list):

1. GPT is given a coding task to implement as a high level prompt.

2. It generates unit tests to verify that the implementation is correct.

3. It generates code to implement the algorithm.

4. It runs the generated code against the generated unit tests. If there are errors generated by the interpreter/compiler, go back to Step 3, modify the code appropriately and try again.

5. If there are no errors found, take the generated code as a positive example and update the model weights with reinforcement learning.
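A hedged sketch of that loop, assuming a hypothetical `llm` object with `generate` and `reinforce` methods (neither is a real library API) and pytest as the test runner:

    # Sketch of the five-step loop above. The `llm` object is a
    # hypothetical stand-in for a model's sampling and RL-update
    # interfaces; only the test-running part uses real tooling.
    import os
    import subprocess
    import tempfile

    MAX_ATTEMPTS = 5

    def run_tests(code, tests):
        """Run the generated tests against the generated code.
        Returns None on success, or the error output otherwise."""
        with tempfile.TemporaryDirectory() as d:
            with open(os.path.join(d, "solution.py"), "w") as f:
                f.write(code)
            with open(os.path.join(d, "test_solution.py"), "w") as f:
                f.write(tests)
            result = subprocess.run(
                ["python", "-m", "pytest", d],
                capture_output=True, text=True,
            )
            return None if result.returncode == 0 else result.stdout

    def self_improve(llm, task_prompt):
        # Step 2: generate unit tests from the high-level prompt.
        tests = llm.generate(f"Write pytest tests for: {task_prompt}")
        # Step 3: generate a candidate implementation.
        code = llm.generate(f"Implement in solution.py: {task_prompt}")
        for _ in range(MAX_ATTEMPTS):
            # Step 4: run the candidate against the generated tests.
            errors = run_tests(code, tests)
            if errors is None:
                # Step 5: tests pass; reinforce as a positive example.
                llm.reinforce(prompt=task_prompt, completion=code,
                              reward=1.0)
                return code
            # Otherwise, back to step 3 with the error feedback.
            code = llm.generate(
                f"Fix this code so the tests pass.\n"
                f"Errors:\n{errors}\nCode:\n{code}"
            )
        return None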


What if it’s wrong at step 2?


The most naive approach would be to procedurally generate immense amounts of Python code, then ask the model to predict whether the code will compile, whether it will crash, what its outputs will be given certain inputs, etc.
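A minimal sketch of that data factory, using Python's own compile() and exec() to produce the ground-truth labels the model would be trained to predict (the templates are toy stand-ins for a real program generator):

    # Procedurally generate tiny snippets, then label each one by
    # actually compiling and running it. The (snippet, label) pairs
    # become the model's prediction targets.
    import random

    TEMPLATES = [
        "x = {a} / {b}",       # may raise ZeroDivisionError
        "x = [1, 2, 3][{a}]",  # may raise IndexError
        "x = {a} + '{b}'",     # always raises TypeError
        "x = {a} +",           # never compiles
        "x = {a} * {b}",       # always runs
    ]

    def make_snippet():
        t = random.choice(TEMPLATES)
        return t.format(a=random.randint(-3, 3), b=random.randint(-3, 3))

    def label(snippet):
        try:
            compiled = compile(snippet, "<generated>", "exec")
        except SyntaxError:
            return "does not compile"
        try:
            exec(compiled, {})
        except Exception as e:
            return f"crashes: {type(e).__name__}"
        return "runs"

    for _ in range(5):
        s = make_snippet()
        print(f"{s!r:25} -> {label(s)}")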


Code execution is also a good way to collect feedback signals.


Well, there is a fairly predictable relationship between the amount of training data and the quality of the model: the Chinchilla scaling laws [1], though it's a power law rather than a strictly linear one.

[1]: Hoffmann et al., "Training Compute-Optimal Large Language Models" (2022), https://arxiv.org/abs/2203.15556
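For context, the fit in that paper models loss as a power law in both parameter count N and training tokens D; the constants below are approximate, as reported in the paper:

    % Chinchilla fit (Hoffmann et al., 2022): loss as a function of
    % parameters N and training tokens D, with the paper's constants.
    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
    \qquad E \approx 1.69,\quad A \approx 406.4,\quad B \approx 410.7,
    \quad \alpha \approx 0.34,\quad \beta \approx 0.28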



