Right, collecting more corner-case training data won't solve an architecture problem. AI in general impresses quickly when it's mostly right, but improving from there is the challenge.
More data won't help if the problem is the tools themselves.
Grandparent said the hard part is getting rid of the last 1%; parent claimed Elon said the same when he said 99.99% of the training data is useless.
But it's not the same.
Elon thinks he just needs the right data to solve the problem, but it could be impossible even if he gets that data, because of the limitations of the type of AI being used.
If you need a screwdriver but only have a hammer, more nails won't help.
I'm using "AI" here as an umbrella term for computer decision systems.
And yet on a very simple drive I have to intervene 4-6 times over a distance of 8 miles. How is this not useful? At this point it would have been easier to ask people to record how to drive roads and use video-game track logic, where you race a ghost…
The only time FSD works 'ok' is on single-lane roads with 90-degree stop signs / turns.
I don’t believe that the current hardware can handle what is needed to have passable FSD for an average consumer.
No. For the easy 99.999% of driving they keep very little of the training data.
Basically you want to minimize manual interventions (aka disengagements). When the driver intervenes, they keep a short window of data before (30 seconds?) and after the intervention and add it to the training data.
So their training data is basically just the exceptional cases.
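The selection scheme described above can be sketched roughly like this (the window lengths, function name, and data layout are illustrative assumptions, not Tesla's actual pipeline):

```python
# Sketch: keep only short clips around driver interventions as training data,
# discarding the easy 99.999% of the drive log. Window sizes are guesses.

def select_training_clips(disengagement_times, pre_s=30.0, post_s=10.0):
    """Return (start, end) time windows around each disengagement timestamp."""
    return [(max(0.0, t - pre_s), t + post_s) for t in disengagement_times]

# Two interventions on a drive -> two small clips, everything else dropped.
clips = select_training_clips([120.0, 305.5])
```

The effect is that the training set is dominated by exceptional cases, which is exactly why the overfitting concern in the next comment matters.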
They just need to make sure they don't overfit, so that the learned model actually has some "understanding" of why decisions are made and can generalize.
It's not clear that a bunch of cascaded rectified linear functions will ever generalize to near 100%. The error floor is at a dangerous level regardless of training. AGI is needed to tackle the final 1%.
The universal approximation theorem disagrees. The question is how large the network should be and how much training data it needs. And for now it can only be tested experimentally.
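For reference, a standard form of the theorem (Cybenko/Hornik style) states that for any continuous $f$ on a compact set $K$ and any tolerance $\varepsilon > 0$, there exists a one-hidden-layer network that approximates it:

$$
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i\, \sigma(w_i^\top x + b_i) \right| < \varepsilon
$$

Note this is purely an existence result about representation: it says nothing about how large $N$ must be or whether gradient-based training will actually find such weights.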
The universal approximation theorem does not apply once you include any realistic training algorithms / stochastic gradient descent. There isn't a learnability guarantee.
You said it only depends on network size; I'm saying it's more likely impossible regardless of network size, due to fundamental limits in the training methods.
https://www.teslarati.com/fsd-distance-driven-training-musk/