
That's exactly what Elon has said. He's said that 99.99% of all vehicle miles are useless from a training perspective.

https://www.teslarati.com/fsd-distance-driven-training-musk/



That's not the same. Elon thinks that if he gets the right training data, he can get rid of the dangerous 1% of errors.

But that won't be the case if it's a fundamental problem with the type of AI being used.


Right, more corner-case training data won't solve an architecture problem. AI in general quickly impresses when it's mostly right, but improving from there is the challenge.


What do you mean by "AI"? AI is a buzzword; unless you know specifically how they are using it, you can't make this claim.

Many marketing teams just say "AI" when it is really ML doing the work, or it could be both.


There is no claim, just a fact.

More data won't help if the problem is the tools as such.

Grandparent said the hard part is getting rid of the last 1%; parent claimed Elon said the same when he said 99.99% of the training data is useless.

But it's not the same. Elon thinks he just needs the right data to solve the problem, but it could be impossible even if he gets that data, because of the limitations of the type of AI being used.

If you need a screwdriver but only have a hammer, more nails won't help.

I'm using "AI" here as an umbrella term for computer decision systems.


And yet on a very simple drive I have to intervene 4-6 times over a distance of 8 miles. How is this not useful? By now it would have been easier to ask people to record how to drive roads and use video-game track logic where you race a ghost…

The only time FSD works 'ok' is on single-lane roads with 90-degree stop signs / turns.

I don’t believe that the current hardware can handle what is needed to have passable FSD for an average consumer.


So isn't that a deep problem with his FSD architecture?


No. For the easy 99.999% of driving they keep very little of the training data.

Basically you want to minimize manual interventions (aka disengagements). When the driver intervenes, they keep a short window before (30 seconds?) and after that intervention and add it to the training data.

So their training data is basically just the exceptional cases.

They just need to make sure they don't overfit, so that the learned model actually does have some "understanding" of why decisions are made and can generalize.
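
A rough sketch of that selection step (my own illustration; the names, fields, and window lengths are assumptions, not Tesla's actual pipeline): keep only the frames inside a window around each driver intervention and drop the rest of the drive.

    from dataclasses import dataclass

    @dataclass
    class Frame:
        t: float        # timestamp within the drive, in seconds
        sensors: dict   # camera frames / telemetry for that instant

    def clips_around_disengagements(frames, disengagement_times,
                                    before_s=30.0, after_s=10.0):
        # Keep only frames inside a window around each disengagement;
        # the easy bulk of the drive is discarded.
        clips = []
        for t0 in disengagement_times:
            clip = [f for f in frames if t0 - before_s <= f.t <= t0 + after_s]
            if clip:
                clips.append(clip)
        return clips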


It's not clear that a bunch of cascaded rectified linear functions will ever generalize to near 100%. The error floor is at a dangerous level regardless of training. AGI is needed to tackle the final 1%.


The universal approximation theorem disagrees. The question is how large the network needs to be and how much training data it needs. For now that can only be determined experimentally.
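
As a concrete illustration of what the theorem does and does not promise (this sketch is mine, not from the thread; the target function and widths are arbitrary): a one-hidden-layer ReLU network can be written down by hand that approximates sin(x) to any accuracy as the width grows. Note the weights are constructed, not learned, which is exactly the gap the reply below points out.

    import numpy as np

    def relu_net_interpolant(f, a, b, n_hidden):
        # Hand-constructed one-hidden-layer ReLU net whose output equals the
        # piecewise-linear interpolant of f at n_hidden + 1 evenly spaced knots.
        knots = np.linspace(a, b, n_hidden + 1)
        vals = f(knots)
        slopes = np.diff(vals) / np.diff(knots)                  # slope on each segment
        coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))  # slope changes at knots
        def g(x):
            x = np.asarray(x, dtype=float)[..., None]
            return vals[0] + (coeffs * np.maximum(x - knots[:-1], 0.0)).sum(axis=-1)
        return g

    xs = np.linspace(0.0, 2 * np.pi, 10_000)
    for n in (4, 16, 64, 256):
        g = relu_net_interpolant(np.sin, 0.0, 2 * np.pi, n)
        print(f"{n:4d} hidden units, max error {np.abs(g(xs) - np.sin(xs)).max():.5f}")

The max error shrinks as the width grows (existence of a good network), but nothing here says SGD would actually find these weights from data.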


The universal approximation theorem does not apply once you include any realistic training algorithm such as stochastic gradient descent. There is no learnability guarantee.


There's no theorem that SGD is insufficient. So, as I said, it's empirical.


You said it only depends on network size; I'm saying it is more likely impossible regardless of network size, due to fundamental limits in training methods.



