Most people don't realize that logistic regression can get ~90% accuracy on MNIST.
As a big fan of starting with simple models first and adding complexity later, I've frequently been told that "logistic regression won't work!" for problems where it can in fact perform very well.
When faced with this resistance to logistic regression I'll often ask what they think the baseline performance of it would be on MNIST. The guesses I hear most often are 20-30%.
People, even machine learning people, often don't realize the rapidly diminishing returns you get for adding a lot of complexity in your models. Sometimes it's worth it, but it's always good to start simple first.
It's also been my experience that if you don't get good performance with a simple model, you are very unlikely to get great performance from a more complex one.
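For anyone who wants to check the number, here's roughly what that baseline looks like with scikit-learn. Pulling MNIST via OpenML is just how I'd load it, and the exact accuracy depends on the solver and regularization, but it lands right around 92%:

```python
# Minimal logistic-regression baseline on MNIST with scikit-learn.
# Loading via OpenML is one convenient option; accuracy is typically ~92%.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 70k 28x28 images as flat 784-dimensional vectors.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=0
)

# Multinomial logistic regression; saga copes fine with the problem size.
clf = LogisticRegression(max_iter=200, solver="saga", tol=0.01)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```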
Came here to say the same thing; actually, NCD can probably do much better than 78%. Li & Vitanyi's book on Kolmogorov complexity has some interesting unsupervised examples.
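For context, NCD (normalized compression distance) just uses compressed length as a crude stand-in for Kolmogorov complexity. A rough sketch of a 1-NN classifier built on it with zlib, purely illustrative, since results depend heavily on the compressor and on how the images are serialized:

```python
# Sketch of 1-nearest-neighbour classification with normalized compression
# distance (NCD), using zlib as the compressor. Purely illustrative.
import zlib

def clen(b: bytes) -> int:
    """Compressed length, a crude proxy for Kolmogorov complexity."""
    return len(zlib.compress(b, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def predict(query: bytes, train: list[tuple[bytes, int]]) -> int:
    """Return the label of the training example closest to `query` under NCD."""
    return min(train, key=lambda item: ncd(query, item[0]))[1]
```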
A simple CNN, as implemented in the Keras tutorial, can easily exceed 98%. 78% is very poor performance on MNIST even if model complexity is penalized.
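Something along these lines (my paraphrase of the Keras example, not the exact tutorial code) gets there in a few epochs:

```python
# Small MNIST CNN in the spirit of the Keras examples; typically ~99% test accuracy.
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0  # add channel dim, scale
x_test = x_test[..., None].astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```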
When would 90% accuracy on a dataset like MNIST ever be useful? And I mean useful as in usable for actual products or software, especially considering MNIST is more of a toy dataset.
I think that's why machine learning is the way to go for this type of detection: why go with anything other than a CNN (in this case) when it is now trivial to set up and train? Again, unless it's just to mess around with, 90% MNIST accuracy is not useful in the real world.
I don't think the point was that you should use logistic regression on MNIST. In lesser-known problems, say a custom in-house model, if you don't try the simpler approach first, you'll never know that your more complex solution is not worth the extra expense, or is actually worse than a simpler, cheaper model. MNIST is well-known to have nearly perfect solutions at this point, but for most novel problems, the data scientist has no idea what is theoretically possible.
Now, you can say that CNNs or other techniques are easily accessible these days, and almost trivial to set up. But they may not be trivial to train and run in terms of compute in the real world.
> People, even machine learning people, often don't realize the rapidly diminishing returns you get for adding a lot of complexity in your models.
Are you me? I was just having this argument at work with someone about using an old (fasttext/word2vec) model vs. the overhead of a fine-tuned BERT model for a fairly simple classification problem.
To a large degree this happens because logistic regression is not a sexy approach that one can add to their CV. Everyone wants to solve problems with big, complicated and buzzwordy models, because that sells (and is perhaps more interesting as well).
It's really a tragedy, because so many classic models would work fine for real world applications.
Extending the existing features with non-linear mappings would improve logistic regression too, probably to SVC-level performance (an RBF or polynomial kernel does exactly that, just implicitly; a sketch of the explicit version is below).
Linear models are really well researched, and today with the compute and with proper training and data preparation they can easily get to satisfying levels of performance for a variety of tasks.
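Here's what that explicit version could look like with scikit-learn's kernel approximation. RBFSampler stands in for the implicit RBF feature map, and I'm using the small built-in digits dataset just to keep the sketch self-contained:

```python
# Logistic regression on top of an approximate RBF feature map, as a sketch of
# "extend the features with non-linear mappings". Uses the small digits dataset.
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random Fourier features approximate the implicit map of an RBF-kernel SVC.
model = make_pipeline(
    RBFSampler(gamma=0.001, n_components=2000, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```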
This is true unless you can use large pretrained models, which are very simple to use, are very resistant to all sorts of noise, and would get ultra-high accuracy with just logistic regression on the output of the model.
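Something like a linear probe over frozen embeddings. The library and model name here are just examples I reached for, not anything from the thread:

```python
# Sketch: frozen embeddings from a pretrained model + logistic regression on top.
# sentence-transformers and the specific model name are example choices only.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast pretrained encoder

def fit_linear_probe(texts, labels):
    """Encode once with the frozen model, then fit a plain linear classifier."""
    X = encoder.encode(texts)  # (n_samples, embedding_dim) array
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf

def predict(clf, texts):
    """Classify new texts using the same frozen encoder."""
    return clf.predict(encoder.encode(texts))
```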