Then, where can one read more about this? The people who built autograd and other frameworks like PyTorch, MXNet, etc. must have learned all this in detail somewhere. Where? AFAIK MXNet came out of academia (probably CMU).
Here's what you do: you watch this video by Andrej Karpathy [1] called "Becoming a backprop ninja". Then you pick a function that you like and implement backprop for it (which is a different way of saying reverse-mode automatic differentiation) using just numpy. If you use some numpy broadcasting, an np.sum, and some for-loops, you'll start getting a good feel for what's going on.
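For concreteness, here's roughly what such an exercise can look like: a minimal sketch in plain numpy, with the function and the variable names chosen just for illustration. The forward pass keeps its intermediates, the backward pass applies the chain rule in reverse, and a finite-difference check keeps you honest:

    import numpy as np

    # Forward pass through f(W, b, x) = sum(tanh(W @ x + b)),
    # keeping intermediates for the hand-written backward pass.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))
    b = rng.normal(size=(3,))
    x = rng.normal(size=(4,))

    z = W @ x + b          # (3,)
    h = np.tanh(z)         # (3,)
    loss = h.sum()         # scalar

    # Backward pass: chain rule, in reverse order of the forward pass.
    dh = np.ones_like(h)             # d(sum)/dh = 1 elementwise
    dz = dh * (1 - h**2)             # tanh'(z) = 1 - tanh(z)^2
    dW = np.outer(dz, x)             # z_i = sum_j W_ij x_j, so dz_i/dW_ij = x_j
    db = dz
    dx = W.T @ dz

    # Finite-difference sanity check on one entry of dW.
    eps = 1e-6
    W2 = W.copy(); W2[0, 0] += eps
    numeric = (np.tanh(W2 @ x + b).sum() - loss) / eps
    print(dW[0, 0], numeric)  # should agree to several decimal places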
Then you can go and read this fabulous blog post [2], and if you like what you see, you go to the framework built by its author, called Small Pebble [3]. Despite the name, it's not all that small. If you peruse the code you'll get some appreciation of what it takes to build a solid autodiff library, and if push comes to shove, you'll be able to build one yourself.
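And if you want a taste of what the core of such a library boils down to, here's a toy reverse-mode engine: every operation records its parents and how to pass gradients back to them. To be clear, this is a generic sketch of the idea, not Small Pebble's actual design:

    import numpy as np

    class Variable:
        def __init__(self, value, parents=()):
            self.value = np.asarray(value, dtype=float)
            self.parents = parents   # pairs of (parent, local_grad_fn)
            self.grad = np.zeros_like(self.value)

        def __add__(self, other):
            return Variable(self.value + other.value,
                            [(self, lambda g: g), (other, lambda g: g)])

        def __mul__(self, other):
            return Variable(self.value * other.value,
                            [(self, lambda g: g * other.value),
                             (other, lambda g: g * self.value)])

        def backward(self):
            # Topologically sort the graph, then push gradients backwards.
            order, seen = [], set()
            def visit(v):
                if id(v) not in seen:
                    seen.add(id(v))
                    for parent, _ in v.parents:
                        visit(parent)
                    order.append(v)
            visit(self)
            self.grad = np.ones_like(self.value)
            for v in reversed(order):
                for parent, local_grad in v.parents:
                    parent.grad = parent.grad + local_grad(v.grad)

    a, b = Variable(2.0), Variable(3.0)
    c = a * b + a      # dc/da = b + 1 = 4, dc/db = a = 2
    c.backward()
    print(a.grad, b.grad)  # 4.0 2.0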
I don't have a great answer. Most modern descriptions are shallow and/or unclear. My favorite discussions were actually in Werbos's original papers.
A nice overview is "Backpropagation through time: what it does and how to do it" (1990). The rule itself is stated very clearly there, but without proof. The proof can be found in "Maximizing long-term gas industry profits in two minutes in Lotus using neural network methods" (1989), which I believe was copied over from his earlier thesis, of which I could never find a copy.
The source code is much more interesting than some Twitter bro's hot take.
"We took a wrong turn w/ software."
... apparently.
Save your energy and check out the repo instead.
I don't totally blame the guy though. If you want to play the stupid Twitter game, you have to add a trite but provocative hot take to every post you make, lest your writings languish in obscurity. The only way to win is not to play, truly.
Takes me back... once upon a time I was an admin on the old OG Freenode (#C and #lisp) and recall when that idea was floated, both to take potshots at and to use as a beta-test crowd.
The first non-Reddit-team submissions were some very dry comp-sci papers and tech manuals... the cats came not long after.
He is not a Twitter bro, though. He has developed software extensively. He is a PL researcher and a professor at Brown, now working in computing education.
No doubt in real life he's a smart guy. I don't dispute that. It proves my point even more - when on Twitter, even the smartest people get reduced to playing the dumb hot take games.
Thanks for the link. Among other things it made me feel old: I have a post on my blog from back then about the Lisp-to-Python switch that Reddit was carrying out, and it's interesting to see programmers who look like they're in their early 30s or so not knowing about it at all.
Those were more interesting and especially more vibrant times when it came to the web. Granted, the money was not as abundant as it is now.
Surprised to see these introductory courses haven't been mentioned yet.
These courses [0] [1] are on edX and are taught by UBC Professor Gregor Kiczales. The explanations are so lucid they made recursion click for me. You can audit these courses without paying a single dime. They are based on the book How to Design Programs [2], which has much more material than the courses. This book is used at UBC, UWaterloo, Northeastern, and many other places.
Along the same lines there is another book for beginners called A Data-Centric Introduction to Computing [3]. It is used at Brown for their introductory courses.
This two-part course rearranged my brain about what programming is really about. It is about working with data, and the way the data is arranged determines how most of the program/functionality (pun intended) is written.
Programming is not a series of instructions to be executed by the computer. That was true when assembly language was the mainstay of programming computers. Not now.
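To sketch what "the shape of the data determines the shape of the program" means in practice, here's a small illustrative example in Python (HtDP itself uses Racket-family teaching languages): define the data first, and the function writes itself.

    from dataclasses import dataclass
    from typing import Union

    # Data definition first: a Tree is either a Leaf or a Node.
    @dataclass
    class Leaf:
        value: int

    @dataclass
    class Node:
        left: "Tree"
        right: "Tree"

    Tree = Union[Leaf, Node]

    def tree_sum(t: Tree) -> int:
        # The function mirrors the data: one branch per variant,
        # one recursive call per self-reference in the definition.
        if isinstance(t, Leaf):
            return t.value
        return tree_sum(t.left) + tree_sum(t.right)

    print(tree_sum(Node(Leaf(1), Node(Leaf(2), Leaf(3)))))  # 6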
https://en.wikipedia.org/wiki/Tensor_calculus is the term you're much likelier to find as the subject of a book. Matrices are just special cases of tensors, after all, and mathematicians like to generalize.
A good book on differential geometry will also probably start with an overview of tensor calculus.
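To make the "matrices are special cases of tensors" point concrete, here's a small numpy sketch (the index labels are arbitrary): matrix multiplication is just one particular tensor contraction, and the same pattern extends to higher-rank tensors.

    import numpy as np

    A = np.arange(6).reshape(2, 3)
    B = np.arange(12).reshape(3, 4)

    # Rank-2 case: C_ik = sum_j A_ij B_jk, i.e. ordinary matrix multiply.
    C = np.einsum("ij,jk->ik", A, B)
    assert np.allclose(C, A @ B)

    # Same operation on a rank-3 tensor: contract over one shared index.
    T = np.arange(24).reshape(2, 3, 4)
    M = np.arange(20).reshape(4, 5)
    D = np.einsum("ijk,kl->ijl", T, M)
    print(C.shape, D.shape)  # (2, 4) (2, 3, 5)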
Hubbard and Hubbard's Vector Calculus is a great introduction to multivariable calculus and linear algebra that ends with an introduction to differential forms. And it has a complete solutions manual you can purchase as well.
I feel pure joy when I see HtDP mentioned anywhere on HN. This book made programming click for me. It turned me from fiddling with code to thinking it through in a systematic way.
I hope the authors extend this someday to imperative programming too, comparing and contrasting the two styles.
https://arxiv.org/abs/2304.06035