Beating PyTorch and TensorFlow kernels has been easy to do with ML compilers since ~2018. You typically train and evaluate your model in one of these frameworks, then hand off the computation graph to a compiler like Apache TVM or your hardware vendor’s proprietary one. They should benchmark their kernels against compiler-generated kernels, not only against the framework defaults.
ML-guided heuristic search over compute schedules dates back to at least 2013 (Halide, for image processing).
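
For anyone unfamiliar with what "search over compute schedules" means, here is a toy sketch in plain Python. It is not Halide's or TVM's API, and it uses plain random search with measured runtimes rather than a learned cost model. The function names (`matmul_tiled`, `measure`, `search_schedule`) and the tile-size candidates are made up for illustration. The point is only that the algorithm stays fixed while the schedule (here, a loop tile size) is the thing being searched.

```python
import random
import time


def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices. The math is identical for every
    tile size; only the loop order (the "schedule") changes."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def measure(tile, a, b, n, repeats=3):
    """Return the best wall-clock time over a few runs of one schedule."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul_tiled(a, b, n, tile)
        best = min(best, time.perf_counter() - start)
    return best


def search_schedule(n=32, candidates=(1, 2, 4, 8, 16, 32), trials=4, seed=0):
    """Randomly sample `trials` tile sizes, time each, keep the fastest.
    A real auto-scheduler would replace the random sampling with a
    cost model that predicts which schedules are worth measuring."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    tried = {tile: measure(tile, a, b, n) for tile in rng.sample(candidates, trials)}
    best_tile = min(tried, key=tried.get)
    return best_tile, tried


if __name__ == "__main__":
    best_tile, tried = search_schedule()
    print("measured:", tried)
    print("fastest tile size:", best_tile)
```

Which tile size wins depends on the machine, and in an interpreted language the differences are mostly loop overhead rather than cache behavior, so treat the timings as a demonstration of the search loop and nothing more.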