NVIDIA has decided that the market demand to do everything in Python justifies the development cost of making Python fast in CUDA.

Thus you can now use PTX directly from Python, and with the new cuTile approach, you can write CUDA kernels in a Python subset.

Many of these tools get combined because that is what is already there, and the large majority of us don't want to, or don't have the resources to, spend time bootstrapping a whole new world.

Until there is some monetary advantage in doing so.