There's also explicit warp synchronization, i.e. __syncwarp(). More on warp primitives here: https://developer.nvidia.com/blog/using-cuda-warp-level-prim...