Image segmentation is almost a solved problem. There is no reason it should get confused, even with a vision-only system. Their problem is most likely that they don't have enough compute to process a history of frames, so they process a single image at a time. That leads to jumps in the segmentation results from one frame to the next, and those random jumps cause unpredictable braking.
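To illustrate why frame history matters, here is a toy sketch in Python. It is not how any real driving stack works. The probabilities are made-up numbers for a single image region, and the exponential moving average is just a simple stand-in for "using a history of frames". The function names and the `alpha` and `threshold` parameters are hypothetical.

```python
def count_flips(labels):
    """Count how often the label changes between consecutive frames."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def per_frame_labels(probs, threshold=0.5):
    """Classify each frame independently, with no memory of earlier frames."""
    return [p > threshold for p in probs]


def smoothed_labels(probs, alpha=0.3, threshold=0.5):
    """Classify using an exponential moving average over the frame history."""
    labels = []
    ema = probs[0]  # seed the average with the first frame
    for p in probs:
        ema = alpha * p + (1 - alpha) * ema
        labels.append(ema > threshold)
    return labels


# Hypothetical per-frame "obstacle" probabilities for one image region.
# Made-up values that jump across the decision threshold on every frame.
probs = [0.30, 0.60, 0.35, 0.58, 0.32, 0.61, 0.34, 0.57]

raw = per_frame_labels(probs)
smooth = smoothed_labels(probs)

print("flips without history:", count_flips(raw))
print("flips with history:   ", count_flips(smooth))
```

With these made-up numbers the per-frame labels flip on every frame, while the smoothed labels never change. The trade-off is that smoothing reacts more slowly when the scene really does change, so `alpha` would need tuning in practice.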