Existing perception systems used during intervention operations rely on data from optical cameras, which limits their capabilities in poor visibility or lighting conditions. In this work, we propose Sonar-MASt3R, an opti-acoustic fusion method that uses MASt3R to extract dense correspondences from optical camera data in real time and pairs them with geometric cues from an acoustic 3D reconstruction to ensure robustness in turbid conditions. Experimental results on data recorded from an “opti-acoustic eye-in-hand” configuration, across turbidity values ranging from 0.5 to 12 NTU, show this method’s improved robustness to turbidity relative to baseline methods.