Robust Real-Time Endoscopic Stereo Matching under Fuzzy Tissue Boundaries

¹Shanghai Jiao Tong University, ²Hangzhou Dianzi University

RRESM enables real-time, high-quality depth estimation for binocular endoscopic images with fuzzy tissue boundaries.

Abstract

Real-time acquisition of accurate scene depth is essential for automated robotic minimally invasive surgery. Stereo matching with binocular endoscopy can provide this depth information. However, existing stereo matching methods, designed primarily for natural images, often struggle with endoscopic images due to fuzzy tissue boundaries and typically fail to meet real-time requirements for high-resolution endoscopic image inputs.
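As a reminder of how stereo matching yields depth, the sketch below applies the standard triangulation relation for a rectified stereo pair, depth = focal length × baseline / disparity. The function name and the calibration values are hypothetical and chosen only for illustration; they are not taken from the paper or from either dataset.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_mm):
    """Triangulate depth (mm) from disparity (px) for a rectified stereo pair."""
    if disparity_px <= 0:
        # Zero or negative disparity carries no finite depth.
        return float("inf")
    return focal_px * baseline_mm / disparity_px


# Hypothetical calibration values, for illustration only.
FOCAL_PX = 1000.0    # focal length in pixels
BASELINE_MM = 4.0    # distance between the two optical centers

depth_mm = disparity_to_depth(50.0, FOCAL_PX, BASELINE_MM)
print(depth_mm)  # 80.0
```

Because depth is inversely proportional to disparity, a disparity error of a fixed number of pixels costs more depth accuracy at far range than at near range, which is one reason boundary errors in the disparity map matter.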

To address these challenges, we propose RRESM, a real-time stereo matching method tailored for endoscopic images. Our approach integrates a 3D Mamba Coordinate Attention module that enhances cost aggregation through position-sensitive attention maps and long-range spatial dependency modeling via the Mamba block, generating a robust cost volume without substantial computational overhead. Additionally, we introduce a High-Frequency Disparity Optimization module that refines disparity predictions near tissue boundaries by amplifying high-frequency details in the wavelet domain. Evaluations on the SCARED and SERV-CT datasets demonstrate state-of-the-art matching accuracy with a real-time inference speed of 42 FPS.
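The idea of amplifying high-frequency detail in the wavelet domain can be illustrated with a small sketch. This is not the High-Frequency Disparity Optimization module itself, whose design is not described on this page. It assumes a single-level Haar transform on one disparity scanline and a fixed, hand-picked gain, and all function names are hypothetical.

```python
def haar_forward(signal):
    """Single-level Haar transform: pairwise averages and half-differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


def amplify_high_frequency(signal, gain):
    """Scale the detail (high-frequency) band, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, [gain * d for d in detail])


# A toy disparity scanline with one step, standing in for a tissue boundary.
scanline = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
print(amplify_high_frequency(scanline, gain=2.0))
# [10.0, 10.0, 5.0, 25.0, 20.0, 20.0]
```

With a gain of 1 the scanline is reconstructed unchanged, and a larger gain increases the contrast across the step while leaving flat regions untouched. The decimated Haar transform is not shift-invariant, so a step that falls exactly between two pairs produces no detail coefficient in this toy version.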

Framework Overview

[Figure: RRESM framework]

Samples on Different Datasets

SCARED

[Figure: SCARED sample]

SERV-CT

[Figure: SERV-CT sample]

BibTeX

@article{ding2025rresm,
  title={Robust Real-Time Endoscopic Stereo Matching under Fuzzy Tissue Boundaries},
  author={Ding, Yang and Han, Can and Du, Sijia and Wang, Yaqi and Qian, Dahong},
  journal={arXiv preprint arXiv:2503.00731},
  year={2025}
}