Anonymous, Senior Member, IEEE
Background: Because the human intestinal tract is tortuous and complex, effective navigation, including colonoscope localization and environment perception, is important for both manual and robot-assisted colonoscopy procedures. Accurate pose estimation of the colonoscope is crucial for visual navigation, yet in vivo self-tracking is challenging because of the dynamic nature of colon tissue. Existing pose estimation methods often suffer from mislocalized keypoints and from visual distortions in colonoscopy scenes.
Methods: To address these problems, we propose AKA-Net, a self-supervised network designed for keypoint extraction in colonoscopy environments. Specifically, we propose a novel attentional keypoints adjustment (AKA) module, which improves keypoint extraction by leveraging the historical attention residual to efficiently refine keypoint locations in the current frame. Additionally, we propose two optimization strategies for the visual distortions in colonoscopic images: one suppresses motion artifacts in the low-frequency information, and the other suppresses illumination discrepancies in the high-frequency information.
Results: We train the network on a dataset collected in a simulated colonoscopy environment built on the Unity platform, and validate it on data acquired with a custom-made colon phantom setup.
Conclusions: AKA-Net achieves an Average Trajectory Estimation Error (ATE) of 1.26±0.48 mm on the Unity dataset and 1.44±0.49 mm on the Phantom dataset, highlighting its potential for application in both manual and robot-assisted colonoscopy procedures.
Method Overview
Network Architecture
AKA Module
Training Pipeline
Dataset Overview
Experimental Setup
Qualitative Results
Quantitative Results
Error Analysis
| Model | ATE (mm) ↓ | RMSE (mm) ↓ | RE (rad) ↓ | Type |
|---|---|---|---|---|
| AKAZE | 2.83±1.72 | 0.60±0.17 | 0.64±0.006 | Traditional |
| BRISK | 2.80±1.93 | 0.60±0.18 | 0.48±0.008 | Traditional |
| ORB | 3.27±2.16 | 0.62±0.18 | 0.54±0.011 | Traditional |
| SC-Depthv2 | 9.68±5.66 | 1.26±0.39 | 0.44±0.001 | Self-supervised |
| Endo-SfMLearner | 9.27±3.25 | 1.26±0.23 | 0.45±0.017 | Self-supervised |
| AKA-Net (Ours) | 1.26±0.48 | 0.51±0.20 | 0.43±0.001 | Self-supervised |
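
The ATE column reports a mean and a standard deviation in millimetres. As a minimal sketch of how such a figure can be computed, the snippet below takes two trajectories of matched 3D positions and summarizes the per-frame Euclidean error. It assumes the trajectories are already expressed in the same frame and units; the function names `trajectory_errors` and `summarize_ate` are illustrative and do not come from our evaluation code, and any alignment or scaling step applied before evaluation is omitted here.

```python
import math
import statistics


def trajectory_errors(estimated, ground_truth):
    """Per-frame Euclidean distance between matched 3D positions.

    Assumption: both trajectories share one coordinate frame and unit (e.g. mm).
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    return [math.dist(p, q) for p, q in zip(estimated, ground_truth)]


def summarize_ate(errors):
    """Return (mean, population standard deviation) of the per-frame errors."""
    return statistics.fmean(errors), statistics.pstdev(errors)


if __name__ == "__main__":
    # Toy trajectories (hypothetical values, not data from the paper).
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    est = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]
    mean, std = summarize_ate(trajectory_errors(est, gt))
    print(f"ATE = {mean:.2f} ± {std:.2f} mm")
```

Whether the spread is reported as a population or sample standard deviation, and whether it is taken over frames or over sequences, is a convention choice; the sketch uses the population form over frames.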