AKA-Net: A Self-supervised Colonoscopy Pose Estimation Method with Attentional Keypoints Adjustment Module

Anonymous, Senior Member, IEEE


Abstract 📝

Background: Due to the tortuous and complicated structure of the human intestinal tract, effective navigation, including colonoscope localization and environment perception, is essential for both manual and robot-assisted colonoscopy procedures. Accurate pose estimation of the colonoscope is crucial for visual navigation, as self-tracking in vivo is challenging due to the dynamic nature of colon tissue. However, existing pose estimation methods often suffer from mislocalization during keypoint extraction and from visual distortions in the colonoscopy scenario.

Methods: To address these problems, we propose AKA-Net, a self-supervised network designed for keypoint extraction in the colonoscopy environment. Specifically, we propose a novel attentional keypoints adjustment (AKA) module, which improves keypoint extraction by leveraging the historical attention residual to efficiently refine keypoint localization in the current frame. Additionally, we propose two optimization strategies to address the visual distortion issues in colonoscopic images, which suppress motion artifacts in the low-frequency component and illumination discrepancies in the high-frequency component, respectively.
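The implementation is not reproduced on this page, so the sketch below is only a minimal PyTorch illustration of the two ideas: an AKA-style refinement that injects the previous frame's attention residual into the current keypoint heatmap, and a Gaussian-blur split into low- and high-frequency bands. All names (`AttentionalKeypointAdjustment`, `split_frequency_bands`) and the blur-based decomposition are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class AttentionalKeypointAdjustment(nn.Module):
    """Hypothetical sketch of an AKA-style module: the attention residual
    carried over from the previous frame refines the current keypoint heatmap."""

    def __init__(self, channels: int):
        super().__init__()
        self.to_attn = nn.Conv2d(channels, 1, kernel_size=1)  # spatial attention map
        self.gate = nn.Parameter(torch.tensor(0.1))           # learnable history weight
        self.prev_attn = None                                  # attention from frame t-1

    def forward(self, feat: torch.Tensor, heatmap: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.to_attn(feat))               # attention at frame t
        if self.prev_attn is not None:
            residual = attn - self.prev_attn                   # historical attention residual
            heatmap = heatmap + self.gate * residual           # adjust keypoint locations
        self.prev_attn = attn.detach()                         # cache history without backprop
        return heatmap

def split_frequency_bands(img: torch.Tensor, kernel_size: int = 21, sigma: float = 5.0):
    """Split an image into a low-frequency band (where motion artifacts would be
    suppressed) and a high-frequency band (where illumination discrepancies
    would be handled), via a simple Gaussian-blur decomposition."""
    low = TF.gaussian_blur(img, kernel_size=kernel_size, sigma=sigma)
    high = img - low
    return low, high
```

In the actual network, the residual weighting and the frequency-domain objectives would be learned end-to-end; this sketch only illustrates the data flow.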

Results: Our network is trained on a dataset collected from a simulated colonoscopy environment built on the Unity platform, and its effectiveness is subsequently validated on data from a custom-made colon phantom acquisition setup.

Conclusions: The results show an Average Trajectory Estimation Error (ATE) of 1.26 ± 0.48 mm on the Unity dataset and 1.44 ± 0.49 mm on the Phantom dataset, highlighting the potential of our network for application in both manual and robot-assisted colonoscopy procedures.
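For context on the ATE numbers, trajectory error is commonly reported after rigid least-squares (Umeyama) alignment of the estimated camera positions to ground truth. The paper's exact evaluation protocol is not specified here, so this NumPy sketch shows only the standard formulation.

```python
import numpy as np

def align_umeyama(est: np.ndarray, gt: np.ndarray):
    """Least-squares rigid alignment (rotation R, translation t) of est onto gt.

    est, gt: (N, 3) arrays of camera positions along the trajectory.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    cov = (gt - mu_g).T @ (est - mu_e) / len(est)
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:   # correct for reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return R, t

def absolute_trajectory_error(est: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between aligned estimated and true positions."""
    R, t = align_umeyama(est, gt)
    aligned = est @ R.T + t
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```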

Images 🖼️

Figure 1: AKA Module
Figure 2: Network Architecture
Figure 3: Training Pipeline
Figure 4: Dataset Overview
Figure 5: Experimental Setup
Figure 6: Qualitative Results
Figure 7: Quantitative Results
Figure 8: Error Analysis
Figure 9: Error Analysis
Figure 10: Method Overview

Results 📊

AKA-Net is trained on data from a simulated colonoscopy environment built on the Unity platform and validated on a custom-made colon phantom acquisition setup. The key metrics on both datasets are summarized below.

Unity Dataset Results

Average Trajectory Estimation Error (ATE): 1.26 ± 0.48 mm
Root Mean Square Error (RMSE): 0.51 ± 0.20 mm
Rotation Error (RE): 0.43 ± 0.001 rad

Phantom Dataset Results

Average Trajectory Estimation Error (ATE): 1.44 ± 0.49 mm
Root Mean Square Error (RMSE): 0.68 ± 0.50 mm
Rotation Error (RE): 0.83 ± 0.046 rad
| Model | ATE (mm) ↓ | RMSE (mm) ↓ | RE (rad) ↓ | Type |
|---|---|---|---|---|
| AKAZE | 2.83 ± 1.72 | 0.60 ± 0.17 | 0.64 ± 0.006 | Traditional |
| BRISK | 2.80 ± 1.93 | 0.60 ± 0.18 | 0.48 ± 0.008 | Traditional |
| ORB | 3.27 ± 2.16 | 0.62 ± 0.18 | 0.54 ± 0.011 | Traditional |
| SC-Depthv2 | 9.68 ± 5.66 | 1.26 ± 0.39 | 0.44 ± 0.001 | Self-supervised |
| Endo-SfMLearner | 9.27 ± 3.25 | 1.26 ± 0.23 | 0.45 ± 0.017 | Self-supervised |
| AKA-Net (Ours) | 1.26 ± 0.48 | 0.51 ± 0.20 | 0.43 ± 0.001 | Self-supervised |
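The traditional baselines in the table correspond to classic feature detectors; assuming the standard OpenCV implementations, a minimal sketch of extracting and matching ORB keypoints between two consecutive frames looks like this (file names are placeholders):

```python
import cv2

# Placeholder file names; two consecutive grayscale colonoscopy frames.
img1 = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching, the usual pairing for binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matched keypoints between frames")
```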