1. Depth image enhancement is crucial for many high-level 3D vision tasks, but depth images captured by commercial depth sensors still suffer from various quality issues.

2. Existing RGB-guided methods for depth image enhancement are limited in two ways: they rely on prior assumptions or models, and supervised learning requires real-world noisy-clean pairs, which are difficult to collect.

3. The proposed self-supervised learning method exploits the dependency between the RGB and depth modalities to significantly improve enhancement of real-world noisy depth images, without requiring any noisy-clean pairs. It achieves a further performance gain by augmenting its cross-modal dependency maximization formulation with optimal transport theory.
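To give a concrete sense of the optimal transport machinery mentioned in point 3, here is a minimal sketch of a generic entropic optimal transport solver (Sinkhorn iterations). This is an assumption-laden illustration only: the summary does not specify how the method couples optimal transport with dependency maximization, so `sinkhorn`, the 1-D "RGB" and "depth" feature values, the squared-distance cost, and `eps` are all hypothetical choices, not the authors' formulation.

```python
import math


def sinkhorn(a, b, cost, eps=0.05, n_iters=500):
    """Entropic OT plan between histograms a (length n) and b (length m).

    Hypothetical helper for illustration. `cost` is an n x m list of lists.
    Returns a plan P whose row sums approximate a and column sums approximate b.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps); smaller eps gives a sharper plan.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that P matches the source marginal a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that P matches the target marginal b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical 1-D features standing in for RGB-derived and depth-derived cues.
rgb_feats = [0.1, 0.5, 0.9]
depth_feats = [0.12, 0.48, 0.95]
a = [1.0 / 3] * 3
b = [1.0 / 3] * 3
cost = [[(x - y) ** 2 for y in depth_feats] for x in rgb_feats]

P = sinkhorn(a, b, cost)
# Transport cost under the plan: low when the two feature sets align well.
ot_cost = sum(P[i][j] * cost[i][j] for i in range(3) for j in range(3))
```

The entropic regularizer `eps` trades exactness for a smooth, differentiable plan, which is one common reason OT terms of this kind are used inside learning objectives; whether the described method uses entropic OT at all is not stated in the summary.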

