PPEA-Depth: Progressive Parameter-Efficient Adaptation for Self-Supervised Monocular Depth Estimation
Yue-Jiang Dong1    Yuan-Chen Guo1    Ying-Tian Liu1    Fang-Lue Zhang2    Song-Hai Zhang1
1Tsinghua University    2Victoria University of Wellington
Abstract
Self-supervised monocular depth estimation is of significant importance, with applications spanning autonomous driving and robotics. However, the reliance on self-supervision introduces a strong static-scene assumption, which makes it difficult to achieve optimal performance in dynamic scenes, and such scenes are prevalent in most real-world situations. To address this issue, we propose PPEA-Depth, a Progressive Parameter-Efficient Adaptation approach that transfers a pre-trained image model to self-supervised depth estimation. Training comprises two sequential stages: an initial stage on a dataset composed primarily of static scenes, followed by an expansion to more intricate datasets involving dynamic scenes. To facilitate this process, we design compact encoder and decoder adapters that enable parameter-efficient tuning, allowing the network to adapt effectively. The adapters not only preserve the generalized patterns of pre-trained image models but also carry the knowledge gained in the preceding stage into the subsequent one. Extensive experiments demonstrate that PPEA-Depth achieves state-of-the-art performance on the KITTI, CityScapes and DDAD datasets.
Top: The conventional training approach applies the same process to both static and dynamic datasets: it uses a pre-trained image model as the encoder and fine-tunes all U-Net parameters for each dataset. Bottom: Our two-stage training paradigm integrates adapters to progressively tailor the pre-trained image model for depth perception, first on a simple dataset (primarily static scenes) and then on more intricate datasets (with dynamic scenes).
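To make the notion of parameter-efficient tuning concrete, the following is a minimal sketch of a generic residual bottleneck adapter, a design commonly used in adapter-based tuning. It is an illustration under stated assumptions and not the adapter architecture of PPEA-Depth: the names `make_adapter`, `adapter_forward`, and `count_params`, the ReLU non-linearity, the zero-initialized up-projection, and the sizes `dim=64` and `bottleneck=8` are all hypothetical choices made for this example.

```python
import random


def make_adapter(dim, bottleneck, seed=0):
    """Create weights for a residual bottleneck adapter (hypothetical layout)."""
    rng = random.Random(seed)
    # Down-projection (dim x bottleneck): small random values.
    down = [[rng.gauss(0.0, 0.02) for _ in range(bottleneck)] for _ in range(dim)]
    # Up-projection (bottleneck x dim): zeros, so the adapter starts as the
    # identity map and initially leaves the frozen backbone features unchanged.
    up = [[0.0] * dim for _ in range(bottleneck)]
    return {"down": down, "up": up}


def matvec(x, w):
    """Multiply a row vector x (length n) by a matrix w (n x m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def adapter_forward(x, adapter):
    """Apply the adapter: x + up(relu(down(x)))."""
    hidden = [max(0.0, h) for h in matvec(x, adapter["down"])]  # ReLU
    delta = matvec(hidden, adapter["up"])
    return [xi + di for xi, di in zip(x, delta)]  # residual connection


def count_params(adapter):
    """Number of trainable scalars in the adapter (biases omitted for brevity)."""
    return sum(len(row) for matrix in adapter.values() for row in matrix)


dim, bottleneck = 64, 8  # assumed sizes, chosen only for illustration
adapter = make_adapter(dim, bottleneck)
features = [float(i % 5) - 2.0 for i in range(dim)]  # stand-in for frozen features
adapted = adapter_forward(features, adapter)

# A dense dim x dim layer would have dim * dim weights to fine-tune;
# the adapter trains only 2 * dim * bottleneck weights.
full_layer_params = dim * dim
adapter_params = count_params(adapter)
print(adapter_params, full_layer_params)
```

With these assumed sizes the adapter has 1024 trainable weights, compared with 4096 for a dense 64 x 64 layer, and because the up-projection starts at zero the adapted features initially equal the input features. In a two-stage setting, one would keep the pre-trained weights frozen and update only such small modules, which is the general idea behind retaining earlier knowledge while adapting to a new dataset.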
Qualitative Results (test images are from CityScapes)
From left to right: the original input image, the depth estimated by fully fine-tuning a U-Net from scratch, and the depth estimated by our PPEA-Depth approach.