Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection


CVPR 2023 (🔥Highlight, 2.5% of all submissions🔥)

¹Sensetime Research, ²Shanghai AI Lab, ³National University of Singapore, ⁴Peking University
*Co-first Author

Consistent-Teacher includes three modules that stabilize the pseudo labels in SSOD: adaptive anchor assignment (ASA), a 3D feature alignment module (FAM-3D), and a Gaussian Mixture Model (GMM) based threshold. It trains a detector to 40 mAP on MS-COCO with 10% of the labels.

Abstract

In this study, we dive deep into the inconsistency of pseudo targets in semi-supervised object detection (SSOD). Our core observation is that oscillating pseudo targets undermine the training of an accurate semi-supervised detector. This oscillation not only injects noise into student training but also leads to severe overfitting on the classification task. Therefore, we propose a systematic solution, termed Consistent-Teacher, to reduce the inconsistency. First, adaptive anchor assignment (ASA) replaces the static IoU-based strategy, which makes the student network resistant to noisy pseudo bounding boxes. Then, we calibrate the subtask predictions by designing a 3D feature alignment module (FAM-3D). It allows each classification feature to adaptively query the optimal feature vector for the regression task at arbitrary scales and locations. Lastly, a Gaussian Mixture Model (GMM) dynamically revises the score threshold of the pseudo-bboxes, which stabilizes the number of ground-truths at an early stage and remedies the unreliable supervision signal during training. Consistent-Teacher provides strong results on a wide range of SSOD evaluations. It achieves 40.0 mAP with a ResNet-50 backbone given only 10% of annotated MS-COCO data, which surpasses previous baselines using pseudo labels by around 3 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, the performance further increases to 47.7 mAP. Our code is open-sourced at https://github.com/Adamdad/ConsistentTeacher.
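To make the GMM-based threshold more concrete, here is a minimal sketch, not the released implementation. It fits a two-component 1-D Gaussian mixture to a set of teacher confidence scores with plain EM and returns the lowest score that is more likely to belong to the high-mean ("positive") component. The function names, the initialization, and this particular decision rule are illustrative assumptions; the official repository may compute the threshold differently (for example, per class and with a different rule).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_threshold(scores, n_iter=50, min_var=1e-4):
    """Hypothetical helper: dynamic score threshold from a 2-component GMM.

    Component 0 is initialised at the lowest score (negatives) and
    component 1 at the highest score (positives).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    lo, hi = min(scores), max(scores)
    means = [lo, hi]
    variances = [max(min_var, ((hi - lo) / 4.0) ** 2)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * gaussian_pdf(s, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-12
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(min_var, var)
            weights[k] = nk / len(scores)
    pos = 1 if means[1] >= means[0] else 0
    positives = [s for s, r in zip(scores, resp) if r[pos] > 0.5]
    # Fall back to the highest score if nothing is assigned to the positive mode.
    return min(positives) if positives else hi


# Toy teacher scores: a cluster of low-confidence and a cluster of
# high-confidence predictions (made-up numbers for illustration).
teacher_scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.80, 0.85, 0.90, 0.92]
print(gmm_threshold(teacher_scores))
```

Because the threshold is re-derived from the current score distribution, it can start low while the teacher is still under-confident and rise as training progresses, which is the behaviour the abstract attributes to the GMM module.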

Motivation

Inconsistency means that the pseudo boxes may be highly inaccurate and vary greatly across different stages of training.

Motivation: Inconsistency for SSOD

(Left) We compare the training losses of Mean-Teacher and our Consistent-Teacher. In Mean-Teacher, inconsistent pseudo targets lead to overfitting on the classification branch, while the regression loss struggles to converge. In contrast, our approach sets consistent optimization objectives for the student, effectively balancing the two tasks and preventing overfitting.

(Right) Snapshots of the dynamics of pseudo labels and assignment. The green and red bboxes are the ground-truth and the pseudo bbox for the polar bear, respectively. Red dots are the anchor boxes assigned to the pseudo label. The heatmap shows the dense confidence score predicted by the teacher (brighter means higher). In the baseline, the nearby board is eventually misclassified as a polar bear, while our adaptive assignment prevents this overfitting.
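The difference between static IoU-based assignment and an adaptive, cost-based assignment can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulation: the cost here is a classification term plus a weighted `1 - IoU` regression term, and the `top_k` lowest-cost predictions become positives. The function names, `top_k`, `reg_weight`, and the cost terms are all hypothetical choices made for the example.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def static_iou_assign(anchors, target, iou_thr=0.5):
    """Static rule: an anchor is positive if its IoU with the target
    exceeds a fixed threshold, regardless of what the model predicts."""
    return [i for i, a in enumerate(anchors) if iou(a, target) >= iou_thr]


def adaptive_assign(pred_boxes, pred_scores, target, top_k=2, reg_weight=2.0):
    """Adaptive rule (assumed form): rank candidates by a cost built from
    the student's current predictions and keep the top_k cheapest."""
    costs = []
    for i, (box, score) in enumerate(zip(pred_boxes, pred_scores)):
        cls_cost = -math.log(max(score, 1e-12))  # low when the score is high
        reg_cost = 1.0 - iou(box, target)        # low when the box fits well
        costs.append((cls_cost + reg_weight * reg_cost, i))
    return sorted(i for _, i in sorted(costs)[:top_k])


# Toy example with made-up boxes and scores.
anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
pseudo_box = (0, 0, 10, 10)
student_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
student_scores = [0.9, 0.8, 0.1]

print(static_iou_assign(anchors, pseudo_box))
print(adaptive_assign(student_boxes, student_scores, pseudo_box))
```

With the static rule, a small shift in a noisy pseudo box can flip which anchors pass the IoU threshold. With a cost-based rule, the positives depend on how well the student currently predicts the object, so the assignment changes more gradually when the pseudo box drifts.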


Results

Our work addresses inconsistency in SSOD. Here are some sample detection results at different timesteps during training.

Red: False Positive; Blue: True Positive; Green: Ground-truth

[Figure: side-by-side detection results, Mean-Teacher vs. Consistent-Teacher]

We also plot the different situations caused by the inconsistency and how we address them.

Red: False Positive; Blue: True Positive; Orange: Ground-truth


Demos

Here are some demo videos of in-the-wild object detection.

BibTeX

@article{wang2023consistent,
      author    = {Xinjiang Wang and Xingyi Yang and Shilong Zhang and Yijiang Li and Litong Feng and Shijie Fang and Chengqi Lyu and Kai Chen and Wayne Zhang},
      title     = {Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection},
      journal   = {The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR)},
      year      = {2023},
  }