In this study, we dive deep into the inconsistency of pseudo targets in semi-supervised object detection (SSOD). Our core observation is that oscillating pseudo targets undermine the training of an accurate semi-supervised detector: they not only inject noise into student training but also lead to severe overfitting on the classification task. We therefore propose a systematic solution, termed Consistent-Teacher, to reduce this inconsistency. First, adaptive anchor assignment (ASA) replaces the static IoU-based strategy, which makes the student network resistant to noisy pseudo bounding boxes. Then we calibrate the subtask predictions by designing a 3D feature alignment module (FAM-3D), which allows each classification feature to adaptively query the optimal feature vector for the regression task at arbitrary scales and locations. Lastly, a Gaussian Mixture Model (GMM) dynamically revises the score threshold of the pseudo-bboxes, which stabilizes the number of ground-truths at an early stage and remedies the unreliable supervision signal during training. Consistent-Teacher provides strong results on a wide range of SSOD evaluations. It achieves 40.0 mAP with a ResNet-50 backbone given only 10% of annotated MS-COCO data, surpassing previous baselines using pseudo labels by around 3 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, the performance further increases to 47.7 mAP. Our code is open-sourced at https://github.com/Adamdad/ConsistentTeacher.
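The idea behind the GMM-based threshold can be illustrated with a small sketch. This is not the Consistent-Teacher implementation. It assumes that the teacher's confidence scores for one class at one training step form two clusters (unreliable low scores and reliable high scores). It fits a two-component 1-D Gaussian mixture with plain EM in NumPy and takes the lowest score assigned to the higher-mean component as the threshold. The function names `fit_gmm_1d` and `gmm_score_threshold`, the min/max initialization, and the fallback `default=0.5` are hypothetical choices made for illustration.

```python
import numpy as np


def fit_gmm_1d(scores, n_iter=100, eps=1e-6):
    """Hypothetical helper: fit a two-component 1-D Gaussian mixture with EM."""
    x = np.asarray(scores, dtype=float)
    # Assumed initialisation: one component at the lowest score, one at the highest.
    means = np.array([x.min(), x.max()])
    variances = np.full(2, x.var() + eps)
    weights = np.array([0.5, 0.5])
    resp = np.full((x.size, 2), 0.5)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per score.
        log_pdf = -0.5 * (np.log(2 * np.pi * variances)
                          + (x[:, None] - means) ** 2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + eps
    return weights, means, variances, resp


def gmm_score_threshold(scores, default=0.5):
    """Hypothetical adaptive threshold: lowest score assigned to the high-mean component."""
    x = np.asarray(scores, dtype=float)
    # Too few or identical scores: no two clusters to separate, so fall back.
    if x.size < 2 or x.max() - x.min() < 1e-12:
        return default
    _, means, _, resp = fit_gmm_1d(x)
    pos, neg = int(np.argmax(means)), int(np.argmin(means))
    # A score counts as positive if the high-mean component explains it best
    # and it lies above the low-mean component's centre.
    is_pos = (resp[:, pos] > 0.5) & (x >= means[neg])
    if not is_pos.any():
        return default
    return float(x[is_pos].min())


scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.10, 0.09, 0.11,
          0.75, 0.80, 0.85, 0.78, 0.82]
print(gmm_score_threshold(scores))  # 0.75
```

Because the threshold is recomputed from the current score distribution, it can follow the teacher's confidence as training progresses instead of staying at a hand-tuned constant.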
(Left) We compare the training losses of Mean-Teacher and our Consistent-Teacher. In Mean-Teacher, inconsistent pseudo targets lead to overfitting on the classification branch, while the regression losses struggle to converge. In contrast, our approach sets consistent optimization objectives for the student, effectively balancing the two tasks and preventing overfitting.
(Right) Snapshots of the dynamics of pseudo labels and anchor assignment. The green and red bboxes are the ground-truth and pseudo bbox for the polar bear. Red dots are the anchor boxes assigned to the pseudo label. The heatmap shows the dense confidence score predicted by the teacher (brighter means higher). In the baseline, the nearby board is eventually misclassified as a polar bear, whereas our adaptive assignment prevents this overfitting.
Red: False Positive; Blue: True Positive; Green: Ground-truth
Mean-Teacher | Consistent-Teacher
We also plot the different situations caused by the inconsistency and how we address them.
Red: False Positive; Blue: True Positive; Orange: Ground-truth
Here are some demo videos of in-the-wild object detection.
@inproceedings{wang2023consistent,
author = {Xinjiang Wang and Xingyi Yang and Shilong Zhang and Yijiang Li and Litong Feng and Shijie Fang and Chengqi Lyu and Kai Chen and Wayne Zhang},
title = {Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection},
booktitle = {The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR)},
year = {2023},
}