Enhancing object detection efficiency with transformers through multi-level feature integration

Dung Nguyen, Van-Dung Hoang, Van-Tuong-Lan Le

Journal of Computer Science and Cybernetics

2026 vol. 42 pp. 1-18 Vietnam Academy of Science and Technology

https://doi.org/10.15625/1813-9663/21114

Abstract

This paper presents a novel approach to enhancing object detection efficiency by integrating multi-level features within a transformer architecture. Traditional object detection methods often rely on single-level feature representations, which may limit their ability to accurately detect objects of varying sizes and complexities. By leveraging multi-level feature integration within the transformer framework, our method captures a richer set of spatial and semantic information, leading to more precise and robust object detection. The powerful attention mechanisms of transformers are utilized to effectively combine these features, improving detection accuracy and localization. The proposed approach is evaluated on the PASCAL VOC benchmark dataset, demonstrating superior performance over conventional single-level feature-based methods. Experimental results show that our model achieves an mAP@0.5 of 87% on PASCAL VOC, outperforming recent state-of-the-art methods while maintaining computational efficiency. These findings highlight the potential of multi-level feature integration within transformers in advancing the field of object detection.

Keywords. high resolution, multi-level features, object detection, transformer.
