Target Detection Model Based on Once Bidirectional Feature Pyramid Network

Yunchuan Zhang; Lin Jiang; Li Lin

doi:10.3788/LOP220555

Laser & Optoelectronics Progress
Vol. 60, Issue 2, 0215005 (2023)

Yunchuan Zhang, Lin Jiang^*, and Li Lin

Author Affiliations

Faculty of Science, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Yunchuan Zhang, Lin Jiang, Li Lin. Target Detection Model Based on Once Bidirectional Feature Pyramid Network[J]. Laser & Optoelectronics Progress, 2023, 60(2): 0215005

Fig. 1. SSD model framework

Fig. 2. Proposed model framework

Fig. 3. Once Bi-FP module

Fig. 4. Top-to-bottom feature fusion module

Fig. 5. Bottom-to-top feature fusion module

Fig. 6. Prediction module

Fig. 7. FSSD model framework

Fig. 8. Comparison of the average precision of object detection models on the PASCAL VOC2007 test set

Fig. 9. Comparison of detection results between OBSSD model and SSD^* model. (a) cow; (b) car, boat; (c) bird, potted plants

Block	Layer	Operation	Specific operational detail	Output feature size
Block 1	Conv1_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$300 \times 300 \times 64$
Block 1	Conv1_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$300 \times 300 \times 64$
Block 2	Pooling1	MaxPooling	$k = 2$, $s = 2$	$150 \times 150 \times 64$
Block 2	Conv2_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$150 \times 150 \times 128$
Block 2	Conv2_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$150 \times 150 \times 128$
Block 3	Pooling2	MaxPooling	$k = 2$, $s = 2$	$75 \times 75 \times 128$
Block 3	Conv3_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$75 \times 75 \times 256$
Block 3	Conv3_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$75 \times 75 \times 256$
Block 3	Conv3_3	Conv, Act	$k = 3$, $p = 1$; ReLU	$75 \times 75 \times 256$
Block 4	Pooling3	MaxPooling	$k = 2$, $s = 2$	$38 \times 38 \times 256$
Block 4	Conv4_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$38 \times 38 \times 512$
Block 4	Conv4_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$38 \times 38 \times 512$
Block 4	Conv4_3	Conv, Act	$k = 3$, $p = 1$; ReLU	$38 \times 38 \times 512$
Block 5	Pooling4	MaxPooling	$k = 2$, $s = 2$	$19 \times 19 \times 512$
Block 5	Conv5_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$19 \times 19 \times 512$
Block 5	Conv5_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$19 \times 19 \times 512$
Block 5	Conv5_3	Conv, Act	$k = 3$, $p = 1$; ReLU	$19 \times 19 \times 512$
Block 6	Pooling5	MaxPooling	$k = 2$, $s = 1$, $p = 1$	$19 \times 19 \times 512$
Block 6	Conv6	Conv, Act	$k = 3$, $p = 6$, $d = 6$; ReLU	$19 \times 19 \times 1024$
Block 6	Conv7	Conv, Act	$k = 1$; ReLU	$19 \times 19 \times 1024$
Block 7	Conv8_1	Conv, Act	$k = 1$; ReLU	$19 \times 19 \times 256$
Block 7	Conv8_2	Conv, Act	$k = 3$, $s = 2$, $p = 1$; ReLU	$10 \times 10 \times 512$
Block 8	Conv9_1	Conv, Act	$k = 1$; ReLU	$10 \times 10 \times 128$
Block 8	Conv9_2	Conv, Act	$k = 3$, $s = 2$, $p = 1$; ReLU	$5 \times 5 \times 256$
Block 9	Conv10_1	Conv, Act	$k = 1$; ReLU	$5 \times 5 \times 128$
Block 9	Conv10_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$3 \times 3 \times 256$
Block 10	Conv11_1	Conv, Act	$k = 1$; ReLU	$3 \times 3 \times 128$
Block 10	Conv11_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$1 \times 1 \times 256$

Table 1. SSD backbone network structure
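
The spatial sizes in Table 1 can be checked with the usual output-size formula for convolution and pooling, $n_{out} = \lfloor (n + 2p - d(k-1) - 1)/s \rfloor + 1$, where $k$, $s$, $p$, and $d$ are the kernel size, stride, padding, and dilation. The sketch below is only an illustration, not the authors' code; the names `out_size`, `LAYERS`, and `trace_sizes` are hypothetical. Three settings are assumptions that differ from the listed values because the listed sizes are otherwise not reproduced by this formula: Pooling3 rounds up (75 to 38), Pooling5 uses $k = 3$, and Conv10_2 and Conv11_2 use $p = 0$, as in the original SSD300 configuration. The $1 \times 1$ convolutions Conv8_1 to Conv11_1 are omitted because they do not change the spatial size.

```python
import math


def out_size(n, k, s=1, p=0, d=1, ceil_mode=False):
    """Output length along one spatial axis for a conv or pooling layer."""
    effective_k = d * (k - 1) + 1  # kernel extent once dilation d is applied
    span = n + 2 * p - effective_k
    steps = math.ceil(span / s) if ceil_mode else span // s
    return steps + 1


# (layer, parameters). 3x3 convs with p=1 keep the size, so only the last
# conv of each block is listed. Values follow Table 1 unless marked ASSUMPTION.
LAYERS = [
    ("Conv1_2", dict(k=3, p=1)),
    ("Pooling1", dict(k=2, s=2)),
    ("Conv2_2", dict(k=3, p=1)),
    ("Pooling2", dict(k=2, s=2)),
    ("Conv3_3", dict(k=3, p=1)),
    ("Pooling3", dict(k=2, s=2, ceil_mode=True)),  # ASSUMPTION: ceil rounding
    ("Conv4_3", dict(k=3, p=1)),
    ("Pooling4", dict(k=2, s=2)),
    ("Conv5_3", dict(k=3, p=1)),
    ("Pooling5", dict(k=3, s=1, p=1)),  # ASSUMPTION: k=3 (table lists k=2)
    ("Conv6", dict(k=3, p=6, d=6)),
    ("Conv7", dict(k=1)),
    ("Conv8_2", dict(k=3, s=2, p=1)),
    ("Conv9_2", dict(k=3, s=2, p=1)),
    ("Conv10_2", dict(k=3)),  # ASSUMPTION: p=0 (table lists p=1)
    ("Conv11_2", dict(k=3)),  # ASSUMPTION: p=0 (table lists p=1)
]


def trace_sizes(input_size=300, layers=LAYERS):
    """Map each layer name to its spatial output size."""
    sizes, n = {}, input_size
    for name, params in layers:
        n = out_size(n, **params)
        sizes[name] = n
    return sizes


sizes = trace_sizes()
for name in ("Conv4_3", "Conv7", "Conv8_2", "Conv9_2", "Conv10_2", "Conv11_2"):
    print(name, sizes[name])
```

Under these assumptions the six printed layers come out at 38, 19, 10, 5, 3, and 1, matching the feature-map sizes listed in Table 1.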

Effective feature layer	Size	Number of prior frames per grid cell
Conv4_3	$38 \times 38$	4
Conv7	$19 \times 19$	6
Conv8_2	$10 \times 10$	6
Conv9_2	$5 \times 5$	6
Conv10_2	$3 \times 3$	4
Conv11_2	$1 \times 1$	4

Table 2. Number of prior frames per grid cell on each effective feature layer
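
As a quick arithmetic check of Table 2, the total number of prior frames for one $300 \times 300$ input is the sum over the effective feature layers of (grid cells) $\times$ (prior frames per cell). The sketch below uses only the values in the table; the names `FEATURE_LAYERS` and `count_priors` are hypothetical.

```python
# layer name -> (grid side length, prior frames per grid cell), from Table 2
FEATURE_LAYERS = {
    "Conv4_3": (38, 4),
    "Conv7": (19, 6),
    "Conv8_2": (10, 6),
    "Conv9_2": (5, 6),
    "Conv10_2": (3, 4),
    "Conv11_2": (1, 4),
}


def count_priors(layers=FEATURE_LAYERS):
    """Total prior frames: grid cells per layer times priors per cell."""
    return sum(side * side * per_cell for side, per_cell in layers.values())


per_layer = {name: side * side * n for name, (side, n) in FEATURE_LAYERS.items()}
total_priors = count_priors()
print(per_layer)
print(total_priors)
```

The per-layer counts are 5776, 2166, 600, 150, 36, and 4, for a total of 8732 prior frames per image.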

Stage	Optimizer	Batch_size	Freeze_train	Initial_Lr	Lr_scheduler	Epoch
1	Adam	32	True	0.0005	ReduceLROnPlateau	50
1	Adam	16	False	0.0001	ReduceLROnPlateau	150
2	SGD-M	32	True	0.001	MultiStepLR	50
2	SGD-M	16	False	0.001	MultiStepLR	50

Table 3. Training strategies
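
The four rows of Table 3 can be written down as a plain configuration, which makes the order and length of the phases explicit. This is a minimal sketch under stated assumptions, not the authors' training script: it assumes that the Epoch column gives the length of each phase rather than a cumulative count, that the phases run one after another, and that the fourth row belongs to stage 2. The names `Phase`, `SCHEDULE`, and `phase_at` are hypothetical, and the table does not say which layers `Freeze_train` freezes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    stage: int
    optimizer: str
    batch_size: int
    freeze_train: bool
    initial_lr: float
    lr_scheduler: str
    epochs: int  # ASSUMPTION: length of this phase, not a cumulative count


# One entry per row of Table 3, in table order.
SCHEDULE = [
    Phase(1, "Adam", 32, True, 0.0005, "ReduceLROnPlateau", 50),
    Phase(1, "Adam", 16, False, 0.0001, "ReduceLROnPlateau", 150),
    Phase(2, "SGD-M", 32, True, 0.001, "MultiStepLR", 50),
    Phase(2, "SGD-M", 16, False, 0.001, "MultiStepLR", 50),  # ASSUMPTION: stage 2
]


def phase_at(epoch, schedule=SCHEDULE):
    """Return the phase active at a 0-based global epoch index."""
    start = 0
    for phase in schedule:
        if epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError("epoch is beyond the end of the schedule")


total_epochs = sum(phase.epochs for phase in SCHEDULE)
print(total_epochs)
print(phase_at(60))
```

Read this way, the schedule covers 300 epochs in total, and each stage starts with a frozen phase at batch size 32 before switching to unfrozen training at batch size 16.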

Method	Dataset	Backbone	Input size	FPS	mAP /%
Faster R-CNN [4]	VOC07+12	VGG16	$600 \times 1000$	7	73.2
SSD (baseline) [10]	VOC07+12	VGG16	$300 \times 300$	59	74.3
SSD* [10]	VOC07+12	VGG16	$300 \times 300$	52.6	76.9
DSSD [11]	VOC07+12	ResNet-101	$321 \times 321$	13.6	78.6
DSOD [29]	VOC07+12	DS/64-192-48-1	$300 \times 300$	17.4	77.7
RSSD [12]	VOC07+12	VGG16	$300 \times 300$	35	78.5
FSSD [30]	VOC07+12	VGG16	$300 \times 300$	65.8	78.8
ESSD [31]	VOC07+12	VGG16	$300 \times 300$	25	79.4
FASSD [32]	VOC07+12	ResNet-50	$300 \times 300$	30	78.1
DFSSD [33]	VOC07+12	DenseNet-S-32-1	$300 \times 300$	11.6	78.9
FDSSD [17]	VOC07+12	VGG16	$300 \times 300$	12.6	79.1
OBSSD	VOC07+12	VGG16	$300 \times 300$	41.7	80.8

Table 4. Comparison of detection accuracy and detection speed on the PASCAL VOC2007 test set

Method	mAP /%	aero	bicycle	bird	boat	bottle	bus	car	cat	chair	cow
Faster R-CNN [4]	73.2	76.5	79.0	70.9	65.5	52.1	83.1	84.7	86.4	52.0	81.9
SSD (baseline) [10]	74.3	75.5	80.2	72.3	66.3	47.6	83.0	84.2	86.1	54.7	78.3
SSD* [10]	76.9	76.9	86.6	74.5	66.4	50.4	85.0	84.7	87.3	61.0	78.7
DSSD [11]	78.6	81.9	84.9	80.5	68.4	53.9	85.6	86.2	88.9	61.1	83.5
ESSD [31]	79.4	82.6	86.1	79.8	72.2	54.7	86.8	86.9	88.2	62.8	85.2
OBSSD	80.8	82.7	89.7	81.5	71.8	53.7	90.7	90.0	90.6	64.8	86.2
Method	mAP /%	table	dog	horse	mbike	person	plant	sheep	sofa	train	tv
Faster R-CNN [4]	73.2	65.7	84.8	84.6	77.5	76.7	38.8	73.6	73.9	83.0	72.6
SSD (baseline) [10]	74.3	73.9	84.5	85.3	82.6	76.2	48.6	73.9	76.0	83.4	74.0
SSD* [10]	76.9	78.2	86.1	89.4	86.0	79.8	48.5	76.1	80.3	86.9	76.1
DSSD [11]	78.6	78.7	86.7	88.7	86.7	79.7	51.7	78.0	80.9	87.2	79.4
ESSD [31]	79.4	78.2	87.5	88.0	87.0	80.0	56.1	80.2	80.4	88.7	78.1
OBSSD	80.8	77.3	87.9	90.0	88.1	82.0	54.2	80.5	83.1	90.2	80.0

Table 5. Comparison of average precision for the 20 categories of the PASCAL VOC2007 test set
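
The mAP column in Table 5 is the unweighted mean of the 20 per-category average precision values in the same row. The short check below recomputes it for the OBSSD row; the names `OBSSD_AP` and `mean_ap` are hypothetical, and the numbers are copied from the table.

```python
# Per-category AP (%) for the OBSSD row of Table 5.
OBSSD_AP = {
    "aero": 82.7, "bicycle": 89.7, "bird": 81.5, "boat": 71.8, "bottle": 53.7,
    "bus": 90.7, "car": 90.0, "cat": 90.6, "chair": 64.8, "cow": 86.2,
    "table": 77.3, "dog": 87.9, "horse": 90.0, "mbike": 88.1, "person": 82.0,
    "plant": 54.2, "sheep": 80.5, "sofa": 83.1, "train": 90.2, "tv": 80.0,
}


def mean_ap(per_class_ap):
    """Unweighted mean of the per-category AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


obssd_map = mean_ap(OBSSD_AP)
weakest = sorted(OBSSD_AP, key=OBSSD_AP.get)[:3]  # three lowest-AP categories
print(f"{obssd_map:.2f}")
print(weakest)
```

The mean comes out at about 80.75, consistent with the reported 80.8 once rounded to one decimal place. The three lowest categories for OBSSD are bottle, plant, and chair.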

Model	mAP@0.3 /%	mAP@0.5 /%	Size /MB	FPS
SSD [10]		74.3	25.1	59
SSD* [10]	80.8	76.9	25.1	52.6
PMSSD*	82.9	78.2	25.6	48.2
OBMSSD*	84.2	80.1	25.8	44.3
OBSSD*	85.2	80.8	27.4	41.7

Table 6. Results of ablation experiment
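
One way to read the ablation in Table 6 is as a sequence of increments from one row to the next, starting at SSD*. The sketch below computes those row-to-row differences from the table values only; the names `ABLATION` and `step_deltas` are hypothetical, and the first SSD row is left out because its mAP@0.3 entry is not given.

```python
# (model, mAP@0.3 %, mAP@0.5 %, size MB, FPS), rows of Table 6 from SSD* on.
ABLATION = [
    ("SSD*", 80.8, 76.9, 25.1, 52.6),
    ("PMSSD*", 82.9, 78.2, 25.6, 48.2),
    ("OBMSSD*", 84.2, 80.1, 25.8, 44.3),
    ("OBSSD*", 85.2, 80.8, 27.4, 41.7),
]


def step_deltas(rows=ABLATION):
    """Difference of each row from the previous one, rounded to 0.1."""
    deltas = []
    for prev, cur in zip(rows, rows[1:]):
        deltas.append({
            "step": f"{prev[0]} -> {cur[0]}",
            "map30": round(cur[1] - prev[1], 1),
            "map50": round(cur[2] - prev[2], 1),
            "size_mb": round(cur[3] - prev[3], 1),
            "fps": round(cur[4] - prev[4], 1),
        })
    return deltas


deltas = step_deltas()
total_map50 = round(ABLATION[-1][2] - ABLATION[0][2], 1)
total_fps = round(ABLATION[-1][4] - ABLATION[0][4], 1)
for d in deltas:
    print(d)
print(total_map50, total_fps)
```

From SSD* to OBSSD*, mAP@0.5 rises by 3.9 points (1.3, 1.9, and 0.7 per step) while the speed drops by 10.9 FPS and the model grows by 2.3 MB.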
