Author Affiliations
Faculty of Science, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
Fig. 1. SSD model framework
Fig. 2. Proposed model framework
Fig. 3. Once Bi-FP module
Fig. 4. Top-to-bottom feature fusion module
Fig. 5. Bottom-to-top feature fusion module
Fig. 6. Prediction module
Fig. 7. FSSD model framework
Fig. 8. Comparison of the average precision of object detection models on the PASCAL VOC2007 test set
Fig. 9. Comparison of detection results between the OBSSD and SSD* models. (a) cow; (b) car, boat; (c) bird, potted plants

| Block | Layer | Operation | Specific operational detail | Output feature size |
|---|---|---|---|---|
| Block 1 | Conv1_1 | Conv, Act | ReLU | |
| | Conv1_2 | Conv, Act | ReLU | |
| Block 2 | Pooling1 | MaxPooling | | |
| | Conv2_1 | Conv, Act | ReLU | |
| | Conv2_2 | Conv, Act | ReLU | |
| Block 3 | Pooling2 | MaxPooling | | |
| | Conv3_1 | Conv, Act | ReLU | |
| | Conv3_2 | Conv, Act | ReLU | |
| | Conv3_3 | Conv, Act | ReLU | |
| Block 4 | Pooling3 | MaxPooling | | |
| | Conv4_1 | Conv, Act | ReLU | |
| | Conv4_2 | Conv, Act | ReLU | |
| | Conv4_3 | Conv, Act | ReLU | |
| Block 5 | Pooling4 | MaxPooling | | |
| | Conv5_1 | Conv, Act | ReLU | |
| | Conv5_2 | Conv, Act | ReLU | |
| | Conv5_3 | Conv, Act | ReLU | |
| Block 6 | Pooling5 | MaxPooling | | |
| | Conv6 | Conv, Act | ReLU | |
| | Conv7 | Conv, Act | ReLU | |
| Block 7 | Conv8_1 | Conv, Act | ReLU | |
| | Conv8_2 | Conv, Act | ReLU | |
| Block 8 | Conv9_1 | Conv, Act | ReLU | |
| | Conv9_2 | Conv, Act | ReLU | |
| Block 9 | Conv10_1 | Conv, Act | ReLU | |
| | Conv10_2 | Conv, Act | ReLU | |
| Block 10 | Conv11_1 | Conv, Act | ReLU | |
| | Conv11_2 | Conv, Act | ReLU | |

Table 1. SSD backbone network structure
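The numeric kernel, stride and output-size entries of Table 1 did not survive extraction. As a hedged illustration of how the output feature sizes follow from the layer list, the sketch below traces the spatial size through the backbone using the standard SSD300 settings of the original SSD design: a 300×300 input, 3×3 same-padded VGG convolutions, a ceil-mode Pooling3, a 3×3/stride-1 Pooling5 and a dilation-6 Conv6. These settings are assumptions, not values recovered from this paper, and the helpers `out_size`, `same3` and `trace` are hypothetical.

```python
import math


def out_size(size, k, s=1, p=0, d=1, ceil_mode=False):
    """Side length after a square conv/pool layer.

    Uses the output-size formula documented for PyTorch Conv2d/MaxPool2d:
    floor_or_ceil((size + 2p - d*(k-1) - 1) / s) + 1.
    """
    span = size + 2 * p - d * (k - 1) - 1
    return (math.ceil(span / s) if ceil_mode else span // s) + 1


def same3(name):
    # 3x3 conv, stride 1, padding 1: keeps the spatial size unchanged
    return (name, 3, 1, 1, 1, False)


# (layer, kernel, stride, padding, dilation, ceil_mode)
# ASSUMED standard SSD300 values; Table 1's numeric entries were lost.
LAYERS = [
    same3("Conv1_1"), same3("Conv1_2"), ("Pooling1", 2, 2, 0, 1, False),
    same3("Conv2_1"), same3("Conv2_2"), ("Pooling2", 2, 2, 0, 1, False),
    same3("Conv3_1"), same3("Conv3_2"), same3("Conv3_3"),
    ("Pooling3", 2, 2, 0, 1, True),    # ceil mode: 75 -> 38
    same3("Conv4_1"), same3("Conv4_2"), same3("Conv4_3"),
    ("Pooling4", 2, 2, 0, 1, False),
    same3("Conv5_1"), same3("Conv5_2"), same3("Conv5_3"),
    ("Pooling5", 3, 1, 1, 1, False),   # 3x3, stride 1, padding 1
    ("Conv6", 3, 1, 6, 6, False),      # dilated 3x3 conv, dilation 6
    ("Conv7", 1, 1, 0, 1, False),
    ("Conv8_1", 1, 1, 0, 1, False), ("Conv8_2", 3, 2, 1, 1, False),
    ("Conv9_1", 1, 1, 0, 1, False), ("Conv9_2", 3, 2, 1, 1, False),
    ("Conv10_1", 1, 1, 0, 1, False), ("Conv10_2", 3, 1, 0, 1, False),
    ("Conv11_1", 1, 1, 0, 1, False), ("Conv11_2", 3, 1, 0, 1, False),
]


def trace(input_size=300):
    """Return {layer name: output side length} for a square input."""
    sizes, size = {}, input_size
    for name, k, s, p, d, ceil_mode in LAYERS:
        size = out_size(size, k, s, p, d, ceil_mode)
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    sizes = trace(300)
    for name in ["Conv4_3", "Conv7", "Conv8_2", "Conv9_2", "Conv10_2", "Conv11_2"]:
        print(f"{name}: {sizes[name]}x{sizes[name]}")
```

Under these assumptions the six effective feature layers come out at 38, 19, 10, 5, 3 and 1 cells per side.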

| Effective feature layer | Size | Number of prior boxes per grid cell |
|---|---|---|
| Conv4_3 | | 4 |
| Conv7 | | 6 |
| Conv8_2 | | 6 |
| Conv9_2 | | 6 |
| Conv10_2 | | 4 |
| Conv11_2 | | 4 |

Table 2. Number of prior boxes per grid cell on each effective feature layer
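Each grid cell of an effective feature layer carries a fixed number of prior boxes, so the total number of priors is the sum of size × size × boxes-per-cell over the six layers. Because the Size column was lost, the sketch below assumes the standard SSD300 feature-map sizes (38, 19, 10, 5, 3, 1); under that assumption it reproduces the 8732 priors of the original SSD300. In the original SSD design, 4 boxes per cell correspond to aspect ratios {1, 2, 1/2} plus one extra square box at an intermediate scale, and 6 boxes add {3, 1/3}; whether OBSSD keeps exactly these ratios is not stated here. The `count_priors` helper is hypothetical.

```python
# Boxes-per-cell values come from Table 2; the feature-map side lengths are
# ASSUMED (standard SSD300) because Table 2's Size column was lost.
FEATURE_LAYERS = [
    ("Conv4_3", 38, 4),
    ("Conv7", 19, 6),
    ("Conv8_2", 10, 6),
    ("Conv9_2", 5, 6),
    ("Conv10_2", 3, 4),
    ("Conv11_2", 1, 4),
]


def count_priors(layers):
    """Return per-layer prior-box counts and their total."""
    per_layer = {name: size * size * boxes for name, size, boxes in layers}
    return per_layer, sum(per_layer.values())


if __name__ == "__main__":
    per_layer, total = count_priors(FEATURE_LAYERS)
    for name, n in per_layer.items():
        print(f"{name}: {n}")
    print("total:", total)  # 8732 under the assumed sizes
```

Most priors (5776 of 8732) sit on the high-resolution Conv4_3 layer, which is the layer responsible for small objects.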

| Stage | Optimizer | Batch_size | Freeze_train | Initial_Lr | Lr_scheduler | Epoch |
|---|---|---|---|---|---|---|
| 1 | Adam | 32 | True | 0.0005 | ReduceLROnPlateau | 50 |
| | Adam | 16 | False | 0.0001 | ReduceLROnPlateau | 150 |
| 2 | SGD-M | 32 | True | 0.001 | MultiStepLR | 50 |
| | SGD-M | 16 | False | 0.001 | MultiStepLR | 50 |

Table 3. Training strategies
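Table 3 reads as a freeze-then-unfreeze schedule: within each stage the backbone is first frozen and trained with a larger batch, then unfrozen and trained with a smaller batch. The sketch below encodes the rows as data and looks up the active phase for a given epoch. It assumes that the Epoch column gives the number of epochs spent in each phase, that the two phases of a stage run back to back, and that Freeze_train = True means the backbone weights are not updated; the `Phase` class and the `phase_at` and `total_epochs` helpers are hypothetical. It only records the scheduler names and does not reproduce ReduceLROnPlateau or MultiStepLR, whose settings (patience, milestones, decay factor) are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    optimizer: str
    batch_size: int
    freeze_backbone: bool
    initial_lr: float
    lr_scheduler: str
    epochs: int  # ASSUMPTION: epochs spent in this phase


# Rows of Table 3, grouped by stage.
SCHEDULES = {
    1: [Phase("Adam", 32, True, 0.0005, "ReduceLROnPlateau", 50),
        Phase("Adam", 16, False, 0.0001, "ReduceLROnPlateau", 150)],
    2: [Phase("SGD-M", 32, True, 0.001, "MultiStepLR", 50),
        Phase("SGD-M", 16, False, 0.001, "MultiStepLR", 50)],
}


def phase_at(stage, epoch):
    """Return the phase active at a 0-based epoch index within a stage."""
    start = 0
    for phase in SCHEDULES[stage]:
        if epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError(f"epoch {epoch} is past the end of stage {stage}")


def total_epochs(stage):
    return sum(p.epochs for p in SCHEDULES[stage])


if __name__ == "__main__":
    for epoch in (0, 49, 50, 199):
        p = phase_at(1, epoch)
        print(epoch, p.optimizer, p.batch_size, "frozen" if p.freeze_backbone else "unfrozen")
```

A common reason for halving the batch size when the backbone is unfrozen is that back-propagating through it needs more GPU memory; the table itself does not state the motivation.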

| Method | Dataset | Backbone | Input size | FPS | mAP /% |
|---|---|---|---|---|---|
| Faster R-CNN[4] | VOC07+12 | VGG16 | | 7 | 73.2 |
| SSD (baseline)[10] | VOC07+12 | VGG16 | | 59 | 74.3 |
| SSD*[10] | VOC07+12 | VGG16 | | 52.6 | 76.9 |
| DSSD[11] | VOC07+12 | ResNet-101 | | 13.6 | 78.6 |
| DSOD[29] | VOC07+12 | DS/64-192-48-1 | | 17.4 | 77.7 |
| RSSD[12] | VOC07+12 | VGG16 | | 35 | 78.5 |
| FSSD[30] | VOC07+12 | VGG16 | | 65.8 | 78.8 |
| ESSD[31] | VOC07+12 | VGG16 | | 25 | 79.4 |
| FASSD[32] | VOC07+12 | ResNet-50 | | 30 | 78.1 |
| DFSSD[33] | VOC07+12 | DenseNet-S-32-1 | | 11.6 | 78.9 |
| FDSSD[17] | VOC07+12 | VGG16 | | 12.6 | 79.1 |
| OBSSD | VOC07+12 | VGG16 | | 41.7 | 80.8 |

Table 4. Comparison of detection accuracy and detection speed on the PASCAL VOC2007 test set
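Two derived quantities help read Table 4: the per-image latency implied by each FPS figure (1000 / FPS milliseconds) and the set of methods that no other listed method beats on both speed and mAP (the Pareto front). The sketch below computes both from the table values. It assumes all FPS figures are directly comparable, which holds only if they were measured on the same hardware and settings; the `latency_ms` and `pareto_front` helpers are hypothetical. On these numbers the front contains FSSD (fastest) and OBSSD (most accurate).

```python
# (method, FPS, mAP %) from Table 4
MODELS = [
    ("Faster R-CNN", 7.0, 73.2),
    ("SSD (baseline)", 59.0, 74.3),
    ("SSD*", 52.6, 76.9),
    ("DSSD", 13.6, 78.6),
    ("DSOD", 17.4, 77.7),
    ("RSSD", 35.0, 78.5),
    ("FSSD", 65.8, 78.8),
    ("ESSD", 25.0, 79.4),
    ("FASSD", 30.0, 78.1),
    ("DFSSD", 11.6, 78.9),
    ("FDSSD", 12.6, 79.1),
    ("OBSSD", 41.7, 80.8),
]


def latency_ms(fps):
    """Average time per image implied by a throughput in frames per second."""
    return 1000.0 / fps


def pareto_front(models):
    """Names of methods that no other method matches or beats on both FPS and mAP."""
    front = []
    for name, fps, m_ap in models:
        dominated = any(
            f2 >= fps and m2 >= m_ap and (f2 > fps or m2 > m_ap)
            for other, f2, m2 in models
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    for name, fps, m_ap in MODELS:
        print(f"{name:15s} {latency_ms(fps):6.1f} ms  {m_ap:.1f} mAP")
    print("Pareto front:", pareto_front(MODELS))
```

For example, OBSSD's 41.7 FPS corresponds to roughly 24 ms per image.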

| Method | mAP /% | aero | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[4] | 73.2 | 76.5 | 79.0 | 70.9 | 65.5 | 52.1 | 83.1 | 84.7 | 86.4 | 52.0 | 81.9 |
| SSD (baseline)[10] | 74.3 | 75.5 | 80.2 | 72.3 | 66.3 | 47.6 | 83.0 | 84.2 | 86.1 | 54.7 | 78.3 |
| SSD*[10] | 76.9 | 76.9 | 86.6 | 74.5 | 66.4 | 50.4 | 85.0 | 84.7 | 87.3 | 61.0 | 78.7 |
| DSSD[11] | 78.6 | 81.9 | 84.9 | 80.5 | 68.4 | 53.9 | 85.6 | 86.2 | 88.9 | 61.1 | 83.5 |
| ESSD[31] | 79.4 | 82.6 | 86.1 | 79.8 | 72.2 | 54.7 | 86.8 | 86.9 | 88.2 | 62.8 | 85.2 |
| OBSSD | 80.8 | 82.7 | 89.7 | 81.5 | 71.8 | 53.7 | 90.7 | 90.0 | 90.6 | 64.8 | 86.2 |

| Method | mAP /% | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[4] | 73.2 | 65.7 | 84.8 | 84.6 | 77.5 | 76.7 | 38.8 | 73.6 | 73.9 | 83.0 | 72.6 |
| SSD (baseline)[10] | 74.3 | 73.9 | 84.5 | 85.3 | 82.6 | 76.2 | 48.6 | 73.9 | 76.0 | 83.4 | 74.0 |
| SSD*[10] | 76.9 | 78.2 | 86.1 | 89.4 | 86.0 | 79.8 | 48.5 | 76.1 | 80.3 | 86.9 | 76.1 |
| DSSD[11] | 78.6 | 78.7 | 86.7 | 88.7 | 86.7 | 79.7 | 51.7 | 78.0 | 80.9 | 87.2 | 79.4 |
| ESSD[31] | 79.4 | 78.2 | 87.5 | 88.0 | 87.0 | 80.0 | 56.1 | 80.2 | 80.4 | 88.7 | 78.1 |
| OBSSD | 80.8 | 77.3 | 87.9 | 90.0 | 88.1 | 82.0 | 54.2 | 80.5 | 83.1 | 90.2 | 80.0 |

Table 5. Comparison of the per-class average precision (%) for the 20 categories of the PASCAL VOC2007 test set
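On PASCAL VOC the reported mAP is the mean of the 20 per-class APs, so Table 5 can be checked row by row. Because each per-class AP and each mAP is rounded to one decimal, the mean of the printed values may differ from the printed mAP by up to 0.1 (at most 0.05 from rounding the per-class values and 0.05 from rounding the mAP). The sketch below performs that check; the `check_rows` helper is hypothetical. For example, the OBSSD row averages to 80.75 against a reported 80.8.

```python
CLASSES = ["aero", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
           "chair", "cow", "table", "dog", "horse", "mbike", "person", "plant",
           "sheep", "sofa", "train", "tv"]

# method: (reported mAP %, per-class AP % in CLASSES order), from Table 5
ROWS = {
    "Faster R-CNN": (73.2, [76.5, 79.0, 70.9, 65.5, 52.1, 83.1, 84.7, 86.4, 52.0, 81.9,
                            65.7, 84.8, 84.6, 77.5, 76.7, 38.8, 73.6, 73.9, 83.0, 72.6]),
    "SSD (baseline)": (74.3, [75.5, 80.2, 72.3, 66.3, 47.6, 83.0, 84.2, 86.1, 54.7, 78.3,
                              73.9, 84.5, 85.3, 82.6, 76.2, 48.6, 73.9, 76.0, 83.4, 74.0]),
    "SSD*": (76.9, [76.9, 86.6, 74.5, 66.4, 50.4, 85.0, 84.7, 87.3, 61.0, 78.7,
                    78.2, 86.1, 89.4, 86.0, 79.8, 48.5, 76.1, 80.3, 86.9, 76.1]),
    "DSSD": (78.6, [81.9, 84.9, 80.5, 68.4, 53.9, 85.6, 86.2, 88.9, 61.1, 83.5,
                    78.7, 86.7, 88.7, 86.7, 79.7, 51.7, 78.0, 80.9, 87.2, 79.4]),
    "ESSD": (79.4, [82.6, 86.1, 79.8, 72.2, 54.7, 86.8, 86.9, 88.2, 62.8, 85.2,
                    78.2, 87.5, 88.0, 87.0, 80.0, 56.1, 80.2, 80.4, 88.7, 78.1]),
    "OBSSD": (80.8, [82.7, 89.7, 81.5, 71.8, 53.7, 90.7, 90.0, 90.6, 64.8, 86.2,
                     77.3, 87.9, 90.0, 88.1, 82.0, 54.2, 80.5, 83.1, 90.2, 80.0]),
}

# 0.05 rounding error on the mean of per-class APs + 0.05 on the reported mAP
TOLERANCE = 0.1


def check_rows(rows, tol=TOLERANCE):
    """Return {method: (mean of per-class APs, reported mAP, consistent?)}."""
    out = {}
    for name, (reported, aps) in rows.items():
        if len(aps) != len(CLASSES):
            raise ValueError(f"{name}: expected {len(CLASSES)} APs, got {len(aps)}")
        mean = sum(aps) / len(aps)
        out[name] = (mean, reported, abs(mean - reported) <= tol + 1e-9)
    return out


if __name__ == "__main__":
    for name, (mean, reported, ok) in check_rows(ROWS).items():
        print(f"{name:15s} mean={mean:.3f} reported={reported:.1f} ok={ok}")
```

All six rows fall within the rounding tolerance, so the per-class values and the mAP column are mutually consistent.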

| Model | mAP@0.3 /% | mAP@0.5 /% | Model size /MB | FPS |
|---|---|---|---|---|
| SSD[10] | | 74.3 | 25.1 | 59 |
| SSD*[10] | 80.8 | 76.9 | 25.1 | 52.6 |
| PMSSD* | 82.9 | 78.2 | 25.6 | 48.2 |
| OBMSSD* | 84.2 | 80.1 | 25.8 | 44.3 |
| OBSSD* | 85.2 | 80.8 | 27.4 | 41.7 |

Table 6. Results of the ablation experiments
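Reading Table 6 as a sequence of configurations, each row can be compared with the one above it to see what each change adds in accuracy and costs in model size and speed. The sketch below computes these step-wise differences. It assumes the rows are cumulative (each model extends the previous one), which the naming suggests but the table does not state; the `step_deltas` helper is hypothetical. On the table values, mAP@0.5 rises by 1.3, 1.9 and 0.7 points across the three steps (3.9 in total over SSD*), while FPS falls from 52.6 to 41.7.

```python
# (model, mAP@0.3 %, mAP@0.5 %, model size MB, FPS) from Table 6.
# The SSD[10] row is omitted because its mAP@0.3 entry is missing.
ABLATION = [
    ("SSD*", 80.8, 76.9, 25.1, 52.6),
    ("PMSSD*", 82.9, 78.2, 25.6, 48.2),
    ("OBMSSD*", 84.2, 80.1, 25.8, 44.3),
    ("OBSSD*", 85.2, 80.8, 27.4, 41.7),
]
METRICS = ("mAP@0.3", "mAP@0.5", "size_MB", "FPS")


def step_deltas(rows):
    """Difference of every metric between consecutive rows, rounded to 0.1."""
    deltas = []
    for prev, cur in zip(rows, rows[1:]):
        diff = {m: round(c - p, 1) for m, p, c in zip(METRICS, prev[1:], cur[1:])}
        deltas.append((f"{prev[0]} -> {cur[0]}", diff))
    return deltas


if __name__ == "__main__":
    for label, diff in step_deltas(ABLATION):
        print(label, diff)
```

The largest single accuracy gain (+1.9 mAP@0.5) comes with only a 0.2 MB size increase, while the last step adds the most parameters (+1.6 MB) for +0.7 mAP@0.5.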