• Laser & Optoelectronics Progress
  • Vol. 60, Issue 2, 0215005 (2023)
Yunchuan Zhang, Lin Jiang*, and Li Lin
Author Affiliations
  • Faculty of Science, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    DOI: 10.3788/LOP220555
    Yunchuan Zhang, Lin Jiang, Li Lin. Target Detection Model Based on Once Bidirectional Feature Pyramid Network[J]. Laser & Optoelectronics Progress, 2023, 60(2): 0215005
    Fig. 1. SSD model framework
    Fig. 2. Proposed model framework
    Fig. 3. Once Bi-FP module
    Fig. 4. Top to bottom feature fusion module
    Fig. 5. Bottom-to-top feature fusion module
    Fig. 6. Prediction module
    Fig. 7. FSSD model framework
    Fig. 8. Comparison of average precision of object detection model in PASCAL VOC2007 test set
    Fig. 9. Comparison of detection results between OBSSD model and SSD* model. (a) cow; (b) car, boat; (c) bird, potted plants
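| Block | Layer | Operation | Specific operational detail | Output feature size |
|---|---|---|---|---|
| Block 1 | Conv1_1 | Conv, Act | k=3, p=1; ReLU | 300×300×64 |
| | Conv1_2 | Conv, Act | k=3, p=1; ReLU | 300×300×64 |
| Block 2 | Pooling1 | MaxPooling | k=2, s=2 | 150×150×64 |
| | Conv2_1 | Conv, Act | k=3, p=1; ReLU | 150×150×128 |
| | Conv2_2 | Conv, Act | k=3, p=1; ReLU | 150×150×128 |
| Block 3 | Pooling2 | MaxPooling | k=2, s=2 | 75×75×128 |
| | Conv3_1 | Conv, Act | k=3, p=1; ReLU | 75×75×256 |
| | Conv3_2 | Conv, Act | k=3, p=1; ReLU | 75×75×256 |
| | Conv3_3 | Conv, Act | k=3, p=1; ReLU | 75×75×256 |
| Block 4 | Pooling3 | MaxPooling | k=2, s=2 | 38×38×256 |
| | Conv4_1 | Conv, Act | k=3, p=1; ReLU | 38×38×512 |
| | Conv4_2 | Conv, Act | k=3, p=1; ReLU | 38×38×512 |
| | Conv4_3 | Conv, Act | k=3, p=1; ReLU | 38×38×512 |
| Block 5 | Pooling4 | MaxPooling | k=2, s=2 | 19×19×512 |
| | Conv5_1 | Conv, Act | k=3, p=1; ReLU | 19×19×512 |
| | Conv5_2 | Conv, Act | k=3, p=1; ReLU | 19×19×512 |
| | Conv5_3 | Conv, Act | k=3, p=1; ReLU | 19×19×512 |
| Block 6 | Pooling5 | MaxPooling | k=2, s=1, p=1 | 19×19×512 |
| | Conv6 | Conv, Act | k=3, p=6, d=6; ReLU | 19×19×1024 |
| | Conv7 | Conv, Act | k=1; ReLU | 19×19×1024 |
| Block 7 | Conv8_1 | Conv, Act | k=1; ReLU | 19×19×256 |
| | Conv8_2 | Conv, Act | k=3, s=2, p=1; ReLU | 10×10×512 |
| Block 8 | Conv9_1 | Conv, Act | k=1; ReLU | 10×10×128 |
| | Conv9_2 | Conv, Act | k=3, s=2, p=1; ReLU | 5×5×256 |
| Block 9 | Conv10_1 | Conv, Act | k=1; ReLU | 5×5×128 |
| | Conv10_2 | Conv, Act | k=3, p=1; ReLU | 3×3×256 |
| Block 10 | Conv11_1 | Conv, Act | k=1; ReLU | 3×3×128 |
| | Conv11_2 | Conv, Act | k=3, p=1; ReLU | 1×1×256 |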
    Table 1. SSD backbone network structure
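Several of the output sizes in Table 1 can be checked with the standard convolution/pooling size formula. The sketch below is a minimal illustration, assuming the usual floor rounding for convolutions and an optional round-up (ceil) mode for pooling; the helper name `out_size` is hypothetical, and the use of ceil rounding for Pooling3 is an assumption needed to reach 38 from 75, not something the table states. Here k is the kernel size, s the stride, p the padding, and d the dilation.

```python
import math

def out_size(n, k, s=1, p=0, d=1, ceil_mode=False):
    """Spatial output size of a conv or pooling layer (hypothetical helper).

    n: input side length, k: kernel, s: stride, p: padding, d: dilation.
    """
    span = n + 2 * p - d * (k - 1) - 1
    steps = math.ceil(span / s) if ceil_mode else span // s
    return steps + 1

# Rows taken from Table 1; input image is 300x300.
conv1_1 = out_size(300, k=3, p=1)
pool1 = out_size(conv1_1, k=2, s=2)
pool2 = out_size(pool1, k=2, s=2)
# Assumption: rounding up is needed here, since 75 is odd.
pool3 = out_size(pool2, k=2, s=2, ceil_mode=True)
pool4 = out_size(pool3, k=2, s=2)
conv6 = out_size(pool4, k=3, p=6, d=6)      # dilated convolution
conv8_2 = out_size(conv6, k=3, s=2, p=1)
conv9_2 = out_size(conv8_2, k=3, s=2, p=1)

print(conv1_1, pool1, pool2, pool3, pool4, conv6, conv8_2, conv9_2)
```

The dilated Conv6 keeps the 19×19 resolution because its padding (6) exactly offsets the enlarged receptive field of a 3×3 kernel with dilation 6.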
| Effective feature layer | Size | Number of prior frames per grid |
|---|---|---|
| Conv4_3 | 38×38 | 4 |
| Conv7 | 19×19 | 6 |
| Conv8_2 | 10×10 | 6 |
| Conv9_2 | 5×5 | 6 |
| Conv10_2 | 3×3 | 4 |
| Conv11_2 | 1×1 | 4 |
    Table 2. Number of prior frames of a single grid on effective feature layer
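The total number of prior frames implied by Table 2 is the sum, over the effective feature layers, of the number of grid cells times the priors per cell. A small sketch of that arithmetic, with the variable names chosen here purely for illustration:

```python
# (layer name, grid side length, prior frames per grid cell), from Table 2
feature_layers = [
    ("Conv4_3", 38, 4),
    ("Conv7", 19, 6),
    ("Conv8_2", 10, 6),
    ("Conv9_2", 5, 6),
    ("Conv10_2", 3, 4),
    ("Conv11_2", 1, 4),
]

# Priors contributed by each layer: cells in the grid times priors per cell.
per_layer = {name: side * side * k for name, side, k in feature_layers}
total_priors = sum(per_layer.values())

print(per_layer)
print(total_priors)  # 8732
```

About two thirds of the priors (5776 of 8732) come from the 38×38 Conv4_3 layer, which is the layer responsible for the smallest objects.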
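| Stage | Optimizer | Batch_size | Freeze_train | Initial_Lr | Lr_scheduler | Epoch |
|---|---|---|---|---|---|---|
| 1 | Adam | 32 | True | 0.0005 | ReduceLROnPlateau | 50 |
| | Adam | 16 | False | 0.0001 | ReduceLROnPlateau | 150 |
| 2 | SGD-M | 32 | True | 0.001 | MultiStepLR | 50 |
| | SGD-M | 16 | False | 0.001 | MultiStepLR | 50 |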
    Table 3. Training strategies
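The schedule in Table 3 can be written down as plain configuration data. The sketch below is only one possible encoding: the `Phase` class and its field names are hypothetical, and it assumes that the rows with an empty Stage cell belong to the stage listed above them. It does not reproduce the authors' training code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    """One row of Table 3 (hypothetical container, not the authors' code)."""
    stage: int
    optimizer: str
    batch_size: int
    freeze_train: bool
    initial_lr: float
    lr_scheduler: str
    epochs: int

# Assumption: rows with a blank Stage cell continue the preceding stage.
SCHEDULE = [
    Phase(1, "Adam", 32, True, 0.0005, "ReduceLROnPlateau", 50),
    Phase(1, "Adam", 16, False, 0.0001, "ReduceLROnPlateau", 150),
    Phase(2, "SGD-M", 32, True, 0.001, "MultiStepLR", 50),
    Phase(2, "SGD-M", 16, False, 0.001, "MultiStepLR", 50),
]

total_epochs = sum(p.epochs for p in SCHEDULE)
frozen_epochs = sum(p.epochs for p in SCHEDULE if p.freeze_train)

for p in SCHEDULE:
    mode = "frozen" if p.freeze_train else "unfrozen"
    print(f"stage {p.stage}: {p.optimizer}, lr={p.initial_lr}, "
          f"batch={p.batch_size}, {mode}, {p.epochs} epochs")
print(total_epochs, frozen_epochs)
```

Under this reading the schedule runs for 300 epochs in total, 100 of them with `Freeze_train` set to True.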
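| Method | Dataset | Backbone | Input size | FPS | mAP /% |
|---|---|---|---|---|---|
| Faster[4] | VOC07+12 | VGG16 | 600×1000 | 7 | 73.2 |
| SSD (baseline)[10] | VOC07+12 | VGG16 | 300×300 | 59 | 74.3 |
| SSD*[10] | VOC07+12 | VGG16 | 300×300 | 52.6 | 76.9 |
| DSSD[11] | VOC07+12 | ResNet-101 | 321×321 | 13.6 | 78.6 |
| DSOD[29] | VOC07+12 | DS/64-192-48-1 | 300×300 | 17.4 | 77.7 |
| RSSD[12] | VOC07+12 | VGG16 | 300×300 | 35 | 78.5 |
| FSSD[30] | VOC07+12 | VGG16 | 300×300 | 65.8 | 78.8 |
| ESSD[31] | VOC07+12 | VGG16 | 300×300 | 25 | 79.4 |
| FASSD[32] | VOC07+12 | ResNet-50 | 300×300 | 30 | 78.1 |
| DFSSD[33] | VOC07+12 | DenseNet-S-32-1 | 300×300 | 11.6 | 78.9 |
| FDSSD[17] | VOC07+12 | VGG16 | 300×300 | 12.6 | 79.1 |
| OBSSD | VOC07+12 | VGG16 | 300×300 | 41.7 | 80.8 |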
    Table 4. Comparison results of detection accuracy and detection speed on PASCAL VOC2007 test set
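| Method | mAP /% | aero | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster[4] | 73.2 | 76.5 | 79.0 | 70.9 | 65.5 | 52.1 | 83.1 | 84.7 | 86.4 | 52.0 | 81.9 |
| SSD (baseline)[10] | 74.3 | 75.5 | 80.2 | 72.3 | 66.3 | 47.6 | 83.0 | 84.2 | 86.1 | 54.7 | 78.3 |
| SSD*[10] | 76.9 | 76.9 | 86.6 | 74.5 | 66.4 | 50.4 | 85.0 | 84.7 | 87.3 | 61.0 | 78.7 |
| DSSD[11] | 78.6 | 81.9 | 84.9 | 80.5 | 68.4 | 53.9 | 85.6 | 86.2 | 88.9 | 61.1 | 83.5 |
| ESSD[31] | 79.4 | 82.6 | 86.1 | 79.8 | 72.2 | 54.7 | 86.8 | 86.9 | 88.2 | 62.8 | 85.2 |
| OBSSD | 80.8 | 82.7 | 89.7 | 81.5 | 71.8 | 53.7 | 90.7 | 90.0 | 90.6 | 64.8 | 86.2 |

| Method | mAP /% | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster[4] | 73.2 | 65.7 | 84.8 | 84.6 | 77.5 | 76.7 | 38.8 | 73.6 | 73.9 | 83.0 | 72.6 |
| SSD (baseline)[10] | 74.3 | 73.9 | 84.5 | 85.3 | 82.6 | 76.2 | 48.6 | 73.9 | 76.0 | 83.4 | 74.0 |
| SSD*[10] | 76.9 | 78.2 | 86.1 | 89.4 | 86.0 | 79.8 | 48.5 | 76.1 | 80.3 | 86.9 | 76.1 |
| DSSD[11] | 78.6 | 78.7 | 86.7 | 88.7 | 86.7 | 79.7 | 51.7 | 78.0 | 80.9 | 87.2 | 79.4 |
| ESSD[31] | 79.4 | 78.2 | 87.5 | 88.0 | 87.0 | 80.0 | 56.1 | 80.2 | 80.4 | 88.7 | 78.1 |
| OBSSD | 80.8 | 77.3 | 87.9 | 90.0 | 88.1 | 82.0 | 54.2 | 80.5 | 83.1 | 90.2 | 80.0 |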
    Table 5. Comparison of average precision results of 20 categories in PASCAL VOC2007 test set
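The mAP column in Table 5 is the mean of the 20 per-class average precision values. As a quick consistency check, the sketch below averages the OBSSD row; the dictionary name `obssd_ap` is chosen here only for illustration.

```python
# Per-class AP (%) for OBSSD, copied from Table 5.
obssd_ap = {
    "aero": 82.7, "bicycle": 89.7, "bird": 81.5, "boat": 71.8,
    "bottle": 53.7, "bus": 90.7, "car": 90.0, "cat": 90.6,
    "chair": 64.8, "cow": 86.2, "table": 77.3, "dog": 87.9,
    "horse": 90.0, "mbike": 88.1, "person": 82.0, "plant": 54.2,
    "sheep": 80.5, "sofa": 83.1, "train": 90.2, "tv": 80.0,
}

# mAP is the unweighted mean over all classes.
obssd_map = sum(obssd_ap.values()) / len(obssd_ap)

# Classes where the detector is weakest, useful when reading the table.
weakest = sorted(obssd_ap, key=obssd_ap.get)[:3]

print(f"mAP = {obssd_map:.2f}%")
print(weakest)
```

The mean comes to about 80.75%, which agrees with the 80.8% reported in the table after rounding; the three lowest-scoring classes are bottle, plant, and chair.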
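| Model | mAP@0.3 /% | mAP@0.5 /% | Size /MB | FPS |
|---|---|---|---|---|
| SSD[10] | - | 74.3 | 25.1 | 59 |
| SSD*[10] | 80.8 | 76.9 | 25.1 | 52.6 |
| PMSSD* | 82.9 | 78.2 | 25.6 | 48.2 |
| OBMSSD* | 84.2 | 80.1 | 25.8 | 44.3 |
| OBSSD* | 85.2 | 80.8 | 27.4 | 41.7 |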
    Table 6. Results of ablation experiment
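One way to read Table 6 is as a set of differences against the SSD* row. The sketch below computes those differences; treating SSD* as the reference point is an assumption made here for illustration, and the names `ABLATION` and `delta_vs` are hypothetical.

```python
# (mAP@0.3 %, mAP@0.5 %, size MB, FPS), copied from Table 6.
ABLATION = {
    "SSD*": (80.8, 76.9, 25.1, 52.6),
    "PMSSD*": (82.9, 78.2, 25.6, 48.2),
    "OBMSSD*": (84.2, 80.1, 25.8, 44.3),
    "OBSSD*": (85.2, 80.8, 27.4, 41.7),
}

def delta_vs(reference, model):
    """Element-wise difference of a model's row against the reference row."""
    ref, row = ABLATION[reference], ABLATION[model]
    return tuple(round(b - a, 1) for a, b in zip(ref, row))

# Assumption: SSD* is used as the baseline for the comparison.
deltas = {name: delta_vs("SSD*", name) for name in ABLATION if name != "SSD*"}

for name, (d03, d05, dsize, dfps) in deltas.items():
    print(f"{name}: mAP@0.3 {d03:+.1f}, mAP@0.5 {d05:+.1f}, "
          f"size {dsize:+.1f} MB, FPS {dfps:+.1f}")
```

For OBSSD* this gives a gain of 3.9 points in mAP@0.5 over SSD*, at a cost of 10.9 FPS and 2.3 MB of additional model size.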