[1] Aggarwal J K, Ryoo M S. Human activity analysis: a review[J]. ACM Computing Surveys, 2011, 43(3): 16.
[2] Datta R, Joshi D, Li J, et al. Image retrieval: ideas, influences, and trends of the new age[J]. ACM Computing Surveys, 2008, 40(2): 5.
[3] Krüger V, Kragic D, Ude A, et al. The meaning of action: a review on action recognition and mapping[J]. Advanced Robotics, 2007, 21(13): 1473-1501.
[4] Palmese M, Trucco A. From 3D sonar images to augmented reality models for objects buried on the seafloor[J]. IEEE Transactions on Instrumentation and Measurement, 2008, 57(4): 820-828.
[7] Lu Q H, Wu Z W, Fan Y B, et al. An improved mobile vehicle detection method based on Gaussian mixture model[J]. Journal of Optoelectronics·Laser, 2013, 24(4): 751-757.
[8] Felzenszwalb P F, Girshick R B, McAllester D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645.
[9] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]. Advances in Neural Information Processing Systems, 2012: 1097-1105.
[10] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[11] Girshick R. Fast R-CNN[C]∥Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[12] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[13] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]∥Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[14] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[15] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]. European Conference on Computer Vision, 2016: 21-37.
[16] Yang M, Ruan Y D, Chen L K, et al. New video recognition algorithms for inland river ships based on faster R-CNN[J]. Journal of Beijing University of Posts and Telecommunications, 2017, 40(S1): 130-134.
[17] Cao S Y, Liu Y H, Li X Z. Vehicle detection method based on fast R-CNN[J]. Journal of Image and Graphics, 2017, 22(5): 671-677.
[18] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[19] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]. European Conference on Computer Vision, 2014: 818-833.
[20] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv: 1409.1556, 2014.
[21] Ouyang W, Wang X, Zhang C, et al. Factors in finetuning deep model for object detection with long-tail distribution[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 864-873.
[22] Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 761-769.
[23] Dong Z, Jia Y. Vehicle type classification using distributions of structural and appearance-based features[C]∥Proceedings of the IEEE International Conference on Image Processing, 2013: 4321-4324.