[1] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Detect to track and track to detect[C]//Proceedings of the IEEE International Conference on Computer Vision, October 22-29, 2017, Venice, Italy. New York: IEEE, 2017: 3038-3046.
[2] ZHANG Y, WANG C, WANG X, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. International journal of computer vision, 2021, 129: 3069-3087.
[3] PENG J, WANG C, WAN F, et al. Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]//16th European Conference on Computer Vision, August 23-28, 2020, Glasgow, UK. Heidelberg: Springer, 2020: 145-161.
[4] ZHOU X, KOLTUN V, KRÄHENBÜHL P. Tracking objects as points[C]//16th European Conference on Computer Vision, August 23-28, 2020, Glasgow, UK. Heidelberg: Springer, 2020: 474-490.
[5] ZHANG Y, SUN P, JIANG Y, et al. ByteTrack: multi-object tracking by associating every detection box[C]//17th European Conference on Computer Vision, October 24-28, 2022, Tel Aviv, Israel. Heidelberg: Springer, 2022: 1-21.
[6] CHEN X, PENG H, WANG D, et al. SeqTrack: sequence to sequence learning for visual object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-22, 2023, Vancouver, Canada. New York: IEEE, 2023: 14572-14581.
[7] LIU M, ZHU M. Mobile video object detection with temporally-aware feature maps[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 18-22, 2018, Salt Lake City, UT, USA. New York: IEEE, 2018: 5686-5695.
[8] BERTASIUS G, TORRESANI L, SHI J. Object detection in video with spatiotemporal sampling networks[C]//15th European Conference on Computer Vision, September 8-14, 2018, Munich, Germany. Heidelberg: Springer, 2018: 331-346.
[9] GUO C, ZHENG N, TAN Y, et al. Progressive sparse local attention for video object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, October 27-November 2, 2019, Seoul, Korea. New York: IEEE, 2019: 3909-3918.
[10] TANG P, WANG C, WANG X, et al. Object detection in videos by high quality object linking[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 42(5): 1272-1278.
[11] XU Y, BAN Y, DELORME G, et al. TransCenter: transformers with dense representations for multiple-object tracking[J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 45(6): 7820-7835.
[12] YU F, WANG D, SHELHAMER E, et al. Deep layer aggregation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 18-22, 2018, Salt Lake City, UT, USA. New York: IEEE, 2018: 2403-2412.
[13] LEAL-TAIXÉ L, MILAN A, REID I, et al. MOTChallenge 2015: towards a benchmark for multi-target tracking[EB/OL]. (2015-04-01) [2023-12-23]. https://arxiv.org/abs/1504.01942.
[14] MILAN A, LEAL-TAIXÉ L, REID I, et al. MOT16: a benchmark for multi-object tracking[EB/OL]. (2016-03-01) [2023-12-23]. https://arxiv.org/abs/1603.00831.
[15] SHAO S, ZHAO Z, LI B, et al. CrowdHuman: a benchmark for detecting human in a crowd[EB/OL]. (2018-05-01) [2023-12-23]. https://arxiv.org/abs/1805.00123.
[16] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 16-21, 2012, Providence, RI, USA. New York: IEEE, 2012: 3354-3361.
[17] CAESAR H, BANKITI V, LANG A, et al. nuScenes: a multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 14-19, 2020, Seattle, WA, USA. New York: IEEE, 2020: 11621-11631.
[18] DOLLÁR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: a benchmark[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 20-25, 2009, Miami, FL, USA. New York: IEEE, 2009: 304-311.
[19] ZHANG S, BENENSON R, SCHIELE B. CityPersons: a diverse dataset for pedestrian detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2017: 3213-3221.
[20] XIAO T, LI S, WANG B, et al. Joint detection and identification feature learning for person search[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2017: 3415-3424.
[21] ZHENG L, ZHANG H, SUN S, et al. Person re-identification in the wild[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2017: 1367-1376.
[22] ESS A, LEIBE B, SCHINDLER K, et al. A mobile vision system for robust multi-person tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2008, Anchorage, AK, USA. New York: IEEE, 2008: 1-8.