[1] An S, Boussaid F, Bennamoun M. How can deep rectifier networks achieve linear separability and preserve distances?[C], 514-523(2015).
[2] Arora S, Cohen N, Hazan E. On the optimization of deep networks: implicit acceleration by overparameterization[C](2018).
[3] Guo S, Alvarez J M, Salzmann M. ExpandNets: linear over-parameterization to train compact convolutional networks[C], 33, 1298-1310(2020).
[4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 60, 84-90(2017).
[5] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C], 770-778(2016).
[6] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL]. https://arxiv.org/abs/1409.1556
[7] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. https://arxiv.org/abs/2004.10934
[8] Feng C J, Zhong Y J, Gao Y et al. TOOD: task-aligned one-stage object detection[C], 3490-3499(2021).
[9] Duan K W, Bai S, Xie L X et al. CenterNet: keypoint triplets for object detection[C], 6568-6577(2019).
[10] Lin T Y, Dollár P, Girshick R et al. Feature pyramid networks for object detection[C], 936-944(2017).
[11] Liu W, Anguelov D, Erhan D et al. SSD: single shot MultiBox detector[M]. Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016, 9905, 21-37(2016).
[12] Redmon J, Divvala S, Girshick R et al. You only look once: unified, real-time object detection[C], 779-788(2016).
[13] Wang X L, Zhang R F, Kong T et al. SOLOv2: dynamic and fast instance segmentation[C], 33, 17721-17732(2020).
[14] Yu C Q, Xiao B, Gao C X et al. Lite-HRNet: a lightweight high-resolution network[C], 10435-10445(2021).
[15] Sun K, Xiao B, Liu D et al. Deep high-resolution representation learning for human pose estimation[C], 5686-5696(2019).
[16] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation[M]. Navab N, Hornegger J, Wells W M, et al. Medical image computing and computer-assisted intervention-MICCAI 2015, 9351, 234-241(2015).
[17] Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C], 6602-6611(2017).
[18] Ranftl R, Bochkovskiy A, Koltun V. Vision transformers for dense prediction[C], 12159-12168(2021).
[19] Vaswani A, Shazeer N, Parmar N et al. Attention is all you need[C], 5998-6008(2017).
[20] Carion N, Massa F, Synnaeve G et al. End-to-end object detection with transformers[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision-ECCV 2020, 12346, 213-229(2020).
[21] Liu Z, Lin Y T, Cao Y et al. Swin transformer: hierarchical vision transformer using shifted windows[C], 9992-10002(2021).
[22] LeCun Y, Bottou L, Bengio Y et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 86, 2278-2324(1998).
[23] Howard A G, Zhu M, Chen B et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. https://arxiv.org/abs/1704.04861
[24] Goyal A, Bochkovskiy A, Deng J et al. Non-deep networks[EB/OL]. https://arxiv.org/abs/2110.07641
[25] Han K, Wang Y H, Tian Q et al. GhostNet: more features from cheap operations[C], 1577-1586(2020).
[26] Tan M X, Le Q V. EfficientNet: rethinking model scaling for convolutional neural networks[C], 6105-6114(2019).
[27] Molchanov P, Tyree S, Karras T et al. Pruning convolutional neural networks for resource efficient inference[C](2017).
[28] Zhang L F, Song J B, Gao A N et al. Be your own teacher: improve the performance of convolutional neural networks via self distillation[C], 3712-3721(2019).
[29] Chen G, Choi W, Yu X et al. Learning efficient object detection models with knowledge distillation[C], 30, 743-752(2017).
[30] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[EB/OL]. https://arxiv.org/abs/1503.02531
[31] Scardapane S, Scarpiniti M, Baccarelli E et al. Why should we add early exits to neural networks?[J]. Cognitive Computation, 12, 954-966(2020).
[32] Kauffmann L, Ramanoël S, Peyrin C. The neural bases of spatial frequency processing during scene perception[J]. Frontiers in Integrative Neuroscience, 8, 37(2014).
[33] Kaya Y, Hong S, Dumitras T. Shallow-deep networks: understanding and mitigating network overthinking[C], 3301-3310(2019).
[34] Huang G, Chen D, Li T et al. Multi-scale dense networks for resource efficient image classification[C](2018).
[35] Zhou W, Xu C, Ge T et al. Bert loses patience: fast and robust inference with early exit[C], 33, 18330-18341(2020).
[36] Teerapittayanon S, McDanel B, Kung H T. BranchyNet: fast inference via early exiting from deep neural networks[C], 2464-2469(2016).
[37] Wołczyk M, Wójcik B, Bałazy K et al. Zero time waste: recycling predictions in early exit neural networks[C], 34, 2516-2528(2021).
[38] Li H, Zhang H, Qi X J et al. Improved techniques for training adaptive deep networks[C], 1891-1900(2019).
[39] Passalis N, Raitoharju J, Tefas A et al. Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits[J]. Pattern Recognition, 105, 107346(2020).
[40] Kouris A, Venieris S I, Laskaridis S et al. Multi-exit semantic segmentation networks[EB/OL]. https://arxiv.org/abs/2106.03527v1
[41] Xin J, Tang R, Lee J et al. DeeBERT: dynamic early exiting for accelerating BERT inference[C], 2246-2251(2020).
[42] Schwartz R, Stanovsky G, Swayamdipta S et al. The right tool for the job: matching model and instance complexities[C], 6640-6651(2020).
[43] Chen X, Dai H, Li Y et al. Learning to stop while learning to predict[C], 1520-1530(2020).
[44] Liu Y J, Meng F D, Zhou J et al. Faster depth-adaptive transformers[C], 35, 13424-13432(2021).
[45] Jie Z Q, Sun P, Li X et al. Anytime recognition with routing convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1875-1886(2021).
[46] Huang C, Lucey S, Ramanan D. Learning policies for adaptive tracking with deep feature cascades[C], 105-114(2017).
[47] Dai X, Kong X N, Guo T. EPNet: learning to exit with flexible multi-branch network[C], 235-244(2020).
[48] Duggal R, Freitas S, Dhamnani S et al. ELF: an early-exiting framework for long-tailed classification[EB/OL]. https://arxiv.org/abs/2006.11979
[49] Wang X, Li Y. Harmonized dense knowledge distillation training for multi-exit architectures[C], 35, 10218-10226(2021).
[50] Phuong M, Lampert C. Distillation-based training for multi-exit architectures[C], 1355-1364(2019).
[51] Liu B L, Rao Y M, Lu J W et al. MetaDistiller: network self-boosting via meta-learned top-down distillation[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision-ECCV 2020, 12359, 694-709(2020).
[52] Wang X L, Li Y M. Gradient deconfliction-based training for multi-exit architectures[C], 1866-1870(2020).
[53] Wu Z X, Nagarajan T, Kumar A et al. BlockDrop: dynamic inference paths in residual networks[C], 8817-8826(2018).
[54] Wang Y, Shen J H, Hu T K et al. Dual dynamic inference: enabling more efficient, adaptive, and controllable deep inference[J]. IEEE Journal of Selected Topics in Signal Processing, 14, 623-633(2020).
[55] Wang X, Yu F, Dou Z Y et al. SkipNet: learning dynamic routing in convolutional networks[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018, 11217, 420-436(2018).
[56] Veit A, Belongie S. Convolutional networks with adaptive inference graphs[J]. International Journal of Computer Vision, 128, 730-741(2020).
[57] Fan A, Grave E, Joulin A. Reducing transformer depth on demand with structured dropout[C](2020).
[58] Huang G, Sun Y, Liu Z et al. Deep networks with stochastic depth[M]. Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016, 9908, 646-661(2016).
[59] Di J L, Tang J, Wu J et al. Research progress in the applications of convolutional neural networks in optical information processing[J]. Laser & Optoelectronics Progress, 58, 1600001(2021).
[60] Luo W, Li Y, Urtasun R et al. Understanding the effective receptive field in deep convolutional neural networks[EB/OL]. https://arxiv.org/abs/1701.04128
[61] Xiao W X, Li H F, Zhang Y F et al. Medical image fusion based on multi-scale feature learning and edge enhancement[J]. Laser & Optoelectronics Progress, 59, 0617029(2022).
[62] Jeon G W, Choi J H, Kim J H et al. LarvaNet: hierarchical super-resolution via multi-exit architecture[M]. Bartoli A, Fusiello A. Computer vision-ECCV 2020 workshops, 12537, 73-86(2020).
[63] Ma T H, Tan H, Li T Q et al. Road extraction from GF-1 remote sensing images based on dilated convolution residual network with multi-scale feature fusion[J]. Laser & Optoelectronics Progress, 58, 0228001(2021).
[64] Peng C, Zhang X Y, Yu G et al. Large kernel matters: improve semantic segmentation by global convolutional network[C], 1743-1751(2017).
[65] Yu H H, Winkler S. Image complexity and spatial information[C], 12-17(2013).
[66] Perkiö J, Hyvärinen A. Modelling image complexity by independent component analysis, with application to content-based image retrieval[M]. Alippi C, Polycarpou M, Panayiotou C, et al. Artificial neural networks-ICANN 2009, 5769, 704-714(2009).
[67] Han Y, Huang G, Song S et al. Dynamic neural networks: a survey[EB/OL]. https://arxiv.org/abs/2102.04906
[68] Yu T, Kumar S, Gupta A et al. Gradient surgery for multi-task learning[C], 33, 5824-5836(2020).
[69] Li H, Xu Z, Taylor G et al. Visualizing the loss landscape of neural nets[C], 6391-6401(2018).
[70] Nguyen Q, Hein M. The loss surface and expressivity of deep convolutional neural networks[C](2018).
[71] Liu B, Liu X, Jin X et al. Conflict-averse gradient descent for multi-task learning[C], 18878-18890(2021).
[72] Sener O, Koltun V. Multi-task learning as multi-objective optimization[C], 525-536(2018).
[73] Li Y, Ji R, Lin S et al. Interpretable neural network decoupling[EB/OL]. https://arxiv.org/abs/1906.01166v2