Abstract
With the increasing prevalence of public self-service terminals, traditional touch-based human-computer interaction (HCI) methods face significant challenges in hygiene, efficiency, and accessibility. This study addresses the need for contactless interaction in public settings and proposes a static gesture recognition system based on the YOLOv8 object detection framework. The system supports natural gesture-based control for commands such as "Confirm," "Cancel," "Scroll Up," and "Scroll Down." The model was trained on a hybrid dataset that combines the public HaGRID benchmark with a custom-built set of gesture images. Leveraging YOLOv8's efficient architecture, the system implements an end-to-end pipeline from gesture recognition to user feedback, optimized for real-time performance on macOS devices. Experimental results show an accuracy of 92.6%, an inference speed of 38.7 FPS, and a mean average precision of 84.3% (mAP@0.5), outperforming the conventional baseline models against which the system was compared. Finally, limitations in generalization, dynamic gesture recognition, and customization are discussed, together with future directions, including dynamic gesture modeling and multimodal integration, to improve the system's adaptability and intelligence.
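The final stage of such a recognition-to-feedback pipeline turns the detector's per-frame output into a single interface command. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the gesture class labels (`ok`, `stop`, `like`, `dislike`), their mapping to commands, the `Detection` type, and the 0.5 confidence threshold are all hypothetical placeholders, since the abstract names only the commands. It keeps detections whose label is mapped and whose score clears the threshold, then issues the command of the most confident one.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical label-to-command mapping: the abstract names the commands,
# not the gesture classes, so these labels are illustrative placeholders.
GESTURE_TO_COMMAND = {
    "ok": "Confirm",
    "stop": "Cancel",
    "like": "Scroll Up",
    "dislike": "Scroll Down",
}


@dataclass(frozen=True)
class Detection:
    """One detector output for a frame (hypothetical structure)."""
    label: str          # predicted gesture class name
    confidence: float   # detection score in [0, 1]


def detections_to_command(detections: Sequence[Detection],
                          min_conf: float = 0.5) -> Optional[str]:
    """Return the command for the most confident mapped detection, or None.

    Detections with unmapped labels or scores below `min_conf` are ignored,
    so a frame with no usable gesture triggers no command.
    """
    candidates = [
        d for d in detections
        if d.label in GESTURE_TO_COMMAND and d.confidence >= min_conf
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.confidence)
    return GESTURE_TO_COMMAND[best.label]


if __name__ == "__main__":
    frame = [Detection("like", 0.81), Detection("ok", 0.64)]
    print(detections_to_command(frame))
```

Here the "like" detection wins because it has the higher score, so the frame maps to "Scroll Up"; a frame whose only detection falls below the threshold yields `None`. Choosing the single most confident detection is one simple way to avoid issuing conflicting commands when several hands are visible; a real deployment might also require a gesture to persist over several frames before acting.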