Article Open Access http://dx.doi.org/10.26855/acc.2025.10.001
Research on Elastic Scaling Strategies for Large Language Models in Cloud Platform Service Invocation
Linghong Cheng
Security Org, Microsoft, Redmond, WA 98052, USA.
*Corresponding author: Linghong Cheng
Published: September 24, 2025
Abstract
This study investigates the resource scheduling challenges arising from the deployment of large language models (LLMs) in cloud-based service invocation. It analyzes the computational structures required for model deployment and the fluctuating workload patterns during inference. Key mechanisms such as containerized deployment, service warm-up, load balancing, and runtime state monitoring are examined to map how mainstream platforms manage elastic resource allocation. The applicability of elastic scaling strategies is further summarized in the context of multi-model concurrency and edge computing scenarios. The results indicate that building coordinated strategies based on state awareness and invocation prediction can significantly enhance service stability and resource utilization.
Keywords
Large Language Models; Cloud Platform; Elastic Scaling; Resource Scheduling; Service Invocation
How to cite this paper: Linghong Cheng. (2025) Research on Elastic Scaling Strategies for Large Language Models in Cloud Platform Service Invocation. Advances in Computer and Communication, 6(4), 162-167.
DOI: http://dx.doi.org/10.26855/acc.2025.10.001