Advances in Computer and Communication

ISSN Online: 2767-2875
Frequency: quarterly CODEN: ACCDC3
Email: acc@hillpublisher.com
Article Open Access http://dx.doi.org/10.26855/acc.2025.10.001

Research on Elastic Scaling Strategies for Large Language Models in Cloud Platform Service Invocation

Linghong Cheng

Security Org, Microsoft, Redmond, WA 98052, USA.

*Corresponding author: Linghong Cheng

Published: September 24, 2025

Abstract

This study investigates the resource scheduling challenges arising from the deployment of large language models (LLMs) in cloud-based service invocation. It analyzes the computational structures required for model deployment and the fluctuating workload patterns during inference. Key mechanisms such as containerized deployment, service warm-up, load balancing, and runtime state monitoring are examined to map how mainstream platforms manage elastic resource allocation. The applicability of elastic scaling strategies is further summarized in the context of multi-model concurrency and edge computing scenarios. The results indicate that building coordinated strategies based on state awareness and invocation prediction can significantly enhance service stability and resource utilization.
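As an illustration of what a coordinated strategy based on state awareness and invocation prediction might look like, the following minimal Python sketch combines a queue-depth reading with a smoothed forecast of the request rate to choose a replica count. It is not the paper's method: the names `RuntimeState`, `forecast_rps`, and `target_replicas`, the exponential smoothing forecast, and all default parameters are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class RuntimeState:
    """Hypothetical runtime snapshot of one model service (illustrative only)."""
    queue_depth: int        # requests currently waiting for a replica
    per_replica_rps: float  # assumed sustainable requests/second per replica


def forecast_rps(history: list[float], alpha: float = 0.5) -> float:
    """Exponentially smoothed estimate of the next interval's request rate."""
    if not history:
        return 0.0
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate


def target_replicas(
    state: RuntimeState,
    history: list[float],
    min_replicas: int = 1,
    max_replicas: int = 16,
    drain_seconds: float = 30.0,
) -> int:
    """Pick a replica count from predicted load plus the current backlog."""
    if state.per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    predicted = forecast_rps(history)
    # Extra rate needed to drain the waiting queue within drain_seconds.
    backlog_rps = state.queue_depth / drain_seconds
    needed = math.ceil((predicted + backlog_rps) / state.per_replica_rps)
    # Clamp to the configured bounds so the service never scales to zero
    # or beyond the assumed capacity limit.
    return max(min_replicas, min(max_replicas, needed))
```

For example, `target_replicas(RuntimeState(queue_depth=60, per_replica_rps=5.0), [10, 20, 30])` forecasts 22.5 requests per second, adds 2 requests per second to drain the backlog, and returns 5 replicas. A production policy would also need to account for warm-up latency and scale-down cooldowns, which this sketch omits.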

Keywords

Large Language Models; Cloud Platform; Elastic Scaling; Resource Scheduling; Service Invocation

How to cite this paper

Linghong Cheng. (2025) Research on Elastic Scaling Strategies for Large Language Models in Cloud Platform Service Invocation. Advances in Computer and Communication, 6(4), 162-167.

DOI: http://dx.doi.org/10.26855/acc.2025.10.001