Habibi, Sama (2025) Efficient Resource Orchestration For Distributed Large Language Model Inference At The Edge. [Thesis]
Abstract
The increasing deployment of Large Language Models (LLMs) in real-time and resource-constrained environments has exposed critical limitations of centralized cloud inference, including high latency, cost, and scalability concerns. This thesis addresses these challenges by proposing two integrated solutions for efficient and fair distributed LLM inference at the edge: the Fair Cost-Efficient Incentive Mechanism (FCIM) and the Adaptive Dynamic Scheduling Algorithm (ADSA). FCIM introduces an auction-based layer allocation framework that ensures truthful participation and fairness among heterogeneous devices, dynamically balancing task latency, reward cost, memory feasibility, and system-wide alignment. ADSA complements FCIM by scheduling layer execution in a deadline-aware and preemption-conscious manner, reducing queuing delay while adapting to fluctuating device availability and network conditions. Together, FCIM and ADSA offer a scalable, incentive-compatible, and resource-efficient approach for edge-based inference. The mechanisms are extensively evaluated through simulations across diverse model architectures, including GPT-Neo, GPT-3, and BLOOM, under varying GPU configurations and bidding scenarios. Results demonstrate that FCIM reduces communication overhead by up to 54.7% and task processing time by 36.9%, while ADSA decreases queuing delays by 39% compared to conventional schedulers. Fairness is quantitatively validated using Jain’s index over both reward and layer distributions, with FCIM consistently outperforming baseline methods. This thesis establishes a principled foundation for deploying LLMs in federated, latency-sensitive environments and offers insights into future extensions involving reinforcement learning, multi-tenant inference, and real-world edge deployments.
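
For readers unfamiliar with the fairness metric named in the abstract, the following is a minimal sketch of Jain's index in its standard form, J = (Σx)² / (n · Σx²). The function name `jain_index` and the reward vectors are hypothetical illustrations, not values or code from the thesis.

```python
def jain_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when every participant receives the same amount and
    1/n when a single participant receives everything.
    """
    values = [float(x) for x in allocations]
    if not values:
        raise ValueError("need at least one allocation")
    total = sum(values)
    squares = sum(x * x for x in values)
    if squares == 0.0:
        # Assumption: an all-zero allocation is treated as perfectly equal.
        return 1.0
    return total * total / (len(values) * squares)


# Hypothetical per-device rewards, for illustration only.
equal_rewards = [5.0, 5.0, 5.0, 5.0]
skewed_rewards = [20.0, 0.0, 0.0, 0.0]

print(jain_index(equal_rewards))   # 1.0
print(jain_index(skewed_rewards))  # 0.25
```

The same function could be applied to a per-device layer count instead of rewards, which mirrors the two distributions the abstract says were evaluated.
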
| Item Type: | Thesis |
|---|---|
| Uncontrolled Keywords: | Scheduling, Distributed AI, Edge Computing, Fair Incentive Mechanism, Large Language Models, Resource Allocation. -- Uyarlanabilir Zamanlama, Dağıtık Yapay Zekâ, Uç Bilişim, Adil Teşvik Mekanizması, Büyük Dil Modelleri, Kaynak Tahsisi. |
| Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics |
| Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Electronics; Faculty of Engineering and Natural Sciences |
| Depositing User: | Dila Günay |
| Date Deposited: | 29 Dec 2025 10:59 |
| Last Modified: | 29 Dec 2025 10:59 |
| URI: | https://research.sabanciuniv.edu/id/eprint/53543 |


