Efficient Resource Orchestration For Distributed Large Language Model Inference At The Edge


Habibi, Sama (2025) Efficient Resource Orchestration For Distributed Large Language Model Inference At The Edge. [Thesis]

PDF: 10739800.pdf (1 MB)

Abstract

The increasing deployment of Large Language Models (LLMs) in real-time and resource-constrained environments has exposed critical limitations of centralized cloud inference, including high latency, cost, and scalability concerns. This thesis addresses these challenges by proposing two integrated solutions for efficient and fair distributed LLM inference at the edge: the Fair Cost-Efficient Incentive Mechanism (FCIM) and the Adaptive Dynamic Scheduling Algorithm (ADSA). FCIM introduces an auction-based layer allocation framework that ensures truthful participation and fairness among heterogeneous devices, dynamically balancing task latency, reward cost, memory feasibility, and system-wide alignment. ADSA complements FCIM by scheduling layer execution in a deadline-aware and preemption-conscious manner, reducing queuing delay while adapting to fluctuating device availability and network conditions.

Together, FCIM and ADSA offer a scalable, incentive-compatible, and resource-efficient approach for edge-based inference. The mechanisms are extensively evaluated through simulations across diverse model architectures, including GPT-Neo, GPT-3, and BLOOM, under varying GPU configurations and bidding scenarios. Results demonstrate that FCIM reduces communication overhead by up to 54.7% and task processing time by 36.9%, while ADSA decreases queuing delays by 39% compared to conventional schedulers. Fairness is quantitatively validated using Jain's index over both reward and layer distributions, with FCIM consistently outperforming baseline methods. This thesis establishes a principled foundation for deploying LLMs in federated, latency-sensitive environments and offers insights into future extensions involving reinforcement learning, multi-tenant inference, and real-world edge deployments.
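To make the idea of auction-based layer allocation concrete, the sketch below assigns model layers to the cheapest bidding devices that still have memory headroom. This is not the FCIM mechanism itself: the truthfulness guarantees, reward payments, and latency terms described in the thesis are omitted, and all device names, bids, and the allocate_layers helper are hypothetical.

```python
# Hedged sketch of auction-style layer allocation in the spirit of FCIM.
# Layers go to the lowest-cost bidders subject to each device's memory capacity.

def allocate_layers(num_layers, bids):
    """bids: {device: (cost_per_layer, memory_capacity_in_layers)}."""
    allocation = {device: 0 for device in bids}
    remaining = num_layers
    # Visit devices in order of increasing bid price.
    for device, (cost, capacity) in sorted(bids.items(), key=lambda kv: kv[1][0]):
        take = min(capacity, remaining)
        allocation[device] = take
        remaining -= take
        if remaining == 0:
            break
    return allocation, remaining  # remaining > 0 means infeasible with current bids

bids = {"edge-A": (0.8, 10), "edge-B": (0.5, 12), "edge-C": (1.2, 20)}  # assumed bids
allocation, unassigned = allocate_layers(num_layers=24, bids=bids)
print(allocation)   # {'edge-A': 10, 'edge-B': 12, 'edge-C': 2}
print(unassigned)   # 0
```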
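The deadline-aware, preemption-conscious scheduling attributed to ADSA can be illustrated with a minimal earliest-deadline-first queue that defers tasks whose host device is currently unavailable. This is only a sketch under those assumptions, not the thesis' algorithm; the LayerTask type, device identifiers, and deadlines are invented for illustration.

```python
# Hedged sketch of deadline-aware layer scheduling in the spirit of ADSA.
# Queued layer-execution tasks are ordered by absolute deadline; tasks whose
# device is offline are deferred rather than executed.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class LayerTask:
    deadline: float                      # absolute deadline in seconds
    layer_id: int = field(compare=False)
    device_id: str = field(compare=False)

def schedule_edf(tasks, available_devices):
    """Return (ready, deferred): deadline-ordered tasks split by device availability."""
    heap = list(tasks)
    heapq.heapify(heap)
    ready, deferred = [], []
    while heap:
        task = heapq.heappop(heap)
        (ready if task.device_id in available_devices else deferred).append(task)
    return ready, deferred

tasks = [
    LayerTask(deadline=0.120, layer_id=3, device_id="edge-A"),
    LayerTask(deadline=0.045, layer_id=1, device_id="edge-B"),
    LayerTask(deadline=0.080, layer_id=2, device_id="edge-C"),
]
ready, deferred = schedule_edf(tasks, available_devices={"edge-A", "edge-B"})
print([t.layer_id for t in ready])     # [1, 3] -- earliest deadlines first
print([t.layer_id for t in deferred])  # [2]    -- device currently unavailable
```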
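The fairness metric named in the abstract, Jain's index, is a standard formula: for an allocation vector x of length n, J(x) = (Σ x_i)² / (n · Σ x_i²), which equals 1 for a perfectly even split. The snippet below computes it; the example layer distribution is assumed for illustration and does not reproduce the thesis' experimental data.

```python
# Jain's fairness index over an allocation vector (per-device rewards or layer counts).

def jains_index(allocations):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2), in (0, 1]."""
    n = len(allocations)
    total = sum(allocations)
    squares = sum(x * x for x in allocations)
    if n == 0 or squares == 0:
        return 0.0
    return (total * total) / (n * squares)

# Example: fairness of a hypothetical layer distribution across 4 edge devices.
layers_per_device = [8, 8, 7, 9]   # assumed values for illustration only
print(f"Jain's index: {jains_index(layers_per_device):.3f}")  # ~0.992, near-equal split
```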
Item Type: Thesis
Uncontrolled Keywords: Scheduling, Distributed AI, Edge Computing, Fair Incentive Mechanism, Large Language Models, Resource Allocation.
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800-8360 Electronics
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Electronics
Faculty of Engineering and Natural Sciences
Depositing User: Dila Günay
Date Deposited: 29 Dec 2025 10:59
Last Modified: 29 Dec 2025 10:59
URI: https://research.sabanciuniv.edu/id/eprint/53543
