Habibi, Sama and Erçetin, Özgür (2025) Edge-LLM inference with cost-aware layer allocation and adaptive scheduling. IEEE Access, 13. pp. 131614-131637. ISSN 2169-3536
Full text not available from this repository.
Official URL: https://dx.doi.org/10.1109/ACCESS.2025.3592308
Abstract
This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: (1) cost-efficient and fair task allocation, and (2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
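To make the scheduling idea in the abstract concrete, the sketch below shows a generic earliest-deadline-first queue with preemption, i.e., newly arriving tasks are reordered ahead of queued work when their deadlines are tighter, and a running task can be preempted by a more urgent arrival. This is only an illustrative approximation of "preemption-aware, deadline-driven scheduling" as described above; the full text of the paper's ADSA is not available from this record, and all class and task names here are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    # Sort key: earliest deadline first; ties broken by arrival time.
    deadline: float
    arrival: float
    work: float = field(compare=False)   # remaining processing time (not compared)
    name: str = field(compare=False, default="")


class DeadlineQueue:
    """Illustrative deadline-driven, preemption-aware queue (not the paper's ADSA).

    Arriving tasks are reordered by deadline; the running task is preempted
    and requeued when a newcomer has a strictly tighter deadline.
    """

    def __init__(self) -> None:
        self._heap: list[Task] = []
        self.running: Task | None = None

    def submit(self, task: Task) -> None:
        if self.running is not None and task.deadline < self.running.deadline:
            # Preempt: put the running task back with its remaining work.
            heapq.heappush(self._heap, self.running)
            self.running = None
        heapq.heappush(self._heap, task)

    def dispatch(self) -> Task | None:
        # Start the most urgent queued task if nothing is running.
        if self.running is None and self._heap:
            self.running = heapq.heappop(self._heap)
        return self.running


# Usage: a tight-deadline task arriving later jumps ahead of earlier work.
q = DeadlineQueue()
q.submit(Task(deadline=10.0, arrival=0.0, work=4.0, name="layer-block-A"))
q.dispatch()  # layer-block-A starts
q.submit(Task(deadline=3.0, arrival=1.0, work=1.0, name="layer-block-B"))
print(q.dispatch().name)  # -> layer-block-B (A is preempted and requeued)
```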
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Adaptive Scheduling; Distributed AI; Edge Computing; Fair Incentive Mechanism; Large Language Models; Resource Allocation |
| Divisions: | Faculty of Engineering and Natural Sciences |
| Depositing User: | Özgür Erçetin |
| Date Deposited: | 04 Sep 2025 11:24 |
| Last Modified: | 04 Sep 2025 11:24 |
| URI: | https://research.sabanciuniv.edu/id/eprint/52136 |


