Edge-LLM inference with cost-aware layer allocation and adaptive scheduling


Habibi, Sama and Erçetin, Özgür (2025) Edge-LLM inference with cost-aware layer allocation and adaptive scheduling. IEEE Access, 13, pp. 131614-131637. ISSN 2169-3536

Full text not available from this repository.

Abstract

This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: (1) cost-efficient and fair task allocation, and (2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
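To make the two mechanisms concrete, the sketch below illustrates the general ideas the abstract describes: a greedy, bid-style assignment of model layers to memory-feasible edge devices (in the spirit of FCIM's cost-and-latency-aware auction), and a deadline-driven reordering of queued tasks (in the spirit of ADSA). All names, weights, and data structures here are illustrative assumptions, not the authors' actual algorithms; in particular, the scheduler shown is a simple non-preemptive earliest-deadline-first queue, whereas ADSA is preemption-aware.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Device:
    # Hypothetical per-device attributes an FCIM-style auction might weigh.
    name: str
    free_memory: int       # MB available for hosting model layers
    cost_per_layer: float  # reward the device demands per hosted layer
    latency_per_layer: float

def allocate_layers(devices, num_layers, mem_per_layer, w_cost=1.0, w_lat=1.0):
    """Greedy auction sketch: each layer goes to the memory-feasible device
    with the lowest weighted bid (reward cost + latency). Returns the
    winning device name per layer, in layer order."""
    assignment = []
    for _ in range(num_layers):
        feasible = [d for d in devices if d.free_memory >= mem_per_layer]
        if not feasible:
            raise RuntimeError("no memory-feasible device left")
        winner = min(feasible,
                     key=lambda d: w_cost * d.cost_per_layer + w_lat * d.latency_per_layer)
        winner.free_memory -= mem_per_layer
        assignment.append(winner.name)
    return assignment

def edf_order(tasks):
    """Deadline-driven scheduling sketch (non-preemptive EDF): among tasks
    that have already arrived, always run the one with the earliest
    deadline next. tasks: list of (arrival, deadline, duration, name)."""
    tasks = sorted(tasks, key=lambda t: t[0])  # by arrival time
    clock, i, ready, order = 0, 0, [], []
    while i < len(tasks) or ready:
        if not ready and clock < tasks[i][0]:
            clock = tasks[i][0]  # idle until the next arrival
        while i < len(tasks) and tasks[i][0] <= clock:
            a, dl, dur, name = tasks[i]
            heapq.heappush(ready, (dl, a, dur, name))
            i += 1
        dl, a, dur, name = heapq.heappop(ready)
        order.append(name)
        clock += dur
    return order

# Usage: device B is cheaper but holds only one layer, so the remaining
# layers fall to device A; a late-arriving tight-deadline task jumps the queue.
devices = [Device("A", 2, 1.0, 0.5), Device("B", 1, 0.5, 0.5)]
print(allocate_layers(devices, 3, 1))                       # ['B', 'A', 'A']
print(edf_order([(0, 10, 1, "a"), (0, 5, 1, "b")]))         # ['b', 'a']
```

The weighted-bid objective mirrors the abstract's stated goals (minimizing latency, reward cost, and device usage) only loosely; the actual FCIM reward design additionally guarantees positive utility and fairness under shifting system priorities, which this sketch does not model.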
Item Type: Article
Uncontrolled Keywords: Adaptive Scheduling; Distributed AI; Edge Computing; Fair Incentive Mechanism; Large Language Models; Resource Allocation
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Özgür Erçetin
Date Deposited: 04 Sep 2025 11:24
Last Modified: 04 Sep 2025 11:24
URI: https://research.sabanciuniv.edu/id/eprint/52136
