Mitigating Vulnerability Leakage From LLMS For Secure Code Analysis

Gülay, Bengü (2025) Mitigating Vulnerability Leakage From LLMS For Secure Code Analysis. [Thesis]

Abstract

Large Language Models (LLMs) are increasingly integrated into software development workflows, offering powerful capabilities for code analysis, debugging, and vulnerability detection. However, their ability to infer and expose vulnerabilities in source code raises security concerns, particularly regarding unintended information leakage when sensitive code is shared with these models. This thesis investigates two defense strategies to mitigate such leakage: traditional obfuscation techniques and a novel deception-based approach involving honeypot vulnerabilities. We constructed a dataset of 400 C and Python code snippets spanning 51 CWE categories and evaluated vulnerability detection performance on them across three state-of-the-art LLMs: GPT-4o, GPT-4o-mini, and LLaMA-4. First, we applied obfuscation methods (comment removal, identifier renaming, control/data flow transformations, dead code insertion, full encoding, and LLM-based rewriting) and measured their impact on LLM detection accuracy and functionality retention. Dead code insertion and control flow obfuscation proved most effective in suppressing vulnerability leakage, though aggressive techniques like encoding impaired functionality comprehension. Second, we introduced honeypot vulnerabilities combined with misleading strategies that had proven effective earlier, such as control flow obfuscation, data flow obfuscation, and identifier renaming, along with additional techniques like cyclomatic complexity increases and misleading comments. Honeypots significantly reduced vulnerability detection accuracy, by over 60 percentage points in some cases, while maintaining high functional clarity, with LLM-generated similarity scores consistently above 4.1 on a 5-point scale. Misleading comments emerged as a lightweight yet robust defense across all models. These findings underscore the need to balance security and usability in AI-assisted development and highlight ethical considerations, as similar techniques could potentially be misused to conceal malicious flaws from automated audits.
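
As a rough illustration of three of the obfuscation methods named in the abstract (comment removal, identifier renaming, and dead code insertion), the following minimal Python sketch applies them to a toy snippet using only the standard-library `ast` module. It is not the thesis's tooling: the sample function `checked_sum`, the helper names (`obfuscate`, `run`), the `v0`, `v1`, ... naming scheme, and the injected never-taken branch are all hypothetical choices made for this example. It assumes Python 3.9 or later (for `ast.unparse`) and does not handle keyword arguments, docstrings, or attribute names.

```python
import ast

# Hypothetical toy snippet standing in for one dataset entry.
SNIPPET = '''
def checked_sum(values, limit):
    # add up values until the limit would be exceeded
    total = 0
    for item in values:
        if total + item > limit:
            break
        total = total + item
    return total
'''


class _CollectBound(ast.NodeVisitor):
    """Collect argument names and assigned names, in order of appearance."""

    def __init__(self):
        self.names = []

    def visit_arg(self, node):
        if node.arg not in self.names:
            self.names.append(node.arg)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store) and node.id not in self.names:
            self.names.append(node.id)


class _Rename(ast.NodeTransformer):
    """Replace every collected name with its opaque alias."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


class _InsertDeadCode(ast.NodeTransformer):
    """Prepend a branch that can never execute to each function body."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        dead = ast.parse("if len('') > 0:\n    _pad = 1").body[0]
        node.body.insert(0, dead)
        return node


def obfuscate(source):
    # Comments are not part of the AST, so parsing and unparsing drops them.
    tree = ast.parse(source)
    collector = _CollectBound()
    collector.visit(tree)
    mapping = {name: f"v{i}" for i, name in enumerate(collector.names)}
    tree = _Rename(mapping).visit(tree)
    tree = _InsertDeadCode().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def run(source, *args):
    """Execute the source and call checked_sum, to check functionality retention."""
    namespace = {}
    exec(source, namespace)
    return namespace["checked_sum"](*args)


if __name__ == "__main__":
    print(obfuscate(SNIPPET))
```

Comparing `run(SNIPPET, ...)` with `run(obfuscate(SNIPPET), ...)` on the same inputs gives a crude behavioral check; the thesis itself reports functionality retention through LLM-generated similarity scores, which this sketch does not reproduce.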

Item Type:	Thesis
Uncontrolled Keywords:	vulnerability detection, information leakage, obfuscation, honeypots, code privacy. -- güvenlik açığı tespiti, bilgi sızıntısı, karartma, bal küpü, kod gizliliği.
Subjects:	T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions:	Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences
Depositing User:	Dila Günay
Date Deposited:	15 Jan 2026 16:49
Last Modified:	15 Jan 2026 16:49
URI:	https://research.sabanciuniv.edu/id/eprint/53628
