New tricks to old codes: can AI chatbots replace static code analysis tools?

Öztürk, Ömer Said and Ekmekçioğlu, Emre and Çetin, Orçun and Arief, Budi and Hernandez-Castro, Julio (2023) New tricks to old codes: can AI chatbots replace static code analysis tools? In: 2023 European Interdisciplinary Cybersecurity Conference, EICC 2023, Stavanger, Norway

Full text not available from this repository.


The prevalence and significance of web services in our daily lives make it imperative to ensure that they are, as much as possible, free from vulnerabilities. However, developing a complex piece of software free from any security vulnerabilities is hard, if not impossible. One way to progress towards this holy grail is to use static code analysis tools to root out common or known vulnerabilities that may accidentally be introduced during the development process. Static code analysis tools have contributed significantly to addressing this problem, but they are imperfect. It is conceivable that static code analysis can be improved by using AI-powered tools, which have recently increased in popularity. However, there is still very little work analysing the effectiveness of both types of tools, and this is the research gap that our paper aims to fill. We carried out a study involving 11 static code analysers and one AI-powered chatbot, ChatGPT, to assess their effectiveness in detecting 92 vulnerabilities representing the top 10 known vulnerability categories in web applications, as classified by OWASP. We focused on PHP vulnerabilities because PHP is one of the most widely used languages in web applications, yet it has few security mechanisms to help its developers. We found that the success rate of ChatGPT in finding security vulnerabilities in PHP is around 62-68%. By contrast, the best traditional static code analyser tested has a success rate of 32%. Even combining several traditional static code analysers (each chosen for its strength on particular aspects of detection) would only achieve a rate of 53%, which is still significantly lower than ChatGPT's success rate. Nonetheless, ChatGPT has a very high false positive rate of 91%. In comparison, the worst false positive rate of any traditional static code analyser is 82%.
These findings highlight the promising potential of ChatGPT for improving the static code analysis process, but they also reveal certain caveats (especially regarding accuracy) in its current state. Our findings suggest that one interesting possibility to explore in future work would be to pick the best of both worlds by combining traditional static code analysers with ChatGPT to find security vulnerabilities more effectively.
Item Type: Papers in Conference Proceedings
Uncontrolled Keywords: ChatGPT · AI · Static code analysis · PHP vulnerabilities · Tools evaluation · Vulnerability detection · AI in cyber security
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Orçun Çetin
Date Deposited: 09 Aug 2023 11:32
Last Modified: 09 Aug 2023 11:32
