Detecting Vulnerabilities with Large Language Models
Learning Objectives
- You know that large language models can be used to detect vulnerabilities in code.
- You know of Shift Left Security.
Large language models can also be used to detect vulnerabilities in code. As an example, even a simple prompt such as “I want you to act as a vulnerability detection system. Is the following function buggy? Please answer only with Yes or No.” can be useful in identifying vulnerabilities.
I want you to act as a vulnerability detection system. Is the following function buggy? Please answer only with Yes or No.
def insert_user(cursor, name):
    cursor.execute(f"INSERT INTO users (name) VALUES ('{name}')")
Yes
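Such a prompt can also be used programmatically. The following is a minimal sketch that sends the detection prompt to a model through the OpenAI Python library; the model name is an assumption, and any chat-capable model (or another provider's API) could be used instead.

# Minimal sketch: sending the vulnerability-detection prompt to a model.
# Assumes the openai package is installed and the OPENAI_API_KEY environment
# variable is set; the model name below is an assumption.
from openai import OpenAI

PROMPT = (
    "I want you to act as a vulnerability detection system. "
    "Is the following function buggy? Please answer only with Yes or No.\n\n"
)

CODE_SNIPPET = '''def insert_user(cursor, name):
    cursor.execute(f"INSERT INTO users (name) VALUES ('{name}')")
'''

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute a model you have access to
    messages=[{"role": "user", "content": PROMPT + CODE_SNIPPET}],
)
print(response.choices[0].message.content)  # expected output: Yes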
The above prompt is a slightly modified version of a prompt from the article A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection, which explores, among other things, the effect of different prompts on vulnerability detection. The study highlights that a basic prompt, the one shown above, performed the best for four out of the eleven large language models tested. At the same time, the study also highlights that state-of-the-art large language models are still poor at detecting vulnerabilities.
Despite this, they can be a useful addition to the toolkit of a software developer. As an example, a developer can use a large language model to suggest possible vulnerabilities in code that they have written.
Even if a model suggests that there are no vulnerabilities, developers should be aware of the limitations of such models and review and understand the code themselves.
In addition to detecting vulnerabilities, models can also be used to explain them. As an example, the following prompt asks the model to explain a vulnerability in code and to provide the relevant CWE category.
I want you to act as a vulnerability detection system. Is the following function buggy? If yes, explain the vulnerability in two sentences and provide the relevant CWE code. If not, only respond with No.
def insert_user(cursor, name):
    cursor.execute(f"INSERT INTO users (name) VALUES ('{name}')")
Yes, the function is buggy. It is vulnerable to SQL injection as it directly inserts the user-provided name without proper sanitization or parameterization. This could allow an attacker to manipulate the SQL query and potentially access or modify the database. Relevant CWE code: CWE-89.
In the above, the model identifies the vulnerability as CWE-89, which is correct. However, SQL injection is among the most common vulnerabilities in web applications, and the model might not perform as well on less common vulnerabilities.
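As a point of comparison, the parameterization that the model alludes to keeps the user-provided value out of the SQL text entirely. The following is a minimal sketch of the fix using Python's built-in sqlite3 module; note that the placeholder syntax varies between database drivers (? for sqlite3, %s for many others).

import sqlite3

def insert_user(cursor, name):
    # The value is passed separately from the SQL text, so the driver handles
    # escaping and the input cannot change the structure of the query.
    cursor.execute("INSERT INTO users (name) VALUES (?)", (name,))

# Usage example with an in-memory database
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE users (name TEXT)")
insert_user(cursor, "Robert'); DROP TABLE users;--")  # stored as a plain string
connection.commit()
print(cursor.execute("SELECT name FROM users").fetchall())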
Another aspect of detecting vulnerabilities is the timeliness of the information. While common vulnerabilities are well known (and unfortunately repeated over and over again), there are also zero-day vulnerabilities that are not yet known to the public.
One downside of large language models is that their training data has a cutoff point. Training a model takes time, and so does curating the training data. This means that the models effectively lack information about recently disclosed vulnerabilities. There are, however, developments on this front. As an example, large language models can be extended with plugins that browse the internet, which could be used to retrieve up-to-date information.
Similarly, one could explore retrieval-augmented generation and other approaches for supplying additional information for vulnerability detection. With retrieval-augmented generation, for example, newly disclosed vulnerabilities could be continuously loaded into a local database that is queried when prompting the large language model.
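The following is an illustrative sketch of the idea. Recent advisories are kept in a local store, and the ones that appear relevant to the code under review are prepended to the detection prompt. The advisory texts and the keyword matching are placeholders; a production setup would typically use embeddings and a vector database, and the resulting prompt would be sent to the model as in the earlier example.

# Illustrative sketch of retrieval-augmented generation for vulnerability detection.
# The advisories below are hypothetical placeholders that would be continuously
# updated from vulnerability feeds.
ADVISORIES = [
    {"keywords": ["yaml.load"],
     "text": "Advisory: yaml.load without a safe loader allows code execution."},
    {"keywords": ["cursor.execute", "f\""],
     "text": "Advisory: string-formatted SQL passed to cursor.execute enables SQL injection."},
]

def retrieve(code: str) -> list[str]:
    # Return advisories whose keywords appear in the code under review.
    return [a["text"] for a in ADVISORIES if any(k in code for k in a["keywords"])]

def build_prompt(code: str) -> str:
    context = "\n".join(retrieve(code))
    return (
        "I want you to act as a vulnerability detection system.\n"
        f"Recent advisories that may be relevant:\n{context}\n\n"
        "Is the following function buggy? Please answer only with Yes or No.\n\n"
        f"{code}"
    )

code_under_review = '''def insert_user(cursor, name):
    cursor.execute(f"INSERT INTO users (name) VALUES ('{name}')")
'''
print(build_prompt(code_under_review))  # the prompt would then be sent to the model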
Large language models and other AI tools can also be an additional tool for shifting security left. By integrating security practices earlier in the software development lifecycle, developers can identify and fix vulnerabilities sooner, potentially reducing the cost and impact of security incidents.
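As a concrete example of shifting such checks left, the following sketch could be run as a pre-commit hook or a continuous integration step: it sends each staged Python file to the detection prompt and blocks the commit if the model answers Yes. The model name is again an assumption, and in practice the answers would be treated as hints to review rather than as a definitive verdict.

# Sketch of a pre-commit style check: each staged Python file is sent to the model,
# and a "Yes" answer blocks the commit so the developer can review the file.
import subprocess
import sys
from pathlib import Path

from openai import OpenAI

client = OpenAI()
PROMPT = (
    "I want you to act as a vulnerability detection system. "
    "Is the following code buggy? Please answer only with Yes or No.\n\n"
)

staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

flagged = []
for path in (p for p in staged if p.endswith(".py")):
    answer = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user", "content": PROMPT + Path(path).read_text()}],
    ).choices[0].message.content
    if answer.strip().lower().startswith("yes"):
        flagged.append(path)

if flagged:
    print("Potentially vulnerable files:", ", ".join(flagged))
    sys.exit(1)  # non-zero exit blocks the commit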
For more details about shift left security, see e.g. Shift Left Security: Best Practices for Getting Started.