Source Code Generation
Learning Objectives
- You know the term source code generation and know how large language models can be used to aid in source code generation.
Source code generation refers to the task of generating source code for solving a problem based on a textual description. The textual description can be a description of the problem or a description of the desired solution. Code generation is among the more widely studied areas of large language models for software engineering.
For a review of how large language models have been used in software engineering, see the article Large Language Models for Software Engineering: A Systematic Literature Review.
As an example, we could ask a model for a Python function that keeps only the strings containing the word “Error” from a given list of strings. The model then generates the code for the function.
I need a Python function that can be given a list of strings. The function should return a list that contains only the strings that contain the string "Error".
Here is a Python function that filters out strings containing the word "Error" from a given list of strings:
```python
def filter_errors(strings):
    return [s for s in strings if "Error" in s]

# Example usage
strings = ["Error: Something went wrong", "Success: Operation completed", "Error: Another error occurred"]
filtered_strings = filter_errors(strings)
print(filtered_strings)
```
When you run this code with the example list of strings, it will output `['Error: Something went wrong', 'Error: Another error occurred']`, which contains only the strings that contain the word "Error".
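Generated code is worth a quick check before it is used. The sketch below restates the generated `filter_errors` function and runs it on a few inputs that are made up for illustration (they are not part of the original example). Note that Python's `in` operator is case-sensitive, so a string containing only a lowercase "error" is not matched.

```python
def filter_errors(strings):
    """Return only the strings that contain the substring "Error"."""
    return [s for s in strings if "Error" in s]


# Illustrative inputs, chosen here to probe edge cases.
logs = ["Error: disk full", "error: lowercase", "OK", "Fatal Error"]

result = filter_errors(logs)
print(result)  # ['Error: disk full', 'Fatal Error']

# An empty list yields an empty list.
print(filter_errors([]))  # []
```

If case-insensitive matching is wanted, the prompt would need to say so explicitly.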
As another example, we can ask a model to create a function that implements the Quicksort algorithm. The model then generates the code for the function.
Create a quicksort function in Python.
Certainly! Here's an implementation of the quicksort algorithm in Python:
```python
def quicksort(array):
    if len(array) <= 1:
        return array
    pivot = array[len(array) // 2]
    left = [x for x in array if x < pivot]
    middle = [x for x in array if x == pivot]
    right = [x for x in array if x > pivot]
    return quicksort(left) + middle + quicksort(right)
```
...
In the above example, we prompt the model to generate code for the quicksort algorithm, which is a specific sorting algorithm.
In practice, however, programming languages come with standard libraries that typically include efficient sorting algorithms. As an example, Python's built-in `sorted()` function and `list.sort()` method use an algorithm based on Timsort.
Whether the model highlights this in the response depends on the prompt and the model capabilities.
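For comparison, here is a minimal sketch of sorting with Python's built-in facilities instead of a hand-written quicksort. The sample data is made up for illustration. `sorted()` returns a new list and leaves its input unchanged, while `list.sort()` sorts the list in place.

```python
numbers = [5, 2, 9, 1, 5, 6]

# sorted() returns a new list; the original is not modified.
ascending = sorted(numbers)
descending = sorted(numbers, reverse=True)

# list.sort() sorts in place; key= selects what to compare on.
words = ["pear", "fig", "banana"]
words.sort(key=len)

print(ascending)   # [1, 2, 5, 5, 6, 9]
print(descending)  # [9, 6, 5, 5, 2, 1]
print(words)       # ['fig', 'pear', 'banana']
```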
By adjusting the prompt, we can generate the code for different programming languages. In the example below, we ask for an implementation of the quicksort algorithm in JavaScript instead of Python.
Create a quicksort function in JavaScript.
Of course! Here's a simple implementation of the QuickSort algorithm in JavaScript:
```javascript
function quickSort(arr) {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[arr.length - 1];
  const left = [];
  const right = [];
  for (let i = 0; i < arr.length - 1; i++) {
    if (arr[i] < pivot) {
      left.push(arr[i]);
    } else {
      right.push(arr[i]);
    }
  }
  return [...quickSort(left), pivot, ...quickSort(right)];
}
```
...
Note that although we only changed the language from Python to JavaScript in the prompt, the implementation also changed slightly. In the Python example, the pivot is chosen from the middle of the array, while in the JavaScript version, the pivot is the last element of the array. The Python version also splits the array into three lists (smaller than, equal to, and larger than the pivot), whereas the JavaScript version uses two.
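To make the difference in pivot choice concrete, the following is a hand-written Python port of the JavaScript version above. It is a sketch for comparison, not model output, and the name `quicksort_last_pivot` and the sample data are assumptions made here. Like the JavaScript code, it takes the last element as the pivot and places elements equal to the pivot on the right-hand side.

```python
def quicksort_last_pivot(array):
    """Quicksort with the last element as pivot, mirroring the JavaScript version."""
    if len(array) <= 1:
        return array
    pivot = array[-1]
    rest = array[:-1]  # every element except the pivot itself
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort_last_pivot(left) + [pivot] + quicksort_last_pivot(right)


data = [3, 6, 1, 8, 3, 2]
print(quicksort_last_pivot(data))  # [1, 2, 3, 3, 6, 8]
```

Both pivot strategies produce the same sorted output. They differ in how balanced the partitions are: with a last-element pivot, an already sorted input gives the most unbalanced splits.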
Although the brief examples here showed code generation for Python and JavaScript, you can also ask for code in other programming languages, depending on the capabilities of the model. As an example, LLMs could be prompted for HTML, SQL statements, or even Excel formulas.
Prompting is also a way to guide the model toward code that is more in line with the desired outcome. Having the model brainstorm (create multiple approaches for solving the problem at hand) and pick the best ideas can also improve the performance of LLMs on algorithmically more complex programming problems.
For additional details of brainstorming with LLMs, see the article “Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation”.
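As a rough illustration of what a brainstorming-style prompt could look like, the sketch below builds such a prompt as a plain string. The helper name `build_brainstorm_prompt`, the prompt wording, and the sample problem are all assumptions made for this example. It is not the procedure used in the article mentioned above, and it does not call any model.

```python
def build_brainstorm_prompt(problem, num_ideas=3):
    """Build a prompt that asks for several approaches before any code.

    Hypothetical helper for illustration; the wording is an assumption.
    """
    return (
        f"Problem: {problem}\n\n"
        f"First, brainstorm {num_ideas} different approaches for solving the problem. "
        "Briefly list the strengths and weaknesses of each approach.\n"
        "Then, pick the most promising approach and implement it in Python."
    )


prompt = build_brainstorm_prompt("Find the longest palindromic substring of a string.")
print(prompt)
```

The resulting text can be given to a model in place of a direct "write the code" prompt.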
One of the benefits of using large language models for source code generation is that the solution from the large language model can act as a proof of concept (POC). Such POCs can act as “dirty hacks” when looking for a quick solution, as inspiration when looking for how a problem could be approached, and as a starting point for further refinement.