Coding with Large Language Models

Code Review and Summarization


Learning Objectives

  • You know examples of how large language models can be used for code review and code summarization.
  • You can provide examples of upsides and downsides of automating code reviews.

Code review is a process where developers review each other’s code to improve the quality of the codebase and to share knowledge, while code summarization is a process of creating a textual description of the code. They are in a sense distinct tasks, but both require forming an understanding of the code. In this chapter, we will look at how large language models can be used for code review and code summarization.

Code review

The starting point of code review is assessing whether the code is correct and whether it follows the best practices of the project. This is a task that requires a lot of knowledge about the project and the programming language. For example, the code reviewer needs to know the agreed-upon style practices of the project, the architecture of the project, and the syntax and semantics of the programming language(s) used. Naturally, the reviewer also needs to be able to understand the purpose of the code and the changes that the code introduces.

Reviewing code is time-consuming and can require a lot of effort. Thus, it is not surprising that there has been a lot of effort in easing parts of the code review process. As an example, the pull request functionality of version control platforms, introduced more than a decade ago, makes code reviews easy to conduct when coupled with a meaningful developer (and user) experience.

The talk How GitHub Uses GitHub to Build GitHub by Zach Holman outlines the possibilities of pull requests for code reviews and how GitHub uses pull requests internally.

Parts of the process are also being automated. For example, there are tools that can automatically check whether the code follows the agreed-upon style practices, and there are also a variety of tools such as Amazon’s CodeGuru Reviewer that analyze source code and offer suggestions for improving it.

Large language models can also be used for reviewing code. In its simplest form, we can ask a large language model to review our code and provide suggestions on it.

Perform a code review of the following code:

import requests

class SimpleJokeAPI:
    BASE_URL = "https://simple-joke-api.deno.dev"

    @classmethod
    def get_random_joke(cls):
        """Retrieve a random joke."""
        response = requests.get(f"{cls.BASE_URL}/random")
        if response.status_code == 200:
            joke_data = response.json()
            return joke_data
        else:
            response.raise_for_status()

# Usage example
if __name__ == "__main__":
    joke = SimpleJokeAPI.get_random_joke()
    print(joke['setup'])
    print(joke['punchline'])

Overall, the code looks straightforward and does its job of fetching a random joke from the SimpleJokeAPI. However, there are some improvements and best practices that we can suggest:

- Error Handling: While you're raising an exception for unsuccessful responses, you might want to handle these exceptions more gracefully so that users get meaningful error messages.

- Response Structure Validation: You're directly accessing joke['setup'] and joke['punchline']. However, there's no validation to ensure that the response JSON contains these keys.

- Class Design: The class only has a single class method and no instance methods or properties. This could have been a simple function instead of a class unless there are plans to expand the class in the future.

- Docstrings and Comments: While the method docstring for get_random_joke is simple and clear, it might be useful to provide more context about the expected return value and potential exceptions.

- Constant Naming: Conventionally, constants are named in uppercase. The name BASE_URL follows this convention, but if there were other constants, they should adhere to the same convention.

Here's a revised version of the code:

...

In the above example, we gave the large language model code that we had previously created using a large language model, and it found a number of issues. The issues are not necessarily wrong, but they are suggestions that could be used to improve the code. The model also outputs a revised version, which we have omitted from the above example.
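As an illustration of acting on the response-structure-validation suggestion, a small helper could check for the expected keys before they are accessed. This is our own sketch, not the model's omitted revision; the validate_joke name and error message are illustrative.

```python
def validate_joke(joke_data):
    """Check that the joke response contains the keys the caller relies on."""
    required = ("setup", "punchline")
    missing = [key for key in required if key not in joke_data]
    if missing:
        raise ValueError(f"Joke response is missing keys: {missing}")
    return joke_data

# Usage: validate before printing, so a malformed response fails loudly
# instead of raising a KeyError deep in the printing code.
joke = validate_joke({"setup": "Why did the chicken cross the road?",
                      "punchline": "To get to the other side."})
print(joke["setup"])
print(joke["punchline"])
```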

In the above example, we did not provide any specific guidelines for the model. However, we could also provide the model with a set of guidelines that it should follow. For example, we could ask the model to check whether the code follows the PEP 8 style guide for Python code.
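As a sketch of how such guidelines could be attached to a review request, one could assemble the prompt programmatically before sending it to a model. The build_review_prompt helper and its exact wording are illustrative assumptions, not a fixed API.

```python
def build_review_prompt(code, guidelines=None):
    """Assemble a code-review prompt, optionally listing guidelines to check."""
    prompt = "Perform a code review of the following code"
    if guidelines:
        prompt += ", checking it against these guidelines:\n"
        prompt += "\n".join(f"- {guideline}" for guideline in guidelines)
    return prompt + "\n\n" + code

# Usage: the assembled prompt would then be sent to the model of choice.
prompt = build_review_prompt(
    "print('hello')",
    guidelines=["Follow the PEP 8 style guide",
                "Prefer descriptive variable names"],
)
print(prompt)
```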

At the same time, while we could ask a large language model whether the code follows the PEP 8 style guide, we could also use a Python style guide checker tool such as pycodestyle (formerly known as pep8) to handle the task.

There have been efforts towards training large language models specifically for code reviews. The article “Automating Code Review Activities by Large-Scale Pre-training” presents a study where the authors collected code reviews and code changes from open-source projects and used them to train a large language model for code reviews. The code reviews produced by the model are often sensible, although as is the case with code reviews, the developer would have to go over the suggestions and decide whether they are appropriate or not. The pre-trained model for “CodeReviewer” is available on Hugging Face at https://huggingface.co/microsoft/codereviewer.

To automate or not to automate?

While code reviews can be automated to some extent, there are also reasons why parts of code reviews should not be automated. For example, code reviews are a way to share knowledge, and fully automating them would remove this aspect. Similarly, through code reviews, one may start to observe larger patterns in the code base that could be improved, and automation would likely reduce such insights.

Code summarization

Code summarization is the process of forming a textual explanation of given code. In a similar way as large language models are becoming better at forming code from text, they are becoming better at forming text from code. We can, for example, ask a large language model to summarize a given code.

Please summarize the following code:

import requests

class SimpleJokeAPI:
    BASE_URL = "https://simple-joke-api.deno.dev"

    @classmethod
    def get_random_joke(cls):
        """Retrieve a random joke."""
        response = requests.get(f"{cls.BASE_URL}/random")
        if response.status_code == 200:
            joke_data = response.json()
            return joke_data
        else:
            response.raise_for_status()

# Usage example
if __name__ == "__main__":
    joke = SimpleJokeAPI.get_random_joke()
    print(joke['setup'])
    print(joke['punchline'])

This code is a simple Python script that fetches a random joke from the "simple-joke-api" hosted on "deno.dev". Here's a summarized breakdown:

The requests library is imported to make HTTP requests.

A class SimpleJokeAPI is defined with:
- A constant BASE_URL pointing to the base endpoint of the joke API.
- A class method get_random_joke that:
  - Makes a GET request to the /random endpoint of the API to retrieve a random joke.
  - If the request is successful (HTTP status code 200), it parses the JSON response and returns the joke data.
  - If the request fails (any status code other than 200), it raises an error with the specific failure reason.

At the end of the script, there's a simple usage example that:
- Fetches a random joke by calling the get_random_joke method.
- Prints the setup of the joke.
- Prints the punchline of the joke.

So, when you run the script, it will fetch a random joke from the mentioned API and display its setup followed by its punchline.

The above explanation is a bit long, but it does provide a good summary of the code. The model has also provided a breakdown of the code, which can be a useful addition to the summary. We could also ask for a more concise summary.

Please summarize the following code:

import requests

class SimpleJokeAPI:
    BASE_URL = "https://simple-joke-api.deno.dev"

    @classmethod
    def get_random_joke(cls):
        """Retrieve a random joke."""
        response = requests.get(f"{cls.BASE_URL}/random")
        if response.status_code == 200:
            joke_data = response.json()
            return joke_data
        else:
            response.raise_for_status()

# Usage example
if __name__ == "__main__":
    joke = SimpleJokeAPI.get_random_joke()
    print(joke['setup'])
    print(joke['punchline'])

The code defines a SimpleJokeAPI class that fetches a random joke from the "https://simple-joke-api.deno.dev" URL. If executed as the main program, it prints the setup and punchline of the retrieved joke.

Code summarization and commit messages

One recently proposed use case of code summarization with large language models is to generate commit messages for code changes. The idea is that the code changes are summarized by a model, and the summary is then added as a commit message to describe the changes.
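As a minimal sketch of this idea: collect the staged changes as a diff and wrap them in a prompt asking a model for a commit message. The prompt wording, truncation limit, and helper name below are our own assumptions, not how any particular tool implements this.

```python
import subprocess

def build_commit_message_prompt(diff_text, max_chars=4000):
    """Wrap a diff in a prompt asking a model for a concise commit message."""
    truncated = diff_text[:max_chars]  # keep the prompt within a rough size budget
    return ("Summarize the following diff as a concise, imperative "
            "commit message:\n\n" + truncated)

if __name__ == "__main__":
    try:
        # Collect the staged changes from the local repository.
        diff = subprocess.run(["git", "diff", "--staged"],
                              capture_output=True, text=True).stdout
    except FileNotFoundError:
        diff = ""  # git is not available; fall back to an empty diff
    # The assembled prompt would then be sent to a large language model,
    # and its response used as the commit message.
    print(build_commit_message_prompt(diff))
```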

The gptcommit project provides a tool for generating commit messages using OpenAI’s API. Read the project announcement at https://zura.wiki/post/never-write-a-commit-message-again-with-the-help-of-gpt-3/.