Code Translation and Rewrites
Learning Objectives
- You know that large language models can be used for rewriting and translating code.
Large language models excel at transforming text from one form to another, which is essentially the task that they have been trained for. They can therefore translate code from one programming language to another, provided that their training data contained enough examples of both languages (or that the prompt itself outlines the relevant syntax).
For example, large language models could be used to transition an outdated codebase to a maintained programming language, or to rewrite code to make it more readable.
For instance, a 2017 estimate put the amount of COBOL code still in use at 220 billion lines.
The following BASIC program sums the numbers from 1 to 5, printing the running total at each step and using the GOTO statement to control the flow of the program.
10 LET N = 1
20 LET S = 0
30 LET S = S + N
40 PRINT S
50 LET N = N + 1
60 IF N <= 5 GOTO 30
70 END
We can ask a large language model to translate the program to another language. In the following, we ask for a translation of it into Python.
Please translate the following program into the Python programming language. Respond only with the Python code.
10 LET N = 1
20 LET S = 0
30 LET S = S + N
40 PRINT S
50 LET N = N + 1
60 IF N <= 5 GOTO 30
70 END
N = 1
S = 0
while N <= 5:
    S = S + N
    print(S)
    N = N + 1
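One detail is worth checking in a translation like this. The BASIC program tests its condition on line 60, after the loop body has already run once, whereas a Python while loop tests the condition before the first pass. With N starting at 1 the two behave identically. The following is a minimal sketch of that comparison; the function names `totals_test_first` and `totals_test_last` are hypothetical, and the running totals are collected into lists instead of printed so that they can be compared.

```python
def totals_test_first(start, limit=5):
    """Mirror the translated Python: the condition is checked before each pass."""
    totals = []
    n, s = start, 0
    while n <= limit:
        s = s + n
        totals.append(s)
        n = n + 1
    return totals


def totals_test_last(start, limit=5):
    """Mirror the BASIC control flow: the body runs once before the check."""
    totals = []
    n, s = start, 0
    while True:
        s = s + n
        totals.append(s)
        n = n + 1
        if not n <= limit:
            break
    return totals


# Same running totals for the starting value used in the example.
print(totals_test_first(1) == totals_test_last(1))
# The two versions only diverge when the start is already above the limit.
print(totals_test_first(6), totals_test_last(6))
```

For the program in this section the difference never shows up, but it is the kind of edge case that is easy to miss when reviewing a translated loop.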
Above, the code is correctly translated into Python. It does not yet follow common Python conventions, however: for example, it uses uppercase variable names, which Python style normally reserves for constants. Let’s ask for a rewrite from the large language model.
Please rewrite the following Python code to make it more understandable. Pay attention to meaningful variable names.
N = 1
S = 0
while N <= 5:
    S = S + N
    print(S)
    N = N + 1
current_number = 1
sum_of_numbers = 0
while current_number <= 5:
    sum_of_numbers += current_number
    print(sum_of_numbers)
    current_number += 1
Now, the code is also more understandable. The variable names have been changed to be more descriptive, and the code is easier to read.
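A rewrite for readability should not change what the program does. One way to check this, sketched below, is to run both versions and compare what they print. The helper `captured_output` is a hypothetical name, and running code with `exec` is only reasonable here because both snippets are short and have been read first.

```python
import contextlib
import io

# The translated program before the rewrite.
ORIGINAL = """
N = 1
S = 0
while N <= 5:
    S = S + N
    print(S)
    N = N + 1
"""

# The version returned by the model after the readability rewrite.
REWRITTEN = """
current_number = 1
sum_of_numbers = 0
while current_number <= 5:
    sum_of_numbers += current_number
    print(sum_of_numbers)
    current_number += 1
"""


def captured_output(source):
    """Run a code snippet and return everything it printed as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# True when both versions print exactly the same lines.
print(captured_output(ORIGINAL) == captured_output(REWRITTEN))
```

Matching output on one run is evidence rather than proof of equivalence, but it catches the most common mistakes in a rewrite.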
While the above example shows a transition from an older programming language to a newer one, followed by a rewrite for readability, it is also possible to produce less readable code. As an example, in the following, we ask the LLM to translate Python code to the Shakespeare Programming Language.
Could you please translate the following Python code to the Shakespeare programming language?
a = 1
b = 2
a += 1
if a == b:
    print("Hello!")
else:
    print("World!")
The Shakespeare Programming Language (SPL) is an esoteric programming language that reads like a Shakespearean play. Here's how you might translate the provided Python code into SPL:
The Simple Addition.
Romeo, a young man with remarkable patience.
Juliet, a likewise young woman of remarkable grace.
Act I: The beginning.
Scene I: The setting.
[In this scene, Romeo and Juliet get their initial values.]
Romeo:
Listen to your heart!
Juliet:
Listen to your heart!
Listen to your heart!
Scene II: The increment.
[In this scene, Romeo's value is incremented by one.]
Juliet:
Am I better than you?
Romeo:
If so, let us proceed to scene IV.
... (and so on) ...
In a similar vein, code could also be translated from one language to another in search of more efficient execution. For example, code written in Python could be translated to C++ or Rust to make it run faster.
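The translation prompts in this section all follow the same pattern, so they can be assembled programmatically when many snippets need translating. The following is a minimal sketch only: `build_translation_prompt` is a hypothetical helper, the sample Python program is made up for illustration, and no model is called, since how the prompt is sent depends on the service being used.

```python
def build_translation_prompt(source_code, target_language, code_only=True):
    """Assemble a translation prompt in the style used in this section."""
    prompt = (
        "Please translate the following program into the "
        f"{target_language} programming language."
    )
    if code_only:
        # Ask the model to leave out explanations around the code.
        prompt += f" Respond only with the {target_language} code."
    return f"{prompt}\n\n{source_code.strip()}\n"


# Illustrative input: a small Python program to be translated.
python_source = """
total = 0
for number in range(1, 6):
    total += number
    print(total)
"""

print(build_translation_prompt(python_source, "Rust"))
```

Keeping the instruction text in one place makes it easier to try the same program against several target languages and compare the results.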
The Rosetta Code site has a large collection of code snippets in many different languages. Try choosing a random page from Rosetta Code and translating code on the page from one language to another. What do you observe?