One of the biggest advantages of Large Language Models (LLMs) is their ability to translate content while considering its context. Today, I’m sharing how to automate the translation process using the OpenAI API, a simple Python script, and a JSON source file.
In previous posts, we discussed the importance of structuring source content to provide extra context to translators, as well as leveraging LLMs to enhance Machine Translation (MT) results. Building on that foundation, we’ll now explore automating translations with a script.
⚠️ Important note: LLMs (like free MT engines) may retain the data you share with them unless you use a paid version. To protect sensitive or personally identifiable information, avoid including such data in this process.
Pricing is something to keep in mind as well. During my trials (translating fewer than 1,000 words), I spent approximately $0.22.
The script
import json
import openai

# Set your OpenAI API key
openai.api_key = "Add-Your-OpenAI-API-Key-Here"

# Read content to be translated from en3.json (English content)
with open('en3.json', 'r', encoding='utf-8') as en_file:
    en_data = json.load(en_file)

# Initialize a dictionary to store Spanish translations
es_data = {}

# Perform translation
for key, value in en_data.items():
    en_content = value.get("String", "N/A")
    instructions = value.get("Instructions", "No instructions provided.")

    # Prepare prompt message
    prompt_message = (
        f"Note: For buttons, translate verbs in the infinitive form; for labels, use the conjugated form.\n"
        f"Action requested: Translate the following English content into Spanish (Spain), considering the provided key and instructions. "
        f"Provide only the translated text without additional comments.\n\n"
        f"Key: {key}\n"
        f"Instructions: {instructions}\n"
        f"English: {en_content}"
    )

    # Call OpenAI API
    try:
        response = openai.chat.completions.create(
            model="gpt-4o",  # Specify the model
            messages=[
                {"role": "system", "content": "You are an expert Spanish (Spain) translator."},
                {"role": "user", "content": prompt_message}
            ],
            max_tokens=100,
            temperature=0.1
        )

        # Extract the translated text
        translation = response.choices[0].message.content.strip()

        # Store the Spanish translation
        es_data[key] = {"String": translation}

        # Print translation progress
        print(f"Key: {key}")
        print(f"English: {en_content}")
        print(f"Spanish: {translation}")
        print("-" * 50)

    except Exception as e:
        print(f"Error processing key '{key}': {e}")

# Write the updated Spanish translations to es_translation.json
with open('es_translation.json', 'w', encoding='utf-8') as es_translation_file:
    json.dump(es_data, es_translation_file, indent=4, ensure_ascii=False)
Before walking through the script, I’d like to mention that I’m not a Python expert; I asked ChatGPT for assistance whenever I got stuck. One key line to highlight is openai.api_key. OpenAI provides a free trial of $5 or one month (whichever comes first), after which a paid plan is required.
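Hard-coding the key works for a quick test, but it is safer to read it from an environment variable so it never ends up in version control. A minimal tweak, assuming the key is stored in an OPENAI_API_KEY environment variable:

import os
import openai

# Read the key from the environment instead of hard-coding it in the script
openai.api_key = os.environ["OPENAI_API_KEY"]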
How it works
The script reads the English content, including the keys and instructions, and sends each entry to OpenAI for translation. I didn’t include additional context for this demonstration, but providing clear, task-specific context would likely improve the model’s performance (a sketch of what that could look like follows the example input below).
Here’s an example of the input file I used:
{
    "lbl_email": {
        "String": "Email",
        "Instructions": "This is a label where next to it the user will introduce their email address"
    },
    "btn_email": {
        "String": "Email",
        "Instructions": "This is a button to email someone"
    },
    "lbl_obtain_percentage_discount": {
        "String": "Obtain a {{percentage_discount}} discount by clicking here",
        "Instructions": "Label informing the user that they can obtain a percentage (for example 10%) discount."
    },
    "lbl_obtain_currency_amount_discount": {
        "String": "Obtain a {{currency_amount}} discount by clicking here",
        "Instructions": "Label informing the user that they can obtain a discount of a certain amount in their currency (for example €5)."
    },
    "lbl_confirm_email": {
        "String": "Confirm email",
        "Instructions": "This is a label"
    },
    "btn_confirm_email": {
        "String": "Confirm Email",
        "Instructions": "This is a button"
    }
}
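As mentioned above, richer task-specific context should help the model. One way to do that, purely as an illustration, would be to add an optional Context field to each entry and append it to the prompt. The field name and wording below are hypothetical and not part of the file I actually used:

# Hypothetical example only: an entry extended with an extra "Context" field
entry = {
    "String": "Confirm email",
    "Instructions": "This is a label",
    "Context": "Shown under the Email field on the sign-up form"  # assumed field, not in my actual file
}

# If present, the extra context could be appended to the prompt the script builds
context = entry.get("Context", "")
prompt_extra = f"\nContext: {context}" if context else ""
print(prompt_extra)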
The output
Here’s the output I received:
Key: lbl_email
English: Email
Spanish: Correo electrónico
--------------------------------------------------
Key: btn_email
English: Email
Spanish: Enviar correo electrónico
--------------------------------------------------
Key: lbl_obtain_percentage_discount
English: Obtain a {{percentage_discount}} discount by clicking here
Spanish: Obtén un {{percentage_discount}} de descuento haciendo clic aquí
--------------------------------------------------
Key: lbl_obtain_currency_amount_discount
English: Obtain a {{currency_amount}} discount by clicking here
Spanish: Obtén un descuento de {{currency_amount}} haciendo clic aquí
--------------------------------------------------
Key: lbl_confirm_email
English: Confirm your email
Spanish: Confirma tu correo electrónico
--------------------------------------------------
Key: btn_confirm_email
English: Confirm Email
Spanish: Confirmar correo electrónico
--------------------------------------------------
From this output, we can see that buttons were translated with infinitive verbs (Enviar, Confirmar) while labels kept conjugated forms, and that the percentage-based and fixed-amount discounts were handled correctly with their placeholders preserved. This highlights the value of providing context to LLMs during the translation process.
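One detail worth checking automatically is that placeholders such as {{percentage_discount}} come back untouched, since a rewritten or dropped placeholder could break the string at runtime. A minimal sketch of such a check (this helper is hypothetical and not part of the script above):

import re

def placeholders_preserved(source: str, translation: str) -> bool:
    # Compare the set of {{...}} placeholders in the source and in the translation
    pattern = re.compile(r"\{\{.*?\}\}")
    return set(pattern.findall(source)) == set(pattern.findall(translation))

# Example usage with one of the strings from the output above
print(placeholders_preserved(
    "Obtain a {{percentage_discount}} discount by clicking here",
    "Obtén un {{percentage_discount}} de descuento haciendo clic aquí",
))  # True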
The importance of the prompt
The script itself is important, but arguably, the prompt used to interact with the LLM is even more critical. In this case, the prompt_message string and the messages list define the instructions that guide the model. Fine-tuning these instructions took some time, but getting them right was essential for achieving accurate translations.
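Because most of the iteration happens in the prompt text itself, it can help to factor it into a small function so that different wordings or target languages are easy to try. This refactor is only an illustration of the idea, not how my script is written:

def build_prompt(key: str, instructions: str, en_content: str, target_language: str = "Spanish (Spain)") -> str:
    # Assemble the same instructions the script uses, with the target language as a parameter
    return (
        f"Note: For buttons, translate verbs in the infinitive form; for labels, use the conjugated form.\n"
        f"Action requested: Translate the following English content into {target_language}, "
        f"considering the provided key and instructions. "
        f"Provide only the translated text without additional comments.\n\n"
        f"Key: {key}\n"
        f"Instructions: {instructions}\n"
        f"English: {en_content}"
    )

# Example: the same content, but targeting a different locale
print(build_prompt("btn_email", "This is a button to email someone", "Email", target_language="French (France)"))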
Thank you for taking the time to read this! I hope you found this post informative and helpful. Automating translations using LLMs has great potential, especially when context is provided through structured metadata like keys and instructions. See you in our next post!