
Automating the QA process

Using LLMs to review source content before translation

One of the biggest challenges in localization is dealing with ambiguous or incorrect source content. Because of the automation pipelines many companies use, content is often sent for translation the moment it’s detected – whether it’s ready or not.

This creates two key problems:

  1. Ambiguous content: When a string can be interpreted in more than one way, it’s easy for translators to misread the context – especially in languages that require more specificity. If the company has a robust localization process, the error will be caught during LQA, but even then the company pays twice: once for the translation and again for the fix.
  2. Incorrect content (e.g., typos, grammar issues): If a source string contains a typo or poor grammar, the content will need to be updated sooner or later, which means it will need to be translated again – wasting both time and money.

To avoid these issues, we believe companies should perform basic quality checks before content is merged into the repos that trigger localization workflows. LLMs are a great fit for this task, and we’ve started testing a simple, effective approach.

A simple example

Let’s say a developer adds the following strings to a feature, and they’re about to be sent for translation:

{
  "description_dialog_notary_enrollment": {
    "String": "Notary ensures all your document are authenticated and securely stored.",
    "Instructions": "This time Notary refers to the role. Please translate that word."
  },
  "close": {
    "String": "Close",
    "Instructions": ""
  },
  "email": {
    "String": "Email",
    "Instructions": ""
  },
  "cta_email": {
    "String": "Email",
    "Instructions": "Button to email someone"
  },
  "email": {
    "String": "Email",
    "Instructions": ""
  },
  "To_sent": {
    "String": "Sent by {{due_date_full_format}}",
    "Instructions": ""
  },
  "To_sent": {
    "String": "Sent by {{requester_name}}",
    "Instructions": ""
  },
  "To_sent": {
    "String": "Sent by {{1}}",
    "Instructions": ""
  }
}

As you can see, there’s a typo (“document” instead of “documents”), duplicate keys, inconsistent instructions, and several unclear placeholders.
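Some of these problems don’t even need an LLM. Duplicate keys, for instance, are silently collapsed by most JSON parsers, so a deterministic pre-check is worth running before any model call. Here’s a minimal Python sketch using json’s object_pairs_hook (the file name strings.json is just for illustration):

import json
from collections import Counter

def reject_duplicates(pairs):
    # json calls this with the raw (key, value) pairs of each object,
    # before duplicates are collapsed into a dict.
    counts = Counter(key for key, _ in pairs)
    duplicates = [key for key, count in counts.items() if count > 1]
    if duplicates:
        raise ValueError(f"Duplicate keys found: {duplicates}")
    return dict(pairs)

with open("strings.json", encoding="utf-8") as f:  # hypothetical resource file
    strings = json.load(f, object_pairs_hook=reject_duplicates)

Run against the example above, this raises on both “email” and “To_sent” before a single model token is spent. The remaining issues – ambiguity, typos, unclear placeholders – are where the LLM comes in.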

We began with a basic prompt like this:

“I will provide a set of strings that will be translated into Japanese, Korean, Spanish, and French. They are in JSON format with associated keys and instructions. Can you identify whether accurate translation is likely (90% or higher), and flag potential issues?”

While this worked to an extent, it was too vague and didn’t catch grammar errors or typos effectively. The accuracy scores it returned also felt inflated.

We improved our results by breaking the process into two parts. First, we gave the LLM a clear scoring framework:

“I will provide some strings in the next prompt, but first let’s set some standards for how strings should be evaluated so you can provide better percentages.

  • If the key doesn’t indicate the UI element → –30%
  • If placeholders are unclear or generic → –30%
  • If the string can have multiple interpretations without explanation → –30%
  • If there’s a typo or incorrect grammar → –50%”

Keep in mind that if there are specific style guides your strings must follow, this is the place to reference them in the prompt.
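As an illustration, here’s a minimal sketch of how this two-part setup could be wired to an LLM API – in this case the OpenAI Python SDK. The model name, the message layout, and the file name are our assumptions; any chat model that accepts a system and a user message would work the same way:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Part one: the scoring framework, sent as the system message.
SCORING_RUBRIC = """You evaluate source strings before translation into
Japanese, Korean, Spanish, and French. Start each string at 100% and deduct:
- key doesn't indicate the UI element: -30%
- placeholders are unclear or generic: -30%
- multiple interpretations without explanation: -30%
- typo or incorrect grammar: -50%
For each string, report key, accuracy, deduction reasons, and suggestions."""

# Part two: the strings themselves, sent as the user message.
with open("strings.json", encoding="utf-8") as f:  # hypothetical resource file
    strings_json = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any capable chat model
    messages=[
        {"role": "system", "content": SCORING_RUBRIC},
        {"role": "user", "content": strings_json},
    ],
)
print(response.choices[0].message.content)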

Then, we submitted the strings for evaluation using a second prompt. Here’s a sample of the output we received:

Key | Accuracy | Deduction reasons | Suggestions to improve
description_dialog_notary_enrollment | 40% | ✅ UI context in key; ❌ Typo in “document” → “documents” (–50%) | Fix typo. Clarify Notary role if needed for specific locales.
close | 40% | ❌ Key lacks UI info (–30%); ❌ Noun/verb ambiguity (–30%) | Rename to button_close. Add instruction about its usage.
email | 40% | ❌ Key lacks UI info; ❌ Ambiguity (label vs. action) | Rename to label_email_input. Clarify intent in instructions.
cta_email | 70% | ✅ Context clear; ✅ Instruction present | Slightly improve instruction: “CTA to send an email.”
To_sent (due_date) | 49% | ❌ No UI context; ✅ Placeholder OK; ❌ Meaning unclear | Rename to label_sent_by_due_date. Clarify that it’s a deadline.
To_sent (requester_name) | 49% | ❌ Same as above | Rename and clarify the role of the requester in instructions.
To_sent ({{1}}) | 34% | ❌ Generic key; ❌ Generic placeholder; ❌ Ambiguous meaning | Rename to label_sent_by_user. Replace {{1}} with {{sender_name}}.

This structured approach makes it easy to spot issues before they reach localization. You can even implement a minimum accuracy threshold (e.g., 70–80%) before allowing content to proceed to translation.
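To enforce that threshold automatically, you can ask the model to return its review as JSON and check it in the pipeline. A minimal sketch – the report format and the 70% cutoff are assumptions for illustration:

import json
import sys

MIN_ACCURACY = 70  # percent; tune to your team's tolerance

def gate(report_json: str) -> int:
    # Assumes the LLM was asked to reply with JSON like:
    # [{"key": "close", "accuracy": 40, "reasons": "..."}, ...]
    failures = [r for r in json.loads(report_json) if r["accuracy"] < MIN_ACCURACY]
    for r in failures:
        print(f"BLOCKED: {r['key']} scored {r['accuracy']}%: {r['reasons']}")
    return 1 if failures else 0  # a non-zero exit code fails the CI step

if __name__ == "__main__":
    sys.exit(gate(sys.stdin.read()))

Wired into the same check that runs before content is merged, this keeps low-scoring strings out of the localization pipeline until someone fixes the source.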

Thank you for taking the time to read this post! I hope you found it helpful. Keep in mind that this is just an initial attempt at using LLMs to pre-screen source strings, and the results already look quite promising. With more refinement (e.g., referencing your company’s style guide, automating improvements), this process could become a powerful gatekeeping tool to raise quality and reduce localization costs.

See you in our next post!
