Sunny greetings, dear reader!
Aspects such as casing, spelling, punctuation, and terminology are especially important when it comes to translation. Some of these aspects can affect the output when using technologies like translation memory, machine translation, and language models.
This is the last part of our mini-series, “Writing with a global mindset”. In parts one and two, we discussed the fundamentals that every writer should keep in mind while working on content that will be translated into another language. Even if your company is currently only present in one market, this can change in the future. Plan ahead when it comes to content; you never know what lies ahead for your company!
A quick excursion for all the writers who are new to the topics of translation memory, machine translation, and language models.
Translation memory (TM)
A TM is a bilingual database that stores terms, phrases, sentences, and paragraphs that have already been translated. It automatically identifies identical or similar segments in future content and pre-translates them. This not only ensures consistency but can also increase the volume a translator can handle and cut costs.
Machine translation (MT)
MT doesn’t need much of an introduction because most of you have already used it. MT is designed to transform an original text in a specified language into its translated counterpart in another language. There are different kinds of MT, but that’s a topic for another post.
Language model (LM)
Now we are entering serious AI territory. There are different kinds of LMs, large and small. An LM is a machine learning model that predicts and generates plausible content. For instance, the feature that auto-completes your words is powered by an LM, and so is ChatGPT.
So much for the very basics, but what does this have to do with the aspects you read about in parts one and two? If the English source is poor and a translator would have difficulties interpreting the content, I ask you, how is a machine going to be able to get it right?
A TM is only as good as the content that is fed to it. If the source is poor, the TM output is likely to be poor, too. For instance, if you have the same sentence but in different variations due to casing, spelling, and punctuation errors, then every variation costs money and time during translation.
Simple example
Correct sentence: The cat sat on the mat.
Variation 1: The Cat sat on the mat.
Variation 2: the cat sat on the mat
Imagine the correct sentence was previously translated and is now in the TM. If the very same sentence comes up in future content, there will be a 100% or even an ICE [1] match from the TM, and the translator most likely doesn’t need to do anything. Your content will be matched with the existing translation. If you use variation 1 or 2, it is not going to be a 100% match, and the translator needs to intervene [2]. Hence, it costs more time, and as we all know, time is money. And that can add up quickly. Correct casing and consistency are important!
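To make the matching idea concrete, here is a minimal Python sketch. It does not show how any particular TM tool works: the `tm_store` dictionary, the `lookup` function, and the German translation are made up for illustration, and the similarity score comes from Python's standard `difflib` module rather than a real fuzzy-match algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical one-entry translation memory: source segment -> stored translation.
tm_store = {"The cat sat on the mat.": "Die Katze saß auf der Matte."}


def lookup(segment):
    """Return (match percentage, stored translation) for the closest TM entry."""
    best_score, best_translation = 0, None
    for source, translation in tm_store.items():
        # ratio() is 1.0 only when both strings are identical.
        score = int(SequenceMatcher(None, segment, source).ratio() * 100)
        if score > best_score:
            best_score, best_translation = score, translation
    return best_score, best_translation


for candidate in (
    "The cat sat on the mat.",  # correct sentence
    "The Cat sat on the mat.",  # variation 1
    "the cat sat on the mat",   # variation 2
):
    score, _ = lookup(candidate)
    label = "exact match" if score == 100 else "fuzzy match, needs review"
    print(f"{score:3d}%  {label}:  {candidate}")
```

Only the first candidate comes back as an exact match. The two variations score lower, which in a real workflow means a translator has to look at a sentence that was, in fact, already translated.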
In part two, I already mentioned that the usage of placeholders and cross-references can be problematic for translation when not leveraged correctly. Here is another example of an issue:
This is how it looks in the code:
"FollowOurDocumentation": "Follow the steps as explained in our {{Documentation}}."
Issue:
“Documentation” can be anything, for instance, business proposal, user guide, or report. For the English source it is not relevant what the placeholder “Documentation” is going to be, but for the translator it is super important to know, and here is why: Many languages have grammatical gender, and depending on the gender, words like adjectives need to be inflected:
Follow the steps as explained in our user guide.
Folgen Sie der Anleitung in unserem Benutzerhandbuch.
Follow the steps as explained in our guideline.
Folgen Sie der Anleitung in unserer Richtlinie.
The German word for “user guide” (Benutzerhandbuch) is neuter, while the German translation for “guideline” (Richtlinie) is feminine, so the possessive “our” needs to be inflected in accordance with the gender of the noun that follows: unserem vs. unserer.
Other languages might have other struggles with an unknown placeholder value. So this type of sentence structure needs to be avoided.
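Here is a small Python sketch of the problem and of one possible way around it. It assumes the translator rendered “our” as “unserem” without knowing the placeholder value. The string keys `FollowOurUserGuide` and `FollowOurGuideline`, the `strings` dictionary, and the `message` helper are hypothetical names, not part of any real codebase.

```python
# Naive approach: the template is translated once and the value is dropped in later.
# The translator had to choose "unserem" without knowing what would follow.
template_de = "Folgen Sie der Anleitung in unserem {Documentation}."
naive = template_de.format(Documentation="Richtlinie")
print(naive)  # grammatically wrong: "Richtlinie" is feminine and needs "unserer"

# Possible alternative: one complete sentence per document type,
# so every sentence can be translated as a whole.
strings = {
    "en": {
        "FollowOurUserGuide": "Follow the steps as explained in our user guide.",
        "FollowOurGuideline": "Follow the steps as explained in our guideline.",
    },
    "de": {
        "FollowOurUserGuide": "Folgen Sie der Anleitung in unserem Benutzerhandbuch.",
        "FollowOurGuideline": "Folgen Sie der Anleitung in unserer Richtlinie.",
    },
}


def message(key, locale):
    """Look up a fully translated sentence; no placeholder is filled in at runtime."""
    return strings[locale][key]


print(message("FollowOurGuideline", "de"))
```

The trade-off is more strings to maintain, but each one is a complete sentence that a translator (or a machine) can handle without guessing.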
As I mentioned before, if a human already has a hard time deciphering your content, a machine is most likely to fail big time.
At times, the question comes up why the translator doesn’t simply translate the source as they see fit to make it work on their end, instead of asking for the source to be changed.
Here is an example: Let’s say we have the word “Email” to translate. In some languages we need to know whether this refers to an email itself or to an email address, and translate it accordingly. In such cases, it’s best to alter the source to “Email address”. If the source is kept as “Email” and the translator translates it as “Email address”, there is a potential dilemma for future projects. Let’s say in the next project “Email” means an actual email. It will then be matched with the translation of “email address”. See the issue? Hacking the TM might be necessary sometimes, but it is not advised. Of course, the situation can be different, particularly in marketing translations, where there is often a separate TM for marketing content.
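The “Email” dilemma can be sketched in a few lines of Python. This is a deliberately simplified picture that assumes a TM matching on the source text alone. The dictionaries `tm_hacked` and `tm_fixed` and the German translations are illustrative.

```python
# Hypothetical TM that matches on the source text alone.
tm_hacked = {}

# Project 1: the source says "Email" but means "email address",
# so the translator stores the German for "email address".
tm_hacked["Email"] = "E-Mail-Adresse"

# Project 2: "Email" now means the message itself, but the TM
# still returns the translation stored in project 1.
reused = tm_hacked["Email"]
print(reused)  # E-Mail-Adresse

# With a precise source, the two meanings get separate entries.
tm_fixed = {"Email address": "E-Mail-Adresse", "Email": "E-Mail"}
print(tm_fixed["Email"])  # E-Mail
```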
MT has no way of knowing that a source “Email” means “Email address” and will most likely mistranslate it. Furthermore, when it comes to MT, the output can be affected by poor punctuation and sentence complexity [3].
Additionally, and you might have already considered this, “email” can also function as a verb (yippee, just what we needed, another interpretation!). As you can see from this simple example, it may seem straightforward and innocent in English, but it can initiate an entire interpretation process for translators and the translation results can go totally sideways.
Another issue can be the casing. For instance, if you have a product called GN Email, the best practice is to consistently keep “GN” and “Email” together as “GN Email” to clarify that this is the product name and to prevent confusion with the general meaning of “email.” This also entails adding rules for product names to your style guide so that their casing remains consistent.
As mentioned in part one, spelling is important as well. If you use the spelling “e-mail”, use it consistently.
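A consistency check like this is easy to automate. The following Python sketch counts the spelling variants of “email” across a set of strings. The function name `spelling_variants`, the default pattern, and the sample strings are assumptions for illustration, and casing is deliberately ignored here so that only the spelling is compared.

```python
import re
from collections import Counter


def spelling_variants(texts, pattern=r"\be-?mail\b"):
    """Count how often each spelling variant occurs, ignoring casing."""
    counts = Counter()
    for text in texts:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            counts[match.group(0).lower()] += 1
    return counts


sample = [
    "Enter your e-mail address.",
    "We sent you an email.",
    "Email us if you have questions.",
]
variants = spelling_variants(sample)
print(dict(variants))
if len(variants) > 1:
    print("Inconsistent spelling found, pick one variant and stick to it.")
```

For the sample strings, both “e-mail” and “email” show up, so the check reports an inconsistency.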
And then there are large language models (LLMs), the newish craze in the translation industry. LLMs can be used for translation, but they require a huge dataset to learn from, and they might be biased at times depending on the data they were trained on. Currently, however, I see LLMs more as a checker of existing translations. LLMs can be super useful when there is enough information available for the source, such as instructions and string details. They can then be used to verify whether the translator stuck to the provided information.
To circle back to ensuring the accuracy of the source, particularly in UI strings, it’s crucial for you to provide enough details for the translators or, better said, to ensure that the developer you work with implements instructions. The best practice is for you to provide those instructions to the developer. This way, you ensure that the developer doesn’t forget them, and you also guarantee that the instructions make sense, especially as the source language might not be the native language of the developer. LMs are only as good as the information presented to them.
I understand that sometimes there isn’t enough space for lengthy wording in the UI or on your slides, so you need to use “email” instead of “email address”. We just discussed this; here is where you need to provide enough context for the translators so they know what this is about.
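How the context travels with a string depends on your tooling, so take the following Python sketch only as one possible shape. The `UIString` class, its `context` field, the keys, and the `missing_context` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UIString:
    key: str           # identifier of the string in the code
    source: str        # English source text
    context: str = ""  # instructions for the translator


def missing_context(strings):
    """Return the keys of all strings that ship without translator instructions."""
    return [s.key for s in strings if not s.context.strip()]


ui_strings = [
    UIString("EmailLabel", "Email", "Label of the input field for the user's email address."),
    UIString("EmailButton", "Email"),
]
print(missing_context(ui_strings))  # ['EmailButton']
```

A check like this could run before content is sent out for translation, so that a short source such as “Email” never leaves without an explanation.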
Speaking of space, it’s important to note that translations can often be much longer than the original source. So if your source already sits in a tight space, it can be a buzzkill for the translation.
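A simple length check can flag such cases early. In this Python sketch, the `over_budget` function, the 10-character limit, and the sample translations are assumptions chosen purely for illustration.

```python
def over_budget(translations, max_chars):
    """Return the translations that are longer than the available space."""
    return {
        locale: text
        for locale, text in translations.items()
        if len(text) > max_chars
    }


# Illustrative translations of one short UI label.
label = {"en": "Settings", "fr": "Paramètres", "de": "Einstellungen"}
print(over_budget(label, max_chars=10))  # {'de': 'Einstellungen'}
```

Character counts are only a rough proxy for the rendered width, but they are enough to show which strings deserve a second look.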
No matter whether your content will be handled by translators or some kind of machine, as a writer, you should take pride in your content and aim to help, inform, or entertain the reader. At the same time, follow best practices to ensure your content is ready for easy translation and to help your company smoothly conquer international markets.
On to the next piece of content to write!
[1] ICE is short for In Context Exact and means that the content has been translated previously within the same context. That can mean either that the surrounding content is the very same or that the key, which is an identifier of the content, is the same.
[2] This depends on the fuzzy match settings, but that is certainly a different topic (I almost sound like The Neverending Story by Michael Ende).
[3] https://www.cambridge.org/core/journals/recall/article/abs/an-investigation-of-machine-translation-output-quality-and-the-influencing-factors-of-source-texts/0112BA1949638F2EF46180D56516BF04