Key Considerations for Localization in Data Model Design: Handling Names and Cultural Variability

When talking about localization, one aspect that is often overlooked is whether internationalization and localization need to be taken into consideration when designing data models.

In general, the answer is no. However, certain data models need to be carefully designed if we don’t want our company to fall short (even in the US!).

When thinking about localization and internationalization, we tend to assume that only frontend elements are affected (order of fields, their size, placement, etc.). But there is a key aspect to consider as well: How is that data going to be stored so it can be reused properly when the time comes?

As a first example, names can be quite complex depending on the culture they come from (we will have a separate post just for this topic!). The main challenge is that the vast majority of companies model names on the conventions of one particular region, when they shouldn’t. For example, in English-American culture, it is standard to use just one last name (“John Smith”), sometimes with a middle name (“John Augustine Smith”) and sometimes with a suffix (“John Augustine Smith, Jr.”). That’s why, when developers have to decide how to parse a name, they think, “It’s easy! We just ask for the full name, and from there we can obtain every single part!”:

“Full_name”: “John Augustine Smith, Jr.”

“First_name”: “John”

“Middle_name”: “Augustine”

“Last_name”: “Smith”

“Suffix”: “Jr.”

However, this has a big issue. What happens with names that are not English-American? In the US, it is increasingly common to meet people from other cultures. Did you know that only around 58% of the American population is non-Latino White?

This means that if you build your product to be too English-centric, you might be missing a potential market of 42% of the American population. For example, around 19.1% of the total population (63.7 million people) is Hispanic/Latino. And this matters a lot for names.

Hispanic names can be a bit complex, since they can have multiple first names, multiple last names, and characters that are not in the English alphabet. If we look at a name like “Juan de Dios Martínez Rodríguez”, there is no reliable way to programmatically tell the first name and the last name apart.
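To make the problem concrete, here is a minimal sketch of the kind of parser a developer might write with English-American names in mind. The function naive_split and its rule (first word is the first name, last word is the last name, everything else is a middle name) are hypothetical and only for illustration.

```python
def naive_split(full_name: str) -> dict:
    """Hypothetical naive parser: first word = first name,
    last word = last name, anything in between = middle name."""
    parts = full_name.split()
    return {
        "first_name": parts[0],
        "middle_name": " ".join(parts[1:-1]),
        "last_name": parts[-1],
    }


result = naive_split("Juan de Dios Martínez Rodríguez")
print(result["first_name"])   # Juan
print(result["middle_name"])  # de Dios Martínez
print(result["last_name"])    # Rodríguez
```

Assuming the intended reading is “Juan de Dios” as the first name and “Martínez Rodríguez” as the two last names, every field comes out wrong, and no amount of extra splitting rules can tell where one part ends and the next begins.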

For that reason, a better approach to this challenge is:

Always ask for the different parts of a name. Don’t take the full string and try to parse the different parts out of it. If you get a full name, you get that, nothing more, nothing less.
Don’t assume that the English alphabet will cover all the characters in a name. Naming rules vary even within the US: some states have strict laws that prohibit obscenities, numbers, and names that are too long, while other states have no restrictions. Use UTF-8 encoding to ensure that all characters are properly stored and displayed. That way you are ready not only for American names but also for international markets.
Don’t assume that spaces or numbers are not allowed. If you have to prohibit certain characters, it should only be for a very good reason.
Don’t assume any length for the different fields. Some people might have just a last name, some just a first name; some names will be very short, some will be very long.
Don’t assume any order. Some cultures use {{First_name}} {{Last_name}}, others {{Last_name}} {{First_name}}.
When possible, ask users what they want to be called. (For example, their name might be Richard, but they want to be called Rick.)
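The recommendations above could be sketched as a small data structure. This is only an illustration under stated assumptions: the class PersonName, its field names, and the family_name_first flag are hypothetical, not a model taken from any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    """Hypothetical name record: every part is entered separately,
    every part is optional, and no length or alphabet is assumed."""
    given_name: Optional[str] = None
    middle_name: Optional[str] = None
    family_name: Optional[str] = None
    suffix: Optional[str] = None
    preferred_name: Optional[str] = None  # what the user wants to be called
    family_name_first: bool = False       # display order, never guessed

    def display_name(self) -> str:
        # Build the display string from stored parts; never parse it back.
        if self.family_name_first:
            ordered = [self.family_name, self.given_name, self.middle_name]
        else:
            ordered = [self.given_name, self.middle_name, self.family_name]
        return " ".join(part for part in ordered if part)

    def greeting_name(self) -> str:
        return self.preferred_name or self.given_name or self.family_name or ""


juan = PersonName(given_name="Juan de Dios", family_name="Martínez Rodríguez")
rick = PersonName(given_name="Richard", family_name="Smith", preferred_name="Rick")

print(juan.display_name())   # Juan de Dios Martínez Rodríguez
print(rick.greeting_name())  # Rick

# Python strings are Unicode; encoding to UTF-8 and back loses nothing.
assert juan.family_name.encode("utf-8").decode("utf-8") == juan.family_name
```

The full name is always derived from the parts the user entered, so spaces inside a given name or a family name never need to be interpreted.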

All this brings us to the question of how to store this information in the backend. The trickiest part is that once you establish a way to store the information, it is quite hard (and expensive) to move away from it.

For that reason, even though your company might only be thinking about a particular market, it is always a good idea to create a model that is flexible enough to cover the most general use case without being too complex.

Names are so complex that no one will be able to figure out how to store all name types correctly, but if you coordinate with your globalization team, you can cover a very large portion of them. For example, looking at the data models from Microsoft and Adobe, it seems they have a single string for the last name. This is fine until you need to distinguish the first and second last names of a person (having two is common in Spain). You might think, well, the first word will be the first last name and the second word will be the second last name. OK, but what about “Espinosa de los Monteros del Rey González”? The first last name would be “Espinosa de los Monteros del Rey” and the second “González”. So my first piece of advice for the full name also applies to the last name:

Don’t take the full string and try to parse the different parts out of it. If you get a last name, you get that, nothing more, nothing less.
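As a rough sketch of the same idea applied to last names, each last name can be stored exactly as the user entered it instead of being recovered from one string. The class FamilyNames and its parts field are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FamilyNames:
    """Hypothetical container: one list entry per last name,
    each kept exactly as the user typed it."""
    parts: List[str] = field(default_factory=list)

    def full(self) -> str:
        return " ".join(self.parts)


entered = FamilyNames(parts=["Espinosa de los Monteros del Rey", "González"])

# Splitting the joined string on spaces guesses the wrong first last name.
naive_first = entered.full().split()[0]
print(naive_first)       # Espinosa

# Reading the stored part needs no guessing at all.
print(entered.parts[0])  # Espinosa de los Monteros del Rey
```

A model with a single last-name string can still be produced from this one through full(), but the reverse direction is not possible without asking the user again.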

However, as I mentioned, maybe Microsoft and Adobe don’t have a use case for two separate last names (or, for example, for the current first and last name plus any previous first and last names), and that is why they modeled it this way. In that case, not covering it is not an issue. The issue will come if one day they do need that use case: they will then have to live with two coexisting data models, do a complete migration to the new one, or risk problems with something as important as their customers’ names.

For that reason, when you are going to store something that changes with the culture or region, like names, phone numbers, addresses, etc., make sure you seek advice from expert sources (like the Unicode CLDR) so that you use a model that covers both your current and your future needs.

Thank you very much for taking the time to read this. I hope you found it informative. As you can see, the considerations for proper localization go beyond the current needs, and it is always a good idea to have a roadmap of where the company wants to go in the future.

See you soon in a new post!

GILT Ninjas
