A rather good post today by Paul Sawers in TNW Industry assesses the state of the internet as it noticeably evolves from an English-language network into a more heterogeneous system embracing other languages such as Arabic, Russian and Chinese.

Big deal, you might say. Except that it is a big deal, and in a significant area: domain names (the unique address of a website, for instance) can now use characters beyond the ones the English alphabet historically allowed.

In practical terms, it means you can define an internationalized domain name in the language of your choice rather than only in English. So the internet address of your website that has a Russian domain name, for instance, can be wholly in Russian, including the country code (rather than ending in .ru). Or Chinese, Arabic, Greek – almost any language you care to choose.

As Paul explains:

[…] because of technical constraints and the need to ensure domain names remain interoperable around the world, the Domain Name System (DNS) has traditionally been restricted to 37 ASCII characters: A-Z, 0-9 and the trusty old hyphen. Internationalized domain names (IDNs) are domains that support one or more non-ASCII characters, such as www.ø and ???????.com.

The permitted character set of the DNS has precluded the full representation of many languages in their native alphabets (scripts) within domain names. However, ICANN did approve the Internationalizing Domain Names in Applications (IDNA) system many years ago, and this system maps Unicode strings into the valid DNS character set using Punycode.

In short, this allows the transliteration or conversion between Unicode domain names and their ASCII equivalents (prefixed with xn--), thus allowing users to navigate the Internet in their own language. The IDNA system is designed to ensure that the Web doesn’t fragment into a number of localized versions separated by script.

So, Internationalized Domain Names (IDNs) have been available for registration at the second level for a while, meaning in countries such as Japan you could register a domain using a local script rather than a Latin-based one – however, it would still have been appended with ‘.jp’, rather than a local script equivalent.

And this was the big change that came into effect last year. It became possible to register IDNs for ccTLDs such as السعودية. for Saudi Arabia, and .рф for Russia, and this at last meant domain names – including the country code – could contain non-Latin based characters throughout. This opened up the Internet’s addressing system to the majority of the world’s population, who have little comprehension of Latin-based scripts.
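The Unicode-to-ASCII mapping Paul describes is easy to try for yourself. Here is a minimal sketch in Python, assuming the built-in "idna" codec (which follows the older IDNA 2003 rules, so newer registries may treat some edge cases differently). The helper names to_ascii and to_unicode and the two sample domains are my own illustrations, not taken from Paul's post.

```python
# Sketch: convert between Unicode domain names and their ASCII (xn--) forms
# using Python's built-in "idna" codec (IDNA 2003 rules).
# The helper names and sample domains are hypothetical, for illustration only.


def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible form of a Unicode domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII (possibly xn--) domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # One name with a non-ASCII second level only, and one that is
    # non-Latin throughout, including the country code.
    for name in ["bücher.example", "пример.рф"]:
        ascii_form = to_ascii(name)
        print(name, "->", ascii_form, "->", to_unicode(ascii_form))
```

The first name comes out as xn--bcher-kva.example: only the label containing a non-ASCII character is rewritten. In the second, the country code itself is converted, with .рф becoming .xn--p1ai, which is the form the DNS actually stores.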

Fully internationalized addresses give the internet truly global potential: people can perceive it, and use it, in the language of their choice, not just English.

This chart from Swedish internet monitoring and analytics firm Royal Pingdom showing internet users versus population size illustrates the real potential of this development:

In eighteen of the twenty countries listed, English is not the native language.

Paul Sawers goes into a great deal of detail, much of it technical. But his post is worth reading if you want to get a good sense of new possibilities around the world and how use of the internet will likely evolve.

