Since 2003, Punycode has made it possible to encode characters outside ASCII into domain names that still comply with the ASCII-only rules for domain names. This lets users enter domain names in scripts that extend or replace the ASCII alphabet and have them resolve as expected.
A major downside of Punycode and of expanded character sets in domain names is the Internationalized Domain Name (IDN) homograph attack: https://en.wikipedia.org/wiki/IDN_homograph_attack#Homographs_in_internationalized_domain_names . It lets an attacker present a domain name that looks nearly identical to a well-known one in order to pilfer data. From Wikipedia:
For example, a regular user of exаmple.com may be lured to click on it unquestioningly as an apparently familiar link, unaware that the third letter is not the Latin character "a" but rather the Cyrillic character "а" and is thus an entirely different domain from the intended one.
Firefox (as released with English as the default language) handles this by displaying all internationalized domain names in their Punycode-encoded form, so `I❤️.ws` is shown as `xn--i-7iq.ws`. This is a safe way to defend against IDN homograph attacks, but the user experience is poor. Whether the user clicked a link or typed the URL directly, the address becomes a string of letters with no meaningful interpretation for the user.
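To make the two forms concrete, here is a minimal sketch in Python of how a hostname maps between its Unicode form and its Punycode form. It uses Python's built-in `punycode` codec label by label and only lowercases the input; it is not a full IDNA implementation (no normalization, no variation-selector handling), and the helper names `to_punycode` and `to_unicode` are my own, not anything Firefox uses.

```python
def to_punycode(host: str) -> str:
    """Return the ASCII (xn--) form of a hostname, label by label."""
    labels = []
    for label in host.lower().split("."):
        if label.isascii():
            # Pure ASCII labels are left untouched.
            labels.append(label)
        else:
            # Non-ASCII labels get the "xn--" prefix plus the Punycode body.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(host: str) -> str:
    """Return the Unicode form of a hostname that may contain xn-- labels."""
    labels = []
    for label in host.lower().split("."):
        if label.startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


# "\u2764" is the heart character, written as an escape to avoid
# an invisible variation selector sneaking into the literal.
print(to_punycode("I\u2764.ws"))   # xn--i-7iq.ws
print(to_unicode("xn--i-7iq.ws"))  # i❤.ws
```

The second form is what the user currently sees in the address bar; the first line of input is what they typed or clicked.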
I propose that domains whose non-ASCII characters have NO ASCII lookalikes anywhere in the domain (for example, kanji or emoji) not be shown as Punycode and instead be rendered in their Unicode form.
Besides giving a more pleasant experience to users of domains written in non-ASCII scripts, this would also improve the experience for domains containing emoji, a large portion of which are easily distinguishable from ASCII characters. If the domain has even one character that is visually similar to an ASCII character, Firefox should retain its current behavior and show the user the Punycode form of the domain.
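The rule I have in mind could be sketched roughly as follows, in Python. Everything here is illustrative: `ASCII_LOOKALIKES` is a hypothetical, deliberately tiny hand-picked table (a real implementation would presumably draw on a maintained confusables list rather than this one), and `display_form` and `to_punycode` are names made up for this sketch, not existing Firefox functions.

```python
# Hypothetical, incomplete table of non-ASCII characters that resemble ASCII.
# This is an assumption for illustration only, not a real confusables list.
ASCII_LOOKALIKES = {
    "\u0430",  # Cyrillic а, resembles Latin a
    "\u0435",  # Cyrillic е, resembles Latin e
    "\u043e",  # Cyrillic о, resembles Latin o
    "\u0440",  # Cyrillic р, resembles Latin p
    "\u0441",  # Cyrillic с, resembles Latin c
    "\u03bf",  # Greek ο, resembles Latin o
}


def to_punycode(host: str) -> str:
    """Encode each non-ASCII label as xn-- plus its Punycode body."""
    labels = []
    for label in host.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def display_form(host: str) -> str:
    """Pick what the address bar would show under the proposed rule."""
    host = host.lower()
    # Even one ASCII lookalike anywhere in the domain keeps current behavior.
    if any(ch in ASCII_LOOKALIKES for ch in host):
        return to_punycode(host)
    # Otherwise the Unicode form is shown as-is.
    return host


print(display_form("i\u2764.ws"))        # stays in Unicode form
print(display_form("ex\u0430mple.com"))  # falls back to an xn-- form
```

The check runs over the whole domain rather than per label, matching the "even one character" condition above: a single lookalike in any label sends the entire domain back to Punycode.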