Ironphoenix
New member
Status: New idea

Bad actors are using look-alike URLs in email, together with browsers' automatic Punycode encoding, to direct people to malicious websites. A safe example is https://connеct.mozilla.org , which transforms to https://xn--connct-6of.mozilla.org/ , which doesn't exist. The Roman e is replaced with the Cyrillic е, which looks identical (side-by-side: eеeеeе). Browsers can help reduce the risk of people getting caught by this by having a setting which disables this encoding by default, and offering users the option of enabling it once when they click on a link with disallowed characters or as a permanent setting.
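For anyone who wants to reproduce the transformation, here is a minimal sketch using Python's built-in `idna` codec. The helper name `to_ascii_host` is made up for illustration, and Python's codec follows the older IDNA 2003 rules, so browsers may differ on edge cases, but for this example it yields the same Punycode host as quoted above.

```python
def to_ascii_host(host: str) -> str:
    """Encode a host name, label by label, with Python's built-in 'idna' codec."""
    return host.encode("idna").decode("ascii")


lookalike = "conn\u0435ct.mozilla.org"  # \u0435 is CYRILLIC SMALL LETTER IE
genuine = "connect.mozilla.org"         # plain Latin e

print(to_ascii_host(lookalike))  # xn--connct-6of.mozilla.org
print(to_ascii_host(genuine))    # connect.mozilla.org
print(lookalike == genuine)      # False
```

The two strings render the same on screen but compare unequal, which is why the encoded form is the only place the difference becomes visible.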

Thanks, and best regards,

Mike B.

8 Comments
Status changed to: New idea
Jon
Community Manager

Thanks for submitting an idea to the Mozilla Connect community! Your idea is now open to votes (aka kudos) and comments.

luis123456789
Making moves

How about no?

It has already been discussed ad nauseam that showing non-Latin links as "weird" by default promotes a US-centric internet and turns internationalized domains and communities into second-class citizens, among other racist effects. English is but one language in the world, and it's not even close to a majority.

 

I'm not fully sure what a better UI for showing the links would be, considering that it's standard HTML for links to be decorated (i.e. an <a> element with inner HTML as its description). Probably a tooltip with a basic WHOIS fetch, but that would require an external connection to a third party. As for the pages proper, this can be solved by Page Info: since the shield icon with page info is already shown for all pages, there's no reason not to place domain info there.

 

 

Ironphoenix
New member

How about thinking a bit more, then? I agree that (further) privileging anglophones is not the way to go, but there are ways to generalize the idea.

Flagging anomalous characters could be context-dependent, e.g. warning for a Roman e in a Cyrillic context (Привет and Привeт being indistinguishable as well: the second uses a Roman e instead of the Cyrillic е). An index of lookalikes and expected context would not be too hard to set up, I think. It would also be good to handle hidden diacritics, such as a combining dot above (U+0307) with an i.
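The hidden-diacritic case can be sketched roughly like this. The approach is an assumption on my part, not something any browser is known to do: normalize to NFC and report combining marks that are still standing afterwards. An e followed by a combining acute composes into é and passes, while i followed by U+0307 has no precomposed form and gets reported. The function name `surviving_marks` is hypothetical.

```python
import unicodedata


def surviving_marks(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) of nonspacing marks left after NFC composition."""
    composed = unicodedata.normalize("NFC", text)
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(composed)
        if unicodedata.category(ch) == "Mn"  # Mn = nonspacing combining mark
    ]


print(surviving_marks("mozi\u0307lla"))  # [(4, 'COMBINING DOT ABOVE')]
print(surviving_marks("conne\u0301ct"))  # [] because e + acute composes to é
```

This is only sensible for scripts like Latin where common accented letters have precomposed forms; scripts that rely on combining marks routinely would need their own rules.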

luis123456789
Making moves

In general, the idea of flagging anomalous characters is not bad. The problem is that the context is usually not decided by the browser's or the page's set language: if your browser is set to English and German, you are reading Wikipedia in German, and they quote a section in French which includes a link with French characters, should that be marked anomalous? It most likely shouldn't (unless you don't trust Wikipedia...).

Another option which I've seen discussed in the past is an opt-in to flagging links that have characters from mixed character sets regardless of context. This would in particular help with the case of hidden diacritics, because otherwise "untangling" the text into e.g. "·i" would produce a web render that is wrong for reproducibility and standardization purposes.
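A context-free mixed-set check could look something like the sketch below. Python's standard library has no Unicode script property, so this leans on a heuristic (the first word of each character's Unicode name, e.g. "LATIN" or "CYRILLIC"); a real implementation would presumably use proper script data. The names `script_of` and `mixed_script_labels` are made up for illustration.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Heuristic script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_labels(host: str) -> dict[str, list[str]]:
    """Map each dot-separated label that mixes scripts to the scripts it uses."""
    flagged = {}
    for label in host.split("."):
        scripts = {script_of(ch) for ch in label if ch.isalpha()}
        if len(scripts) > 1:
            flagged[label] = sorted(scripts)
    return flagged


print(mixed_script_labels("conn\u0435ct.mozilla.org"))
print(mixed_script_labels("connect.mozilla.org"))  # {}
```

Checking per label means a fully Cyrillic name under a Latin TLD is not flagged, which keeps legitimate internationalized domains quiet. The flip side is that a label written entirely in lookalikes from one script would slip through.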

In this case it's important that two constraints are respected: the flagging has to be opt-in, and it has to happen regardless of context. Otherwise we get into the problem of trying to mind-read the user and going into "did you just assume my language?".

 

The next closest approach would be to flag characters in the href of URLs that don't match the lang="" attribute of their inherited context. But this requires, of course, buy-in from page developers, since they'd have to properly announce content language at the level of e.g. <div> and <span> elements.
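To make the lang="" idea concrete, here is a rough sketch with Python's `html.parser`. Everything in it is an assumption for illustration: the `EXPECTED` table of scripts per language, the class name, and the simplification of checking only the host part of the href. It also assumes well-formed markup where every non-void element is closed.

```python
import unicodedata
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical table: scripts a reader would expect for a given lang="" value.
# Latin appears everywhere because most TLDs are ASCII.
EXPECTED = {"en": {"LATIN"}, "de": {"LATIN"}, "fr": {"LATIN"}, "ru": {"CYRILLIC", "LATIN"}}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def host_scripts(url: str) -> set[str]:
    """Heuristic: first word of each letter's Unicode name in the host."""
    host = urlsplit(url).hostname or ""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in host if ch.isalpha()}


class LangAwareLinkChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lang_stack = [None]  # inherited lang values, innermost last
        self.flagged = []         # (href, lang, unexpected scripts)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        lang = attrs.get("lang") or self.lang_stack[-1]
        href = attrs.get("href")
        if tag == "a" and href and lang:
            primary = lang.split("-")[0].lower()
            if primary in EXPECTED:
                unexpected = host_scripts(href) - EXPECTED[primary]
                if unexpected:
                    self.flagged.append((href, lang, sorted(unexpected)))
        if tag not in VOID:
            self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.lang_stack) > 1:
            self.lang_stack.pop()
```

Feeding it a page with `<div lang="en">` around a link to the Cyrillic-e host would record that link, while the same kind of link inside `<span lang="ru">` would pass. Pages that never declare a language produce no flags at all, which is exactly the buy-in problem.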

And then we get to the UI part: what do we mean by "flagging"? We can't add any effect that could be imitated (or undone!) with CSS, such as bolding the links or changing the color, because that alters presentation, is not reproducible, and adds a point for fingerprinting. Any flagging would have to be done at the chrome level, which means either in the toolbars or in the link's contextual menu (in which case the warning is only accessible if the user context-clicks instead of action-clicks).

 

 

Ironphoenix
New member

I think the relevant context is extremely local: consider the adjacent characters as the relevant alphabet.

As for flagging, a popup warning when one action-clicks on the link would be my personal preference, at least on opt-in, and maybe even as a default setting with an option in the pop-up to disable the feature. I agree that people could probably find a way to neutralize an in-page flag.

Visibly different characters are less of an issue: https://connéct.mozilla.org/ is at least noticeable to someone paying attention, just like https://conncet.mozilla.org/ . The é is at home amid the Latin alphabet, и obviously isn't (e.g. https://coииect.mozilla.org/ ), but something that looks like e is visually ambiguous.
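Here is a rough sketch of the "adjacent characters" idea combined with a lookalike index. The six-pair table is a tiny illustrative sample, not a real confusables list, and the script test is a heuristic based on Unicode character names. A character is reported only if it has a known twin in another alphabet and every neighbouring letter belongs to a different script than it does.

```python
import unicodedata

# Illustrative sample only: (Latin, Cyrillic) pairs that render alike.
LOOKALIKE_PAIRS = [("a", "\u0430"), ("c", "\u0441"), ("e", "\u0435"),
                   ("o", "\u043e"), ("p", "\u0440"), ("x", "\u0445")]
LOOKALIKES = {ch: twin for a, b in LOOKALIKE_PAIRS for ch, twin in ((a, b), (b, a))}


def script_of(ch: str) -> str:
    """Heuristic script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def out_of_place(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, expected twin) for lookalikes foreign to their neighbours."""
    hits = []
    for i, ch in enumerate(text):
        if ch not in LOOKALIKES:
            continue
        neighbours = [n for n in text[max(i - 1, 0):i] + text[i + 1:i + 2] if n.isalpha()]
        if neighbours and all(script_of(n) != script_of(ch) for n in neighbours):
            hits.append((i, ch, LOOKALIKES[ch]))
    return hits


print(out_of_place("conn\u0435ct"))         # Cyrillic е between Latin letters
print(out_of_place("co\u0438\u0438ect"))    # [] since и has no Latin twin in the table
```

The same function catches the reverse case (a Latin e inside Привет) because the table is symmetric, and it leaves é alone because é is still a Latin letter.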

luis123456789
Making moves

Hmmm, so a pre-landing page... and now I remember that Firefox has pre-landing pages for domains that are flagged by Google's "Safe" Browsing program. So yeah, that makes sense: it should be doable and doesn't get in the way of the referer view or structure.

As for what information the pre-landing page would show, I can think of at least five things of relevance to choose from, but maybe with some checks because two of those require an external connection (privacy, tracking, etc.):

  • the domain name, as it was obtained from the DNS record (so likely the punycode version)
  • what DNS resolver delivered it to help identify attacks or simply wrong/outdated information (eg.: "you were sent here from [your ISP's resolver]")
  • the list of scripts used to render the domain name (e.g. "this domain name uses characters in Latin, Greek and Cherokee")
  • a view of the WHOIS record for the domain ("Owned by Connect Badpeople Cryptoscams dot com")
  • a view of the WHOIS record of the domain that would match a simplified / "ASCIIfied" view (e.g. "owned by Mozilla Goodpeople dot org" for connect.mozilla.org instead of coииect.mozilla.org)
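The parts of that list that need no network access can be sketched as below. The `CONFUSABLES` table is a four-entry illustration (including the и to n mapping from the coииect example), the `PreLandingInfo` structure and function names are hypothetical, and the resolver and WHOIS items are left out on purpose because they require external connections.

```python
import unicodedata
from dataclasses import dataclass

# Illustrative sample only; a real table would be far larger.
CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0438": "n"}


def asciify(host: str) -> str:
    """Strip diacritics and map known lookalikes to a guessed Latin form."""
    out = []
    for ch in unicodedata.normalize("NFKD", host):
        if unicodedata.combining(ch):
            continue  # drop the diacritic, keep the base letter
        out.append(CONFUSABLES.get(ch, ch))
    return "".join(out)


@dataclass
class PreLandingInfo:
    punycode_host: str    # the name as it goes to DNS
    scripts: list[str]    # heuristic script names used in the host
    idealized_host: str   # the "ASCIIfied" guess to compare against


def pre_landing_info(host: str) -> PreLandingInfo:
    scripts = {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in host if ch.isalpha()}
    return PreLandingInfo(
        punycode_host=host.encode("idna").decode("ascii"),
        scripts=sorted(scripts),
        idealized_host=asciify(host),
    )


info = pre_landing_info("co\u0438\u0438ect.mozilla.org")
print(info.scripts)         # ['CYRILLIC', 'LATIN']
print(info.idealized_host)  # connect.mozilla.org
```

The idealized host is only a guess at what the user meant; it is the input for the two WHOIS comparisons, not a verdict on its own.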

On when / if to deliver the pre-landing:

It might be a good idea to make it configurable: not just enable/disable, but also giving the user some latitude in choosing which character relationships are "trustable". I figure this would partly be set by the user's Accept-Language and partly by about:config? (because I can't think of a simple way to "demonstrate" this kind of UI)

For example, where I am it wouldn't make sense to use e.g. emojis in URLs, in particular in the domain part. It *would* be unexpected to find Greek, diacritic Latin ("Latin-D", "Latin-E") or Georgian, but if they showed up they'd likely be more trustable than Cyrillic, Armenian or fullwidth CJK, given the current technopolitical climate, so I'd certainly like some forewarning about those. However, as time marches on I want to be able to update my model of trust and the way Firefox interacts with it.
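One way such a trust model could be derived is sketched here: start from the languages in Accept-Language, look up the scripts each one implies, and let the user add extra trusted scripts on top. The `LANGUAGE_SCRIPTS` table, the function names, and the `extra_trusted` parameter (standing in for an about:config setting) are all assumptions for illustration.

```python
import unicodedata

# Hypothetical mapping from primary language tag to scripts the user would expect.
LANGUAGE_SCRIPTS = {
    "en": {"LATIN"}, "de": {"LATIN"}, "es": {"LATIN"},
    "ru": {"CYRILLIC", "LATIN"}, "el": {"GREEK", "LATIN"},
}


def parse_accept_language(header: str) -> list[str]:
    """Return primary language tags from an Accept-Language value, ignoring q-values."""
    langs = []
    for part in header.split(","):
        tag = part.split(";")[0].strip().lower()
        if tag and tag != "*":
            langs.append(tag.split("-")[0])
    return langs


def untrusted_scripts(host: str, accept_language: str, extra_trusted=frozenset()) -> list[str]:
    """Scripts used in the host that the user's settings do not cover."""
    trusted = set(extra_trusted)
    for lang in parse_accept_language(accept_language):
        trusted |= LANGUAGE_SCRIPTS.get(lang, set())
    used = {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in host if ch.isalpha()}
    return sorted(used - trusted)


print(untrusted_scripts("conn\u0435ct.mozilla.org", "de,en;q=0.8"))  # ['CYRILLIC']
print(untrusted_scripts("conn\u0435ct.mozilla.org", "ru,en;q=0.5"))  # []
```

The second call shows the limit of a purely language-based model: a Russian-speaking user gets no warning for the Cyrillic е, so this would need to be combined with a mixed-set or lookalike check rather than replace it.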

 

Ironphoenix
New member

Sounds like a decent starting point; thanks!

It would be great if some of you folks who regularly use legitimate non-basic-Latin URLs could weigh in on what y'all would want this to look like.

luis123456789
Making moves

I mean, there are two different "levels" of what one expects to see or not see. One is URI paths, where it is very likely that if one path in the tree could produce one internationalized link, then it'd produce lots more (e.g. Wikipedia article names, which all start with the same prefix). The other is domains, where if an internationalized / mixed-set name is provided, you just have to deal with The One (after all, if the domain name is already suspect, any path inside it is as well).

 

For URI paths, query strings, etc..., I'd expect an i18n / mixed set link to show exactly as it's written in the language it's written. If I go to https://es.wikipedia.org/wiki/Ñandú , I certainly expect to see that and not eg.: https://es.wikipedia.org/wiki/%nglkrwgb%39u5%9u23%u5%93u5%y235%32;and%92u592352 or somesuch crap - that looks 325% less trustable. Or worse, "font not available" squares.

More importantly: what I'm seeing in the address bar should correspond to what I'm seeing on the statusbar of a link elsewhere leading to that page. Of course that means that if the link I'm being given is URL-encoded, then the *link* should be shown to me just like that: %992957235wiggles, not "Ñandú". An important precept for web security is that we show the user exactly what we're telling them (and thus exactly what we're being told).
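For reference, this is what the two forms of that path actually look like, using `quote` and `unquote` from Python's `urllib.parse`. The percent-encoded strings earlier in this post are deliberately exaggerated; the real encoding is just the UTF-8 bytes of each non-ASCII character.

```python
from urllib.parse import quote, unquote

readable = "https://es.wikipedia.org/wiki/\u00d1and\u00fa"  # .../wiki/Ñandú
encoded = quote(readable, safe=":/")  # keep the scheme colon and the slashes

print(encoded)                       # https://es.wikipedia.org/wiki/%C3%91and%C3%BA
print(unquote(encoded) == readable)  # True
```

Both strings name the same resource, so the question in this thread is purely which one the address bar and the status bar show, and that they show the same one.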

If a link landed me there and I had opted-in to a preference to raise notice of mixed-set or internationalized alphabets in URL paths, I'd expect to see a notice badge added to the Site Information / Padlock visual area. Let's call this preference "security.uri.mixed-characters.warn-on-links" or something. As for what counts as "mixed sets" in this scenario, it could start with any mixed set not within the user's set Languages: if I'm browsing or searching in German, I *expect* to see German diacritics and stuff.

The notification, when opened, should show something like "the current URL was provided with a mixed character set, verify that you intended to land here and not in ${IDEALIZED_LATINIZED_LINK}".

The notification in Site Information should also include an option to whitelist either the parent path or the entire (sub)domain. After all, for sites such as eg.: Wikipedia, I'd rather NOT have to manually whitelist every Español article page ever.  If I go to Español Wikipedia and find one legit link with Español diacritics, I'm likely to find *more* legit links with Español diacritics, after all.
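The exception list could be as simple as host plus path-prefix pairs, roughly like this sketch. The entry format and the name `is_allowlisted` are assumptions for illustration; an entry with the prefix "/" covers the entire (sub)domain, while a longer prefix covers only that parent path.

```python
from urllib.parse import urlsplit


def is_allowlisted(url: str, allowlist: list[tuple[str, str]]) -> bool:
    """True if the URL's host matches an entry and its path starts with that entry's prefix."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return any(host == entry_host and parts.path.startswith(prefix)
               for entry_host, prefix in allowlist)


articles_only = [("es.wikipedia.org", "/wiki/")]
whole_site = [("es.wikipedia.org", "/")]

print(is_allowlisted("https://es.wikipedia.org/wiki/\u00d1and\u00fa", articles_only))  # True
print(is_allowlisted("https://es.wikipedia.org/w/index.php", articles_only))          # False
print(is_allowlisted("https://es.wikipedia.org/w/index.php", whole_site))             # True
```

Matching the host exactly, rather than by suffix, means an exception for es.wikipedia.org does not silently extend to other subdomains.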

 

For domains, however, the situation is a little bit different.

If I didn't know it was safe and I saw it in the addressbar, I'd expect to see it exactly as I typed it or pasted it (e.g. " https://www.ñandú.cl/ ", or more properly for this exercise, " https://connеct.mozilla.org/ " with the weird-e), and then have a site badge or an indicator in the Site Information / Padlock area of the addressbar.

In addition to that, if I had opted in to a preference to be pre-landed on a warning page raised for mixed-set or i18n alphabets in domain names, similar to e.g. antiphishing measures, I'd expect to see the landing page that I discussed in my previous post. Once again, the reminder that what counts as a mixed set is decided by the user by setting their Languages, not by Firefox trying to be Too Anglo. The landing page should provide a general warning and then offer the option to, via a button or another user-initiated action, initiate a query to third parties for the DNS and WHOIS resolutions of both the current domain and the "idealized" domain, so that the user can compare. Besides, by being a user-initiated action, this respects user privacy and is far less likely to be blocked or disabled by e.g. Arkenfox, Librewolf, etc.

Once again, regardless of the landing page, I should still see the Site Information / Padlock notice in this case. Of course, with an option to add an exception.