Ironphoenix
New member
Status: New idea

Bad actors are using look-alike URLs in email, together with browsers' automatic encoding, to direct people to malicious websites. A safe example is https://connеct.mozilla.org , which transforms to https://xn--connct-6of.mozilla.org/ , which doesn't exist. The Roman e is replaced with the Cyrillic е, which looks identical (side by side: eеeеeе). Browsers could reduce the risk of people getting caught by this with a setting that disables this encoding by default, offering users the option of enabling it either once, when they click on a link with disallowed characters, or as a permanent setting.
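For anyone who wants to see the transformation for themselves, here is a minimal sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules; browsers may apply newer ones). The constant and function names are just illustrative.

```python
# Sketch: how a hostname containing a Cyrillic look-alike becomes Punycode.
# \u0435 is CYRILLIC SMALL LETTER IE, written as an escape so it stays visible.
LOOKALIKE_HOST = "conn\u0435ct.mozilla.org"
GENUINE_HOST = "connect.mozilla.org"


def to_ascii(hostname: str) -> str:
    """Encode each label of the hostname with Python's built-in IDNA codec."""
    return hostname.encode("idna").decode("ascii")


print(to_ascii(LOOKALIKE_HOST))        # xn--connct-6of.mozilla.org
print(to_ascii(GENUINE_HOST))          # connect.mozilla.org
print(LOOKALIKE_HOST == GENUINE_HOST)  # False, despite looking the same
```

The all-ASCII name passes through unchanged, so only the look-alike ends up with an `xn--` label.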

Thanks, and best regards,

Mike B.

7 Comments
Status changed to: New idea
Jon
Community Manager

Thanks for submitting an idea to the Mozilla Connect community! Your idea is now open to votes (aka kudos) and comments.

luis123456789
Making moves

How about no?

It has already been discussed ad nauseam that showing non-Latin links as "weird" by default promotes a US-centric internet and turns internationalized domains and communities into second-class citizens, among other racist effects. English is but one language in the world; it's not even close to a majority.

 

I'm not fully sure what a better UI for showing the links would be, considering it's standard HTML that links can and should be decorated (i.e. an <a> element with an inner HTML description). Probably a tooltip with a basic WHOIS fetch, but that would require an external connection to a third party. As for the pages proper, this can be solved by Page Info: since the shield icon with page info is already shown for all pages, there's no reason not to place domain info there.

Ironphoenix
New member

How about thinking a bit more, then? I agree that (further) privileging anglophones is not the way to go, but there are ways to generalize the idea.

Flagging anomalous characters could be context-dependent, e.g. warning for a Roman e in a Cyrillic context (Привет and Привeт being indistinguishable as well: the second uses a Roman e instead of the Cyrillic е). An index of lookalikes and expected context would not be too hard to set up, I think. It would also be good to handle hidden diacritics, such as a combining dot above (U+0307) with an i.
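As a rough illustration of what such an index could look like, here is a sketch in Python. The `LOOKALIKES` table is a hypothetical, deliberately tiny sample (a real index would be far larger), and the function names are made up for this example. Treating any combining mark that survives NFC normalization as "hidden" is a simplifying assumption, not a rule from any standard.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike index: Cyrillic letter -> Latin twin.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(text: str) -> str:
    """Fold known look-alikes to one representative and drop leftover combining marks."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(
        LOOKALIKES.get(ch, ch) for ch in composed if not unicodedata.combining(ch)
    )


def hidden_marks(text: str) -> list[str]:
    """Name the combining marks that remain after NFC, e.g. U+0307 after an i."""
    composed = unicodedata.normalize("NFC", text)
    return [unicodedata.name(ch) for ch in composed if unicodedata.combining(ch)]


def is_spoof_of(candidate: str, genuine: str) -> bool:
    """True when two different strings collapse to the same skeleton."""
    return candidate != genuine and skeleton(candidate) == skeleton(genuine)


print(is_spoof_of("conn\u0435ct", "connect"))  # True
print(hidden_marks("i\u0307"))                  # ['COMBINING DOT ABOVE']
```

Because both spellings are folded to the same representative, the comparison works in either direction: a Roman e hiding in a Cyrillic word is caught the same way as a Cyrillic е hiding in a Latin one.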

luis123456789
Making moves

The idea of flagging anomalous characters is not bad in a general sense. The problem is that the context is usually not decided by the browser's or the page's set language: if your browser is set to English and German, you are reading Wikipedia in German, and they quote a section in French which includes a link with French characters, should that be marked anomalous? Most likely it shouldn't (unless you don't trust Wikipedia...).

Another option which I've seen discussed in the past is an opt-in to flagging links that have characters from mixed character sets regardless of context. This would in particular help with the case of hidden diacritics, because otherwise "untangling" the text into e.g. "·i" would produce a web render that, for reproducibility and standardization purposes, is wrong.

In this case it's important that two constraints are respected: the flagging has to be opt-in, and it has to happen regardless of context. Otherwise we get into the problem of trying to mind-read the user and going into "did you just assume my language?".
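A minimal sketch of that kind of context-free, opt-in check, in Python. Python's standard library has no Unicode script property, so this guesses the script from the first word of each character's Unicode name; that heuristic, and the function names, are assumptions made for illustration only.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess: first word of the Unicode character name (heuristic)."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def label_scripts(label: str) -> set[str]:
    """Scripts of the letters in a label; combining marks count as 'COMBINING'."""
    return {
        script_of(ch)
        for ch in label
        if ch.isalpha() or unicodedata.combining(ch)
    }


def mixed_script_labels(hostname: str, opt_in: bool = False) -> dict[str, list[str]]:
    """Return labels that mix scripts; does nothing unless the user opted in."""
    if not opt_in:
        return {}
    flagged = {}
    for label in hostname.split("."):
        scripts = label_scripts(label)
        if len(scripts) > 1:
            flagged[label] = sorted(scripts)
    return flagged


# The page language and browser locale are never consulted: only the link itself.
print(mixed_script_labels("conn\u0435ct.mozilla.org", opt_in=True))
```

Each label is checked on its own, so a fully Cyrillic label followed by a Latin top-level domain is not reported, while a stray combining mark is.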

 

The next closest approach would be to flag characters in the href of URLs that don't match the lang="" attribute of their inherited context. But this requires, of course, buy-in from page developers, since they'd have to properly announce content language at the level of individual elements such as <div> or <span>.
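To make the lang="" idea concrete, here is a sketch in Python built on the standard `html.parser` module. The `EXPECTED_SCRIPTS` table mapping language codes to scripts is hypothetical and incomplete, the script guess uses the first word of each character's Unicode name (a heuristic), and `LangLinkChecker` is an invented name. All-ASCII labels are never flagged.

```python
import unicodedata
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical mapping from a primary language subtag to its expected scripts.
EXPECTED_SCRIPTS = {"en": {"LATIN"}, "de": {"LATIN"}, "fr": {"LATIN"},
                    "ru": {"CYRILLIC"}, "el": {"GREEK"}}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def scripts_in(text: str) -> set[str]:
    """Heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split(" ")[0] for ch in text if ch.isalpha()}


class LangLinkChecker(HTMLParser):
    """Collects links whose hostname uses scripts unexpected for the inherited lang."""

    def __init__(self) -> None:
        super().__init__()
        self.lang_stack = [None]  # innermost inherited lang is last
        self.flagged = []         # (href, lang, unexpected scripts)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        lang = attrs.get("lang") or self.lang_stack[-1]
        if tag == "a" and attrs.get("href"):
            self._check(attrs["href"], lang)
        if tag not in VOID_TAGS:
            self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def _check(self, href, lang):
        expected = EXPECTED_SCRIPTS.get((lang or "").split("-")[0].lower())
        if expected is None:
            return  # no announced or known language: nothing to compare against
        host = urlsplit(href).hostname or ""
        unexpected = set()
        for label in host.split("."):
            if not label.isascii():
                unexpected |= scripts_in(label) - expected
        if unexpected:
            self.flagged.append((href, lang, sorted(unexpected)))


PAGE = ('<div lang="de"><p>Text<br><a href="https://conn\u0435ct.mozilla.org/">Mozilla</a></p>'
        '<blockquote lang="ru"><a href="https://\u043f\u0440\u0438\u043c\u0435\u0440.example/">x</a>'
        '</blockquote></div>')
checker = LangLinkChecker()
checker.feed(PAGE)
print(checker.flagged)
```

Only the first link is reported: a Cyrillic letter under lang="de" is unexpected, while the Cyrillic hostname inside the lang="ru" quote is not. Without any lang attribute the checker stays silent, which is exactly the buy-in problem described above.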

And then we get to the UI part: what do we mean by "flagging"? We can't add any effect that could be imitated (or undone!) with CSS, such as bolding the links or changing the color, because that alters presentation, is not reproducible, and adds a point for fingerprinting. Any flagging would have to be done at the chrome level, which means either in the toolbars or in the link's contextual menu (in which case the warning is only accessible if the user context-clicks instead of action-clicks).

Ironphoenix
New member

I think the relevant context is extremely local: consider the adjacent characters as the relevant alphabet.

As for flagging, a popup warning when one action-clicks on the link would be my personal preference, at least on opt-in, and maybe even as a default setting with an option in the pop-up to disable the feature. I agree that people could probably find a way to neutralize an in-page flag.

Visibly different characters are less of an issue: https://connéct.mozilla.org/ is at least noticeable to someone paying attention, just like https://conncet.mozilla.org/ . The é is at home amid the Latin alphabet, и obviously isn't (e.g. https://coииect.mozilla.org/ ), but something that looks like e is visually ambiguous.
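Here is a rough sketch, in Python, of what "adjacent characters as the relevant alphabet" could mean in practice. The window size of two characters on each side and the function names are assumptions made for this example, and the script guess again relies on the first word of each character's Unicode name, which is only a heuristic.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Rough script guess: first word of the Unicode character name (heuristic)."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def out_of_place(label: str, radius: int = 2) -> list[tuple[int, str]]:
    """Flag letters whose script is outnumbered within their local window."""
    flagged = []
    for i, ch in enumerate(label):
        if not ch.isalpha():
            continue
        # Window = the letter itself plus up to `radius` characters on each side.
        window = label[max(0, i - radius): i + radius + 1]
        counts = Counter(script_of(c) for c in window if c.isalpha())
        # Strictly outnumbered only, so a tie never flags anything.
        if counts[script_of(ch)] < max(counts.values()):
            flagged.append((i, ch))
    return flagged


print(out_of_place("conn\u0435ct"))       # [(4, 'е')]  the Cyrillic look-alike
print(out_of_place("co\u0438\u0438ect"))  # [(2, 'и'), (3, 'и')]
print(out_of_place("conn\u00e9ct"))       # []  é is at home in Latin
```

A check like this points at the specific odd character, but it cannot catch a label written entirely in one look-alike script, so it would complement rather than replace a look-alike index.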

luis123456789
Making moves

Hmmm so a pre-landing page... and now I remember that Firefox has pre-landing pages for domains that are warned about by Google's "Safe" Browsing program. So yeah, that makes sense: it should be doable and doesn't get in the way of the referer view or structure.

As for what information the pre-landing page would show, I can think of at least five things of relevance to choose from, though maybe with some checks, because two of those require an external connection (privacy, tracking, etc.):

  • the domain name, as it was obtained from the DNS record (so likely the punycode version)
  • which DNS resolver delivered it, to help identify attacks or simply wrong/outdated information (e.g. "you were sent here from [your ISP's resolver]")
  • the list of scripts used to render the domain name (e.g. "this domain name uses characters in Latin, Greek and Cherokee")
  • a view of the WHOIS record for the domain ("Owned by Connect Badpeople Cryptoscams dot com")
  • a view of the WHOIS record of the domain that would match a simplified / "ASCIIfied" view (e.g. "owned by Mozilla Goodpeople dot org" for connect.mozilla.org instead of coииect.mozilla.org)

On when / if to deliver the pre-landing:

It might be a good idea to make it configurable: not just enable/disable, but also giving the user some latitude in choosing which character relationships are "trustable". I figure this would partly be set by the user's Accept-Language and partly by about:config? (because I can't think of a simple way to "demonstrate" this kind of UI)

For example, where I am it wouldn't make sense to use e.g. emojis in URLs, in particular in the domain part. It *would* be unexpected to find Greek, diacritic Latin ("Latin-D", "Latin-E") or Georgian, but if they showed up they'd likely be more trustable than Cyrillic, Armenian or fullwidth CJK, given the current technopolitical climate, so I'd certainly like some forewarning about those. However, as time marches on I want to be able to update my model of trust and the way Firefox interacts with it.

 

Ironphoenix
New member

Sounds like a decent starting point; thanks!

It would be great if some of you folks who regularly use legitimate non-basic-Latin URLs could weigh in on how y'all would want this to look.