?

User tests: Successful: Unsuccessful:

avatar smz
smz
21 May 2015

This fixes regular expressions used for email validation (see: #6982)


Joomla doesn't allow ' char in user email, JFormRuleEmail pattern has ` but not '

It is allowed to have an apostrophe ' in the e-mail address by the RF5322 standard as long as it is an ASCII character.

Irish email as example: o'sullivan@example.com

avatar smz smz - open - 21 May 2015
avatar zero-24 zero-24 - change - 21 May 2015
Category JavaScript Libraries
avatar zero-24 zero-24 - change - 21 May 2015
The description was changed
Status New Pending
avatar zero-24 zero-24 - change - 21 May 2015
Labels Added: ?
avatar smz
smz - comment - 21 May 2015

@zero-24 (I suppose...) Thanks for fixing my initial comment! I anyway suggest to use example.com as an example...

avatar smz
smz - comment - 21 May 2015

oops... I can't understand what Travis is complaining for...

avatar smz smz - change - 21 May 2015
The description was changed
avatar smz
smz - comment - 22 May 2015

nope... it doesn't work: problem in validate.js. Working on it, please stand-by... sorry!

avatar smz
smz - comment - 22 May 2015

Ready for testing. Sorry for the initial (hopfully...) f##k-up.

avatar bertmert
bertmert - comment - 22 May 2015

I tried to add a new user in backend and frontend. Tested emails

hello're@example.org
o'sullivan@example.com

Result is

Error
Save failed with the following error: Please enter a valid email address

Warning
Registration failed: Please enter a valid email address.

I think it's the PHP validation because page reloads after form submit and then message occurs.

avatar smz
smz - comment - 22 May 2015

@bertmert
after applying the patch, please refresh the JS by shift+reload... (or clear the browser cache)...

avatar smz
smz - comment - 22 May 2015

btw... where are you testing? I tested in com_contacts...

avatar smz
smz - comment - 22 May 2015

silly me: you already stated where you are testing! let me try...

avatar smz
smz - comment - 22 May 2015

You're right: it doesn' work with com_users (it works with com_contacts, anyway...) I have to find out the reason, but now I'm leaving for some hours. Please stand-by...

avatar smz
smz - comment - 22 May 2015

even in com_contact JS validation is OK: o'sullivan is accepted (black field), o"sullivan is rejected (red field), but then it doesn't save... must be something within the component itself... will check later... sorry.

avatar bertmert
bertmert - comment - 22 May 2015

I'll send you a PR to branch .smz:EmailValidation.

There's another validation in libraries/joomla/mail/helper.php::isEmailAddress line 130

$allowed = 'a-zA-Z0-9.!#$%&’*+\/=?^_`{|}~-';

called by libraries/joomla/table/user.php::check line 198

Changed line to

$allowed = 'a-zA-Z0-9.!#$%&\'*+\/=?^_`{|}~-';

and all works now.

avatar Denitz
Denitz - comment - 22 May 2015

I see BC issue:
Initial 'grave accent' char ` (0x60) was replaced with 'single quote' ' (0x27), so all current emails with 0x60 will be invalidated and users won't be able to save profiles.

We should add new allowed char (single quote) to patterns but not replace any existing chars.

avatar smz
smz - comment - 22 May 2015

@bertmert thanks for your PR for fixing validation in com_user! Merged.
Actually I think JMailHelper::isEmailAddress() needs more fixes as it allows invalid characters in the domain part of email addresses. I'll look into that and see if it is worth fixing...

(oops... while I'm writing this, I see Travis failed after merging your PR... will look into that...)

@Denitz IMHO this is not breaking B/C: this is fixing a bug introduced by someone who used the wrong apostrophe (my bet is that he/she was using a Mac while editing...). Try register an email address containing the grave accent char with Gmail, outlook.com, Yahoo or any other major email provider and see if you can...

avatar smz
smz - comment - 22 May 2015

Travis is failing because of this:

There were 2 failures:
1) JMailHelperTest::testIsEmailAddress with data set #19 ('o\'reilly@there.com', false)
Failed asserting that true matches expected false.
/home/travis/build/joomla/joomla-cms/tests/unit/suites/libraries/joomla/mail/JMailHelperTest.php:301
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:104

2) JMailHelperTest::testIsEmailAddress with data set #20 ('o’reilly@there.com', true)
Failed asserting that false matches expected true.
/home/travis/build/joomla/joomla-cms/tests/unit/suites/libraries/joomla/mail/JMailHelperTest.php:301
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:104

IMHO it is a mistake in test unit (same mistake we had in code).
I'm calling PLT to see what we should do...

avatar smz
smz - comment - 22 May 2015

Some further considerations (food for thought...):

  • in the phpmailer library there is a similar validation regex. Although a bit more sophisticated (in the domain checking part) it is compatible with the one introduced with this PR. The previous regex we used wasn't.
  • I think actually both our regex and the phpmailer one are wrong: quoted strings are allowed in email address (e.g.: "John Doe"@example.com)
  • some think email addresses should not be checked for RFC compliance: see http://girders.org/blog/2013/01/31/dont-rfc-validate-email-addresses/
avatar smz
smz - comment - 22 May 2015

More information:

The HTML5 standard defines the rule for validating the <input type=email> element with the following PCRE regexp:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

According to the standard itself, the above is not a strict implementation of the relevant RFC5322:

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

I think the more sensible solution for us is to follow the HTML5 standard and implemant the same validation rules. This is, I guess, particullay true for media/system/js/html5fallback.js.

A further consideration is that we should be consistent with that (or any other) validation rule: if we use a rule in our JS code, we cannot use a different rule in our PHP code.

Due to the above, I will soon submit to this PR a commit in line with the above considerations (it will fail with Travis for the same reasons it is failing right now).

Also: in libraries/joomla/form/rule/email.php in theory there is provision for validating only the local-part of an email address (i.e: without the domain part), but this wasn't actually implemented. I'm unsure what to do with that... In the meanwhile I will do what was previously done, that is always check for a full email address.

If you think there is anything wrong with the above, please let me know...

avatar dgt41
dgt41 - comment - 23 May 2015

if we use a rule in our JS code, we cannot use a different rule in our PHP code

I don’t think I agree with that. JS validation is to check (coarse) that an email address is provided and PHP should make sure that this address actually exists etc

avatar smz
smz - comment - 23 May 2015

Dimitris, our PHP code has never checked that an address actually exist (not even the domain): it has always done a formal check of the email address, the as JS does.

avatar dgt41
dgt41 - comment - 23 May 2015

Sergio, I just stated what my personal preference is here. It’s not something that should prevent you from doing you think is best.
But here is some infos that support this:
For js regex whatever you will come up to I don’t think its gonna be bulletproof https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php
For php this is easier: http://isemail.info

avatar dgt41
dgt41 - comment - 23 May 2015

@test works for o’sulivan@yahoo.gr

avatar smz
smz - comment - 23 May 2015

I know, Dimitris, I know! I didn't took it as a "personal thing" at all!

That regexp is only what HTML5 does and it is the same as our previous one (with the single-quote bug corrected) as far as regards the local-part, and a bit more sophisticated in the domain part.

That "bit more" of sophistication will also allow us to simplify a lot isEmailAddress in libraries/joomla/mail/helper.php (which right now validate the domain-part piece by piece)

b0fbc5d 23 May 2015 avatar smz YANM!
avatar smz
smz - comment - 23 May 2015

Now that I fixed my noob's mistake (what a shame!), the behavior is as before: it works, but it fails Travis unit tests (which IMHO are wrong).

Next commit will be for JMailHelper::isEmailAddress() in libraries/joomla/mail/helper.php

I decided to keep it separate in case there is any concern about it: it will be easily revertable.

avatar smz
smz - comment - 23 May 2015

Added fix for unit test, in case PLT agrees... (@zero-24, @wilsonge, @mbabker, or anybody else from PLT may have a look at this, please?)

Ready for testing.

avatar infograf768
infograf768 - comment - 23 May 2015

Grave accent IS authorized in email addresses, not acute accent.
I agree with @Denitz : taking it off would not be B/C
(There is no way a coder on Macintosh can confuse grave accent with single quote)

avatar smz
smz - comment - 23 May 2015

@infograf768

Grave accent IS authorized in email addresses.

Sorry, JM, but I don't think it is. Beside the HTML5 standard that I already linked, please have a look at http://en.wikipedia.org/wiki/Email_address#Local_part

Anyway, if there is a consensus we should allow it because we previously did, I'll correct the regexp. This will not be consistent with what the phpmailer library (that we use) does, so there may be situations where such an address will not be deliverable by phpmailer.

The unit test should anyway be corrected to allow the single quote...

(There is no way a coder on Macintosh can confuse grave accent with single quote)

I think (but I'm unsure) I've seen that happening not because of the keyboard but because of automatic translation by some Macintosh program (Safari?)

avatar dgt41
dgt41 - comment - 23 May 2015

@infograf768 Sergio is right that macs are messing around with the quoting. take a look at use smart quotes and dashes in control panel, it’s also the default behavior and most people don’t even know about it!
screenshot 2015-05-23 14 58 11

avatar smz
smz - comment - 23 May 2015

Doubts, damn doubts...

RFC 3969, which is more readable than RFC 5321 states the following:

Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters

! # $ % & ' * + - / = ? ^ _ ` . { | } ~

and thus grave accent (ASCII 0x60) seems to be excluded, but then it continues on with:

period (".") may also appear, but may not be used to start or end the
local part, nor may two or more consecutive periods appear. Stated
differently, any ASCII graphic (printing) character other than the
at-sign ("@"), backslash, double quote, comma, or square brackets may
appear without quoting
. If any of that list of excluded characters
are to appear, they must be quoted.

... and this seems to include (ASCII 0x60)

avatar smz
smz - comment - 23 May 2015

... but it seems that it is the descriptive part to be in error. E.g., the invalid column : and semicolumn ; characters and are not cited inte "other than" part of the description.

avatar smz
smz - comment - 23 May 2015

I introduced a test for the total address length in JMailHelper::isEmailAddress() (before it wasn't tested) and corrected the maximum domain-part length.

But we have a problem: in #_users the email column is defined as varchar(100) while to correctly store every valid email address it should be varchar(255)

Now we have two options:

  • correct my test to limit the global address length to 100 (non-compliant, but probably more than enough)
  • extend the email column to be varchar(255) (actually 254 will suffice)
avatar smz
smz - comment - 23 May 2015

Ouch! I just realized I'm doing Punycode conversion for IDNA domains wrong: I'm converting the domain-part as a whole, while it should be converted part by part (label by label, techically) as it was already done in JMailHelper::isEmailAddress()

But JS validation was already making my same mistake, so i think it was impossible for our russian/chinese/japanese/etc.. friends to introduce IDNA addresses: I'd really like to hear from them about this possible issue...

serendipity, serendipity...

avatar smz
smz - comment - 24 May 2015

I've now fixed the IDNA domains validation both in validate.js and in JMailHelper::isEmailAddress().
Domains with non-roman characters that where previously refused by the JS validation are now accepted. I don't know if there was an issue already open for this: in case I think it can be closed.

I've been unable to do the same IDNA treatement in html5fallback.js. I'm also unsure when this is used and if it is important to fix it too.

To sum it up, the outstanding issues/decisions-to-be-taken are:

  • Decide if the grave accent should be accepted as a valid character in the email address local part. My vote (and my code as it is now) is against this.
  • Decide if we should extend the email column in #_users to varchar(255) or we should introduce further checking against an email address being longer than 100 chars
  • Modify unit test to allow the single quote and, in case, disallow the grave accent.
  • Decide if we should correctly treat IDNA domains also within html5fallback.js (in case I think I'll need some help...)

I hope all is well...

avatar smz
smz - comment - 24 May 2015

One bit more of information about the JS IDNA issue:

previously the whole address was "Punycoded", so user@李安.cn (user@example.cn) would become (and checked as) xn--user@-9d3hl881a.cn, while the correct IDNA representation is user@xn--49st7y.cn

The mistake not necessarily precluded the possibility to input non-roman email addresses, but the result was anyway wrong and unpredictable.

avatar infograf768
infograf768 - comment - 24 May 2015

@smz
See bestiejs/punycode.js#20
We may have to update punycode.js

avatar infograf768
infograf768 - comment - 24 May 2015

Concerning grave accent please see here:
https://tools.ietf.org/html/rfc3696#section-3

Without quotes, local-parts may consist of any combination of
   alphabetic characters, digits, or any of the special characters

      ! # $ % & ' * + - / = ?  ^ _ ` . { | } ~

avatar smz
smz - comment - 24 May 2015

Ahh!!! Now I see and, in a way, it seems both of us are right:

Grave accent (ASCII 0x60) is indeed valid and... it already is (and always has been) in my (and previous) regexp.

but...

That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)

Try looking at current code with an Hex Editor...

avatar smz
smz - comment - 24 May 2015

Concerning Punycode and "international" content in email headers:

Punycode should only be applied to domain names and never to the local part of en email address. This is a consequence of the IDNA standard, allowing international domain names (the "D" in IDNA is for Domain) without resorting to Unicode in the DNS. Of this I'm totally and positively sure.

Supporting international names in the local part of an email address is a totally different affair and requires a transition to Unicode email headers. There is a reason why in my IDNA examples above I used an ASCII "local part"...

If we want to support international (i.e. Unicode) local parts there is much more we must do at very different levels, starting from being sure we have an Unicode enabled SMTP server under the hood, to completely review our validation rules which, at this time, allows only non Unicode local-parts.

Please have a look at: http://en.wikipedia.org/wiki/International_email

avatar smz
smz - comment - 24 May 2015

@dgt41 Dimitris, does the .gr registry allows domains names in Greek? In case, do you have (or can you create) a mailbox in such a domain (possibly two: one with an ASCII local part and one with a Greek local part)?

Thanks!

avatar dgt41
dgt41 - comment - 24 May 2015

@smz Haven’t tried, TBH, give me 5mins to check!

avatar smz
smz - comment - 24 May 2015

No prob and no hurry! Take your time...

avatar dgt41
dgt41 - comment - 24 May 2015

@smz Damn all my sites use google, zoho and msn for the mail part. I am gonna need some time for this

avatar smz
smz - comment - 24 May 2015

No prob... in case you can, please let me know (direct email, so we don't litter this conversation too much...)

avatar infograf768
infograf768 - comment - 25 May 2015

That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)

Correct.

Concerning our punycode.js
#7005 (comment)

I suggest again to update it and test.

avatar infograf768
infograf768 - comment - 25 May 2015

@dgt41 Dimitris, does the .gr registry allows domains names in Greek? In case, do you have (or can you create) a mailbox in such a domain (possibly two: one with an ASCII local part and one with a Greek local part)?

We do have a Russian test domain (with a joomla site btw)
My experience is that utf8 local parts as are no good.

avatar smz
smz - comment - 25 May 2015

That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)

Correct.

So... how can we have the test unit corrected?

Concerning our punycode.js
#7005 (comment)

I suggest again to update it and test.

I see: I downloaded the latest rev (1.3.2) and now they're doing exactly the same thing I did in validate.js, so there is probably no need for us doing it there. I will test and report.

avatar smz
smz - comment - 25 May 2015

@dgt41, @infograf768
I've registered the Greek domain πανταρει.gr ("panta rei: everything flows"... isn'it it nice? Heraclitus would be happy!) and will contribute it to the community for whatever test will be needed. Punycode domain name is: xn--hxakmqrqkw.gr

As soon as possible I will set up a (Joomla, of course!) site for it and email service (which, for the time being, will not allow UTF8 local parts). I'll keep you informed.

avatar wilsonge
wilsonge - comment - 25 May 2015

I dunno how much it helps but there was a long discussion about dealing with validating emails a long time ago joomla/joomla-platform#1833

avatar smz
smz - comment - 25 May 2015

@wilsonge Thanks, interesting reading! And I like what the framework came to:

return (boolean) filter_var($email, FILTER_VALIDATE_EMAIL);

KISS code, which is normally the best solution...

I'll check if it will pass our test corpus...

I anyway see a (recurring) mistake in that discussion too: IP addresses, when used in an email address domain-part, must be enclosed within square brackets. Once again:

local-part@[192.168.0.1] <-- OK This will be taken as an IP address
local-part@192.168.0.1 <-- KO! (it is subdomain "192.168" of the domain "0" in (non existent) TLD "1")

(Yes, you can have numeric domain names (at least in the .it TLD): I used to have some of those for the telephone company I was working for...)

Do we want to this to pass? Putting an IP address in the domain-part is something normally done by spammers only...

avatar smz
smz - comment - 25 May 2015

@infograf768
You were right: punycode.js v1.3.2 fixes the handling of email addresses: no need to modify validate.js (I will anyway fix all those extra spaces in validate.js...)

Please see my EmailValidation_PunycodeJS_132 branch.
In it validate.js has a console.log() to display the "punycoded" value (it will be removed before committing to joomla-cms, in case)

In that branch I also modified /libraries/cms/html/behavior.php so that it loads punycode.min.js, i.e. the original name of the minified version of punycode.js: I think it is better to keep original file names for external libraries whenever it is possible (think of composer...). Is that OK?

avatar dgt41
dgt41 - comment - 25 May 2015

@smz just update punycode-uncompressed.js to be inline with the other scripts (my take). I guess we can’t take this file out as it will b B/C(?)

avatar smz
smz - comment - 26 May 2015

@dgt41, well... we never loaded any *-uncompressed.js file anywhere...

I think the worst (an unlikely) can happen is that some 3rd party extensions directly loads punycode.js expecting the minified version and it gets the expanded version instead...

At least another JS library in /media/system/js have now the new (and IMHO more logical) naming convention: jquery.Jcrop.min.js

avatar dgt41
dgt41 - comment - 26 May 2015

-uncompressed.js should be loading when in debug mode. But removing it or not is more a PLT decision, so don’t look at me

avatar smz
smz - comment - 26 May 2015

ah, really? -uncompressed.js are automatically loaded while in debug mode? I didn't know that, my bad! In this case I will revert my decision and leave filenames unchanged...

(we will sooner or later have to do something as we go toward a more automatic method of updating external libraries, composer or whatever...)

avatar smz
smz - comment - 26 May 2015

... fixed in my branch, but (as there is no reason to wait) I will now commit it here too..

avatar smz
smz - comment - 26 May 2015

Fixed. Thanks @dgt41!

avatar smz
smz - comment - 26 May 2015

Oh, cap! merge conflicts... why???

avatar dgt41
dgt41 - comment - 26 May 2015

validate.js changed recently, do a rebase

avatar smz
smz - comment - 26 May 2015

... and html5fallback.js as well... :unamused:

avatar smz
smz - comment - 26 May 2015

Actually only html5fallback.js was changed.

I tried a "git rebase --interactive", but I'm now in deep mud: I have a "DETACHED HEAD", I can't switch branch and I'm stuck without any possibility of doing anything.

My editor opened with the following:

pick e4140cb Fixes email validation regular expressions
pick 479ec12 Fixed regexp as PHP string
pick 3a48af0 Forgot to fix it in another place
pick 63b959c This should work!
pick c58bcde ... but this is the good one!
pick 211a568 Allow apostrophe in com_user emails
pick 226e118 New HTML5 compliant regexp
pick e9df3cb Ouch! What a noob's mistake!!
pick b0fbc5d YANM!
pick 4059299 JMailHelper::isEmailAddress()
pick 7b6cf7c Fix for unit test
pick e323b02 I missed a couple of CS glitches...
pick 4e8b731 Address length in JMailHelper::isEmailAddress()
pick 9ee840a It is useless to check domain-part length
pick f31893d Fix IDNA domains vaildation
pick ca861b1 Made validate.js safer
pick 08f527d A minor modification to validate.js
pick 105eb87 CS in validate.js
pick 5ebb8bc Updated punycode.js to rev. 1.3.2

# Rebase 110c7b1..5ebb8bc onto 110c7b1
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

... but I don't have the slightest idea of what I should do with that... :confounded:

avatar smz
smz - comment - 26 May 2015

I managed to abort the rebase, but I now need guidance on how to do it correctly... sorry, guys!

avatar smz smz - change - 26 May 2015
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2015-05-26 15:33:29
Closed_By smz
avatar smz smz - close - 26 May 2015
avatar smz
smz - comment - 26 May 2015

I closed this as I was unable to resolve conflicts. New PR is #7041

Add a Comment

Login with GitHub to post a comment