User tests: Successful: Unsuccessful:
This fixes regular expressions used for email validation (see: #6982)
Joomla doesn't allow '
char in user email, JFormRuleEmail pattern has ` but not '
It is allowed to have an apostrophe ' in the e-mail address by the RF5322 standard as long as it is an ASCII character.
Irish email as example: o'sullivan@example.com
Category | ⇒ | JavaScript Libraries |
Status | New | ⇒ | Pending |
Labels |
Added:
?
|
oops... I can't understand what Travis is complaining for...
nope... it doesn't work: problem in validate.js. Working on it, please stand-by... sorry!
Ready for testing. Sorry for the initial (hopfully...) f##k-up.
I tried to add a new user in backend and frontend. Tested emails
hello're@example.org
o'sullivan@example.com
Result is
Error
Save failed with the following error: Please enter a valid email address
Warning
Registration failed: Please enter a valid email address.
I think it's the PHP validation because page reloads after form submit and then message occurs.
btw... where are you testing? I tested in com_contacts...
silly me: you already stated where you are testing! let me try...
You're right: it doesn' work with com_users (it works with com_contacts, anyway...) I have to find out the reason, but now I'm leaving for some hours. Please stand-by...
even in com_contact JS validation is OK: o'sullivan is accepted (black field), o"sullivan is rejected (red field), but then it doesn't save... must be something within the component itself... will check later... sorry.
I'll send you a PR to branch .smz:EmailValidation.
There's another validation in libraries/joomla/mail/helper.php::isEmailAddress line 130
$allowed = 'a-zA-Z0-9.!#$%&’*+\/=?^_`{|}~-';
called by libraries/joomla/table/user.php::check line 198
Changed line to
$allowed = 'a-zA-Z0-9.!#$%&\'*+\/=?^_`{|}~-';
and all works now.
I see BC issue:
Initial 'grave accent' char ` (0x60) was replaced with 'single quote' ' (0x27), so all current emails with 0x60 will be invalidated and users won't be able to save profiles.
We should add new allowed char (single quote) to patterns but not replace any existing chars.
@bertmert thanks for your PR for fixing validation in com_user! Merged.
Actually I think JMailHelper::isEmailAddress()
needs more fixes as it allows invalid characters in the domain part of email addresses. I'll look into that and see if it is worth fixing...
(oops... while I'm writing this, I see Travis failed after merging your PR... will look into that...)
@Denitz IMHO this is not breaking B/C: this is fixing a bug introduced by someone who used the wrong apostrophe (my bet is that he/she was using a Mac while editing...). Try register an email address containing the grave accent char ’
with Gmail, outlook.com, Yahoo or any other major email provider and see if you can...
Travis is failing because of this:
There were 2 failures:
1) JMailHelperTest::testIsEmailAddress with data set #19 ('o\'reilly@there.com', false)
Failed asserting that true matches expected false.
/home/travis/build/joomla/joomla-cms/tests/unit/suites/libraries/joomla/mail/JMailHelperTest.php:301
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:104
2) JMailHelperTest::testIsEmailAddress with data set #20 ('o’reilly@there.com', true)
Failed asserting that false matches expected true.
/home/travis/build/joomla/joomla-cms/tests/unit/suites/libraries/joomla/mail/JMailHelperTest.php:301
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:104
IMHO it is a mistake in test unit (same mistake we had in code).
I'm calling PLT to see what we should do...
Some further considerations (food for thought...):
More information:
The HTML5 standard defines the rule for validating the <input type=email>
element with the following PCRE regexp:
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
According to the standard itself, the above is not a strict implementation of the relevant RFC5322:
This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
I think the more sensible solution for us is to follow the HTML5 standard and implemant the same validation rules. This is, I guess, particullay true for media/system/js/html5fallback.js
.
A further consideration is that we should be consistent with that (or any other) validation rule: if we use a rule in our JS code, we cannot use a different rule in our PHP code.
Due to the above, I will soon submit to this PR a commit in line with the above considerations (it will fail with Travis for the same reasons it is failing right now).
Also: in libraries/joomla/form/rule/email.php
in theory there is provision for validating only the local-part of an email address (i.e: without the domain part), but this wasn't actually implemented. I'm unsure what to do with that... In the meanwhile I will do what was previously done, that is always check for a full email address.
If you think there is anything wrong with the above, please let me know...
if we use a rule in our JS code, we cannot use a different rule in our PHP code
I don’t think I agree with that. JS validation is to check (coarse) that an email address is provided and PHP should make sure that this address actually exists etc
Dimitris, our PHP code has never checked that an address actually exist (not even the domain): it has always done a formal check of the email address, the as JS does.
Sergio, I just stated what my personal preference is here. It’s not something that should prevent you from doing you think is best.
But here is some infos that support this:
For js regex whatever you will come up to I don’t think its gonna be bulletproof https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php
For php this is easier: http://isemail.info
@test works for o’sulivan@yahoo.gr
I know, Dimitris, I know! I didn't took it as a "personal thing" at all!
That regexp is only what HTML5 does and it is the same as our previous one (with the single-quote bug corrected) as far as regards the local-part, and a bit more sophisticated in the domain part.
That "bit more" of sophistication will also allow us to simplify a lot isEmailAddress
in libraries/joomla/mail/helper.php (which right now validate the domain-part piece by piece)
Now that I fixed my noob's mistake (what a shame!), the behavior is as before: it works, but it fails Travis unit tests (which IMHO are wrong).
Next commit will be for JMailHelper::isEmailAddress()
in libraries/joomla/mail/helper.php
I decided to keep it separate in case there is any concern about it: it will be easily revertable.
Grave accent IS authorized in email addresses.
Sorry, JM, but I don't think it is. Beside the HTML5 standard that I already linked, please have a look at http://en.wikipedia.org/wiki/Email_address#Local_part
Anyway, if there is a consensus we should allow it because we previously did, I'll correct the regexp. This will not be consistent with what the phpmailer library (that we use) does, so there may be situations where such an address will not be deliverable by phpmailer.
The unit test should anyway be corrected to allow the single quote...
(There is no way a coder on Macintosh can confuse grave accent with single quote)
I think (but I'm unsure) I've seen that happening not because of the keyboard but because of automatic translation by some Macintosh program (Safari?)
@infograf768 Sergio is right that macs are messing around with the quoting. take a look at use smart quotes and dashes in control panel, it’s also the default behavior and most people don’t even know about it!
Doubts, damn doubts...
RFC 3969, which is more readable than RFC 5321 states the following:
Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters! # $ % & ' * + - / = ? ^ _ ` . { | } ~
and thus grave accent (ASCII 0x60) seems to be excluded, but then it continues on with:
period (".") may also appear, but may not be used to start or end the
local part, nor may two or more consecutive periods appear. Stated
differently, any ASCII graphic (printing) character other than the
at-sign ("@"), backslash, double quote, comma, or square brackets may
appear without quoting. If any of that list of excluded characters
are to appear, they must be quoted.
... and this seems to include (ASCII 0x60)
... but it seems that it is the descriptive part to be in error. E.g., the invalid column :
and semicolumn ;
characters and are not cited inte "other than" part of the description.
I introduced a test for the total address length in JMailHelper::isEmailAddress()
(before it wasn't tested) and corrected the maximum domain-part length.
But we have a problem: in #_users the email column is defined as varchar(100)
while to correctly store every valid email address it should be varchar(255)
Now we have two options:
varchar(255)
(actually 254 will suffice)Ouch! I just realized I'm doing Punycode conversion for IDNA domains wrong: I'm converting the domain-part as a whole, while it should be converted part by part (label by label, techically) as it was already done in JMailHelper::isEmailAddress()
But JS validation was already making my same mistake, so i think it was impossible for our russian/chinese/japanese/etc.. friends to introduce IDNA addresses: I'd really like to hear from them about this possible issue...
serendipity, serendipity...
I've now fixed the IDNA domains validation both in validate.js and in JMailHelper::isEmailAddress()
.
Domains with non-roman characters that where previously refused by the JS validation are now accepted. I don't know if there was an issue already open for this: in case I think it can be closed.
I've been unable to do the same IDNA treatement in html5fallback.js. I'm also unsure when this is used and if it is important to fix it too.
To sum it up, the outstanding issues/decisions-to-be-taken are:
I hope all is well...
One bit more of information about the JS IDNA issue:
previously the whole address was "Punycoded", so user@李安.cn
(user@example.cn) would become (and checked as) xn--user@-9d3hl881a.cn
, while the correct IDNA representation is user@xn--49st7y.cn
The mistake not necessarily precluded the possibility to input non-roman email addresses, but the result was anyway wrong and unpredictable.
@smz
See bestiejs/punycode.js#20
We may have to update punycode.js
Concerning grave accent please see here:
https://tools.ietf.org/html/rfc3696#section-3
Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters
! # $ % & ' * + - / = ? ^ _ ` . { | } ~
Ahh!!! Now I see and, in a way, it seems both of us are right:
Grave accent (ASCII 0x60) is indeed valid and... it already is (and always has been) in my (and previous) regexp.
but...
That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)
Try looking at current code with an Hex Editor...
Concerning Punycode and "international" content in email headers:
Punycode should only be applied to domain names and never to the local part of en email address. This is a consequence of the IDNA standard, allowing international domain names (the "D" in IDNA is for Domain) without resorting to Unicode in the DNS. Of this I'm totally and positively sure.
Supporting international names in the local part of an email address is a totally different affair and requires a transition to Unicode email headers. There is a reason why in my IDNA examples above I used an ASCII "local part"...
If we want to support international (i.e. Unicode) local parts there is much more we must do at very different levels, starting from being sure we have an Unicode enabled SMTP server under the hood, to completely review our validation rules which, at this time, allows only non Unicode local-parts.
Please have a look at: http://en.wikipedia.org/wiki/International_email
No prob and no hurry! Take your time...
No prob... in case you can, please let me know (direct email, so we don't litter this conversation too much...)
That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)
Correct.
Concerning our punycode.js
#7005 (comment)
I suggest again to update it and test.
@dgt41 Dimitris, does the .gr registry allows domains names in Greek? In case, do you have (or can you create) a mailbox in such a domain (possibly two: one with an ASCII local part and one with a Greek local part)?
We do have a Russian test domain (with a joomla site btw)
My experience is that utf8 local parts as are no good.
That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)
Correct.
So... how can we have the test unit corrected?
Concerning our punycode.js
#7005 (comment)I suggest again to update it and test.
I see: I downloaded the latest rev (1.3.2) and now they're doing exactly the same thing I did in validate.js, so there is probably no need for us doing it there. I will test and report.
@dgt41, @infograf768
I've registered the Greek domain πανταρει.gr ("panta rei: everything flows"... isn'it it nice? Heraclitus would be happy!) and will contribute it to the community for whatever test will be needed. Punycode domain name is: xn--hxakmqrqkw.gr
As soon as possible I will set up a (Joomla, of course!) site for it and email service (which, for the time being, will not allow UTF8 local parts). I'll keep you informed.
I dunno how much it helps but there was a long discussion about dealing with validating emails a long time ago joomla/joomla-platform#1833
@wilsonge Thanks, interesting reading! And I like what the framework came to:
return (boolean) filter_var($email, FILTER_VALIDATE_EMAIL);
KISS code, which is normally the best solution...
I'll check if it will pass our test corpus...
I anyway see a (recurring) mistake in that discussion too: IP addresses, when used in an email address domain-part, must be enclosed within square brackets. Once again:
local-part@[192.168.0.1] <-- OK This will be taken as an IP address
local-part@192.168.0.1 <-- KO! (it is subdomain "192.168" of the domain "0" in (non existent) TLD "1")
(Yes, you can have numeric domain names (at least in the .it TLD): I used to have some of those for the telephone company I was working for...)
Do we want to this to pass? Putting an IP address in the domain-part is something normally done by spammers only...
@infograf768
You were right: punycode.js v1.3.2 fixes the handling of email addresses: no need to modify validate.js (I will anyway fix all those extra spaces in validate.js...)
Please see my EmailValidation_PunycodeJS_132 branch.
In it validate.js has a console.log() to display the "punycoded" value (it will be removed before committing to joomla-cms, in case)
In that branch I also modified /libraries/cms/html/behavior.php
so that it loads punycode.min.js, i.e. the original name of the minified version of punycode.js: I think it is better to keep original file names for external libraries whenever it is possible (think of composer...). Is that OK?
@dgt41, well... we never loaded any *-uncompressed.js file anywhere...
I think the worst (an unlikely) can happen is that some 3rd party extensions directly loads punycode.js expecting the minified version and it gets the expanded version instead...
At least another JS library in /media/system/js have now the new (and IMHO more logical) naming convention: jquery.Jcrop.min.js
-uncompressed.js should be loading when in debug mode. But removing it or not is more a PLT decision, so don’t look at me
ah, really? -uncompressed.js are automatically loaded while in debug mode? I didn't know that, my bad! In this case I will revert my decision and leave filenames unchanged...
(we will sooner or later have to do something as we go toward a more automatic method of updating external libraries, composer or whatever...)
... fixed in my branch, but (as there is no reason to wait) I will now commit it here too..
Oh, cap! merge conflicts... why???
validate.js changed recently, do a rebase
... and html5fallback.js as well...
Actually only html5fallback.js was changed.
I tried a "git rebase --interactive", but I'm now in deep mud: I have a "DETACHED HEAD", I can't switch branch and I'm stuck without any possibility of doing anything.
My editor opened with the following:
pick e4140cb Fixes email validation regular expressions
pick 479ec12 Fixed regexp as PHP string
pick 3a48af0 Forgot to fix it in another place
pick 63b959c This should work!
pick c58bcde ... but this is the good one!
pick 211a568 Allow apostrophe in com_user emails
pick 226e118 New HTML5 compliant regexp
pick e9df3cb Ouch! What a noob's mistake!!
pick b0fbc5d YANM!
pick 4059299 JMailHelper::isEmailAddress()
pick 7b6cf7c Fix for unit test
pick e323b02 I missed a couple of CS glitches...
pick 4e8b731 Address length in JMailHelper::isEmailAddress()
pick 9ee840a It is useless to check domain-part length
pick f31893d Fix IDNA domains vaildation
pick ca861b1 Made validate.js safer
pick 08f527d A minor modification to validate.js
pick 105eb87 CS in validate.js
pick 5ebb8bc Updated punycode.js to rev. 1.3.2
# Rebase 110c7b1..5ebb8bc onto 110c7b1
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
... but I don't have the slightest idea of what I should do with that...
I managed to abort the rebase, but I now need guidance on how to do it correctly... sorry, guys!
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-05-26 15:33:29 |
Closed_By | ⇒ | smz |
@zero-24 (I suppose...) Thanks for fixing my initial comment! I anyway suggest to use example.com as an example...