User tests: Successful: Unsuccessful:
This fixes regular expressions used for email validation (see: #6982)
Joomla didn't allow the '
char in user email addresses because JMailHelper::isEmailAddress() and JFormRuleEmail::test() used ’
(Unicode "Right single quotation mark") instead of '
(ASCII apostrophe, 0x27) in their regexp.
According to RFC 3696, it is allowed to have apostrophes in email addresses as long as they are ASCII character (0x27) (Irish email example: o'sullivan@example.com
).
You can test this patch by trying to enter valid and/or invalid email addresses in "User Manager" and in "Contacts". In most cases, anyway, you will be actually testing JS validation (PHP validation has been tested by unit test).
Only expected cases accepted by JS and rejected by PHP are those of these kinds:
.user@example.com
user.@example.com
user..name@example.com
Reference material for email addresses validation rules:
Labels |
Added:
?
|
To recap, the outstanding issues/decisions-to-be-taken are:
Modify unit tests to allow the single quote in email addresses and instead disallow the "right single quotation mark" (which is Unicode U+2019 and shouldn't be there) Done!
Decide if we want to fix Done!JFormRuleEmail::test()
to correctly treat email addresses without the domain-part (i.e. check only the local part). By design it should do that, but it doesn't and it didn't in original code. B/C issues if I fix that?
In Fixed!JFormRuleEmail::test()
we make less thorough checks compared to what we do in JMailHelper::isEmailAddress()
: should we fix that?
As in #_users the email column is defined as varchar(100)
, but email addresses can be 254 characters long, we should decide if we want to extend that column to varchar(255)
or we want to introduce further checking against an email address being longer than 100 chars (non standard, but probably more than enough)
Decide if we should correctly treat IDNA domains also in html5fallback.js
, which at this time doesn't. In case I think I'll need some help...
Category | ⇒ | JavaScript Libraries |
Status | New | ⇒ | Pending |
@infograf768, @bakual, @wilsonge Do we want this bug fixed? If we want I think unit tests should be fixed beforehand...
I think the deal is are there any use valid cases of emails that used to pass that will now fail (as per that platform link I shared in the old one). I'm super super nervous with this of invalidating users that were previously valid which is going to cause all kinds of issues.
But to clarify I do want this bug fixed as well - I'm just very nervous
I'm super super nervous with this of invalidating users that were previously valid which is going to cause all kinds of issues.
George, I can understand this, but it is not the usernames, it is their e-mail addresses and we are talking about invalid email addresses that would have never worked, while we prohibit valid email addresses...
Accepting non-RFC compliant e-mail addresses is a bug. Not fixing it means Joomla does not adhere to an internet standard. Fixing it may cause some users issues, but quite frankly, if their data was invalid before and it's invalid after and the only change is now it's rightfully flagged as invalid data, then it's a bug fix. Period.
ok so on that basis is there anything with return (boolean) filter_var($email, FILTER_VALIDATE_EMAIL);
(from joomla/joomla-platform#1833) that block genuinely valid emails?
... and so... ?
Tests need to pass obviously (you can change them with this PR). And some user testing would be helpful as well.
OK, will add test units mods later on today within this PR (confirmed? no separate PR needed?) Thanks!
Yes tests in this PR please :)
OK, I'm on the right way:
o’reilly@there.com
(which has Unicode "Right single quotation mark") was passing because I didn't test regexps with the u
modifier (current code is just an hack that I intend to modify). This is a mistake that was already present and I must check in other places too.
Now, if I only could understand why joe<>bob@bob.come
pass the test...
Aaaaargh! The u
modifier doesn't work for me... Any UTF8 regexp expert around??
Not expert but can you paste here the regex and the failing text?
Here it is:
$regex_local_part = '^[a-zA-Z0-9.!#$%&\'*+\/=?^_{|}~-]+';
if (!preg_match(chr(1) . $regex_local_part . chr(1) . 'u', $local) || substr($local, -1) == '.' || $local[0] == '.' || preg_match('/\.\./', $local))
{
return false;
}
It passes $local = 'o’reilly';
(beware the Unicode "Right single quotation mark" in it...) while it shouldn't
It also passes joe<>bob
while it shouldn't
I think I know what's happening:
The regexp matches the string AFTER the "Right single quotation mark". Same for joe<>bob: it matches the string after the >
.
The regexp misses an "End-of-String" condition:
$regex_local_part = '^[a-zA-Z0-9.!#$%&\'*+\/=?^_{|}~-]+\Z';
Sorry, meant to say that:
The regexp matches the string BEFORE the "Right single quotation mark". Same for joe<>bob: it matches the string before the <
.
OK, that was it!
NOT READY FOR PRIME TIME YET (I have to fix the same in other places too...)
I think this will not make it for 3.4.2, but better for 3.5.0 in good shape than 3.4.2 in bad shape....
Not sure but this might help or confuse
https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/
My takeaway is that there is more than one way to write an apostrophe
On 19 Jun 2015 17:47, "Sergio Manzi" notifications@github.com wrote:
OK, That's was it! [image: ]
NOT READY FOR PRIME TIME YET (I have to fix the same in other places
too...)I think this will not make it for 3.4.2, but better for 3.5.0 in good
shape than 3.4.2 in bad shape....—
Reply to this email directly or view it on GitHub
#7041 (comment).
Interesting article, Brian, on which I totally agree. Last section says it all: "Common bloody sense... apostrophes are not closing quotation marks!"
In our context, anyway, the only allowable "apostrophes" are the good old ugly ASCII apostrophe, 0x27 and (if you want to call it an apostrophe...) the acute accent, 0x60. On this RFC 3696, paragraph 3 is very clear.
Title |
|
Title |
|
Ready for testing! Thanks!
Title |
|
Title |
|
@infograf768 JM, can you please review/test this?
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-07-13 08:10:45 |
Closed_By | ⇒ | smz |
Status | Closed | ⇒ | New |
Closed_Date | 2015-07-13 08:10:43 | ⇒ | |
Closed_By | smz | ⇒ |
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-07-13 10:41:18 |
Closed_By | ⇒ | smz |
@infograf768, @dgt41, @Denitz
I closed #7005 as I wasn't able to resolve conflicts and this is the new PR handling email addresses validation.
Sorry for the confusion...