Go to database fix page, with enabled redirects and have data in there
Fix the new database collation
Duplicate entry 'http://www.kunena.org/Array/169-K-16-and-K-17-Support-Archive/11' for key 'idx_link_old' SQL=ALTER TABLE #__redirect_links ADD UNIQUE idx_link_old ( old_url (100));
on the fix with the message:
•The Joomla! Core database tables have not been converted yet to UTF-8 Multibyte (utf8mb4).
J3.5.0-Beta3
richard67
Ah, no, sorry! Is not an old issue and not data. The index on this column was in fact introduced by the utf8mb4 conversion, I remember now. #9221
@wilsonge I know you discussed that once with the guys, and maybe you can ask Eli for advice, and so I ping you.
Old max. lenght for indexes with Innodb storage engine was 255 characters, with utf8mb4 it can be only 191.
With the changes for utf8mb4 we enlarged thes column to 400 but limited the lenght which goes into the index to 100.
We did so for the alias columns (where it makes sense) and for the old_url column of the redirect links, too.
What could we do to solve this?
We could increase this from 100 to 191 for the redirect links old URLs, but then this issue can still happen, not a good idea.
Or we change the lenght of the column itself to 191, too, so no duplicate key can occur anymore due to this "truncation" of what goes into the index, but max. lenght would be 191 then. Better but not nice because so short URLs then.
And if we limit the lenght to 191 this is a kind of BC break, and worth a note in the release notes.
Or maybe it's enough to change the storage engine for this table to MyIsam?
That would be the easiest thing if this helps, but is maybe not supported on every database?
@wilsonge As URLs can become arbitrary long (but recommended to stay below 2048 because this is the limit of the sitemap protocol), we were limited before with the 255 characters also too much.
Idea: A way out would be either to have no unique key (bad), or to use an md5 or sha checksum or digest as the unique key to make sure the URLs are unique.
Would this be a way?
Using MyISAM as storage engine would not help much, would be a few characters more only in utf8mb4.
I'll chat to Eli - but as a starter for 10 lets increase the index back to its new max (191)
FYI: I tested again a 3.4.8 updated to staging and using FIX (had to do it twice as reported before) and did not have the issue.
One of my redirect urls in utf8 was:
http://localhost:8888/testwindows/trunkgitnew/ta/பன்-மொழி-படிப்படியாக/136-ஒரே-உள்ளடக்கத்தைப்-பல்வேறு-மொழிகளின்-இடைமுகங்களில்-காட்சிப்படுத்தவும்.html
148 characters, and all went fine.
If that url had not been converted in core, it would have been
http://localhost:8888/testwindows/trunkgitnew/ta/%E0%AE%AA%E0%AE%A9%E0%AF%8D-%E0%AE%AE%E0%AF%8A%E0%AE%B4%E0%AE%BF-%E0%AE%AA%E0%AE%9F%E0%AE%BF%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AE%9F%E0%AE%BF%E0%AE%AF%E0%AE%BE%E0%AE%95/136-%E0%AE%92%E0%AE%B0%E0%AF%87-%E0%AE%89%E0%AE%B3%E0%AF%8D%E0%AE%B3%E0%AE%9F%E0%AE%95%E0%AF%8D%E0%AE%95%E0%AE%A4%E0%AF%8D%E0%AE%A4%E0%AF%88%E0%AE%AA%E0%AF%8D-%E0%AE%AA%E0%AE%B2%E0%AF%8D%E0%AE%B5%E0%AF%87%E0%AE%B1%E0%AF%81-%E0%AE%AE%E0%AF%8A%E0%AE%B4%E0%AE%BF%E0%AE%95%E0%AE%B3%E0%AE%BF%E0%AE%A9%E0%AF%8D-%E0%AE%87%E0%AE%9F%E0%AF%88%E0%AE%AE%E0%AF%81%E0%AE%95%E0%AE%99%E0%AF%8D%E0%AE%95%E0%AE%B3%E0%AE%BF%E0%AE%B2%E0%AF%8D-%E0%AE%95%E0%AE%BE%E0%AE%9F%E0%AF%8D%E0%AE%9A%E0%AE%BF%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AE%9F%E0%AF%81%E0%AE%A4%E0%AF%8D%E0%AE%A4%E0%AE%B5%E0%AF%81%E0%AE%AE%E0%AF%8D.html
i.e. 804 characters
oops, that was the new url column...
As JM reported, I updated a 3.4.8 install to beta 3 had to apply "fix" twice.
This message appeared after the first fix.
Table 'sfc3_user_profiles' does not have column 'profile_value' with type 'TEXT'. (From file 3.3.4-2014)
No errors after second fix applied.
Steve
@Stevec4 @infograf768 and on c.c @wilsonge The issue that the
Table 'sfc3_user_profiles' does not have column 'profile_value' with type 'TEXT'. (From file 3.3.4-2014)
message being shown only when something else was fixed before I had also before the changes for utf8mb4, e.g. when I tested 3.5.0. Beta 1 or even when I updated 3.4.7 to 3.4.8 and before, too.
It is related to how the database schema manager processes the update files and statements from them.
If there is for example an sql file adding a new column to a table or modifying a column's data type, and then later comes for example another sql where this column is dropped, then a first run of the database schema manager will not find any problems after having processed all the sql update files in the chronological order (given by the names which sort in the right way).
But on a second run, the 1st sql is checked again and now complains because it expects the column as defined in this sql, not knowing about the
later drop, and so this is shown as a database problem to be fixed and will be fixed then.
This is why I wrote in the testing instructions of my PR for the utf8mb4 conversion that this might happen, that you use the fix button and then see some new pronlems so you have to use the button twice or even more times.
Because now the utf8mb4 conversion forces something to be fixed, this problems becomes visible for all and not only for some of the users.
We (you if you want, or I will do later) can open a new issue for that, but please keep it separate from the utf8mb4 related issues.
Richard was there to be a message for the user to know about applying fix to the database? I didn't notice one. Just curious
Steve
Hmm, I just see my previous comment is not 100% correct, there is only 1 update file modifying this column, so it might be slightly different.
But I am 100% sure I saw this already in past, before Beta 2, maybe with Beta 1 or even when updating 3.4.7. to 3.4.8.
Update 1: Meanwhile I have seen that when updasting 3.4.8 to Beta 3, the column "profile_value" of the "#__user_profiles" table has data type "TEXT", as it should be.
After the first use of the "Fix" button the data type is then "MEDIUMTEXT". I will check if this comes from the utf8mb4 comversion or what else causes this data type changem but it is not another, later update sql as I thought in my 2nd last comment.
This causes the db schema manager then to see thas it has to be changed to "TEXT" in the second run.
It seems to be an implicit conversion done by mysql, there is not any update sql doing this.
I'll check now if this is caused by statements from the utf8(mb4) conversion and provide an "Update 2" comment here.
@Stevec4 No, there was not planned to have any special message or so. It was not me who planned the utf8mb4 changes, I only fixed them.
I mean in the admin page to alert the user to apply fix
Steve
On Mon, Feb 29, 2016 at 9:53 AM, Richard Fath notifications@github.com
wrote:
@Stevec4 https://github.com/Stevec4 No, there was not planned to have
any special message or so. It was not me who planned the utf8mb4 changes, Ionly fixed them.
This comment was created with the J!Tracker Application
https://github.com/joomla/jissues at issues.joomla.org/joomla-cms/9247
https://issues.joomla.org/tracker/joomla-cms/9247.—
Reply to this email directly or view it on GitHub
#9247 (comment).
Update 2: @wilsonge I have found the reason.
The statement
ALTER TABLE
#__user_profiles
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
performed on utf8(mb4) conversion changes the data type of the "profile_value" column from "TEXT" to "MEDIUMTEXT".
I have to check if this happens also for other columns of other tables where nothing is reported because nothing in any update sql and so nothing checked, and what is the best way to handle that, and make a PR then.
Update 3: @wilsonge Seems to be a MySQL bug: https://bugs.mysql.com/bug.php?id=31291, but we can workaround that by converting them back at the end of the utf8(mb4) conversion.
Am just investigating details and will make a PR later today.
@Stevec4 I understood what you mean.
@wilsonge Where can we discuss possible solutions which would require some decision form a PLT member (e.g. you) or discussion with Eli? I do not wanna hijack this issue for long discussions. You can email me to 'richard.fath' . 'at character' . 't-online.de'.
@810 @infograf768 @Stevec4 I opened a new issue #9258 with a detailled analysis.
@wilsonge George, can you discuss that issue #9258 with everybody necessary, e.g. Eli @aschkenasy ?
Yes we can :) I have a skype text conversation I can add you into if you ping it to my email?
If I had your email ...
Yah that's where I was about to email you and then got distracted by work stuff and lost my track. Apologies! Adding you now!
Labels |
Added:
?
|
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2016-03-01 00:35:03 |
Closed_By | ⇒ | wilsonge |
OK talked this over with Eli and we are going to remove the unique constraint here - it's not the index length that is the problem - but that the data truncation from 255 to 191 characters causes two url's over 191 characters to be the same.
Separately to this @richard67 is going to increase the length of the old url field I believe so it can take a larger size of urls (resurrecting #4781)
Fixed it - thankyou!
@wilsonge The enlargement of the links (all old url, new url and referrer) I will do tomorrow.
I will do it for 2048 characters, not for 2083 chars as it was in #4781.
2083 chars was the limit of IE8 or so, other browsers have much higher limits, we cannot follow them all.
Such long URLs are normally generated on the fly and are dynamic.
The sitemap protocol limits to 2048.
We should keep this limit because we can assume people wanna have their URL to be able in a sitemap and so not will exceed this limit.
Let me know if you disagree.
i think there is no limit for a uri lenght in the http rfc. it all comes down to what the app supports.
But there is a limit in sitemap protocol of 2048, so i think it's a good number.
fix works, but when click on fix, the page keeps spinning, and will not stop
@andrepereiradasilva Is exactly how you say
@810 New PR #9269 - please test.
The problem is that since the change to UTF8MB4, the index of the "old_url" column is limited to the first 100 chacters of the URL, and so this duplicate key problem comes.
@wilsonge What shall we do? 100 characters is a bit short for URLs today. Sorry I saw it when working on the UTF8MB4 thing but not further thought about it.
The other columns where we did this were all "alias" columns, for those 100 characters are ok, so we have this problem only for this 1 column which deals with URLs.