Pull Request for Issue #40543.
Until now, the parsing and processing code used functions that are not UTF-8 safe, such as substr() and strrpos(). This broke parsing in some cases and caused the behavior described in #40543.
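To illustrate why byte-oriented string functions can misbehave on multibyte text, here is a minimal sketch in Python rather than PHP. It is not the com_finder code: the helper `last_space_offsets` and the sample string are hypothetical, and the sketch only shows that the character offset and the byte offset of the same space differ once a string contains multibyte UTF-8 characters.

```python
def last_space_offsets(text):
    """Return (character offset, UTF-8 byte offset) of the last space in text."""
    return text.rfind(" "), text.encode("utf-8").rfind(b" ")


sample = "Müller größer"          # "ü", "ö" and "ß" each take two bytes in UTF-8
char_pos, byte_pos = last_space_offsets(sample)
print(char_pos, byte_pos)          # 6 7

raw = sample.encode("utf-8")
# Applying the character offset to the byte string cuts one byte too early.
print(raw[:char_pos].decode("utf-8"))   # Mülle
```

Code that mixes the two kinds of offsets can therefore drop or misplace characters around word boundaries, which is one plausible way for spaces to go missing.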
Please copy the text from #40543 into an article and replace the p tags between the paragraphs with br tags. Save the article. Then look at the entry for that content item in #__finder_links: the description column contains the content, and some words have been pushed together, with the space between them removed.
Actual result before applying this PR: no space between some words.
Expected result after applying this PR: all words are separated by exactly one space.
Documentation link for docs.joomla.org:
No documentation changes for docs.joomla.org needed
Pull Request link for manual.joomla.org:
No documentation changes for manual.joomla.org needed
Category ⇒ Administration com_finder
Status: New ⇒ Pending
Labels added: PR-5.0-dev
If that is the case (I'm not saying that it is), we already have that problem, and this PR wouldn't change that situation. But I don't think it is a problem, because the code looks for the last space in the block it has read and loads any missing bytes after that. More interesting, in my opinion, would be to increase the amount of data read at once. Right now it's 2 KB, but I think we should raise that to at least 8 KB, more likely something around 20 KB...
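The strategy described in this comment (cut each block at its last space and carry the rest into the next read) can be sketched as follows. This is a Python illustration under stated assumptions, not the actual com_finder implementation: `read_blocks` and its `block_size` parameter are hypothetical names. It relies on the fact that the byte 0x20 never occurs inside a multibyte UTF-8 sequence, so a block that ends at a space always ends on a character boundary.

```python
import io


def read_blocks(stream, block_size=2048):
    """Yield byte blocks that end at a space, carrying the remainder over."""
    carry = b""
    while True:
        data = stream.read(block_size)
        if not data:
            break
        buffer = carry + data
        cut = buffer.rfind(b" ")
        if cut == -1:
            # No space in the buffer yet: keep accumulating.
            carry = buffer
            continue
        yield buffer[:cut + 1]
        carry = buffer[cut + 1:]
    if carry:
        # Whatever is left after the last space at end of stream.
        yield carry


sample = ("größer Straße " * 20).encode("utf-8")
for block in read_blocks(io.BytesIO(sample), block_size=16):
    block.decode("utf-8")  # would raise UnicodeDecodeError if a character were split
```

A larger `block_size` does not change correctness in this sketch; it only reduces the number of reads and carry-over steps.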
Maybe you are right; hitting character code 0x20 (a space) inside a UTF-8 string at that position might be unlikely.
Increasing the length could help too. I don't know why the limit is only 2 KB; I would expect 4 KB.
I will create a separate PR to increase that parsing limit to a higher number. I would use 8 KB for the time being.
thanks
We read 2048 bytes per fread() call; couldn't this break UTF-8 encoded characters?
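The concern can be reproduced in a few lines. The sketch below is in Python, not PHP, and shows a different technique from the last-space approach discussed in this thread: an incremental decoder that buffers an incomplete trailing sequence until the next chunk arrives. `decode_chunks` is a hypothetical helper, and the 3-byte chunk size is chosen only to force a split in the middle of a character.

```python
import codecs


def decode_chunks(raw, chunk_size):
    """Decode UTF-8 bytes in fixed-size chunks without breaking characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for start in range(0, len(raw), chunk_size):
        # Incomplete trailing bytes are held back until the next call.
        parts.append(decoder.decode(raw[start:start + chunk_size]))
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    parts.append(decoder.decode(b"", final=True))
    return parts


raw = "äöü".encode("utf-8")        # 6 bytes, two per character
print(decode_chunks(raw, 3))       # ['ä', 'öü', '']

try:
    raw[:3].decode("utf-8")        # naive decode of the first 3-byte chunk
except UnicodeDecodeError as exc:
    print("chunk ends mid-character:", exc.reason)
```

So a fixed-size read can end in the middle of a character; whether that causes harm depends on whether the chunk is processed before the remaining bytes are appended.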