Pull Request for Issue #12408.
Strip head first because head may contain script or style blocks.
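To illustrate why the order matters, here is a minimal sketch in Python (not the com_finder PHP code). The helper names `tokens_tags_only` and `tokens_head_first` and the sample page are hypothetical. It assumes a simple tag stripper that removes `<...>` markup but leaves the text between tags, which is how inline CSS ends up as index terms.

```python
import re

# Hypothetical helpers for illustration only; not the com_finder API.
HEAD_RE = re.compile(r"<head\b[^>]*>.*?</head>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def tokens_tags_only(html):
    """Strip tags only: the text inside <style>/<script> survives."""
    return WORD_RE.findall(TAG_RE.sub(" ", html))


def tokens_head_first(html):
    """Drop the whole <head> block first, then strip the remaining tags."""
    without_head = HEAD_RE.sub(" ", html)
    return WORD_RE.findall(TAG_RE.sub(" ", without_head))


page = (
    "<html><head><style>.col2 { float: left; }</style></head>"
    "<body><p>Release notes</p></body></html>"
)

print(tokens_tags_only(page))   # CSS terms such as "col2" leak into the tokens
print(tokens_head_first(page))  # only the body text remains
```

With the head removed first, the CSS selector never reaches the tokenizer, so it cannot appear in the autosuggester.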
Copy-paste an entire web page with a head block that contains inline CSS and/or JavaScript into a new article. You might see CSS or JS terms in the autosuggester. If you don't, try another web page until you do. You can try this page: https://www.joomla.org/announcements/release-news/5418-joomla-254-released.html then search for "col2".
Apply this PR.
Copy-paste the same article again so it gets re-indexed. The CSS/JS terms should no longer appear in the autosuggester.
Documentation changes required: none.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration Components |
is the one in the info https://www.joomla.org/announcements/release-news/5418-joomla-254-released.html
The problem went deeper than I originally thought.
In order to keep memory use down, the indexer::tokenizeToDb method was chunking the input string before handing over to the parser. The parser was also chunking, which although redundant, was, I thought, at least harmless. However, it turns out that the indexer chunking process was, on certain input strings, placing chunk boundaries inside HTML tags. The HTML parser was then not recognising them as being tags and was generating tokens for HTML attributes which should have been removed.
The fix is to understand that the format-dependent (in this case, HTML) parser must always pre-process the input before chunking can take place.
What I've now done is remove chunking from the indexer and simplify the code a little, removing some duplication and reducing indentation by dropping some else clauses.
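The failure mode described above can be reproduced with a small sketch. This is Python, not the actual indexer code, and every name in it (`chunk_on_spaces`, `tokens_chunk_then_parse`, `tokens_parse_then_chunk`) is hypothetical. It assumes chunks are cut at spaces and that the parser only removes complete `<...>` tags.

```python
import re

# Hypothetical names for illustration only; not the com_finder API.
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def strip_tags(html):
    """Replace every complete <...> tag with a space."""
    return TAG_RE.sub(" ", html)


def chunk_on_spaces(text, size):
    """Greedily group space-separated words into chunks of about `size` chars."""
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) > size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def tokens_chunk_then_parse(html, size):
    """Old order: chunk raw HTML, then parse each chunk on its own."""
    out = []
    for piece in chunk_on_spaces(html, size):
        out.extend(WORD_RE.findall(strip_tags(piece)))
    return out


def tokens_parse_then_chunk(html, size):
    """Fixed order: parse the whole input, then chunk the plain text."""
    out = []
    for piece in chunk_on_spaces(strip_tags(html), size):
        out.extend(WORD_RE.findall(piece))
    return out


html = '<p>Hello</p><div class="col2">World</div>'
print(tokens_chunk_then_parse(html, 20))  # boundary falls inside the <div ...> tag
print(tokens_parse_then_chunk(html, 20))  # tags are removed before any chunking
```

In the first function the space inside `<div class="col2">` becomes a chunk boundary, so neither half looks like a tag and the attribute is tokenized. Parsing before chunking avoids this because no markup is left by the time the text is split.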
@alikon Please re-test and also make sure that everything that should be indexed is still being indexed. Thank you.
I tested this before the 3.7.x branch was merged into staging.
Well done, Chris.
I have tested this item
I have tested this item
Thanks
Status | Pending | ⇒ | Ready to Commit |
RTC
Category | Administration Components | ⇒ | Administration com_finder Components |
@chrisdavenport Can you have a look at the conflicts? Thanks.
You should test again, just in case I screwed something up.
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2016-12-13 12:19:19 |
Closed_By | ⇒ | rdeutz |
I followed the test instructions, but "col2" is still present after applying the patch.
I purged the index before copy-pasting the same page again.