agilare
13 Oct 2016

Steps to reproduce the issue

  1. Enable autosuggest in a Smart Search module.
  2. In this module on the front end, type some text into the search input.
  3. If you type text that matches a CSS class name, that class name appears among the suggestions. In the screenshot, the CSS class names are in red.
  [Screenshot 2016-10-13 at 07.25.29: autosuggest list with CSS class names highlighted in red]

Expected result

  • suggest only real content, not code

Actual result

  • It suggests too much of the article text, including code (HTML, JavaScript...).

System information (as much as possible)

  • Joomla 3.6.2
  • all browsers

Additional comments

  • Maybe the solution lies in the indexing process, e.g. changing parameters so that code is ignored?

Votes

# of Users Experiencing Issue: 1/1
Average Importance Score: 4.00

agilare - open - 13 Oct 2016
chrisdavenport - comment - 13 Oct 2016

Smart Search does remove CSS and JavaScript before parsing the text. It also ignores anything inside <noscript> tags and anything in the <head>, and it strips all HTML tags. You can see the code that does it here:
https://github.com/joomla/joomla-cms/blob/staging/administrator/components/com_finder/helpers/indexer/parser/html.php
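The filtering described above can be illustrated with a minimal Python sketch built on the standard library's html.parser. This is not Joomla's PHP parser from the linked html.php; the names TextExtractor, extract_text and SKIPPED_ELEMENTS are hypothetical, chosen only for illustration. It shows the general idea: drop everything inside script, style, noscript and head, and keep only text nodes, so tags and their attributes (including class names) never reach the output.

```python
from html.parser import HTMLParser

# Hypothetical set of elements whose whole content is dropped, following the
# description of Smart Search's parser in this thread (not Joomla's own code).
SKIPPED_ELEMENTS = {"script", "style", "noscript", "head"}


class TextExtractor(HTMLParser):
    """Collect visible text only; tags and attributes such as class are never emitted."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs (e.g. [("class", "span1")]) are deliberately ignored.
        if tag in SKIPPED_ELEMENTS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the indexable text of an HTML fragment as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, extract_text('<head><style>.span1{}</style></head><body><div class="span1"> </div><p>Hello <b>world</b></p><script>var x=1;</script></body>') returns "Hello world": the stylesheet, the script and the class attribute are all discarded.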

There must be something that is preventing that from happening in your case. Can you examine the HTML of the affected content items carefully and make sure that the markup is valid? Invalid markup is likely to confuse the parser and lead to leakage of class names into the index.

mbabker - comment - 13 Oct 2016

Actually, I can confirm on www.joomla.org and developer.joomla.org that there are CSS classes in the search index.

chrisdavenport - comment - 13 Oct 2016

Not sure if #12411 will fix it. Please test.

brianteeman - change - 29 Oct 2016
Status: New → Confirmed
brianteeman - comment - 19 May 2017

Looking at the examples posted by @mbabker, it would appear that #12411 did not fix it and that the issue only occurs when an element with a class attribute is empty.

https://developer.joomla.org/search.html?q=span1

On this URL, the only CSS class in the index comes from <div class="span1"> </div>; no other CSS appears to be indexed on that page.

mbabker - comment - 19 May 2017

Did we need to re-run the index after the 3.7 update to get that change?

brianteeman - comment - 19 May 2017

@mbabker No idea. Shouldn't the index be updated regularly anyway?

If you could do a reindex anyway and see whether it is resolved completely, or whether my finding above about the empty class still holds, that would be great.

mbabker - comment - 19 May 2017

We don't have it on a cron to do regular updates so it relies on the plugins to update the index as we make content updates on the sites. I'm running an update through the CLI for the dev site now (that one takes some time because of the JoomlaCode archive), I'll check the result when that's finished.

brianteeman - comment - 19 May 2017

thanks

mbabker - comment - 19 May 2017

The full reindex has finished on the developer site; span1 is no longer a search term there.

brianteeman - comment - 19 May 2017

Thanks for that. In that case, I am going to close this as fixed by #12411.

brianteeman - change - 19 May 2017
The description was changed
Status: Confirmed → Closed
Closed_Date: 0000-00-00 00:00:00 → 2017-05-19 15:57:30
Closed_By: brianteeman
brianteeman - close - 19 May 2017
