
pe7er
7 Nov 2015

This PR fixes the Smart Search indexing of articles: currently an article is always indexed when it is saved, even if you save it with status Trashed or Unpublished. With this PR an article will only be indexed by Smart Search when its status is Published.

Testing instructions

Enable Smart Search plugin & Index the content.

Before this PR

Create a new article

Content > Article > [new], add some specific keywords to search for later on,
set Status to Trashed and save.

[Screenshot: com_finder_deleted_is_indexed]

Search article in Smart Search: Indexed Content

Go to the Smart Search and search for the specific keywords to test that the Trashed article has been indexed as well.

[Screenshot: com_finder_deleted_is_indexed-2]

After this PR

After this PR, only articles with status Published should be indexed upon saving.
Redo the previous steps with different keywords and confirm that only articles with status
Published are indexed upon saving.

pe7er - open - 7 Nov 2015
pe7er - change - 7 Nov 2015
Status: New → Pending
joomla-cms-bot - change - 7 Nov 2015
Labels Added: ?
wilsonge - comment - 7 Nov 2015

This looks wrong. The indexer is designed to index all content, regardless of publish state, and deal with tracking that state on its own. What is the aim of this change?

pe7er - comment - 8 Nov 2015

IMHO it feels wrong that a component called "Smart Search" uses a "not-so-smart index" procedure ;-)

Indexing content that should not be findable in front-end searches (unpublished & trashed items) sounds unnecessary. Besides, it wastes server resources and database space.

The aim of this change is to avoid unnecessary work and save server resources (during indexing) and database space (in the #__finder_links_terms0 through #__finder_links_termsf tables, which can become very large).

Fedik - comment - 8 Nov 2015

I agree with @wilsonge ...
@pe7er how will "Smart Search" track the state changes after this PR? :wink:

pe7er - comment - 8 Nov 2015

Could you please test this PR, @Fedik?

  • Create an article with status = unpublished, it won't be indexed/added to the smart search index.
  • Publish that unpublished article, and it will be added to the Smart Search index.
mbabker - comment - 8 Nov 2015

1) Why only com_content.article?
2) This change breaks a lot of workflow at a very bad place in the overall indexing system. It only allows published articles to be processed at all times and does not account for other workflows (switching an article from published to archived will cause the indexer to not run any updates).

If it is your intent to change this behavior, it must be done deeper in the indexer library.
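
To illustrate the workflow problem described in point 2, here is a toy Python model (not Joomla code; the function names and the in-memory `index` are hypothetical, and only the numeric state values follow Joomla's usual convention). It compares a save hook guarded by a "published only" check with one that always hands the new state to the indexer:

```python
# Toy model only: names and structure are hypothetical, not Joomla's plugin API.
PUBLISHED, UNPUBLISHED, ARCHIVED, TRASHED = 1, 0, 2, -2  # usual Joomla state values

index = {}  # article id -> (indexed text, state known to the index)


def on_save_guarded(article_id, text, state):
    """Save hook with a 'published only' guard in front of the indexer."""
    if state != PUBLISHED:
        return  # the indexer never runs, so the existing entry is left untouched
    index[article_id] = (text, state)


def on_save_state_aware(article_id, text, state):
    """Save hook that always runs and records the new state with the entry."""
    index[article_id] = (text, state)


# Published article is later edited and archived.
on_save_guarded(1, "original text", PUBLISHED)
on_save_guarded(1, "edited text", ARCHIVED)
stale_entry = index.get(1)  # still the old text, still marked published

index.clear()
on_save_state_aware(1, "original text", PUBLISHED)
on_save_state_aware(1, "edited text", ARCHIVED)
fresh_entry = index.get(1)  # reflects the edit and the new state
```

With the guard in place, the index keeps describing the article as it was when it was last saved as published, which is the stale-update case raised above.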

zero-24 - change - 9 Nov 2015
Category: Search
chrisdavenport - comment - 13 Nov 2015

There is also a performance issue that you need to be aware of when changing behaviour like this. If you have a very large article, indexing/de-indexing the article can take many seconds, making the UI unresponsive.

wilsonge - comment - 14 Nov 2015

For all the reasons described above I'm going to close this PR. It's not something the project is interested in at this point.

wilsonge - change - 14 Nov 2015
Status: Pending → Closed
Closed_Date: 0000-00-00 00:00:00 → 2015-11-14 00:09:32
Closed_By: wilsonge
wilsonge - close - 14 Nov 2015
pe7er - comment - 8 Dec 2015

Sorry for the delayed response. For all the reasons mentioned above I am OK with closing this PR & leaving it as it is. Thanks @wilsonge, @mbabker & @chrisdavenport for evaluating this PR!

chrisdavenport - comment - 8 Dec 2015

I wouldn't go so far as to say we're not interested. The intention behind this PR is good and valid. It's just that the solution requires more thought.

One approach might be to make use of the state field in the links table and have that mapped to the state of the content item. Thus changes to content item state could be handled quickly without needing to index/de-index any terms. From memory this might already have been partially addressed in the code and just needs revisiting and properly implementing.
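
A minimal sketch of that idea, using Python's `sqlite3` and a deliberately simplified, hypothetical schema (`links` and `links_terms` here are stand-ins, not the real Smart Search tables): a state change touches only one row in the links table, and the search query filters on that state, so no terms are indexed or de-indexed.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE links (link_id INTEGER PRIMARY KEY, title TEXT, state INTEGER);
    CREATE TABLE links_terms (link_id INTEGER, term TEXT);
""")
db.execute("INSERT INTO links VALUES (1, 'Draft article', 1)")
db.executemany("INSERT INTO links_terms VALUES (1, ?)", [("draft",), ("article",)])


def set_state(link_id, state):
    """Mirror a content item's state change: a single-row update, terms untouched."""
    db.execute("UPDATE links SET state = ? WHERE link_id = ?", (state, link_id))


def search(term):
    """Return titles of items containing the term whose link state is published (1)."""
    rows = db.execute(
        "SELECT l.title FROM links l "
        "JOIN links_terms t ON t.link_id = l.link_id "
        "WHERE t.term = ? AND l.state = 1",
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


before = search("draft")   # item is published, so it is found
set_state(1, 0)            # unpublish: no term rows are deleted or re-created
after = search("draft")    # item is filtered out by its state
terms_kept = db.execute("SELECT COUNT(*) FROM links_terms").fetchone()[0]
```

The trade-off in this sketch is that terms for unpublished items still occupy space, which is the cost the original PR wanted to avoid; what it buys is a cheap state change.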

The fly in the ointment is the autocompleter, which needs fast access to the terms index. The operations required to determine if a particular term belongs to at least one published and (ACL) accessible content item are not simple and have significant performance implications for autocompletion, which is why they are not implemented at present.
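
To make the cost concrete, here is a rough sketch of the kind of check involved, again in Python with `sqlite3` and a hypothetical simplified schema (the table layout, the `access` column values and the `suggest` helper are assumptions for illustration, not the existing autocompleter). Each candidate term needs a correlated lookup against the links table before it can be suggested:

```python
import sqlite3

# Hypothetical, simplified schema: state 1 = published, access = viewing level id.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE links (link_id INTEGER PRIMARY KEY, state INTEGER, access INTEGER);
    CREATE TABLE links_terms (link_id INTEGER, term TEXT);
    INSERT INTO links VALUES (1, 1, 1), (2, 0, 1), (3, 1, 3);
    INSERT INTO links_terms VALUES (1, 'joomla'), (2, 'secret'), (3, 'staff');
""")


def suggest(prefix, allowed_access_levels):
    """Suggest terms that occur in at least one published, accessible item."""
    placeholders = ",".join("?" * len(allowed_access_levels))
    sql = (
        "SELECT DISTINCT t.term FROM links_terms t "
        "WHERE t.term LIKE ? AND EXISTS ("
        "  SELECT 1 FROM links l WHERE l.link_id = t.link_id "
        "  AND l.state = 1 AND l.access IN (" + placeholders + ")) "
        "ORDER BY t.term"
    )
    rows = db.execute(sql, [prefix + "%", *allowed_access_levels]).fetchall()
    return [row[0] for row in rows]


public_s = suggest("s", [1])       # 'secret' is unpublished, 'staff' is restricted
staff_s = suggest("s", [1, 3])     # a user with level 3 may see 'staff'
public_j = suggest("j", [1])
```

On three rows this is instant; the concern above is what the extra per-term lookup costs on a large terms index while the user is typing.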
