This PR fixes the Smart Search indexing of articles: currently an article is always indexed when it is saved, even if you save it with status Trashed or Unpublished. With this PR an article will only be indexed by Smart Search when its status is Published.
Enable the Smart Search plugin and index the content.
Go to Content > Article > [new], add some specific keywords to search for later on, set Status to Trashed and save.
Go to Smart Search and search for the specific keywords to confirm that the Trashed article has been indexed as well.
After this PR only articles with Status Published should be indexed upon saving.
Redo the previous steps with some other specific keywords and confirm that only articles with Status Published are indexed upon saving.
Status | New | ⇒ | Pending |
IMHO it feels wrong that a component called "Smart Search" uses a "not-so-smart index" procedure ;-)
Indexing content that should not be findable in front-end searches (unpublished and trashed items) seems unnecessary. Besides, it wastes server resources and database space.
The aim of this change is to avoid unnecessary work and save server resources (during indexing) and database space (in the #__finder_links_terms0 through #__finder_links_termsf tables, which can become very large).
1) Why only com_content.article?
2) This change breaks a lot of workflows at a very bad place in the overall indexing system. It only ever allows published articles to be processed and does not account for other workflows (switching an article from published to archived will cause the indexer to not run any updates).
If it is your intent to change this behavior, it must be done deeper in the indexer library.
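To make the second objection concrete, here is a minimal toy model in Python (not Joomla code) of what an early return in the save hook does. The class and function names are hypothetical and the index is just a dictionary; the state values follow the usual Joomla convention (1 published, 0 unpublished, 2 archived, -2 trashed).

```python
# Toy model only: names are hypothetical, not the real Smart Search plugin API.
PUBLISHED, UNPUBLISHED, ARCHIVED, TRASHED = 1, 0, 2, -2


class ToyIndex:
    """Stands in for the search index: article id -> state seen at last index run."""

    def __init__(self):
        self.links = {}

    def reindex(self, article_id, state):
        self.links[article_id] = state


def on_save_guarded(index, article_id, state):
    """Save hook with the kind of early return this PR proposes."""
    if state != PUBLISHED:
        return  # the indexer never runs, so an existing entry is left as it was
    index.reindex(article_id, state)


def on_save_unguarded(index, article_id, state):
    """Save hook that always hands the item to the indexer."""
    index.reindex(article_id, state)


guarded = ToyIndex()
on_save_guarded(guarded, 42, PUBLISHED)
on_save_guarded(guarded, 42, ARCHIVED)   # skipped by the guard
stale_state = guarded.links[42]          # index still believes the article is published

unguarded = ToyIndex()
on_save_unguarded(unguarded, 42, PUBLISHED)
on_save_unguarded(unguarded, 42, ARCHIVED)
fresh_state = unguarded.links[42]        # index follows the state change
```

In the guarded variant the index keeps the entry from the last published save, which is the "no updates" behaviour described above.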
Category | ⇒ | Search |
There is also a performance issue that you need to be aware of when changing behaviour like this. If you have a very large article, indexing/de-indexing the article can take many seconds, making the UI unresponsive.
For all the reasons described above I'm going to close this PR. It's not something the project is interested in at this point.
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-11-14 00:09:32 |
Closed_By | ⇒ | wilsonge |
Sorry for my delay in response. For all the reasons mentioned above I am ok with closing this PR & leaving it as it is. Thanks @wilsonge @mbabker & @chrisdavenport for evaluating this PR!
I wouldn't go so far as to say we're not interested. The intention behind this PR is good and valid. It's just that the solution requires more thought.
One approach might be to make use of the state field in the links table and have that mapped to the state of the content item. Thus changes to content item state could be handled quickly without needing to index/de-index any terms. From memory this might already have been partially addressed in the code and just needs revisiting and properly implementing.
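A rough sketch of that idea, using an in-memory SQLite database from Python rather than the real Smart Search schema: the table names `links` and `links_terms`, their columns, and the helper functions are simplified assumptions for illustration only.

```python
import sqlite3

# Simplified, hypothetical schema: one row per indexed item, plus its terms.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE links (link_id INTEGER PRIMARY KEY, title TEXT, state INTEGER);
    CREATE TABLE links_terms (link_id INTEGER, term TEXT);
    """
)
db.execute("INSERT INTO links VALUES (1, 'Hello', 1)")
db.executemany(
    "INSERT INTO links_terms VALUES (?, ?)", [(1, "hello"), (1, "world")]
)


def set_state(link_id, state):
    """Mirror a content state change with one UPDATE; terms are not touched."""
    db.execute("UPDATE links SET state = ? WHERE link_id = ?", (state, link_id))


def search(term):
    """Return titles of published items (state = 1) containing the term."""
    rows = db.execute(
        "SELECT l.title FROM links l "
        "JOIN links_terms t ON t.link_id = l.link_id "
        "WHERE t.term = ? AND l.state = 1",
        (term,),
    )
    return [row[0] for row in rows]


before = search("hello")   # item is published, so it is found
set_state(1, -2)           # trash the item: a single-row update
after = search("hello")    # item no longer appears in results
terms_kept = db.execute("SELECT COUNT(*) FROM links_terms").fetchone()[0]
```

The point of the design is that a state change costs one row update regardless of article size, at the price of keeping the terms of unpublished items in the database.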
The fly in the ointment is the autocompleter, which needs fast access to the terms index. The operations required to determine if a particular term belongs to at least one published and (ACL) accessible content item are not simple and have significant performance implications for autocompletion, which is why they are not implemented at present.
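For illustration, the per-term check could look roughly like the following Python and SQLite sketch. The schema, the `access` column semantics, and the `suggest` helper are assumptions made for this example, not the real autocompleter; the correlated subquery is the extra work that would have to run on every keystroke.

```python
import sqlite3

# Hypothetical, simplified schema: items with a state and an access level.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE links (link_id INTEGER PRIMARY KEY, state INTEGER, access INTEGER);
    CREATE TABLE links_terms (link_id INTEGER, term TEXT);
    """
)
db.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 0, 1), (3, 1, 3)],  # published/public, unpublished, published/restricted
)
db.executemany(
    "INSERT INTO links_terms VALUES (?, ?)",
    [(1, "joomla"), (2, "joomla"), (2, "draft"), (3, "secret")],
)


def suggest(prefix, allowed_access):
    """Suggest terms that occur in at least one published, accessible item.

    allowed_access is a non-empty list of access levels the visitor may view.
    """
    placeholders = ",".join("?" * len(allowed_access))
    sql = (
        "SELECT DISTINCT t.term FROM links_terms t "
        "WHERE t.term LIKE ? AND EXISTS ("
        "  SELECT 1 FROM links l "
        "  WHERE l.link_id = t.link_id AND l.state = 1 "
        f"  AND l.access IN ({placeholders})"
        ") ORDER BY t.term"
    )
    return [row[0] for row in db.execute(sql, [prefix + "%", *allowed_access])]


public_terms = suggest("", [1])         # visitor with access level 1 only
member_terms = suggest("", [1, 3])      # visitor who may also see level 3
```

Here "draft" is never suggested because its only item is unpublished, and "secret" is suggested only to visitors with the matching access level.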
This looks wrong. The indexer is designed to index all content, regardless of publish state, and deal with tracking that state on its own. What is the aim of this change?