I'd like to work on an improvement to com_finder: a setting for the minimum number of characters a word must have in order to be indexed. At the moment we index everything, including words like "I" or "is", which doesn't seem optimal. Unfortunately, after digging a little, I realized that com_finder is quite complicated. Does anyone have deeper knowledge of it who could point me to where to look? And would it be too complicated to add such a parameter?
In my installation (which is not in English), "I" seems to be indexed, as autocomplete returns results starting with it. Speaking of that, different languages have different common words. I don't know if there is an option to define them per language?
This doesn't seem like something that can be accomplished in time for 3.3, which I guess means it waits until 4.0, at least if it's up to me, as my time is limited and this is not critical for my own needs (how else do you improve the system, but by fixing the things you need :P). I will get back to it at some later point; in the meantime it's good to have it on the list here.
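To make the idea concrete, here is a rough sketch in Python of what such a minimum-length setting would do to the terms produced from a piece of text. It is only an illustration of the concept, not com_finder code: the function name `index_terms`, the `min_length` parameter, and the simple regex tokenizer are all assumptions made for this example.

```python
import re


def index_terms(text, min_length=3, stop_words=frozenset()):
    """Return the lowercase terms from `text` that would be kept for indexing.

    Hypothetical helper: `min_length` stands in for the proposed setting,
    and `stop_words` for a common-words exclusion list.
    """
    # Split on runs of word characters; a real indexer would tokenize
    # in a language-aware way.
    tokens = re.findall(r"\w+", text.lower())
    # Keep a token only if it is long enough and not a common word.
    return [t for t in tokens if len(t) >= min_length and t not in stop_words]


# With a minimum of 3 characters, "i", "is" and "a" are dropped.
terms = index_terms("I think this is a test", min_length=3)
# With a minimum of 1 character, every token is kept (today's behaviour).
all_terms = index_terms("I think this is a test", min_length=1)
```

A length threshold and a common-words list are complementary: the threshold removes very short words cheaply, while the list can also exclude longer frequent words.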
In a default Joomla installation there are only English common words. You can add your own (English or non-English) by simply adding rows to the jos_finder_terms_common table. Unfortunately, there is no user interface available to manage entries in that table, nor is there any way to make the common words part of a language pack so they can be installed automatically. Both of those would be very useful enhancements in my opinion.
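As a rough illustration of adding rows to that table, the sketch below uses Python with an in-memory SQLite database standing in for the real Joomla database. The column names `term` and `language`, the language code "en", and the helper `add_common_words` are assumptions made for this example; check the actual table definition in your installation (and your own table prefix) before running anything similar against a live site.

```python
import sqlite3


def add_common_words(conn, words, language, table="jos_finder_terms_common"):
    """Insert each word as a common (stop) word unless it is already present.

    Hypothetical helper; `table` must be a trusted constant because it is
    interpolated into the SQL text. Returns the number of rows added.
    """
    cur = conn.cursor()
    added = 0
    for word in words:
        term = word.strip().lower()
        if not term:
            continue
        # Skip terms that already exist for this language.
        cur.execute(
            f"SELECT 1 FROM {table} WHERE term = ? AND language = ?",
            (term, language),
        )
        if cur.fetchone() is None:
            cur.execute(
                f"INSERT INTO {table} (term, language) VALUES (?, ?)",
                (term, language),
            )
            added += 1
    conn.commit()
    return added


# Stand-in table with the assumed columns, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jos_finder_terms_common "
    "(term TEXT NOT NULL, language TEXT NOT NULL)"
)
# The duplicate "is" is skipped by the existence check.
added = add_common_words(conn, ["i", "is", "he", "is"], "en")
```

Content indexed before the new rows were added would presumably need to be re-indexed for the change to take effect.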
It's too late for 3.3, but you won't need to wait for 4.0. Stand by for some news on that score.
@chrisdavenport is the man who knows most things about that component :)
Thanks for the tip, Chris, that's helpful, especially since it's upgrade safe :-) There's definitely no "I" in that table (or "he" and similar). Since I am not a native speaker, though, I would rather not contribute to that section; I don't know enough to decide whether something should be considered a stop word or not.
If you are hinting that a change of the development strategy is on the way, so that we won't be feature locked in our shiniest and best for a couple of years, I am looking forward to that announcement ;)
Status | New | ⇒ | Confirmed |
Hi @chivitli ,
As Chris hinted, we changed our development strategy, so now you can propose code for 3.5, 3.6, 3.7, and so on. You can see the changes at http://developer.joomla.org/news/586-joomla-development-strategy.html
Please give us an update if you're still interested in working on this item. If so, it could be added to 3.5 or later, depending on when the code is done, reviewed, and committed.
Thanks for your time!
Category | ⇒ | Search |
Labels | Added: ? |
Status | Confirmed | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-05-05 13:33:19 |
Closed_By | ⇒ | roland-d |
That sounds like it could be a useful improvement, although short words tend to be quite common and there is a common words exclusion table. I would have thought that "I" and "is" would be in that table, but if not perhaps they should be added.
Yes, Smart Search (finder) is pretty complicated, but it is well worth exploring and studying, and there is certainly plenty of room for improvement. You might like to take a look at the Search Working Group page, http://docs.joomla.org/Search_Working_Group, where you will find some of the ideas and a not-quite-up-to-date list of outstanding issues that need fixing and/or testing. Looking into the outstanding bugs would be a great way to start learning your way around it before taking the plunge and adding a new feature.
There are certainly people around who have a good knowledge of how it works, so please feel free to ask questions on the dev-cms mailing list.