Pull Request resolves #47447.
This PR fixes duplicate-term indexing collisions in com_finder when different spellings normalize to the same token (example: "Resumé" and "Resume").
Changes included:
Resumé, Resume
In some datasets, Finder indexing could hit duplicate-term insertion collisions when multiple terms normalize to the same value, causing indexing failures or errors.
Finder indexing completes successfully even when normalized duplicate terms exist, and all affected articles are indexed correctly.
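To illustrate the kind of collision described above, here is a minimal Python sketch. It assumes the normalization behaves like accent and case folding; the helper names `fold` and `find_collisions` are hypothetical, and the actual behavior in com_finder (or in the database collation) may differ.

```python
import unicodedata


def fold(term: str) -> str:
    """Hypothetical normalizer: strip combining accents, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_collisions(terms):
    """Group distinct spellings that fold to the same token."""
    groups = {}
    for term in terms:
        groups.setdefault(fold(term), []).append(term)
    # Keep only tokens reached by more than one spelling.
    return {key: group for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    print(find_collisions(["Resumé", "Resume", "Joomla"]))
```

Any two spellings that end up in the same group would compete for a single row under a unique (term, language) key, which is the situation this PR guards against.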
Please select:
Documentation link for guide.joomla.org:
No documentation changes for guide.joomla.org needed
Pull Request link for manual.joomla.org:
No documentation changes for manual.joomla.org needed
| Status | New | ⇒ | Pending |
| Category | ⇒ | Administration com_finder JavaScript Unit Tests |
Pull Request resolves #47472.
@SRV-KILLER09 Pull requests should refer to issues, not to other pull requests.
So you should link to issue #47447.
Alternatively, link to the comment in the other PR, but without using the magic "resolves" keyword, e.g.:
Pull Request for #47472 (comment)
Hi, I’ve updated the PR description to reference the actual issue (#47447) instead of the old PR.
@SRV-KILLER09 Can the other PR be closed? Looking at how it is implemented, I would count it as a new feature anyway, so it would not go into 5.4-dev but into 6.2-dev, and in 6.2-dev it is done with this PR here. So I think the other one can be closed.
I’ve closed the old PR since the updated implementation is now in 6.2-dev. Thank you!
| Labels | Added: Unit/System Tests, PR-6.2-dev |
| Category | Administration com_finder JavaScript Unit Tests | ⇒ | Administration com_categories com_content com_finder Language & Strings JavaScript Unit Tests |
| Labels | Added: Language Change |
| Category | Administration com_finder JavaScript Unit Tests com_categories com_content Language & Strings | ⇒ | Administration com_finder JavaScript Unit Tests |
| Labels | Added: Feature; Removed: Language Change |
I am reworking the failing system test and will push a clean update soon...
Hi @SRV-KILLER09 — thanks for working on this.
I want to share additional findings that may help with this PR. The LEFT JOIN + IS NULL approach works for cross-article duplicates, but there's a second scenario it doesn't cover: intra-article duplicates.
When a word appears in both the title (context 0) and the body (context 1) of the same article, the aggregation loop inserts two rows in tokens_aggregate with the same (term, language) but different term_weight. Since term_weight is in the GROUP BY, both rows survive as distinct groups and the INSERT fails — even though neither term exists yet in finder_terms, so both pass the IS NULL filter.
The minimal fix that covers both cases is to reduce the GROUP BY to just (term, language) and aggregate the rest with MIN():
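The original snippet is not reproduced in this thread. As a stand-in, here is a minimal, self-contained sketch of the idea using Python's `sqlite3`. The table and column names (`tokens_aggregate`, `terms`, `weight`) are simplified assumptions and not the real Joomla schema, and the two queries only illustrate the GROUP BY difference, not the actual com_finder indexer code.

```python
import sqlite3


def build_db():
    """Create a toy schema: one article with 'resume' in title and body."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tokens_aggregate (
            term TEXT, language TEXT, term_weight REAL, context INTEGER
        );
        CREATE TABLE terms (
            term TEXT, language TEXT, weight REAL,
            UNIQUE (term, language)
        );
        -- Same (term, language), different weight per context.
        INSERT INTO tokens_aggregate VALUES
            ('resume', '*', 2.0, 0),
            ('resume', '*', 1.0, 1);
    """)
    return conn


# term_weight in GROUP BY: both rows survive as separate groups.
WIDE_GROUP_BY = """
    INSERT INTO terms (term, language, weight)
    SELECT ta.term, ta.language, ta.term_weight
    FROM tokens_aggregate AS ta
    LEFT JOIN terms AS t ON t.term = ta.term AND t.language = ta.language
    WHERE t.term IS NULL
    GROUP BY ta.term, ta.language, ta.term_weight
"""

# GROUP BY reduced to (term, language); the weight is aggregated with MIN().
NARROW_GROUP_BY = """
    INSERT INTO terms (term, language, weight)
    SELECT ta.term, ta.language, MIN(ta.term_weight)
    FROM tokens_aggregate AS ta
    LEFT JOIN terms AS t ON t.term = ta.term AND t.language = ta.language
    WHERE t.term IS NULL
    GROUP BY ta.term, ta.language
"""


def try_insert(sql):
    """Run one INSERT on a fresh database; return rows, or None on a collision."""
    conn = build_db()
    try:
        conn.execute(sql)
        return conn.execute("SELECT term, language, weight FROM terms").fetchall()
    except sqlite3.IntegrityError:
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    print("wide GROUP BY:  ", try_insert(WIDE_GROUP_BY))
    print("narrow GROUP BY:", try_insert(NARROW_GROUP_BY))
```

In this toy setup the wide query trips the unique constraint because both groups pass the IS NULL filter, while the narrow query collapses them into one row. Whether MIN() is the right aggregate for the real weight columns is a design choice for the PR author.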
Verified in production on Joomla 5.4.5 at colon.com.uy. Full repo with analysis: https://github.com/dariofin/joomla-finder-duplicate-fix
Hope this helps with the 6.2 refactor!