User tests: Successful: Unsuccessful:
Pull Request for Issue # .
For MySQL based sites with a large number of articles (more than a few hundred) that have custom fields enabled the com_content search plugin ends up being so slow that it can cripple a site completely. For a test dataset (details below) of 3,900 articles with 2 custom fields each the database query for a search took between 700,000 and 1,300,000 milliseconds to return a set of search results (on an i7-6700k 4Ghz 8 core processor with 32GB or ram) . In addition the search results for matching all on a multi-select field incorrectly returns zero results.
Postgres performs a LOT better (query times in the region of 400 milliseconds) but the changes described below still cut the query times by 50%.
The plugin has been re-written and rather than joining all the custom fields rows and searching for custom fields at the same time as all the other content fields we now search custom fields and incorporate the results into the main query. In Postgres this can be done as a subquery but MySQL treats this independent subquery as a 'dependent subquery' with punishing performance implications. This may explain in part why MySQL performs so badly with the existing code.
This test requires a large dataset of articles/custom fields to test. It can be generated using com_overload ( Copyright (C) 2011 Nicholas K. Dionysopoulos) - an updated version is attached below that will work in Joomla 3.8 and that also creates 2 custom fields and populates these with the articles.
You should install this component and view it in the backend. A decent dataset to test with would use values of 3 categories, 3 levels, 100 articles -> this will give 3900 articles (WARNING this will take several minutes to generate the data)
This adds a text field with random values of "one" to "seven" as text and a multi-select list field with 2 random values from opt1, opt2. opt3, opt4 and op5.
Then use the search module/com_search in the frontend and search for
"seven" or "opt1" or "opt1 opt2" etc.
1; The correct set of results within a couple of seconds max.
Go and make a cup of coffee and then drink it, call your friends, have another cup of coffee - depending on your processor you are likely to be waiting 20 minutes for the results to return.
The search for 'opt1 opt2' and match all returns NO values which is NOT correct since there are several hundred matches
If you implement the patch you will get a VERY FAST result for 1. and the correct results for 2.
I have tested the changes on Postgres and it supports a subquery rather than having a separate list of ids from a different query. I have not been able to test this on SQL server.
None
Status | New | ⇒ | Pending |
Category | ⇒ | Front End Plugins |
Labels |
Added:
?
|
Agree entirely about the article ids and fetching the custom fields in one go. We need to do the same elsewhere in custom fields too. I'll look at the code formatting - but my IDE is setup to match Joomla's standards already
I'll look at the code formatting - but my IDE is setup to match Joomla's standards already
There's an extra indent on the method's code. Take that out and it should be mostly OK at a cursory glance.
Thanks @mbabker - not sure where the extra tab came from. That had tidied up the diff a lot.
Do we really care about aligning assignment operators?
@nibra I'll do a new PR later to implement the idea of picking up all the field values in one pass. We need to do the same elsewhere in the core (in my opinion too) instead of duplicated or repetitive queries.
Do we really care about aligning assignment operators?
it does make code more readable
i've made a quick test on mysql and after apply pr no need to
make a cup of coffee and then drink it, call your friends, have another cup of coffee
to get results
i'll make some test on postgresql too
I tested you changes at a real project: 8 Fields per article within 6000 articles in 4 different languages.
It seems to work .-)
Thanks Angie
One more idea, maybe we should use a param; search in fields as well, so it is more flexible
Thanks @GeraintEdwards for the proper test instructions. Now I could easily reproduce the issue on my dev server.
But is there really no way to do that server type independent and in a more easy way? I tried a different approach in #18929 to split up the queries. What do you guys think?
Hi @laoneo - see my comments on your other PR.
Re the server type dependent code - as I explained in my comment in the code the subquery approach should be the fastest since the database engine can do its own optimisations. Unfortunately MySQL had a bug where the independent sub query is treated as if it were dependent and gets executed once for every article searched (and a ridiculous performance hit).
Postgres treats the query correctly and is about twice as fast as a subquery compared to passing the field ids in as a string.
So its a trade off between easier to read code and a small performance hit in Postgres.
@angieradtke I suppose it does make sense to be able to mark specific fields as 'searchable' and other not. That would require a new config option for individual fields - @laoneo what do you think?
What's the reason to not search in a field? If you want such an option, then I would add it as a parameter in the search plugin in which fields to search.
So its a trade off between easier to read code and a small performance hit in Postgres.
I'm more worried about other extension developers who want to implement a search plugin. Mostly the content search plugin acts as example. It is already very complex.
In general most sites will not want to search for matching text in media fields but I guess some people may want to search for matching image file name?
The setting could be in the plugin but if someone then adds a new field they need to remember to go back to the plugin to change its setting there too. Seems more natural to have the setting field specific to my mind.
On a related note I guess fields that only display in the backend should not be searched??
@GeraintEdwards
the custom fields can be used for various approaches. You can store article related data there. This is they are developed for. But you can use them as well to add CSS-Classnames to parts of the page.
For example : You got a three column layout and you want that the last column of one row is larger then the others you can add col-6 col-md-2 to this column using custom fields. This makes layout styling very flexible.
Such values I don't wanna see in my search results .
i don't think an if
about dbserver type make the code unreadable, plus as already stated mysql and postgresql do things in a different way so imho is fine to have different behaviuor depending upon dbserver type, and we have the same behaviuor in different place already
this is a really big performance gain, making search again usable,
so please can we consider to give this pr high priority and then do the other related consideration to other future pr's
The current code is ok - it works and solves a real problem. IMO it should be merged.
@laoneo Your concerns are absolutely valid. The plugin contains a lot of duplicated code which should be avoided. The PR does not make it worse, it just follows the style, the plugin is written in.
Nevertheless, in a next, separate step, the code for custom fields integration should be moved to a helper method (or better an event handler / "nested" plugin), that provides the right WHERE clause.
I agree with @nibra the search is hitting large sites very badly already and I would suggest merging this PR now and then working on moving the custom code to a helper/model class, taking account of which fields should be searched (as per the valid suggestions from @angieradtke) and incorporating similar changes into the contacts search plugin when that code is ready. If that can be done in time for 3.8.3 then nothing is lost by merging this change now and a more refined solution later - but I don't think we should risk waiting for a more refined solution that may possibly not be merged in time for 3.8.3.
@nibra - I have removed the commented out code and largely restyled the file. I suggest the field value collection/assignement change is made into another PR since the same changes need to be applied in other parts of the core to eliminate duplicate/repeated queries.
@alikon @angieradtke Could you please re-test? Functionality should not have changed, but just to be on the safe side...
I'll do this weekend
Il 01 dic 2017 12:50 PM, "Niels Braczek" notifications@github.com ha
scritto:
@alikon https://github.com/alikon @angieradtke
https://github.com/angieradtke Could you please re-test? Functionality
should not have changed, but just to be on the safe side...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#18915 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AALFsQBIW5SWmZAHDXe1yv99oKxXPfTWks5s7-gQgaJpZM4QvB9K
.
I have tested this item
This is a massive improvement to Joomla! Performance and is a real leap forward for customfields. I tested using 3 categories and 1500 articles per category = 4500 articles.
Searched for a keyword within most articles. We get a 504 gateway error after around 5 minutes (Local server for testing).
Applied patch, search result returned with page rendered in: 1180.36 ms which is pretty fast given the content being search.
Retested successful. Great job .-) This is a big step towards the professional usage of the custom_fields.
Altered test of @angieradtke as successfully.
Status | Pending | ⇒ | Ready to Commit |
Ready to Commit after two successful tests.
I have tested this item
on postgresql
the offending statement is when we have a.id IN (cfv.item_id)
cause we are comparing the id
field of #__content
table wich is a SERIAL
with the item_id
field of #__fields_values
table wich is a VARCHAR
we need an explicit cast
from
$wheres2[] = 'a.id IN( ' . (string) $subQuery . ')';
to
$wheres2[] = 'CAST(a.id AS VARCHAR) IN(' . (string) $subQuery . ')';
lines 125,177,216
with this changes in place even on postgresql the performance are better
Status | Ready to Commit | ⇒ | Pending |
Removing RTC so we can fix postgres :)
Priority | Medium | ⇒ | Urgent |
Really sorry about the Postgres - I had 2 development sites running and fixed the casting in the postgres version but hadn't committed from there.
JDatabaseQuery doesn't implement castAsInteger which would be the most efficient casting/matching so I'm using castAsChar or a.id for the match instead.
JDatabaseQuery doesn't implement castAsInteger which would be the most efficient casting/matching so I'm using castAsChar or a.id for the match instead.
good catch, i hope #18961 will fill the gap
we should remeber to use it, when and if #18961 will be merged, but for now we should go even without it
I have tested this item
mysql & postrgesql
@angieradtke, @tonypartridge, could you please re-test?
Second retest succesful on a live project .-)
@angieradtke please mark your Test as successfully:
I have tested this item
Status | Pending | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2017-12-03 16:30:51 |
Closed_By | ⇒ | wilsonge |
I tweaked a couple of small c/s issues and merged. Thankyou so much for this :)
Do we actually need the GROUP BY statement in the plugin's query? It doesn't seem it's needful here and taking it off it results to an extra ~30% gain in performance.
@joomlabeat as you can see this is a old issue / merged PR. Can you please open a new issue linking to this one? Else this get missed fast. Thanks!
Yes, the GROUP BY is required by the SQL standard (MySQL can deal without, others can't). The drivers are AFAIK not building the GROUP BY causes transparently if needed (which is what they should do).
@joomlabeat is right, line 288 can be completely removed.
There is only (#__content
INNER JOIN #__categories
), there are no duplicates and the query has no aggregate function.
This is version of com_overload that can generate the test data set
com_overload.zip