? Pending

User tests: Successful: Unsuccessful:

avatar GeraintEdwards
GeraintEdwards
29 Nov 2017

Pull Request for Issue # .

For MySQL based sites with a large number of articles (more than a few hundred) that have custom fields enabled the com_content search plugin ends up being so slow that it can cripple a site completely. For a test dataset (details below) of 3,900 articles with 2 custom fields each the database query for a search took between 700,000 and 1,300,000 milliseconds to return a set of search results (on an i7-6700k 4Ghz 8 core processor with 32GB or ram) . In addition the search results for matching all on a multi-select field incorrectly returns zero results.

Postgres performs a LOT better (query times in the region of 400 milliseconds) but the changes described below still cut the query times by 50%.

Summary of Changes

The plugin has been re-written and rather than joining all the custom fields rows and searching for custom fields at the same time as all the other content fields we now search custom fields and incorporate the results into the main query. In Postgres this can be done as a subquery but MySQL treats this independent subquery as a 'dependent subquery' with punishing performance implications. This may explain in part why MySQL performs so badly with the existing code.

Testing Instructions

This test requires a large dataset of articles/custom fields to test. It can be generated using com_overload ( Copyright (C) 2011 Nicholas K. Dionysopoulos) - an updated version is attached below that will work in Joomla 3.8 and that also creates 2 custom fields and populates these with the articles.

You should install this component and view it in the backend. A decent dataset to test with would use values of 3 categories, 3 levels, 100 articles -> this will give 3900 articles (WARNING this will take several minutes to generate the data)

This adds a text field with random values of "one" to "seven" as text and a multi-select list field with 2 random values from opt1, opt2. opt3, opt4 and op5.

Then use the search module/com_search in the frontend and search for

"seven" or "opt1" or "opt1 opt2" etc.

Expected result

1; The correct set of results within a couple of seconds max.

  1. Search for 'opt1 opt2' and match all and it should return the matching set of articles

Actual result

  1. Go and make a cup of coffee and then drink it, call your friends, have another cup of coffee - depending on your processor you are likely to be waiting 20 minutes for the results to return.

  2. The search for 'opt1 opt2' and match all returns NO values which is NOT correct since there are several hundred matches

If you implement the patch you will get a VERY FAST result for 1. and the correct results for 2.

Discussion Point

I have tested the changes on Postgres and it supports a subquery rather than having a separate list of ids from a different query. I have not been able to test this on SQL server.

Documentation Changes Required

None

avatar GeraintEdwards GeraintEdwards - open - 29 Nov 2017
avatar GeraintEdwards GeraintEdwards - change - 29 Nov 2017
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 29 Nov 2017
Category Front End Plugins
avatar GeraintEdwards
GeraintEdwards - comment - 29 Nov 2017

This is version of com_overload that can generate the test data set
com_overload.zip

avatar GeraintEdwards GeraintEdwards - change - 29 Nov 2017
Labels Added: ?
avatar GeraintEdwards GeraintEdwards - change - 29 Nov 2017
The description was changed
avatar GeraintEdwards GeraintEdwards - edited - 29 Nov 2017
avatar GeraintEdwards GeraintEdwards - change - 29 Nov 2017
The description was changed
avatar GeraintEdwards GeraintEdwards - edited - 29 Nov 2017
avatar GeraintEdwards
GeraintEdwards - comment - 29 Nov 2017

Agree entirely about the article ids and fetching the custom fields in one go. We need to do the same elsewhere in custom fields too. I'll look at the code formatting - but my IDE is setup to match Joomla's standards already

avatar mbabker
mbabker - comment - 29 Nov 2017

I'll look at the code formatting - but my IDE is setup to match Joomla's standards already

There's an extra indent on the method's code. Take that out and it should be mostly OK at a cursory glance.

avatar GeraintEdwards
GeraintEdwards - comment - 29 Nov 2017

Thanks @mbabker - not sure where the extra tab came from. That had tidied up the diff a lot.

Do we really care about aligning assignment operators?

@nibra I'll do a new PR later to implement the idea of picking up all the field values in one pass. We need to do the same elsewhere in the core (in my opinion too) instead of duplicated or repetitive queries.

avatar brianteeman
brianteeman - comment - 29 Nov 2017

Do we really care about aligning assignment operators?

it does make code more readable

avatar alikon
alikon - comment - 29 Nov 2017

i've made a quick test on mysql and after apply pr no need to

make a cup of coffee and then drink it, call your friends, have another cup of coffee

to get results 👍

i'll make some test on postgresql too

avatar angieradtke
angieradtke - comment - 30 Nov 2017

I tested you changes at a real project: 8 Fields per article within 6000 articles in 4 different languages.
It seems to work .-)
Thanks Angie

avatar angieradtke
angieradtke - comment - 30 Nov 2017

One more idea, maybe we should use a param; search in fields as well, so it is more flexible

avatar laoneo
laoneo - comment - 30 Nov 2017

Thanks @GeraintEdwards for the proper test instructions. Now I could easily reproduce the issue on my dev server.
But is there really no way to do that server type independent and in a more easy way? I tried a different approach in #18929 to split up the queries. What do you guys think?

avatar GeraintEdwards
GeraintEdwards - comment - 30 Nov 2017

Hi @laoneo - see my comments on your other PR.

Re the server type dependent code - as I explained in my comment in the code the subquery approach should be the fastest since the database engine can do its own optimisations. Unfortunately MySQL had a bug where the independent sub query is treated as if it were dependent and gets executed once for every article searched (and a ridiculous performance hit).

Postgres treats the query correctly and is about twice as fast as a subquery compared to passing the field ids in as a string.

So its a trade off between easier to read code and a small performance hit in Postgres.

avatar GeraintEdwards
GeraintEdwards - comment - 30 Nov 2017

@angieradtke I suppose it does make sense to be able to mark specific fields as 'searchable' and other not. That would require a new config option for individual fields - @laoneo what do you think?

avatar laoneo
laoneo - comment - 30 Nov 2017

What's the reason to not search in a field? If you want such an option, then I would add it as a parameter in the search plugin in which fields to search.

avatar laoneo
laoneo - comment - 30 Nov 2017

So its a trade off between easier to read code and a small performance hit in Postgres.

I'm more worried about other extension developers who want to implement a search plugin. Mostly the content search plugin acts as example. It is already very complex.

avatar GeraintEdwards
GeraintEdwards - comment - 30 Nov 2017

In general most sites will not want to search for matching text in media fields but I guess some people may want to search for matching image file name?

The setting could be in the plugin but if someone then adds a new field they need to remember to go back to the plugin to change its setting there too. Seems more natural to have the setting field specific to my mind.

On a related note I guess fields that only display in the backend should not be searched??

avatar angieradtke
angieradtke - comment - 30 Nov 2017

@GeraintEdwards
the custom fields can be used for various approaches. You can store article related data there. This is they are developed for. But you can use them as well to add CSS-Classnames to parts of the page.

For example : You got a three column layout and you want that the last column of one row is larger then the others you can add col-6 col-md-2 to this column using custom fields. This makes layout styling very flexible.

Such values I don't wanna see in my search results .

avatar alikon
alikon - comment - 30 Nov 2017

i don't think an if about dbserver type make the code unreadable, plus as already stated mysql and postgresql do things in a different way so imho is fine to have different behaviuor depending upon dbserver type, and we have the same behaviuor in different place already

this is a really big performance gain, making search again usable,
so please can we consider to give this pr high priority and then do the other related consideration to other future pr's

avatar laoneo
laoneo - comment - 30 Nov 2017

@alikon I do not agree. If we can make things less complex, reusable and easy to understand for other extension devs then I think it is worth to give it a try.

avatar nibra
nibra - comment - 30 Nov 2017

The current code is ok - it works and solves a real problem. IMO it should be merged.

@laoneo Your concerns are absolutely valid. The plugin contains a lot of duplicated code which should be avoided. The PR does not make it worse, it just follows the style, the plugin is written in.

Nevertheless, in a next, separate step, the code for custom fields integration should be moved to a helper method (or better an event handler / "nested" plugin), that provides the right WHERE clause.

avatar GeraintEdwards
GeraintEdwards - comment - 30 Nov 2017

I agree with @nibra the search is hitting large sites very badly already and I would suggest merging this PR now and then working on moving the custom code to a helper/model class, taking account of which fields should be searched (as per the valid suggestions from @angieradtke) and incorporating similar changes into the contacts search plugin when that code is ready. If that can be done in time for 3.8.3 then nothing is lost by merging this change now and a more refined solution later - but I don't think we should risk waiting for a more refined solution that may possibly not be merged in time for 3.8.3.

avatar GeraintEdwards
GeraintEdwards - comment - 1 Dec 2017

@nibra - I have removed the commented out code and largely restyled the file. I suggest the field value collection/assignement change is made into another PR since the same changes need to be applied in other parts of the core to eliminate duplicate/repeated queries.

avatar nibra
nibra - comment - 1 Dec 2017

@alikon @angieradtke Could you please re-test? Functionality should not have changed, but just to be on the safe side...

avatar alikon
alikon - comment - 1 Dec 2017

I'll do this weekend

Il 01 dic 2017 12:50 PM, "Niels Braczek" notifications@github.com ha
scritto:

@alikon https://github.com/alikon @angieradtke
https://github.com/angieradtke Could you please re-test? Functionality
should not have changed, but just to be on the safe side...


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#18915 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AALFsQBIW5SWmZAHDXe1yv99oKxXPfTWks5s7-gQgaJpZM4QvB9K
.

avatar tonypartridge tonypartridge - test_item - 1 Dec 2017 - Tested successfully
avatar tonypartridge
tonypartridge - comment - 1 Dec 2017

I have tested this item successfully on 5eab93c

This is a massive improvement to Joomla! Performance and is a real leap forward for customfields. I tested using 3 categories and 1500 articles per category = 4500 articles.

Searched for a keyword within most articles. We get a 504 gateway error after around 5 minutes (Local server for testing).

Applied patch, search result returned with page rendered in: 1180.36 ms which is pretty fast given the content being search.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18915.
avatar angieradtke
angieradtke - comment - 1 Dec 2017

Retested successful. Great job .-) This is a big step towards the professional usage of the custom_fields.

avatar franz-wohlkoenig franz-wohlkoenig - alter_testresult - 1 Dec 2017 - angieradtke: Tested successfully
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 1 Dec 2017

Altered test of @angieradtke as successfully.

avatar franz-wohlkoenig franz-wohlkoenig - alter_testresult - 1 Dec 2017 - angieradtke: Tested successfully
avatar franz-wohlkoenig franz-wohlkoenig - alter_testresult - 1 Dec 2017 - angieradtke: Not tested
avatar franz-wohlkoenig franz-wohlkoenig - change - 1 Dec 2017
Status Pending Ready to Commit
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 1 Dec 2017

Ready to Commit after two successful tests.

avatar alikon alikon - test_item - 2 Dec 2017 - Tested unsuccessfully
avatar alikon
alikon - comment - 2 Dec 2017

I have tested this item 🔴 unsuccessfully on 5eab93c

on postgresql


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18915.

avatar alikon
alikon - comment - 2 Dec 2017

the offending statement is when we have a.id IN (cfv.item_id)
cause we are comparing the id field of #__content table wich is a SERIAL with the item_id field of #__fields_values table wich is a VARCHAR

we need an explicit cast

from
$wheres2[] = 'a.id IN( ' . (string) $subQuery . ')';
to
$wheres2[] = 'CAST(a.id AS VARCHAR) IN(' . (string) $subQuery . ')';

lines 125,177,216

avatar alikon
alikon - comment - 2 Dec 2017

with this changes in place even on postgresql the performance are better

avatar wilsonge wilsonge - change - 2 Dec 2017
Status Ready to Commit Pending
avatar wilsonge
wilsonge - comment - 2 Dec 2017

Removing RTC so we can fix postgres :)


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18915.

avatar nibra nibra - change - 2 Dec 2017
Priority Medium Urgent
avatar GeraintEdwards
GeraintEdwards - comment - 3 Dec 2017

Really sorry about the Postgres - I had 2 development sites running and fixed the casting in the postgres version but hadn't committed from there.

JDatabaseQuery doesn't implement castAsInteger which would be the most efficient casting/matching so I'm using castAsChar or a.id for the match instead.

avatar alikon
alikon - comment - 3 Dec 2017

JDatabaseQuery doesn't implement castAsInteger which would be the most efficient casting/matching so I'm using castAsChar or a.id for the match instead.

good catch, i hope #18961 will fill the gap

we should remeber to use it, when and if #18961 will be merged, but for now we should go even without it 😍

avatar alikon alikon - test_item - 3 Dec 2017 - Tested successfully
avatar alikon
alikon - comment - 3 Dec 2017

I have tested this item successfully on 351b0c6

mysql & postrgesql


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18915.

avatar nibra
nibra - comment - 3 Dec 2017

@angieradtke, @tonypartridge, could you please re-test?

avatar angieradtke
angieradtke - comment - 3 Dec 2017

Second retest succesful on a live project .-)

avatar nibra nibra - alter_testresult - 3 Dec 2017 - angieradtke: Tested successfully
6e1c893 3 Dec 2017 avatar GeraintEdwards CS
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 3 Dec 2017

@angieradtke please mark your Test as successfully:

  • open Issue Tracker
  • Login with your github-Account
  • Click on blue "Test this"-Button above Authors-Picture
  • mark your Test as successfully
  • hit "submit test result"
avatar angieradtke angieradtke - test_item - 3 Dec 2017 - Tested successfully
avatar angieradtke
angieradtke - comment - 3 Dec 2017

I have tested this item successfully on 6e1c893


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18915.

avatar wilsonge wilsonge - change - 3 Dec 2017
Status Pending Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2017-12-03 16:30:51
Closed_By wilsonge
avatar wilsonge wilsonge - close - 3 Dec 2017
avatar wilsonge wilsonge - merge - 3 Dec 2017
avatar wilsonge
wilsonge - comment - 3 Dec 2017

I tweaked a couple of small c/s issues and merged. Thankyou so much for this :)

avatar joomlabeat
joomlabeat - comment - 18 Jan 2018

Do we actually need the GROUP BY statement in the plugin's query? It doesn't seem it's needful here and taking it off it results to an extra ~30% gain in performance.

avatar zero-24
zero-24 - comment - 18 Jan 2018

@joomlabeat as you can see this is a old issue / merged PR. Can you please open a new issue linking to this one? Else this get missed fast. Thanks!

avatar nibra
nibra - comment - 18 Jan 2018

Yes, the GROUP BY is required by the SQL standard (MySQL can deal without, others can't). The drivers are AFAIK not building the GROUP BY causes transparently if needed (which is what they should do).

avatar csthomas
csthomas - comment - 19 Jan 2018

@joomlabeat is right, line 288 can be completely removed.

There is only (#__content INNER JOIN #__categories), there are no duplicates and the query has no aggregate function.

Add a Comment

Login with GitHub to post a comment