No Code Attached Yet
rigin
13 Apr 2021

Is your feature request related to a problem? Please describe.

  1. When indexing a large site on a slow server with the CLI script, the process takes several days, so for various reasons it sometimes has to be restarted before it finishes.

After the restart, indexing does not continue from where it stopped. It starts reindexing content that is already in the index, even though a significant part of the site has not been indexed yet.

  2. Also, when you save content, the site freezes because of indexing.
    If you disable the Smart Search content plugin, saving works normally.

Describe the solution you'd like

  1. Change the order in which content is crawled during indexing: index content that has no index entry first, then the rest in ascending order of the date it was last indexed.
  2. If possible, add separate settings for indexing on save and indexing via the CLI script, so that content can be saved without indexing while the CLI script runs in parallel.
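
The ordering proposed in point 1 could be sketched roughly as follows. This is an illustration in Python only (com_finder itself is PHP); the `Item` type, the `indexed_at` field and the `crawl_order` function are hypothetical names and do not correspond to actual com_finder structures.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical content item; not a real com_finder structure."""
    item_id: int
    indexed_at: Optional[datetime]  # None means the item was never indexed


def crawl_order(items: List[Item]) -> List[Item]:
    """Return never-indexed items first, then oldest previous indexing first."""
    return sorted(
        items,
        key=lambda it: (
            it.indexed_at is not None,      # False (never indexed) sorts first
            it.indexed_at or datetime.min,  # then ascending by last index date
            it.item_id,                     # stable tie-breaker
        ),
    )


items = [
    Item(1, datetime(2021, 4, 10)),
    Item(2, None),
    Item(3, datetime(2021, 3, 1)),
]
print([it.item_id for it in crawl_order(items)])  # [2, 3, 1]
```

With this order, an interrupted run that is restarted would pick up the unindexed items before revisiting anything that already has an index entry.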

Additional context

rigin - open - 13 Apr 2021
joomla-cms-bot - change - 13 Apr 2021
Labels Added: ?
joomla-cms-bot - labeled - 13 Apr 2021
brianteeman - comment - 13 Apr 2021

If the site is so large and on such a slow server that it takes several days, then no amount of optimisation will really help you.

Time to bite the bullet and change to a better server. Problem solved permanently in hours saving you $$$ in time

rigin - comment - 13 Apr 2021

How I love these smart guys who, in response to a specific question, start lecturing like an old grandmother... What is the youth of today coming to... ))))
The server is at my house, behind the TV. It is weak on purpose, because that way it uses less electricity.
It is typical for a large site to take many hours to index even on a fast server, and it is also common for indexing not to finish in a single run.

brianteeman - comment - 13 Apr 2021

There is only so much juice that you can get from a lemon before you need to get another lemon

sandramay0905 - comment - 14 Apr 2021

@rigin Can you add "[4] " to the title? New features go into Joomla 4, so it's easier to see in the issue view which version they belong to. Thanks.

rigin - change - 14 Apr 2021
Title
Problems with indexing com_finder.
[4] ...[3.9] Problems with indexing com_finder.
rigin - edited - 14 Apr 2021
richard67 - comment - 18 Apr 2021

@Hackwar Is there anything we could do about this?

Hackwar - comment - 18 Apr 2021

@rigin is this indeed an issue you notice on Joomla 4? I improved the indexing process quite a bit in 4.0 and it does index a lot faster. Could you provide a bit more information about your site in order to determine if the indexing time is to be expected or not?

Generally, it is not exactly trivial to determine which parts haven't been indexed and which ones would need updating. However, checking if an item needs to be reindexed should be a rather quick operation, because it takes the result object that is prepared prior to indexing and creates a checksum over that. It then compares that checksum to the one in the database and only indexes that content when it has changed. If after restarting the indexing of that part still takes a long time, then you might want to look into your plugins to see which ones are executed during indexing (mainly to process the content triggers) and take up a lot of time.
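
A minimal sketch of the checksum check described above, for illustration only. It is written in Python rather than PHP, and the function names, the JSON serialization and the choice of SHA-256 are all assumptions; it is not the actual com_finder code.

```python
import hashlib
import json
from typing import Callable, Dict


def checksum(prepared: dict) -> str:
    """Hash a prepared result object (serialization here is an assumption)."""
    payload = json.dumps(prepared, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def index_if_changed(
    key: str,
    prepared: dict,
    stored: Dict[str, str],             # stands in for checksums kept in the database
    index_fn: Callable[[dict], None],   # hypothetical "do the expensive indexing" hook
) -> bool:
    """Index the item only when its checksum differs from the stored one."""
    new_sum = checksum(prepared)
    if stored.get(key) == new_sum:
        return False  # unchanged: skip the expensive indexing step
    index_fn(prepared)
    stored[key] = new_sum
    return True
```

In this sketch an unchanged item costs only the preparation and one hash comparison, so if a rerun is still slow, the time is probably being spent while the result object is prepared, for example in the plugins mentioned above.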

In order to disable indexing on saving, you should just be able to disable the smart search content plugin. It should then still be possible to index via the CLI script.

I'm really torn on investing more work into this in 3.x, since all changes of 4.0 can't be backported due to backwards compatibility reasons. At the same time, it is more than just a bugfix and thus would have to go into a minor release... I would defer to @HLeithner if we want to do some improvements to speed this up in 3.x.

Generally, there are options to improve indexing speed specific to your site and there would also be the possibility of using a plugin which I wrote, which backports the changes from 4.0 to 3.9. If you want to go that route, please contact me privately and I'll try to help you. Please just search for my name and you will find ways to contact me.

rigin - comment - 19 Apr 2021

I ran into this problem on Joomla 3.9. But I understand that this has always been the case in com_finder. ))
It's just that I got it on a combination of a slow server and a large amount of indexing.
@sandramay0905 advised to add the tag [4.0] in order to draw attention to the problem.

This problem occurred when indexing the site https://rigin.net/ . It has approximately 1,500 articles. Hardware-wise, it runs on a weak Gigabyte GA-D525TUD office computer.

I use the standard joomla 3.9 cron script for indexing.

On the provider's server, the indexing process took about 12 hours, and on this configuration it takes about 2 weeks.

If you edit an article while indexing is running, indexing is interrupted. When you restart the cron script, indexing can continue from where it was interrupted, but if the session is interrupted, indexing starts over in ascending order of article ID.

This can be seen in the admin panel: /administrator/index.php?option=com_finder&view=index

Hackwar - comment - 19 Apr 2021

So I'm running the improved code on a website with ~7500 items, about 3500 of those are articles. Indexing that takes about 30 minutes on a good server. Your site should index in definitely less than a day.

However, this is NOT a 4.0 problem.

rigin - comment - 19 Apr 2021

OK, I'll fix it now.

rigin - change - 19 Apr 2021
Title
[4] ...[3.9] Problems with indexing com_finder.
[3.9] Problems with indexing com_finder.
rigin - edited - 19 Apr 2021
sandramay0905 - comment - 19 Apr 2021

However, this is NOT a 4.0 problem.

Sorry @rigin, I thought it was a feature request.

rigin - comment - 19 Apr 2021

This is my terrible English.. )))

HLeithner - comment - 19 Apr 2021

If needed, we can backport improvements later (maybe after the J4 release), but at this point in time I would really like to bring J4 into a releasable state.

Hackwar - comment - 9 May 2021

#33720 should improve indexing a little bit.

chmst - change - 20 Jan 2022
Status New Closed
Closed_Date 0000-00-00 00:00:00 2022-01-20 20:39:32
Closed_By chmst
Labels Added: No Code Attached Yet
Removed: ?
chmst - close - 20 Jan 2022
chmst - comment - 20 Jan 2022

Closing as there is a PR.
