No Code Attached Yet bug
csib
14 Jun 2021

Steps to reproduce the issue

Deploy Joomla to K8S. After a few hours/days (depending on the load), the database crashes because of Joomla. InnoDB replication breaks, and many pending writes to the _session table are left waiting/blocked.
In my tests, the root cause is the sessions themselves.

I can reproduce this on my infrastructure with an average of 4 page loads/sec.

Expected result

Working database.

Actual result

Database replication fails because of Joomla. I have never seen this behaviour on my other Galera nodes, only on the one where Joomla is running.

System information (as much as possible)

Infra: Kubernetes
Ingress Proxy: Traefik
Image: Custom built based on Alpine
Joomla: 3.9.27
Database: MariaDB Galera Cluster 10.5.10 (there is already an open issue related to this in their repo)
Session settings:
[Screenshot: session settings]
Behind Load Balancer setting is set to Yes

Additional comments

I have tried several settings: load balancer on/off, rollback settings on Galera, changing timeouts, and switching from Redis to the database.
In my tests, with 1 Joomla pod and 1 Galera pod everything works. After I scaled Galera up to the intended 3 pods, the system crashed within a few hours.

csib - open - 14 Jun 2021
joomla-cms-bot - change - 14 Jun 2021
Labels Added: ?
joomla-cms-bot - labeled - 14 Jun 2021
csib - change - 14 Jun 2021
The description was changed (several edits)
csib - change - 14 Jun 2021
Title changed from "Joomla in Kubernetes with Maraidb Cluster" to "Joomla in Kubernetes with Mariadb Cluster"
csib - edited - 14 Jun 2021
joomdonation - comment - 14 Jun 2021

Since it relates to session table, one thing you should check is making sure System - Session Data Purge plugin is enabled.

richard67 - comment - 14 Jun 2021

Possibly the MariaDB cluster has problems with tables that don't have a primary key (at least, I recently heard something like that elsewhere), and Joomla has a few such tables.
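The Galera documentation lists tables without a primary key among its known limitations, so one way to test this theory is to list the tables in the Joomla schema that lack one. A minimal sketch, assuming a DB-API 2.0 driver that uses the `%s` parameter style (such as PyMySQL); the helper name `tables_without_primary_key` is hypothetical:

```python
# Hypothetical helper: list base tables in a schema that have no PRIMARY KEY.
TABLES_WITHOUT_PK_SQL = """
SELECT t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = %s
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
ORDER BY t.table_name
"""


def tables_without_primary_key(cursor, schema):
    """Return table names in `schema` that have no primary key.

    `cursor` is assumed to be any DB-API 2.0 cursor whose driver accepts
    %s placeholders (e.g. a PyMySQL cursor).
    """
    cursor.execute(TABLES_WITHOUT_PK_SQL, (schema,))
    # Each result row is a 1-tuple holding the table name.
    return [row[0] for row in cursor.fetchall()]
```

The LEFT JOIN keeps every base table and the `constraint_name IS NULL` filter keeps only those with no matching PRIMARY KEY constraint.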

csib - comment - 14 Jun 2021

Since it relates to session table, one thing you should check is making sure System - Session Data Purge plugin is enabled.

Thank you. I've checked and it's already enabled.

Possibly the MariaDB cluster has problems with tables which don't have a primary key, at least I've recently heard something like that elsewhere, and Joomla has a few such tables.

In the session table, session_id is the primary key in my database.

richard67 - comment - 14 Jun 2021

In the session table, session_id is the primary key in my database.

@csib I know the session table has a primary key. I thought some other table without a primary key might cause problems for the cluster as a whole. But I don't really think that's the case.

Fedik - comment - 14 Jun 2021

I had a similar issue on one of my high-traffic sites (it was resolved by bigger hosting and some MySQL configuration changes by the hosting support, and later by an extra caching proxy).
The session table is a bottleneck, unfortunately.

Fedik - comment - 14 Jun 2021

@joomdonation @richard67 do you remember whether Joomla 4 still stores session metadata in the database even when the session handler is not "database"?
I remember there was some discussion about it in the past, but I don't know what was done.

richard67 - comment - 14 Jun 2021

@Fedik I have the same problem as you have. I remember there was something, but I don't remember details.

There is an option "Track Session Metadata" in the "System" tab of Global Configuration in J4, which is not there in J3. So it seems in J4 it's possible to switch that off.

alikon - comment - 14 Jun 2021

A long time ago: #19460, but no one was interested.

richard67 - comment - 14 Jun 2021

A long time ago: #19460, but no one was interested.

@alikon And it was labelled a new feature. In my opinion, adding an index or a primary key where there was none before is not really a new feature. I think it should be fixed in 3.9 or 3.10, and it could even be tested by code review alone.

csib - comment - 16 Jun 2021

Additional debug info:
When the MariaDB cluster fails, I always see a lot of INSERT queries waiting in the database queue (SHOW PROCESSLIST).
[Screenshot: SHOW PROCESSLIST output]

At first there are 5-10 entries like that; after a while the count maxes out (500). But the database is already non-functional once these entries start rising; it doesn't need to reach 500.

I am also using @joomdonation's Membership Pro, as seen in the screenshot. Maybe that is useful additional information.

Thank you!
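The pile-up described above (INSERTs into the session table queuing in SHOW PROCESSLIST) could be watched for before the cluster becomes unresponsive. A minimal sketch, assuming the process list rows have already been fetched as dictionaries keyed by column name (for example with a PyMySQL `DictCursor`); the function name, the `_session` substring match and the 1-second threshold are hypothetical choices, not part of Joomla or MariaDB:

```python
from typing import Any, Iterable, Mapping


def count_waiting_session_inserts(
    rows: Iterable[Mapping[str, Any]],
    table_suffix: str = "_session",
    min_seconds: int = 1,
) -> int:
    """Count INSERTs touching the session table that have run >= min_seconds.

    `rows` are SHOW PROCESSLIST rows as dicts keyed by column name
    ("Info" holds the statement text, "Time" the seconds in the current state).
    """
    suffix = table_suffix.lower()
    count = 0
    for row in rows:
        info = (row.get("Info") or "").strip().lower()
        if not info.startswith("insert"):
            continue  # only INSERT statements are of interest here
        if suffix not in info:
            continue  # crude textual match on the table name
        if (row.get("Time") or 0) >= min_seconds:
            count += 1
    return count
```

Polling this periodically and alerting when the count stays above a handful (the report mentions 5-10 such entries at the start) would give an early warning before connections max out.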

joomdonation - comment - 16 Jun 2021

@csib The extension has system plugins; however, they are only triggered to run once every hour (the interval is configurable), not on every page load, so they should not cause any problem.

The main problem here, as people figured out, is how Joomla handles session metadata. On a high-traffic website like yours, problems arise because too many records are inserted into that table. I don't have time to look at it right now, but there are a few things you could try:

I can't say that it will solve your issue, but it is something I think you can try. I will look at this problem again when I have more time. I remember that we discussed the same problem here in the past (someone had a high-traffic website, but we could come up with a solution).

joomdonation - comment - 16 Jun 2021

OK. Found the original discussion #19146

csib - comment - 16 Jun 2021

Thank you very much, I will try your advice.

Generally it is not a high-traffic website (or I don't know what counts as high traffic for Joomla). The problem is that it occurs occasionally on my website (during small traffic spikes). With 3 nodes, CPU load is between 30-60% (2x 4 cores/8 GB RAM, 1x 6 cores/16 GB RAM), but they are VPS servers, so the cores are not dedicated.

And the saddest thing is that anyone can DoS my website right now: anyone who can generate at least 4 page loads/sec against it, which is very bad.

If I use a single MariaDB database, the problem does not occur; in my tests it happens only with clustered MariaDB (as mentioned in the first post).

To make a long story short:
Thank you very much! I am going to try your advice. If it doesn't help, I will try to limit page loads from the same IP to 2 or 3 per second at the proxy level; I hope this won't bother real users. If I end up with this proxy approach, I will post the settings here; maybe it will help someone else with a system similar to mine.

csib - comment - 15 Jul 2021

With the solution that @joomdonation mentioned, the website and the database are much more stable.

Without this fix, my site randomly broke down every 2-4 days, but now it lasts almost a month. Yesterday there was a problem again because of this, but it is much, much better than before.

So I can confirm that this solution is working, and the website is more stable, thank you @joomdonation.

Maybe I will try Postgres (HA version) instead of MariaDB Galera and see what happens.

chmst - change - 20 Feb 2023
Labels Added: No Code Attached Yet bug
Removed: ?
chmst - labeled - 20 Feb 2023
