Apply the patch and, if necessary, discover and install the new plugin.
With the "Enable Session Garbage Collection" option enabled, during the onAfterRespond event Joomla will attempt to perform session garbage collection based on the odds defined in the plugin's parameters.
With the "Enable Session Metadata Cleanup" option enabled, and a non-database session handler in use, during the onAfterRespond event Joomla will attempt to purge stale records from the session database table representing the optional session metadata (as explained a thousand times, this is done by design in the plugin for now because the session store and our metadata handling represent two different datasets and should be treated as such).
The odds calculation is the same as that used by the C level php_session_gc function, ported to userland PHP and using the plugin's configuration values rather than the PHP runtime configuration.
The two cleanup operations purposefully do not share the same probability calculation. Though there is no harm in running both tasks in one request, IMO there is no need to force it either.
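A minimal sketch of the odds check described above, for readers who want to reason about it outside Joomla. The function name shouldRunCleanup and the injectable $roll parameter are hypothetical (added for illustration and testing), and mt_rand() is only a stand-in random source. The scaling of the roll by the divisor is an assumption inferred from the logged values and the `$probability > 0 && $random < $probability` condition quoted elsewhere in this thread; it is not a copy of the plugin's code.

```php
<?php
/**
 * Sketch only: decide whether a cleanup task should run on this request.
 *
 * @param int        $probability Numerator of the odds (plugin setting).
 * @param int        $divisor     Denominator of the odds (plugin setting).
 * @param float|null $roll        Hypothetical test hook: a float in [0, 1); null draws a random one.
 */
function shouldRunCleanup(int $probability, int $divisor, ?float $roll = null): bool
{
    if ($probability <= 0 || $divisor <= 0) {
        return false; // zero odds (or a nonsensical divisor) never trigger
    }

    // Stand-in random source, uniform in [0, 1).
    if ($roll === null) {
        $roll = mt_rand(0, mt_getrandmax() - 1) / mt_getrandmax();
    }

    // Scale the roll to [0, $divisor) and compare it against the probability.
    $random = $divisor * $roll;

    return $random < $probability;
}

// Each task rolls separately, so the two results are independent of each other.
$runSessionGc  = shouldRunCleanup(1, 100);
$runMetadataGc = shouldRunCleanup(1, 100);
```

Calling the function once per task keeps the two cleanup decisions independent, which matches the stated intent that the operations do not share one probability calculation.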
Status | New | ⇒ | Pending |
Category | ⇒ | SQL Administration com_admin Postgresql MS SQL Language & Strings CLI Installation Libraries Front End Plugins |
I like this PR, I'm only wondering if we need another CLI script to clean up the #__session table.
I would like to combine these two functions into one.
We can check if the session handler is set to the database and then call the appropriate method.
A little off topic: I am against the division of the database table #__session into 2 separate tables.
A little off topic: I am against the division of the database table #__session into 2 separated tables.
It needs to happen. It'd be like storing category and content data in one table. Though it might be related, it's two different types of data. In this case it's a little worse because some of the management tasks for that data have colliding/differing logic (i.e. we want the metadata to be frequently cleared, in part because of the common complaint of a wrong count for who's online, and we need to clear the metadata ourselves because it is a custom feature to Joomla, but we don't want Joomla always acting as the garbage collector for the database session handler because we are cleaning the metadata; that garbage collection should be rightfully deferred to a dedicated session task).
I'm only wondering if we need another CLI script to clean up the #__session table.
CLI tasks should be like web controllers and follow SRP. So yes, in the case of 3.x that means two files and in 4.0 two different Command subclasses.
My conclusion. If I am an "advanced" administrator and I want to use a cron job, then:
database: I have to use cli/sessionGc.php
redis: I have to use cli/sessionMetadataGc.php
Otherwise, I only use the plugin.
Almost.
Session GC:
session.gc_probability set to 0, and I do not have any other existing jobs purging expired data, then I should create a cron job for cli/sessionGc.php and disable the appropriate plugin settings
session.gc_probability set to 0, and I do not have any other existing jobs purging expired data, then I should use the plugin (default settings)
session.gc_probability set to any non-zero value, then I should disable the appropriate plugin settings and not create a cron job (PHP will eventually do this cleanup for me without any extra nudging)
Metadata GC:
cli/sessionMetadataGc.php and disable the appropriate plugin settings
Note the cron doesn't make the database handler distinction, only the plugin is doing so now
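The "Session GC" cases above can be sketched as a small decision helper. This is illustration only: the function name, its parameters, and the returned labels are hypothetical, and the 'existing-job' branch is an assumption for a case the list does not spell out (PHP's GC is off but some other job already purges expired data).

```php
<?php
/**
 * Sketch only: map the "Session GC" cases from this thread to a recommendation.
 * Return values are hypothetical labels, not Joomla settings.
 */
function recommendSessionGc(int $phpGcProbability, bool $hasOtherPurgeJob, bool $wantsCron): string
{
    if ($phpGcProbability !== 0) {
        // PHP will eventually clean up by itself: disable the plugin option, no cron job.
        return 'php';
    }

    if ($hasOtherPurgeJob) {
        // Assumption: an existing job that purges expired data is left in charge.
        return 'existing-job';
    }

    // PHP's GC is off and nothing else purges expired data:
    // either a cron job for cli/sessionGc.php (plugin option disabled) or the plugin defaults.
    return $wantsCron ? 'cron' : 'plugin';
}

// Read the runtime value; ini_get() returns a string (or false), so cast it to int.
$phpGcProbability = (int) ini_get('session.gc_probability');
echo recommendSessionGc($phpGcProbability, false, false), PHP_EOL;
```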
Just enabled it on my live website. Will post the result after checking the size of my sessions table after two days or so...
Plugin enabled with default values. It is a public site with no registered users except for one super user account.
In the sessions table:
Before PR: 330
After PR overnight: 526
How can I tell if it is working?
Put a JLog::add() statement in there. Because it's happening after the response you can't do any other kind of var_dump or anything. At https://github.com/joomla/joomla-cms/pull/19687/files#diff-c62bf5ee641c0af7406f55c3f4e4f9d9R58 you'd want to know the values of $probability and $random and see if values were calculated to meet the condition to trigger garbage collection (which again, is the exact logic used in php_session_gc so unless there's a flaw in porting this code from C to PHP it should work the same).
I'm currently also wondering if the plugin works on my website. It's hosted on a shared server where I can't change the php settings set by the hoster.
Plugin Settings:
Enable Session Garbage Collection: Yes
Enable Session Metadata Cleanup: Yes
Probability: 1
Divisor: 100
Status: Enabled
Session settings in php.ini:
session.gc_probability = 0
session.gc_divisor = 1000
session.gc_maxlifetime = 1440
Session settings in Joomla's php information:
session.gc_probability 0
session.gc_divisor 1000 1000
session.gc_maxlifetime 9000 1440
My impression is that the plugin settings are ignored. Theoretically (if I understand this stuff correctly) I should set the plugin's Probability value to 100 in order to ensure a cleanup after 9000 seconds. Is that correct?
The PHP runtime settings are not used by the plugin.
If you really want the collector to run on every request then yes set it to 100/100. That's honestly overkill though. PHP will not let you resume a dead session so you don't need the data to purge out right at 9001 seconds of inactivity.
If you really need that frequent of garbage collection, disable the plugin and enable the cron jobs, because you're looking for the jobs to run on a high frequency and predictable schedule. If you don't mind the entropy, use the plugin and tweak the odds configuration (try even a 10/100 config to see what happens).
If you really need that frequent of garbage collection, disable the plugin and enable the cron jobs
The website including a shop has been running on a "cheap" shared server for several years. In this case "cheap" also means no cron jobs. The problem is that the response time when adding new content becomes annoying when working in the backend if the session table has grown to several hundred megabytes. After approximately 24 hours with the Plugin's default settings, the session table contains 15000 session entries already. Will play with different settings... :-)
In the backend, I am getting this: 0 Call to undefined method Joomla\CMS\Session\Session::gc()
Just for information - I just happened to see that Viktor also released a plugin to solve the problem on the 16th of February after an earlier discussion in the Joomla forum about the 'who's online' plugin. (Didn't try that one yet).
https://joomla-extensions.kubik-rubik.de/downloads/esk-easy-session-killer/joomla-3
Update: Link should work now
Here are some entries in the log file (p=$probability, d=$divisor, r=$random):
p=1 d=100 r=8.2489134663862
p=1 d=100 r=57.493870875459
p=1 d=100 r=3.1171028886235
p=1 d=100 r=3.9074626884956
p=1 d=100 r=22.473637906575
Will $random ever be less than $probability so that the following code will be executed?
if ($probability > 0 && $random < $probability)
Will $random ever be less than $probability so that the following code will be executed?
@Quy yes, validation script can be found at https://gist.github.com/mbabker/6b2636033ced225d56c1c1001d40d4cd
I have tested this item
Currently this doesn't work in my case. Perhaps I missed something. I've used mbabker's patch tester to install #19687 and enabled the plugin of course. Is there anything else which is required to make this work?
After 48 hours my session table has grown again to 91.5 mb with 35500 lines and no sessions are deleted.
To patch:
This plugin is NOT designed to run the cleanup operation on all requests, as demonstrated by the validation script at https://gist.github.com/mbabker/6b2636033ced225d56c1c1001d40d4cd which emulates the default parameters. Running that in one continual processing script 15-20 times on my local computer, it reaches a matching condition at a minimum of 20 iterations and a maximum of 150 iterations (mind you this is one process, not spread across multiple processes).
The plugin is triggering our internal gc() method on the session API. If you are using the database handler (default configuration), the session cleanup operation won't happen until just as the session connection is closed (which should presumably be at a point well after the HTTP response is sent, and even after this plugin is triggered, because we have a defer condition in the handler to delay cleanup until closing the connection).
If it is not triggering then validate your configuration. Unfortunately I don't have the time to offer one-on-one consulting for this PR, all I can say is validate everything is turned on and if need be add some logging statements to the plugin so that you can actually check to determine if the right code paths are being executed (is the path for cleanup being reached, if so is the random probability meeting the matching criteria). If it's really not running at all in a 48 hour period this really screams to me as a configuration issue, not a code issue (what happens if you set the probability to 100/100, basically meaning it should always run).
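To put the "not on all requests" point in numbers, here is a small sketch under the assumption that every request is an independent roll with odds probability/divisor (a geometric model, not something measured on a real site). The function names are hypothetical.

```php
<?php
/**
 * Sketch only: chance that cleanup has NOT run once after $requests requests,
 * assuming each request is an independent roll with odds $probability/$divisor.
 */
function chanceOfNoRun(int $probability, int $divisor, int $requests): float
{
    return (1 - $probability / $divisor) ** $requests;
}

/**
 * Average number of requests until the first run under the same assumption.
 */
function expectedRequestsUntilRun(int $probability, int $divisor): float
{
    return $divisor / $probability;
}

// Default 1/100 odds: roughly a 37% chance of no run in the first 100 requests.
printf("no run in 100 requests:  %.3f\n", chanceOfNoRun(1, 100, 100));
printf("no run in 1000 requests: %.6f\n", chanceOfNoRun(1, 100, 1000));
printf("average requests until a run: %.0f\n", expectedRequestsUntilRun(1, 100));
```

Under this model a site that really sees thousands of requests in 48 hours with no cleanup at all is far more likely to have a configuration problem than bad luck, which is the argument made above.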
@Gitjk I am using v3.8.5. I had to also download/install #19548.
After installing #19687, I manually updated plugins/system/sessiongc/sessiongc.php with the following to log garbage collection performed:
before line 73, insert:
JLog::add('enable_session_metadata_gc: p=' . $probability . ' d=' . $divisor . ' r=' . $random, JLog::ERROR, 'enable_session_metadata_gc');
before line 60, insert:
JLog::add('enable_session_gc: p=' . $probability . ' d=' . $divisor . ' r=' . $random, JLog::ERROR, 'enable_session_gc');
Under Extensions > Plugins > System - Debug > Logging, enable Log Almost Everything.
Using FTP, download everything.php in administrator/logs/.
In the file, you should see garbage collection entries like the following if/when performed:
2018-02-18T18:11:02+00:00 ERROR #.#.#.# enable_session_gc enable_session_gc: p=1 d=100 r=0.65329211682033
Check the sessions table.
@Gitjk I am using v3.8.5. I had to also download/install #19548.
I've been using my live J3.8.5 website to test this. Will try if it works when I also install #19548 later.
At present I'm testing the plugin from Viktor (see link above), which seems to work out of the box.
Meanwhile I found that I could use a cron job on my 'cheap shared server', too. Didn't notice that option before. The hosting service uses 'LiveConfig' instead of the usual cPanel or Plesk. But as a non-programmer I need to find an example how to implement the 'truncate session' as a cron job for the Joomla table, which I can copy.
Just curious (because I'm not a programmer) - Is there a certain reason why clearing the session table is based on a 'probability' instead of a period of time? I mean, neither my website nor the server is a gambling machine. :-)
Is there a certain reason why clearing the session table is based on a 'probability' instead of a period of time?
You get into another database read/write operation to store a last cleanup timestamp, or you end up with the arbitrary logic that exists today where it is triggered on any second that is divisible by 5. At least with a probability driven logic, you won't end up with several concurrent DELETE operations just because a timestamp was met.
If you do prefer time based logic, cron jobs are better for this because they can run on a set schedule.
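The concurrency point can be illustrated with a short sketch. It assumes the time-based check behaves like `$timestamp % 5 === 0` (modelled on the description in this thread, not copied from core), and every function name here is hypothetical.

```php
<?php
/**
 * Sketch only: assumed model of the time-based trigger described in this thread.
 */
function timeBasedTrigger(int $timestamp): bool
{
    return $timestamp % 5 === 0;
}

/**
 * Count how many requests in a list of arrival timestamps would trigger cleanup.
 */
function countTimeBasedRuns(array $timestamps): int
{
    return count(array_filter($timestamps, 'timeBasedTrigger'));
}

/**
 * Expected number of cleanup runs when each request rolls independently.
 */
function expectedProbabilisticRuns(int $requests, int $probability, int $divisor): float
{
    return $requests * $probability / $divisor;
}

// Ten requests arriving within the same matching second all fire the cleanup.
$burst = array_fill(0, 10, 1519315200);
echo countTimeBasedRuns($burst), PHP_EOL;                 // prints 10
echo expectedProbabilisticRuns(10, 1, 100), PHP_EOL;      // prints 0.1
```

With the timestamp rule, a burst of traffic in a matching second produces one DELETE per request; with independent rolls the same burst is expected to produce far fewer.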
I will be grateful for another success test and a quick merge.
Hopefully I can test it today, otherwise tomorrow (I just came back from the New Year holiday and have to clear my support queue first).
I have tested this item
Great work, @mbabker!
Please get this into the next version.
Another question: Shouldn't we set the default handler to PHP instead of Database for fresh installations?
Another question: Shouldn't we set the default handler to PHP instead of Database for fresh installations?
We can discuss this default setting as a part of 4.0. I wouldn't touch it now though.
Status | Pending | ⇒ | Ready to Commit |
Ready to Commit after two successful tests.
I looked at the PR today and have a few issues (or questions):
1. Since we moved the logic to clean up session metadata to the plugin, I think the plugin should be published by default; otherwise, the session metadata would not be deleted unless users publish the plugin. When I apply the patch and use discover install, the plugin is unpublished and I had to publish it manually. Seems not right.
2. I found a minor issue with the UI. Since you use the boolean filter for params, if someone chooses No, it is not in the selected state after saving (the button is not red).
3. Since the method afterSessionStart is removed, a few lines of code were lost:
$session = \JFactory::getSession();
if ($session->isNew())
{
    $session->set('registry', new Registry);
    $session->set('user', new \JUser);
}
Could you please confirm that it is expected? I didn't see any strange behavior during testing, I just want to make sure it doesn't cause any issues at all.
4. I think the option Enable Session Garbage Collection in the plugin should not be enabled by default. It should only be enabled for sites running on servers that have gc_probability set to 0. By default, I think we should leave this job to PHP / the server.
5. About the default values of the Probability and Divisor parameters: currently they are set to 1/100, meaning there is a 1% chance that the session metadata is cleaned up. In our current behavior, it runs when the time is divisible by 5 (about 20%). So maybe we should increase the default value of the Probability parameter (to 20?).
6. I'm still not happy with not deleting session metadata when the Database Handler is used, but maybe it's just me.
I have also an issue concerning the purpose of the plugin.
+PLG_SYSTEM_SESSIONGC_ENABLE_SESSION_GC_DESC="When enabled, this plugin will attempt to perform session garbage collection based on the odds calculated by the probability and divisor."
This is not understandable by non-specialists. Basic users (and I am one regarding that kind of stuff) will not know what it does and why, including the Options.
@infograf768 I had to google to find out what "divisor" was. I did higher level maths and had never come across it - even though it is a correct term - we always used the term "factor".
For the description I would suggest simplifying it to
"This plugin will clear session metadata on a regular basis"
I don't think it needs anything more than that.
IMHO, wherever is used Garbage collection, strings should be modified.
+PLG_SYSTEM_SESSIONGC_GC_DIVISOR_DESC="In combination with the probability field, these two fields are used to determine the odds of the garbage collection operation being triggered on a request. The probability is calculated by using probability/divisor, e.g. 1/100 means there is a 1% chance that the process runs on each request."
+PLG_SYSTEM_SESSIONGC_GC_PROBABILITY_DESC="In combination with the divisor field, these two fields are used to determine the odds of the garbage collection operation being triggered on a request."
PLG_SYSTEM_SESSIONGC_XML_DESCRIPTION="System Plugin that handles session garbage collection tasks."
Regarding 6.) The session table is also cleaned with the option Enable Session Garbage Collection for the Database handler. Or what exactly do you mean? Since we have the Database handler as default setting, this option should also be enabled by default.
@infograf768 suggestions welcome - am away from computer today
@Kubik-Rubik I just think by default, Joomla should only care about clean up session metadata and leave Session Garbage Collection to PHP/Server, so the option Enable Session Garbage Collection should be off by default. That's why I think we should delete session metadata for Database Handler as well. (Suppose we plan to store that metadata in separate table in the future as Michael suggested, we need that same delete logic for Database Handler, too).
That's how we did before 3.8.4. If anyone wants to improve performance, they can disable metadata clean up option and setup cron job.
Replies to @joomdonation
When I apply the patch and use discover install, the plugin is unpublished and I had to publish it manually. Seems not right.
Joomla does not automatically enable plugins when installed via the extension manager. This is unrelated to this PR. Please check the SQL deltas to ensure the expected published state is in use for normal upgrades.
I found a minor issue with the UI. Since you use the boolean filter for params, if someone chooses No, it is not in the selected state after saving (the button is not red).
This is a bug unrelated to this PR. I'm going to take a hard nosed stance on this one, security shouldn't be compromised to have a button show the right color (and this field definition is correct and secure as far as ensuring data is properly stored/filtered/validated).
Since the method afterSessionStart is removed, a few lines of code were lost
They were not. Please check the class hierarchy, you'll find that the parent web application class has the same method and logic.
I think the option Enable Session Garbage Collection in the plugin should not be enabled by default. It should only be enabled for sites running on servers that have gc_probability set to 0.
We can not make this decision by any sane measure. The best we can do is add a custom form field detecting PHP runtime configuration and displaying a recommendation.
Currently they are set to 1/100, meaning there is a 1% chance that the session metadata is cleaned up. In our current behavior, it runs when the time is divisible by 5 (about 20%). So maybe we should increase the default value of the Probability parameter (to 20?).
Personally I'd say no. The existing logic is based on a predictable factor (a timestamp, and because of the nature of this check all requests in a matching second would trigger that cleanup operation; so it might be a 20% probability now but in that 20% the operation could run 5-15 times easily depending on your site's traffic), this method is in line with the PHP internals and is a truly random process (which is less likely to have the overlap the existing code has). There isn't anything in this data that calls for such frequent cleanup operations, but if someone does want that, they can customize it.
Still not happy with not deleting session metadata when the Database Handler is used
Sorry, I'm trying to make the system right and that means pushing an arbitrary restriction to prove a point. Once this gets merged then merged up to 4.0 I can finish this once and for all.
Regarding text, it is all influenced by the PHP manual. Try using translations from there to help form an opinion, or if someone can come up with a way to discuss highly technical matters in a not technical way then feel free to propose it.
English: http://php.net/manual/en/session.configuration.php#ini.session.gc-probability
Spanish: http://php.net/manual/es/session.configuration.php#ini.session.gc-probability
French: http://php.net/manual/fr/session.configuration.php#ini.session.gc-probability
German: http://php.net/manual/de/session.configuration.php#ini.session.gc-probability
if someone can come up with a way to discuss highly technical matters in a not technical way then feel free to propose it.
I will try as I do not think this plugin is the subject of discussion in its strings, but rather a basic explanation all users can understand without the knowledge of its underlying mechanism.
Part of the problem as I perceived it was explaining how the two numeric fields work to come up with a configuration and how that configuration actually translates to determining when the process should run. And to be honest I think the PHP docs cover it. But, the PHP docs are aimed at a level of user the Joomla UI isn't, so ya, what I have now is a lot more technical in nature as a result.
Shouldn't "Probability" be named to something easier for humans to understand? E.g. "Session refresh interval".
And better still, couldn't this be triggered on a time basis? E.g. "once a day", "weekly" etc.
As a sidenote, even a 1% setting on a site with immense traffic, will probably cause unexpected load. Sure, the devs can adjust in such cases and use a cron job, but I'm pretty sure good ol' sysadmins and cPanel/Plesk folks will start nagging, and for good reason.
I think @SniperSister was spot on @ #19585
Such decisions that inherently affect performance (and in return cause a bad rep for Joomla) shouldn't be taken lightly and in the name of technical superiority. WP is crap, but that hasn't stopped it from taking over 20% of the web.
Shouldn't "Probability" be named to something easier for humans to understand? E.g. "Session refresh interval".
sure.
As a sidenote, even a 1% setting on a site with immense traffic, will probably cause unexpected load. Sure, the devs can adjust in such cases and use a cron job, but I'm pretty sure good ol' sysadmins and cPanel/Plesk folks will start nagging, and for good reason.
With the plugin set to a 1% probability, this should result in less load than the 3.8.3 behavior where the cleanup's DELETE FROM query was running on every even numbered second, or since 3.8.4 on every second which is divisible by 5. Also the time based triggering resulted in concurrent runs; this approach shouldn't have that particular issue.
And better still, couldn't this be triggered on a time basis? E.g. "once a day", "weekly" etc.
If you're using the "Who's Online" module (or anything showing logged in user counts or lists), which apparently a lot of people do, you can't have the cleanup be that infrequent. If you aren't, and have the capability to do so, you can set up a cron job to do cleanup on a timed interval like this. As for defaulting to a time basis, Joomla's been doing that up to and through today, on a far too frequent basis, and that has also been a cause of performance issues. Maybe this improves things in the long run, maybe this is a bad idea, either way with this PR there are a lot more tools available to fine tune this aspect of things.
This is something you broke and this still is not a fix for it. Constantly blaming server configs and demanding people make cron jobs to clean up your session table is absurd. Hundreds of MB and tens of thousands of rows now make logging into sites take 20-30 seconds. This should have been reverted while a real fix is done. Over two dozen sites and four different hosting providers and only "1" site is not broken. Brian? This should have been reverted and then a "fix" should have been worked on, not leave sites broken.
It's true that the change in v3.8.5 will balloon backups and cause havoc among users and hosting companies.
This is something you broke and this still is not a fix for it.
If you have a fix for it then feel free to propose it. Reverting to the 3.8.3 behavior is not a valid option if you understand the architectural and performance issues with that behavior on a high traffic site.
If this is not working, please explain the issues you are having with this patch so they may be addressed, with something more than "revert until 'fixed'" as that conveys absolutely nothing useful.
It's true that the change in v3.8.5 will balloon backups
Personally I would suggest best practice is to exclude session data from backups when practical, but alas not all tools offer that so it is a concern to address and I do think this PR helps with that.
You shouldn't back up a session table as there is no value in it being restored.
Your comment Brian is catalytic as always. Server wide backup solutions don't cherry pick tables. Ask "small" companies like cPanel or Plesk.
Which is why I phrased my comment the way I did. When you're working with a site specific backup tool/protocol, it's easier to exclude the data than when you're doing server backups (to be fair, I really hope it's possible to exclude /var/lib/php/sessions from that backup, but you're right that specific tables in specific databases can't be excluded arbitrarily).
Another thing to reiterate, in case it's not yet clear. We have the problems we do in our API today primarily because of the optional session metadata logging (which powers features like who's online or user status modules, but can also have a practical use when you need to reference session IDs in other ways). Prior to 3.8.4, the database table never ballooned even with PHP's native GC handler disabled because core had an arbitrary DELETE FROM #__session query that ran so frequently that it acted as a performance bottleneck on some sites. Yes, I changed the logic in when that query runs, and yes I see now that was a bit overzealous, and yes I still stand by that change because it exposed a lot of other problems.
With this PR, we break a reliance on PHP's native GC handling being enabled, so that stale data from ALL session stores can be purged (with loosely the same logic that PHP's configuration uses). So if you're using a non-standard file path that won't get cleaned by a default cron job installed in WHM/cPanel or Ubuntu's PHP packages, or using something like APCu or Memcached as your session store without other tools to ensure those data stores are cleared, that stale data will get purged.
With this PR, we break the reliance on HTTP requests to run these cleanup operations on an arbitrary timestamp based measurement. For advanced configurations, you can move the cleanup entirely to a cron based system and fire it as frequent (or infrequent) as you choose. For those who can't or don't want to do that, you in essence get the PHP behavior if the server had the right settings enabled.
With this PR, having #13322 in 4.0 gets a little easier. Because ultimately the root cause of this issue is the metadata. I want to make it possible to turn off the logging of that, because there are use cases where you don't want it or have no need for it (really, most of the joomla.org sites I have admin for have no need for these constant CRUD actions to the session table). If that goal can be realized then we've made a lot of progress in the right direction IMO.
Overall, this PR is an improvement, although there are a few things I don't agree with (or am not happy with), as mentioned in my earlier comment #19687 (comment)
If someone has issue even with this PR applied, you can use this plugin https://joomla-extensions.kubik-rubik.de/downloads/esk-easy-session-killer/joomla-3 from @Kubik-Rubik . I checked it before, looks good and is something I expected to be default behavior of this PR.
As long as you choose to treat the optional metadata AND the real data as one entity, then Viktor's plugin is fine. You can't treat them as one and the same, especially as one piece of that puzzle is not always stored to the database (whereas the other is). You MUST split the data, and you MUST do it with a transitional period so it is clear why the system is designed the way it is and we aren't just making breaking changes because I feel like screwing over developers.
Otherwise, really, we can just go back to the 3.8.3 behavior and be done with it, and like I keep saying, just drop every non-database related session store. Because as I have said in frustration a lot of times it is quite clear nobody really cares that your server's session GC is turned off until you have a 20K row database.
One more reason why the metadata implementation is flawed by the way.
JSessionStorageDatabase::write() does not have INSERT or UPDATE logic. Only UPDATE. Why? Because it relies on the application to insert the record, and the application fatally errors if it can't do that. So what is supposed to be an optional component is a fatally blocking operation, even when you aren't using the database.
This is why I am so adamant about treating the two types of data storage as what they are, two different things. The code in Joomla today not only mandates this metadata be written, but API elements are broken if it is not.
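The difference between an update-only write and an insert-or-update write can be shown with a toy in-memory store. This is not Joomla code: the class InMemorySessionTable and its methods are hypothetical and only illustrate why an update-only writer depends on something else having inserted the row first.

```php
<?php
/**
 * Sketch only: a toy stand-in for a session table, keyed by session ID.
 */
final class InMemorySessionTable
{
    /** @var array<string, string> session ID => serialized data */
    private $rows = [];

    /** Update-only write: does nothing (and reports failure) if the row is missing. */
    public function updateOnly(string $id, string $data): bool
    {
        if (!array_key_exists($id, $this->rows)) {
            return false; // no row to update, so the data is silently dropped
        }

        $this->rows[$id] = $data;

        return true;
    }

    /** Insert-or-update write: always stores the data. */
    public function upsert(string $id, string $data): bool
    {
        $this->rows[$id] = $data;

        return true;
    }

    public function has(string $id): bool
    {
        return array_key_exists($id, $this->rows);
    }
}

$table = new InMemorySessionTable();
var_dump($table->updateOnly('abc', 'payload')); // bool(false): nothing inserted the row yet
var_dump($table->upsert('abc', 'payload'));     // bool(true)
var_dump($table->updateOnly('abc', 'changed')); // bool(true): the row now exists
```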
Revert to the 3.8.3
We are talking hundreds of MB and tens of thousands of records.
While the update was made with good intentions, it created a catastrophic side effect. Revert it back, and spend as much time as you need on a different solution. We all understand what your goal is, but what is happening right now should be considered site breaking, and is a lot more widespread than the handful of people pointing out the problem.
Why is my site taking so long to login? Why are the visitor and admins logged in numbers growing? Why is my hosting provider sending me emails now warning me of my sites database size? Is there a fix? Can't they change it back?
These are all questions people have been or will be asking.
I understand that real session data and session metadata are two different things and should be stored in separate tables as you mentioned. What I am trying to say here is:
1. Real session data cleanup should be a PHP internal / server job. So I expect Enable Session Garbage Collection to be set to No by default. Websites running on servers which don't do this cleanup can turn this option on manually.
2. Because of reason #1, I want the code that cleans up metadata to run for the Database Handler as well. That is needed for the Database Handler anyway when you store metadata in a separate table in the future.
3. The default value of Probability is 1. I am worried that it is low and we will still get people complaining about wrong data in the #__session table (Viktor's plugin defaults to 10 and I think that might be a good value). The fact is that in older Joomla versions we had about 20% and now we only have 1% by default, which is very different.
Just my personal feeling, so if other users think that the default behavior is fine, we can go with it.
Real session data cleanup should be a PHP internal / server job. So I expect Enable Session Garbage Collection to be set to No by default. Websites running on servers which don't do this cleanup can turn this option on manually.
Because Joomla defaults to the database session handler, and because it appears that PHP is more frequently configured to not run GC than it is configured to do so, the default should be on.
Because of reason #1, I want the code that cleans up metadata to run for the Database Handler as well. That is needed for the Database Handler anyway when you store metadata in a separate table in the future.
You can't delete one without the other. The purge metadata operation through some misfortunate series of events could inadvertently purge active session data. That is why you cannot reliably have that process run when the database handler is in use. When the data is moved, the exclusion can be removed.
The default Probability value is 1. I am worried that it is too low and that people will still complain about wrong data in the #__session table (Viktor's plugin defaults to 10 and I think that might be a good value). In old Joomla versions this was 20%, and now it is only 1% by default, which is a big difference.
This is in line with the default PHP configuration. It is still less frequent than what is happening now, but frequent enough that stale data should not persist in the database for long, unless you can really prove it's possible to go X thousand requests over a measurable timeframe without it being triggered once. Remember that all of this still relies on HTTP requests: for a low traffic site a 1/100 probability does become problematic, but for a high traffic site the 20/100 probability becomes a performance issue and we are right back where we started. The 10/100 probability is IMO too frequent.
You know what? I'm done fighting. @joomla/cms-maintainers someone else take over this PR, I'm sick and tired of being shot down every time I try to fix the real architecture and performance problems in Joomla so it stops getting labeled as an amateur hour hobby project; clearly I'm the only person who cares enough about that derogatory label to do something about it.
Reverting the code will NOT magically remove the records from your database!!
So instead of complaining, please take the time to test this PR. If the test is successful, your problems will be solved. Just demanding a revert will get nowhere and won't help your site.
@GCLW As @joomdonation said, you should test this PR. You will see that it solves the issue.
@Kubik-Rubik seems it is easier to demand others do stuff than to simply test it yourself :(
Don't have time to test right now. I have to work on solutions to right now keep those session database tables under control. Thank You.
Instead of working on your own solution, you do realize you could test this one to see if it fixes your problem, right? And either tell us it does, which helps fix the problem in core, or tell us it has problems so that the problems can be fixed.
I have tested this item
Tested on an off season site so traffic is very low. Here is the stat for 2 days with PHP session handler and default plugin settings: day1=9 times, day2=10 times. Sessions table is cleared of stale records.
I have tested this item
This now has the language tweaks from JM, with some extra changes of my own (basically the term "garbage collection" is fully gone).
Do we need to update/cross-reference to this plugin in the Session Handler tooltip under Global Configuration?
IMO the answer's no. The settings there only relate to the configuration of connecting to the session store. There are no settings there which directly correspond to the optional metadata or cleanup operations.
I have tested this item
Deployed and tested it on our live site; it is now working as expected (we previously had to set up a cron job to clean up session data).
I have tested this item
Now it also works in my case, so I can say 'tested successfully'. However, I suppose many users will have to fine-tune the default settings of the plugin. On my website, approximately 99 percent of the sessions are due to crawler activity and, if I'm not mistaken, each page request from a bot triggers a new session. With the default setting, I still see a growing queue of sessions waiting for deletion (on average I currently see approximately 650 more sessions being added per hour than old sessions being deleted).
Would it make sense and be possible to limit the session time of bots?
Would it make sense and be possible to limit the session time of bots?
No. Trying to identify a bot visit and differentiate it from a human visit has too many variables to be done in a practical manner. Not to mention this would in essence be a static list in each release because identifiers (browser user agent strings and IP addresses primarily) are constantly changing.
When I added "I have tested this item
Current settings are:
php.ini settings are:
Setting | Local value | Master value |
session.gc_divisor | 1000 | 1000 |
session.gc_maxlifetime | 9000 | 1440 |
session.gc_probability | 0 | 0 |
Joomla session handler: Database
Plugin System - Session Data Purge
Enable Session Data Cleanup: Yes
Enable Session Metadata Cleanup: Yes
Probability: 50
Divisor: 100
Any idea about possible reasons why the plugin still seems to refuse to work in my case?
Did I understand it correctly that it (normally) shouldn't require to configure a cron job?
Of course I could simply use one of the other options (Victor's plugin or add a cron job) to avoid an ever growing session table. But I would like to understand why it's not working in my case. My understanding is that switching the session handler to php would not work, because of session.gc_probability = 0 in Debian.
(I hope the other successful tests were also done with session.gc_probability = 0)
Without having access to your server to log what is actually happening when the cleanup code runs, it is impossible to say. In all cases, though, the cleanup code for the session basically deletes all records where the timestamp is earlier than the current Unix epoch timestamp minus the session lifetime in seconds (e.g. if the time is 1519420314 and the lifetime is 900 seconds, the query should delete all records whose timestamp is older than 1519419414).
The PHP runtime configuration settings are not taken into consideration in this plugin. It is driven entirely by the Joomla global configuration (for the lifetime) and the probability settings in the plugin params.
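To make the arithmetic in that example concrete, here is a minimal sketch of the cutoff calculation. The function names and the in-memory row list are hypothetical stand-ins for illustration; they are not Joomla APIs, and the real cleanup runs as a DELETE query against the session table.

```php
<?php
/**
 * Hypothetical helper (not a Joomla API): the Unix timestamp below which
 * a session record counts as expired.
 */
function sessionCutoff(int $now, int $lifetimeSeconds): int
{
    return $now - $lifetimeSeconds;
}

/**
 * Hypothetical helper: mimics "delete where time < cutoff" on an in-memory
 * list of rows and returns the rows that survive the purge.
 */
function purgeExpired(array $rows, int $now, int $lifetimeSeconds): array
{
    $cutoff = sessionCutoff($now, $lifetimeSeconds);

    return array_values(array_filter(
        $rows,
        static function (array $row) use ($cutoff): bool {
            // Cast to int so the comparison is numeric even if the
            // timestamp arrives as a string.
            return (int) $row['time'] >= $cutoff;
        }
    ));
}

// Values taken from the example in the comment above.
echo sessionCutoff(1519420314, 900), PHP_EOL;
```

With those inputs the cutoff is 1519420314 - 900 = 1519419414, so a row stamped 1519419413 would be purged while one stamped 1519419414 or later would be kept.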
Did I understand it correctly that it (normally) shouldn't require to configure a cron job?
No, in your case it is required, because you mention that you have:
Joomla session handler: Database
So you are required to set up a CRON job for cli/sessionGc.php
Please see description and at least the first 6-7 comments
You are NOT required to set up a cron job at all.
If you intend to use the plugin and have cleanup happen as part of web requests (default configuration):
- If the session handler is the database, you MUST have the "Enable Session Data Cleanup" option set to yes (default configuration), otherwise the database will not be cleaned. This is the option that triggers the code path for JFactory::getSession()->gc(). The "Enable Session Metadata Cleanup" option in this case makes no difference, as explained all over the place.
- If the session handler is not the database, you MUST have the "Enable Session Metadata Cleanup" option set to yes (default configuration), otherwise the database will not be cleaned. This is the option that triggers the code path for clearing the optional session metadata, with the database handler check in place to ensure that metadata cleanup does not corrupt "real" session data (in addition to all the opinionated discussion about treating session data and extra metadata as separate things).
I was talking about metadata cleanup and I was looking at this line:
https://github.com/joomla/joomla-cms/pull/19687/files#diff-c62bf5ee641c0af7406f55c3f4e4f9d9R64
Maybe I am too sleepless, but it looks like it is not executed in the database case, regardless of the "Enable Session Metadata Cleanup" setting, because it uses a logical AND, not an OR.
The line evaluates to "if the session handler is not the database and the 'Enable Session Metadata Cleanup' option is enabled".
The two are different tasks.
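A small truth-table sketch may help show why the AND is intentional. The function name, parameter names and handler strings below are hypothetical and only mirror the condition as described in this thread; this is not the plugin's actual code.

```php
<?php
/**
 * Hypothetical stand-in for the check discussed above: metadata cleanup
 * runs only when the handler is NOT the database AND the option is on.
 */
function shouldPurgeMetadata(string $sessionHandler, bool $metadataCleanupEnabled): bool
{
    // Logical AND: both conditions must hold, so the database handler
    // never triggers the metadata purge, whatever the option says.
    return $sessionHandler !== 'database' && $metadataCleanupEnabled;
}

// Illustrative handler names only.
foreach (['database', 'filesystem'] as $handler) {
    foreach ([true, false] as $enabled) {
        printf(
            "%-10s enabled=%-5s => %s\n",
            $handler,
            var_export($enabled, true),
            var_export(shouldPurgeMetadata($handler, $enabled), true)
        );
    }
}
```

With an OR instead, the purge would also run under the database handler, which is the case the earlier comments say must be excluded.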
Would somebody be so kind as to answer two noobish questions:
From cli/sessionMetadataGc.php:
+ * This is a CRON script to delete expired optional session metadata which should be called from the command-line, not the
+ * web. For example something like:
+ * /usr/bin/php /path/to/site/cli/sessionMetadataGc.php
...the optional session metadata
Hmm, after a while the double entry does not exist anymore.
Where does the plugin get the live path from?
I don't get the question.
What exactly are optional session meta data. And are there 'optional' session metadata in J3.8.5?
('optional' sounds to me like a configuration option)
If Joomla does not take care of the session, we don't need that optional session metadata, as this is done by the system that takes care of the sessions :)
Where does the plugin get the live path from?
I guess you want to find the real path that /path/to/site stands for so that you can set up the cron job? If so, you can get this zip file, unzip it, upload the extracted root.php to the root folder of your site (via FTP or cPanel), then open https://domain.com/root.php and the path to the root folder of your site will be displayed (of course you need to replace https://domain.com with your site URL).
What exactly are optional session meta data. And are there 'optional' session metadata in J3.8.5?
That session metadata is not optional at the moment (it might be optional in Joomla 4; @mbabker made a PR for it). When someone accesses your site, Joomla creates a record in the #__session table to store the session id, time (the time the user accessed your site), userid, username and client_id (site or administrator). That session metadata has been there for a long time; it is not something introduced only in 3.8.5.
Hope it helps answering your questions
When I do sort the session table by the time column and afterwards look at the first and the last row, it reveals that the oldest sessions still don't get purged.
@mbabker @csthomas Maybe it is because we currently store the time using the varchar data type, so the < operator in the delete command doesn't work properly (maybe the values are compared as strings, and that's why some session records are not deleted).
@Gitjk If you can export data of the #__session table on your site and attach it here, I think people could help figure out why it doesn't work on your site
What exactly are optional session meta data. And are there 'optional' session metadata in J3.8.5?
('optional' sounds to me like a configuration option)
Session data in Joomla is made up of two components:
The "optional" metadata is not required for a session to be valid (as far as what's in the database goes, you only need the session ID, the timestamp, and the data columns; the rest are all for this metadata stuff). But, there is a fundamental flaw in its implementation. Joomla fatally exits if this metadata record cannot be inserted, and if you are using the database session handler (default configuration) then it cannot work correctly without the metadata record being inserted (as in the handler only issues UPDATE queries to write updated data, it does not try to do an INSERT query if a row does not already exist).
Many sites just don't have a need for this metadata tracking, and aside from one core feature (the check if a record is checked out for editing by an authenticated user with an active session), it can be safely disabled with no repercussions, if the architecture for it is in the right state (which my PR for 4.0 fixes). As a result, this metadata tracking is in and of itself a performance hit because at a minimum it forces one SELECT query onto the session table per request, and up to three in total under the right circumstances (the SELECT query doesn't find a record for your session ID, an INSERT query to put it in the database, and a DELETE query for cleanup of expired data).
So, the plugin here has two different cleanup operations, and we have two different scripts for cron jobs supporting these cleanup operations.
"Enable Session Data Cleanup" plugin option and cli/sessionGc.php
- This essentially runs the same cleanup mechanism that PHP core does when you have session.gc_probability and session.gc_divisor set to the right values. This option/script isn't always needed even when session.gc_probability is set to 0; as an example, on cPanel based servers or with the default scripts installed by the PHP packages on an Ubuntu server, there are cron jobs to clear the default filesystem storage path (or if using another backend you may already have tools in place to do this cleanup).
"Enable Session Metadata Cleanup" plugin option and cli/sessionMetadataGc.php
- This is the trigger for clearing that optional metadata. As this part isn't part of sessions in the native PHP API, this cleanup operation does need to exist in some form.
So, for 3.x with this PR applied the architecture is moved as far as possible to be able to split up the handling of metadata and real data without any B/C breaks, with the benefit of being able to cut some of the performance impacting operations out of each request cycle. By the time we get around to 4.0, everything should be able to be split up and implemented in a way where the metadata can be turned off if so desired.
Perhaps the debug option would be useful to add a log when running the delete query to determine the number of rows removed by the plugin.
Perhaps the debug option would be useful to add a log when running the delete query to determine the number of rows removed by the plugin.
As this is already in RTC state, I think we can include / discuss such an option in a later PR, as it does not impact the feature / functionality itself. That way we can move this forward to be included in 3.8.6.
I have just fixed the merge conflicts coming from the recaptcha PR :)
For me it is OK, I just thought that this PR is still not merged because there is some issue.
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-02-26 22:19:50 |
Closed_By | ⇒ | rdeutz |
In the "Session Data Purge" plugin the probability input field is a variable (1, 2, 3, ...).
The probability is quantified as a number between 0 and 1. (Wikipedia, etc).
So the term probability is somewhat confusing.
Alternative: Take a fixed value for the probability (= 1).
One variable value for the divisor should be sufficient.
See the following examples:
E.g. 1):
probability = 1
divisor = 100
Result 1/100 x 100% = 1%
E.g. 2):
probability = 2
divisor = 200
Result 2/200 x 100% = 1%
E.g. 3):
probability = 1
divisor = 2
Result 1/2 x 100% = 50%
E.g. 4):
probability = 4
divisor = 8
Result 4/8 x 100% = 50%
You get the same result.
The probability is quantified as a number between 0 and 1. (Wikipedia, etc).
In this case 'probability' actually means 'percentage probability' as defined in php.
See here: http://php.net/manual/en/session.configuration.php#ini.session.gc-probability
The probability is quantified as a number between 0 and 1. (Wikipedia, etc).
The probability is always a number between 0 and 1, or expressed as a percentage between 0 and 100%.
Yes, mathematically there are several values you can configure the two fields with that have the same result (1/10 == 10/100, 1/25 == 4/100), but that doesn't mean one of the configuration fields is redundant and can be hardcoded to a single value.
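As a rough illustration of how a probability/divisor pair maps to a percentage, here is a minimal sketch. The helper name is hypothetical, and the 1/1000 and 3/700 pairs are extra illustrative inputs that do not come from the thread.

```php
<?php
/**
 * Illustrative helper (not part of Joomla or PHP): converts a
 * probability/divisor pair into the chance of a cleanup run, in percent.
 */
function gcChancePercent(int $probability, int $divisor): float
{
    if ($divisor <= 0) {
        throw new InvalidArgumentException('Divisor must be positive.');
    }

    return 100.0 * $probability / $divisor;
}

// The four pairs from the examples above, plus two illustrative extras.
$pairs = [[1, 100], [2, 200], [1, 2], [4, 8], [1, 1000], [3, 700]];

foreach ($pairs as [$p, $d]) {
    printf("%d/%d => %.3f%%\n", $p, $d, gcChancePercent($p, $d));
}
```

Equivalent pairs such as 1/100 and 2/200 do collapse to the same value, but a pair can also express odds below 1% (1/1000), which a single whole-number percentage field could not, and ratios such as 3/700, which a lone divisor field could not.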
The probability is always a number between 0 and 1, or expressed as a percentage between 0 and 100%.
@mbabker
I do not understand what you mean exactly.
The next example makes clear what I mean.
This means that one field for the probability should be adequate.
<?php
echo 'probability between 0 and 100% <br />';
echo '0% is never a hit <br />';
echo '50% every two times a hit<br />';
echo '100% is always a hit <br />';
echo '<br />';
$probability = 50; // between 0% - 100%
echo 'probability = ' . $probability . '%';
echo '<br />';
$random = 100 * lcg_value();
echo 'random = ' . $random;
echo '<br />';
if ($probability > $random)
{
    echo 'hit'; // equivalent with do $session->gc()
}
else
{
    echo 'no hit';
}
?>
[EDIT] Yes, in php there is another definition. But this functionality is actually not used. Finally it does not benefit the user-friendliness.
@mbabker
Thank you for all your long hard work trying to fix this design flaw while putting up with all the resulting flak (I've read through a number of your PRs on this topic). A couple of the sites I maintain are pretty high traffic and by the time I discovered this issue a couple of days ago (an editor was having trouble logging in to the admin side) the sessions table from that site was 9 gigs large with 10 million rows.
I implemented the patch here and am running cleanup using the cron method. Everything looks good!
The one question I am still unclear on:
You had stated the following (which applies to my configuration):
Session GC:
"If I want Joomla to perform session garbage collection, And want to use cron jobs to manage that, And I have session.gc_probability set to 0, And I do not have any other existing jobs purging expired data, Then I should create a cron job for cli/sessionGc.php and disable the appropriate plugin settings"
Metadata GC:
If I want optional session metadata to be cleared by way of a cron job, Then I should create a cron job for cli/sessionMetadataGc.php and disable the appropriate plugin settings
Does "And I do not have any other existing jobs purging expired data" include sessionMetadataGc.php among "other existing jobs"?
ie, If I am running sessionGc.php do I ever need to consider also running sessionMetadataGc.php?
My current understanding is: no, I don't need to run sessionMetadataGc.php with my current configuration.
If you're setting up cron jobs...
If you're using the database session handler, you only need one cron job for the sessionGc.php file. If you're using any other session handler, you'd always need a cron job for the sessionMetadataGc.php file, and one for the sessionGc.php file only if nothing else is already doing its job. If using the PHP handler, odds are you don't need this cron job, as cPanel and Ubuntu have default jobs cleaning up this part of the filesystem (but you should still check to verify); if using one of the cache engines as a session handler, you may need it depending on the cache engine configuration.
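That rule of thumb can be written down as a small decision function. This is only a sketch: the function name, the handler strings and the $externalCleanup flag are hypothetical, and only the two script paths come from this thread.

```php
<?php
/**
 * Hypothetical helper summarising the advice above: which CLI scripts
 * would need a cron job for a given setup.
 *
 * @param string $sessionHandler  Illustrative handler name, e.g. 'database'.
 * @param bool   $externalCleanup True if something else (a host-provided
 *                                job, cache engine expiry, ...) already
 *                                purges expired session data.
 */
function cronScriptsNeeded(string $sessionHandler, bool $externalCleanup): array
{
    if ($sessionHandler === 'database') {
        // Data and metadata share one table, so one job covers both.
        return ['cli/sessionGc.php'];
    }

    // Any other handler: the metadata always needs its own job.
    $scripts = ['cli/sessionMetadataGc.php'];

    if (!$externalCleanup) {
        $scripts[] = 'cli/sessionGc.php';
    }

    return $scripts;
}

print_r(cronScriptsNeeded('database', false));
print_r(cronScriptsNeeded('filesystem', true));
```

Whether external cleanup really exists on a given server still has to be verified by hand, as the comment above points out.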
If you're using the database session handler, you only need one cron job for the sessionGc.php file.
I am. That answers it. Thank you!
[EDIT] Yes, in php there is another definition. But this functionality is actually not used.
@mbabker I try to understand the code, so the following.
It is possible to add the following functionality in htaccess, to clean stored garbage data.
php_value session.gc_probability 1
php_value session.gc_divisor 100
That's why I went looking for this functionality in the Joomla! core.
I found only this functionality:
ini_get('session.gc_maxlifetime')
And that the session table is finally emptied by the following function:
->delete($this->db->quoteName('#__session'))
Why then fill in the probability and the divisor, if session.gc_probability and session.gc_divisor are not used?
Or do I make a mistake?
It is possible to add the following functionality in htaccess, to clean stored garbage data.
Not all server setups will allow you to set a php_value in an htaccess
Thanks @brianteeman, that means that one input field is sufficient.
file: sessiongc.php (Joomla! core file)
$probability = $this->params->get('gc_probability', 1);
$divisor = $this->params->get('gc_divisor', 100);
$random = $divisor * lcg_value();
if ($probability > 0 && $random < $probability)
{
    $session->gc();
}
My explanation:
$probability = $this->params->get('gc_probability', 1);
$random = 100 * lcg_value();
if ($probability > $random)
{
    $session->gc();
}
It is fine the way it is and is consistent with the PHP runtime configuration.
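For anyone wanting to compare the two variants, here is a minimal sketch of the two-field odds check quoted above, with the random draw passed in so the result is reproducible. The function names are hypothetical, and mt_rand() is swapped in for lcg_value() as the random source purely for illustration.

```php
<?php
/**
 * Sketch of the odds check quoted above. $unit plays the role of
 * lcg_value(): a float in the range [0, 1).
 */
function gcHit(int $probability, int $divisor, float $unit): bool
{
    $random = $divisor * $unit;

    return $probability > 0 && $random < $probability;
}

/**
 * Estimates how often the check fires over a number of simulated requests.
 * mt_rand() / (mt_getrandmax() + 1) is an assumed substitute for
 * lcg_value() and also yields a float in [0, 1).
 */
function estimateHitRate(int $probability, int $divisor, int $trials): float
{
    $hits = 0;

    for ($i = 0; $i < $trials; $i++) {
        $unit = mt_rand() / (mt_getrandmax() + 1);

        if (gcHit($probability, $divisor, $unit)) {
            $hits++;
        }
    }

    return $hits / $trials;
}

printf("1/100 over 100000 simulated requests: %.4f\n", estimateHitRate(1, 100, 100000));
```

The one-field variant proposed above behaves like gcHit() with the divisor fixed at 100. Keeping the divisor configurable is what allows odds such as 1/1000.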
Sorry, I know it's closed. Just to add: I have a site that is not huge (last 6 days: 763 sessions and 2061 pageviews according to Google Analytics) and I truncated the session table 6 days ago because it had 250k+ entries and the backend had load times of about 30k ms in some cases. After truncating it the issue disappeared. Now, after 6 days, it has 220k+ entries again and has started to be very slow again.
I hope it has the same reason. If not and i can do some help let me know. running 3.8.5 on php 7
Thanks
thanks for this - i will test first thing in the morning