Joomla creates a JSession - so an entry in #_session table - not just for logged in users, but also for guests (site anonymous visitors, bots included). For a site with session lifetime set to 30 minutes or more and a bit of traffic is easy to have #_session table populated with thousands of entries and get performance issues due to the high number of concurrent database writes per second.
I think this is a point to fix in 2015/2016, please make possible for site admins to prevent Joomla from creating a session for guest users.
I think I know WHY joomla is creating that sessions for anonymous visitors:
What i propose as draft/idea for a solution is the following points:
Title |
|
Title |
|
Title |
|
Scrub that - i forgot that there is still a database entry even then - just storing less data
The #__session
table is a multi-purpose database table (sadly) in that:
1) It is the data store for the database session handler (session_id
, time
, data
columns), this type of data is logged in some form by all session handlers (generally the session ID is used as some kind of identifier, either as the primary key in the database table or the filename for PHP's default filesystem sessions and the data is self explanatory; the time value is handled differently by the handlers based on their requirements)
2) It is a data store for application metadata (client_id
, guest
, userid
, username
columns); because of the hardcoded instructions in the application classes, every row but the data
column gets written to the session table to support the use of this metadata
For what it is worth, it is only through the storage of this additional metadata that Joomla users are able to differentiate between sessions for guests and authenticated users and which application those sessions are "assigned" to. Sessions in PHP do not differentiate between guests and authenticated users, they are treated as one in the same and since PHP itself is a stateless code base (the only data that persists between requests is basically assigned as a cookie to the user's browser or recorded in the session data store, or a developer is using something of their own accord like writing data to the filesystem), it boils down to the integrity of the session data store and the application's security measures to keep a user logged in.
Even with changes I've experimented with that defers explicitly starting the session as part of the application's constructor, Joomla's internal dependencies will cause a session to start near the application's doExecute
methods. Specifically, the linked lines in JApplicationSite::initialiseApp(), JApplicationAdministrator::doExecute(), and InstallationApplicationWeb::initialiseApp() are where the session is being started by Joomla with my deferred session startup experiment.
A handful of places in the code where end user features are made available through this additional metadata or where this metadata is manipulated outside the handling of a "normal" session:
JTable::isCheckedOut()
uses the session metadata to determine if a record that has a value for its checked_out
column is checked out to a user with an active sessiononUserAfterDelete
, onUserLogin
, onUserLogout
) that manipulate the session table's recordsLong and short, Joomla should not know nor care if the active session is assigned to a guest or an authenticated user; that distinction is made in the application in other ways surrounding how the user object is persisted in the session (this is why if you're checking if a user is authenticated or a guest you're checking the JUser
object, not an API endpoint in JSession
). So any API or configuration changes to make distinctions of the two types of sessions is needless complexity IMO and truthfully something I'm not sure could be handled very effectively given PHP's stateless nature and how its internals handle sessions.
What I wanted to say is simply: are you sure that we really need to start sessions for any guest user? In my classic PHP apps, a 'guest user' is a user with 'no session', yes, this is even more 'stateless'.
The splitting I proposed is just an example of a refactoring you can surely do, to permit some site admins to disable the 'guest user sessions' if they don't need them (e.g. I don't need them at all, and I disabled them with the 'hack' of caching, I've no drawbacks so i think them are not so useful, not for sites that just show news to the public, for instance)
Anyway I recognize, it's not an urgent issue, but please don't underestimate the cost of a db write done for any new visitor, it can be a bottleneck when you have lots of new visitors at once (e.g. an article going viral). Is wordpress doing something similar with sessions? I think not.
The way Joomla's coded, yes, a session will be started for every visitor to the site and I pointed where in each application's sequencing where the first call to something that would trigger a lazy loaded session happens. It's long before Joomla's first plugin event happens, so even with Joomla's caching layer enabled, the session API would get booted up. From a technical perspective, you couldn't distinguish between an authenticated user with a pre-existing session and a guest user first visiting the site until after the session is started in the app and the serialized contents extracted. At best, you might be able to add a configuration switch that asks to destroy sessions for unauthenticated users at the end of the application cycle so that sessions aren't persisted, but even with that toggle as long as the non-required metadata is logged in the #__session
database table it would truthfully only make matters worse (yet another database operation for data that won't even persist).
It is entirely possible to decouple session handling from the database. Simply get rid of all those features I mentioned previously which log extra metadata in the database. That is what truly forces the DB interaction on every request, even moreso than the default use of the database session handler.
@brianteeman yes, the #session table is INSERTED with new rows even when 'files' is used as session handler, i was talking also about this,i don't like this behavior, cause it's always a write to disk/mysql connection, even if with less data, it has a cost (consider also the RDBMS response time, that is _blocking).
So I ask: are you sure Joomla needs this?
why?
I have the suspect that the origin of this choice are of the times of 'Mambo' and the "who's online module".
Are you sure that this is really needed by Joomla 3? Are you sure that JSession doesn't need a strong review nowadays?
There's a lot of dead code in it, for instance check the 'fix_browser' and 'fix_adres' code, it's not nice to see that commented code since ages ... (anyway i'm not telling to remove it: I use it, as i use it as security check for the session ip to prevent hijacking, you can make a configuration switch also to enable this)
You're doing a great job developing the new features of the CMS and refactoring a lot of classes, but I can't explain myself why to leave this foundament 'untouchable' , this basic and simple part, so important for security and performance: the session.
I mean, please don't say: "It was developed this way, it works, and we don't want to touch it anymore" because I'm not sure it was developed so fine as i suspect is pretty old.
Maybe it's ok, but please be a bit 'paranoid' on this and start some deep study on it... can you optimize it? Is it perfect yet or we can do a bit better?
It is entirely possible to decouple session handling from the database. Simply get rid of all those features I mentioned previously which log extra metadata in the database. That is what truly forces the DB interaction on every request
Yes I'm asking also for this: decoupling the session handling from the database
even moreso than the default use of the database session handler.
this not necesarly if you limit sessions to 'real' logged users
At best, you might be able to add a configuration switch that asks to destroy sessions for unauthenticated users at the end of the application cycle so that sessions aren't persisted, but even with that toggle as long as the non-required metadata is logged in the #__session database table it would truthfully only make matters worse
yes this is worse
from a technical perspective, you couldn't distinguish between an authenticated user with a pre-existing session and a guest user first visiting the site until after the session is started in the app and the serialized contents extracted
true (but I've to test if really a disk read is really needed at session_status()/start(), PHP should cache active session ids in a shared memory object if it was developed fine), but anyway, one thing is the 'cost' of a read, and another is the cost of a read+write, that is what joomla is doing now for any visitor, it's like this (i'm pseudocoding it in old school php):
if (session_status() !== PHP_SESSION_ACTIVE) { // read (from disk?)
session_start(); //generate session_id and send back to client cookies
$_SESSION["guest"]=1; //disk write (and mysql connection used!)
//...other metadata writes...
}
//i know that the real write will be only at session_write_close(), in the JSession->close() method now, but anyway it will be a write for any incoming request
while I would do this way
if (session_status() !== PHP_SESSION_ACTIVE) { // read (from disk?)
//don't generate any session_id
GlobalAccessibleClass::setIsGuestUser(true); //in memory write
}
a session will be started for every visitor to the site and I pointed where in each application's sequencing where the first call to something that would trigger a lazy loaded session happens. It's long before Joomla's first plugin event happens, so even with Joomla's caching layer enabled,
yes, in fact i dropped the use of the standard Joomla 'page-caching' layer (the system>cache plugin) because it was very low performant due to those writes always invoked anyway: always a 300ms for the first-byte (this why I hate this behavior guys, don't blame me, i spent hours to debug it) for this reason i overcomed the issue using varnish, apache traffic server, or nginx_fastcgi_cache in front of joomla, this means that in my configuration, for a guest user, when chache is hot, not a single line of PHP joomla code is invoked and all works fine (I added an 'x-logged-in' cookie set in an onAfterLogin for logged in users so the reverse proxy can distinguish logged in users and forward the request to Joomla if the cookie is set) but i will not need this complex configuration if Joomla wasn't setting the session 'before' the app initialization for any user
Yes, you can use the same technique too: check for the session cookie presence to determine if the user is not a guest, also in php, in this case surely you'll not have to do any disk read to check if the user is a guest:
if (!hasCookie(session_name())) { //in memory read (a read from request headers that server got in memory)
GlobalAccessibleClass::setIsGuestUser(true); //in memory write
}
this not necesarly if you limit sessions to 'real' logged users
You can't though without opening up security holes. session_status()
is only PHP 5.4+ (so unusable while Joomla supports PHP 5.3) and from what I've gathered from its documentation, it will only tell you if the session has been started, not whether one was persisted from a previous request. So even with that check the session would have to be fully started still to determine if the user is authenticated. Personally, I would not depend on request headers or cookies as a means to determine a user's status, unless those values can only be generated by a system the end user cannot manipulate (so if you're running a proxy server between the end user and your Joomla install).
There is no guest status persisted to $_SESSION
in Joomla. JUser::$guest
is where the guest property is tracked as the JUser
object is persisted as part of the session. All of the metadata I pointed out previously is always stored in the database and aside from the session ID itself, none of it is part of the true session handling for the application.
All this metadata is frankly optional and somewhat annoying add on data for a majority of users. I have found only one condition in Joomla that gets even remotely broken by just dumping it all completely. Of the list I posted earlier, JTable::isCheckedOut()
would need updates to deal with its condition to query the session table to determine if a record with a value for its checked_out
column matches a user with an active session.
I don't disagree there needs to be work on the infrastructure. But frankly, short term I don't see a way to refactor Joomla to a point where it would be possible to get even to the first plugin event trigger (without adding a new one in the application constructors before JSession
is instantiated) without the session being booted.
You can't though without opening up security holes. session_status() is only PHP 5.4+ (so unusable while Joomla supports PHP 5.3) and from what I've gathered from its documentation
Ok, but maybe for Joomla 4?
it will only tell you if the session has been started, not whether one was persisted from a previous request.
and what about this:
if (session_id()=="")
So even with that check the session would have to be fully started still to determine if the user is authenticated.
Yes you will still have to start a session for non-guest users, but you can do it in onAfterLogin event, right before where you currently set the JUser data and metadata.
Personally, I would not depend on request headers or cookies as a means to determine a user's status, unless those values can only be generated by a system the end user cannot manipulate
But this is what php does internally, yes I was shocked when i realized it some years ago, but client cookies are the only 'keys' for the server side sessions (for this reason i use HTTPS on all the sites, to avoid session cookie sniffing).
Anyway I mean: if there's no session cookie, you can trust that the user is surely a guest (also in the current implementation, if you delete the session cookie you become a guest), but of course not that if there's a cookie we can just trust it and consider it as a logged in user, in that case the system should check for the JUser object presence and its values in THAT session, all like is doing now , and this will be always a thing that end user cannot manipulate (except if they guess or obtain the session id, for this reason another improvement i suggest is to use multiple cookies as session ids, not only one, like bigs like Facebook and Google do, just in order to increase the session id 'key length' to make harder the brute forcing)
There is no guest status persisted to $_SESSION in Joomla. JUser::$guest is where the guest property is tracked as the JUser object is persisted as part of the session.
Yes, my pseudocode was an optimistic example, there's more data written i know
All of the metadata I pointed out previously is always stored in the database and aside from the session ID itself, none of it is part of the true session handling for the application.
Yes, and this is not fine don't you agree? It's redundant!
All this metadata is frankly optional and somewhat annoying add on data for a majority of users.
Yes, you agree :)
have found only one condition in Joomla that gets even remotely broken by just dumping it all completely. Of the list I posted earlier, JTable::isCheckedOut() would need updates to deal with its condition to query the session table to determine if a record with a value for its checked_out column matches a user with an active session.
Yes this is useful, for reasons like this I was asking keep this behaivor on, but just for 'real' users, not guests (You'll never have a guest locking an item, so you can simply check if the row is expired)
Anyway you can also get rid of this, and implement an 'expire' for those locks, and an ajax poller that if the user is active, and editing an item refreshes the checked out status extending his lifetime
I don't disagree there needs to be work on the infrastructure. But frankly, short term I don't see a way to refactor Joomla to a point where it would be possible to get even to the first plugin event trigger (without adding a new one in the application constructors before JSession is instantiated) without the session being booted.
Ok I understand that is not a priority and I agree, sorry, i don't want to criticize anything, you're doing a great job: thank you all for all! Joomla is always the best CMS around IMHO, I just wanted to point out this to ensure that you'll not completely forget of this part of its codebase, and now I know you will not.
So, nevermind, maybe I'm just a bit obsessive on this, please forgive me.
Thank you for your time and all the detailed replies
Best Regards
staging...mbabker:refactor-session-metadata-coupling is basically everything needed to split the logic for the extra metadata from the "real" session requirements without dealing with every feature using that metadata.
session_id()
and other session functions only return data about the current state of the session, not whether there is a persisted session. Even the $_SESSION
superglobal is only available after the session is started.
Yes you have to start session just for non-guest users, but you can do it in onAfterLogin event, right before where you currently set the JUser data and metadata.
That's far too late, and that event only runs when a user actually tries to login. The application itself checks for data from the session literally two or three executable lines of code into when $app->execute()
is called. Without refactoring a lot of the application startup sequence there is nothing that can be done to keep JSession::start()
from being called before the onAfterInitialise
plugin event. The very first line in the site and admin apps initialiseApp
methods is a call to JFactory::getUser()
, which attempts to load a user object from the session. A mix of user params, browser data, and site configuration is used to boot up the global JLanguage
object.
Anyway i mean: if there's no session cookie user you can trust that is surely a guest (also in the current implementation, if you delete the session cookie you become a guest), but of course not that if there's a cookie we can just trust it and consider it as a logged in user, in that case the system should check for the JUser object presence and its values in THAT session, all like is doing now , and this will always be a thing that end user cannot manipulate
While this check may be feasible at the beginning of the application's cycle, you still have to have a way to instruct the API "I want JSession::start()
and its internals to run in full even though the config says don't start sessions for guests because my extension's features require session persistence". Considering the number of calls to get data out of the session, this might end up being more taxing than it's worth IMO. It's something that might be worth playing with long term, but on top of adding new config values and checks one would need to make sure enforcing them doesn't hurt the app more than help.
Since not creating a JSession will be very difficult, i believe doing 1 and 2 suggestions should be examined
Some sites (that have enough RAM) with many active visitors (guests, even bots) have converted 'data' column of session table, to varchar(10000) or varchar(20000) and then converted session table to MEMORY storage (and configured a small session time)
so besides examining a memcached/redis solution for guests, we could examine using MEMORY storage engine for a new session table 'session_guests' with a varchar of 20000 for guests (fallback to INNODB is MEMORY is not allowed or if server is not MySQL) . Just to avoid memory problems with SQL server and big session times (30+ or 60+ minutes), a different session time for guests will be needed too
when user logins the session data will be stored to the current 'session' table instead of table 'session_guests'
*sigh*
THERE IS NO DISTINCTION AT THE PHP LEVEL BETWEEN A "GUEST" SESSION AND AN AUTHENTICATED SESSION. ADDING ANY CODE TO JOOMLA TO TRY AND MAKE THAT DISTINCTION ONLY FURTHER COUPLES THE APPLICATION TO NEEDING THAT EXTRA METADATA STORED IN THE SESSION TABLE.
my comment was just a thought for discussion ))
about php not having a distintion for guest / authenticated,
yes, i forgot about it in my above comments
(maybe) we can solve it
Also if memory storage is not available the 'session_guests' maybe not create the table at all, joomla configuration should indicate if table exists or not
e.g. J3.4.7+ is using base64 / serializing on session data, and nothing broke because of it,
the above are just thoughts, i do not fully know the implications of the changes
e.g. extensions that try to store "large" data for *guests users into session ... will break because of the reduced available space of varchar(20000)
What do we want to do? Keep Joomla dependent on the need to write extraneous and borderline useless metadata somewhere (and have the ability to read it) to give admins some false sense of knowing which users are logged into their site and where and potentially having the ability to forcefully log them out (feature only works if using database session store) or find a way to drop it so the only true dependency on sessions in Joomla is the actual session data? If it's the former, then have fun adding in all these extra options and tables and processes that can't be removed. If it's the latter, I'm interested because frankly I think it's stupid for the application to track this stuff.
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2016-01-05 11:12:30 |
Closed_By | ⇒ | Bakual |
Labels |
Added:
?
|
You dont have to use the database for sessions. The option exists to use native php as well
This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/8772.