No Code Attached Yet
avatar nikosdion
nikosdion
8 Oct 2022

Steps to reproduce the issue

  • Install Joomla 4.2 on PHP 8.2 (PHP 8.2 is still in development)

Expected result

No deprecated notices :trollface:

Actual result

(Among other deprecated notices)

Deprecated: Function utf8_decode() is deprecated in .../boot4/libraries/vendor/joomla/uri/src/UriHelper.php on line 36

System information (as much as possible)

PHP 8.2 is the only relevant thing

Additional comments

utf8_encode and utf8_decode are deprecated in PHP 8.2.

You can replace the calls to the deprecated function with calls to the following method:

public static function utf8_decode($s)
	{
		if (version_compare(PHP_VERSION, '8.1.999', 'le'))
		{
			return utf8_decode($s);
		}

		if (function_exists('mb_convert_encoding'))
		{
			return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
		}

		if (class_exists('UConverter'))
		{
			return UConverter::transcode($s, 'ISO-8859-1', 'UTF8');
		}

		if (function_exists('iconv'))
		{
			return iconv('UTF-8', 'ISO-8859-1', $s);
		}

		/**
		 * Fallback to the pure PHP implementation from Symfony Polyfill for PHP 7.2
		 *
		 * @see https://github.com/symfony/polyfill-php72/blob/v1.26.0/Php72.php
		 */
		$s = (string) $s;
		$len = \strlen($s);

		for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
			switch ($s[$i] & "\xF0") {
				case "\xC0":
				case "\xD0":
					$c = (\ord($s[$i] & "\x1F") << 6) | \ord($s[++$i] & "\x3F");
					$s[$j] = $c < 256 ? \chr($c) : '?';
					break;

				case "\xF0":
					++$i;
				// no break

				case "\xE0":
					$s[$j] = '?';
					$i += 2;
					break;

				default:
					$s[$j] = $s[$i];
			}
		}

		return substr($s, 0, $j);
	}

Tagging @nibra @Hackwar as this is a Joomla Framework issue, @HLeithner @laoneo because this touches architecture and may be happening in other places in the code I have not come across yet.

Votes

# of Users Experiencing Issue
1/1
Average Importance Score
3.00

avatar nikosdion nikosdion - open - 8 Oct 2022
avatar joomla-cms-bot joomla-cms-bot - change - 8 Oct 2022
Labels Added: No Code Attached Yet
avatar joomla-cms-bot joomla-cms-bot - labeled - 8 Oct 2022
avatar brianteeman
brianteeman - comment - 8 Oct 2022

reported and ignored in July #38258

avatar nikosdion
nikosdion - comment - 8 Oct 2022

@brianteeman I think there is a subtle difference between the two reports. Yours is reporting a deprecation which does not become a major issue for another 2 years with no alternative proposed.

I have a working alternative as I have been closely tracking the PHP 8.2 RFCs and had to address this deprecation in my fork of Joomla's Uri class used in my standalone software.

I would like to believe that having a viable alternative would make it easier to address it. Let's not be defeatists.

avatar HLeithner
HLeithner - comment - 9 Oct 2022

Do we really assume ISO-8859-1 as fallback? wouldn't ISO-8859-15 fit better? I know that's what utf8_decode does but it also fails to decode some characters because browsers uses windows-1252 as ISO-8859-1 (which of course is wrong). At least that's the warning in the php documentation. Maybe a bad idea.

And do we really need (at first position)

		if (version_compare(PHP_VERSION, '8.1.999', 'le'))
		{
			return utf8_decode($s);
		}

I mean we always should have a mb_convert_encoding and if not fall back to utf8_decode or the own implementation?

Since I'm pretty sure this function doesn't go away (even if php devs wants this) I would move it to https://github.com/joomla-framework/string/blob/2.0-dev/src/StringHelper.php including a second parameter for the target encoding.

avatar nikosdion
nikosdion - comment - 9 Oct 2022

@HLeithner

Since I'm pretty sure this function doesn't go away (even if php devs wants this)

You are wrong. Please do read the PHP RFC I linked to (https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode). It has been voted on in April 2022 after one year of discussion and accepted with 33 votes in favor and only 2 against. PHP RFCs are binding.

Key points from the RFC:

  • utf8_decode and utf8_encode are definitely going to be removed in PHP 9.0
  • The reason for its removal is that it's always using ISO-8859-1 which has been the source of many misunderstandings.
  • It tells you that you are supposed to replace this “with mb_convert_encoding as the primary replacement, with UConverter::transcode and iconv also listed as possibilities”. The RFC provides the alternative code I am using.

Regarding whether we need to use ISO-8859-1 or ISO-8859-15, please let's read the code from URIHelper. As you can see, utf8_decode is used as a “cheat”:

if (utf8_decode($url) === $url)
{
	return parse_url($url, $component);
}

Basically, what we do here is this. If the entirety of the URL is in ISO-8859-1 (therefore: it's just plain ASCII) we will use parse_url directly. Otherwise we will URL-encode the entire URL except some reserved characters, then put it through parse_url.

Using a different character encoding in this use case would be wrong because of the way parse_url works. For example, having a Euro sign (€) in the URL is valid in ISO-8859-15 but invalid in ISO-8859-1. It has to be encoded for parse_url to work correctly — it assumes an ISO-8859-1 encoding. So, if you used ISO-8859-15 because it's “newer and better” it would actually break the URI parser in Joomla.

That said, Joomla actually requires the mbstring PHP extension. Therefore we can simply replace utf8_decode with mb_convert_encoding.

I would also say that Joomla MUST enforce its minimum requirements in index.php, like it does with the minimum supported PHP version. This would include the mbstring requirement.

avatar HLeithner
HLeithner - comment - 9 Oct 2022

Sorry I didn't clearly explained my self I mean the usage of this/such function doesn't go away. I know that the utf8_* functions get removed.

Ok but this decode function should not be in the URI class, I think that's the only different left between your proposal and my answer.

avatar nikosdion
nikosdion - comment - 9 Oct 2022

@HLeithner Thank you for the clarification. Yes, I agree that a replacement method should NOT be in the URIHelper class.

However, I have to note again that Joomla does require mbstring. Since we have that hard requirement does it not make far more sense to use mbstring directly instead of having a helper method?

(The reason I proposed a helper method is that my standalone software can also run on servers with mbstring unavailable, at least far enough so that it shows a nicely formatted error message. This requires me to have access to my fork of the URI class which is why I have this complex replacement method to utf8_decode).

avatar HLeithner
HLeithner - comment - 9 Oct 2022

true maybe core should do best practice and do the right thing by replacing utf8_decode/encode with mb_convert_encoding.

The only question is does we really require mb_string? I checked https://downloads.joomla.org/technical-requirements which has some garbage requirements (mod_mysql for apache?!) but no requirement for mb string or any other php module/extension. Or do I miss something?

avatar ManuelHu
ManuelHu - comment - 6 Dec 2022

I also do not think that Joomla currently is explicitely requiring mbstring, at least I did not find anything pointing in this direction. Both in

  • administrator/components/com_joomlaupdate/src/Model/UpdateModel.php
  • installation/src/Model/ChecksModel.php

there are some checks about mbstring ini options, but they will just be silently dropped if mbstring is not loaded.
Also the requirements page does not list it (and apart from that, the page is partly wrong, as already pointed out...)

On the other hand, com_finder already uses some mb_* functions without fallback. Also, the Application class in the framework uses it.

Although I never encountered a recent web hosting package without mbstring installed, there are probably some (sadly, mbstring is not part of the PHP extension set that is required to be shipped with PHP...)

So to not break b/c even more, there is only the option of this overcomplicated polyfill function for Joomla 4?
And maybe for Joomla 5 requiring a mbstring extension in addition to the already-decided recent PHP version might be reasonable?

avatar nikosdion
nikosdion - comment - 6 Dec 2022

Joomla DOES require mb_string. Smart Search, Joomla Update, the php-encryption library used in the core, the Crypt and Encrypt core packages (used among other things for Joomla Update, and MFA), the Console package (used for the Joomla CLI), the Joomla String package (used EVERYWHERE in Joomla), the ParagonIE Sodium support (used for logging in!), the WebAuthn library dependencies (used for WebAuthn login and WebAuthn in MFA), as well as several libraries pulled in through composer as dependencies for these and other features require mb_string.

Further to that, the developers of PHP itself are recommending to move away from the old functions such as utf8_decode / utf8_encode and use mb_string instead.

Any server environment which does not have mb_string enabled is run by idiots who last learned about how to configure a PHP server more than 20 years ago when the web was still using 8-bit, per-language encodings. This has not been the case since circa 2003, two years before Joomla 1.0 was released. If someone runs into a server environment which does not have mb_string enabled they should run away as fast as they can, for their hosting company does not understand how PHP works — not just “modern” PHP the last decade, but even PHP 5 from 16 years ago.

avatar laoneo
laoneo - comment - 9 Jan 2023

@HLeithner and @nikosdion I appreciate your feedback on #39583.

avatar HLeithner
HLeithner - comment - 9 Jan 2023

I checked the source and only found 2 usages in core (not framework) of utf8_decode and 3 of utf8_encode.
We have several polyfills in the cms by 3rd party libs for the 2 functions.
mb_convert_encoding is only used in smart search and everything other usages seems to have a fallback

Our Joomla\String package has a polyfill for utf8 functions...

My suggestion replace the CMS use cases with Joomla\StringHelper in j4 and make mb_string in j5 mandatory and replace the 5 codeplaces with mb_convert_encoding.

For the framework I would add the polyfill provided by nic in url helper directly and also require mb_string in the next major version.

ymmv

avatar nikosdion
nikosdion - comment - 9 Jan 2023

@HLeithner I agree with this:

My suggestion replace the CMS use cases with Joomla\StringHelper in j4 and make mb_string in j5 mandatory and replace the 5 codeplaces with mb_convert_encoding.

In my experience, the only servers without mb_string support are those where the PHP extension was accidentally not enabled after a PHP upgrade. Source: my software has required mb_string since 2007.

avatar HLeithner
HLeithner - comment - 9 Jan 2023

It's about the b/c promise and it's easy to keep this promise here. Introducing a new dependency (even if everyone should have it, I know at least one hoster (which doesn't host joomla of course) that doesn't have mbstring) is a hard break so only allowed in j5 from my point of view.

avatar nikosdion
nikosdion - comment - 9 Jan 2023

@HLeithner To be fair, mb_string was required from Joomla 1.5 to 3.10. It wasn't listed for Joomla 4 by mistake. But, sure, I don't disagree with this going into Joomla 5.0.

avatar wilsonge
wilsonge - comment - 10 Jan 2023

@HLeithner To be fair, mb_string was required from Joomla 1.5 to 3.10. It wasn't listed for Joomla 4 by mistake. But, sure, I don't disagree with this going into Joomla 5.0.

I'm not sure that's true - we've been conditionally checking for that extension at least in the installer since 1.6 minimum (sorry too lazy to go back to 1.5 :D)

if (extension_loaded('mbstring'))
{
// Check for default MB language.
$option = new stdClass;
$option->label = JText::_('MB language is default');
$option->state = (strtolower(ini_get('mbstring.language')) == 'neutral');
$option->notice = ($option->state) ? null : JText::_('NOTICEMBLANGNOTDEFAULT');
$options[] = $option;
// Check for MB function overload.
$option = new stdClass;
$option->label = JText::_('MB string overload off');
$option->state = (ini_get('mbstring.func_overload') == 0);
$option->notice = ($option->state) ? null : JText::_('NOTICEMBSTRINGOVERLOAD');
$options[] = $option;
}
and certainly if we did require it - we weren't consistent in that judging by that code - which has carried through all the way to 3.x and 4.x

I will add that I'm pretty sure the entire point of us using the phputf8 library in the string package is to polyfill around that. Also in J4 we (admittedly not as an explicit dependency) have the symfony mbstring polyfill included - hence we could start using things in the console package and in the framework - which was an explicit thing we checked back when we did things in finder for j4 #22042 (comment)

Googling around for wordpress/joomla and mbstring - seems from support there's still a good number of recent questions of people who don't have it on their shared hosting - so to me making the symfony mbstring an explicit dependency (given we already have it shipped anyhow) and being done with it seems reasonable. Don't see the need to make things required at the expense of all the tickets for a relatively cheap library to include.

avatar laoneo
laoneo - comment - 11 Jan 2023

We have been already loaded the mbstring polyfill, but made it now explicit in #39583. As you guys can see in the commit 8288372, the lock file has not really changed.

avatar nikosdion nikosdion - change - 22 Jan 2023
Status New Closed
Closed_Date 0000-00-00 00:00:00 2023-01-22 14:43:28
Closed_By nikosdion
avatar nikosdion nikosdion - close - 22 Jan 2023

Add a Comment

Login with GitHub to post a comment