?
avatar rogercreagh
rogercreagh
12 Nov 2018

Steps to reproduce the issue

This bug was reported in j2.5 https://forum.joomla.org/viewtopic.php?t=679547 and still hasn't been fixed.
If the truncate point is between a tag pair, then string.truncate closes the hanging tag but puts the ellipsis after the tag closure.

<?php $text1='test without tags';
      $text2='<h4>test with tags</h4>';
      echo 'text1: ';
      echo JHtml::_('string.truncate', $text1, 15);
      echo '<p>--</p>';
      echo 'text2: ';
      echo JHtml::_('string.truncate', $text2, 15);    
?>

Expected result

ellipsis should be within the tag that is closed

<div>
<p>Truncate test</p>
text1: test without...<p>--</p>text2: <h4>test with...</h4>
</div>

Actual result

text1: test without...<p>--</p>text2: <h4>test with</h4>...

The ellipsis is after the tag is closed which screws up the display if it is an h or p tag
The result is the same with truncateComplex (the difference is that truncateComplex doesn't count the chars inside the <> of a tag when working out where to cut the string).

System information (as much as possible)

j3.9
macosx 10.13.6 MAMP server
php 7.2.1
Chrome browser (current)

Additional comments

there might be some cases where it makes sense to have the ellipsis outside the tag that is closed by truncate. eg if the truncate comes in the first part of an <a tag or inside an <img tag.
Currently this

      $text3='<p>test <a href="#">with</a> anchor</p>';
      echo 'text3: ';
      echo JHtml::_('string.truncate', $text3, 15);

gives this result

text3: <p>test <a< p="">...</a<></p>

which is clearly garbage.
For this particular case the truncateComplex method does correctly close the <a tag but it still puts the ellipsis after the final </p> which it also adds.

avatar rogercreagh rogercreagh - open - 12 Nov 2018
avatar joomla-cms-bot joomla-cms-bot - labeled - 12 Nov 2018
avatar rogercreagh
rogercreagh - comment - 12 Nov 2018

sorry forgot to add tags for this, and editing only lets me add this comment


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/23058.

avatar PhilETaylor
PhilETaylor - comment - 13 Nov 2018

This is the kind of thing that Unit Tests would be great for ;-)

avatar PhilETaylor
PhilETaylor - comment - 13 Nov 2018

There appears to be some (basic) unit tests for this

'Do not split within a tag' => array(

avatar rogercreagh
rogercreagh - comment - 13 Nov 2018

Having done some more detailed testing and found the relevant code I conclude thus:
There are four problems

  1. If $noSplit is true (whether or not $allowHtml is true) then
    a. if the break point is just after the space after a word then the last word is cut - it should be left intact
    b. if the split point is just after the last character of a word then the last word is cut - it should be left intact.�
    (In both these cases there is a question as to how punctuation should be handled, but this could be debated and dealt with in truncateComplex)

  2. If $allowHtml is true and the string starts with a tag (eg <p>...) then
    a. if the break occurs in the middle of a tag then the html output is fragmented
    b. the ellipsis is placed outside the closing tag (eg </p> which is correctly added on the end
    (NB both of these cases only occur when the start of the string is a tag )

1a. can be addressed by :
a. not trimming the left end of the truncated string at line 64. Use ltrim
1b. can be addressed by:
a. adding a test to see if the next char after the break point is a space and if it is including it
(NB the final string is left trimmed anyway at line 137 when it is copied back to return)

2a. can be addressed by:
a. move the code block to cut off any tag fragments from line 127-132 up to line 106 before we close any complete tags that are open
2b can be addressed thus:
a. add the ellipsis after any tag fragments are stripped and before tags are closed (just after the moved block above)
b. remove the addition of ellipsis at line 137 and add an else case to the if ($allowHtml) block lines 91-133 to add an ellipsis in the case that $allowHtml is false.

The good news is that so long as your string doesn't start with a tag it works correctly.
These errors should be picked up by the unit tests PhilETaylor mentioned - I guess I'd better learn how to run them before submitting a pull request. What I have works fine with my own testing.
I think truncateComplex has similar problems.
RCO

avatar rogercreagh
rogercreagh - comment - 28 Nov 2018

Here is the code for a version of truncate that addresses the above issues.

    public static function truncate($text, $length = 0, $noSplit = true, $allowHtml = true)
    {
        // Assume a lone open tag is invalid HTML.
        if ($length === 1 && $text[0] === '<')
        {
        	return '...';
        }

        // Check if HTML tags are allowed.
        if (!$allowHtml)
        {
            // Deal with spacing issues in the input.
            $text = str_replace('>', '> ', $text);
            $text = str_replace(array('&nbsp;', '&#160;'), ' ', $text);
            $text = StringHelper::trim(preg_replace('#\s+#mui', ' ', $text));

            // Strip the tags from the input and decode entities.
            $text = strip_tags($text);
            $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

            // Remove remaining extra spaces.
            $text = str_replace('&nbsp;', ' ', $text);
            $text = StringHelper::trim(preg_replace('#\s+#mui', ' ', $text));
        }

        // Whether or not allowing HTML, truncate the item text if it is too long.
        if ($length > 0 && StringHelper::strlen($text) > $length)
        {
           //test if the next character is a space - if it is include it so we don't loose the word
           if ($text[$length] == ' ')
           {
		        ++$length;
           }
           //trim leading spaces, leave trailing ones so as not to loose the last word
           $tmp = ltrim(StringHelper::substr($text, 0, $length));
		    
           //test if all we have is an incomplete tag
           if ($tmp[0] === '<' && strpos($tmp, '>') === false)
           {
                return '...';
           }

           // $noSplit true means that we do not allow splitting of words.
           if ($noSplit)
           {
                // Find the position of the last space within the allowed length.
                $offset = StringHelper::strrpos($tmp, ' ');
                // If there are no spaces and the string is longer than the maximum
                // we need to just use the ellipsis. In that case we are done.
                if ($offset === false && strlen($text) > $length)
                {
                    return '...';
                }
                $tmp = StringHelper::substr($tmp, 0, $offset + 1);
           }

           if ($allowHtml)
           {
                // Put all opened tags into an array
                preg_match_all("#<([a-z][a-z0-9]*)\b.*?(?!/)>#i", $tmp, $result);
                $openedTags = $result[1];

                // Some tags self close so they do not need a separate close tag.
                $openedTags = array_diff($openedTags, array('img', 'hr', 'br'));
                $openedTags = array_values($openedTags);

                // Put all closed tags into an array
                preg_match_all("#</([a-z][a-z0-9]*)\b(?:[^>]*?)>#iU", $tmp, $result);
                $closedTags = $result[1];

                $numOpened = count($openedTags);

                // Check if we end inside a tag; if we are remove it to get rid of the fragment
                if (StringHelper::strrpos($tmp, '<') > StringHelper::strrpos($tmp, '>'))
                {
                    $offset = StringHelper::strrpos($tmp, '<');
                    $tmp = StringHelper::trim(StringHelper::substr($tmp, 0, $offset));
                }
                //now we can add the ellipsis
                $tmp .= '...';

                // Not all tags are closed so close them and finish.
                if (count($closedTags) !== $numOpened)
                {
                    // Closing tags need to be in the reverse order of opening tags.
                    $openedTags = array_reverse($openedTags);

                    // Close tags
                    for ($i = 0; $i < $numOpened; $i++)
                    {
                        if (!in_array($openedTags[$i], $closedTags))
                        {
                            $tmp .= '</' . $openedTags[$i] . '>';
                        } else {
                            unset($closedTags[array_search($openedTags[$i], $closedTags)]);
                        }
                    }
                }
            } else {
                // $allowHtml==false so just add an ellipsis
                $tmp .= '...';
            }

            if ($tmp === false || strlen($text) > strlen($tmp))
            {
                $text = trim($tmp); 
            }
        }

        // Clean up any internal spaces created by the processing.
        $text = str_replace(' </', '</', $text);
        $text = str_replace(' ...', '...', $text);

        return $text;
    }

Would it be appropriate to create a pull request for this?

avatar Quy Quy - change - 30 Nov 2018
Status New Closed
Closed_Date 0000-00-00 00:00:00 2018-11-30 22:53:09
Closed_By Quy
avatar joomla-cms-bot joomla-cms-bot - change - 30 Nov 2018
Closed_By Quy joomla-cms-bot
avatar joomla-cms-bot joomla-cms-bot - close - 30 Nov 2018
avatar Quy
Quy - comment - 30 Nov 2018

See PR #23189


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/23058.

avatar joomla-cms-bot
joomla-cms-bot - comment - 30 Nov 2018

Set to "closed" on behalf of @Quy by The JTracker Application at issues.joomla.org/joomla-cms/23058

Add a Comment

Login with GitHub to post a comment