Solution for regular expression in Q2A caching plugin

asked Jul 21, 2015 in Plugins by sama55
edited Jul 21, 2015 by sama55

To PHP developers:

I am developing a caching plugin for Q2A, and I am stuck: I cannot delete new lines (LF?) with a regular expression in XAMPP (Windows). In the end, I want to remove comments, tabs, and new lines, where by "new line" I mean a row that contains only CR/LF/CRLF, with no other characters on it.

Processing result:

<!DOCTYPE html>
<html>
<<<=== New line (LF?) remains.
<head>

Expected results:

<!DOCTYPE html>
<html>
<head>

Failing code 1 (the new line, LF?, is not removed):

private function compress_html($html) {
    $searchs = array(
        '/<!--.*?-->/s',          // remove comment
        '/\t/',                   // remove tab
        '/^(\r\n|\n\r|\n|\r)/',   // remove only new line
    );
    $replaces = array(
        '',
        '',
        '',
    );
    return preg_replace($searchs, $replaces, $html);
}

Failing code 2 (the new line, LF?, is not removed):

private function compress_html($html) {
    $searchs = array(
        '/<!--.*?-->/s',          // remove comment
        '/\t/',                   // remove tab
        '/^(\r\n|\n\r|\n|\r)/',   // remove only new line
    );
    $replaces = array(
        '',
        '',
        '',
    );
    // Apply the patterns one at a time instead of passing both arrays at once.
    foreach ($searchs as $key => $search) {
        $html = preg_replace($search, $replaces[$key], $html);
    }
    return $html;
}

Thanks.

Q2A version: 1.7

commented Jul 21, 2015 by igael

commented Jul 21, 2015 by sama55

2 Answers

answered Jul 21, 2015 by pupi1985
selected Jul 22, 2015 by sama55

Best answer

If you wanted to remove all line breaks, you would replace them with the empty string, but then you would get one single long line. In your expected result you want something like this:

<!DOCTYPE html>\n
<html>\n
<head>

So what you actually want is to turn every run of repeated line breaks into a single one. Something like this should give you the expected result:

$html = "<!DOCTYPE html>
<html>

Something else

<!--xyz

blah-->Again...

<head>";
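$searchs = array(
   '/<!--.*?-->/s', // remove HTML comments ("s" lets . match newlines)
   '/\t/',          // remove tabs
   '/[\r\n]+/',     // match every run of CR/LF characters...
);
$replaces = array(
   '',
   '',
   "\n",            // ...and replace it with a single "\n"
);
echo preg_replace($searchs, $replaces, $html);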
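As for why both of the original versions left the blank line in place: without the `m` (multiline) modifier, `^` matches only at the very start of the whole string, so `'/^(\r\n|\n\r|\n|\r)/'` can only remove a line break that opens the document. If you would rather delete empty lines and leave every other line break untouched, a minimal sketch with the `m` modifier could look like this. `remove_blank_lines` is a hypothetical helper name, and the sketch assumes LF or CRLF line endings; CR-only text is not handled, since with PCRE's usual default newline setting `^` only starts a new line after LF.

```php
<?php
// Hypothetical helper: deletes empty (or whitespace-only) lines and keeps
// all other line breaks. Assumes LF or CRLF line endings.
function remove_blank_lines(string $html): string
{
    // With the "m" modifier, ^ also matches right after each "\n",
    // not only at the start of the string.
    // [ \t]* lets lines containing only spaces/tabs count as empty too.
    return preg_replace('/^[ \t]*\r?\n/m', '', $html);
}

echo remove_blank_lines("<!DOCTYPE html>\n<html>\n\n<head>");
// prints:
// <!DOCTYPE html>
// <html>
// <head>
```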

A better approach would be to use an existing library, for example: https://github.com/mrclay/minify/blob/master/min/lib/Minify/HTML.php

That said, this approach leaves some things out. For example, if the HTML contains "word\tword", the browser shows "word word"; if you remove the tab, it shows "wordword" instead. The same happens with line breaks, so it would make more sense to match runs of tabs too and leave only one.
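One way to act on that suggestion is to collapse runs instead of deleting them. The sketch below goes slightly further than the answer by treating spaces like tabs and replacing each run with a single space; `collapse_whitespace` is a hypothetical name, and, like every regex-only approach here, it will also touch `<pre>` content.

```php
<?php
// Variant of the tab/newline rules: collapse runs instead of deleting them,
// so "word\tword" still renders as two separate words.
function collapse_whitespace(string $html): string
{
    // Any run of tabs and/or spaces becomes one space (outside <pre>, a
    // browser renders a single space and a single tab the same way).
    $html = preg_replace('/[ \t]+/', ' ', $html);
    // Any run of CR/LF characters becomes a single "\n", as in the answer's code.
    return preg_replace('/[\r\n]+/', "\n", $html);
}

echo collapse_whitespace("word\t\tword\r\n\r\nnext");
// prints:
// word word
// next
```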

Also note that you're trying to parse HTML with regular expressions. Regular expressions don't understand the hierarchical structure of HTML, so comment markers nested inside comments, for example, would break things again. It's better to use an HTML parser (which will, obviously, cost some performance).
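For the comment part, a parser-based sketch using PHP's built-in DOM extension (`DOMDocument` plus an XPath `//comment()` query) might look like the following. It assumes the `dom` extension is enabled; `strip_comments_dom` is a hypothetical name. Be aware that `loadHTML()` uses libxml's HTML parser, which predates HTML5 and assumes ISO-8859-1 unless it finds a charset declaration it recognizes, so check how your non-ASCII text comes out, and measure the extra cost the answer mentions.

```php
<?php
// Hypothetical helper: removes HTML comments by walking the parsed document
// instead of pattern-matching the raw string. Requires the dom extension.
function strip_comments_dom(string $html): string
{
    $doc = new DOMDocument();
    // libxml warns about tags it does not know (e.g. many HTML5 tags);
    // collect those warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // LIBXML_HTML_NODEFDTD: do not add a default doctype if none is present.
    $doc->loadHTML($html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matched comment nodes into an array first, then remove them.
    $comments = iterator_to_array($xpath->query('//comment()'));
    foreach ($comments as $comment) {
        $comment->parentNode->removeChild($comment);
    }
    return $doc->saveHTML();
}

echo strip_comments_dom('<!DOCTYPE html><html><body><!-- x --><p>hi</p></body></html>');
```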

Finally, take into account that you are not handling `<pre>` tags at all. This will destroy their content, so, again, you need an HTML parser to skip them.
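If you stay with regular expressions anyway, a common partial workaround is to split the page on whole `<pre>` elements and compress only the text between them. This is a sketch, not a replacement for a parser: it assumes `<pre>` elements are not nested and that the string `</pre>` never appears inside scripts or attribute values. `compress_outside_pre` is a hypothetical name, and the two whitespace rules inside it are only examples.

```php
<?php
// Hypothetical helper: compresses whitespace everywhere except inside
// <pre>...</pre>. Still regex-based, so it shares the limits described above.
function compress_outside_pre(string $html): string
{
    // Split on whole <pre> elements (case-insensitive, "s" so . spans lines).
    // The capturing group keeps each <pre> block in the returned array.
    $parts = preg_split('#(<pre\b.*?</pre>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Even indexes are ordinary HTML; odd indexes are captured <pre> blocks.
        if ($i % 2 === 0) {
            $part = preg_replace('/[ \t]+/', ' ', $part);
            $parts[$i] = preg_replace('/[\r\n]+/', "\n", $part);
        }
    }
    return implode('', $parts);
}

echo compress_outside_pre("a\t\tb<pre>x\t\ty\n\n</pre>c\n\nd");
```

Empty strings are kept in the split result (no `PREG_SPLIT_NO_EMPTY`), which is what keeps the even/odd rule valid even when the page starts with a `<pre>` block.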

Conclusion: try the library. If it doesn't do the trick, better not to touch the HTML at all. If it does, measure the time it takes to make sure the result really ends up being faster. And don't forget to enable gzip compression of the HTML: it might be all you actually need, and you wouldn't have to worry about compression at all, because it would happen in a different layer.
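To follow the advice about measuring, a rough sketch: run the compression function many times, report the average time and the byte counts, and, when the zlib extension is available, put them next to what gzip alone achieves. `compress_sample` and `measure` are hypothetical names, `compress_sample` is only a stand-in for your real `compress_html()`, the sample HTML is synthetic, and `microtime(true)` is only precise enough for averages over many runs.

```php
<?php
// Hypothetical stand-in for your own compress_html(); swap in the real one.
function compress_sample(string $html): string
{
    $html = preg_replace('/[ \t]+/', ' ', $html);
    return preg_replace('/[\r\n]+/', "\n", $html);
}

// Runs $compress $runs times; reports average time and size before/after.
function measure(callable $compress, string $html, int $runs = 1000): array
{
    $runs = max(1, $runs); // guarantee at least one run so $out is set
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $out = $compress($html);
    }
    $result = [
        'avg_ms'       => (microtime(true) - $start) / $runs * 1000,
        'bytes_before' => strlen($html),
        'bytes_after'  => strlen($out),
    ];
    // If zlib is available, also show the gzipped sizes for comparison.
    if (function_exists('gzencode')) {
        $result['gzip_before'] = strlen(gzencode($html));
        $result['gzip_after']  = strlen(gzencode($out));
    }
    return $result;
}

$html = str_repeat("<div>\n\n\t\t<p>text</p>\n</div>\n", 200);
print_r(measure('compress_sample', $html));
```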

commented Jul 22, 2015 by sama55

commented Jul 22, 2015 by steven2

steven2 · Answer 1 · 2015-07-21T21:44:47+0000

commented Jul 21, 2015 by steven2

commented Jul 21, 2015 by sama55

commented Jul 22, 2015 by steven2

commented Jul 22, 2015 by sama55
