Jun 26

There are many ways to optimize an HTML page, and one of them is to remove whitespace. Whitespace between tags in HTML pages is mostly there for readability, so if your site gets a lot of visits, it’s a good idea to consider stripping out things such as extra whitespace in the HTML. For a small to medium-sized website, you can easily save 500 megabytes (MB) to a few gigabytes (GB) of transfer a month just by cleaning whitespace and newline characters out of the HTML.

<html>
<head><title>Removing whitespace from HTML</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="html"
value="<?php print htmlspecialchars(isset($_POST['html']) ? $_POST['html'] : ''); ?>" /><br />
<input type="submit" value="Remove whitespace" /><br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
$html = $_POST['html'];
// Remove whitespace runs that sit between the end of one tag and the start of the next.
$newhtml = preg_replace( "/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/","", $html );
print "<b>Original text was: &nbsp;'". htmlspecialchars($html) .
"'</b><br/>";
print "<b>New text is: &nbsp;'". htmlspecialchars($newhtml) . "'</b><br />";
}
?>
</form>
</body>
</html>
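Outside of a form, the same replacement can be wrapped in a small function and run on a string directly. The following is only a minimal sketch: the function name `strip_inter_tag_whitespace` and the sample markup are made up for illustration, and the pattern is the one broken down in the regular expression explanation.

```php
<?php
// Hypothetical helper (the name is not from the original post).
// It removes whitespace runs that sit between a closing ">" and the next "<".
function strip_inter_tag_whitespace(string $html): string
{
    return preg_replace('/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/', '', $html);
}

// Sample markup, invented for this example.
$html = "<ul>\n    <li>One</li>\n    <li>Two</li>\n</ul>\n";
$stripped = strip_inter_tag_whitespace($html);

echo $stripped;
printf("%d bytes before, %d bytes after\n", strlen($html), strlen($stripped));
```

Note that the pattern does not know about context: it also removes whitespace between tags inside a `<pre>` block, and a single space between two inline elements (for example between `</b>` and `<i>`), which can change how the page renders. It is worth checking the output on real pages before using it site-wide.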

Regular Expression Explanation:

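The look-behind group (?:(?<=\>)|(?<=\/\>)) matches the end of an HTML tag. The shorter form (?<=\>|\/\>) is avoided because look-behinds must be fixed-length: Perl has traditionally rejected a look-behind whose alternatives differ in length. PCRE, the engine behind PHP’s preg functions, does accept top-level alternatives of different fixed lengths, but the portable approach is to give each alternative its own look-behind and put them inside a group, such as (?:(?<=\>)|(?<=\/\>)).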
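To see what the look-behind group accepts, it can be tried on its own against a short string. This is a sketch under assumptions: the sample string and variable names are invented here, and `preg_match_all` with `PREG_OFFSET_CAPTURE` is used only to show where the whitespace runs were found.

```php
<?php
// The look-behind group from the explanation: each alternative is its own
// fixed-length look-behind, wrapped in a noncapturing group.
$lookbehind = '(?:(?<=\>)|(?<=\/\>))';

// Sample input, invented for this example. The space inside "<br />" follows
// an "r", not a ">", so it should not be reported.
$html = "<br />\n<p>text</p>  <hr/>\t<p>";

preg_match_all('/' . $lookbehind . '\s+(?=\<)/', $html, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$whitespace, $offset]) {
    // json_encode makes the otherwise invisible characters readable.
    printf("offset %d: %s\n", $offset, json_encode($whitespace));
}
```

Only whitespace that directly follows a `>` and directly precedes a `<` is reported, whether the tag ended with `>` or `/>`.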

(?:     a noncapturing group that contains
(?<=    a positive look-behind with
\>      a >
)       the end of the positive look-behind
|       or
(?<=    a positive look-behind with
\/      a slash, followed by
\>      a >
)       the end of the positive look-behind
)       the end of the noncapturing group
(       a capturing group that contains
\s      whitespace
+       one time or more
)       the end of the capturing group
(?=     a positive look-ahead
\<      a <, followed by
\/      a slash
?       that can occur at most once
)       the end of the positive look-ahead.
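The breakdown above can also be kept next to the pattern itself by using PCRE's `x` (extended) modifier, which ignores whitespace in the pattern and treats `#` as the start of a comment. The version below is a sketch rather than the original code: it switches the delimiter to `~` so the slashes need no escaping, and the variable names and sample markup are assumptions made for the example.

```php
<?php
// Same expression as in the breakdown, written in free-spacing mode.
// A nowdoc is used so PHP leaves the backslashes untouched.
$pattern = <<<'REGEX'
~
(?:              # a noncapturing group that contains
    (?<= > )     # a positive look-behind for a closing angle bracket
  |              # or
    (?<= /> )    # a positive look-behind for a slash and a closing angle bracket
)
( \s+ )          # a capturing group: whitespace, one time or more
(?= </? )        # a positive look-ahead for an opening angle bracket and an optional slash
~x
REGEX;

// Sample markup, invented for this example.
$html = "<div>\n  <p>one</p>\n  <p>two</p>\n  <br />\n</div>";
$result = preg_replace($pattern, '', $html);

echo $result, "\n";
```

The commented form should behave the same as the one-line pattern; it is simply easier to review and to change later.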



3 Comments so far

  1. Toby on September 4, 2008 3:24 am

    Very useful, but I noticed a typo in the rendering of the code, perhaps because of the wrap-around:

    /(?:(?<=\>)|(?<=\/\))(\s+)(?=\<\/?)/

    Based on your breakdown of the regex I think it should be:

    /(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/

    The > is missing from the colour render.

    Thanks again.

  2. admin on September 4, 2008 5:34 am

    Hi, Toby

    Thanks for your comment. Yes, I guess my wrapper has that problem; I will change to a better one once I find one. :)

  3. Andre on October 14, 2008 8:00 am

    Hi, I like to steal code very often, but I now have problems understanding where and how to add Toby's correction.

    I also get an error with the regex used in your demo:

    Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 34 in /home/chris/html/xeloop/root/TEST/form.php on line 12

    Is it possible to add the full working regex again?

    Thanks for your work.
