PHP Regex Remove Whitespace from HTML
June 26, 2008
There are many ways to optimize an HTML page, and one of them is to remove whitespace. Whitespace between tags is mostly there for readability. The main exceptions are the contents of elements such as <pre> and <textarea>, and spaces between inline elements, which browsers render as visible gaps. So if your site gets a lot of visits, it's a good idea to consider stripping extra whitespace from your HTML. Even for a small to medium-sized website, removing whitespace and newline characters from the HTML can save anywhere from 500 megabytes (MB) to a few gigabytes (GB) of transfer a month.
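<html>
<head><title>Removing whitespace from HTML</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
  <!-- A textarea keeps the newlines and indentation that a text input would drop -->
  <textarea name="html" rows="8" cols="60"><?php print htmlspecialchars($_POST['html'] ?? ''); ?></textarea><br />
  <input type="submit" value="Remove whitespace" /><br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $html = $_POST['html'] ?? '';
    // Remove whitespace that follows the end of a tag (">" or "/>") and
    // precedes the start of the next tag ("<" or "</").
    $newhtml = preg_replace( "/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/", "", $html );
    // Escape both strings so the browser shows the markup instead of rendering it
    print "<b>Original text was: &nbsp;'" . htmlspecialchars($html) . "'</b><br />";
    print "<b>New text is: &nbsp;'" . htmlspecialchars($newhtml) . "'</b><br />";
}
?>
</form>
</body>
</html>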
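The demo mixes form handling with the regex. Outside a demo, the replacement would more likely live in a small reusable function. The sketch below does that and reports how many bytes were saved on a sample list. The name strip_intertag_whitespace() and the sample markup are hypothetical. The pattern is the demo's regex.

```php
<?php
// Hypothetical helper name; the pattern is the one used in the demo form.
function strip_intertag_whitespace(string $html): string
{
    $stripped = preg_replace('/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/', '', $html);
    // preg_replace() returns null on failure (e.g. a PCRE backtrack limit);
    // fall back to the unmodified HTML rather than sending an empty page.
    return $stripped ?? $html;
}

$page  = "<ul>\n    <li>One</li>\n    <li>Two</li>\n</ul>\n";
$small = strip_intertag_whitespace($page);
printf("%d bytes -> %d bytes\n", strlen($page), strlen($small)); // 45 bytes -> 34 bytes
```

To filter a whole page, the same function could be passed to ob_start('strip_intertag_whitespace') at the top of a script, so that everything the script prints goes through it before being sent. Keep in mind that it also joins inline elements: "<b>a</b> <i>b</i>" becomes "<b>a</b><i>b</i>", which renders without the space.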
Regular Expression Explanation:
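The look-behind group (?:(?<=\>)|(?<=\/\>)) requires that the whitespace be preceded by the end of an HTML tag, either > or />. A look-behind must have a fixed length. Older versions of Perl reject (?<=\>|\/\>) because its two alternatives have different lengths (one character and two). The portable form therefore puts each look-behind on its own and wraps them in a non-capturing group: (?:(?<=\>)|(?<=\/\>)). PHP's PCRE engine is more lenient: each top-level alternative of a look-behind may have its own fixed length, so (?<=\>|\/\>) also compiles in PHP. Also, /> itself ends in >, so the (?<=\>) alternative alone already covers self-closing tags.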
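The small sketch below checks the look-behind claims. It runs three spellings of the pattern over the same hypothetical snippet of markup: the grouped form used in this post, a single look-behind with alternatives of different lengths, and a minimal variant. The minimal variant keeps only (?<=\>) and drops the optional \/ from the look-ahead. An optional character cannot change whether a look-ahead succeeds. The sketch assumes PHP's bundled PCRE engine.

```php
<?php
$sample = "<div>\n  <img src=\"a.png\" />\n  <br/>\n  <span>text</span>\n</div>";

$patterns = [
    'grouped'     => '/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/', // form used in this post
    'alternation' => '/(?<=\>|\/\>)(\s+)(?=\<\/?)/',          // branches of length 1 and 2
    'simple'      => '/(?<=\>)(\s+)(?=\<)/',                  // "/>" also ends in ">"
];

$results = [];
foreach ($patterns as $name => $pattern) {
    $results[$name] = preg_replace($pattern, '', $sample);
    echo $name, ': ', $results[$name], "\n";
}
```

Each line prints the same string, <div><img src="a.png" /><br/><span>text</span></div>. The space inside <img ... /> survives because it is preceded by a quote, not by >.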
Token   Meaning
(?:     a non-capturing group that contains …
(?<=    a positive look-behind with …
\>      a > …
)       the end of the positive look-behind …
|       or …
(?<=    a positive look-behind with …
\/      a slash, followed by …
\>      a > …
)       the end of the positive look-behind …
)       the end of the non-capturing group …
(       a capturing group that contains …
\s      whitespace …
+       one or more times …
)       the end of the capturing group …
(?=     a positive look-ahead …
\<      a <, followed by …
\/      a slash …
?       that can occur at most once …
)       the end of the positive look-ahead.
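The token-by-token reading above can also live inside the pattern itself. PCRE's x (extended) modifier ignores whitespace in the pattern and treats # as the start of a comment. The sketch below rewrites the demo's regex that way and checks that it gives the same result as the compact form on a sample string. The variable names and sample markup are illustrative only. The delimiter is ~ instead of /, because PHP ends a pattern at the first unescaped delimiter character, even one inside a comment.

```php
<?php
// The demo's pattern written with the x modifier: whitespace is ignored
// and "#" starts a comment, so each token from the table gets its own line.
// "~" is the delimiter because a "/" in a comment would end the pattern.
$verbose = '~
    (?:              # a non-capturing group that contains
        (?<=\>)      #   a look-behind for ">"
      |              #   or
        (?<=\/\>)    #   a look-behind for "/>"
    )                # end of the non-capturing group
    (\s+)            # a capturing group: one or more whitespace characters
    (?=\<\/?)        # a look-ahead for "<", optionally followed by "/"
~x';

$compact = '/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/';

$html = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>\n<br />\n<p>Hi</p>";
$a = preg_replace($compact, '', $html);
$b = preg_replace($verbose, '', $html);
var_dump($a === $b); // bool(true)
echo $b, "\n";       // <ul><li>One</li><li>Two</li></ul><br /><p>Hi</p>
```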
Similar Posts
- Validate URL Using PHP Regex
- PHP Regex Validate IP address
- PHP Regex Extract Username from Email Address
- PHP Regex Extract Filename from Full Path
- PHP Regex Extract Directory from Full Path
- PHP Regex - Validate Email Address
- PHP Regex - Extract Filenames from Full Path
Comments
3 Comments so far
Very useful, but I noticed a typo in the rendering of the code, perhaps because of the wrap-around:
/(?:(?<=\>)|(?<=\/\))(\s+)(?=\<\/?)/
Based on your breakdown of the regex, I think it should be:
/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/
The \> is missing from the colour render.
Thanks again.
Hi Toby,
Thanks for your comment. Yeah, I guess my code wrapper has that problem; I will switch to a better one once I find one. :)
Hi, I like to steal code quite often, but now I'm having trouble understanding where and how to apply Toby's correction.
I also get an error with the regex used in your demo:
Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 34 in /home/chris/html/xeloop/root/TEST/form.php on line 12
Could you post the full working regex again?
Thanks for your work.