Archive for category PHP Regex

PHP Regex – Extract Filenames from Full Path

PHP regular expressions (regex) can do a lot of wonders; one of them is extracting what looks like a filename from a full path. The approach assumes that anything after the last directory separator (in this case a slash: / ) is the name of a file.

Below is the code:

<html>
<head><title>Extract Filenames from Full Path</title></head>
<body>
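<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="value" value="<?php print htmlspecialchars($_POST['value'] ?? ''); ?>"/><br/>
<input type="submit" value="Submit" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $mystr = $_POST['value'];
    // preg_match() with "/" delimiters; the old ereg() function no longer exists in PHP 7+.
    if ( preg_match( '/^\/.*\/([^\/]+)$/', $mystr, $matches ) )
    {
        // Escape user input before echoing it back into the page.
        echo "The file is: " . htmlspecialchars($matches[1]);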
    }
    else
    {
        echo "<b>No file found here.</b>";
    }
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

^

the beginning of the line, followed by

\/

a slash, then

.

any character

*

found zero or more times, up to

\/

a slash, followed by

(

the beginning of the group that will capture the filename and contains

[

a character class that contains

^

anything that isn't

\/

a slash

]

the end of the character class

+

found one or more times

)

the end of the group, which goes up to

$

the end of the line
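If you want to reuse the pattern outside the form, it can be wrapped in a small function. This is a sketch assuming PHP 7.1 or later; `extract_filename()` is a hypothetical name, not a PHP built-in. It also shows two edge cases and compares the result with PHP's own basename() function.

```php
<?php
// Hypothetical helper: returns the part after the last "/" in an absolute
// path, or null when the pattern does not match.
function extract_filename(string $path): ?string
{
    // Same pattern as above, in PCRE form with "/" delimiters.
    if (preg_match('/^\/.*\/([^\/]+)$/', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(extract_filename('/var/www/html/index.php')); // string(9) "index.php"
var_dump(extract_filename('/var/www/html/'));          // NULL: nothing after the last slash
var_dump(extract_filename('/index.php'));              // NULL: the pattern needs at least two slashes
var_dump(basename('/var/www/html/'));                  // string(4) "html": basename() ignores a trailing slash
```

A file directly under the root, such as /index.php, is not matched because the pattern requires a slash both before and after the `.*` part. If you only need the last path segment, basename() is usually the simpler choice.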


PHP Regex – Validate Email Address

This article is for our friend Josh and anyone else interested in how to validate an email address using a PHP regex (regular expression). The script below checks that an e-mail address looks like a valid address: a username, an @ sign, and a valid hostname. For example, null@example.com is valid, but NOSPAM@spam isn’t.

<html>
<head><title>Validating e-mail addresses</title>
<style>
    .err { color : red ; font-weight : bold }
</style>
</head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="input" /><br/>
<input type="submit" value="Submit Form" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $input = $_POST['input'];
    // [A-Za-z] rather than [A-z]: the A-z range also includes [ \ ] ^ _ and `
    if ( preg_match( "/^[-\w.]+@([A-Za-z0-9][-A-Za-z0-9]+\.)+[A-Za-z]{2,4}$/", $input ) )
    {
        # Do some processing here - input is valid
    }
    else
    {
        print "<span class=\"err\">Bad e-mail address. Please correct ".
            "and resubmit the form</span><br/>";
    }
}
?>
</form>
</body>
</html>


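Regular Expression Explanation:

^

the beginning of the line, followed by

[-\w.]

a hyphen, a word character (letter, digit, or underscore), or a dot

+

found one or more times, then

@

an at sign (@), followed by

(

a group that includes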

[A-Za-z0-9]

a letter or number

[-A-Za-z0-9]

a letter, number, or hyphen

+

found one or more times, followed by

\.

a literal dot or period

)

the end of the group

+

found one or more times

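[A-Za-z]

a letter

{2,4}

found between two and four times, followed by

$

the end of the line.

Because of {2,4}, longer top-level domains such as .museum are rejected, so treat this pattern as a quick sanity check rather than a complete validator.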
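A minimal sketch of the same check as a reusable function, assuming PHP 7 or later. `is_plausible_email()` is a hypothetical helper name. It uses the [A-Za-z] ranges and adds the D modifier (a standard PCRE option in PHP) so that $ does not accept a trailing newline. PHP's filter_var() with FILTER_VALIDATE_EMAIL is shown as a built-in alternative.

```php
<?php
// Hypothetical helper wrapping the pattern above.
// D: "$" matches only at the very end of the string, not before a final "\n".
function is_plausible_email(string $input): bool
{
    $pattern = '/^[-\w.]+@([A-Za-z0-9][-A-Za-z0-9]+\.)+[A-Za-z]{2,4}$/D';
    return preg_match($pattern, $input) === 1;
}

var_dump(is_plausible_email('null@example.com'));  // bool(true)
var_dump(is_plausible_email('NOSPAM@spam'));       // bool(false): no dot in the host name
var_dump(is_plausible_email('user@exa_mple.com')); // bool(false): "_" is not allowed in the host name here

// PHP's built-in filter avoids maintaining a regex of your own.
var_dump(filter_var('null@example.com', FILTER_VALIDATE_EMAIL) !== false); // bool(true)
```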


PHP Regex – Search for Lines Ending with a Word

This article shows you how to use a PHP regular expression to find a whole word at the end of a line.

<html>
<head><title>Searching for lines ending with a word</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="str"
    value="<?php print htmlspecialchars($_POST['str'] ?? ''); ?>" /><br />
<input type="submit" value="Find lines" /><br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $str = $_POST['str'];
    // \b = word boundary, $ = end of line: "foo" must be the last whole word
    if ( preg_match( "/\bfoo$/", $str ) )
    {
        print "<b>Found a match!: &nbsp;'" . htmlspecialchars($str) . "'</b><br/>";
    } else {
        print "<b>Didn't find it: &nbsp;'" . htmlspecialchars($str) . "'</b><br/>";
    }
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

\b

is a word boundary: the position between a non-word character (such as a space or tab) or the start of the line and a word character, followed by

f

then

o

then another

o

and lastly

$

the end of the line.
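To apply the same \bfoo$ idea to many lines at once, PHP's preg_grep() returns the array elements that match a pattern, and preg_quote() escapes any regex metacharacters in the word. The helper name `lines_ending_with()` is hypothetical; this is a sketch, not part of the original post.

```php
<?php
// Hypothetical helper: keep only the lines whose last whole word is $word.
function lines_ending_with(array $lines, string $word): array
{
    // preg_quote() escapes metacharacters in $word; "/" is the delimiter.
    $pattern = '/\b' . preg_quote($word, '/') . '$/';
    // preg_grep() preserves the original keys, so re-index the result.
    return array_values(preg_grep($pattern, $lines));
}

$lines = ['I like foo', 'I like food', 'kungfoo', 'foo bar'];
print_r(lines_ending_with($lines, 'foo')); // only "I like foo" remains
```

"kungfoo" is rejected because there is no word boundary between "g" and "f", and "foo bar" is rejected because "foo" is not at the end.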


PHP Regex – Search for Lines Beginning with a Word

This article shows you how to use a PHP regular expression to find a whole word at the beginning of a line.

<html>
<head><title>Searching for lines beginning with a word</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="str"
    value="<?php print htmlspecialchars($_POST['str'] ?? ''); ?>" /><br />
<input type="submit" value="Find lines" /><br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $str = $_POST['str'];
    // ^ = start of line, \b = word boundary: "Word" must be the first whole word
    if ( preg_match( "/^Word\b/", $str ) )
    {
        print "<b>Found a match!: &nbsp;'" . htmlspecialchars($str) . "'</b><br/>";
    } else {
        print "<b>Didn't find it: &nbsp;'" . htmlspecialchars($str) . "'</b><br/>";
    }
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

^

at the start of the line, followed immediately by

W

then

o

followed by

r

then

d

and lastly

\b

a word boundary.
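The same check can be made reusable for any word. In this sketch, `starts_with_word()` is a hypothetical helper, and preg_quote() escapes the word so characters such as . or + are matched literally. It assumes the word begins and ends with word characters, since \b behaves differently next to punctuation.

```php
<?php
// Hypothetical helper: true when $line starts with the whole word $word.
function starts_with_word(string $line, string $word): bool
{
    $pattern = '/^' . preg_quote($word, '/') . '\b/';
    return preg_match($pattern, $line) === 1;
}

var_dump(starts_with_word('Word up', 'Word'));   // bool(true)
var_dump(starts_with_word('Wordsmith', 'Word')); // bool(false): no word boundary after "Word"
var_dump(starts_with_word('A Word', 'Word'));    // bool(false): "Word" is not at the start
```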


PHP Regular Expression – Search for Words Within Comments

You can use this PHP regular expression to search code for a word and make sure the word appears inside a comment. Both // WORD and /* WORD */ will match. Because the pattern is anchored with ^, the comment has to start at the beginning of the input.

<html>
<head><title>Searching for words within comments</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="str"
    value="<?php print htmlspecialchars($_POST['str'] ?? ''); ?>" /><br />
<input type="submit" value="Find WORD in comments" /><br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $str = $_POST['str'];
    // Either "/*" plus text that doesn't close the comment, or "//" plus
    // anything, must come before WORD at the start of the string.
    if ( preg_match( "/^(?:\/\*(?:(?!\*\/).)*|\/\/.*?)WORD/", $str ) )
    {
        print "<b>Found WORD in comments: &nbsp;'" . htmlspecialchars($str)
            . "'</b><br/>";
    }
    else
    {
        print "<b>Found no match in text: &nbsp;'" .
            htmlspecialchars($str) . "'</b><br/>";
    }
}
?>
</form>
</body>
</html>
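The post does not break this pattern down, so here is the same expression rewritten with the x (extended) modifier, which lets a PCRE pattern contain whitespace and # comments. The ~ delimiter is used so the slashes need no escaping. `word_in_comment()` is a hypothetical helper name, and WORD stays hard-coded as in the original.

```php
<?php
// Hypothetical helper built on the pattern above, written with the x
// modifier so each piece can carry its own comment.
function word_in_comment(string $line): bool
{
    $pattern = '~
        ^                  # start of the string
        (?:
            /\*            # a block comment opener
            (?:(?!\*/).)*  # any characters that do not close the comment
          |                # or
            //.*?          # a line comment, then as few characters as possible
        )
        WORD               # the word being searched for
    ~x';
    return preg_match($pattern, $line) === 1;
}

var_dump(word_in_comment('// WORD here'));  // bool(true)
var_dump(word_in_comment('/* WORD */'));    // bool(true)
var_dump(word_in_comment('/* x */ WORD'));  // bool(false): WORD is after the comment closes
```

Because of the ^ anchor, a comment that follows code on the same line, such as `$x = 1; // WORD`, is not matched.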


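PHP Regex Extract Directory from Full Path

This code snippet extracts the directory from a full path. The pattern was written for ereg(), PHP's old POSIX extended regular expression function. ereg() was removed in PHP 7.0, so on current versions of PHP the same pattern is used with preg_match(), wrapped in delimiters.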

<html>
<head><title>Extracting directories from full paths</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="value" value="<?php print htmlspecialchars($_POST['value'] ?? ''); ?>"/><br/>
<input type="submit" value="Submit" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" ) {
    $mystr = $_POST['value'];
    // preg_match() with "/" delimiters; ereg() no longer exists in PHP 7+.
    if ( preg_match( '/^\/(.*)\/([^\/]+$|$)/', $mystr, $matches ) ) {
        echo "The directory is: /" . htmlspecialchars($matches[1]);
    }
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

^

the beginning of the line, followed by

\/

a slash, then

(

the beginning of the group that captures

.

any character

*

found zero, one, or many times

)

the end of the group, followed by

\/

a slash, then

(

the beginning of a group that contains

[

a character class

^

that doesn't include

\/

a slash

]

the end of the character class

+

found at least one time, followed by

$

the end of the line

|

or

$

the end of the line with nothing before it, which matches a path that ends in a slash

)

the end of the group.
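A sketch of the directory pattern as a reusable function, assuming PHP 7.1 or later for the ?string return type. `extract_directory()` is a hypothetical name. For comparison, PHP's built-in dirname() treats a trailing slash differently.

```php
<?php
// Hypothetical helper: the directory pattern in PCRE form.
// Returns null when the path does not match (for example, no leading slash).
function extract_directory(string $path): ?string
{
    if (preg_match('/^\/(.*)\/([^\/]+$|$)/', $path, $matches)) {
        return '/' . $matches[1];
    }
    return null;
}

var_dump(extract_directory('/usr/local/bin/php')); // string(14) "/usr/local/bin"
var_dump(extract_directory('/usr/local/'));        // string(10) "/usr/local"
var_dump(extract_directory('docs/readme.txt'));    // NULL: no leading slash
var_dump(dirname('/usr/local/'));                  // string(4) "/usr": dirname() ignores the trailing slash
```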


PHP Regex Extract Filename from Full Path

There are times you may want to extract filenames from their full paths. This example code extracts what looks like a filename from a full path. It makes an assumption that anything after the last directory separator (in this case / ) is the name of a file.

<html>
<head><title>Extracting filenames from full paths</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="value" value="<?php print htmlspecialchars($_POST['value'] ?? ''); ?>"/><br/>
<input type="submit" value="Submit" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $mystr = $_POST['value'];
    // preg_match() with "/" delimiters; ereg() no longer exists in PHP 7+.
    if ( preg_match( '/^\/.*\/([^\/]+)$/', $mystr, $matches ) )
    {
        echo "The file is: " . htmlspecialchars($matches[1]);
    }
    else
    {
        echo "<b>No file found here.</b>";
    }
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

^

the beginning of the line, followed by

\/

a slash, then

.

any character

*

found zero or more times, up to

\/

a slash, followed by

(

the beginning of the group that will capture the filename and contains

[

a character class that contains

^

anything that isn't

\/

a slash

]

the end of the character class

+

found one or more times

)

the end of the group, which goes up to

$

the end of the line
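The post notes that the separator is / "in this case". As a hypothetical variation that is not from the original post, the sketch below accepts either / or \ as the separator and no longer requires a leading slash, so Windows-style paths work too. `extract_filename_any_sep()` is an illustrative name; it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical variation: "/" or "\" as the separator, no leading slash needed.
// With the "~" delimiter, "/" needs no escaping. In a single-quoted PHP string
// "\\\\" becomes the regex escape "\\", which matches one literal backslash.
function extract_filename_any_sep(string $path): ?string
{
    if (preg_match('~^.*[/\\\\]([^/\\\\]+)$~', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(extract_filename_any_sep('/var/www/index.php'));        // string(9) "index.php"
var_dump(extract_filename_any_sep('C:\\Users\\me\\notes.txt')); // string(9) "notes.txt"
var_dump(extract_filename_any_sep('dir/'));                      // NULL
```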


PHP Regex Remove Whitespace from HTML

There are many ways to optimize an HTML page, and one of them is to remove whitespace. Whitespace between tags in HTML pages is mostly there for readability, so if your site gets a lot of visits, it’s a good idea to consider stripping away extra whitespace. For a small to medium-size website, cleaning whitespace and newline characters out of the HTML can save roughly 500 megabytes (MB) to a few gigabytes (GB) of transfer a month.

<html>
<head><title>Removing whitespace from HTML</title></head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<textarea name="html" rows="8" cols="60"><?php print htmlspecialchars($_POST['html'] ?? ''); ?></textarea><br />
<input type="submit" value="Remove whitespace" /><br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $html = $_POST['html'];
    // Remove whitespace that sits between the end of one tag (">" or "/>")
    // and the start of the next ("<" or "</").
    $newhtml = preg_replace( "/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/", "", $html );
    print "<b>Original text was: &nbsp;'" . htmlspecialchars($html) .
        "'</b><br/>";
    print "<b>New text is: &nbsp;'" . htmlspecialchars($newhtml) . "'</b><br />";
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

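The look-behind group (?:(?<=\>)|(?<=\/\>)) matches the end of an HTML tag. Perl has traditionally required every look-behind to match a fixed number of characters, so it rejects (?<=\>|\/\>), whose alternatives have different lengths. The portable workaround is to give each alternative its own look-behind and put them inside a group, such as (?:(?<=\>)|(?<=\/\>)). PCRE, the library behind PHP's preg_ functions, does accept top-level alternatives of different fixed lengths, so (?<=\>|\/\>) also works in PHP. Since every /> also ends in >, (?<=\>) on its own would match the same positions.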

(?:

a noncapturing group that contains

(?<=

a positive look-behind with

\>

a >

)

the end of the positive look-behind

|

or

(?<=

a positive look-behind with

\/

a slash, followed by

\>

a >

)

the end of the positive look-behind

)

the end of the noncapturing group

(

a capturing group that contains

\s

whitespace

+

one time or more

)

the end of the capturing group

(?=

a positive look-ahead

\<

a <, followed by

\/

a slash

?

that can occur at most once

)

the end of the positive look-ahead.
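A sketch of the whitespace remover as a function, assuming PHP 7 or later; `strip_tag_whitespace()` is a hypothetical name. It uses the look-behind group described in the explanation, (?:(?<=\>)|(?<=\/\>)).

```php
<?php
// Hypothetical helper: remove whitespace that sits between ">" (or "/>")
// and a following "<". preg_replace() returns null on error, so fall back
// to the original string in that case.
function strip_tag_whitespace(string $html): string
{
    return preg_replace('/(?:(?<=\>)|(?<=\/\>))(\s+)(?=\<\/?)/', '', $html) ?? $html;
}

$html = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>";
echo strip_tag_whitespace($html); // <ul><li>One</li><li>Two</li></ul>
```

Be aware that this also removes whitespace between inline elements, for example the space in `<b>a</b> <i>b</i>`, which changes how the text renders, and it does not skip `<pre>` or `<textarea>` blocks.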


PHP Regex Validate IP address

Before we look at the code, let’s first understand what an IP address consists of. An IPv4 address is four groups of numbers between 0 and 255 separated by periods. The address 192.168.0.1 is a valid IP address, but 256.0.1.2 isn’t.

You can use the code to validate an IP address.

<html>
<head><title>Validating IP addresses</title>
<style>
    .err { color : red ; font-weight : bold }
</style>
</head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="input" /><br/>
<input type="submit" value="Submit Form" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $input = $_POST['input'];
    // "\." matches a literal dot; an unescaped "." would accept any character.
    if ( preg_match( "/^(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/", $input ) )
    {
        print "valid!";
    }
    else
    {
        print "<span class=\"err\">Bad IP address. Please correct and " .
            "resubmit the form</span><br/>";
    }
}
?>
</form>
</body>
</html>


Regular Expression Explanation:
The bulk of this expression is a group that breaks down the numbers that range from 0 to 255. The expression would be a lot shorter if 002 or 015 were valid instead of 2 and 15, respectively, but for this expression you want to specify IP addresses without the leading zeros.

The range from 0 to 255 breaks down into other ranges: 0–99, 100–199, 200–249, and 250–255. The expression to match this is ([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]), which can be broken down into [1-9]?[0-9], which will match 0–99; 1[0-9]{2}, which will match 100–199; 2[0-4][0-9], which will match 200–249; and 25[0-5], which will match 250–255.

Setting aside the 0–255 expression, the rest of the pattern breaks down like this:

^

the beginning of the line

(

the beginning of a group that contains

( )

the IP address expression explained previously

\.

a literal dot

)

the end of the group

{3}

occurring exactly three times

( )

another occurrence of the IP address

$

the end of the line.
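A reusable sketch of the IP check, assuming PHP 7 or later; `is_valid_ipv4()` is a hypothetical helper name. Building the 0–255 alternation once in a variable keeps the pattern readable, and the D modifier stops $ from matching before a trailing newline. filter_var() with FILTER_VALIDATE_IP and FILTER_FLAG_IPV4 is PHP's built-in alternative.

```php
<?php
// Hypothetical helper: four 0-255 numbers without leading zeros, joined by literal dots.
function is_valid_ipv4(string $input): bool
{
    $octet = '([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])';
    $pattern = '/^(' . $octet . '\.){3}' . $octet . '$/D';
    return preg_match($pattern, $input) === 1;
}

var_dump(is_valid_ipv4('192.168.0.1')); // bool(true)
var_dump(is_valid_ipv4('256.0.1.2'));   // bool(false): 256 is out of range
var_dump(is_valid_ipv4('192x168x0x1')); // bool(false): would pass if the dot were unescaped
var_dump(is_valid_ipv4('01.2.3.4'));    // bool(false): leading zeros are rejected

// Built-in alternative:
var_dump(filter_var('192.168.0.1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false); // bool(true)
```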

Hope it helps!


PHP Regex Extract Username from Email Address

You can use the following PHP snippet to grab the username out of an e-mail address. Given coldwarkids@ussr.com, the result will be coldwarkids.

This expression extracts a username from an e-mail address because it captures everything up to the @ in one group and everything from the @ to the end of the line in a second group. The call to preg_replace() then replaces the whole match with $1, the first group only, so everything from the @ onward is dropped.

<html>
<head><title>Extracting usernames from email addresses</title>
<style>
    .err { color : red ; font-weight : bold }
</style>
</head>
<body>
<form action="<?= htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" name="input" /><br/>
<input type="submit" value="Submit Form" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
    $input = $_POST['input'];
    if (preg_match ( "/^([^@]+)(@.*)$/", $input ) )
    {
        # Do some processing here - input is valid
        // Replace the whole match with group 1 (everything before the @).
        $username = preg_replace( '/^([^@]+)(@.*)$/', '$1', $input );
        print "<b>Found username \"" . htmlspecialchars($username) . "\"</b>";
    }
    else
    {
        print "<span class=\"err\">No username found here:</span><br/>";
    }
}
?>
</form>
</body>
</html>

Regular Expression Explanation:

^

the beginning of the line

(

a capturing group containing

[^@]

any character that isn’t an at (@) sign

+

found one or more times

)

the end of the first group, followed by

(

another group containing

@

an at sign (@)

.

any character

*

found zero, one, or many times

)

the end of the group

$

the end of the line.
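The same extraction can be done with a single preg_match() call by reading capture group 1 from the matches array, which avoids running the pattern twice. This sketch assumes PHP 7.1 or later; `extract_username()` is a hypothetical helper. For comparison, strstr() with its third argument set to true returns the part of a string before the first occurrence of the needle.

```php
<?php
// Hypothetical helper: capture group 1 holds everything before the first "@".
function extract_username(string $email): ?string
{
    if (preg_match('/^([^@]+)(@.*)$/', $email, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(extract_username('coldwarkids@ussr.com')); // string(11) "coldwarkids"
var_dump(extract_username('no-at-sign'));           // NULL
// Without a regex: strstr() with $before_needle = true.
var_dump(strstr('coldwarkids@ussr.com', '@', true)); // string(11) "coldwarkids"
```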
