Archive for category PHP Regex

Validate URL Using PHP Regex

Sometimes you need a URL validation function that accepts most common kinds of URLs (as far as I can tell). It is useful for validating a homepage link or links submitted by the public.

The pattern allows for an optional scheme, user credentials, port, path, query string and fragment. The $url parameter holds the URL entered by the user, and the function returns a boolean: TRUE if the URL matches, FALSE otherwise.

<?php
/**
* Validate URL
* Allows for port, path and query string validations
* @param    string      $url	   string containing url user input
* @return   boolean     Returns TRUE/FALSE
*/
function validateURL($url)
{
// Hex digits are [a-fA-F\d]; the query part uses a plain "&" separator.
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2})+(:([\d\w]|%[a-fA-F\d]{2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2})*)?$/';
// preg_match() returns 1, 0 or FALSE, so compare to get a real boolean.
return preg_match($pattern, $url) === 1;
}

$result = validateURL('http://www.google.com');
var_dump($result); // bool(true)
?>
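
If maintaining a long hand-written pattern is not appealing, PHP's built-in filter extension offers an alternative. The sketch below is not the function above: it is a hypothetical helper (the name isValidUrlBuiltin is my own) that wraps filter_var() with FILTER_VALIDATE_URL. One difference in behavior worth knowing: the built-in filter requires a scheme, so a bare www.google.com is rejected, whereas the regex above accepts it.

```php
<?php
// Hypothetical helper name; it only wraps PHP's built-in URL filter.
function isValidUrlBuiltin(string $url): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidUrlBuiltin('http://www.google.com')); // bool(true)
var_dump(isValidUrlBuiltin('www.google.com'));        // bool(false)
var_dump(isValidUrlBuiltin('not a url'));             // bool(false)
```

Which one fits depends on whether scheme-less input from users should count as valid.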

Hope this helps!


PHP Regex – POSIX vs PCRE

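PHP has supported two flavors of regular expressions: POSIX Extended (POSIX) and Perl-Compatible (PCRE). The POSIX functions start with ereg (for example ereg_replace), and the PCRE functions start with preg (for example preg_replace). According to the official PHP documentation, PCRE is much faster than POSIX (at least 6 times). Note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so new code should use PCRE. The two scripts below replace everything between double quotes: the first is written with POSIX, the second with PCRE.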

Regular Expression Explanation:

" a quote, followed by …
[ a character class …
^ that isn't …
" another quote …
] the end of the character class …
* zero or more times …
" until another quote appears.


Source Code:

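<html>
<head><title>Using POSIX</title></head>
<body>
<?php
// Note: ereg_replace() was removed in PHP 7.0, so this script needs PHP 5.x.
// stripslashes() was only needed with magic quotes, which are gone since PHP 5.4.
$mystr = isset($_POST['value']) ? $_POST['value'] : '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="value"
value="<?php print htmlspecialchars($mystr); ?>"/>
<br/>
<input type="submit" value="Submit" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
$mynewstr = ereg_replace(
'"[^"]*"', '"***"', $mystr );
// Escape the output; the input is expected to contain quotes.
print htmlspecialchars($mynewstr);
}
?>
</form>
</body>
</html>

<html>
<head><title>Using PCRE</title></head>
<body>
<?php
// Read the input once; escape it whenever it is echoed back into the page.
$mystr = isset($_POST['value']) ? $_POST['value'] : '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="value"
value="<?php print htmlspecialchars($mystr); ?>"/>
<br/>
<input type="submit" value="Submit" /><br/><br/>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
// PCRE patterns need delimiters, here the two slashes.
$mynewstr = preg_replace(
'/"[^"]*"/', '"***"', $mystr );
print htmlspecialchars($mynewstr);
}
?>
</form>
</body>
</html>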
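
To see why the pattern uses the negated class [^"]* rather than .*, it may help to run both on a string with two quoted parts. This is a small command-line sketch; the sample sentence is made up for illustration.

```php
<?php
$input = 'He said "hello" and "bye".';

// Negated class: each quoted run is matched separately.
$perQuote = preg_replace('/"[^"]*"/', '"***"', $input);
echo $perQuote, "\n"; // prints: He said "***" and "***".

// Greedy dot: one match from the first quote to the last quote.
$greedy = preg_replace('/".*"/', '"***"', $input);
echo $greedy, "\n";   // prints: He said "***".
```

The greedy version swallows the text between the two quoted parts, which is usually not what you want.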


PHP Regex – Finding Similar Words

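I take a break from fighting evil visitors. The character class, written with square brackets [], is a quite useful PCRE construct that helps find similar words: it matches any one of the characters listed inside it, so b[aio]t finds ‘bat’, ‘bit’ and ‘bot’.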

Regular Expression Explanation:

\b a word boundary, followed by …
b letter ‘b’, followed by …
[aio] one of a, i, or o, followed by …
t letter ‘t’, and finally …
\b a word boundary.


Source Code:

<html>
<head><title>Finding Similar Words</title></head>
<body>
<?php
// Read the input once; escape it whenever it is echoed back into the page.
$mystr = $_POST['value'] ?? '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="value"
value="<?php print htmlspecialchars($mystr); ?>" />
<br />
<input type="submit" value="Submit" />
<br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" ) {
// Single quotes keep the backslashes in the pattern literal.
if ( preg_match( '/\bb[aio]t\b/', $mystr ) ) {
echo "Yes!<br/>";
} else {
echo "Uh, no.<br/>";
}
}
?>
</form>
</body>
</html>
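
For a quick check outside the form, the same pattern can be run with preg_match_all() to list every word it finds. This is only a sketch, and the sample sentence is my own.

```php
<?php
// Sample sentence is made up for illustration.
$text = 'The bat bit the bot, but not the boat.';

// preg_match_all() returns the number of matches and fills $matches[0] with them.
$count = preg_match_all('/\bb[aio]t\b/', $text, $matches);

echo $count, "\n";                     // prints: 3
echo implode(', ', $matches[0]), "\n"; // prints: bat, bit, bot
```

‘but’ is skipped because u is not in the class, and ‘boat’ is skipped because the class matches exactly one character.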


PHP Regex – Bad Words Streamlining

So what if we detect and filter out every bad word on the site? Will that make the bad guys behave themselves, or go to hell? Neither. They will simply leave our website for another one that accepts bad words. So if you filter out all bad words today, check your Google Analytics tomorrow: the graph will change from an upward slash to a mountain. We don’t want that. So instead of filtering bad words out, we should ‘streamline’ them: rather than removing the word ‘fuck’, we turn it into ‘f**k’. That way we keep both our ‘moral standard’ and our visitors. Hypocritical it might be, but so what? We need traffic!

Below is how I use PHP’s preg_replace to make the streamlining work like a charm:

Regular Expression Explanation:

\b a word boundary, followed by …
fuck the word ‘fuck’, followed by …
\b a word boundary.

Source Code:

<html>
<head><title>Replacing Words</title></head>
<body>
<?php
// Read the input once; escape it whenever it is echoed back into the page.
$str = $_POST['value'] ?? '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="value"
value="<?php print htmlspecialchars($str); ?>" />
<br />
<input type="submit" value="Replace word" />
<br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" ) {
$newstr = preg_replace(
'/\bfuck\b/', "f**k", $str
);
print "<b>" . htmlspecialchars($newstr) . "</b><br />";
}
?>
</form>
</body>
</html>
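
The same idea can be stretched to a whole word list without hard-coding each replacement. The sketch below is one possible approach, not the script above: maskWords is a hypothetical helper, the mild sample words are my own, and it assumes plain ASCII words because strlen() counts bytes.

```php
<?php
// Hypothetical helper: masks the middle of every listed word, ignoring case.
function maskWords(string $text, array $words): string
{
    // Quote each word so regex metacharacters in the list are treated literally.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) {
        $word = $m[0];
        $len = strlen($word);
        if ($len < 3) {
            return $word; // nothing to mask between first and last letter
        }
        // Keep the first and last letter, replace the rest with asterisks.
        return $word[0] . str_repeat('*', $len - 2) . $word[$len - 1];
    }, $text);
}

echo maskWords('Darn it, what the heck!', ['darn', 'heck']), "\n";
// prints: D**n it, what the h**k!
```

Using a callback keeps the original capitalization of each matched word, which a fixed replacement string would lose.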


PHP Regex – Finding Variations on Bad Words

If evil people type something like ‘stupid fuck’, it will obviously be detected and blocked by most sites. So they get smart and adapt: they may use ‘fuk’ instead of ‘fuck’. They don’t care about the spelling; all they want is to let you know they are trying to say ‘fuck’. We definitely understand, but does our little PHP program understand? Well, that depends on how we engineers program it. Here is a simple way to stop evolving ‘evil doers’ who try to post mutant bad words on our site. Enjoy!

Regular Expression Explanation:

stupid the word ‘stupid’, followed by …
space a space, followed by …
fu the letters ‘fu’, followed by …
c c, which may …
? appear once, but isn’t required, followed by …
k the letter ‘k’, followed by …
(er) a group that contains the letters ‘er’ …
? that may appear once, but isn’t required.


Source Code:

<html>
<head>
<title>Finding Variations on phrase</title>
</head>
<body>
<?php
// Read the input once; escape it whenever it is echoed back into the page.
$str = $_POST['str'] ?? '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="str"
value="<?php print htmlspecialchars($str); ?>" /><br />
<input type="submit" value="stop bad words" />
<br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
if ( preg_match( "/stupid fuc?k(er)?/", $str ) )
{
print "<b>Found him: '". htmlspecialchars($str) . "'</b><br/>";
}
else
{
print "<b>Did NOT find match</b><br/>";
}
}
?>
</form>
</body>
</html>
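
To watch the ? quantifier at work in isolation, here is a small sketch with a neutral stand-in pattern of my own, where both a single letter and a trailing group are optional.

```php
<?php
// Neutral stand-in pattern: "u" is optional, and so is the trailing group "s".
$pattern = '/\bcolou?r(s)?\b/';

foreach (['color', 'colour', 'colours', 'colr'] as $candidate) {
    $found = preg_match($pattern, $candidate) === 1;
    echo $candidate, ': ', $found ? 'match' : 'no match', "\n";
}
// prints:
// color: match
// colour: match
// colours: match
// colr: no match
```

The word boundaries here are my addition. Without something anchoring the end of the pattern, an optional group in last position never changes whether preg_match() succeeds; it only changes how much text the match covers.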


PHP Regex – Finding Multiple Dirty Words

Evil things keep evolving and multiplying, and the f word is by no means the only bad word out there. In order to STOP THEM ALL, we need a better script to detect them, and here it is:

Regular Expression Explanation:

\b a word boundary …
( followed by an outer group that holds the alternatives …
( which starts with a group that contains …
fuck the word ‘fuck’ …
) the end of the ‘fuck’ group …
| or …
( a group that contains …
shit the word ‘shit’ …
) the end of the ‘shit’ group.
) the end of the outer group, followed by …
\b a closing word boundary.


Source Code:

<html>
<head><title>Finding multiple words</title></head>
<body>
<?php
// Read the input once; escape it whenever it is echoed back into the page.
$str = $_POST['str'] ?? '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="str"
value="<?php print htmlspecialchars($str); ?>" /><br />
<p><input type="submit" value="Find words" /></p>
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" )
{
if ( preg_match( '/\b((fuck)|(shit))\b/', $str ) )
{
print "<b>Found match: '".htmlspecialchars($str)."'</b><br/>";
}
else
{
print "<b>Did NOT find match: '".htmlspecialchars($str)."'</b><br/>";
}
}
?>
</form>
</body>
</html>
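
When the list grows, typing the alternation by hand gets tedious. As a sketch only, the alternation can be built from an array, and preg_match_all() can report which listed words actually appeared. The helper name findListedWords and the mild sample words are assumptions of mine, not part of the script above.

```php
<?php
// Hypothetical helper: returns the distinct listed words that occur in $text.
function findListedWords(string $text, array $words): array
{
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    // A single non-capturing group is enough for the alternation.
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    preg_match_all($pattern, $text, $matches);

    // Normalize case, then drop duplicates and reindex.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

$found = findListedWords('Well darn, DARN and heck.', ['darn', 'heck']);
echo implode(', ', $found), "\n"; // prints: darn, heck
```

Knowing which word matched, rather than just that something matched, makes it easier to log or review what the filter is catching.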


PHP Regex – Finding the F Word

Most of the time we don’t want people to post the f word on our website. Here is how to use a PHP regular expression to find the word ‘fuck’. Enjoy it!

Regular Expression Explanation:

\b a word boundary (the position between a word character and a non-word character, or the start of the string) …
f an f, followed by …
u a u, followed by …
c a c, then …
k a k, and finally …
\b a word boundary at the end of the word.


Source Code:

<html>
<head>
<title>Finding the 'F' words</title>
</head>
<body>
<?php
// Read the input once; escape it whenever it is echoed back into the page.
$str = $_POST['str'] ?? '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>"
method="post">
<input type="text" name="str"
value="<?php print htmlspecialchars($str); ?>" />
<br />
<input type="submit" value="Find word" />
<br /><br />
<?php
if ( $_SERVER['REQUEST_METHOD'] == "POST" ) {
// The pattern is case-sensitive; add the i modifier to catch capitals too.
if ( preg_match( '/\bfuck\b/', $str ) ) {
print "<b>Heh heh. You said 'fuck'</b>";
} else {
print "<b>Nope. Didn't find it.</b>";
}
}
?>
</form>
</body>
</html>
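
To see what the two \b anchors buy you, the sketch below compares a plain substring pattern, a whole-word pattern, and a case-insensitive whole-word pattern. The neutral sample word and strings are my own choice.

```php
<?php
// Neutral sample strings, chosen only to show the effect of \b and the i modifier.
$samples = ['the cat sat', 'concatenate', 'Cat!'];

foreach ($samples as $s) {
    $plain   = preg_match('/cat/', $s) === 1;      // substring search
    $bounded = preg_match('/\bcat\b/', $s) === 1;  // whole word only
    $anyCase = preg_match('/\bcat\b/i', $s) === 1; // whole word, any case
    printf("%-12s plain=%d bounded=%d anyCase=%d\n", $s, $plain, $bounded, $anyCase);
}
```

‘concatenate’ only matches the plain pattern, because the letters sit inside a longer word, and ‘Cat!’ only matches once the i modifier is added.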
