Injection is an attack that involves breaking out of a data context and switching into a code context through the use of special characters that are significant in the interpreter being used. - OWASP
Many PHP applications HTML-encode all untrusted data with htmlentities(), irrespective of the context in which it is output. This is a problem because htmlentities() only encodes characters that are special in HTML, so it does not mitigate XSS injections that need none of them. One example is data that will be output as a URL:
[php]echo '<a href="', htmlentities( 'javascript:alert("xss");' ), '">XSS</a>';[/php]
In this instance htmlentities() is not sufficient protection. The above outputs the following, and clicking the link still runs the script:
[html]<a href="javascript:alert(&quot;xss&quot;);">XSS</a>[/html]
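The encoded output is still dangerous because the browser decodes character references in an attribute value before it uses the URL. The sketch below reproduces this in PHP, with html_entity_decode() standing in for the browser's decoding step (an approximation, not what the browser literally runs): the javascript: scheme survives both steps unchanged.

[php]<?php
$payload = 'javascript:alert("xss");';

// What the page emits inside href="..."
$encoded = htmlentities( $payload, ENT_QUOTES, 'UTF-8' );
echo $encoded, "\n"; // javascript:alert(&quot;xss&quot;);

// Roughly what the browser works with after decoding the attribute value
$decoded = html_entity_decode( $encoded, ENT_QUOTES, 'UTF-8' );
echo $decoded, "\n"; // javascript:alert("xss");

// The dangerous scheme was never touched by the encoding
echo parse_url( $decoded, PHP_URL_SCHEME ), "\n"; // javascript
[/php]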
To prevent this injection, URLs should be validated on input, accepting only safe schemes such as http and https, and encoded with htmlentities() on output.
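[php]$url = 'http://hostname/path?arg=value';
$parsedUrl = parse_url( $url );
// parse_url() returns false for badly malformed URLs and omits 'scheme' when there is none
$scheme = ( is_array( $parsedUrl ) && isset( $parsedUrl['scheme'] ) ) ? strtolower( $parsedUrl['scheme'] ) : '';
if( $scheme != 'http' && $scheme != 'https' ) {
    // reject URL
} else {
    // escape a copy for the SQL statement; $url itself stays unmodified for later output
    $escapedUrl = mysqli_real_escape_string( $mysqli, $url );
    $sql = "INSERT INTO table (url) VALUES ('$escapedUrl')";
    // insert query
}
// ...
// encode on output, including quote marks
echo '<a href="', htmlentities( $url, ENT_QUOTES, 'UTF-8' ), '">XSS</a>', "\n";
[/php]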
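The scheme check can also be packaged as a small allow-list function. The sketch below is one possible shape, using a hypothetical helper name, is_allowed_url(), that does not appear in the original code. It lower-cases the scheme, since URL schemes are case-insensitive, and rejects any URL that has no scheme or that parse_url() cannot parse.

[php]<?php
// Hypothetical helper (illustrative name): accept only http and https URLs.
function is_allowed_url( string $url ): bool {
    $scheme = parse_url( $url, PHP_URL_SCHEME );
    // parse_url() returns null when there is no scheme and false on a seriously malformed URL
    if ( !is_string( $scheme ) ) {
        return false;
    }
    return in_array( strtolower( $scheme ), array( 'http', 'https' ), true );
}

var_dump( is_allowed_url( 'http://hostname/path?arg=value' ) ); // bool(true)
var_dump( is_allowed_url( 'HTTPS://hostname/' ) );              // bool(true)
var_dump( is_allowed_url( 'javascript:alert("xss");' ) );       // bool(false)
var_dump( is_allowed_url( '/relative/path' ) );                 // bool(false)
[/php]

This sketch rejects relative and scheme-relative URLs such as /relative/path. That is a deliberately strict choice. An application that needs relative links would have to handle them separately.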
The URL now stored in the database must still be output using htmlentities(). A URL with a valid http scheme can contain a quote mark that closes the href attribute and injects further code, as in this example:
[html]http://www.test.com/" onClick="javascript:alert('xss');"[/html]
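Running this payload through both steps shows why each step is needed. In the sketch below the payload passes an http/https scheme check. htmlentities() with ENT_QUOTES then turns every double and single quote into an entity, so the value can no longer close the href attribute. The variable names are illustrative only.

[php]<?php
$stored = 'http://www.test.com/" onClick="javascript:alert(\'xss\');"';

// A scheme check alone lets this through
echo parse_url( $stored, PHP_URL_SCHEME ), "\n"; // http

// Unencoded output: the first quote closes href and an onClick attribute follows
echo '<a href="', $stored, '">link</a>', "\n";

// Encoded output: quotes become &quot; and &#039;, so the whole value stays inside href
$safe = htmlentities( $stored, ENT_QUOTES, 'UTF-8' );
echo '<a href="', $safe, '">link</a>', "\n";
[/php]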
For further information about XSS, the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet offers a list of XSS Prevention Rules, and RSnake's XSS (Cross Site Scripting) Cheat Sheet is a comprehensive list of XSS injection test strings.