[PHP] Re: Utf8 issues with FILTER_SANITIZE_URL




On 12.02.2016 at 13:30, Ashley Sheridan wrote:

> I've noticed that the sanitisation filter FILTER_SANITIZE_URL is not working quite as the documentation suggests.
> 
> In particular, the documentation says this filter removes all characters except letters, digits, and a small list of specific characters. I took "letters" in this context to mean the same as \p{L} in the preg_* functions, but it appears to mean only [a-zA-Z] here. I need characters like êéö not to be stripped (these are valid in URLs and have been widely supported in browsers and servers for years).
> 
> Is UTF-8 support on this filter intentionally missing, or is there a flag I need to set in order for it to work correctly?

Well, UTF-8 support is neither intentionally missing, nor is there a
flag to change the behavior: UTF-8 support is simply not implemented
(yet).  See also the definition of allowed_list[1], and the definition
of LOWALPHA and HIALPHA[2].
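
To illustrate, here is a minimal sketch of the behavior and of one possible
workaround. The helper name sanitize_url_keep_utf8() is hypothetical (it is
not part of PHP), and the approach of percent-encoding every non-ASCII byte
before sanitizing is only an assumption about what would suit your case;
"%" and hex digits are in the allowed list, so the encoded bytes survive.

```php
<?php
// Hypothetical helper, not a PHP built-in: percent-encode all non-ASCII
// bytes first, so FILTER_SANITIZE_URL finds nothing outside ASCII to strip.
function sanitize_url_keep_utf8($url)
{
    $encoded = preg_replace_callback(
        '/[\x80-\xFF]+/',            // runs of bytes outside 7-bit ASCII
        function ($m) {
            return rawurlencode($m[0]);
        },
        $url
    );

    return filter_var($encoded, FILTER_SANITIZE_URL);
}

$url = "http://example.com/caf\xC3\xA9"; // "café" encoded as UTF-8

// Plain filter: both bytes of "é" are removed.
$plain = filter_var($url, FILTER_SANITIZE_URL);

// With pre-encoding: "é" is kept as %C3%A9.
$kept = sanitize_url_keep_utf8($url);
```

This only covers the path and query parts; non-ASCII host names are a
separate matter and are not handled here.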

Consider filing a feature request, but please double-check that there
isn't already one.  I guess there is.

[1]
<https://github.com/php/php-src/blob/php-7.0.3/ext/filter/sanitizing_filters.c#L324>
[2]
<https://github.com/php/php-src/blob/php-7.0.3/ext/filter/sanitizing_filters.c#L60>

-- 
Christoph M. Becker


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php