[PHP] Re: Utf8 issues with FILTER_SANITIZE_URL

On 12.02.2016 at 13:30, Ashley Sheridan wrote:

> I've noticed that the sanitisation filter FILTER_SANITIZE_URL is not working quite as the documentation suggests.
> In particular, the documentation says this filter removes all characters except letters, digits, and a small list of specific characters. I took "letters" in this context to mean the same as \p{L} in the preg_* functions, but it appears to mean only [a-zA-Z] here. I need characters like êéö not to be stripped (these are valid in URLs and have been widely supported in browsers and servers for years).
> Is UTF-8 support on this filter intentionally missing, or is there a flag I need to set in order for it to work correctly?

Well, UTF-8 support is neither intentionally missing, nor is there a
flag to change the behavior: UTF-8 support is simply not implemented
(yet).  See also the definition of allowed_list[1], and the definition

Consider filing a feature request, but please double-check first whether
one already exists.  I guess there is.
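
To illustrate the behaviour, here is a minimal sketch together with one
possible workaround: percent-encoding the non-ASCII bytes before filtering.
The helper encode_non_ascii() is a hypothetical name of mine, not part of
PHP, and the example URL is made up; treat this as a sketch under those
assumptions rather than a recommended fix.

```php
<?php
// Hypothetical helper (not part of PHP): percent-encode every byte outside
// the ASCII range, so that FILTER_SANITIZE_URL has nothing left to strip.
function encode_non_ascii($url)
{
    // No /u modifier: the pattern matches single bytes 0x80..0xFF.
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        function ($m) {
            return rawurlencode($m[0]);
        },
        $url
    );
}

// "é" is the two bytes C3 A9 in UTF-8.
$url = "http://example.com/caf\xC3\xA9";

// The filter works byte-wise and drops both bytes of the "é".
echo filter_var($url, FILTER_SANITIZE_URL), "\n";
// http://example.com/caf

// After percent-encoding, only characters from the allowed list remain.
echo filter_var(encode_non_ascii($url), FILTER_SANITIZE_URL), "\n";
// http://example.com/caf%C3%A9
```

Note that this only keeps the filter from dropping the characters; it does
not validate that the input is well-formed UTF-8.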


Christoph M. Becker

