On 08/28/2012 11:45 AM, Alexander Barton wrote:
> I don't think that this is a good idea, because I fear it is way too permissive: It allows characters like ASCII 34("), 39('), 63(?), 91([), 126(~), 127(DEL), which could cause problems or even interfere with pattern matching and the like.
So, I guess I was in a mood to follow Larry Wall's "be liberal in what you accept" advice. :) Before I went writing I did a little looking into what some of the back-end authentication mechanisms would accept. The useradd man page on squeeze says:
| On Debian, the only constraints are that usernames must neither
| start with a dash ('-') nor contain a colon (':') or a
| whitespace (space: ' ', end of line: '\n', tabulation:
| '\t', etc.). Note that using a slash ('/') may break the default
| algorithm for the definition of the user's home directory.
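To make those constraints concrete, here is a rough sketch in Python that checks only the rules quoted from the man page. The function name is made up for illustration, and rejecting an empty name is my own assumption, since the quoted text does not cover that case.

```python
def debian_username_ok(name: str) -> bool:
    """Check only the constraints quoted from the useradd man page."""
    if not name:
        return False  # assumption: the quoted text does not mention empty names
    if name.startswith("-"):
        return False  # must not start with a dash
    if ":" in name:
        return False  # must not contain a colon
    if any(ch.isspace() for ch in name):
        return False  # must not contain space, newline, tab, etc.
    return True
```

Note that a name containing '/' passes this check, which matches the man page: a slash is permitted, it just may break the default home directory logic.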
I didn't find a truly authoritative source, but searching around suggests that usernames in LDAP are even more inclusive.
Given all this, plus considering UTF-8's slow but sure advancement into the IRC universe, a blacklist approach seemed more on target to me than a whitelist. I can definitely agree the blacklist in my patch isn't big enough; '~' and DEL need to be kept out too, for sure. But, beyond that, I guess my general feeling is that the days of dirt-basic IRC parsing are over, and clients and servers alike need to be prepared to deal with it.
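As a rough illustration of the blacklist idea, here is a Python sketch. The forbidden set is hypothetical and is not the one from the patch: it holds '~' and DEL as discussed, plus control characters and space, plus '!' and '@' on the assumption that the nick!user@host delimiters have to stay out of a username.

```python
# Hypothetical blacklist for illustration; NOT the set used in the actual patch.
FORBIDDEN = (
    set("~!@")                          # '~' per the discussion; '!' and '@' are an assumption
    | {chr(c) for c in range(0, 33)}    # control characters and space
    | {chr(127)}                        # DEL
)

def username_allowed_blacklist(name: str) -> bool:
    """Accept any non-empty name that contains no forbidden character."""
    return bool(name) and not any(ch in FORBIDDEN for ch in name)
```

The point of this shape is that anything not explicitly forbidden, including non-ASCII letters, gets through without the list having to anticipate it.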
Is there something specific you think is likely to break? If we're just worried about these being used in regular expressions, '.', '+', and sometimes '-' all have special meanings there too…
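To show what I mean about regular expressions, here is a small Python sketch. It only illustrates the general point that '.' and '+' are already special in a regex and that escaping deals with them; the helper name is invented for the example and nothing here reflects how any IRC server actually matches usernames.

```python
import re

def matches_literally(username: str, candidate: str) -> bool:
    """Compare candidate against username with all regex metacharacters escaped."""
    return re.fullmatch(re.escape(username), candidate) is not None

# Used as a raw pattern, '.' matches any character and '+' means "one or more".
unescaped_dot = re.fullmatch("a.b", "axb") is not None
unescaped_plus = re.fullmatch("a+b", "aab") is not None
```

With escaping, `matches_literally("a.b", "axb")` is false while `matches_literally("a.b", "a.b")` is true, so the characters themselves are not the problem as long as the code using them escapes its input.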
> Allowing some(!) more characters – probably like 43(+), 45(-), 46(.), and 95(_) – makes sense to me and should do no harm, I think …
But, for what it's worth, I did confirm that adding this set would cover our immediate needs.
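For comparison, the whitelist variant with that extra set could look roughly like this in Python. The base set of ASCII letters and digits is an assumption on my part, and the names are invented for the example; only the four extra characters come from the suggestion above.

```python
import string

EXTRA = "+-._"  # the additional characters suggested above
# Assumption: the existing whitelist is ASCII letters and digits.
ALLOWED = set(string.ascii_letters + string.digits + EXTRA)

def username_allowed_whitelist(name: str) -> bool:
    """Accept a non-empty name made up only of explicitly allowed characters."""
    return bool(name) and all(ch in ALLOWED for ch in name)
```

A name like "john.doe" passes, while anything outside the list, UTF-8 letters included, is rejected until someone extends the set.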
Thanks,