On 08/28/2012 11:45 AM, Alexander Barton wrote:
> I don't think that this is a good idea, because I fear it is way
> too permissive: It allows characters like ASCII 34("), 39('), 63(?), 91([),
> 126(~), 127(DEL), which could cause problems or even interfere with pattern matching and
So, I guess I was in the mood to follow Larry Wall's "be liberal in what
you accept" advice. :) Before writing the patch, I did a little looking
into what some of the back-end authentication mechanisms would accept.
The useradd man page on squeeze says:
| On Debian, the only constraints are that usernames must neither
| start with a dash ('-') nor contain a colon (':') or a
| whitespace (space: ' ', end of line: '\n', tabulation:
| '\t', etc.). Note that using a slash ('/') may break the default
| algorithm for the definition of the user's home directory.
I didn't find an authoritative source, exactly, but searching around
suggests that LDAP is even more inclusive about usernames.
Given all this, plus considering UTF-8's slow but sure advancement into
the IRC universe, a blacklist approach seemed more on target to me than
a whitelist. I definitely agree that the blacklist in my patch isn't big
enough; '~' and DEL need to be kept out too, for sure. But, beyond
that, I guess my general feeling is that the days of dirt-basic IRC
parsing are over, and clients and servers alike need to be prepared to
deal with it.
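For comparison, here is a minimal C sketch of the blacklist approach described above. The function name user_name_ok_blacklist() is hypothetical and this is not the code from the patch. It rejects what the Debian useradd man page forbids (a leading '-', ':' and whitespace) plus '~' and DEL; rejecting all other ASCII control characters is an extra assumption. Bytes >= 0x80 are passed through untouched, so UTF-8 names are accepted without decoding them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blacklist-style user name check (illustration only).
 * Rejects: empty names, a leading '-', ':', space, '~', DEL (0x7F),
 * and every ASCII control character (< 0x20), which covers '\t', '\n',
 * '\r' etc. Rejecting all control characters is an assumption.
 * Bytes >= 0x80 are allowed, so UTF-8 sequences pass through. */
bool user_name_ok_blacklist(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    if (name == NULL || *p == '\0' || *p == '-')
        return false;

    for (; *p != '\0'; p++) {
        if (*p < 0x20 || *p == 0x7F)    /* control characters and DEL */
            return false;
        if (*p == ' ' || *p == ':' || *p == '~')
            return false;
    }
    return true;
}
```

The point of this shape is that the list of "bad" characters stays short and explicit, and anything not named (including non-ASCII) is accepted by default.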
Is there something specific you think is likely to break? If we're just
worried about these being used in regular expressions, '.', '+', and
sometimes '-' all have special meanings there too…
> Allowing some(!) more characters – probably like 43(+), 45(-), 46(.),
> and 95(_) – makes sense to me and should do no harm, I think …
But, for what it's worth, I did confirm that adding this set would cover
our immediate needs.
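For contrast, a minimal C sketch of the whitelist variant suggested above: ASCII letters and digits plus '+', '-', '.' and '_'. The function name user_name_ok_whitelist() is hypothetical, and the letter ranges assume an ASCII character set. Unlike the blacklist sketch, anything not explicitly listed is rejected, including every UTF-8 byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical whitelist-style user name check (illustration only).
 * Accepts only ASCII letters and digits plus '+' (43), '-' (45),
 * '.' (46) and '_' (95). Explicit ranges are used instead of isalnum()
 * so the result does not depend on the current locale; the letter
 * ranges assume ASCII. Any other byte, including >= 0x80, is rejected. */
bool user_name_ok_whitelist(const char *name)
{
    const char *p;

    if (name == NULL || *name == '\0')
        return false;

    for (p = name; *p != '\0'; p++) {
        char c = *p;
        bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9');

        if (!alnum && c != '+' && c != '-' && c != '.' && c != '_')
            return false;
    }
    return true;
}
```

This version is stricter and easier to reason about, at the cost of turning away names that the back-end mechanisms quoted earlier would happily accept.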