Scott Perry scperry@ucsd.edu wrote:
This message is for discussion, the patches attached are not intended to be committed quite yet. I'd like to spend at least one more night of heavy break-in testing before finally signing off on my modified patch or the original.
fair enough. Thanks for testing this.
I've attached my own modified version of the patch (hashtable.mine.diff) for subsequent review. There is only one notable change, in Client_Destroy(), where I've restructured nearly the entire function.
Mhh, this attachment seems to have been lost... could you please re-send it?
I think there's a serious optimization to be made here: if the Client is not a server, then no iteration is necessary at all, so it may be a good idea to separate this function into two portions, one for normal clients and one for servers. As written in this patch, the function is needlessly inefficient when the Client is a server; this is an especially large bottleneck on a multi-server network, since destroying a server would have taken O(n^2). This portion of the code was changed in my patch.
Great, I'm looking forward to reviewing your change 8-) Your analysis looks correct to me; it's indeed unnecessary to loop through all clients when we're removing a normal user.
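For the archives, the split I'd expect looks roughly like this. It's only a sketch with made-up helper names (Destroy_UserOrService, Destroy_ServerClients), not the actual code from either patch:

/* Sketch only: the helpers below do not exist, they just stand in for
 * "unlink and free one client" and "destroy all clients introduced by
 * this server link". */
static void
Client_Destroy_Sketch(CLIENT *Client)
{
	if (Client_Type(Client) != CLIENT_SERVER) {
		/* Normal user or service: no need to walk the whole
		 * client table, just remove this one entry. */
		Destroy_UserOrService(Client);
		return;
	}

	/* Server: all clients introduced via this link have to go,
	 * so one pass over the client table is unavoidable here. */
	Destroy_ServerClients(Client);
	Destroy_UserOrService(Client);
}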
@@ -575,13 +635,14 @@ Client_GetFromToken( CLIENT *Client, int
this looks a lot less efficient than it could be. Luckily, it's called only once during a connection's lifetime, but if a connection has not been fully registered and logged in, then why not use the token (or a variation of it) as the hashtable's key?
As far as I understood this, the token is only relevant for remote servers. Alex can probably shed some light on this.
In any case, i think that we might later benefit from putting clients and servers in separate tables.
@@ -884,12 +939,11 @@ Client_MyServerCount( void )
this comment applies to all the Client_*Count() functions: performance in these functions is very unfortunate, and changing from a linked list to a hash table does not make this any faster. If performance in these sections of the code is important, the counts should probably be cached; for extra credit, the cached values could be updated in some way that does not involve an O(n) walk of the hash table.
You're right, these still iterate through all the tables. I didn't bother changing that, because the patch was already rather intrusive. If this is a problem, we can introduce a counter that keeps track of the relevant number just as you suggested.
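For the record, such a counter could be as simple as something like this (a sketch only, not part of the patch; all names are made up):

static unsigned long My_ServerCount = 0;

/* Called whenever a server successfully registers or is destroyed;
 * the count then becomes O(1) instead of a full table walk. */
static void Count_ServerRegistered(void) { My_ServerCount++; }
static void Count_ServerDestroyed(void) { if (My_ServerCount) My_ServerCount--; }

unsigned long Client_MyServerCount_Cached(void) { return My_ServerCount; }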
@@ -1062,18 +1118,14 @@ Generate_MyToken( CLIENT *Client )
This function could also stand to benefit from caching; it seems atrociously inefficient.
Perhaps Alex can explain what this number is about, and what its requirements are....
Allowing some way for the builder to set this at configure time (instead of adding -DINITIAL_HASHSIZE to their CFLAGS) would probably be much appreciated, since network sizes vary enormously. If nothing else, the initial size should err on the side of being too large, as hash table performance does not suffer much from a table being too empty.
+#define INITIAL_HASHSIZE 23
I wouldn't worry about this; we can always make this configurable at runtime.
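Until then, even something as small as this would let packagers override the default from CFLAGS without patching the source (sketch):

/* Allow e.g. -DINITIAL_HASHSIZE=211 in CFLAGS while keeping a default. */
#ifndef INITIAL_HASHSIZE
#define INITIAL_HASHSIZE 23
#endif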
+static size_t get_larger_prime(size_t old)
+{
- size_t i;
since I've never seen an IRC network with more than 6664223 users, this is probably fine. Having artificial limitations allows for a much faster get_larger_prime() than would otherwise be possible. I'd suggest using consecutive primes, each approximately twice the value of the previous one in the list (extra credit: as close to a power of 2 as possible without exceeding it).
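For reference, that suggestion would look roughly like this (a sketch; the values are the usual primes just below each power of two):

static size_t
get_larger_prime(size_t old)
{
	/* Each entry is roughly twice the previous one; networks larger
	 * than the last entry simply stay at the maximum size. */
	static const size_t primes[] = {
		23, 61, 127, 251, 509, 1021, 2039, 4093, 8191, 16381,
		32749, 65521, 131071, 262139, 524287, 1048573, 2097143,
		4194301, 8388593
	};
	size_t i;

	for (i = 0; i < sizeof(primes) / sizeof(primes[0]); i++)
		if (primes[i] > old)
			return primes[i];
	return primes[sizeof(primes) / sizeof(primes[0]) - 1];
}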
I still believe we should consider using oldsize*2 and relying on a hash function to distribute individual hash values properly. Just feeding all results of ->hashfunc() through Jenkins' mix function will probably be sufficient.
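The doubling variant is trivial by comparison (sketch; see the mix-and-mask sketch further down for the distribution part):

static size_t
get_larger_size(size_t old)
{
	/* Assumes the table starts at a power of two; distribution then
	 * depends entirely on mixing the raw hash values well. */
	return old * 2;
}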
while we're here, it's a good time to mention: iterating over a hash table should be done as little as possible. cscope reports that there are 16 places in the code that call hashtable_iterator_getnext(), which is probably at least 15 too many.
Absolutely.
as per my recent change using strtok_r() in IRC_JOIN(), it's probably worth looking into allowing a caller to provide/use their own iterator context variables.
Yes, I tried to get the API for something like that right, but failed. If you look at the hash table structure, the iterator context is embedded there; I think it could be moved out into an extra parameter passed in by the caller. Suggestions on how to handle this (e.g. how it is initialized correctly in the first place, whether the iterator context should be opaque, etc.) are welcome.
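Just to make that concrete for a later round, one possible shape could be the following (all names are hypothetical, nothing of this exists in the patch):

struct hashtable;			/* the table type from the patch */

struct hashtable_iter {
	size_t bucket;			/* bucket currently being walked */
	void *entry;			/* current entry in that bucket */
};

void  hashtable_iter_init(struct hashtable *table, struct hashtable_iter *it);
void *hashtable_iter_next(struct hashtable *table, struct hashtable_iter *it);

/* A caller would then do:
 *
 *	struct hashtable_iter it;
 *	CLIENT *c;
 *
 *	hashtable_iter_init(My_Clients, &it);
 *	while ((c = hashtable_iter_next(My_Clients, &it)) != NULL)
 *		...
 */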
iterator contexts provided by the caller would be a disagreeably large change to add to an already large patch.
agreed.
There's nothing in the hash table's implementation that seems to require a prime table size (that's usually only required by double hashing, which does not use a list of buckets on collisions).
That's right.
I'd suggest enforcing a power-of-2 table size: compilers for some platforms turn a general modulus into a multiply and divide, whereas a modulus by a power of 2 is strength-reduced to a simple AND instruction by every at-least-semi-intelligent compiler. Using powers of 2 for the table size would also make the get_larger_prime function faster and easier, though its name would then be misleading.
Yes. However, in that case we should think about forcing the hash value through e.g. Jenkins' hash mix function; I remember that at least one hash function uses the address of a structure as its hash, which would give a bad distribution.
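Something along these lines, perhaps. It's only a sketch, and I've used the well-known 32-bit MurmurHash3 finalizer here rather than Jenkins' original mix, but it serves the same purpose of fixing up poor raw hashes (pointer-derived ones included):

#include <stdint.h>
#include <stddef.h>

static uint32_t
hash_mix32(uint32_t h)
{
	/* Final avalanche step: every input bit affects every output bit,
	 * which rescues hashes derived from structure addresses. */
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

static size_t
hash_bucket(uint32_t raw_hash, size_t table_size)
{
	/* Valid only while table_size is kept at a power of two. */
	return hash_mix32(raw_hash) & (table_size - 1);
}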
--- ngircd.orig/src/ngircd/irc-info.c 2008-05-25 10:52:59.000000000 -0700
+++ ngircd/src/ngircd/irc-info.c 2008-05-25 13:05:55.000000000 -0700
there should probably be a faster way to iterate through servers than this. iterating through all open connections feels a lot like using a shotgun in a crowd to kill a pickpocket. for a future optimization, maybe servers and users could warrant separate tables?
I should have read all of this before replying :)
this is slow, but so is calling /names on any network. For future scaling, it might be worth looking into supporting a "server too busy" reply if running this command would block input that is already waiting on the server's sockets.
I'm not sure about this. There are perhaps hundreds of sockets... If we wanted to introduce such a check, we'd need to rewrite io.c and move it from a callback model to a select/poll one (you can't call io_dispatch from within a callback, and almost all functions in ngircd are executed from a callback context). And what about commands that have been read and stored in the read buffer, but haven't been processed yet? I'm worried that we'd end up investing more time in the check than we'd need for actually completing the function. We already have a similar issue with other commands like WHOIS...
@@ -292,7 +292,7 @@ IRC_WALLOPS( CLIENT *Client, REQUEST *Re
again with the iterating. this is really inefficient and something to look at in the future.
And it's completely unneeded. We already have a structure that holds all the non-local connections ('My_ConnArray' in conn.c); we might be able to use it instead.
Final thoughts:
Most of the lines in this diff are actually due to function renaming or variable renaming. Some of this was necessary (the arguments to Client_Next() and Channel_Next() changed, for instance), but for the most part the function name changes can be seen as code cleanup. Some portions of it are valid bugfixes that could have been split out, but in a few cases the bugfixes interact closely with the new code, so it's very justifiable to include them. As I've thoroughly reviewed and tested this patch, I feel that despite its size, it is not unnecessarily large or risky.
Still, separating out the variable renames, comment translations etc. would help review and narrow down the culprit if a bug should be found...
Users and servers (and in the future, services) should probably be stored in separate hash tables. Iterating over a hash table of the servers will be notably faster than iterating over a list of connections and filtering out all clients that are not servers.
Almost anywhere that Client_Next() or Channel_Next() is called is a possible place to optimize. In a lot of cases Client_Next() was being used even though most of the result set was not needed (when iterating over all servers or opers, for instance). Fixing this would be a considerable speedup, especially since iterating over the hash table takes longer than iterating over a linked list by approximately the m/n fill ratio.
Correct.
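As for the separate tables for users and servers: the rough shape would probably just be two tables plus a small dispatch helper, something like the sketch below (names are purely illustrative):

static struct hashtable *My_Users;	/* users and, later, services */
static struct hashtable *My_Servers;	/* servers only */

static struct hashtable *
Table_For(CLIENT *Client)
{
	return Client_Type(Client) == CLIENT_SERVER ? My_Servers : My_Users;
}

/* Commands that only care about servers (LINKS, WALLOPS forwarding, ...)
 * would then iterate My_Servers and never touch the much larger user table. */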
As ngircd grows and gets used on larger networks, it might be worth finding expensive operations that cannot be sped up any further (like /names or /list) and returning a "server too busy" message if processing the command would leave too much waiting input unprocessed for too long.
I wouldn't worry about this too much. If these places are problems, we could e.g. place an upper bound on the number of iterations.
Multithreading would fix this, but most of the code (especially the hash table in this patch and the linked list currently in use) is not thread safe.
The problem is that you'd have to add locks in a lot of places, e.g. all in/output buffers...
Regards, Florian