Goodness, I forgot to reply all. Sorry for the double, Florian.
Both the patches are attached to this message. Not sure what happened to the last one…
A few replies, none of which are anything terribly new:
On May 26, 2008, at 9:19, Florian Westphal wrote:
@@ -575,13 +635,14 @@ Client_GetFromToken( CLIENT *Client, int
this looks a lot less efficient than it could be. luckily, it's called only once during a connection's lifetime, but if a connection has not been fully registered and logged in, then why not use the token (or a variation of it) as the hashtable's key?
As far as I understood this, the token is only relevant for remote servers. Alex can probably shed some light on this.
In any case, I think that we might later benefit from putting clients and servers in separate tables.
There are multiple places in the code where the client table is iterated over for the purpose of iterating over just servers. This is a great place for a later change to focus.
since I've never seen an IRC network with more than 6664223 users, this is probably fine. having artificial limitations allows for a much faster get_larger_primes() than would otherwise be possible. I'd suggest using consecutive primes that are approximately twice the value of the previous in the list (extra credit: as close to a power of 2 without exceeding).
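For illustration, the quoted suggestion (consecutive primes, each roughly double the previous and just below a power of 2) would reduce get_larger_primes() to a table lookup. This is only a hypothetical sketch: the values are the largest primes below successive powers of two, and next_table_size() is an invented name, not existing ngircd code.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed table of primes, each roughly double the previous and just
 * below a power of 2, so growing the table is a lookup rather than a
 * runtime primality search. */
static const size_t Table_Sizes[] = {
    127, 251, 509, 1021, 2039, 4093, 8191,
    16381, 32749, 65521, 131071, 262139, 524287,
    1048573, 2097143, 4194301, 8388593
};

#define N_SIZES (sizeof(Table_Sizes) / sizeof(Table_Sizes[0]))

/* Return the smallest listed prime strictly greater than n, or the
 * largest listed prime if n exceeds the whole table. */
static size_t
next_table_size(size_t n)
{
    size_t i;
    for (i = 0; i < N_SIZES; i++)
        if (Table_Sizes[i] > n)
            return Table_Sizes[i];
    return Table_Sizes[N_SIZES - 1];
}
```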
I still believe we should consider using oldsize*2 and rely on a hash function to distribute individual hash values properly. Just feeding all results of ->hashfunc() to the jenkins mix functions will probably be sufficient.
Agreed. Enforcing tables that are a power of 2 would make things very simple, but I don't think many compilers will optimize out the modulus because it would be against a variable. We might have to do something like hash & (tablesize - 1), as long as the table size is well-enforced.
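A minimal sketch of that masking idea, assuming table sizes really are enforced to be powers of 2 (bucket_index() is an illustrative name, not existing code):

```c
#include <assert.h>
#include <stdint.h>

/* The compiler can't turn `hash % size` into a mask when `size` is a
 * runtime variable, but we can do it by hand, provided `size` is
 * guaranteed to be 2^k. */
static unsigned int
bucket_index(uint32_t hash, unsigned int size)
{
    /* enforce the power-of-2 invariant */
    assert(size != 0 && (size & (size - 1)) == 0);
    return hash & (size - 1); /* equivalent to hash % size for 2^k */
}
```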
as per my recent change using strtok_r() in IRC_JOIN(), it's probably worth looking into allowing a caller to provide/use their own iterator context variables.
Yes, I tried to get the API for something like that right, but failed. If you look at the hash table structure, the iterator context is embedded there; I think it could be moved out to an extra parameter passed in by the caller. Suggestions of how to handle this (e.g. how it's initialized correctly in the first place, whether the iterator context should be opaque, etc.) are welcome.
This would be something to do in a later change, but I'll have a go at it once we're done with this change.
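To make the discussion concrete, here is a toy sketch of a caller-owned iterator context, in the spirit of strtok_r()'s save pointer, so nested iterations don't clobber each other. The table here is a throwaway bucket-of-chains structure, and the names (ht_iter, ht_iter_init, ht_iter_next) are hypothetical, not ngIRCd's actual API.

```c
#include <assert.h>
#include <stddef.h>

struct entry { void *value; struct entry *next; };

typedef struct {
    struct entry **buckets;
    size_t nbuckets;
} hashtable;

/* Iterator state lives here, owned by the caller, not in the table. */
typedef struct {
    size_t bucket;          /* next bucket to scan */
    struct entry *entry;    /* next entry in the current chain */
} ht_iter;

static void
ht_iter_init(const hashtable *tbl, ht_iter *it)
{
    it->bucket = 0;
    it->entry = tbl->nbuckets ? tbl->buckets[0] : NULL;
}

/* Return the next value, or NULL when the table is exhausted. Two
 * independent ht_iter variables can walk the same table concurrently. */
static void *
ht_iter_next(const hashtable *tbl, ht_iter *it)
{
    void *v;

    while (!it->entry) {
        if (++it->bucket >= tbl->nbuckets)
            return NULL;
        it->entry = tbl->buckets[it->bucket];
    }
    v = it->entry->value;
    it->entry = it->entry->next;
    return v;
}
```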
Yes. However, in that case we should think about forcing the hash value through e.g. the Jenkins hash mix function. I remember that at least one hash function uses the address of a structure as a hash, which would give a bad distribution.
True. Requiring better hashes of all data going into the hash table would be a fairly large change. Power-of-2 table sizes and other optimizations should wait for another change.
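For reference, the fix for pointer-derived hashes is a finalizer/avalanche step: heap addresses are aligned, so their low bits cluster, and a mix step spreads that out before bucketing. The sketch below uses the well-known MurmurHash3 32-bit finalizer as a stand-in for the Jenkins mix mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* MurmurHash3 fmix32: avalanches all input bits across the output, so
 * hashes that differ only in a few (e.g. alignment-induced) bits still
 * land in different buckets. */
static uint32_t
mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}
```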
--- ngircd.orig/src/ngircd/irc-info.c	2008-05-25 10:52:59.000000000 -0700
+++ ngircd/src/ngircd/irc-info.c	2008-05-25 13:05:55.000000000 -0700
there should probably be a faster way to iterate through servers than this. iterating through all open connections feels a lot like using a shotgun in a crowd to kill a pickpocket. for a future optimization, maybe servers and users could warrant separate tables?
I should have read all of this before replying :)
Well it's not my fault your patch had the hash table stuff all the way at the end :P
this is slow, but so is calling /names on any network. for future scaling, it might be worth looking into supporting the ability to return a "server too busy" reply if the server's input socket has waiting data being blocked by running this command?
I'm not sure about this. There are perhaps hundreds of sockets... If we wanted to introduce such a check, we'd need to re-write io.c and move it from a callback model to a select/poll one (you can't call io_dispatch from within a callback, and almost all functions in ngircd are executed from a callback context). And what about commands that have been read and stored in the read buffer, but haven't been processed yet? I'm worried that we'd end up investing more time in the check than we'd need for actually completing the function. We already have a similar issue with other commands like WHOIS...
Good point.
@@ -292,7 +292,7 @@ IRC_WALLOPS( CLIENT *Client, REQUEST *Re
again with the iterating. this is really inefficient and something to look at in the future.
And it's completely unneeded. We already have a structure that holds all the non-local connections (conn.c 'My_ConnArray'); we might be able to use it instead.
Ah, I missed that. This would be something good to change shortly after hash tables get merged in.
Final thoughts:
Most of the lines in this diff are actually due to function renaming or variable renaming. Some of this was necessary (the arguments to Client_Next() and Channel_Next() changed, for instance), but for the most part the function name changes can be seen as code cleanup. Some portions of it are valid bugfixes that could have been split out, but in a few cases the bugfixes interact closely with the new code, so it's very justifiable to include them. As I've thoroughly reviewed and tested this patch, I feel that despite its size, it is not unnecessarily large or risky.
Still, separating out the variable renames, comment translations etc. would help review and narrow down the culprit if a bug should be found...
We can, but a lot of the variable renaming (like from c -> Chan in the latter part of Remove_Client()) was done because intermediate code was removed and c was made unnecessary. In all cases the change made the code less ambiguous as well (since in Remove_Client() there are also variables for Client, Reason, Origin, etc. c was rather out of place and confusing).
As ngircd grows and gets used on larger networks, it might be worth finding expensive operations that cannot be sped up further (like /names or /list) and returning a "server too busy" message if processing the command would leave too many waiting inputs unprocessed for too long.
I wouldn't worry about this too much. If these places are problems, we could e.g. place an upper bound on the number of iterations.
That's a good solution.
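The upper-bound idea could be as simple as clamping the work a single command may do and signalling truncation so the caller can send a "too busy"-style reply. A hypothetical sketch; the budget value and names are illustrative, not existing ngircd identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMD_ITERATIONS 5000 /* illustrative per-command budget */

/* Return how many items a command may actually process under the
 * budget; *truncated is set when the caller should answer with a
 * "server too busy" numeric instead of a complete reply. */
static size_t
bounded_count(size_t total, size_t budget, bool *truncated)
{
    *truncated = total > budget;
    return *truncated ? budget : total;
}
```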
Multithreading would fix this, but most of the code (especially the hash table in this patch and the linked list currently in use) is not thread safe.
The problem is that you'd have to add locks in a lot of places, e.g. all in/output buffers...
Well, I never said it would be easy. :)
I'm doing some last rounds of testing this morning, and then I think the patches are good to go.
./scott