Analysis inline, final thoughts below.
This was generated by valgrind running ngircd using my latest patch on top of Florian's hash table patch:
ERROR SUMMARY: 15 errors from 6 contexts (suppressed: 19 from 1)
This error occurred 5 times (the other 4 times have been omitted from this paste), and is an error in libc.
1 errors in context 1 of 6: Conditional jump or move depends on uninitialised value(s) at 0x5085B5E: vfprintf (in /lib/libc-2.7.so) by 0x50AB9E9: vsnprintf (in /lib/libc-2.7.so) by 0x415F87: Log (log.c:248) by 0x402542: NGIRCd_Init (ngircd.c:805) by 0x402E2E: main (ngircd.c:269)
This is an error in ld, which is something we have no control over.
2 errors in context 4 of 6: Invalid read of size 8 at 0x4015EE4: (within /lib/ld-2.7.so) by 0x400AB93: (within /lib/ld-2.7.so) by 0x40061E4: (within /lib/ld-2.7.so) by 0x4008677: (within /lib/ld-2.7.so) by 0x4012048: (within /lib/ld-2.7.so) by 0x400DDF5: (within /lib/ld-2.7.so) by 0x401191A: (within /lib/ld-2.7.so) by 0x514DF7F: (within /lib/libc-2.7.so) by 0x400DDF5: (within /lib/ld-2.7.so) by 0x514E0E6: __libc_dlopen_mode (in /lib/libc-2.7.so) by 0x512803C: __nss_lookup_function (in /lib/libc-2.7.so) by 0x57A444A: ??? Address 0x53a4978 is 16 bytes inside a block of size 21 alloc'd at 0x4C22FAB: malloc (vg_replace_malloc.c:207) by 0x4008B75: (within /lib/ld-2.7.so) by 0x4012048: (within /lib/ld-2.7.so) by 0x400DDF5: (within /lib/ld-2.7.so) by 0x401191A: (within /lib/ld-2.7.so) by 0x514DF7F: (within /lib/libc-2.7.so) by 0x400DDF5: (within /lib/ld-2.7.so) by 0x514E0E6: __libc_dlopen_mode (in /lib/libc-2.7.so) by 0x512803C: __nss_lookup_function (in /lib/libc-2.7.so) by 0x57A444A: ??? by 0x57A5968: ??? by 0x50DCA40: getpwnam_r (in /lib/libc-2.7.so)
supp: 19 dl-hack3-1
IN SUMMARY: 15 errors from 6 contexts (suppressed: 19 from 1)
malloc/free: in use at exit: 8,456 bytes in 266 blocks. malloc/free: 6,430 allocs, 6,164 frees, 895,921 bytes allocated.
searching for pointers to 266 not-freed blocks. checked 134,152 bytes.
This is also not something we can do anything about, but luckily getpwdnam() is only called once when the ircd is being initialized, so this error is not a concern.
584 (104 direct, 480 indirect) bytes in 2 blocks are definitely lost in loss record 2 of 6 at 0x4C22FAB: malloc (vg_replace_malloc.c:207) by 0x5128240: (within /lib/libc-2.7.so) by 0x5128AFE: __nss_database_lookup (in /lib/libc-2.7.so) by 0x57A442F: ??? by 0x57A5968: ??? by 0x50DCA40: getpwnam_r (in /lib/libc-2.7.so) by 0x50DC3D3: getpwnam (in /lib/libc-2.7.so) by 0x4022DF: NGIRCd_Init (ngircd.c:692) by 0x402E2E: main (ngircd.c:269)
LEAK SUMMARY: definitely lost: 104 bytes in 2 blocks. indirectly lost: 480 bytes in 20 blocks. possibly lost: 0 bytes in 0 blocks. still reachable: 7,872 bytes in 244 blocks. suppressed: 0 bytes in 0 blocks. Reachable blocks (those to which a pointer was found) are not shown. To see them, rerun with: --leak-check=full --show-reachable=yes memcheck: sanity checks: 8301 cheap, 109 expensive memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use memcheck: auxmaps_L1: 0 searches, 0 cmps, ratio 0:10 memcheck: auxmaps_L2: 0 searches, 0 nodes memcheck: SMs: n_issued = 38 (608k, 0M) memcheck: SMs: n_deissued = 0 (0k, 0M) memcheck: SMs: max_noaccess = 524287 (8388592k, 8191M) memcheck: SMs: max_undefined = 0 (0k, 0M) memcheck: SMs: max_defined = 246 (3936k, 3M) memcheck: SMs: max_non_DSM = 38 (608k, 0M) memcheck: max sec V bit nodes: 16 (1k, 0M) memcheck: set_sec_vbits8 calls: 49 (new: 16, updates: 33) memcheck: max shadow mem size: 4753k, 4M translate: fast SP updates identified: 5,407 ( 86.2%) translate: generic_known SP updates identified: 719 ( 11.4%) translate: generic_unknown SP updates identified: 142 ( 2.2%) tt/tc: 1,004,341 tt lookups requiring 1,032,617 probes tt/tc: 1,004,341 fast-cache updates, 5 flushes transtab: new 6,324 (152,518 -> 2,321,122; ratio 152:10) [0 scs] transtab: dumped 0 (0 -> ??) transtab: discarded 300 (6,085 -> ??) scheduler: 830,201,474 jumps (bb entries). scheduler: 8,301/14,083,053 major/minor sched events. sanity: 8302 cheap, 109 expensive checks. exectx: 769 lists, 221 contexts (avg 0 per list) exectx: 12,628 searches, 12,445 full compares (985 per 1000) exectx: 274 cmp2, 226 cmp4, 0 cmpAll errormgr: 27 supplist searches, 924 comparisons during search errormgr: 34 errlist searches, 279 comparisons during search
Valgrind reported the same errors and leak when I tested using the vanilla ngircd from git.
I haven't been able to test the performance improvement, since my testing harness currently supports a maximum of around 1000 clients and does a lot of blocking/waiting. This results in no more than 200kbps peak network load out of the ircd, experienced when the ircd is sending WELCOME and MOTD messages. It would be interesting to see the results from a test harness that scales better and makes an attempt to represent 'normal' IRC traffic.
I've attached a revised version of my patch. To use, first apply Florian's original hashtable.diff from my previous email and then apply my changes (attached: hashtable.addendum.diff) on top of it. This change fixes the performance issue in Client_Destroy and does some code reorganization within the method so it makes more sense. The performance (and organization) issue existed in both the patched and vanilla ngircd branches, and the fix for vanilla is very similar and would result in immediate gains since servers are destroyed much less frequently than users.
At this point, I'd be confident shipping the hashtable branch. It seems very solid, and all the changes make sense.
./scott