Reduce size of antilog table for w=16
Currently, the antilog table for w=16 consumes 256KB of memory as there are 128K entries.
Since mod 65535 can be computed fairly quickly (see below) we can easily halve the size of this table down to 64K entries. Shaving 128KB off the size can make the table fit into a CPU's L2 cache more easily and can improve performance.
Unfortunately this change does affect some user-facing functionality, though I presume most users don't actually use the tables for anything other than inline multiply/divide.
I've made some not fully tested changes, which can be found here: http://lab.jerasure.org/Nyan/gf-complete/commits/w16_antilog I haven't changed LOG_ZERO as I don't think the implementation is particularly useful (and the idea of a larger table goes against the goal here)
This isn't so useful for w=8 and smaller, since the tables can easily fit in L1 cache anyway.
If n mod 65535 != 0, then: n mod 65535 == (n >> 16) + (n & 65536), for 0 <= n <= (65535*2)
The only extra cases that need to be handled are n=65535 and n=65535*2, where n mod 65535 == 0 but the above gives 65535. This can easily be worked around by setting antilog_tbl = antilog_tbl
The overhead of these 3 (shift, and, add) simple operations (divide needs an additional add) should be masked by L2 cache latency, so overall performance should be largely unaffected.