Issue #7

Created by Nyan

Use native word size instead of int64

A number of places use uint64_t variables where 64 bits aren't strictly necessary. This works well if the native word size is 64 bits, but can really hurt performance when it isn't.

I suggest choosing the type based on the target architecture. Something like uint_fastX_t might not be a bad place to start, although it's not guaranteed to be the most efficient type, so it might be better to define your own types.
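As a rough sketch of the "define your own types" idea: the names word_t and WORD_BITS below are hypothetical and not taken from the code base, and using pointer width (UINTPTR_MAX) as a stand-in for the native register width is only a heuristic. It can guess wrong on some targets (for example ABIs with 32-bit pointers on a 64-bit CPU), so a real build would probably want a configure-time override.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint32_t, uint64_t, UINTPTR_MAX */

/* Hypothetical type name; picks a working word type per target.
 * Assumption: pointer width approximates the native register width. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef uint64_t word_t;   /* likely a 64-bit target */
#else
typedef uint32_t word_t;   /* likely a 32-bit (or smaller) target */
#endif

/* Width of word_t in bits, for use in shift counts and masks. */
#define WORD_BITS (sizeof(word_t) * CHAR_BIT)

/* Compile-time sanity check: the chosen type holds at least 32 bits. */
static_assert(sizeof(word_t) * CHAR_BIT >= 32,
              "word_t must be at least 32 bits wide");
```

Code that currently declares uint64_t temporaries could then use word_t wherever the full 64 bits aren't needed, and fall back to uint64_t only where they are.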

I got a significant speed boost for w16 mSPLIT(16,8) on 32-bit by simply changing the uint64_t variables to uint_fast32_t (and changing the corresponding shift operations to be based on sizeof(uint_fast32_t)).
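To illustrate what I mean by basing the shifts on sizeof, here is a minimal sketch. The helper names (fast_word_t, FAST_WORD_BITS, rotl_word, top_byte) are made up for the example and are not functions from the code base; it also assumes the type has no padding bits, which holds on mainstream platforms but isn't guaranteed by the standard.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint_fast32_t */

/* Hypothetical alias for the working type. */
typedef uint_fast32_t fast_word_t;

/* Bit width derived from sizeof instead of a hard-coded 64. */
#define FAST_WORD_BITS ((unsigned)(sizeof(fast_word_t) * CHAR_BIT))

/* Rotate left by r bits; r is reduced modulo the word width so that
 * no shift count ever reaches the full width (which would be undefined). */
static fast_word_t rotl_word(fast_word_t x, unsigned r)
{
    r %= FAST_WORD_BITS;
    if (r == 0)
        return x;
    return (fast_word_t)((x << r) | (x >> (FAST_WORD_BITS - r)));
}

/* Extract the most significant byte, wherever it sits for this width. */
static unsigned top_byte(fast_word_t x)
{
    return (unsigned)(x >> (FAST_WORD_BITS - 8)) & 0xFFu;
}
```

The same source then compiles to 32-bit shifts where uint_fast32_t is 32 bits wide and to 64-bit shifts where it is 64 bits wide, without touching the shift constants by hand.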

