Fix SSE runtime detection

As noted in #16, the current SSE runtime detection does not actually work: a build compiled with SSE3/SSE4 support can fail with Illegal instruction on a machine that only supports SSE2.

This merge request therefore moves the SSE functions into their own files and compiles each file with the corresponding flags.
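As a rough illustration of the intended pattern (not the code in this merge request), the sketch below keeps a baseline kernel and an SSE3 kernel apart and chooses between them at runtime. All names here (sum_generic, sum_sse3, select_sum) are hypothetical. To stay in a single file it uses the GCC/Clang target attribute as a stand-in for a separate source file built with -msse3, and it assumes the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports for detection.

```c
#include <stddef.h>

/* Hypothetical baseline kernel: stands in for code built without extra -m flags. */
static float sum_generic(const float *v, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <pmmintrin.h> /* SSE3 intrinsics */
#define HAVE_SSE3_KERNEL 1

/* Hypothetical SSE3 kernel: the target attribute stands in for a separate
 * file compiled with -msse3, so SSE3 instructions are emitted only here. */
__attribute__((target("sse3")))
static float sum_sse3(const float *v, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(v + i));
    acc = _mm_hadd_ps(acc, acc); /* horizontal add, an SSE3 instruction */
    acc = _mm_hadd_ps(acc, acc);
    float s = _mm_cvtss_f32(acc);
    for (; i < n; i++) /* scalar tail */
        s += v[i];
    return s;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Pick a kernel at runtime; the SSE3 one is returned only if the CPU reports SSE3. */
static sum_fn select_sum(void) {
#ifdef HAVE_SSE3_KERNEL
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3"))
        return sum_sse3;
#endif
    return sum_generic;
}
```

A caller would fetch the pointer once, for example sum_fn f = select_sum();, and then call f(data, n). The point of the split is that the dispatcher and the baseline kernel are never compiled with SSE3 enabled, so the compiler cannot place SSE3 instructions on a path that runs before the CPU check.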

The diff looks large, but no code logic has changed; the code was only cut and pasted into the new files.

The first commit adds an --enable-qemu flag, which makes the SSE code much easier to test with qemu-user (install qemu-user-static on Debian/Ubuntu). You can apply only this commit to see the Illegal instruction errors that qemu produces.

