Issue #11

0 up
0 down
Created by Nyan

SSE2 based region multiply (fallback for when SSSE3 is unavailable)

This is perhaps more of a discussion rather than a request/report - just that I can't really find any other place where this is suitable (I hope this is okay with everyone!).

Thanks to GF-Complete, we have an ultra-fast SSSE3 shuffle-based method for GF region multiplication. Unfortunately, it does require a CPU with SSSE3 support. I've been trying to think up a compatible method for CPUs without byte shuffle support (but with SSE2 support, which is much more prevalent amongst CPUs), which is faster than the fallback to non-SIMD code.

I've come up with the following idea:

Whilst the idea isn't complex, an optimised implementation seems to be very complicated, and the method has a number of other warts.

The above page contains a link to a hacky patch for w=16, which you definitely don't want to merge (but might apply for testing purposes, if interested).

Now the thing is, I don't know much about the mathematics/science behind all of this at all, and in fact, have only started reading about erasure coding about two months ago. I don't know whether this approach has been tried before, and, as far as I can tell, does seem to be a viable method.

What I'm interested is: what is the viability of this, and what sort of improvements can be made? Any other feedback/suggestions/comments about such an SSE2 approach?

Assignee: None
Milestone: None
1 participant