Merge Request #16
From Nyan:neon_fixes into jerasure:master
NEON fixes/tweaks
This merge request fixes some issues in the NEON code and adds a few tweaks:
- The SPLIT(16,4) ALTMAP implementation was broken: it processed only half the data. The fixed implementation is therefore significantly slower than the old code, which is to be expected. Fixes #2
- The SPLIT(16,4) implementations now share a single code path for ARMv8 and older targets, similar to SPLIT(32,4). This fixes the ALTMAP variant and also gives the non-ALTMAP version a consistent processing width on both
- Unnecessary VTRN removed in non-ALTMAP SPLIT(16,4) as NEON allows (de)interleaving during load/store; because of this, ALTMAP isn't so useful in NEON
  - This can also be done for SPLIT(32,4), but I have not implemented it
- I also pulled the if(xor) conditional in non-ALTMAP SPLIT(16,4) out of the loop. It seems to improve performance a bit on my Cortex A7. It should probably be done everywhere else too, but I have not done this
- CARRY_FREE was incorrectly enabled on all sizes of w, when it's only available for w=4 and w=8
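
To illustrate the if(xor) hoisting mentioned above, here is a minimal scalar sketch in plain C. It is not the code from this merge request: the function names, the 16-entry `table`, and the `lookup` helper are hypothetical stand-ins for the real NEON region loop, and the flag is called `xor_flag` here only for clarity. The point is just the shape of the change: test the flag once, then run one of two specialised loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-element work of a region multiply:
 * look up the low nibble of a source byte in a 16-entry table. */
static inline uint8_t lookup(const uint8_t table[16], uint8_t v)
{
    return table[v & 0x0f];
}

/* Before: the flag is tested on every iteration of the loop. */
void region_branch_inside(uint8_t *dst, const uint8_t *src, size_t len,
                          const uint8_t table[16], int xor_flag)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < len; i++) {
        uint8_t r = lookup(table, src[i]);
        if (xor_flag)
            dst[i] ^= r;   /* accumulate into the destination */
        else
            dst[i] = r;    /* overwrite the destination */
    }
}

/* After: the flag is tested once, and each loop body is branch-free. */
void region_branch_hoisted(uint8_t *dst, const uint8_t *src, size_t len,
                           const uint8_t table[16], int xor_flag)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    if (xor_flag) {
        for (size_t i = 0; i < len; i++)
            dst[i] ^= lookup(table, src[i]);
    } else {
        for (size_t i = 0; i < len; i++)
            dst[i] = lookup(table, src[i]);
    }
}
```

Both functions produce identical output for the same inputs. An optimising compiler may perform this transformation on its own (loop unswitching), so any gain depends on the compiler and the core; the improvement reported above was observed on a Cortex A7.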
Commits (5)
-
Seems to improve performance a fair bit
-
Also makes the ARMv8 version consistent with the older one, in terms of processing width
-
mentioned in commit 51a1abb9185ec6ea35817620d13322047f4fde4d