24 Oct, 2014

1 commit

  • Optimisations for the 4,16 split table region multiplications.
    
    Selected time_tool.sh 16 -A -B results for a 1.7 GHz cortex-a9:
    Region Best (MB/s):   532.14   W-Method: 16 -m SPLIT 16 4 -r SIMD -
    Region Best (MB/s):   212.34   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -
    Region Best (MB/s):   801.36   W-Method: 16 -m SPLIT 16 4 -r SIMD -r ALTMAP -
    Region Best (MB/s):    93.20   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -r ALTMAP -
    Region Best (MB/s):   273.99   W-Method: 16 -m SPLIT 16 8 -
    Region Best (MB/s):   270.81   W-Method: 16 -m SPLIT 8 8 -
    Region Best (MB/s):    70.42   W-Method: 16 -m COMPOSITE 2 - -
    Region Best (MB/s):   393.54   W-Method: 16 -m COMPOSITE 2 - -r ALTMAP -
    Janne Grunau