Merge Request #18

Merged
jerasure/gf-complete!18
Created by bassamtabbara

Support for runtime detection of SIMD

This merge request adds support for runtime SIMD detection. The idea is that you build gf-complete with full SIMD support, and gf_init selects the appropriate functions at runtime based on the capabilities of the target machine. This eliminates the need to build different versions of the code for different processors (you still need to build for different architectures). Ceph, for example, has 3-4 flavors of jerasure on Intel (and, as a result of using so many binaries, does not support the PCLMUL optimizations). Numerous libraries, including zlib, follow a similar approach.
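The "build once, select at init" idea can be sketched in C as below. This is a minimal illustration, not gf-complete's code: all names are hypothetical, and XORing a region (similar in spirit to gf_multby_one when accumulating) stands in for a real GF operation. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` attribute let one binary carry an SSE2 path that is used only if the CPU reports SSE2; other platforms get the generic loop.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <emmintrin.h>
#define SKETCH_X86 1
#endif

/* Every implementation shares this signature; one is chosen at init time. */
typedef void (*region_xor_fn)(const uint8_t *src, uint8_t *dst, size_t bytes);

/* Generic fallback: runs on any CPU. */
static void region_xor_generic(const uint8_t *src, uint8_t *dst, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] ^= src[i];
}

#ifdef SKETCH_X86
/* SSE2 is enabled for this function only, so the rest of the binary does not
 * depend on it; it is called only after the runtime check below succeeds. */
static __attribute__((target("sse2")))
void region_xor_sse2(const uint8_t *src, uint8_t *dst, size_t bytes)
{
    size_t i = 0;
    for (; i + 16 <= bytes; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(dst + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
    }
    for (; i < bytes; i++)      /* leftover tail bytes */
        dst[i] ^= src[i];
}
#endif

/* Pick the best implementation the running CPU supports. */
static region_xor_fn select_region_xor(void)
{
#ifdef SKETCH_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return region_xor_sse2;
#endif
    return region_xor_generic;
}

/* Chosen once, then reused by every call (like the pointers gf_init fills in). */
static region_xor_fn region_xor_impl;

static void region_xor_init(void)
{
    region_xor_impl = select_region_xor();
}

static void region_xor(const uint8_t *src, uint8_t *dst, size_t bytes)
{
    region_xor_impl(src, dst, bytes);
}
```

Selecting once and caching the pointer keeps the per-call cost to a single indirect call, which is the same trade-off gf_init makes when it fills in the gf_t function pointers.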

When reviewing this merge request, I recommend looking at each of the 5 commits independently. The first 3 commits don't change the existing logic; instead, they add debugging functions and test scripts that facilitate testing of the 4th commit. The 4th commit contains all the new logic along with tests. The 5th commit fixes the build scripts.

I've tested this on x86_64, arm, and aarch64 using QEMU. Numerous tests have been added that exercise this code and could help with future testing of gf-complete. I've also compared the functions selected by the old code (prior to runtime SIMD support) with those selected by the new code, and all functions are identical. Here's a gist with the test results prior to the SIMD extensions: https://gist.github.com/bassamtabbara/d9a6dcf0a749b7ab01bc2953a359edec.

Assignee: bassamtabbara
Milestone: None

Merged by bassamtabbara

Commits (7)
  • Bassam Tabbara
     
  • Bassam Tabbara
     
  • ax_ext.m4 no longer performs any CPU checks. Instead, it just checks
    whether the compiler supports the SIMD flags.
    
    Runtime detection will choose the right methods based on the CPU
    instructions available.
    
    Intel AVX support is still selected at build time, since supporting
    it at runtime would require a major refactoring of the code base.
    For now I added a configuration flag, --enable-avx, that can be used
    to compile with AVX support.
    
    Also use CPU intrinsics instead of __asm__.
    Bassam Tabbara
     
  • This commit adds support for runtime detection of SIMD instructions. The idea is that you build once with all supported SIMD functions, and the same binaries can run on different machines with varying SIMD support. At runtime, gf-complete selects the right functions based on the processor.
    
    gf_cpu.c contains the logic to detect SIMD instructions. On Intel processors this is done through cpuid; for ARM on Linux we use getauxval.
    
    The logic in gf_w*.c has been changed to check for runtime SIMD support and fall back to generic code.
    
    Also, a new test has been added. It compares the functions selected by gf_init when SIMD support is enabled/disabled through build flags with those selected when it is enabled/disabled at runtime, and checks that the results are identical.
    Bassam Tabbara
     
  • This commit adds a couple of scripts that help test SIMD functionality
    on different machines through QEMU.
    
    tools/test_simd_qemu.sh automatically starts QEMU, runs the tests,
    and stops it. It uses the Ubuntu cloud images, which are built for
    x86_64, arm, and arm64.
    
    tools/test_simd.sh runs a number of tests, including compiling
    with different flags, unit tests, and gathering the functions
    selected in gf_init (when compiling with DEBUG_FUNCTIONS).
    Bassam Tabbara
     
  • There is currently no way to figure out which functions were selected
    during gf_init as a result of SIMD options. This is not even possible
    in gdb, since most functions are static.
    
    This commit adds a new macro SET_FUNCTION that records the name of the
    function selected during init inside the gf_internal structure. This macro
    only works when DEBUG_FUNCTIONS is defined during compile. Otherwise the
    code works exactly as it did before this change.
    
    The names of selected functions will be used during testing of SIMD
    runtime detection.
    
    All calls such as:
    
    gf->multiply.w32 = gf_w16_shift_multiply;
    
    need to be replaced with the following:
    
    SET_FUNCTION(gf,multiply,w32,gf_w16_shift_multiply)
    
    Also added a new flag to tools/gf_methods that will print the names of
    functions selected during gf_init.
    Bassam Tabbara
     
  • Update .gitignore to ignore some autotools files and tests.
    Bassam Tabbara
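The ax_ext.m4 commit above mentions moving to CPU intrinsics instead of `__asm__`. For cpuid, that might look like the sketch below. It assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`; the bit masks are the CPUID leaf 1 positions from Intel's documentation, written out locally rather than relying on the header's macro names, and the struct and function names are hypothetical. Decoding is kept separate from the query so it can be checked with fixed register values.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

/* CPUID leaf 1 feature bits (ECX and EDX registers). */
#define SKETCH_ECX_PCLMUL (1u << 1)
#define SKETCH_ECX_SSSE3  (1u << 9)
#define SKETCH_ECX_SSE4_1 (1u << 19)
#define SKETCH_ECX_SSE4_2 (1u << 20)
#define SKETCH_EDX_SSE2   (1u << 26)

struct sketch_x86_caps {
    int sse2, ssse3, sse4_1, sse4_2, pclmul;
};

/* Pure decoding step: turns raw leaf-1 register values into flags. */
static struct sketch_x86_caps sketch_decode_leaf1(uint32_t ecx, uint32_t edx)
{
    struct sketch_x86_caps c;
    c.sse2   = (edx & SKETCH_EDX_SSE2) != 0;
    c.ssse3  = (ecx & SKETCH_ECX_SSSE3) != 0;
    c.sse4_1 = (ecx & SKETCH_ECX_SSE4_1) != 0;
    c.sse4_2 = (ecx & SKETCH_ECX_SSE4_2) != 0;
    c.pclmul = (ecx & SKETCH_ECX_PCLMUL) != 0;
    return c;
}

/* Queries the running CPU; reports no features on non-x86 builds. */
static struct sketch_x86_caps sketch_identify_x86(void)
{
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid() returns 0 if leaf 1 is unsupported; no inline asm needed. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return sketch_decode_leaf1(ecx, edx);
#endif
    return sketch_decode_leaf1(0, 0);
}
```

Letting the compiler emit the instruction keeps register constraints and PIC quirks out of the source, which is usually the motivation for dropping hand-written `__asm__`.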
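The runtime-detection commit above describes ARM detection via getauxval and a fall-back rule in gf_w*.c. A sketch of both, under stated assumptions: Linux with glibc's `getauxval()` from `<sys/auxv.h>`; the hwcap bits (HWCAP_ASIMD, bit 1 on AArch64; HWCAP_NEON, bit 12 on 32-bit ARM) are written as literals; the capability struct, the enum, and the `simd_built` parameter (a stand-in for "SIMD code was compiled in") are hypothetical. Because selection is a pure function of its inputs, the build-time vs. runtime comparison described in that commit reduces to checking that both ways of disabling SIMD yield the same choice.

```c
#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>
#define SKETCH_HAVE_AUXV 1
#endif

/* Hypothetical capability flags, filled in once at startup. */
struct sketch_cpu_caps {
    int arm_neon;     /* set from AT_HWCAP on ARM Linux */
    int intel_ssse3;  /* would be set from cpuid on x86 (not shown here) */
};

typedef enum { SKETCH_GENERIC, SKETCH_NEON, SKETCH_SSSE3 } sketch_impl;

static struct sketch_cpu_caps sketch_detect_arm(void)
{
    struct sketch_cpu_caps caps = {0, 0};
#if defined(SKETCH_HAVE_AUXV) && defined(__aarch64__)
    caps.arm_neon = (getauxval(AT_HWCAP) & (1UL << 1)) != 0;   /* HWCAP_ASIMD */
#elif defined(SKETCH_HAVE_AUXV) && defined(__arm__)
    caps.arm_neon = (getauxval(AT_HWCAP) & (1UL << 12)) != 0;  /* HWCAP_NEON */
#endif
    return caps;
}

/* simd_built: was SIMD code compiled in?  caps: what the CPU reports.
 * If either rules SIMD out, fall back to the generic routine. */
static sketch_impl sketch_select(const struct sketch_cpu_caps *caps, int simd_built)
{
    if (simd_built && caps->intel_ssse3)
        return SKETCH_SSSE3;
    if (simd_built && caps->arm_neon)
        return SKETCH_NEON;
    return SKETCH_GENERIC;
}
```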
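The SET_FUNCTION commit above records the name of each selected function when DEBUG_FUNCTIONS is defined. A macro in that spirit might look like the sketch below. The structs are simplified, hypothetical stand-ins for gf-complete's gf_t and gf_internal_t, the shift multiply assumes the primitive polynomial 0x1100B for w=16, and `sketch_selected_is` is an invented helper of the kind a test or tools/gf_methods could use; the real macro may differ in detail. The key trick is the preprocessor's `#` operator, which turns the function identifier into a string literal at no runtime cost.

```c
#include <stdint.h>
#include <string.h>

/* Normally passed on the command line (-DDEBUG_FUNCTIONS); defined here so
 * the recorded name can be inspected. */
#define DEBUG_FUNCTIONS 1

typedef uint32_t gf_val_32_t;
typedef struct gf gf_t;

/* Simplified, hypothetical stand-ins for gf-complete's types. */
union sketch_func_a_b {
    gf_val_32_t (*w32)(gf_t *gf, gf_val_32_t a, gf_val_32_t b);
};

struct gf {
    union sketch_func_a_b multiply;
    void *scratch;                       /* points at the internal struct */
};

typedef struct {
    int w;
#ifdef DEBUG_FUNCTIONS
    const char *multiply;                /* name of the selected multiply routine */
#endif
} gf_internal_t;

#ifdef DEBUG_FUNCTIONS
/* Assign the pointer and record the function's name via the # operator. */
#define SET_FUNCTION(gf, method, size, func)                 \
    { (gf)->method.size = (func);                            \
      ((gf_internal_t *)(gf)->scratch)->method = #func; }
#else
/* Without DEBUG_FUNCTIONS it is just the plain assignment. */
#define SET_FUNCTION(gf, method, size, func) (gf)->method.size = (func);
#endif

/* Carry-less shift-and-add multiply in GF(2^16), reduced by 0x1100B (assumed). */
static gf_val_32_t sketch_w16_shift_multiply(gf_t *gf, gf_val_32_t a, gf_val_32_t b)
{
    uint32_t product = 0;
    (void)gf;
    for (int i = 0; i < 16; i++)
        if (b & (1u << i))
            product ^= a << i;
    for (int i = 31; i >= 16; i--)       /* clear bits above x^15 */
        if (product & (1u << i))
            product ^= 0x1100Bu << (i - 16);
    return product;
}

static void sketch_w16_init(gf_t *gf)
{
    SET_FUNCTION(gf, multiply, w32, sketch_w16_shift_multiply)
}

/* Lets a test or tool ask which routine init picked. */
static int sketch_selected_is(const gf_t *gf, const char *name)
{
#ifdef DEBUG_FUNCTIONS
    return strcmp(((const gf_internal_t *)gf->scratch)->multiply, name) == 0;
#else
    (void)gf; (void)name;
    return 0;
#endif
}
```

Because the non-debug branch expands to the bare assignment, builds without DEBUG_FUNCTIONS carry neither the extra field nor the extra store.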
     
4 participants
  • bassamtabbara @bassamtabbara

    @dachary @Nyan does this look good to you?

  • Loic Dachary @dachary

    mentioned in merge request !17

  • Loic Dachary @dachary

    @bassamtabbara http://jerasure.org/jerasure/gf-complete/merge_requests/16 has been merged. Could you rebase and run the tests to verify that they all pass on ARM this time around?

  • Loic Dachary @dachary

    Did you run the test suite on ARMv8 bare metal?

  • Loic Dachary @dachary

    Overall it looks great, thanks a lot for this contribution.

  • Loic Dachary @dachary

    I was able to successfully run gf-complete/tools$ bash -x test_simd_qemu.sh.

  • Loic Dachary @dachary

    @bassamtabbara would you mind rebasing? I'd like to run make check with valgrind. Not that I see anything likely to change the way memory is allocated/freed; it's just to be on the safe side.

  • bassamtabbara @bassamtabbara (edited)

    Awesome, nice to see it run on a machine other than mine. I've rebased to pick up your commits. I'm running final tests now and will update this branch after they pass.

  • bassamtabbara @bassamtabbara

    Added 20 new commits:

    • 7a9a09f3 - CARRY_FREE is currently only available for w=4 and w=8 on NEON
    • f373b138 - Initial fix for SPLIT(16,4) ALTMAP NEON (non ARMv8)
    • 438283c1 - Use similar strategy for SPLIT(16,4) ALTMAP NEON implementation as SPLIT(32,4)
    • 05057e56 - Eliminate unnecessary VTRNs in SPLIT(16,4) NEON implementation
    • 643743d0 - Move conditional outside loop for NEON SPLIT4 implementation
    • 51a1abb9 - Merge branch 'neon_fixes' into 'master'
    • 9f9f005a - Fix a number of conversion issues in the HTML manual
    • 8fe7382e - Merge branch 'manual' into 'master'
    • 62b702d5 - do not memcpy if src and dst are the same
    • 22cd7b15 - add --enable-valgrind for make check
    • e2dd917e - increase the verbosity of make check failures
    • f940bf3b - log-zero-ext: workaround for uninitialized memory
    • 185295f2 - Merge branch 'wip-valgrind' into 'master'
    • 22352ca0 - Remove generated autotools files from the build. Also update
    • 87f0d439 - Add support for printing functions selected in gf_init
    • 7761438c - Add SIMD test helpers
    • 4339569f - Support for runtime SIMD detection
    • ad110421 - Simplify SIMD make scripts
    • 0e5c920f - gf_multby_one now checks runtime SIMD support
    • 0690ba86 - Added --enable flags for debugging runtime SIMD
  • bassamtabbara @bassamtabbara

    @dachary Intel tests pass after the rebase. ARM tests are still running, but I pushed the changes anyway. I also incorporated your feedback.

  • bassamtabbara @bassamtabbara

    Reassigned to @bassamtabbara

  • bassamtabbara @bassamtabbara

    @dachary tests passed in QEMU and on bare-metal ARMv8 in cloud lab.

    I'll wait for @KMG feedback before merging.

  • KMG @kmg

    @bassamtabbara and @dachary lgtm! Sorry it took so long...

  • bassamtabbara @bassamtabbara

    @KMG thanks!
