Commit 363da207236617b1d50f04bb191a14f0de364303

Authored by KMG
2 parents d1b6bbf7 547f67ec
Exists in master and in 1 other branch v3

Merge branch 'wip-manual' into 'master'

convert manual from PDF to HTML

See merge request !11
Manual.pdf
manual/gf-complete.html 0 → 100644
... ... @@ -0,0 +1,3484 @@
  1 +<html>
  2 +
  3 +<head>
  4 +
  5 +<link rel="stylesheet" type="text/css" href="style.css">
  6 +
  7 +</head>
  8 +
  9 +<body>
  10 +
  11 +<div id="box">
  12 +
  13 +<h1>
  14 +GF-Complete: A Comprehensive Open Source Library for Galois <br>
  15 +Field Arithmetic
  16 +</h1>
  17 +
  18 +<h1> Version 1.02 </h1>
  19 +
  20 +<h4>James S. Plank* &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp Ethan L. Miller
  21 +Kevin M. Greenan &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp Benjamin A. Arnold<br>
  22 +John A. Burnum &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp Adam W. Disney &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp
  23 +Allen C. McBride
  24 +
  25 +</h4> <br>
  26 +
  27 +
  28 +
  29 +<a href="https://bitbucket.org/jimplank/gf-complete">
  30 +
  31 +https://bitbucket.org/jimplank/gf-complete
  32 +
  33 + </a><br><br>
  34 +<a href="http://web.eecs.utk.edu/~plank/plank/papers/GF-Complete-Manual-1.02.pdf">
  35 +http://web.eecs.utk.edu/~plank/plank/papers/GF-Complete-Manual-1.02.pdf
  36 +
  37 +
  38 + </a> <br> <br>
  39 +
  40 +
  41 +
  42 +
  43 +
  44 +
  45 +
  46 +</div>
  47 +
  48 +
  49 +<div id="pages_paragraphs_2">
  50 +
  51 +This is a user's manual for GF-Complete, version 1.02. This release supersedes version 0.1 and represents the first
  52 +major release of GF-Complete. To our knowledge, this library implements every Galois Field multiplication technique
  53 +applicable to erasure coding for storage, which is why we named it GF-Complete. The primary goal of this library is
  54 +to allow storage system researchers and implementors to utilize very fast Galois Field arithmetic for Reed-Solomon
  55 +coding and the like in their storage installations. The secondary goal is to allow those who want to explore different
  56 +ways to perform Galois Field arithmetic to be able to do so effectively.
  57 +
  58 +
  59 +<p>
  60 +If you wish to cite GF-Complete, please cite technical report UT-CS-13-716: [PMG<sup>+</sup>13].
  61 +
  62 +</p>
  63 +
  64 +
  65 +<h2>If You Use This Library or Document </h2>
  66 +
  67 +
  68 +
  69 +Please send me an email to let me know how it goes. Or send me an email just to let me know you are using the
  70 +library. One of the ways in which we are evaluated both internally and externally is by the impact of our work, and if
  71 +you have found this library and/or this document useful, we would like to be able to document it. Please send mail to
  72 +<em>plank@cs.utk.edu.</em> Please send bug reports to that address as well.
  73 +
  74 +
  75 +
  76 +<p>
  77 +The library itself is protected by the New BSD License. It is free to use and modify within the bounds of this
  78 +license. To the authors' knowledge, none of the techniques implemented in this library have been patented, and the
  79 +authors are not pursuing patents. </p> <br>
  80 +
  81 + </div>
  82 +<div id="footer">
  83 +
  84 +<span id="footer_bar">&nbsp;&nbsp;&nbsp;&nbsp;*plank@cs.utk.edu (University of Tennessee), </span> <em>elm@cs.ucsc.edu </em>(UC Santa Cruz), <em>kmgreen2@gmail.com </em> (Box). This material
  85 +is based upon work supported by the National Science Foundation under grants CNS-0917396, IIP-0934401 and CSR-1016636, plus REU supplements
  86 +CNS-1034216, CSR-1128847 and CSR-1246277. Thanks to Jens Gregor for helping us wade through compilation issues, and to Will
  87 +Houston for his initial work on this library.
  88 +
  89 +</div>
  90 +
  91 +<b>Finding the Code </b>
  92 +<br><br>
  93 +This code is actively maintained on Bitbucket: <a href="https://bitbucket.org/jimplank/gf-complete">https://bitbucket.org/jimplank/gf-complete</a>. There are
  94 +previous versions on my UTK site as a technical report; however, that is too hard to maintain, so the main version is
  95 +on Bitbucket.<br><br>
  96 +
  97 +
  98 +<b>Two Related Papers </b> <br><br>
  99 +
  100 +This software accompanies a large paper that describes these implementation techniques in detail [PGM13a]. We
  101 +will refer to this as <em> "The Paper." </em> You do not have to read The Paper to use the software. However, if you want to
  102 +start exploring the various implementations, then The Paper is where you'll want to go to learn about the techniques
  103 +in detail.
  104 +
  105 +
  106 +
  107 +<p>This library implements the techniques described in the paper "Screaming Fast Galois Field Arithmetic Using Intel
  108 +SIMD Instructions," [PGM13b]. The Paper describes all of those techniques as well.
  109 +</p><br><br>
  110 +
  111 +<b>If You Would Like Help With the Software </b><br><br>
  112 +
  113 +Please contact the first author of this manual.<br><br>
  114 +
  115 +<b>Changes from Revision 1.01</b>
  116 +<br><br>
  117 +The major change is that we are using autoconf to aid with compilation, thus obviating the need for the old <b>flag_tester</b>
  118 +code. Additionally, we have added a quick timing tool, and we have modified <b>gf_methods</b> so that it may be used to
  119 +run the timing tool and the unit tester.
  120 +
  121 +
  122 +
  123 +
  124 +
  125 +
  126 +
  127 +
  128 +
  129 +
  130 +
  131 +
  132 +
  133 +
  134 +
  135 +
  136 +
  137 +
  138 +<br/>
  139 +CONTENT <span class="aligning_page_number"> 3 </span>
  140 +<h2>Contents </h2>
  141 +<div class="index">
  142 +1 <span class="aligning_numbers">Introduction </span> <span class="aligning_page_number"> 5 </span>
  143 + <br><br>
  144 +2 <span class="aligning_numbers">Files in the Library </span> <span class="aligning_page_number"> 6 </span> <br> </div>
  145 +
  146 +<div class="sub_indices">
  147 +2.1 Header files in the directory <b>"include"</b> . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 6 </span> <br>
  148 +2.2 Source files in the <b>"src"</b> directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .<span class="aligning_page_number"> 7 </span> <br>
  149 +2.3 Library tools files in the <b>"tools"</b> directory . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 7 </span> <br>
  150 +2.4 The unit tester in the <b>"test"</b> directory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 8 </span> <br>
  151 +2.5 Example programs in the <b>"examples"</b> directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .<span class="aligning_page_number"> 8 </span>
  152 +
  153 +</div>
  154 +<br>
  155 +<div class="index">
  156 +
  157 +3 <span class="aligning_numbers">Compilation </span><span class="aligning_page_number"> 8 </span> <br> <br>
  158 +4 <span class="aligning_numbers">Some Tools and Examples to Get You Started </span><span class="aligning_page_number"> 8 </span> <br><br> </div>
  159 +
  160 +
  161 +
  162 +<div class="sub_indices">
  163 +4.1 Three Simple Command Line Tools: gf_mult, gf_div and gf_add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 8</span> <br>
  164 +4.2 Quick Starting Example #1: Simple multiplication and division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 9 </span> <br>
  165 +4.3 Quick Starting Example #2: Multiplying a region by a constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 10 </span> <br>
  166 +4.4 Quick Starting Example #3: Using w = 64 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 11 </span> <br>
  167 +4.5 Quick Starting Example #4: Using w = 128. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 11 </span>
  168 +</div>
  169 +<br>
  170 +
  171 +
  172 +<div class="index">
  173 +5 <span class="aligning_numbers"> Important Information on Alignment when Multiplying Regions </span><span class="aligning_page_number"> 12</span> <br><br>
  174 +
  175 +6 <span class="aligning_numbers"> The Defaults</span><span class="aligning_page_number"> 13 </span> <br>
  176 +
  177 +</div>
  178 +
  179 +<div class="sub_indices">
  180 +6.1 Changing the Defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .<span class="aligning_page_number"> 14 </span> <br>
  181 +
  182 +
  183 +<ul style="list-style-type:none;">
  184 +<li>6.1.1 Changing the Components of a Galois Field with <b> create_gf_from_argv() </b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 15 </span> <br>
  185 +</li>
  186 +<li>
  187 +6.1.2 Changing the Polynomial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 16 </span> <br>
  188 +</li>
  189 +<li>
  190 +6.1.3 Changing the Multiplication Technique. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .<span class="aligning_page_number"> 17 </span>
  191 +</li>
  192 +
  193 +
  194 +<li>
  195 +6.1.4 Changing the Division Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 19 </span>
  196 +</li>
  197 +
  198 +
  199 +<li>
  200 +6.1.5 Changing the Region Technique. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..<span class="aligning_page_number"> 19 </span>
  201 +</li>
  202 +</ul>
  203 +6.2 Determining Supported Techniques with <b>gf_methods</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 20</span> <br>
  204 +
  205 +6.3 Testing with <b>gf_unit, gf_time,</b> and <b>time_tool.sh </b>. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 21</span>
  206 +
  207 +<ul style="list-style-type:none;">
  208 +<li>
  209 +6.3.1 <b>time_tool.sh</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . <span class="aligning_page_number"> 22 </span>
  210 +</li>
  211 +
  212 +<li>
  213 +6.3.2 An example of <b>gf_methods</b> and <b>time_tool.sh</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . .. . .<span class="aligning_page_number"> 23 </span>
  214 +</li>
  215 +
  216 +</ul>
  217 +
  218 +6.4 Calling <b>gf_init_hard()</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . . . . <span class="aligning_page_number"> 24</span> <br>
  219 +
  220 +6.5 <b>gf_size()</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . . . .. . <span class="aligning_page_number"> 26</span> <br><br>
  221 +</div>
  222 +
  223 +
  224 +<div class="index">
  225 +7 <span class="aligning_numbers"> Further Information on Options and Algorithms </span><span class="aligning_page_number"> 26 </span> </div> <br><br> </div>
  226 +<div class="sub_indices">
  227 +7.1 Inlining Single Multiplication and Division for Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 26 </span> <br>
  228 +7.2 Using different techniques for single and region multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 27 </span> <br>
  229 +7.3 General <em>w</em> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 28 </span><br>
  230 +
  231 +7.4 Arguments to <b>"SPLIT"</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 28</span> <br>
  232 +7.5 Arguments to <b>"GROUP"</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number">29 </span> <br>
  233 +7.6 Considerations with <b>"COMPOSITE"</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number">30 </span> <br>
  234 +7.7 <b>"CARRY_FREE"</b> and the Primitive Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number">31 </span> <br>
  235 +7.8 More on Primitive Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . <span class="aligning_page_number">31 </span> <br>
  236 +
  237 +
  238 +<ul style="list-style-type:none;">
  239 +<li>
  240 +7.8.1 Primitive Polynomials that are not Primitive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 31</span> <br>
  241 +
  242 +</li>
  243 +<li>7.8.2 Default Polynomials for Composite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 32</span> <br>
  244 +
  245 +</li>
  246 +</ul>
  247 +
  248 +</div>
  249 +
  250 +
  251 +
  252 +
  253 +
  254 +
  255 +
  256 +
  257 +
  258 +
  259 +
  260 +<br/>
  261 +CONTENT <span class="aligning_page_number"> 4 </span>
  262 +
  263 +<div class="sub_indices">
  264 +<ul style="list-style-type:none">
  265 +<li> 7.8.3 The Program <b>gf_poly</b> for Verifying Irreducibility of Polynomials </span><span class="aligning_page_number"> 33 </span>
  266 +</li>
  267 +</ul>
  268 +
  269 +
  270 +7.9<span class="aligning_numbers"><b>"ALTMAP"</b> considerations and <b>extract_word()</b> </span><span class="aligning_page_number"> 34 </span>
  271 +<ul style="list-style-type:none">
  272 +<li>
  273 +
  274 +7.9.1 Alternate mappings with <b>"SPLIT"</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .<span class="aligning_page_number"> 34</span> <br>
  275 +</li>
  276 +<li>
  277 +7.9.2 Alternate mappings with <b>"COMPOSITE"</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 36 </span> <br>
  278 +</li>
  279 +<li>
  280 +7.9.3 The mapping of <b>"CAUCHY"</b> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . .. . . . . . . . . . . .. <span class="aligning_page_number"> 37 </span> <br>
  281 +</li>
  282 +</ul>
  283 +</div>
  284 +
  285 +
  286 +8 <span class="aligning_numbers"><b>Thread Safety </b></span><span class="aligning_page_number"> 37 </span> <br><br> </div>
  287 +
  288 +9 <span class="aligning_numbers"><b>Listing of Procedures</b> </span><span class="aligning_page_number"> 37 </span> <br><br> </div>
  289 +
  290 +10 <span class="aligning_numbers"><b>Troubleshooting</b> </span><span class="aligning_page_number"> 38 </span> <br><br> </div>
  291 +11 <span class="aligning_numbers"><b>Timings</b> </span><span class="aligning_page_number"> 41 </span> <br><br> </div>
  292 +
  293 +<div class="sub_indices">
  294 +11.1 Multiply() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . . . . . . . . . .. . . . <span class="aligning_page_number"> 42</span> <br>
  295 +11.2 Divide() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . . . . .. . . . . <span class="aligning_page_number"> 42 </span> <br>
  296 +11.3 Multiply_Region() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . .. . . . . . . . . . . . . . . . . . <span class="aligning_page_number"> 43 </span> <br>
  297 +</div>
  298 +
  299 +
  300 +
  301 +
  302 +
  303 +
  304 +<br/>
  305 +INTRODUCTION <span class="aligning_page_number"> 5 </span>
  306 +
  307 +
  308 +<h3>1 Introduction </h3>
  309 +
  310 +Galois Field arithmetic forms the backbone of erasure-coded storage systems, most famously the Reed-Solomon
  311 +erasure code. A Galois Field is defined over w-bit words and is termed <em>GF(2<sup>w</sup>).</em> As such, the elements of a Galois
  312 +Field are the integers 0, 1, . . ., 2<sup>w</sup> - 1. Galois Field arithmetic defines addition and multiplication over these closed
  313 +sets of integers in such a way that they work as you would hope they would work. Specifically, every number has a
  314 +unique multiplicative inverse. Moreover, there is a value, typically the value 2, which has the property that you can
  315 +enumerate all of the non-zero elements of the field by taking that value to successively higher powers.
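<p>The enumeration property can be checked by hand in a small field. The following is a minimal sketch, not code from GF-Complete: it assumes <em>w</em> = 4 and the polynomial x<sup>4</sup> + x + 1 (0x13), and the function names <b>gf16_times2()</b> and <b>gf16_enumerate()</b> are hypothetical. It raises 2 to successive powers and counts how many distinct non-zero elements appear.</p>

```c
#include <stdint.h>

/* Multiply a by 2 in GF(2^4). Assumption: the field is built on the
   polynomial x^4 + x + 1 (0x13); other polynomials give other tables. */
uint8_t gf16_times2(uint8_t a)
{
    a <<= 1;                    /* multiply by x */
    if (a & 0x10) a ^= 0x13;    /* reduce when the x^4 term appears */
    return a;
}

/* Store 2^0 .. 2^14 in powers[] and return the number of distinct
   non-zero values seen. A result of 15 means that 2 enumerates every
   non-zero element of the field. */
int gf16_enumerate(uint8_t powers[15])
{
    uint16_t seen = 0;          /* bit v is set once value v has appeared */
    int distinct = 0;
    uint8_t x = 1;

    for (int i = 0; i < 15; i++) {
        powers[i] = x;
        if (!(seen & (1u << x))) {
            seen |= (uint16_t)(1u << x);
            distinct++;
        }
        x = gf16_times2(x);
    }
    return distinct;
}
```

<p>Under these assumptions the sequence starts 1, 2, 4, 8, 3, 6, 12, 11 and returns to 1 after fifteen steps, so every non-zero element is reached exactly once.</p>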
  316 +
  317 +
  318 +<p>Addition in a Galois Field is equal to the bitwise exclusive-or operation. That's nice and convenient. Multiplication
  319 +is a little more complex, and there are many, many ways to implement it. The Paper describes them all, and the
  320 +following references provide more supporting material: [Anv09, GMS08, LHy08, LD00, LBOX12, Pla97]. The intent
  321 +of this library is to implement all of the techniques. That way, their performance may be compared, and their tradeoffs
  322 +may be analyzed. </p>
  323 +
  324 +
  325 +
  326 +
  327 +When used for erasure codes, there are typically five important operations:<br>
  328 +
  329 +<ol>
  330 +<li> <b>Adding two numbers in </b> GF(2<sup>w</sup>). That's bitwise exclusive-or. </li>
  331 +<li> <b>Multiplying two numbers in</b> GF(2<sup>w</sup>). Erasure codes are usually based on matrices in GF(2<sup>w</sup>), and constructing
  332 +these matrices requires both addition and multiplication.</li>
  333 +<li> <b>Dividing two numbers in </b>GF(2<sup>w</sup>). Sometimes you need to divide to construct matrices (for example, Cauchy
  334 +Reed-Solomon codes [BKK<sup>+</sup>95, Rab89]). More often, though, you use division to invert matrices for decoding.
  335 +Sometimes it is easier to find a number's inverse than it is to divide. In that case, you can divide by multiplying
  336 +by an inverse. </li>
  337 +
  338 +<li><b>Adding two regions of numbers in</b> GF(2<sup>w</sup>), which will be explained along with... </li>
  339 +<li> <b>Multiplying a region of numbers in </b>GF(2<sup>w</sup>) by a constant in GF(2<sup>w</sup>). Erasure coding typically boils down
  340 +to performing dot products in GF(2<sup>w</sup>). For example, you may define a coding disk using the equation: </li><br>
  341 +
  342 +
  343 +
  344 +
  345 +<center>c<em><sub>0</sub></em> = d<em><sub>0</sub></em> + 2d<em><sub>1</sub></em> + 4d<em><sub>2</sub></em> + 8d<em><sub>3</sub></em>. </center><br>
  346 +
  347 +That looks like three multiplications and three additions. However, the way it is implemented in a disk system
  348 +looks as in Figure 1. Large regions of disks are partitioned into w-bit words in GF(2<sup>w</sup>). In the example, let us
  349 +suppose that <em>w</em> = 8, and therefore that words are bytes. Then the regions pictured are 1 KB from each disk.
  350 +The bytes on disk D<sub>i</sub> are labeled d<sub>i,0</sub>, d<sub>i,1</sub>, . . . , d<sub>i,1023</sub>, and the equation above is replicated 1024 times. For
  351 +0 &#8804; j &lt; 1024:
  352 +<br><br>
  353 +<center>c<em><sub>0,j</sub></em> = d<em><sub>0,j</sub></em> + 2d<em><sub>1,j</sub></em> + 4d<em><sub>2,j</sub></em> + 8d<em><sub>3,j</sub></em> . </center>
  354 +<br>
  355 +
  356 +
  357 +While it's possible to implement each of these 1024 equations independently, using the single multiplication
  358 +and addition operations above, it is often much more efficient to aggregate. For example, most computer architectures
  359 +support bitwise exclusive-or of 64 and 128 bit words. Thus, it makes much more sense to add regions
  360 +of numbers in 64 or 128 bit chunks rather than as words in GF(2<sup>w</sup>). Multiplying a region by a constant can
  361 +leverage similar optimizations. </ol>
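<p>The region form of the coding equation can be sketched in a few lines of C. This is an illustration only, not the library's implementation: it assumes <em>w</em> = 8 with the polynomial x<sup>8</sup> + x<sup>4</sup> + x<sup>3</sup> + x<sup>2</sup> + 1 (0x11d), and the names <b>gf256_mult()</b>, <b>gf256_region_mult_xor()</b> and <b>encode_c0()</b> are hypothetical. It builds one 256-entry table per constant, so each region-constant multiplication becomes a table lookup per byte, and each region-region addition is an XOR.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiply two bytes in GF(2^8) by shift-and-add. Assumption: the field
   uses the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
uint8_t gf256_mult(uint8_t a, uint8_t b)
{
    uint16_t acc = 0, aa = a;

    while (b) {
        if (b & 1) acc ^= aa;           /* add the current multiple of a */
        aa <<= 1;                       /* multiply a by x */
        if (aa & 0x100) aa ^= 0x11d;    /* reduce when the x^8 term appears */
        b >>= 1;
    }
    return (uint8_t)acc;
}

/* dst[j] ^= c * src[j] for every byte: one region-constant multiplication
   followed by one region-region addition. */
void gf256_region_mult_xor(uint8_t *dst, const uint8_t *src, uint8_t c, size_t len)
{
    uint8_t table[256];

    for (int v = 0; v < 256; v++) table[v] = gf256_mult((uint8_t)v, c);
    for (size_t j = 0; j < len; j++) dst[j] ^= table[src[j]];
}

/* c0 = d0 + 2 d1 + 4 d2 + 8 d3, applied to whole regions of len bytes. */
void encode_c0(uint8_t *c0, const uint8_t *const d[4], size_t len)
{
    static const uint8_t coef[4] = {1, 2, 4, 8};

    memcpy(c0, d[0], len);              /* coefficient 1 needs no multiplication */
    for (int i = 1; i < 4; i++)
        gf256_region_mult_xor(c0, d[i], coef[i], len);
}
```

<p>The table is computed once per constant rather than once per byte, which is the aggregation the text describes: three region-constant multiplications and three region-region additions instead of 1024 independent equations.</p>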
  362 +
  363 +
  364 +<p>GF-Complete supports multiplication and division of single values for all values of <em>w</em> &#8804; 32, plus <em>w</em> = 64 and <em>w</em> =
  365 +128. It also supports adding two regions of memory (for any value of <em>w</em>, since addition equals XOR), and multiplying
  366 +a region by a constant in <em>GF(2<sup>4</sup>), GF(2<sup>8</sup>), GF(2<sup>16</sup>), GF(2<sup>32</sup>), GF(2<sup>64</sup>) and GF(2<sup>128</sup>).</em> These values are chosen
  367 +because words in GF(2<sup>w</sup>) fit into machine words with these values of <em>w.</em> Other values of w don't lend themselves
  368 +to efficient multiplication of regions by constants (although see the <b>"CAUCHY"</b> option in section 6.1.5 for a way to
  369 +multiply regions for other values of <em>w</em>).</p>
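<p>Because addition is XOR regardless of <em>w</em>, adding two regions does not need to know the word size at all. The sketch below is hypothetical (the name <b>region_xor_sketch()</b> is not part of GF-Complete) and only shows the idea of working in 64-bit chunks, with a byte loop for any leftover tail.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dst[j] ^= src[j] for len bytes, eight bytes at a time where possible.
   memcpy is used for the 64-bit loads and stores so that the sketch makes
   no assumption about the alignment of dst and src. */
void region_xor_sketch(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t j = 0;

    for (; j + 8 <= len; j += 8) {
        uint64_t a, b;
        memcpy(&a, dst + j, 8);
        memcpy(&b, src + j, 8);
        a ^= b;
        memcpy(dst + j, &a, 8);
    }
    for (; j < len; j++)                /* leftover bytes, if any */
        dst[j] ^= src[j];
}
```

<p>The same routine serves <em>w</em> = 4, 8, 16, 32, 64 or 128, since the result of XOR does not depend on where the word boundaries fall.</p>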
  370 +
  371 +
  372 +
  373 +
  374 +
  375 +
  376 +<br/>
  377 +
  378 +2 &nbsp &nbsp <em> FILES IN THE LIBRARY </em> <span id="index_number">6 </span> <br><br><br>
  379 +
  380 +
  381 +
  382 +<div class="image-cell_1"> </div> <br><br><br>
  383 +
  384 +Figure 1: An example of adding two regions of numbers, and multiplying a region of numbers by a constant
  385 +in <em>GF(2<sup>w</sup>) </em>. In this example, <em>w</em> = 8, and each disk is holding a 1KB region. The same coding equation,
  386 +c<sub>0,j</sub> = d<sub>0,j</sub> + ad<sub>1,j</sub> + a<sup>2</sup>d<sub>2,j</sub> + a<sup>3</sup>d<sub>3,j</sub>, is applied 1024 times. However, rather than executing this equation 1024
  387 +times, it is more efficient to implement this with three region-constant multiplications and three region-region additions.
  388 +
  389 +<h3>2 &nbsp&nbsp&nbsp Files in the Library </h3>
  390 +This section provides an overview of the files that compose GF-Complete. They are partitioned among multiple
  391 +directories.
  392 +
  393 +<h4> <b>2.1 &nbsp&nbsp&nbsp Header files in the directory "include"</b> </h4>
  394 +
  395 +The following header files are part of GF-Complete.
  396 +<ul>
  397 +<li><b>gf_complete.h:</b> This is the header file that applications should include. It defines the gf_t type, which holds
  398 +all of the data that you need to perform the various operations in GF(2<sup>w</sup>). It also defines all of the arithmetic
  399 +operations. For an application to use this library, you should include gf_complete.h and then compile with the
  400 +library src/libgf_complete.la. </li><br>
  401 +
  402 +<li><b>gf_method.h:</b> If you want to modify the implementation techniques from the defaults, this file provides
  403 +a "helper" function so that you can do it from the Unix command line.
  404 +</li><br>
  405 +
  406 +<li><b>gf_general.h:</b> This file has helper routines for doing basic Galois Field operations with any legal value of <em>w.</em>
  407 +The problem is that <em>w </em> &#8804 32, <em>w </em> = 64 and <em> w </em> = 128 all have different data types, which is a pain. The procedures
  408 +in this file try to alleviate that pain. They are used in <b>gf_mult, gf_unit</b> and <b>gf_time.</b> I'm guessing that most
  409 +applications won't use them, as most applications use <em>w</em> &#8804 32. </li><br>
  410 +
  411 +<li><b>gf_rand.h:</b> I've learned that <b>srand48()</b> and its kin are not supported in all C installations. Therefore, this file
  412 +defines some random number generators to help test the programs. The random number generator is the "Mother
  428 +of All" random number generator [Mar94], which we've selected because it has no patent issues. <b>gf_unit</b> and
  429 +<b>gf_time</b> use these random number generators.</li><br>
  430 +<li><b>gf_int.h:</b> This is an internal header file that the various source files use. This is <em>not</em> intended for applications to
  431 +include.</li><br>
  432 +<li><b>config.xx</b> and <b>stamp-h1</b> are created by autoconf, and should be ignored by applications. </li>
  433 +</ul>
  434 +
  435 +<h3>2.2 &nbsp &nbsp <b> Source files in the "src" directory </b> </h3>
  436 +<ul>
  437 +The following C files compose <b>gf_complete.a,</b> and they are in the directory src. You shouldn't have to mess with these
  438 +files, but we include them in case you have to:<br><br>
  439 +<li><b> gf.c:</b> This implements all of the procedures in both <b>gf_complete.h</b> and <b>gf_int.h.</b> </li><br>
  440 +<li><b> gf_w4.c:</b> Procedures specific to <em>w </em> = 4. </li><br>
  441 +<li> <b>gf_w8.c:</b> Procedures specific to <em>w </em> = 8</li><br>
  442 +<li> <b>gf_w16.c:</b> Procedures specific to <em>w </em> = 16</li><br>
  443 +<li> <b>gf_w32.c:</b> Procedures specific to <em>w </em> = 32</li><br>
  444 +<li><b>gf_w64.c:</b> Procedures specific to <em>w </em> = 64</li><br>
  445 +<li> <b>gf_w128.c:</b> Procedures specific to <em>w </em> = 128</li><br>
  446 +<li> <b>gf_wgen.c:</b> Procedures specific to other values of <em>w </em> between 1 and 31</li><br>
  447 +<li> <b>gf_general.c:</b> Procedures that let you manipulate general values, regardless of whether <em>w </em> &#8804; 32, <em>w </em> = 64
  448 +or <em>w </em> = 128. (I.e. the procedures defined in <b>gf_general.h</b>)</li><br>
  449 +<li> <b>gf_method.c:</b> Procedures to help you switch between the various implementation techniques. (I.e. the procedures
  450 +defined in <b>gf_method.h</b>)</li><br>
  451 +<li> <b>gf_rand.c:</b> "The Mother of All" random number generator. (I.e. the procedures defined in <b>gf_rand.h</b>)</li><br> </ul>
  452 +
  453 +<h3>2.3 &nbsp &nbsp Library tools files in the "tools" directory </h3>
  454 +
  455 +<ul>
  456 +The following are tools to help you with Galois Field arithmetic, and with the library. They are explained in greater
  457 +detail elsewhere in this manual.<br><br>
  458 +<li> <b>gf_mult.c, gf_div.c</b> and <b>gf_add.c:</b> Command line tools to do multiplication, division and addition of single numbers</li><br>
  459 +<li> <b>gf_time.c:</b> A program that times the procedures for given values of <em>w </em> and implementation options</li><br>
  460 +<li> <b>time_tool.sh:</b> A shell script that helps perform rough timings of the various multiplication, division and region
  461 +operations in GF-Complete</li><br>
  462 +<li> <b>gf_methods.c:</b> A program that enumerates most of the implementation methods supported by GF-Complete</li><br>
  463 +<li> <b> gf_poly.c:</b> A program to identify irreducible polynomials in regular and composite Galois Fields</li><br>
  464 +
  465 +</ul>
  466 +
  467 +
  468 +
  469 +
  470 +
  471 +
  472 +
  473 +
  474 +<br/>
  475 +
  476 +3 &nbsp &nbsp <em> COMPILATION </em> <span id="index_number">8 </span> <br><br><br>
  477 +
  478 +
  479 +<h3>2.4 &nbsp &nbsp The unit tester in the "test" directory </h3>
  480 +
  481 +The test directory contains the program <b>gf_unit.c,</b> which performs a battery of unit tests on GF-Complete. This is
  482 +explained in more detail in section 6.3.
  483 +
  484 +
  485 +<h3>2.5&nbsp &nbsp Example programs in the "examples" directory </h3>
  486 +
  487 +There are seven example programs to help you understand various facets of GF-Complete. They are in the files
  488 +<b>gf_example_x.c </b> in the <b>examples</b> directory. They are explained in sections 4.2 through 4.5, and section 7.9.<br><br>
  489 +
  490 +<h2>3 &nbsp &nbsp Compilation </h2>
  491 +
  492 +<em>From revision 1.02 forward, we are using autoconf. The old "flag_tester" directory is now gone, as it is no longer in
  493 +use. </em><br><br>
  494 +To compile and install, you should do the standard operations that you do with most open source Unix code:<br><br>
  495 +
  496 +UNIX> ./configure <br>
  497 +... <br>
  498 +UNIX> make <br>
  499 +... <br>
  500 +UNIX> sudo make install <br><br>
  501 +
  502 +
  503 +<p>If you perform the <b>install,</b> then the header, source, tool, and library files will be moved to system locations. In
  504 +particular, you may then compile your own programs against the library by linking with the flag <b>-lgf_complete,</b> and you may use the tools from a
  505 +global executable directory (like <b>/usr/local/bin</b>). </p>
  506 +
  507 +<p>
  508 +If you don't perform the install, then the header and tool files will be in their respective directories, and the library
  509 +will be in <b>src/libgf_complete.la.</b> </p>
  510 +<p>
  511 +If your system supports the various Intel SIMD instructions, the compiler will find them, and GF-Complete will
  512 +use them by default. </p>
  513 +
  514 +
  515 +
  516 +<h2>4 &nbsp &nbsp Some Tools and Examples to Get You Started </h2>
  517 +<h3>4.1 Three Simple Command Line Tools: gf_mult, gf_div and gf_add </h3>
  518 +
  519 +
  520 +Before delving into the library, it may be helpful to explore Galois Field arithmetic with the command line tools:
  521 +<b>gf_mult, gf_div </b> and <b>gf_add.</b> These perform multiplication, division and addition on elements in <em>GF(2<sup>w</sup>).</em> If these are
  522 +not installed on your system, then you may find them in the tools directory. Their syntax is:
  523 +<ul>
  524 +<li><b>gf_mult a b</b> <em>w </em> - Multiplies a and b in <em> GF(2<sup>w</sup>)</em>. </li><br>
  525 +<li> <b>gf_div a b </b><em>w </em> - Divides a by b in GF(2<em><sup>w </sup></em>). </li><br>
  526 +<li><b>gf_add a b </b> <em>w </em> - Adds a and b in GF(2<em><sup>w </sup> </em>). </li><br>
  527 +
  528 +You may use any value of <em>w </em> from 1 to 32, plus 64 and 128. By default, the values are read and printed in decimal;
  529 +however, if you append an 'h' to <em>w </em>, then <em>a, b </em> and the result will be read and printed in hexadecimal. For <em>w </em> = 128, the 'h' is
  530 +mandatory, and all values will be printed in hexadecimal.
  531 +
  532 +
  533 +
  534 +
  535 +
  536 +
  537 +
  538 +<br/>
  539 +
  540 +4 &nbsp &nbsp <em> SOME TOOLS AND EXAMPLES TO GET YOU STARTED </em> <span id="index_number">9 </span> <br><br><br>
  541 +
  542 +
  543 +<p>Try them out on some examples like the ones below. You of course don't need to know that, for example, 5 * 4 = 7
  544 +in <em>GF(2<sup>4</sup>)</em>; however, once you know that, you know that
  545 +7/5 = 4 and 7/4 = 5. You should be able to verify the <b>gf_add</b>
  546 +statements below in your head. As for the other <b>gf_mult</b> examples, you can simply verify that division and multiplication work
  547 +with each other as you hope they would. </p>
  548 +<br><br>
  549 +<div id="number_spacing">
  550 +
  551 +UNIX> gf_mult 5 4 4 <br>
  552 +7 <br>
  553 +UNIX> gf_div 7 5 4 <br>
  554 +4 <br>
  555 +UNIX> gf_div 7 4 4 <br>
  556 +5 <br>
  557 +UNIX> gf_mult 8000 2 16h <br>
  558 +100b <br>
  559 +UNIX> gf_add f0f0f0f0f0f0f0f0 1313131313131313 64h <br>
  560 +e3e3e3e3e3e3e3e3 <br>
  561 +UNIX> gf_mult f0f0f0f0f0f0f0f0 1313131313131313 64h <br>
  562 +8da08da08da08da0 <br>
  563 +UNIX> gf_div 8da08da08da08da0 1313131313131313 64h <br>
  564 +f0f0f0f0f0f0f0f0 <br>
  565 +UNIX> gf_add f0f0f0f0f0f0f0f01313131313131313 1313131313131313f0f0f0f0f0f0f0f0 128h <br>
  566 +e3e3e3e3e3e3e3e3e3e3e3e3e3e3e3e3 <br>
  567 +UNIX> gf_mult f0f0f0f0f0f0f0f01313131313131313 1313131313131313f0f0f0f0f0f0f0f0 128h <br>
  568 +786278627862784982d782d782d7816e <br>
  569 +UNIX> gf_div 786278627862784982d782d782d7816e f0f0f0f0f0f0f0f01313131313131313 128h <br>
  570 +1313131313131313f0f0f0f0f0f0f0f0 <br>
  571 +UNIX> <br><br>
  572 +
  573 +</div>
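<p>To see where results such as 5 * 4 = 7 come from, the following is a minimal sketch of shift-and-add multiplication in <em>GF(2<sup>w</sup>)</em>. It is not the code used by the tools. The function names are hypothetical, and the polynomials 0x13 (for <em>w</em> = 4) and 0x1100b (for <em>w</em> = 16) are assumptions chosen because they reproduce the outputs shown above.</p>

```c
#include <stdint.h>

/* Addition in GF(2^w) is bitwise XOR, whatever the value of w. */
uint64_t gf_add_sketch(uint64_t a, uint64_t b)
{
    return a ^ b;
}

/* Multiply a and b in GF(2^w), w <= 32, by shift-and-add.
 * `poly` is the reducing polynomial including its x^w term.
 * Assumed values: 0x13 for w = 4, 0x1100b for w = 16. */
uint32_t gf_mult_sketch(uint32_t a, uint32_t b, int w, uint64_t poly)
{
    uint64_t acc = a;                 /* holds a * x^i, reduced below x^w */
    uint64_t product = 0;
    uint64_t top = (uint64_t)1 << w;  /* the x^w bit */

    while (b != 0) {
        if (b & 1) product ^= acc;    /* bit i of b set: add a * x^i */
        b >>= 1;
        acc <<= 1;                    /* multiply by x */
        if (acc & top) acc ^= poly;   /* reduce once the degree reaches w */
    }
    return (uint32_t)product;
}
```

<p>Under these assumptions, <b>gf_mult_sketch(5, 4, 4, 0x13)</b> gives 7 and <b>gf_mult_sketch(0x8000, 2, 16, 0x1100b)</b> gives 0x100b, matching the <b>gf_mult</b> output above.</p>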
  574 +
  575 +
  576 +Don't bother trying to read the source code of these programs yet. Start with some simpler examples like the ones
  577 +below. <br><br>
  578 +
  579 +<h3>4.2 Quick Starting Example #1: Simple multiplication and division </h3>
  580 +
  581 +The source files for these examples are in the examples directory.
  582 +<p>These two examples are intended for those who just want to use the library without getting too complex. The
  583 +first example is <b>gf_example_1,</b> and it takes one command line argument - <em>w</em>, which must be between 1 and 32. It
  584 +generates two random non-zero numbers in <em>GF(2<sup>w </sup>) </em> and multiplies them. After doing that, it divides the product by
  585 +each number. </p>
  586 +<p>
  587 +To perform multiplication and division in <em>GF(2<sup>w </sup>) </em>, you must declare an instance of the gf_t type, and then initialize
  588 +it for <em>GF(2<sup>w </sup>) </em> by calling <b>gf_init_easy().</b> This is done in <b>gf_example_1.c</b> with the following lines: </p><br><br>
  589 +
  590 +gf_t gf; <br><br>
  591 +... <br><br>
  592 +if (!gf_init_easy(&gf, w)) { <br>
  593 +fprintf(stderr, "Couldn't initialize GF structure.\n"); <br>
  594 +exit(0); <br>
  595 +} <br>
  596 +
  597 +
  598 +
  599 +
  600 +
  601 +
  602 +<br/>
  603 +
  605 +
  606 +<p>Once <b>gf</b> is initialized, you may use it for multiplication and division with the function pointers <b>multiply.w32</b> and
  607 +<b>divide.w32.</b> These work for any element of <em>GF(2<sup>w</sup>)</em> so long as <em>w</em> &#8804; 32. </p> <br><br>
  608 +
  609 +<div id="number_spacing">
  610 +<div style="padding-left:54px">
  611 +c = gf.multiply.w32(&gf, a, b);<br>
  612 +printf("%u * %u = %u\n", a, b, c);<br><br>
  613 +printf("%u / %u = %u\n", c, a, gf.divide.w32(&gf, c, a));<br>
  614 +printf("%u / %u = %u\n", c, b, gf.divide.w32(&gf, c, b));<br>
  615 +
  616 +
  617 +</div> </div>
  618 +<br><br>
  619 +Go ahead and test this program out. You can use <b>gf_mult</b> and <b>gf_div</b> to verify the results:<br><br>
  620 +
  621 +<div id="number_spacing">
  622 +UNIX> gf_example_1 4 <br>
  623 +12 * 4 = 5 <br>
  624 +5 / 12 = 4 <br>
  625 +5 / 4 = 12 <br>
  626 +UNIX> gf_mult 12 4 4 <br>
  627 +5 <br>
  628 +UNIX> gf_example_1 16 <br>
  629 +14411 * 60911 = 44568 <br>
  630 +44568 / 14411 = 60911 <br>
  631 +44568 / 60911 = 14411 <br>
  632 +UNIX> gf_mult 14411 60911 16 <br>
  633 +44568 <br>
  634 +UNIX> <br><br>
  635 +</div>
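<p>The relationship between multiplication and division shown above can be sketched in a few lines. This is an illustration only, not the library's method: it assumes the polynomial 0x13 for <em>w</em> = 4, uses hypothetical function names, and finds the quotient by exhaustive search, which is practical only for small <em>w</em>.</p>

```c
#include <stdint.h>

/* Shift-and-add multiplication in GF(2^w) for w <= 16.
 * `poly` includes the x^w term (assumed 0x13 for w = 4). */
uint32_t gf_mult_ref(uint32_t a, uint32_t b, int w, uint32_t poly)
{
    uint32_t acc = a, product = 0;
    uint32_t top = (uint32_t)1 << w;

    while (b != 0) {
        if (b & 1) product ^= acc;
        b >>= 1;
        acc <<= 1;
        if (acc & top) acc ^= poly;
    }
    return product;
}

/* Divide a by b (b nonzero) by searching for the unique q with q * b == a. */
uint32_t gf_div_ref(uint32_t a, uint32_t b, int w, uint32_t poly)
{
    uint32_t q;
    uint32_t field_size = (uint32_t)1 << w;

    for (q = 0; q < field_size; q++) {
        if (gf_mult_ref(q, b, w, poly) == a) return q;
    }
    return 0; /* not reached when b is nonzero */
}
```

<p>With these assumptions, <b>gf_div_ref(5, 12, 4, 0x13)</b> returns 4 and <b>gf_div_ref(5, 4, 4, 0x13)</b> returns 12, agreeing with the <b>gf_example_1 4</b> run above.</p>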
  636 +
  637 +<b>gf_init_easy()</b> (and later <b>gf_init_hard()</b>) call <b>malloc()</b> to allocate internal structures. To release that memory, call
  638 +<b>gf_free().</b> Section 6.4 explains how to call <b>gf_init_hard()</b> in such a way that it doesn't call <b>malloc().</b> <br><br>
  639 +
  640 +
  641 +
  642 +<h3>4.3 &nbsp &nbsp &nbspQuick Starting Example #2: Multiplying a region by a constant </h3>
  643 +
  644 +
  645 +The program <b>gf_example_2</b> expands on <b>gf_example_1</b>. If <em>w</em> is equal to 4, 8, 16 or 32, it performs a region multiply
  646 +operation. It allocates two sixteen-byte regions, <b>r1</b> and <b>r2,</b> and then multiplies <b>r1</b> by <em>a</em> and puts the result in <b>r2</b> using
  647 +the <b>multiply_region.w32</b> function pointer: <br><br>
  648 +
  649 +<div style="padding-left:52px">
  650 +gf.multiply_region.w32 (&gf, r1, r2, a, 16, 0); <br><br>
  651 +</div>
  652 +
  653 +That last argument specifies whether to simply place the product into r2 or to XOR it with the contents that are already
  654 +in r2. Zero means to place the product there. When we run it, it prints the results of the <b>multiply_region.w32</b> in
  655 +hexadecimal. Again, you can verify it using <b>gf_mult</b>:<br><br>
  656 +<div id="number_spacing">
  657 +UNIX> gf_example_2 4 <br>
  658 +12 * 2 = 11 <br>
  659 +11 / 12 = 2 <br>
  660 +11 / 2 = 12 <br><br>
  661 +multiply_region by 0xc (12) <br><br>
  662 +R1 (the source): 0 2 d 9 d 6 8 a 8 d b 3 5 c 1 8 8 e b 0 6 1 5 a 2 c 4 b 3 9 3 6 <br>
  663 +R2 (the product): 0 b 3 6 3 e a 1 a 3 d 7 9 f c a a 4 d 0 e c 9 1 b f 5 d 7 6 7 e <br>
  664 +
  665 +</div>
  666 +
  667 +
  668 +
  669 +
  670 +
  671 +
  672 +
  673 +
  674 +
  675 +
  676 +<br/>
  677 +
  679 +
  680 +<div id="number_spacing">
  681 +<table cellpadding="6">
  682 +<tr><td>UNIX></td> <td colspan="4"> gf_example_2 16 </td> </tr>
  683 +
  684 +<tr>
  685 +
  686 +<td>49598</td> <td> * </td> <td> 35999</td> <td> = </td> <td>19867 </td> </tr>
  687 +
  688 +<tr><td>19867 </td><td>/ </td> <td> 49598 </td> <td> = </td> <td>35999 </td> </tr>
  689 +<tr><td>19867</td><td> /</td> <td> 35999 </td> <td> = </td> <td> 49598 </td> </tr> </table><br>
  690 +
  691 +
  692 +&nbsp multiply_region by 0xc1be (49598) <br><br>
  693 +
  694 +
  695 +<table cellpadding="6" >
  696 +<tr>
  697 +<td>R1 (the source):</td> <td> 8c9f </td> <td> b30e </td> <td> 5bf3 </td> <td> 7cbb </td> <td>16a9 </td> <td> 105d </td> <td> 9368 </td> <td> 4bbe </td> </tr>
  698 +<tr><td>R2 (the product):</td> <td> 4d9b</td> <td> 992d </td> <td> 02f2 </td> <td> c95c </td> <td> 228e </td> <td> ec82 </td> <td> 324e </td> <td> 35e4 </td></tr>
  699 +</table>
  700 +</div>
  701 +<div id="number_spacing">
  702 +<div style="padding-left:9px">
  703 +UNIX> gf_mult c1be 8c9f 16h<br>
  704 +4d9b <br>
  705 +UNIX> gf_mult c1be b30e 16h <br>
  706 +992d <br>
  707 +UNIX> <br><br>
  708 +</div>
  709 +</div>
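<p>The meaning of the last argument (replace versus XOR) can be illustrated with a word-by-word sketch for <em>w</em> = 16. This is a simplified stand-in, not the optimized implementation behind <b>multiply_region.w32</b>. The names are hypothetical, the polynomial 0x1100b is an assumption, and the regions are taken to be arrays of 16-bit words.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define GF16_POLY 0x1100bu  /* assumed polynomial for w = 16 */

/* Shift-and-add multiplication of two elements of GF(2^16). */
uint16_t gf16_mult(uint16_t a, uint16_t b)
{
    uint32_t acc = a, product = 0;

    while (b != 0) {
        if (b & 1) product ^= acc;
        b >>= 1;
        acc <<= 1;
        if (acc & 0x10000u) acc ^= GF16_POLY;
    }
    return (uint16_t)product;
}

/* Multiply `bytes` bytes of src by val, one 16-bit word at a time.
 * add == 0: dest = src * val.   add != 0: dest ^= src * val. */
void gf16_multiply_region(const uint16_t *src, uint16_t *dest,
                          uint16_t val, size_t bytes, int add)
{
    size_t i, words = bytes / sizeof(uint16_t);

    for (i = 0; i < words; i++) {
        uint16_t p = gf16_mult(src[i], val);
        dest[i] = add ? (uint16_t)(dest[i] ^ p) : p;
    }
}
```

<p>Calling the sketch twice on the same regions, first with <b>add</b> = 0 and then with <b>add</b> = 1, leaves the destination all zero, because each product is XORed with itself.</p>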
  710 +
  711 +<h3>4.4 &nbsp &nbsp &nbsp Quick Starting Example #3: Using <em>w </em>= 64 </h3>
  712 +The program in <b>gf_example_3.c</b> is identical to the previous program, except it uses <em> GF(2<sup>64 </sup>). </em> Now <em>a, b</em> and <em> c </em> are
  713 +<b>uint64_t</b>'s, and you have to use the function pointers that have <b>w64</b> extensions so that the larger types may be employed.
  714 +<br><br>
  715 +<div id="number_spacing">
  716 +
  717 +UNIX> gf_example_3
  718 +<table cellpadding="6">
  719 +<tr>
  720 +
  721 +<td>a9af3adef0d23242 </td> <td> * </td> <td> 61fd8433b25fe7cd</td> <td> = </td> <td>bf5acdde4c41ee0c </td> </tr>
  722 +
  723 +<tr><td>bf5acdde4c41ee0c </td> <td> / </td> <td> a9af3adef0d23242 </td> <td> = </td> <td>61fd8433b25fe7cd </td> </tr>
  724 +<tr><td>bf5acdde4c41ee0c </td> <td> / </td> <td> 61fd8433b25fe7cd </td> <td>= </td> <td>a9af3adef0d23242 </td> </tr>
  725 +</table><br><br>
  726 +
  727 +&nbsp multiply_region by a9af3adef0d23242<br><br>
  728 +<table cellpadding="6" >
  729 +<tr>
  730 +<td>R1 (the source): </td> <td> 61fd8433b25fe7cd </td> <td>272d5d4b19ca44b7 </td> <td> 3870bf7e63c3451a </td> <td> 08992149b3e2f8b7 </td> </tr>
  731 +<tr><td>R2 (the product): </td> <td> bf5acdde4c41ee0c </td> <td> ad2d786c6e4d66b7 </td> <td> 43a7d857503fd261 </td> <td> d3d29c7be46b1f7c </td> </tr>
  732 +</table>
  733 +
  734 +<div style="padding-left:9px">
  735 +
  736 +UNIX> gf_mult a9af3adef0d23242 61fd8433b25fe7cd 64h <br>
  737 +bf5acdde4c41ee0c<br>
  738 +UNIX><br><br>
  739 +</div>
  740 +</div>
  741 +<h3>4.5 &nbsp &nbsp &nbsp Quick Starting Example #4: Using <em>w </em>= 128 </h3>
  742 +Finally, the program in <b>gf_example_4.c</b> uses <em>GF(2<sup>128</sup>).</em> Because <b>uint128_t</b> is not universally supported, the library
  743 +represents 128-bit numbers as arrays of two <b>uint64_t</b>'s. The function pointers for multiplication, division and region
  744 +multiplication now accept the return values as arguments:<br><br>
  745 +
  746 +gf.multiply.w128(&gf, a, b, c); <br><br>
  747 +
  748 +Again, we can use <b>gf_mult </b> and <b>gf_div </b>to verify the results:<br><br>
  749 +<div id="number_spacing">
  750 +<div style="padding-left:9px">
  751 +UNIX> gf_example_4 </div>
  752 +<table cellpadding="6" >
  753 +<tr>
  754 +
  755 +<td>e252d9c145c0bf29b85b21a1ae2921fa </td> <td> * </td> <td> b23044e7f45daf4d70695fb7bf249432 </td> <td> = </td> </tr>
  756 +<tr><td>7883669ef3001d7fabf83784d52eb414 </td> </tr>
  757 +
  758 +</table>
  759 +
  760 +</div>
  761 +
  762 +
  763 +
  764 +
  765 +
  766 +
  767 +
  768 +
  769 +<br/>
  770 +
  772 +
  773 +<div id="number_spacing">
  774 +multiply_region by e252d9c145c0bf29b85b21a1ae2921fa <br>
  775 +R1 (the source): f4f56f08fa92494c5faa57ddcd874149 b4c06a61adbbec2f4b0ffc68e43008cb <br>
  776 +R2 (the product): b1e34d34b031660676965b868b892043 382f12719ffe3978385f5d97540a13a1 <br>
  777 +UNIX> gf_mult e252d9c145c0bf29b85b21a1ae2921fa f4f56f08fa92494c5faa57ddcd874149 128h <br>
  778 +b1e34d34b031660676965b868b892043 <br>
  779 +UNIX> gf_div 382f12719ffe3978385f5d97540a13a1 b4c06a61adbbec2f4b0ffc68e43008cb 128h<br>
  780 +e252d9c145c0bf29b85b21a1ae2921fa<br>
  781 +UNIX><br><br>
  782 +
  783 +</div>
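<p>The convention of passing the result as an argument can be illustrated with addition, which is the simplest 128-bit operation. The sketch below is not part of the library and its function name is hypothetical. It only shows a 128-bit value held as two 64-bit words, with the result written through the third argument.</p>

```c
#include <stdint.h>

/* c = a + b in GF(2^128). Each value is an array of two 64-bit words,
 * and the result is returned through the third argument. Addition is
 * XOR, so the order of the two words does not matter here. */
void gf128_add_sketch(const uint64_t a[2], const uint64_t b[2], uint64_t c[2])
{
    c[0] = a[0] ^ b[0];
    c[1] = a[1] ^ b[1];
}
```

<p>Applied to the operands of the <b>gf_add ... 128h</b> example in Section 4.1, this yields e3e3e3e3e3e3e3e3 in both words.</p>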
  784 +
  785 +
  786 +<h2>5 &nbsp &nbsp &nbspImportant Information on Alignment when Multiplying Regions </h2>
  787 +
  788 +
  789 +
  790 +In order to make multiplication of regions fast, we often employ 64- and 128-bit instructions. This has ramifications
  791 +for pointer alignment, because we want to avoid bus errors, and because on many machines, loading and manipulating
  792 +aligned quantities is much faster than unaligned quantities.<br><br>
  793 +
  794 +
  795 +When you perform multiply_region.wxx(<em>gf, source, dest, value, size, add </em>), there are three requirements:
  796 +<ol>
  797 +<li>
  798 + The pointers <em>source</em> and <em>dest </em> must be aligned for <em>w</em>-bit words. For <em>w </em> = 4 and <em>w </em> = 8, there is no restriction;
  799 +however for <em>w </em> = 16, the pointers must be multiples of 2, for <em>w </em> = 32, they must be multiples of 4, and for
  800 +<em>w </em> &isin; {64, 128}, they must be multiples of 8. </li><br>
  801 +
  802 +<li> The <em>size</em> must be a multiple of &lceil;<em>w</em>/8&rceil;. With <em>w </em> = 4 and <em>w </em> = 8, &lceil;<em>w</em>/8&rceil; = 1 and there is no restriction. The other
  803 +sizes must be multiples of <em>w</em>/8 because you have to be multiplying whole elements of <em> GF(2<sup>w </sup>) </em>. </li><br>
  809 +
  810 +<li> The <b>source</b> and <b>dest</b> pointers must be aligned identically with respect to each other for the implementation
  811 +chosen. This is subtle, and we explain it in detail in the next few paragraphs. However, if you'd rather not figure
  812 +it out, the following recommendation will <em>always </em> work in GF-Complete: </li>
  813 +
  814 +</ol>
  815 +
  816 +
  817 +
  818 +<div style="padding-left:100px">
  819 +<b>If you want to be safe, make sure that source and dest are both multiples of 16. That is not a
  820 +strict requirement, but it will always work! </b> <br><br>
  821 +</div>
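<p>The first two requirements and the recommendation above are easy to check before making a call. The helper functions below are hypothetical and not part of GF-Complete. They are a sketch that assumes <em>w</em> is one of 4, 8, 16, 32, 64 or 128.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if source, dest and size satisfy requirements 1 and 2 for
 * word size w (assumed to be 4, 8, 16, 32, 64 or 128), and 0 otherwise. */
int region_args_ok(const void *source, const void *dest, size_t size, int w)
{
    size_t word_bytes = (size_t)(w + 7) / 8;           /* ceil(w / 8) */
    size_t align = word_bytes > 8 ? 8 : word_bytes;    /* 64 and 128 need 8 */

    if ((uintptr_t)source % align != 0) return 0;
    if ((uintptr_t)dest % align != 0) return 0;
    return size % word_bytes == 0;
}

/* Returns 1 if both pointers are multiples of 16, which is the
 * always-safe recommendation. */
int region_ptrs_safe(const void *source, const void *dest)
{
    return (uintptr_t)source % 16 == 0 && (uintptr_t)dest % 16 == 0;
}
```

<p>A caller that cannot guarantee <b>region_ptrs_safe()</b> still has to satisfy the third requirement, which the following paragraphs explain.</p>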
  822 +
  823 +
  824 +If you want to relax the above recommendation, please read further.
  825 +<p>When performing <b>multiply_region.wxx()</b>, the implementation is typically optimized for a region of bytes whose
  826 +size must be a multiple of a variable <em>s</em>, and which must be aligned to a multiple of another variable <em>t</em>. For example,
  827 +when doing <b>multiply_region.w32()</b> in <em> GF(2<sup>16 </sup>) </em> with SSE enabled, the implementation is optimized for regions of
  828 +32 bytes, which must be aligned on a 16-byte quantity. Thus, <em>s </em> = 32 and <em>t</em> = 16. However, we don't want
  829 +<b>multiply_region.w32()</b> to be too restrictive, so instead of requiring <em>source</em> and <em> dest </em> to be aligned to 16-byte regions, we
  830 +require that (<em>source </em> mod 16) equal (<em>dest</em> mod 16). Or, in general, that (<em>source</em> mod <em>t</em>) equal (<em>dest</em> mod <em>t</em>). </p>
  831 +
  832 +
  833 +<p>
  834 +Then, <b>multiply_region.wxx()</b> proceeds in three phases. In the first phase, <b>multiply.wxx()</b> is called on successive
  835 +words until (<em>source</em> mod <em>t</em>) equals zero. The second phase then performs the optimized region multiplication on
  836 +chunks of <em>s</em> bytes, until the remaining part of the region is less than <em>s</em> bytes. At that point, the third phase calls
  837 +<b>multiply.wxx()</b> on the last part of the region. </p>
  838 +
  839 +A detailed example helps to illustrate. Suppose we make the following call in <em>GF(2<sup>16</sup>) </em> with SSE enabled:<br><br>
  840 +<center><b>multiply_region.w32(gf, 0x10006, 0x20006, a, 274, 0)</b> </center>
  841 +
  842 +
  843 +
  844 +
  845 +
  846 +
  847 +
  848 +<br/>
  849 +
  851 +
  852 +
  853 +
  854 +<div class="image-cell_2"> </div> <br><br><br>
  855 +
  856 +Figure 2: Example of multiplying a region of 274 bytes in <em>GF(2<sup>16</sup>)</em> when (<em>source</em> mod 16) = (<em>dest</em> mod 16) = 6. The
  857 +alignment parameters are s = 32 and t = 16. The multiplication is in three phases, which correspond to the initial
  858 +unaligned region (10 bytes), the aligned region of s-byte chunks (256 bytes), and the final leftover region (8 bytes).
  859 +
  860 +
  861 +<p>First, note that <em>source</em> and <em>dest</em> are aligned on two-byte quantities, which they must be in <em>GF(2<sup>16</sup>).</em> Second, note
  862 +that <em>size</em> is a multiple of &lceil;16/8&rceil; = 2. And last, note that (<em>source</em> mod 16) equals (<em>dest</em> mod 16). We illustrate the three
  864 +phases of region multiplication in Figure 2. Because (<em>source</em> mod 16) = 6, there are 10 bytes of unaligned words that
  865 +are multiplied with five calls to <b>multiply.w32()</b> in the first phase. The second phase multiplies 256 bytes (eight chunks
  866 +of <em>s</em> = 32 bytes) using the SSE instructions. That leaves 8 bytes remaining for the third phase.
  867 +</p>
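<p>The byte counts in this example follow directly from <em>s</em> and <em>t</em>. The sketch below computes the sizes of the three phases under the description given above. The struct and function names are hypothetical and do not appear in the library.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Byte counts for the three phases of a region multiplication. */
struct region_phases {
    size_t head;  /* phase 1: bytes before source reaches a multiple of t */
    size_t body;  /* phase 2: whole chunks of s bytes */
    size_t tail;  /* phase 3: leftover bytes */
};

/* Fills *out and returns 0 when (source mod t) == (dest mod t);
 * returns -1 when the two pointers are not aligned identically. */
int split_region(uintptr_t source, uintptr_t dest, size_t size,
                 size_t s, size_t t, struct region_phases *out)
{
    size_t head;

    if (source % t != dest % t) return -1;

    head = (size_t)((t - source % t) % t);  /* bytes until t-aligned */
    if (head > size) head = size;

    out->head = head;
    out->body = ((size - head) / s) * s;
    out->tail = size - head - out->body;
    return 0;
}
```

<p>For <em>source</em> = 0x10006, <em>dest</em> = 0x20006, <em>size</em> = 274, <em>s</em> = 32 and <em>t</em> = 16, the sketch reports 10, 256 and 8 bytes, the same split as in Figure 2.</p>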
  868 +
  869 +<p>
  870 +When we describe the defaults and the various implementation options, we specify s and t as "alignment parameters."
  871 +</p>
  872 +<p>
  873 +One of the advanced region options is using an alternate mapping of words to memory ("ALTMAP"). This option interacts
  874 +with alignment in a more subtle manner. Please see Section 7.9 for details.
  875 +</p>
  876 +
  877 +<h2>6 &nbsp &nbsp The Defaults </h2>
  878 +
  879 +
  880 +GF-Complete implements a wide variety of techniques for multiplication, division and region multiplication. We have
  881 +set the defaults with three considerations in mind:
  882 +<ol>
  883 +<li>
  884 +<b>Speed:</b> Obviously, we want the implementations to be fast. Therefore, we choose the fastest implementations
  885 +that don’t violate the other considerations. The compilation environment is considered. For example, if SSE is
  886 +enabled, region multiplication in <em> GF(2<sup>4 </sup>) </em> employs a single multiplication table. If SSE is not enabled, then a
  887 +"double" table is employed that performs table lookup two bytes at a time. </li><br>
  888 +<li>
  889 +<b>Memory Consumption:</b> We try to keep the memory footprint of GF-Complete low. For example, the fastest
  890 +way to perform <b>multiply.w32()</b> in <em>GF(2<sup>32</sup>) </em> is to employ 1.75 MB of multiplication tables (see Section 7.4
  891 +below). We do not include this as a default, however, because we want to keep the default memory consumption
  892 +of GF-Complete low.
  893 +</li>
  894 +
  907 +<li>
  908 +<b>Compatibility with "standard" implementations:</b> While there is no <em>de facto</em> standard of Galois Field arithmetic,
  909 +most libraries implement the same fields. For that reason, we have not selected composite fields, alternate
  910 +polynomials or memory layouts for the defaults, even though these would be faster. Again, see Section 7.7 for
  911 +more information.
  912 +</li>
  913 +</ol>
  914 +
  915 +<p>Table 1 shows the default methods used for each power-of-two word size, their alignment parameters <em>s</em> and <em> t,</em> their
  916 +memory consumption and their rough performance. The performance tests are on an Intel Core i7-3770 running at
  917 +3.40 GHz, and are included solely to give a flavor of performance on a standard microprocessor. Some processors
  918 +will be faster with some techniques and others will be slower, so treat the numbers only as ballpark figures.
  919 +For other values of <em>w</em> between 1 and 31, we use table lookup when <em>w</em> &#8804; 8, discrete logarithms when <em>w</em> &#8804; 16 and
  920 +"Bytwop" for <em>w</em> &#8804; 32. </p>
  921 +<br><br>
  922 +<center> With SSE
  923 +<div id="data1">
  924 +<table cellpadding="6" cellspacing="0">
  925 +<tr>
  926 +<th>w </th><th class="double_border" >Memory <br> Usage </th><th>multiply() <br> Implementation</th><th>Performance <br>(Mega Ops / s) </th><th>multiply_region() <br> Implementation </th>
  927 +<th>s </th> <th>t </th> <th> Performance <br>(MB/s)</th>
  928 +</tr>
  929 +<tr>
  930 +<td>4 </td><td class="double_border">&lt;1K </td><td>Table</td><td>501</td><td>Table</td>
  931 +<td>16 </td><td>16 </td> <td>11,659</td> </tr>
  932 +
  933 +<tr>
  934 +<td>8 </td><td class="double_border">136K </td><td>Table</td><td>501</td><td>Split Table (8,4)</td>
  935 +<td>16 </td><td>16 </td> <td>11,824</td> </tr>
  936 +
  937 +<tr>
  938 +<td>16 </td><td class="double_border">896K </td><td>Log</td><td>260</td><td>Split Table (16,4)</td>
  939 +<td>32 </td><td>16 </td> <td>7,749</td> </tr>
  940 +
  941 +<tr>
  942 +<td>32 </td><td class="double_border">&lt;1K </td><td>Carry-Free</td><td>48</td><td>Split Table (32,4)</td>
  943 +<td>64 </td><td>16 </td> <td>5,011</td> </tr>
  944 +
  945 +<tr>
  946 +<td>64 </td><td class="double_border">2K </td><td>Carry-Free</td><td>84</td><td>Split Table (64,4)</td>
  947 +<td>128 </td><td>16 </td> <td>2,402</td> </tr>
  948 +
  949 +<tr>
  950 +<td>128 </td><td class="double_border">64K </td><td>Carry-Free</td><td>48</td><td>Split Table (128,4)</td>
  951 +<td>16 </td><td>16 </td> <td>833</td> </tr>
  952 +</table></div>
  953 +
  954 +
  955 +<div id="data1">
  956 +<center>Without SSE </center>
  957 +<table cellpadding="6" cellspacing="0">
  958 +<tr>
  959 +<th>w </th><th>Memory <br> Usage </th><th>multiply() <br> Implementation</th><th>Performance <br>(Mega Ops / s) </th><th>multiply_region() <br> Implementation </th>
  960 +<th>s </th> <th>t </th> <th> Performance <br>(MB/s)</th>
  961 +</tr>
  962 +<tr>
  963 +<td>4 </td><td>4K </td><td>Table</td><td>501</td><td>Double Table</td>
  964 +<td>16 </td><td>16 </td> <td>11,659</td> </tr>
  965 +
  966 +<tr>
  967 +<td>8 </td><td>128K </td><td>Table</td><td>501</td><td>Table</td>
  968 +<td>1 </td><td>1 </td> <td>1,397</td> </tr>
  969 +
  970 +<tr>
  971 +<td>16 </td><td>896K </td><td>Log</td><td>266</td><td>Split Table (16,8)</td>
  972 +<td>32 </td><td>16 </td> <td>2,135</td> </tr>
  973 +
  974 +<tr>
  975 +<td>32 </td><td>4K </td><td>Bytwop</td><td>19</td><td>Split Table (32,4)</td>
  976 +<td>4 </td><td>4 </td> <td>1,149</td> </tr>
  977 +
  978 +<tr>
  979 +<td>64 </td><td>16K </td><td>Bytwop</td><td>9</td><td>Split Table (64,4)</td>
  980 +<td>8 </td><td>8 </td> <td>987</td> </tr>
  981 +
  982 +<tr>
  983 +<td>128 </td><td>64K </td><td>Bytwop</td><td>1.4</td><td>Split Table (128,4)</td>
  984 +<td>16 </td><td>8 </td> <td>833</td> </tr>
  985 +</table>
  986 +</div>
  987 +</center>
  988 +<br><br>