Implementing carryless multiplication using normal multiplication











up vote
0
down vote

favorite












In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:



function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= Math.imul(b, a & -a);
a &= a - 1;
}
return prod;
}


Using Math.imul is correct by definition, so no problems there.



Or it could be written like this:



function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}


This version avoids Math.imul, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.



So which way should I go? Or perhaps there are better ways entirely?










share|improve this question




























    up vote
    0
    down vote

    favorite












    In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:



    function clmul_u32(a, b) {
    var prod = 0;
    while (a != 0) {
    prod ^= Math.imul(b, a & -a);
    a &= a - 1;
    }
    return prod;
    }


    Using Math.imul is correct by definition, so no problems there.



    Or it could be written like this:



    function clmul_u32(a, b) {
    var prod = 0;
    while (a != 0) {
    prod ^= b * (a & -a);
    a &= a - 1;
    }
    return prod;
    }


    This version avoids Math.imul, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.



    So which way should I go? Or perhaps there are better ways entirely?










    share|improve this question


























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:



      function clmul_u32(a, b) {
      var prod = 0;
      while (a != 0) {
      prod ^= Math.imul(b, a & -a);
      a &= a - 1;
      }
      return prod;
      }


      Using Math.imul is correct by definition, so no problems there.



      Or it could be written like this:



      function clmul_u32(a, b) {
      var prod = 0;
      while (a != 0) {
      prod ^= b * (a & -a);
      a &= a - 1;
      }
      return prod;
      }


      This version avoids Math.imul, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.



      So which way should I go? Or perhaps there are better ways entirely?










      share|improve this question















      In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:



      function clmul_u32(a, b) {
      var prod = 0;
      while (a != 0) {
      prod ^= Math.imul(b, a & -a);
      a &= a - 1;
      }
      return prod;
      }


      Using Math.imul is correct by definition, so no problems there.



      Or it could be written like this:



      function clmul_u32(a, b) {
      var prod = 0;
      while (a != 0) {
      prod ^= b * (a & -a);
      a &= a - 1;
      }
      return prod;
      }


      This version avoids Math.imul, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.



      So which way should I go? Or perhaps there are better ways entirely?







      javascript comparative-review bitwise






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited yesterday









      200_success

      127k15148410




      127k15148410










      asked yesterday









      harold

      98357




      98357






















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          Question 1




          So which way should I go?




          That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.



          Readability



          That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.



          Ease of use



          Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.



          The second version has a bug that makes its use difficult.



          Example of a failure.



          clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
          clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000


          Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.



          To fix clmul_u32_coerced you need to coerce b to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.



          function clmul_u32_coerced_f(a, b) {
          var prod = 0;
          b |= 0;
          while (a !== 0) { // Note strict not equal add ~1% performance
          prod ^= b * (a & -a);
          a &= a - 1;
          }
          return prod;
          }


          Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.



          Performance



          In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.



          Tests using double. Note that 32Int coerced fails



          Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
          32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
          Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
          32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles

          // * Fixed includes the line `b |= 0;`
          // ** This call fails to return the correct result


          Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.



          Testing Compare clmul(0b1110000000000, 1) 400 times.
          32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
          32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
          Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles

          // * Fixed includes the line `b |= 0;`


          Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.



          Question 2




          Or perhaps there are better ways entirely?




          In javascript there is no easy better way. The functions performance is dependent on the number of bits in a. The operation b * (a & -a) lets you shift b via a multiplication. However if you could get the bit position of (a & -a) then you could use b << bitPos(a & -a) which would be quicker if bitpos had no overhead.



          Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.



          function clmul_u32_b(a, b) {
          var p = 0;
          while (a > 0) {
          while((a & 1) === 0) {
          a >>= 1;
          b <<= 1;
          }
          p ^= b;
          a >>= 1;
          b <<= 1;
          }
          return p;
          }


          The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version



          Testing Compare clmul(0xF000000, 1).
          32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
          Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles


          But swap the arguments and the table is turned upside down out performing all other versions with ease.



          function clmul_u32_b(b, a) {  // swapped args
          ... code see above snippet
          return p;
          }


          Results



          Testing Compare clmul(0xF000000, 1).
          Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
          32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles


          You may think, why not add a line that swaps the arguments, ensuring that a is less than b allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.



          Conclultion



          In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.



          // includes the line b |= 0
          function clmul_u32_coerced(a, b) {
          var prod = 0;
          b |= 0;
          while (a != 0) {
          prod ^= b * (a & -a);
          a &= a - 1;
          }
          return prod;
          }


          Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.



          Notes on testing




          • A call is via a function that grouped function calls by 400 see below.

          • Calls are run in strict mode.

          • Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

          • Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop





          function call(a, b) {
          var i = 25;
          while (i--) {
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          clmul_u32(a, b);
          }
          }





          In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.






          share|improve this answer























            Your Answer





            StackExchange.ifUsing("editor", function () {
            return StackExchange.using("mathjaxEditing", function () {
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            });
            });
            }, "mathjax-editing");

            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "196"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














             

            draft saved


            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207504%2fimplementing-carryless-multiplication-using-normal-multiplication%23new-answer', 'question_page');
            }
            );

            Post as a guest
































            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            1
            down vote



            accepted










            Question 1




            So which way should I go?




            That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.



            Readability



            That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.



            Ease of use



            Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.



            The second version has a bug that makes its use difficult.



            Example of a failure.



            clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
            clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000


            Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.



            To fix clmul_u32_coerced you need to coerce b to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.



            function clmul_u32_coerced_f(a, b) {
            var prod = 0;
            b |= 0;
            while (a !== 0) { // Note strict not equal add ~1% performance
            prod ^= b * (a & -a);
            a &= a - 1;
            }
            return prod;
            }


            Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.



            Performance



            In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.



            Tests using double. Note that 32Int coerced fails



            Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
            32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
            Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
            32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles

            // * Fixed includes the line `b |= 0;`
            // ** This call fails to return the correct result


            Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.



            Testing Compare clmul(0b1110000000000, 1) 400 times.
            32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
            32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
            Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles

            // * Fixed includes the line `b |= 0;`


            Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.



            Question 2




            Or perhaps there are better ways entirely?




            In javascript there is no easy better way. The functions performance is dependent on the number of bits in a. The operation b * (a & -a) lets you shift b via a multiplication. However if you could get the bit position of (a & -a) then you could use b << bitPos(a & -a) which would be quicker if bitpos had no overhead.



            Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.



            function clmul_u32_b(a, b) {
            var p = 0;
            while (a > 0) {
            while((a & 1) === 0) {
            a >>= 1;
            b <<= 1;
            }
            p ^= b;
            a >>= 1;
            b <<= 1;
            }
            return p;
            }


            The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version



            Testing Compare clmul(0xF000000, 1).
            32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
            Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles


            But swap the arguments and the table is turned upside down out performing all other versions with ease.



            function clmul_u32_b(b, a) {  // swapped args
            ... code see above snippet
            return p;
            }


            Results



            Testing Compare clmul(0xF000000, 1).
            Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
            32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles


            You may think, why not add a line that swaps the arguments, ensuring that a is less than b allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.



            Conclultion



            In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.



            // includes the line b |= 0
            function clmul_u32_coerced(a, b) {
            var prod = 0;
            b |= 0;
            while (a != 0) {
            prod ^= b * (a & -a);
            a &= a - 1;
            }
            return prod;
            }


            Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.



            Notes on testing




            • A call is via a function that grouped function calls by 400 see below.

            • Calls are run in strict mode.

            • Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

            • Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop





            function call(a, b) {
            var i = 25;
            while (i--) {
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            clmul_u32(a, b);
            }
            }





            In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.






            share|improve this answer



























              up vote
              1
              down vote



              accepted










              Question 1




              So which way should I go?




              That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.



              Readability



              That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.



              Ease of use



              Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.



              The second version has a bug that makes its use difficult.



              Example of a failure.



              clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
              clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000


              Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.



              To fix clmul_u32_coerced you need to coerce b to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.



              function clmul_u32_coerced_f(a, b) {
              var prod = 0;
              b |= 0;
              while (a !== 0) { // Note strict not equal add ~1% performance
              prod ^= b * (a & -a);
              a &= a - 1;
              }
              return prod;
              }


              Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.



              Performance



              In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.



              Tests using double. Note that 32Int coerced fails



              Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
              32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
              Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
              32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles

              // * Fixed includes the line `b |= 0;`
              // ** This call fails to return the correct result


              Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.



              Testing Compare clmul(0b1110000000000, 1) 400 times.
              32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
              32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
              Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles

              // * Fixed includes the line `b |= 0;`


              Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.



              Question 2




              Or perhaps there are better ways entirely?




              In javascript there is no easy better way. The functions performance is dependent on the number of bits in a. The operation b * (a & -a) lets you shift b via a multiplication. However if you could get the bit position of (a & -a) then you could use b << bitPos(a & -a) which would be quicker if bitpos had no overhead.



              Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.



              function clmul_u32_b(a, b) {
              var p = 0;
              while (a > 0) {
              while((a & 1) === 0) {
              a >>= 1;
              b <<= 1;
              }
              p ^= b;
              a >>= 1;
              b <<= 1;
              }
              return p;
              }


              The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version



              Testing Compare clmul(0xF000000, 1).
              32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
              Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles


              But swap the arguments and the table is turned upside down out performing all other versions with ease.



              function clmul_u32_b(b, a) {  // swapped args
              ... code see above snippet
              return p;
              }


              Results



              Testing Compare clmul(0xF000000, 1).
              Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
              32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles


              You may think, why not add a line that swaps the arguments, ensuring that a is less than b allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.



              Conclultion



              In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.



              // includes the line b |= 0
              function clmul_u32_coerced(a, b) {
              var prod = 0;
              b |= 0;
              while (a != 0) {
              prod ^= b * (a & -a);
              a &= a - 1;
              }
              return prod;
              }


              Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.



              Notes on testing




              • A call is via a function that grouped function calls by 400 see below.

              • Calls are run in strict mode.

              • Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

              • Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop





              function call(a, b) {
              var i = 25;
              while (i--) {
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              clmul_u32(a, b);
              }
              }





              In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.






              share|improve this answer

























                up vote
                1
                down vote



                accepted







                up vote
                1
                down vote



                accepted






                Question 1




                So which way should I go?




                That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.



                Readability



                That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.



                Ease of use



                Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.



                The second version has a bug that makes its use difficult.



                Example of a failure.



                clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
                clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000


                Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.



                To fix clmul_u32_coerced you need to coerce b to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.



                function clmul_u32_coerced_f(a, b) {
                var prod = 0;
                b |= 0;
                while (a !== 0) { // Note strict not equal add ~1% performance
                prod ^= b * (a & -a);
                a &= a - 1;
                }
                return prod;
                }


                Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.



                Performance



                In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.



                Tests using double. Note that 32Int coerced fails



                Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
                32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
                Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
                32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles

                // * Fixed includes the line `b |= 0;`
                // ** This call fails to return the correct result


                Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.



                Testing Compare clmul(0b1110000000000, 1) 400 times.
                32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
                32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
                Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles

                // * Fixed includes the line `b |= 0;`


                Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.



                Question 2




                Or perhaps there are better ways entirely?




                In javascript there is no easy better way. The functions performance is dependent on the number of bits in a. The operation b * (a & -a) lets you shift b via a multiplication. However if you could get the bit position of (a & -a) then you could use b << bitPos(a & -a) which would be quicker if bitpos had no overhead.



                Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.



                function clmul_u32_b(a, b) {
                var p = 0;
                while (a > 0) {
                while((a & 1) === 0) {
                a >>= 1;
                b <<= 1;
                }
                p ^= b;
                a >>= 1;
                b <<= 1;
                }
                return p;
                }


                The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version



                Testing Compare clmul(0xF000000, 1).
                32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
                Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles


                But swap the arguments and the table is turned upside down out performing all other versions with ease.



                function clmul_u32_b(b, a) {  // swapped args
                ... code see above snippet
                return p;
                }


                Results



                Testing Compare clmul(0xF000000, 1).
                Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
                32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles


                You may think, why not add a line that swaps the arguments, ensuring that a is less than b allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.



                Conclultion



                In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.



                // includes the line b |= 0
                function clmul_u32_coerced(a, b) {
                var prod = 0;
                b |= 0;
                while (a != 0) {
                prod ^= b * (a & -a);
                a &= a - 1;
                }
                return prod;
                }


                Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.



                Notes on testing




                • A call is via a function that grouped function calls by 400 see below.

                • Calls are run in strict mode.

                • Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

                • Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop





                function call(a, b) {
                var i = 25;
                while (i--) {
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                }
                }





                In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.






                share|improve this answer














                Question 1




                So which way should I go?




                That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.



                Readability



                That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.



                Ease of use



                Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.



                The second version has a bug that makes its use difficult.



                Example of a failure.



                clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
                clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000


                Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.



                To fix clmul_u32_coerced you need to coerce b to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.



                function clmul_u32_coerced_f(a, b) {
                var prod = 0;
                b |= 0;
                while (a !== 0) { // Note strict not equal add ~1% performance
                prod ^= b * (a & -a);
                a &= a - 1;
                }
                return prod;
                }


                Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.



                Performance



                In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.



                Tests using double. Note that 32Int coerced fails



                Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
                32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
                Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
                32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles

                // * Fixed includes the line `b |= 0;`
                // ** This call fails to return the correct result


                Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.



                Testing Compare clmul(0b1110000000000, 1) 400 times.
                32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
                32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
                Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles

                // * Fixed includes the line `b |= 0;`


                Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.



                Question 2




                Or perhaps there are better ways entirely?




                In javascript there is no easy better way. The functions performance is dependent on the number of bits in a. The operation b * (a & -a) lets you shift b via a multiplication. However if you could get the bit position of (a & -a) then you could use b << bitPos(a & -a) which would be quicker if bitpos had no overhead.



                Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.



                function clmul_u32_b(a, b) {
                var p = 0;
                while (a > 0) {
                while((a & 1) === 0) {
                a >>= 1;
                b <<= 1;
                }
                p ^= b;
                a >>= 1;
                b <<= 1;
                }
                return p;
                }


                The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version



                Testing Compare clmul(0xF000000, 1).
                32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
                Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles


                But swap the arguments and the table is turned upside down out performing all other versions with ease.



                function clmul_u32_b(b, a) {  // swapped args
                ... code see above snippet
                return p;
                }


                Results



                Testing Compare clmul(0xF000000, 1).
                Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
                32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles


                You may think, why not add a line that swaps the arguments, ensuring that a is less than b allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.



                Conclultion



                In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.



                // includes the line b |= 0
                function clmul_u32_coerced(a, b) {
                var prod = 0;
                b |= 0;
                while (a != 0) {
                prod ^= b * (a & -a);
                a &= a - 1;
                }
                return prod;
                }


                Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.



                Notes on testing




                • A call is via a function that grouped function calls by 400 see below.

                • Calls are run in strict mode.

                • Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

                • Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop





                function call(a, b) {
                var i = 25;
                while (i--) {
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                }
                }





                In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.






                function call(a, b) {
                var i = 25;
                while (i--) {
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                }
                }





                function call(a, b) {
                var i = 25;
                while (i--) {
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                clmul_u32(a, b);
                }
                }






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited yesterday

























                answered yesterday









                Blindman67

                6,3691521




                6,3691521






























                     

                    draft saved


                    draft discarded



















































                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207504%2fimplementing-carryless-multiplication-using-normal-multiplication%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest




















































































                    Popular posts from this blog

                    Сан-Квентин

                    8-я гвардейская общевойсковая армия

                    Алькесар