Implementing carryless multiplication using normal multiplication
up vote
0
down vote
favorite
In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= Math.imul(b, a & -a);
a &= a - 1;
}
return prod;
}
Using Math.imul
is correct by definition, so no problems there.
Or it could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
This version avoids Math.imul
, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.
So which way should I go? Or perhaps there are better ways entirely?
javascript comparative-review bitwise
add a comment |
up vote
0
down vote
favorite
In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= Math.imul(b, a & -a);
a &= a - 1;
}
return prod;
}
Using Math.imul
is correct by definition, so no problems there.
Or it could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
This version avoids Math.imul
, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.
So which way should I go? Or perhaps there are better ways entirely?
javascript comparative-review bitwise
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= Math.imul(b, a & -a);
a &= a - 1;
}
return prod;
}
Using Math.imul
is correct by definition, so no problems there.
Or it could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
This version avoids Math.imul
, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.
So which way should I go? Or perhaps there are better ways entirely?
javascript comparative-review bitwise
In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= Math.imul(b, a & -a);
a &= a - 1;
}
return prod;
}
Using Math.imul
is correct by definition, so no problems there.
Or it could be written like this:
function clmul_u32(a, b) {
var prod = 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
This version avoids Math.imul
, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.
So which way should I go? Or perhaps there are better ways entirely?
javascript comparative-review bitwise
javascript comparative-review bitwise
edited yesterday
200_success
127k15148410
127k15148410
asked yesterday
harold
98357
98357
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
Question 1
So which way should I go?
That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.
Readability
That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult
is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.
Ease of use
Is the function easy to use? Can it be misused? Yes and no. The Math.imult
is a little easier because it does not rely on coercion and thus handles different number types better.
The second version has a bug that makes its use difficult.
Example of a failure.
clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000
Why does clmul_u32_coerced
fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.
To fix clmul_u32_coerced
you need to coerce b
to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.
function clmul_u32_coerced_f(a, b) {
var prod = 0;
b |= 0;
while (a !== 0) { // Note strict not equal add ~1% performance
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Ease of use means that the caller need not worry about type so clmul_u32_imult
is the better option, or use the variant above.
Performance
In terms of performance the use of the *
operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.
Tests using double. Note that 32Int coerced
fails
Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles
// * Fixed includes the line `b |= 0;`
// ** This call fails to return the correct result
Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b);
so there is no failure.
Testing Compare clmul(0b1110000000000, 1) 400 times.
32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles
// * Fixed includes the line `b |= 0;`
Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul
's a second. Not bad for Javascript on a 1.60GHz Core.
Question 2
Or perhaps there are better ways entirely?
In javascript there is no easy better way. The functions performance is dependent on the number of bits in a
. The operation b * (a & -a)
lets you shift b
via a multiplication. However if you could get the bit position of (a & -a)
then you could use b << bitPos(a & -a)
which would be quicker if bitpos
had no overhead.
Consider the following variation that avoids the multiplication by shifting b
and xoring it with the product.
function clmul_u32_b(a, b) {
var p = 0;
while (a > 0) {
while((a & 1) === 0) {
a >>= 1;
b <<= 1;
}
p ^= b;
a >>= 1;
b <<= 1;
}
return p;
}
The problem is that its performance is dependent on the position of the highest bit in a
. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version
Testing Compare clmul(0xF000000, 1).
32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles
But swap the arguments and the table is turned upside down out performing all other versions with ease.
function clmul_u32_b(b, a) { // swapped args
... code see above snippet
return p;
}
Results
Testing Compare clmul(0xF000000, 1).
Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles
You may think, why not add a line that swaps the arguments, ensuring that a
is less than b
allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.
Conclultion
In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.
// includes the line b |= 0
function clmul_u32_coerced(a, b) {
var prod = 0;
b |= 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.
Notes on testing
- A call is via a function that grouped function calls by 400 see below.
- Calls are run in strict mode.
- Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)
- Device and setup
"Chrome Version 70.0.3538.102 (Official Build) (32-bit)"
on a"Win10 CPU i7 Q720 @1.6GHz"
laptop
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
Question 1
So which way should I go?
That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.
Readability
That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult
is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.
Ease of use
Is the function easy to use? Can it be misused? Yes and no. The Math.imult
is a little easier because it does not rely on coercion and thus handles different number types better.
The second version has a bug that makes its use difficult.
Example of a failure.
clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000
Why does clmul_u32_coerced
fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.
To fix clmul_u32_coerced
you need to coerce b
to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.
function clmul_u32_coerced_f(a, b) {
var prod = 0;
b |= 0;
while (a !== 0) { // Note strict not equal add ~1% performance
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Ease of use means that the caller need not worry about type so clmul_u32_imult
is the better option, or use the variant above.
Performance
In terms of performance the use of the *
operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.
Tests using double. Note that 32Int coerced
fails
Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles
// * Fixed includes the line `b |= 0;`
// ** This call fails to return the correct result
Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b);
so there is no failure.
Testing Compare clmul(0b1110000000000, 1) 400 times.
32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles
// * Fixed includes the line `b |= 0;`
Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul
's a second. Not bad for Javascript on a 1.60GHz Core.
Question 2
Or perhaps there are better ways entirely?
In javascript there is no easy better way. The functions performance is dependent on the number of bits in a
. The operation b * (a & -a)
lets you shift b
via a multiplication. However if you could get the bit position of (a & -a)
then you could use b << bitPos(a & -a)
which would be quicker if bitpos
had no overhead.
Consider the following variation that avoids the multiplication by shifting b
and xoring it with the product.
function clmul_u32_b(a, b) {
var p = 0;
while (a > 0) {
while((a & 1) === 0) {
a >>= 1;
b <<= 1;
}
p ^= b;
a >>= 1;
b <<= 1;
}
return p;
}
The problem is that its performance is dependent on the position of the highest bit in a
. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version
Testing Compare clmul(0xF000000, 1).
32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles
But swap the arguments and the table is turned upside down out performing all other versions with ease.
function clmul_u32_b(b, a) { // swapped args
... code see above snippet
return p;
}
Results
Testing Compare clmul(0xF000000, 1).
Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles
You may think, why not add a line that swaps the arguments, ensuring that a
is less than b
allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.
Conclultion
In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.
// includes the line b |= 0
function clmul_u32_coerced(a, b) {
var prod = 0;
b |= 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.
Notes on testing
- A call is via a function that grouped function calls by 400 see below.
- Calls are run in strict mode.
- Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)
- Device and setup
"Chrome Version 70.0.3538.102 (Official Build) (32-bit)"
on a"Win10 CPU i7 Q720 @1.6GHz"
laptop
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.
add a comment |
up vote
1
down vote
accepted
Question 1
So which way should I go?
That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.
Readability
That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult
is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.
Ease of use
Is the function easy to use? Can it be misused? Yes and no. The Math.imult
is a little easier because it does not rely on coercion and thus handles different number types better.
The second version has a bug that makes its use difficult.
Example of a failure.
clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000
Why does clmul_u32_coerced
fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.
To fix clmul_u32_coerced
you need to coerce b
to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.
function clmul_u32_coerced_f(a, b) {
var prod = 0;
b |= 0;
while (a !== 0) { // Note strict not equal add ~1% performance
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Ease of use means that the caller need not worry about type so clmul_u32_imult
is the better option, or use the variant above.
Performance
In terms of performance the use of the *
operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.
Tests using double. Note that 32Int coerced
fails
Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles
// * Fixed includes the line `b |= 0;`
// ** This call fails to return the correct result
Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b);
so there is no failure.
Testing Compare clmul(0b1110000000000, 1) 400 times.
32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles
// * Fixed includes the line `b |= 0;`
Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul
's a second. Not bad for Javascript on a 1.60GHz Core.
Question 2
Or perhaps there are better ways entirely?
In javascript there is no easy better way. The functions performance is dependent on the number of bits in a
. The operation b * (a & -a)
lets you shift b
via a multiplication. However if you could get the bit position of (a & -a)
then you could use b << bitPos(a & -a)
which would be quicker if bitpos
had no overhead.
Consider the following variation that avoids the multiplication by shifting b
and xoring it with the product.
function clmul_u32_b(a, b) {
var p = 0;
while (a > 0) {
while((a & 1) === 0) {
a >>= 1;
b <<= 1;
}
p ^= b;
a >>= 1;
b <<= 1;
}
return p;
}
The problem is that its performance is dependent on the position of the highest bit in a
. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version
Testing Compare clmul(0xF000000, 1).
32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles
But swap the arguments and the table is turned upside down out performing all other versions with ease.
function clmul_u32_b(b, a) { // swapped args
... code see above snippet
return p;
}
Results
Testing Compare clmul(0xF000000, 1).
Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles
You may think, why not add a line that swaps the arguments, ensuring that a
is less than b
allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.
Conclultion
In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.
// includes the line b |= 0
function clmul_u32_coerced(a, b) {
var prod = 0;
b |= 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.
Notes on testing
- A call is via a function that grouped function calls by 400 see below.
- Calls are run in strict mode.
- Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)
- Device and setup
"Chrome Version 70.0.3538.102 (Official Build) (32-bit)"
on a"Win10 CPU i7 Q720 @1.6GHz"
laptop
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
Question 1
So which way should I go?
That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.
Readability
That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult
is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.
Ease of use
Is the function easy to use? Can it be misused? Yes and no. The Math.imult
is a little easier because it does not rely on coercion and thus handles different number types better.
The second version has a bug that makes its use difficult.
Example of a failure.
clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000
Why does clmul_u32_coerced
fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.
To fix clmul_u32_coerced
you need to coerce b
to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.
function clmul_u32_coerced_f(a, b) {
var prod = 0;
b |= 0;
while (a !== 0) { // Note strict not equal add ~1% performance
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Ease of use means that the caller need not worry about type so clmul_u32_imult
is the better option, or use the variant above.
Performance
In terms of performance the use of the *
operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.
Tests using double. Note that 32Int coerced
fails
Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles
// * Fixed includes the line `b |= 0;`
// ** This call fails to return the correct result
Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b);
so there is no failure.
Testing Compare clmul(0b1110000000000, 1) 400 times.
32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles
// * Fixed includes the line `b |= 0;`
Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul
's a second. Not bad for Javascript on a 1.60GHz Core.
Question 2
Or perhaps there are better ways entirely?
In javascript there is no easy better way. The functions performance is dependent on the number of bits in a
. The operation b * (a & -a)
lets you shift b
via a multiplication. However if you could get the bit position of (a & -a)
then you could use b << bitPos(a & -a)
which would be quicker if bitpos
had no overhead.
Consider the following variation that avoids the multiplication by shifting b
and xoring it with the product.
function clmul_u32_b(a, b) {
var p = 0;
while (a > 0) {
while((a & 1) === 0) {
a >>= 1;
b <<= 1;
}
p ^= b;
a >>= 1;
b <<= 1;
}
return p;
}
The problem is that its performance is dependent on the position of the highest bit in a
. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version
Testing Compare clmul(0xF000000, 1).
32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles
But swap the arguments and the table is turned upside down out performing all other versions with ease.
function clmul_u32_b(b, a) { // swapped args
... code see above snippet
return p;
}
Results
Testing Compare clmul(0xF000000, 1).
Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles
You may think, why not add a line that swaps the arguments, ensuring that a
is less than b
allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.
Conclultion
In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.
// includes the line b |= 0
function clmul_u32_coerced(a, b) {
var prod = 0;
b |= 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.
Notes on testing
- A call is via a function that grouped function calls by 400 see below.
- Calls are run in strict mode.
- Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)
- Device and setup
"Chrome Version 70.0.3538.102 (Official Build) (32-bit)"
on a"Win10 CPU i7 Q720 @1.6GHz"
laptop
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.
Question 1
So which way should I go?
That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.
Readability
That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult
is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.
Ease of use
Is the function easy to use? Can it be misused? Yes and no. The Math.imult
is a little easier because it does not rely on coercion and thus handles different number types better.
The second version has a bug that makes its use difficult.
Example of a failure.
clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110
clmul_u32_imult(0b1110000000000, 1.2); // >> correct 1110000000000
Why does clmul_u32_coerced
fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.
To fix clmul_u32_coerced
you need to coerce b
to an int32. But this comes at a performance cost (see below) Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.
function clmul_u32_coerced_f(a, b) {
var prod = 0;
b |= 0;
while (a !== 0) { // Note strict not equal add ~1% performance
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Ease of use means that the caller need not worry about type so clmul_u32_imult
is the better option, or use the variant above.
Performance
In terms of performance the use of the *
operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.
Tests using double. Note that 32Int coerced
fails
Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.
32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles
Math.imul.............: 0.878µs OPS 1,139,002 95% Total 904ms 1030 cycles
32Int coerced **......: 0.973µs OPS 1,028,149 86% Total 1,002ms 1030 cycles
// * Fixed includes the line `b |= 0;`
// ** This call fails to return the correct result
Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b);
so there is no failure.
Testing Compare clmul(0b1110000000000, 1) 400 times.
32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles
32Int coerced.........: 0.404µs OPS 2,478,225 89% Total 1,114ms 920 cycles
Math.imul.............: 0.525µs OPS 1,903,171 68% Total 1,497ms 950 cycles
// * Fixed includes the line `b |= 0;`
Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul
's a second. Not bad for Javascript on a 1.60GHz Core.
Question 2
Or perhaps there are better ways entirely?
In javascript there is no easy better way. The functions performance is dependent on the number of bits in a
. The operation b * (a & -a)
lets you shift b
via a multiplication. However if you could get the bit position of (a & -a)
then you could use b << bitPos(a & -a)
which would be quicker if bitpos
had no overhead.
Consider the following variation that avoids the multiplication by shifting b
and xoring it with the product.
function clmul_u32_b(a, b) {
var p = 0;
while (a > 0) {
while((a & 1) === 0) {
a >>= 1;
b <<= 1;
}
p ^= b;
a >>= 1;
b <<= 1;
}
return p;
}
The problem is that its performance is dependent on the position of the highest bit in a
. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version
Testing Compare clmul(0xF000000, 1).
32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles
Shifting......: 1.497µs OPS 667,984 35% Total 1,467ms 980 cycles
But swap the arguments and the table is turned upside down out performing all other versions with ease.
function clmul_u32_b(b, a) { // swapped args
... code see above snippet
return p;
}
Results
Testing Compare clmul(0xF000000, 1).
Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles
32Int coerced.: 0.521µs OPS 1,918,526 36% Total 1,564ms 1000 cycles
You may think, why not add a line that swaps the arguments, ensuring that a
is less than b
allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.
Conclultion
In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.
// includes the line b |= 0
function clmul_u32_coerced(a, b) {
var prod = 0;
b |= 0;
while (a != 0) {
prod ^= b * (a & -a);
a &= a - 1;
}
return prod;
}
Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.
Notes on testing
- A call is via a function that grouped function calls by 400 see below.
- Calls are run in strict mode.
- Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)
- Device and setup
"Chrome Version 70.0.3538.102 (Official Build) (32-bit)"
on a"Win10 CPU i7 Q720 @1.6GHz"
laptop
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
function call(a, b) {
var i = 25;
while (i--) {
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
clmul_u32(a, b);
}
}
edited yesterday
answered yesterday
Blindman67
6,3691521
6,3691521
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207504%2fimplementing-carryless-multiplication-using-normal-multiplication%23new-answer', 'question_page');
}
);
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password