Implementing carryless multiplication using normal multiplication

up vote
0
down vote

favorite

In carryless multiplication the partial products are XORed (ie addition without carry) together rather than added normally. The partial products themselves are still the product of a power of two (namely a bit extracted from one operand) and the other operand. For a multiplication by a power of two there is no difference between carryless multiplication and regular multiplication, so this step can be done with a normal multiplication. But then in JavaScript the question arises, which multiplication? It could be written like this:

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= Math.imul(b, a & -a);

        a &= a - 1;

    }

    return prod;

}

Using Math.imul is correct by definition, so no problems there.

Or it could be written like this:

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

This version avoids Math.imul, which I am often told to "avoid or polyfill", but it is less clear why this version should work. Multiplication by a power of two is actually safe, it cannot mangle the integer part of significand unlike multiplication by values that have simultaneously a low number of leading zeroes and a low number of trailing zeroes. I feel like if I used this version, I would have to write a long-winded comment justifying its correctness.

So which way should I go? Or perhaps there are better ways entirely?

edited yesterday

200_success

127k15148410

asked yesterday

harold

98357

add a comment |

up vote
0
down vote

favorite

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= Math.imul(b, a & -a);

        a &= a - 1;

    }

    return prod;

}

Using Math.imul is correct by definition, so no problems there.

Or it could be written like this:

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

So which way should I go? Or perhaps there are better ways entirely?

edited yesterday

200_success

127k15148410

asked yesterday

harold

98357

add a comment |

up vote
0
down vote

favorite

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= Math.imul(b, a & -a);

        a &= a - 1;

    }

    return prod;

}

Using Math.imul is correct by definition, so no problems there.

Or it could be written like this:

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

So which way should I go? Or perhaps there are better ways entirely?

edited yesterday

200_success

127k15148410

asked yesterday

harold

98357

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= Math.imul(b, a & -a);

        a &= a - 1;

    }

    return prod;

}

Using Math.imul is correct by definition, so no problems there.

Or it could be written like this:

function clmul_u32(a, b) {

    var prod = 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

So which way should I go? Or perhaps there are better ways entirely?

javascript comparative-review bitwise

edited yesterday

200_success

127k15148410

asked yesterday

harold

98357

edited yesterday

200_success

127k15148410

asked yesterday

harold

98357

edited yesterday

200_success

127k15148410

edited yesterday

200_success

127k15148410

edited yesterday

200_success

127k15148410

asked yesterday

harold

98357

asked yesterday

harold

98357

asked yesterday

harold

98357

add a comment |

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

Question 1

So which way should I go?

That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.

Readability

That depends on the reader. Knowing that bitwise operators coerce numbers to signed 32 ints the need to clarify this by using Math.imult is mute. Personally operators are more readable than functions, but some argue the opposite. I would opt for the second version.

Ease of use

Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.

The second version has a bug that makes its use difficult.

Example of a failure.

clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110

clmul_u32_imult(0b1110000000000, 1.2);   // >> correct 1110000000000

Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.

To fix clmul_u32_coerced you need to coerce b to an int32. ~~But this comes at a performance cost (see below)~~ Well I assumed it would be slower, not so the extra line is worth it, even if you pass int's as args.

function clmul_u32_coerced_f(a, b) {

    var prod = 0;

    b |= 0;

    while (a !== 0) {  // Note strict not equal add ~1% performance

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.

Performance

In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.

Tests using double. Note that 32Int coerced fails

Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.

32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles

Math.imul.............: 0.878µs OPS 1,139,002  95% Total 904ms 1030 cycles

32Int coerced **......: 0.973µs OPS 1,028,149  86% Total 1,002ms 1030 cycles



// * Fixed includes the line `b |= 0;` 

// ** This call fails to return the correct result

Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.

Testing Compare clmul(0b1110000000000, 1) 400 times.

32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles

32Int coerced.........: 0.404µs OPS 2,478,225  89% Total 1,114ms 920 cycles

Math.imul.............: 0.525µs OPS 1,903,171  68% Total 1,497ms 950 cycles



// * Fixed includes the line `b |= 0;`

Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.

Question 2

Or perhaps there are better ways entirely?

In javascript there is no easy better way. The functions performance is dependent on the number of bits in a. The operation b * (a & -a) lets you shift b via a multiplication. However if you could get the bit position of (a & -a) then you could use b << bitPos(a & -a) which would be quicker if bitpos had no overhead.

Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.

function clmul_u32_b(a, b) {

    var p = 0;

    while (a > 0) {

        while((a & 1) === 0) {

            a >>= 1;

            b <<= 1;

        }

        p ^= b;

        a >>= 1;

        b <<= 1;

    }

    return p;

}

The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version

Testing Compare clmul(0xF000000, 1).

32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles

Shifting......: 1.497µs OPS 667,984  35% Total 1,467ms 980 cycles

But swap the arguments and the table is turned upside down out performing all other versions with ease.

function clmul_u32_b(b, a) {  // swapped args

    ... code see above snippet

    return p;

}

Results

Testing Compare clmul(0xF000000, 1).

Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles

32Int coerced.: 0.521µs OPS 1,918,526  36% Total 1,564ms 1000 cycles

You may think, why not add a line that swaps the arguments, ensuring that a is less than b allowing for the least number of shifts. This does not work. A JS statement will force a cache miss and you lose any benefit.

Conclultion

In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.

// includes the line b |= 0

function clmul_u32_coerced(a, b) {

    var prod = 0;

    b |= 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.

Notes on testing

A call is via a function that grouped function calls by 400 see below.

Calls are run in strict mode.

Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

In javascript the performance landscape is in constant flux, browser version changes can have significant changes to performance comparisons, and not always positive. If performance is important than you must monitor performance for each browser update, and use code that is tuned for each browser's type.

edited yesterday

answered yesterday

Blindman67

6,3691521

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207504%2fimplementing-carryless-multiplication-using-normal-multiplication%23new-answer', 'question_page');
}
);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

Question 1

So which way should I go?

That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.

Readability

Ease of use

Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.

The second version has a bug that makes its use difficult.

Example of a failure.

clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110

clmul_u32_imult(0b1110000000000, 1.2);   // >> correct 1110000000000

Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.

function clmul_u32_coerced_f(a, b) {

    var prod = 0;

    b |= 0;

    while (a !== 0) {  // Note strict not equal add ~1% performance

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.

Performance

In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.

Tests using double. Note that 32Int coerced fails

Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.

32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles

Math.imul.............: 0.878µs OPS 1,139,002  95% Total 904ms 1030 cycles

32Int coerced **......: 0.973µs OPS 1,028,149  86% Total 1,002ms 1030 cycles



// * Fixed includes the line `b |= 0;` 

// ** This call fails to return the correct result

Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.

Testing Compare clmul(0b1110000000000, 1) 400 times.

32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles

32Int coerced.........: 0.404µs OPS 2,478,225  89% Total 1,114ms 920 cycles

Math.imul.............: 0.525µs OPS 1,903,171  68% Total 1,497ms 950 cycles



// * Fixed includes the line `b |= 0;`

Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.

Question 2

Or perhaps there are better ways entirely?

Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.

function clmul_u32_b(a, b) {

    var p = 0;

    while (a > 0) {

        while((a & 1) === 0) {

            a >>= 1;

            b <<= 1;

        }

        p ^= b;

        a >>= 1;

        b <<= 1;

    }

    return p;

}

The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version

Testing Compare clmul(0xF000000, 1).

32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles

Shifting......: 1.497µs OPS 667,984  35% Total 1,467ms 980 cycles

But swap the arguments and the table is turned upside down out performing all other versions with ease.

function clmul_u32_b(b, a) {  // swapped args

    ... code see above snippet

    return p;

}

Results

Testing Compare clmul(0xF000000, 1).

Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles

32Int coerced.: 0.521µs OPS 1,918,526  36% Total 1,564ms 1000 cycles

Conclultion

In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.

// includes the line b |= 0

function clmul_u32_coerced(a, b) {

    var prod = 0;

    b |= 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.

Notes on testing

A call is via a function that grouped function calls by 400 see below.

Calls are run in strict mode.

Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

edited yesterday

answered yesterday

Blindman67

6,3691521

add a comment |

up vote
1
down vote

accepted

Question 1

So which way should I go?

That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.

Readability

Ease of use

Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.

The second version has a bug that makes its use difficult.

Example of a failure.

clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110

clmul_u32_imult(0b1110000000000, 1.2);   // >> correct 1110000000000

Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.

function clmul_u32_coerced_f(a, b) {

    var prod = 0;

    b |= 0;

    while (a !== 0) {  // Note strict not equal add ~1% performance

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.

Performance

In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.

Tests using double. Note that 32Int coerced fails

Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.

32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles

Math.imul.............: 0.878µs OPS 1,139,002  95% Total 904ms 1030 cycles

32Int coerced **......: 0.973µs OPS 1,028,149  86% Total 1,002ms 1030 cycles



// * Fixed includes the line `b |= 0;` 

// ** This call fails to return the correct result

Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.

Testing Compare clmul(0b1110000000000, 1) 400 times.

32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles

32Int coerced.........: 0.404µs OPS 2,478,225  89% Total 1,114ms 920 cycles

Math.imul.............: 0.525µs OPS 1,903,171  68% Total 1,497ms 950 cycles



// * Fixed includes the line `b |= 0;`

Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.

Question 2

Or perhaps there are better ways entirely?

Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.

function clmul_u32_b(a, b) {

    var p = 0;

    while (a > 0) {

        while((a & 1) === 0) {

            a >>= 1;

            b <<= 1;

        }

        p ^= b;

        a >>= 1;

        b <<= 1;

    }

    return p;

}

The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version

Testing Compare clmul(0xF000000, 1).

32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles

Shifting......: 1.497µs OPS 667,984  35% Total 1,467ms 980 cycles

But swap the arguments and the table is turned upside down out performing all other versions with ease.

function clmul_u32_b(b, a) {  // swapped args

    ... code see above snippet

    return p;

}

Results

Testing Compare clmul(0xF000000, 1).

Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles

32Int coerced.: 0.521µs OPS 1,918,526  36% Total 1,564ms 1000 cycles

Conclultion

In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.

// includes the line b |= 0

function clmul_u32_coerced(a, b) {

    var prod = 0;

    b |= 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.

Notes on testing

A call is via a function that grouped function calls by 400 see below.

Calls are run in strict mode.

Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

edited yesterday

answered yesterday

Blindman67

6,3691521

add a comment |

up vote
1
down vote

accepted

Question 1

So which way should I go?

That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.

Readability

Ease of use

Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.

The second version has a bug that makes its use difficult.

Example of a failure.

clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110

clmul_u32_imult(0b1110000000000, 1.2);   // >> correct 1110000000000

Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.

function clmul_u32_coerced_f(a, b) {

    var prod = 0;

    b |= 0;

    while (a !== 0) {  // Note strict not equal add ~1% performance

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.

Performance

In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.

Tests using double. Note that 32Int coerced fails

Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.

32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles

Math.imul.............: 0.878µs OPS 1,139,002  95% Total 904ms 1030 cycles

32Int coerced **......: 0.973µs OPS 1,028,149  86% Total 1,002ms 1030 cycles



// * Fixed includes the line `b |= 0;` 

// ** This call fails to return the correct result

Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.

Testing Compare clmul(0b1110000000000, 1) 400 times.

32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles

32Int coerced.........: 0.404µs OPS 2,478,225  89% Total 1,114ms 920 cycles

Math.imul.............: 0.525µs OPS 1,903,171  68% Total 1,497ms 950 cycles



// * Fixed includes the line `b |= 0;`

Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.

Question 2

Or perhaps there are better ways entirely?

Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.

function clmul_u32_b(a, b) {

    var p = 0;

    while (a > 0) {

        while((a & 1) === 0) {

            a >>= 1;

            b <<= 1;

        }

        p ^= b;

        a >>= 1;

        b <<= 1;

    }

    return p;

}

The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version

Testing Compare clmul(0xF000000, 1).

32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles

Shifting......: 1.497µs OPS 667,984  35% Total 1,467ms 980 cycles

But swap the arguments and the table is turned upside down out performing all other versions with ease.

function clmul_u32_b(b, a) {  // swapped args

    ... code see above snippet

    return p;

}

Results

Testing Compare clmul(0xF000000, 1).

Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles

32Int coerced.: 0.521µs OPS 1,918,526  36% Total 1,564ms 1000 cycles

Conclultion

In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.

// includes the line b |= 0

function clmul_u32_coerced(a, b) {

    var prod = 0;

    b |= 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.

Notes on testing

A call is via a function that grouped function calls by 400 see below.

Calls are run in strict mode.

Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

edited yesterday

answered yesterday

Blindman67

6,3691521

Question 1

So which way should I go?

That depends on what is the assessment criteria. Three basic criteria are performance, readability and ease of use.

Readability

Ease of use

Is the function easy to use? Can it be misused? Yes and no. The Math.imult is a little easier because it does not rely on coercion and thus handles different number types better.

The second version has a bug that makes its use difficult.

Example of a failure.

clmul_u32_coerced(0b1110000000000, 1.2); // >> incorrect 0b1111001100110

clmul_u32_imult(0b1110000000000, 1.2);   // >> correct 1110000000000

Why does clmul_u32_coerced fail? The reason is that javascript coerces to the left and for numbers to the highest precision. Thus if you have a double in the operation the result will be a double.

function clmul_u32_coerced_f(a, b) {

    var prod = 0;

    b |= 0;

    while (a !== 0) {  // Note strict not equal add ~1% performance

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Ease of use means that the caller need not worry about type so clmul_u32_imult is the better option, or use the variant above.

Performance

In terms of performance the use of the * operator gives a benefit. This is expected as any function call has overhead that is generally unavoidable. Testing shows the performance comparison.

Tests using double. Note that 32Int coerced fails

Testing Compare clmul(0b1110000000000, 1.2) 400 times. Sorted by performance.

32Int coerced fixed *.: 0.836µs OPS 1,195,472 100% Total 786ms 940 cycles

Math.imul.............: 0.878µs OPS 1,139,002  95% Total 904ms 1030 cycles

32Int coerced **......: 0.973µs OPS 1,028,149  86% Total 1,002ms 1030 cycles



// * Fixed includes the line `b |= 0;` 

// ** This call fails to return the correct result

Tests using arguments that are typed as Signed Int32 eg const a = 0b1110000000000 | 0, b = 1 | 0; clmul(a, b); so there is no failure.

Testing Compare clmul(0b1110000000000, 1) 400 times.

32Int coerced fixed *.: 0.358µs OPS 2,791,501 100% Total 1,214ms 1130 cycles

32Int coerced.........: 0.404µs OPS 2,478,225  89% Total 1,114ms 920 cycles

Math.imul.............: 0.525µs OPS 1,903,171  68% Total 1,497ms 950 cycles



// * Fixed includes the line `b |= 0;`

Thus the test device can do 2,791,501 * 400 ~= 1,116,600,400 clmul's a second. Not bad for Javascript on a 1.60GHz Core.

Question 2

Or perhaps there are better ways entirely?

Consider the following variation that avoids the multiplication by shifting b and xoring it with the product.

function clmul_u32_b(a, b) {

    var p = 0;

    while (a > 0) {

        while((a & 1) === 0) {

            a >>= 1;

            b <<= 1;

        }

        p ^= b;

        a >>= 1;

        b <<= 1;

    }

    return p;

}

The problem is that its performance is dependent on the position of the highest bit in a. Thus with four bits at the top it runs at a poor 35% of your fixed coerced version

Testing Compare clmul(0xF000000, 1).

32Int coerced.: 0.520µs OPS 1,924,528 100% Total 530ms 1020 cycles

Shifting......: 1.497µs OPS 667,984  35% Total 1,467ms 980 cycles

But swap the arguments and the table is turned upside down out performing all other versions with ease.

function clmul_u32_b(b, a) {  // swapped args

    ... code see above snippet

    return p;

}

Results

Testing Compare clmul(0xF000000, 1).

Shifting swap.: 0.188µs OPS 5,324,813 100% Total 563ms 1000 cycles

32Int coerced.: 0.521µs OPS 1,918,526  36% Total 1,564ms 1000 cycles

Conclultion

In terms of ease of use, readability (4+ years JS experience reader), and performance your coerced version with the extra line to force b to an int has in my view the best for all 3 criteria.

// includes the line b |= 0

function clmul_u32_coerced(a, b) {

    var prod = 0;

    b |= 0;

    while (a != 0) {

        prod ^= b * (a & -a);

        a &= a - 1;

    }

    return prod;

}

Though the shifting version I present can in a best case out perform by a factor of 3, the best case is rare and any code to select for best cases will result in an overall performance loss.

Notes on testing

A call is via a function that grouped function calls by 400 see below.

Calls are run in strict mode.

Calls have been allowed to run until optimisations complete (eg testing waits till times are stable before logging time)

Device and setup "Chrome Version 70.0.3538.102 (Official Build) (32-bit)" on a "Win10 CPU i7 Q720 @1.6GHz" laptop

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

function call(a, b) {

    var i = 25;

    while (i--) {

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

        clmul_u32(a, b);

    }

}

edited yesterday

answered yesterday

Blindman67

6,3691521

edited yesterday

answered yesterday

Blindman67

6,3691521

answered yesterday

Blindman67

6,3691521

answered yesterday

Blindman67

6,3691521

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Name

This page is only for reference, If you need detailed information, please check here