Why is if (variable1 % variable2 == 0) inefficient?












39















I am new to java, and was running some code last night, and this really bothered me. I was building a simple program to display every X outputs in a for loop, and I noticed a MASSIVE decrease in performance, when I used modulus as variable % variable vs variable % 5000 or whatnot. Can someone explain to me why this is and what is causing it? So I can be better...



Here is the "efficient" code (sorry if I get a bit of syntax wrong I'm not on the computer with the code right now)



long startNum = 0;
long stopNum = 1000000000L;

for (long i = startNum; i <= stopNum; i++){
if (i % 50000 == 0) {
System.out.println(i);
}
}


Here is the "inefficient code"



long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++){
if (i % progressCheck == 0) {
System.out.println(i);
}
}


Mind you I had a date variable to measure the differences, and once it became long enough, the first one took 50ms while the other took 12 seconds or something like that. You may have to increase the stopNum or decrease the progressCheck if your PC is more efficient than mine or what not.



I looked for this question around the web, but I can't find an answer, maybe I'm just not asking it right.



EDIT:
I did not expect my question to be so popular, I appreciate all the answers. I did perform a benchmark on each half in time taken, and the inefficient code took considerably longer, 1/4 of a second vs 10 seconds give or take. Granted they are using println, but they are both doing the same amount, so I wouldn't imagine that would skew it much, especially since the discrepancy is repeatable. As for the answers, since I am new to Java, I will let votes decide for now which answer is best. I will try to pick one by Wednesday.



EDIT2:
I am going to make another test tonight, where instead of modulus, it just increments a variable, and when it reaches progressCheck, it will perform one, and then reset that variable to 0. for a 3rd option.



EDIT3:



I used this code, and below I'll show my results.. Thank you ALL for the wonderful help!



public class Main {

public static void main(String args) {

long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;
final long finalProgressCheck = 50000;
long date;

// using a fixed value
date = System.currentTimeMillis();
for (long i = startNum; i <= stopNum; i++) {
if (i % 50000 == 0) {
System.out.println(i);
}
}
long final1 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
//using a variable
for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
System.out.println(i);
}
}
long final2 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using a final declared variable
for (long i = startNum; i <= stopNum; i++) {
if (i % finalProgressCheck == 0) {
System.out.println(i);
}
}
long final3 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using increments to determine progressCheck
int increment = 0;
for (long i = startNum; i <= stopNum; i++) {
if (increment == 50000) {
System.out.println(i);
increment = 0;
}
increment++;

}

long final4 = System.currentTimeMillis() - date;
System.out.println(
"/nfixed = " + final1 + " ms " + "/nvariable = " + final2 + " ms " + "/nfinal variable = " + final3 + " ms " + "/nincrement = " + final4 + " ms");
}
}


Results:




  • fixed = 1067 ms

  • variable = 8406 ms

  • final variable = 1045 ms

  • increment = 1894 ms










share|improve this question




















  • 14





    I got the same result actually. On my machine, the first loop runs in about 1,5 seconds and the second runs in about 9 seconds. If I add final in front of the progressCheck variable, both run at the same speed again. That leads me to believe that the compiler or the JIT manages to optimize the loop when it knows that progressCheck is constant.

    – marstran
    9 hours ago






  • 10





    Division by a constant can be easily converted to a multiplication by the multiplicative inverse. Division by a variable can't. And a 32-bit division is faster than a 64-bit division on x86

    – phuclv
    9 hours ago






  • 2





    @phuclv note 32-bit division is not a issue here, it is a 64-bit remainder operation in both cases

    – Carlos Heuberger
    9 hours ago






  • 2





    BTW both of those variables can be declared not just final but static final. But this is a very interesting observation. I am up-oting the question

    – Michael Gantman
    9 hours ago






  • 3





    @RobertCotterman if you declare the variable as final, the compiler creates the same bytecode as using the constant (eclipse/Java 11) ((despite using one more memory slot for the variable))

    – Carlos Heuberger
    9 hours ago


















39















I am new to java, and was running some code last night, and this really bothered me. I was building a simple program to display every X outputs in a for loop, and I noticed a MASSIVE decrease in performance, when I used modulus as variable % variable vs variable % 5000 or whatnot. Can someone explain to me why this is and what is causing it? So I can be better...



Here is the "efficient" code (sorry if I get a bit of syntax wrong I'm not on the computer with the code right now)



long startNum = 0;
long stopNum = 1000000000L;

for (long i = startNum; i <= stopNum; i++){
if (i % 50000 == 0) {
System.out.println(i);
}
}


Here is the "inefficient code"



long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++){
if (i % progressCheck == 0) {
System.out.println(i);
}
}


Mind you I had a date variable to measure the differences, and once it became long enough, the first one took 50ms while the other took 12 seconds or something like that. You may have to increase the stopNum or decrease the progressCheck if your PC is more efficient than mine or what not.



I looked for this question around the web, but I can't find an answer, maybe I'm just not asking it right.



EDIT:
I did not expect my question to be so popular, I appreciate all the answers. I did perform a benchmark on each half in time taken, and the inefficient code took considerably longer, 1/4 of a second vs 10 seconds give or take. Granted they are using println, but they are both doing the same amount, so I wouldn't imagine that would skew it much, especially since the discrepancy is repeatable. As for the answers, since I am new to Java, I will let votes decide for now which answer is best. I will try to pick one by Wednesday.



EDIT2:
I am going to make another test tonight, where instead of modulus, it just increments a variable, and when it reaches progressCheck, it will perform one, and then reset that variable to 0. for a 3rd option.



EDIT3:



I used this code, and below I'll show my results.. Thank you ALL for the wonderful help!



public class Main {

public static void main(String args) {

long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;
final long finalProgressCheck = 50000;
long date;

// using a fixed value
date = System.currentTimeMillis();
for (long i = startNum; i <= stopNum; i++) {
if (i % 50000 == 0) {
System.out.println(i);
}
}
long final1 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
//using a variable
for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
System.out.println(i);
}
}
long final2 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using a final declared variable
for (long i = startNum; i <= stopNum; i++) {
if (i % finalProgressCheck == 0) {
System.out.println(i);
}
}
long final3 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using increments to determine progressCheck
int increment = 0;
for (long i = startNum; i <= stopNum; i++) {
if (increment == 50000) {
System.out.println(i);
increment = 0;
}
increment++;

}

long final4 = System.currentTimeMillis() - date;
System.out.println(
"/nfixed = " + final1 + " ms " + "/nvariable = " + final2 + " ms " + "/nfinal variable = " + final3 + " ms " + "/nincrement = " + final4 + " ms");
}
}


Results:




  • fixed = 1067 ms

  • variable = 8406 ms

  • final variable = 1045 ms

  • increment = 1894 ms










share|improve this question




















  • 14





    I got the same result actually. On my machine, the first loop runs in about 1,5 seconds and the second runs in about 9 seconds. If I add final in front of the progressCheck variable, both run at the same speed again. That leads me to believe that the compiler or the JIT manages to optimize the loop when it knows that progressCheck is constant.

    – marstran
    9 hours ago






  • 10





    Division by a constant can be easily converted to a multiplication by the multiplicative inverse. Division by a variable can't. And a 32-bit division is faster than a 64-bit division on x86

    – phuclv
    9 hours ago






  • 2





    @phuclv note 32-bit division is not a issue here, it is a 64-bit remainder operation in both cases

    – Carlos Heuberger
    9 hours ago






  • 2





    BTW both of those variables can be declared not just final but static final. But this is a very interesting observation. I am up-oting the question

    – Michael Gantman
    9 hours ago






  • 3





    @RobertCotterman if you declare the variable as final, the compiler creates the same bytecode as using the constant (eclipse/Java 11) ((despite using one more memory slot for the variable))

    – Carlos Heuberger
    9 hours ago
















39












39








39


6






I am new to java, and was running some code last night, and this really bothered me. I was building a simple program to display every X outputs in a for loop, and I noticed a MASSIVE decrease in performance, when I used modulus as variable % variable vs variable % 5000 or whatnot. Can someone explain to me why this is and what is causing it? So I can be better...



Here is the "efficient" code (sorry if I get a bit of syntax wrong I'm not on the computer with the code right now)



long startNum = 0;
long stopNum = 1000000000L;

for (long i = startNum; i <= stopNum; i++){
if (i % 50000 == 0) {
System.out.println(i);
}
}


Here is the "inefficient code"



long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++){
if (i % progressCheck == 0) {
System.out.println(i);
}
}


Mind you I had a date variable to measure the differences, and once it became long enough, the first one took 50ms while the other took 12 seconds or something like that. You may have to increase the stopNum or decrease the progressCheck if your PC is more efficient than mine or what not.



I looked for this question around the web, but I can't find an answer, maybe I'm just not asking it right.



EDIT:
I did not expect my question to be so popular, I appreciate all the answers. I did perform a benchmark on each half in time taken, and the inefficient code took considerably longer, 1/4 of a second vs 10 seconds give or take. Granted they are using println, but they are both doing the same amount, so I wouldn't imagine that would skew it much, especially since the discrepancy is repeatable. As for the answers, since I am new to Java, I will let votes decide for now which answer is best. I will try to pick one by Wednesday.



EDIT2:
I am going to make another test tonight, where instead of modulus, it just increments a variable, and when it reaches progressCheck, it will perform one, and then reset that variable to 0. for a 3rd option.



EDIT3:



I used this code, and below I'll show my results.. Thank you ALL for the wonderful help!



public class Main {

public static void main(String args) {

long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;
final long finalProgressCheck = 50000;
long date;

// using a fixed value
date = System.currentTimeMillis();
for (long i = startNum; i <= stopNum; i++) {
if (i % 50000 == 0) {
System.out.println(i);
}
}
long final1 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
//using a variable
for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
System.out.println(i);
}
}
long final2 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using a final declared variable
for (long i = startNum; i <= stopNum; i++) {
if (i % finalProgressCheck == 0) {
System.out.println(i);
}
}
long final3 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using increments to determine progressCheck
int increment = 0;
for (long i = startNum; i <= stopNum; i++) {
if (increment == 50000) {
System.out.println(i);
increment = 0;
}
increment++;

}

long final4 = System.currentTimeMillis() - date;
System.out.println(
"/nfixed = " + final1 + " ms " + "/nvariable = " + final2 + " ms " + "/nfinal variable = " + final3 + " ms " + "/nincrement = " + final4 + " ms");
}
}


Results:




  • fixed = 1067 ms

  • variable = 8406 ms

  • final variable = 1045 ms

  • increment = 1894 ms










share|improve this question
















I am new to java, and was running some code last night, and this really bothered me. I was building a simple program to display every X outputs in a for loop, and I noticed a MASSIVE decrease in performance, when I used modulus as variable % variable vs variable % 5000 or whatnot. Can someone explain to me why this is and what is causing it? So I can be better...



Here is the "efficient" code (sorry if I get a bit of syntax wrong I'm not on the computer with the code right now)



long startNum = 0;
long stopNum = 1000000000L;

for (long i = startNum; i <= stopNum; i++){
if (i % 50000 == 0) {
System.out.println(i);
}
}


Here is the "inefficient code"



long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++){
if (i % progressCheck == 0) {
System.out.println(i);
}
}


Mind you I had a date variable to measure the differences, and once it became long enough, the first one took 50ms while the other took 12 seconds or something like that. You may have to increase the stopNum or decrease the progressCheck if your PC is more efficient than mine or what not.



I looked for this question around the web, but I can't find an answer, maybe I'm just not asking it right.



EDIT:
I did not expect my question to be so popular, I appreciate all the answers. I did perform a benchmark on each half in time taken, and the inefficient code took considerably longer, 1/4 of a second vs 10 seconds give or take. Granted they are using println, but they are both doing the same amount, so I wouldn't imagine that would skew it much, especially since the discrepancy is repeatable. As for the answers, since I am new to Java, I will let votes decide for now which answer is best. I will try to pick one by Wednesday.



EDIT2:
I am going to make another test tonight, where instead of modulus, it just increments a variable, and when it reaches progressCheck, it will perform one, and then reset that variable to 0. for a 3rd option.



EDIT3:



I used this code, and below I'll show my results.. Thank you ALL for the wonderful help!



public class Main {

public static void main(String args) {

long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;
final long finalProgressCheck = 50000;
long date;

// using a fixed value
date = System.currentTimeMillis();
for (long i = startNum; i <= stopNum; i++) {
if (i % 50000 == 0) {
System.out.println(i);
}
}
long final1 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
//using a variable
for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
System.out.println(i);
}
}
long final2 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using a final declared variable
for (long i = startNum; i <= stopNum; i++) {
if (i % finalProgressCheck == 0) {
System.out.println(i);
}
}
long final3 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using increments to determine progressCheck
int increment = 0;
for (long i = startNum; i <= stopNum; i++) {
if (increment == 50000) {
System.out.println(i);
increment = 0;
}
increment++;

}

long final4 = System.currentTimeMillis() - date;
System.out.println(
"/nfixed = " + final1 + " ms " + "/nvariable = " + final2 + " ms " + "/nfinal variable = " + final3 + " ms " + "/nincrement = " + final4 + " ms");
}
}


Results:




  • fixed = 1067 ms

  • variable = 8406 ms

  • final variable = 1045 ms

  • increment = 1894 ms







java performance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 4 mins ago







Robert Cotterman

















asked 9 hours ago









Robert CottermanRobert Cotterman

728110




728110








  • 14





    I got the same result actually. On my machine, the first loop runs in about 1,5 seconds and the second runs in about 9 seconds. If I add final in front of the progressCheck variable, both run at the same speed again. That leads me to believe that the compiler or the JIT manages to optimize the loop when it knows that progressCheck is constant.

    – marstran
    9 hours ago






  • 10





    Division by a constant can be easily converted to a multiplication by the multiplicative inverse. Division by a variable can't. And a 32-bit division is faster than a 64-bit division on x86

    – phuclv
    9 hours ago






  • 2





    @phuclv note 32-bit division is not a issue here, it is a 64-bit remainder operation in both cases

    – Carlos Heuberger
    9 hours ago






  • 2





    BTW both of those variables can be declared not just final but static final. But this is a very interesting observation. I am up-oting the question

    – Michael Gantman
    9 hours ago






  • 3





    @RobertCotterman if you declare the variable as final, the compiler creates the same bytecode as using the constant (eclipse/Java 11) ((despite using one more memory slot for the variable))

    – Carlos Heuberger
    9 hours ago
















  • 14





    I got the same result actually. On my machine, the first loop runs in about 1,5 seconds and the second runs in about 9 seconds. If I add final in front of the progressCheck variable, both run at the same speed again. That leads me to believe that the compiler or the JIT manages to optimize the loop when it knows that progressCheck is constant.

    – marstran
    9 hours ago






  • 10





    Division by a constant can be easily converted to a multiplication by the multiplicative inverse. Division by a variable can't. And a 32-bit division is faster than a 64-bit division on x86

    – phuclv
    9 hours ago






  • 2





    @phuclv note 32-bit division is not a issue here, it is a 64-bit remainder operation in both cases

    – Carlos Heuberger
    9 hours ago






  • 2





    BTW both of those variables can be declared not just final but static final. But this is a very interesting observation. I am up-oting the question

    – Michael Gantman
    9 hours ago






  • 3





    @RobertCotterman if you declare the variable as final, the compiler creates the same bytecode as using the constant (eclipse/Java 11) ((despite using one more memory slot for the variable))

    – Carlos Heuberger
    9 hours ago










14




14





I got the same result actually. On my machine, the first loop runs in about 1,5 seconds and the second runs in about 9 seconds. If I add final in front of the progressCheck variable, both run at the same speed again. That leads me to believe that the compiler or the JIT manages to optimize the loop when it knows that progressCheck is constant.

– marstran
9 hours ago





I got the same result actually. On my machine, the first loop runs in about 1,5 seconds and the second runs in about 9 seconds. If I add final in front of the progressCheck variable, both run at the same speed again. That leads me to believe that the compiler or the JIT manages to optimize the loop when it knows that progressCheck is constant.

– marstran
9 hours ago




10




10





Division by a constant can be easily converted to a multiplication by the multiplicative inverse. Division by a variable can't. And a 32-bit division is faster than a 64-bit division on x86

– phuclv
9 hours ago





Division by a constant can be easily converted to a multiplication by the multiplicative inverse. Division by a variable can't. And a 32-bit division is faster than a 64-bit division on x86

– phuclv
9 hours ago




2




2





@phuclv note 32-bit division is not a issue here, it is a 64-bit remainder operation in both cases

– Carlos Heuberger
9 hours ago





@phuclv note 32-bit division is not a issue here, it is a 64-bit remainder operation in both cases

– Carlos Heuberger
9 hours ago




2




2





BTW both of those variables can be declared not just final but static final. But this is a very interesting observation. I am up-oting the question

– Michael Gantman
9 hours ago





BTW both of those variables can be declared not just final but static final. But this is a very interesting observation. I am up-oting the question

– Michael Gantman
9 hours ago




3




3





@RobertCotterman if you declare the variable as final, the compiler creates the same bytecode as using the constant (eclipse/Java 11) ((despite using one more memory slot for the variable))

– Carlos Heuberger
9 hours ago







@RobertCotterman if you declare the variable as final, the compiler creates the same bytecode as using the constant (eclipse/Java 11) ((despite using one more memory slot for the variable))

– Carlos Heuberger
9 hours ago














4 Answers
4






active

oldest

votes


















16














You are measuring the OSR (on-stack replacement) stub.



OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.



OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.



A similar thing happens here, too. While "inneficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.



In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)



However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory:



@State(Scope.Benchmark)
public class Div {

@Benchmark
public void divConst(Blackhole blackhole) {
long startNum = 0;
long stopNum = 100000000L;

for (long i = startNum; i <= stopNum; i++) {
if (i % 50000 == 0) {
blackhole.consume(i);
}
}
}

@Benchmark
public void divVar(Blackhole blackhole) {
long startNum = 0;
long stopNum = 100000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
blackhole.consume(i);
}
}
}
}


And the results:



# Benchmark: bench.Div.divConst

# Run progress: 0,00% complete, ETA 00:00:16
# Fork: 1 of 1
# Warmup Iteration 1: 126,967 ms/op
# Warmup Iteration 2: 105,660 ms/op
# Warmup Iteration 3: 106,205 ms/op
Iteration 1: 105,620 ms/op
Iteration 2: 105,789 ms/op
Iteration 3: 105,915 ms/op
Iteration 4: 105,629 ms/op
Iteration 5: 105,632 ms/op


# Benchmark: bench.Div.divVar

# Run progress: 50,00% complete, ETA 00:00:09
# Fork: 1 of 1
# Warmup Iteration 1: 844,708 ms/op <-- much slower!
# Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
# Warmup Iteration 3: 105,601 ms/op
Iteration 1: 105,570 ms/op
Iteration 2: 105,475 ms/op
Iteration 3: 105,702 ms/op
Iteration 4: 105,535 ms/op
Iteration 5: 105,766 ms/op


The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.






share|improve this answer


























  • I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

    – Marco13
    1 hour ago



















5














As others have noted, the general modulus operation requires a division to be done. In some cases, the division can be replaced (by the compiler) by a multiplication. But both can be slow compared to addition/subtraction. Hence, the best performance can be expected by something along these lines:



long progressCheck = 50000;

long counter = progressCheck;

for (long i = startNum; i <= stopNum; i++){
if (--counter == 0) {
System.out.println(i);
counter = progressCheck;
}
}


Notice that often you don't really want/need modulus, because you know that your loop counter (i) or whatever is only ever incremented by 1, and you really don't care about the actual remainder the modulus will give you, just see if the incrementing-by-one counter hits some value.



Another 'trick' is to use power-of-two values/limits, e.g. progressCheck = 1024;. Modulus a power of two can be quickly calculated via bitwise and, i.e. if ( (i & (1024-1)) == 0 ) {...}. This should be pretty fast too, and may on some architectures outperform the explicit counter above.






share|improve this answer
























  • A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

    – Peter Cordes
    1 hour ago






  • 1





    But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

    – Peter Cordes
    1 hour ago



















4














In follow-up to @phuclv comment, I checked the code generated by JIT1, the results are as follows:



for variable % 5000 (division by constant):



mov     rax,29f16b11c6d1e109h
imul rbx
mov r10,rbx
sar r10,3fh
sar rdx,0dh
sub rdx,r10
imul r10,rdx,0c350h ; <-- imul
mov r11,rbx
sub r11,r10
test r11,r11
jne 1d707ad14a0h


for variable % variable (by variable):



mov     rax,r14
mov rdx,8000000000000000h
cmp rax,rdx
jne 22ccce218edh
xor edx,edx
cmp rbx,0ffffffffffffffffh
je 22ccce218f2h
cqo
idiv rax,rbx ; <-- idiv
test rdx,rdx
jne 22ccce218c0h


Because division always takes longer than multiplication, the last code snippet is less performant.



Java version:



java version "11" 2018-09-25
Java(TM) SE Runtime Environment 18.9 (build 11+28)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)




1 - VM options used: -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,src/java/Main.main






share|improve this answer

































    1














    I also surprised by seeing the performance of the above codes. Its all about the time taken by the compiler for executing the program as per the declared variable. In the second code i.e. inefficient one :



    for (long i = startNum; i <= stopNum; i++){
    if (i % progressCheck == 0) {
    System.out.println(i)
    }
    }


    you are performing the modulus operation between two variable. Here compiler have to check the value of stopNum and progressCheck to go to the specific memory block located for these variable every time after each iteration because it is a variable and its value might be change. Thats why after each iteration compiler went to the memory location to check the latest value of the variables. Therefore at the compile time compiler not able to create efficient byte code. In the first code , you are performing modulus operator between a variable and a constant numeric value which is not going to change within execution and compiler no need to check the value of that numeric value from the memory location. Thats why compiler able to create efficient byte code. If you declare progressCheck as a final/final static variable then at the time of run-time/compile-time compiler know that its a final variable and its value not going to change then compiler replace the progressCheck with 50000 in code :



    for (long i = startNum; i <= stopNum; i++){
    if (i % 50000== 0) {
    System.out.println(i)
    }
    }


    Now you can notice that this code is also look like the first code i.e. efficient one. The performance of first code and as we mentioned above both code will work efficiently. There will not be much difference in execution time of both code.






    share|improve this answer








    New contributor




    Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.




















      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54405842%2fwhy-is-if-variable1-variable2-0-inefficient%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      16














      You are measuring the OSR (on-stack replacement) stub.



      OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.



      OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.



      A similar thing happens here, too. While "inneficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.



      In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)



      However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory:



      @State(Scope.Benchmark)
      public class Div {

      @Benchmark
      public void divConst(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % 50000 == 0) {
      blackhole.consume(i);
      }
      }
      }

      @Benchmark
      public void divVar(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;
      long progressCheck = 50000;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % progressCheck == 0) {
      blackhole.consume(i);
      }
      }
      }
      }


      And the results:



      # Benchmark: bench.Div.divConst

      # Run progress: 0,00% complete, ETA 00:00:16
      # Fork: 1 of 1
      # Warmup Iteration 1: 126,967 ms/op
      # Warmup Iteration 2: 105,660 ms/op
      # Warmup Iteration 3: 106,205 ms/op
      Iteration 1: 105,620 ms/op
      Iteration 2: 105,789 ms/op
      Iteration 3: 105,915 ms/op
      Iteration 4: 105,629 ms/op
      Iteration 5: 105,632 ms/op


      # Benchmark: bench.Div.divVar

      # Run progress: 50,00% complete, ETA 00:00:09
      # Fork: 1 of 1
      # Warmup Iteration 1: 844,708 ms/op <-- much slower!
      # Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
      # Warmup Iteration 3: 105,601 ms/op
      Iteration 1: 105,570 ms/op
      Iteration 2: 105,475 ms/op
      Iteration 3: 105,702 ms/op
      Iteration 4: 105,535 ms/op
      Iteration 5: 105,766 ms/op


      The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.






      share|improve this answer


























      • I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

        – Marco13
        1 hour ago
















      16














      You are measuring the OSR (on-stack replacement) stub.



      OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.



      OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.



      A similar thing happens here, too. While "inneficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.



      In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)



      However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory:



      @State(Scope.Benchmark)
      public class Div {

      @Benchmark
      public void divConst(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % 50000 == 0) {
      blackhole.consume(i);
      }
      }
      }

      @Benchmark
      public void divVar(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;
      long progressCheck = 50000;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % progressCheck == 0) {
      blackhole.consume(i);
      }
      }
      }
      }


      And the results:



      # Benchmark: bench.Div.divConst

      # Run progress: 0,00% complete, ETA 00:00:16
      # Fork: 1 of 1
      # Warmup Iteration 1: 126,967 ms/op
      # Warmup Iteration 2: 105,660 ms/op
      # Warmup Iteration 3: 106,205 ms/op
      Iteration 1: 105,620 ms/op
      Iteration 2: 105,789 ms/op
      Iteration 3: 105,915 ms/op
      Iteration 4: 105,629 ms/op
      Iteration 5: 105,632 ms/op


      # Benchmark: bench.Div.divVar

      # Run progress: 50,00% complete, ETA 00:00:09
      # Fork: 1 of 1
      # Warmup Iteration 1: 844,708 ms/op <-- much slower!
      # Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
      # Warmup Iteration 3: 105,601 ms/op
      Iteration 1: 105,570 ms/op
      Iteration 2: 105,475 ms/op
      Iteration 3: 105,702 ms/op
      Iteration 4: 105,535 ms/op
      Iteration 5: 105,766 ms/op


      The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.






      share|improve this answer


























      • I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

        – Marco13
        1 hour ago














      16












      16








      16







      You are measuring the OSR (on-stack replacement) stub.



      OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.



      OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.



      A similar thing happens here, too. While "inneficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.



      In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)



      However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory:



      @State(Scope.Benchmark)
      public class Div {

      @Benchmark
      public void divConst(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % 50000 == 0) {
      blackhole.consume(i);
      }
      }
      }

      @Benchmark
      public void divVar(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;
      long progressCheck = 50000;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % progressCheck == 0) {
      blackhole.consume(i);
      }
      }
      }
      }


      And the results:



      # Benchmark: bench.Div.divConst

      # Run progress: 0,00% complete, ETA 00:00:16
      # Fork: 1 of 1
      # Warmup Iteration 1: 126,967 ms/op
      # Warmup Iteration 2: 105,660 ms/op
      # Warmup Iteration 3: 106,205 ms/op
      Iteration 1: 105,620 ms/op
      Iteration 2: 105,789 ms/op
      Iteration 3: 105,915 ms/op
      Iteration 4: 105,629 ms/op
      Iteration 5: 105,632 ms/op


      # Benchmark: bench.Div.divVar

      # Run progress: 50,00% complete, ETA 00:00:09
      # Fork: 1 of 1
      # Warmup Iteration 1: 844,708 ms/op <-- much slower!
      # Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
      # Warmup Iteration 3: 105,601 ms/op
      Iteration 1: 105,570 ms/op
      Iteration 2: 105,475 ms/op
      Iteration 3: 105,702 ms/op
      Iteration 4: 105,535 ms/op
      Iteration 5: 105,766 ms/op


      The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.






      share|improve this answer















      You are measuring the OSR (on-stack replacement) stub.



      OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.



      OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.



      A similar thing happens here, too. While "inneficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.



      In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)



      However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory:



      @State(Scope.Benchmark)
      public class Div {

      @Benchmark
      public void divConst(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % 50000 == 0) {
      blackhole.consume(i);
      }
      }
      }

      @Benchmark
      public void divVar(Blackhole blackhole) {
      long startNum = 0;
      long stopNum = 100000000L;
      long progressCheck = 50000;

      for (long i = startNum; i <= stopNum; i++) {
      if (i % progressCheck == 0) {
      blackhole.consume(i);
      }
      }
      }
      }


      And the results:



      # Benchmark: bench.Div.divConst

      # Run progress: 0,00% complete, ETA 00:00:16
      # Fork: 1 of 1
      # Warmup Iteration 1: 126,967 ms/op
      # Warmup Iteration 2: 105,660 ms/op
      # Warmup Iteration 3: 106,205 ms/op
      Iteration 1: 105,620 ms/op
      Iteration 2: 105,789 ms/op
      Iteration 3: 105,915 ms/op
      Iteration 4: 105,629 ms/op
      Iteration 5: 105,632 ms/op


      # Benchmark: bench.Div.divVar

      # Run progress: 50,00% complete, ETA 00:00:09
      # Fork: 1 of 1
      # Warmup Iteration 1: 844,708 ms/op <-- much slower!
      # Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
      # Warmup Iteration 3: 105,601 ms/op
      Iteration 1: 105,570 ms/op
      Iteration 2: 105,475 ms/op
      Iteration 3: 105,702 ms/op
      Iteration 4: 105,535 ms/op
      Iteration 5: 105,766 ms/op


      The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 1 hour ago









      Peter Cordes

      123k18187314




      123k18187314










      answered 4 hours ago









      apanginapangin

      50.2k796127




      50.2k796127













      • I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

        – Marco13
        1 hour ago



















      • I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

        – Marco13
        1 hour ago

















      I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

      – Marco13
      1 hour ago





      I hesitate to vote on this. On the one hand, it sounds like an elaborate way of saying "You messed up your benchmark, read something about JIT". On the other hand, I wonder why you seem to be so sure that OSR was the main relevant point here. I mean, doing a (micro) benchmark that involves System.out.println will nearly necessarily produce garbage results, and the fact that both versions are equally fast does not have to do anything with OSR in particular, as far as I can tell..

      – Marco13
      1 hour ago













      5














      As others have noted, the general modulus operation requires a division to be done. In some cases, the division can be replaced (by the compiler) by a multiplication. But both can be slow compared to addition/subtraction. Hence, the best performance can be expected by something along these lines:



      long progressCheck = 50000;

      long counter = progressCheck;

      for (long i = startNum; i <= stopNum; i++){
      if (--counter == 0) {
      System.out.println(i);
      counter = progressCheck;
      }
      }


      Notice that often you don't really want/need modulus, because you know that your loop counter (i) or whatever is only ever incremented by 1, and you really don't care about the actual remainder the modulus will give you, just see if the incrementing-by-one counter hits some value.



      Another 'trick' is to use power-of-two values/limits, e.g. progressCheck = 1024;. Modulus a power of two can be quickly calculated via bitwise and, i.e. if ( (i & (1024-1)) == 0 ) {...}. This should be pretty fast too, and may on some architectures outperform the explicit counter above.






      share|improve this answer
























      • A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

        – Peter Cordes
        1 hour ago






      • 1





        But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

        – Peter Cordes
        1 hour ago
















      5














      As others have noted, the general modulus operation requires a division to be done. In some cases, the division can be replaced (by the compiler) by a multiplication. But both can be slow compared to addition/subtraction. Hence, the best performance can be expected by something along these lines:



      long progressCheck = 50000;

      long counter = progressCheck;

      for (long i = startNum; i <= stopNum; i++){
      if (--counter == 0) {
      System.out.println(i);
      counter = progressCheck;
      }
      }


      Notice that often you don't really want/need modulus, because you know that your loop counter (i) or whatever is only ever incremented by 1, and you really don't care about the actual remainder the modulus will give you, just see if the incrementing-by-one counter hits some value.



      Another 'trick' is to use power-of-two values/limits, e.g. progressCheck = 1024;. Modulus a power of two can be quickly calculated via bitwise and, i.e. if ( (i & (1024-1)) == 0 ) {...}. This should be pretty fast too, and may on some architectures outperform the explicit counter above.






      share|improve this answer
























      • A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

        – Peter Cordes
        1 hour ago






      • 1





        But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

        – Peter Cordes
        1 hour ago














      5












      5








      5







      As others have noted, the general modulus operation requires a division to be done. In some cases, the division can be replaced (by the compiler) by a multiplication. But both can be slow compared to addition/subtraction. Hence, the best performance can be expected by something along these lines:



      long progressCheck = 50000;

      long counter = progressCheck;

      for (long i = startNum; i <= stopNum; i++){
      if (--counter == 0) {
      System.out.println(i);
      counter = progressCheck;
      }
      }


      Notice that often you don't really want/need modulus, because you know that your loop counter (i) or whatever is only ever incremented by 1, and you really don't care about the actual remainder the modulus will give you, just see if the incrementing-by-one counter hits some value.



      Another 'trick' is to use power-of-two values/limits, e.g. progressCheck = 1024;. Modulus a power of two can be quickly calculated via bitwise and, i.e. if ( (i & (1024-1)) == 0 ) {...}. This should be pretty fast too, and may on some architectures outperform the explicit counter above.






      share|improve this answer













      As others have noted, the general modulus operation requires a division to be done. In some cases, the division can be replaced (by the compiler) by a multiplication. But both can be slow compared to addition/subtraction. Hence, the best performance can be expected by something along these lines:



      long progressCheck = 50000;

      long counter = progressCheck;

      for (long i = startNum; i <= stopNum; i++){
      if (--counter == 0) {
      System.out.println(i);
      counter = progressCheck;
      }
      }


      Notice that often you don't really want/need modulus, because you know that your loop counter (i) or whatever is only ever incremented by 1, and you really don't care about the actual remainder the modulus will give you, just see if the incrementing-by-one counter hits some value.



      Another 'trick' is to use power-of-two values/limits, e.g. progressCheck = 1024;. Modulus a power of two can be quickly calculated via bitwise and, i.e. if ( (i & (1024-1)) == 0 ) {...}. This should be pretty fast too, and may on some architectures outperform the explicit counter above.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 3 hours ago









      JimmyBJimmyB

      9,25311637




      9,25311637













      • A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

        – Peter Cordes
        1 hour ago






      • 1





        But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

        – Peter Cordes
        1 hour ago



















      • A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

        – Peter Cordes
        1 hour ago






      • 1





        But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

        – Peter Cordes
        1 hour ago

















      A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

      – Peter Cordes
      1 hour ago





      A smart compiler would invert the loops here. Or you could do that in the source. The if() body becomes an outer-loop body, and the stuff outside the if() becomes an inner loop body that runs for min(progressCheck, stopNum-i) iterations. So at the start, and every time counter reaches 0, you do long next_stop = i + min(progressCheck, stopNum-i); to set up for a for(; i< next_stop; i++) {} loop. In this case that inner loop is empty and should hopefully optimize away entirely, of you can do that in the source and make it easy for the JITer, reducing your loop to i+=50k.

      – Peter Cordes
      1 hour ago




      1




      1





      But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

      – Peter Cordes
      1 hour ago





      But yes, in general a down-counter is a good efficient technique for fizzbuzz / progresscheck type stuff.

      – Peter Cordes
      1 hour ago











      4














      In follow-up to @phuclv comment, I checked the code generated by JIT1, the results are as follows:



      for variable % 5000 (division by constant):



      mov     rax,29f16b11c6d1e109h
      imul rbx
      mov r10,rbx
      sar r10,3fh
      sar rdx,0dh
      sub rdx,r10
      imul r10,rdx,0c350h ; <-- imul
      mov r11,rbx
      sub r11,r10
      test r11,r11
      jne 1d707ad14a0h


      for variable % variable (by variable):



      mov     rax,r14
      mov rdx,8000000000000000h
      cmp rax,rdx
      jne 22ccce218edh
      xor edx,edx
      cmp rbx,0ffffffffffffffffh
      je 22ccce218f2h
      cqo
      idiv rax,rbx ; <-- idiv
      test rdx,rdx
      jne 22ccce218c0h


      Because division always takes longer than multiplication, the last code snippet is less performant.



      Java version:



      java version "11" 2018-09-25
      Java(TM) SE Runtime Environment 18.9 (build 11+28)
      Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)




      1 - VM options used: -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,src/java/Main.main






      share|improve this answer






























        4














        In follow-up to @phuclv comment, I checked the code generated by JIT1, the results are as follows:



        for variable % 5000 (division by constant):



        mov     rax,29f16b11c6d1e109h
        imul rbx
        mov r10,rbx
        sar r10,3fh
        sar rdx,0dh
        sub rdx,r10
        imul r10,rdx,0c350h ; <-- imul
        mov r11,rbx
        sub r11,r10
        test r11,r11
        jne 1d707ad14a0h


        for variable % variable (by variable):



        mov     rax,r14
        mov rdx,8000000000000000h
        cmp rax,rdx
        jne 22ccce218edh
        xor edx,edx
        cmp rbx,0ffffffffffffffffh
        je 22ccce218f2h
        cqo
        idiv rax,rbx ; <-- idiv
        test rdx,rdx
        jne 22ccce218c0h


        Because division always takes longer than multiplication, the last code snippet is less performant.



        Java version:



        java version "11" 2018-09-25
        Java(TM) SE Runtime Environment 18.9 (build 11+28)
        Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)




        1 - VM options used: -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,src/java/Main.main






        share|improve this answer




























          4












          4








          4







          In follow-up to @phuclv comment, I checked the code generated by JIT1, the results are as follows:



          for variable % 5000 (division by constant):



          mov     rax,29f16b11c6d1e109h
          imul rbx
          mov r10,rbx
          sar r10,3fh
          sar rdx,0dh
          sub rdx,r10
          imul r10,rdx,0c350h ; <-- imul
          mov r11,rbx
          sub r11,r10
          test r11,r11
          jne 1d707ad14a0h


          for variable % variable (by variable):



          mov     rax,r14
          mov rdx,8000000000000000h
          cmp rax,rdx
          jne 22ccce218edh
          xor edx,edx
          cmp rbx,0ffffffffffffffffh
          je 22ccce218f2h
          cqo
          idiv rax,rbx ; <-- idiv
          test rdx,rdx
          jne 22ccce218c0h


          Because division always takes longer than multiplication, the last code snippet is less performant.



          Java version:



          java version "11" 2018-09-25
          Java(TM) SE Runtime Environment 18.9 (build 11+28)
          Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)




          1 - VM options used: -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,src/java/Main.main






          share|improve this answer















          In follow-up to @phuclv comment, I checked the code generated by JIT1, the results are as follows:



          for variable % 5000 (division by constant):



          mov     rax,29f16b11c6d1e109h
          imul rbx
          mov r10,rbx
          sar r10,3fh
          sar rdx,0dh
          sub rdx,r10
          imul r10,rdx,0c350h ; <-- imul
          mov r11,rbx
          sub r11,r10
          test r11,r11
          jne 1d707ad14a0h


          for variable % variable (by variable):



          mov     rax,r14
          mov rdx,8000000000000000h
          cmp rax,rdx
          jne 22ccce218edh
          xor edx,edx
          cmp rbx,0ffffffffffffffffh
          je 22ccce218f2h
          cqo
          idiv rax,rbx ; <-- idiv
          test rdx,rdx
          jne 22ccce218c0h


          Because division always takes longer than multiplication, the last code snippet is less performant.



          Java version:



          java version "11" 2018-09-25
          Java(TM) SE Runtime Environment 18.9 (build 11+28)
          Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)




          1 - VM options used: -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,src/java/Main.main







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 2 hours ago

























          answered 4 hours ago









          OleksandrOleksandr

          8,93643770




          8,93643770























              1














              I also surprised by seeing the performance of the above codes. Its all about the time taken by the compiler for executing the program as per the declared variable. In the second code i.e. inefficient one :



              for (long i = startNum; i <= stopNum; i++){
              if (i % progressCheck == 0) {
              System.out.println(i)
              }
              }


              you are performing the modulus operation between two variable. Here compiler have to check the value of stopNum and progressCheck to go to the specific memory block located for these variable every time after each iteration because it is a variable and its value might be change. Thats why after each iteration compiler went to the memory location to check the latest value of the variables. Therefore at the compile time compiler not able to create efficient byte code. In the first code , you are performing modulus operator between a variable and a constant numeric value which is not going to change within execution and compiler no need to check the value of that numeric value from the memory location. Thats why compiler able to create efficient byte code. If you declare progressCheck as a final/final static variable then at the time of run-time/compile-time compiler know that its a final variable and its value not going to change then compiler replace the progressCheck with 50000 in code :



              for (long i = startNum; i <= stopNum; i++){
              if (i % 50000== 0) {
              System.out.println(i)
              }
              }


              Now you can notice that this code is also look like the first code i.e. efficient one. The performance of first code and as we mentioned above both code will work efficiently. There will not be much difference in execution time of both code.






              share|improve this answer








              New contributor




              Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.

























                1














                I also surprised by seeing the performance of the above codes. Its all about the time taken by the compiler for executing the program as per the declared variable. In the second code i.e. inefficient one :



                for (long i = startNum; i <= stopNum; i++){
                if (i % progressCheck == 0) {
                System.out.println(i)
                }
                }


                you are performing the modulus operation between two variable. Here compiler have to check the value of stopNum and progressCheck to go to the specific memory block located for these variable every time after each iteration because it is a variable and its value might be change. Thats why after each iteration compiler went to the memory location to check the latest value of the variables. Therefore at the compile time compiler not able to create efficient byte code. In the first code , you are performing modulus operator between a variable and a constant numeric value which is not going to change within execution and compiler no need to check the value of that numeric value from the memory location. Thats why compiler able to create efficient byte code. If you declare progressCheck as a final/final static variable then at the time of run-time/compile-time compiler know that its a final variable and its value not going to change then compiler replace the progressCheck with 50000 in code :



                for (long i = startNum; i <= stopNum; i++){
                if (i % 50000== 0) {
                System.out.println(i)
                }
                }


                Now you can notice that this code is also look like the first code i.e. efficient one. The performance of first code and as we mentioned above both code will work efficiently. There will not be much difference in execution time of both code.






                share|improve this answer








                New contributor




                Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.























                  1












                  1








                  1







                  I also surprised by seeing the performance of the above codes. Its all about the time taken by the compiler for executing the program as per the declared variable. In the second code i.e. inefficient one :



                  for (long i = startNum; i <= stopNum; i++){
                  if (i % progressCheck == 0) {
                  System.out.println(i)
                  }
                  }


                  you are performing the modulus operation between two variable. Here compiler have to check the value of stopNum and progressCheck to go to the specific memory block located for these variable every time after each iteration because it is a variable and its value might be change. Thats why after each iteration compiler went to the memory location to check the latest value of the variables. Therefore at the compile time compiler not able to create efficient byte code. In the first code , you are performing modulus operator between a variable and a constant numeric value which is not going to change within execution and compiler no need to check the value of that numeric value from the memory location. Thats why compiler able to create efficient byte code. If you declare progressCheck as a final/final static variable then at the time of run-time/compile-time compiler know that its a final variable and its value not going to change then compiler replace the progressCheck with 50000 in code :



                  for (long i = startNum; i <= stopNum; i++){
                  if (i % 50000== 0) {
                  System.out.println(i)
                  }
                  }


                  Now you can notice that this code is also look like the first code i.e. efficient one. The performance of first code and as we mentioned above both code will work efficiently. There will not be much difference in execution time of both code.






                  share|improve this answer








                  New contributor




                  Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.










                  I also surprised by seeing the performance of the above codes. Its all about the time taken by the compiler for executing the program as per the declared variable. In the second code i.e. inefficient one :



                  for (long i = startNum; i <= stopNum; i++){
                  if (i % progressCheck == 0) {
                  System.out.println(i)
                  }
                  }


                  you are performing the modulus operation between two variable. Here compiler have to check the value of stopNum and progressCheck to go to the specific memory block located for these variable every time after each iteration because it is a variable and its value might be change. Thats why after each iteration compiler went to the memory location to check the latest value of the variables. Therefore at the compile time compiler not able to create efficient byte code. In the first code , you are performing modulus operator between a variable and a constant numeric value which is not going to change within execution and compiler no need to check the value of that numeric value from the memory location. Thats why compiler able to create efficient byte code. If you declare progressCheck as a final/final static variable then at the time of run-time/compile-time compiler know that its a final variable and its value not going to change then compiler replace the progressCheck with 50000 in code :



                  for (long i = startNum; i <= stopNum; i++){
                  if (i % 50000== 0) {
                  System.out.println(i)
                  }
                  }


                  Now you can notice that this code is also look like the first code i.e. efficient one. The performance of first code and as we mentioned above both code will work efficiently. There will not be much difference in execution time of both code.







                  share|improve this answer








                  New contributor




                  Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.









                  share|improve this answer



                  share|improve this answer






                  New contributor




                  Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.









                  answered 5 hours ago









                  Bishal DubeyBishal Dubey

                  311




                  311




                  New contributor




                  Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.





                  New contributor





                  Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.






                  Bishal Dubey is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54405842%2fwhy-is-if-variable1-variable2-0-inefficient%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Terni

                      A new problem with tex4ht and tikz

                      Sun Ra