Very slow frame rate in C++ N-body simulation using Barnes-Hutt and RK4

I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.

My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.

Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.

Here is some of my code:

Tree class:

class tree

{

public:

    obj* body;  // obj class stores the mass, velocity, position ans acceleration of an object

    bool parent = false;

    double total_mass;

    region R; // region class defines boundaaries and can check if an object is in them

    vect moment;





    tree* nw, *ne, *sw, *se; // the children nodes of the tree



    tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}



    tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)

    {

        for (int i = 0; i < a.chest.size(); i++)

        {

            insert(a.chest[i]);  // this constructer id called for the root node only

        }

    }



    ~tree()

    {

        delete nw;

        delete sw;

        delete ne;

        delete se;

    }



    void insert(obj* i)

    {

        if (!R.in(i->pos))  // cant insert into a region its not in

        {

            return;

        }

        else

        {

            if (body == nullptr)  // no other bodies in this region

            {

                if (!parent)

                {

                    body = i;   // region is not divides so you can insert particle here

                }

                else

                {

                    nw->insert(i);  //region is divided so try to insert into children nodes

                    ne->insert(i);

                    sw->insert(i);

                    se->insert(i);

                }

            }

            else      // trying to have more than one particle in a node

            {



                divide();  // splits node into its childrem



                nw->insert(i);

                ne->insert(i);

                sw->insert(i);

                se->insert(i);

                nw->insert(body);  // insert current object and the object that was previouly in the parent node into the children nodes

                ne->insert(body);

                sw->insert(body);

                se->insert(body);



                body = nullptr; // no longer bodies in the node

            }

            total_mass += i->mass;

            moment += (i->pos) * (i->mass);

        }

    }



    void divide()

    {

        double l = R.length / 2;



        nw = new tree(R.point + vect(0, l), l);

        ne = new tree(R.point + vect(l, l), l);

        sw = new tree(R.point + vect(0, 0), l);

        se = new tree(R.point + vect(l, 0), l);



        parent = true;

    }



    vect COM()

    {



        return moment / total_mass;



    }

};

Accelerator:

constexpr double theta = 0.5; //theta criterion

double G = 1 * pow(10,-11); // gravitational constant



void accelerate(obj& i, tree& t)

{

    vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node

    if (!t.parent) //checks if node is undivided

    {

        i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2(); 

    }//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle



    else

    {

        if (t.R.in(i.pos) || t.R.length / r.mag() > theta)

        {

            accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children

            accelerate(i, *t.ne);

            accelerate(i, *t.sw);

            accelerate(i, *t.se);

        }

        else

        {

            i.a += r.norm() * G * t.total_mass / r.mag2();

        }

    }

}

RK4:

void move(cloud& a)  // cloud class stores an array of pointers to objects

{

    tree* t1 = new tree(a); //adds objects in cloud to a new tree

    for (obj* i : a.chest) //chest is the array of pointer to objects

    {

        accelerate(*i, *t1); //uses tree to refresh the accelration of the particle

        i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

        i->tv = i->pv = i->v;

        i->ta = i->pa = i->a;



        vect dr1 = i->v * h;

        vect dv1 = i->a * h;



        i->pos = i->ppos + dr1 / 2;

        i->v = i->pv + dv1 / 2;

        i->tpos += dr1 / 6;

        i->tv += dv1 / 6;

    }



    delete t1;

    tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration



    for (obj* i : a.chest)

    {

        accelerate(*i, *t2);



        vect dr2 = i->v * h;

        vect dv2 = i->a * h;



        i->pos = i->ppos + dr2 / 2;

        i->v = i->pv + dv2 / 2;

        i->tpos += dr2 / 3;

        i->tv += dv2 / 3;

    }



    delete t2;

    tree* t3 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t3);



        vect dr3 = i->v * h;

        vect dv3 = i->a * h;



        i->pos = i->ppos + dr3;

        i->v = i->pv + dv3;

        i->tpos += dr3 / 3;

        i->tv += dv3 / 3;

    }



    delete t3;

    tree* t4 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t4);



        vect dr4 = i->v * h;

        vect dv4 = i->a * h;



        i->tpos += dr4 / 6;

        i->tv += dv4 / 6;



        i->pos = i->tpos;

        i->v = i->tv;

        i->a = i->pa;



    }



    delete t4;

}

Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?

edited 2 days ago

200_success

128k15152413

asked 2 days ago

Callum

183

New contributor

2

There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
2 days ago

(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
2 days ago

Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
2 days ago

1

Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
2 days ago

The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
2 days ago

|
show 4 more comments

I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.

Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.

Here is some of my code:

Tree class:

class tree

{

public:

    obj* body;  // obj class stores the mass, velocity, position ans acceleration of an object

    bool parent = false;

    double total_mass;

    region R; // region class defines boundaaries and can check if an object is in them

    vect moment;





    tree* nw, *ne, *sw, *se; // the children nodes of the tree



    tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}



    tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)

    {

        for (int i = 0; i < a.chest.size(); i++)

        {

            insert(a.chest[i]);  // this constructer id called for the root node only

        }

    }



    ~tree()

    {

        delete nw;

        delete sw;

        delete ne;

        delete se;

    }



    void insert(obj* i)

    {

        if (!R.in(i->pos))  // cant insert into a region its not in

        {

            return;

        }

        else

        {

            if (body == nullptr)  // no other bodies in this region

            {

                if (!parent)

                {

                    body = i;   // region is not divides so you can insert particle here

                }

                else

                {

                    nw->insert(i);  //region is divided so try to insert into children nodes

                    ne->insert(i);

                    sw->insert(i);

                    se->insert(i);

                }

            }

            else      // trying to have more than one particle in a node

            {



                divide();  // splits node into its childrem



                nw->insert(i);

                ne->insert(i);

                sw->insert(i);

                se->insert(i);

                nw->insert(body);  // insert current object and the object that was previouly in the parent node into the children nodes

                ne->insert(body);

                sw->insert(body);

                se->insert(body);



                body = nullptr; // no longer bodies in the node

            }

            total_mass += i->mass;

            moment += (i->pos) * (i->mass);

        }

    }



    void divide()

    {

        double l = R.length / 2;



        nw = new tree(R.point + vect(0, l), l);

        ne = new tree(R.point + vect(l, l), l);

        sw = new tree(R.point + vect(0, 0), l);

        se = new tree(R.point + vect(l, 0), l);



        parent = true;

    }



    vect COM()

    {



        return moment / total_mass;



    }

};

Accelerator:

constexpr double theta = 0.5; //theta criterion

double G = 1 * pow(10,-11); // gravitational constant



void accelerate(obj& i, tree& t)

{

    vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node

    if (!t.parent) //checks if node is undivided

    {

        i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2(); 

    }//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle



    else

    {

        if (t.R.in(i.pos) || t.R.length / r.mag() > theta)

        {

            accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children

            accelerate(i, *t.ne);

            accelerate(i, *t.sw);

            accelerate(i, *t.se);

        }

        else

        {

            i.a += r.norm() * G * t.total_mass / r.mag2();

        }

    }

}

RK4:

void move(cloud& a)  // cloud class stores an array of pointers to objects

{

    tree* t1 = new tree(a); //adds objects in cloud to a new tree

    for (obj* i : a.chest) //chest is the array of pointer to objects

    {

        accelerate(*i, *t1); //uses tree to refresh the accelration of the particle

        i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

        i->tv = i->pv = i->v;

        i->ta = i->pa = i->a;



        vect dr1 = i->v * h;

        vect dv1 = i->a * h;



        i->pos = i->ppos + dr1 / 2;

        i->v = i->pv + dv1 / 2;

        i->tpos += dr1 / 6;

        i->tv += dv1 / 6;

    }



    delete t1;

    tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration



    for (obj* i : a.chest)

    {

        accelerate(*i, *t2);



        vect dr2 = i->v * h;

        vect dv2 = i->a * h;



        i->pos = i->ppos + dr2 / 2;

        i->v = i->pv + dv2 / 2;

        i->tpos += dr2 / 3;

        i->tv += dv2 / 3;

    }



    delete t2;

    tree* t3 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t3);



        vect dr3 = i->v * h;

        vect dv3 = i->a * h;



        i->pos = i->ppos + dr3;

        i->v = i->pv + dv3;

        i->tpos += dr3 / 3;

        i->tv += dv3 / 3;

    }



    delete t3;

    tree* t4 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t4);



        vect dr4 = i->v * h;

        vect dv4 = i->a * h;



        i->tpos += dr4 / 6;

        i->tv += dv4 / 6;



        i->pos = i->tpos;

        i->v = i->tv;

        i->a = i->pa;



    }



    delete t4;

}

Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?

edited 2 days ago

200_success

128k15152413

asked 2 days ago

Callum

183

New contributor

2

There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
2 days ago

(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
2 days ago

Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
2 days ago

1

Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
2 days ago

The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
2 days ago

|
show 4 more comments

I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.

Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.

Here is some of my code:

Tree class:

class tree

{

public:

    obj* body;  // obj class stores the mass, velocity, position ans acceleration of an object

    bool parent = false;

    double total_mass;

    region R; // region class defines boundaaries and can check if an object is in them

    vect moment;





    tree* nw, *ne, *sw, *se; // the children nodes of the tree



    tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}



    tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)

    {

        for (int i = 0; i < a.chest.size(); i++)

        {

            insert(a.chest[i]);  // this constructer id called for the root node only

        }

    }



    ~tree()

    {

        delete nw;

        delete sw;

        delete ne;

        delete se;

    }



    void insert(obj* i)

    {

        if (!R.in(i->pos))  // cant insert into a region its not in

        {

            return;

        }

        else

        {

            if (body == nullptr)  // no other bodies in this region

            {

                if (!parent)

                {

                    body = i;   // region is not divides so you can insert particle here

                }

                else

                {

                    nw->insert(i);  //region is divided so try to insert into children nodes

                    ne->insert(i);

                    sw->insert(i);

                    se->insert(i);

                }

            }

            else      // trying to have more than one particle in a node

            {



                divide();  // splits node into its childrem



                nw->insert(i);

                ne->insert(i);

                sw->insert(i);

                se->insert(i);

                nw->insert(body);  // insert current object and the object that was previouly in the parent node into the children nodes

                ne->insert(body);

                sw->insert(body);

                se->insert(body);



                body = nullptr; // no longer bodies in the node

            }

            total_mass += i->mass;

            moment += (i->pos) * (i->mass);

        }

    }



    void divide()

    {

        double l = R.length / 2;



        nw = new tree(R.point + vect(0, l), l);

        ne = new tree(R.point + vect(l, l), l);

        sw = new tree(R.point + vect(0, 0), l);

        se = new tree(R.point + vect(l, 0), l);



        parent = true;

    }



    vect COM()

    {



        return moment / total_mass;



    }

};

Accelerator:

constexpr double theta = 0.5; //theta criterion

double G = 1 * pow(10,-11); // gravitational constant



void accelerate(obj& i, tree& t)

{

    vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node

    if (!t.parent) //checks if node is undivided

    {

        i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2(); 

    }//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle



    else

    {

        if (t.R.in(i.pos) || t.R.length / r.mag() > theta)

        {

            accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children

            accelerate(i, *t.ne);

            accelerate(i, *t.sw);

            accelerate(i, *t.se);

        }

        else

        {

            i.a += r.norm() * G * t.total_mass / r.mag2();

        }

    }

}

RK4:

void move(cloud& a)  // cloud class stores an array of pointers to objects

{

    tree* t1 = new tree(a); //adds objects in cloud to a new tree

    for (obj* i : a.chest) //chest is the array of pointer to objects

    {

        accelerate(*i, *t1); //uses tree to refresh the accelration of the particle

        i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

        i->tv = i->pv = i->v;

        i->ta = i->pa = i->a;



        vect dr1 = i->v * h;

        vect dv1 = i->a * h;



        i->pos = i->ppos + dr1 / 2;

        i->v = i->pv + dv1 / 2;

        i->tpos += dr1 / 6;

        i->tv += dv1 / 6;

    }



    delete t1;

    tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration



    for (obj* i : a.chest)

    {

        accelerate(*i, *t2);



        vect dr2 = i->v * h;

        vect dv2 = i->a * h;



        i->pos = i->ppos + dr2 / 2;

        i->v = i->pv + dv2 / 2;

        i->tpos += dr2 / 3;

        i->tv += dv2 / 3;

    }



    delete t2;

    tree* t3 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t3);



        vect dr3 = i->v * h;

        vect dv3 = i->a * h;



        i->pos = i->ppos + dr3;

        i->v = i->pv + dv3;

        i->tpos += dr3 / 3;

        i->tv += dv3 / 3;

    }



    delete t3;

    tree* t4 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t4);



        vect dr4 = i->v * h;

        vect dv4 = i->a * h;



        i->tpos += dr4 / 6;

        i->tv += dv4 / 6;



        i->pos = i->tpos;

        i->v = i->tv;

        i->a = i->pa;



    }



    delete t4;

}

Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?

edited 2 days ago

200_success

128k15152413

asked 2 days ago

Callum

183

New contributor

I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.

Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.

Here is some of my code:

Tree class:

class tree

{

public:

    obj* body;  // obj class stores the mass, velocity, position ans acceleration of an object

    bool parent = false;

    double total_mass;

    region R; // region class defines boundaaries and can check if an object is in them

    vect moment;





    tree* nw, *ne, *sw, *se; // the children nodes of the tree



    tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}



    tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)

    {

        for (int i = 0; i < a.chest.size(); i++)

        {

            insert(a.chest[i]);  // this constructer id called for the root node only

        }

    }



    ~tree()

    {

        delete nw;

        delete sw;

        delete ne;

        delete se;

    }



    void insert(obj* i)

    {

        if (!R.in(i->pos))  // cant insert into a region its not in

        {

            return;

        }

        else

        {

            if (body == nullptr)  // no other bodies in this region

            {

                if (!parent)

                {

                    body = i;   // region is not divides so you can insert particle here

                }

                else

                {

                    nw->insert(i);  //region is divided so try to insert into children nodes

                    ne->insert(i);

                    sw->insert(i);

                    se->insert(i);

                }

            }

            else      // trying to have more than one particle in a node

            {



                divide();  // splits node into its childrem



                nw->insert(i);

                ne->insert(i);

                sw->insert(i);

                se->insert(i);

                nw->insert(body);  // insert current object and the object that was previouly in the parent node into the children nodes

                ne->insert(body);

                sw->insert(body);

                se->insert(body);



                body = nullptr; // no longer bodies in the node

            }

            total_mass += i->mass;

            moment += (i->pos) * (i->mass);

        }

    }



    void divide()

    {

        double l = R.length / 2;



        nw = new tree(R.point + vect(0, l), l);

        ne = new tree(R.point + vect(l, l), l);

        sw = new tree(R.point + vect(0, 0), l);

        se = new tree(R.point + vect(l, 0), l);



        parent = true;

    }



    vect COM()

    {



        return moment / total_mass;



    }

};

Accelerator:

constexpr double theta = 0.5; //theta criterion

double G = 1 * pow(10,-11); // gravitational constant



void accelerate(obj& i, tree& t)

{

    vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node

    if (!t.parent) //checks if node is undivided

    {

        i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2(); 

    }//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle



    else

    {

        if (t.R.in(i.pos) || t.R.length / r.mag() > theta)

        {

            accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children

            accelerate(i, *t.ne);

            accelerate(i, *t.sw);

            accelerate(i, *t.se);

        }

        else

        {

            i.a += r.norm() * G * t.total_mass / r.mag2();

        }

    }

}

RK4:

void move(cloud& a)  // cloud class stores an array of pointers to objects

{

    tree* t1 = new tree(a); //adds objects in cloud to a new tree

    for (obj* i : a.chest) //chest is the array of pointer to objects

    {

        accelerate(*i, *t1); //uses tree to refresh the accelration of the particle

        i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

        i->tv = i->pv = i->v;

        i->ta = i->pa = i->a;



        vect dr1 = i->v * h;

        vect dv1 = i->a * h;



        i->pos = i->ppos + dr1 / 2;

        i->v = i->pv + dv1 / 2;

        i->tpos += dr1 / 6;

        i->tv += dv1 / 6;

    }



    delete t1;

    tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration



    for (obj* i : a.chest)

    {

        accelerate(*i, *t2);



        vect dr2 = i->v * h;

        vect dv2 = i->a * h;



        i->pos = i->ppos + dr2 / 2;

        i->v = i->pv + dv2 / 2;

        i->tpos += dr2 / 3;

        i->tv += dv2 / 3;

    }



    delete t2;

    tree* t3 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t3);



        vect dr3 = i->v * h;

        vect dv3 = i->a * h;



        i->pos = i->ppos + dr3;

        i->v = i->pv + dv3;

        i->tpos += dr3 / 3;

        i->tv += dv3 / 3;

    }



    delete t3;

    tree* t4 = new tree(a);



    for (obj* i : a.chest)

    {

        accelerate(*i, *t4);



        vect dr4 = i->v * h;

        vect dv4 = i->a * h;



        i->tpos += dr4 / 6;

        i->tv += dv4 / 6;



        i->pos = i->tpos;

        i->v = i->tv;

        i->a = i->pa;



    }



    delete t4;

}

Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?

c++ algorithm tree simulation physics

edited 2 days ago

200_success

128k15152413

asked 2 days ago

Callum

183

New contributor

edited 2 days ago

200_success

128k15152413

asked 2 days ago

Callum

183

New contributor

edited 2 days ago

200_success

128k15152413

edited 2 days ago

200_success

128k15152413

edited 2 days ago

200_success

128k15152413

asked 2 days ago

Callum

183

New contributor

asked 2 days ago

Callum

183

asked 2 days ago

Callum

183

New contributor

Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

2

There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
2 days ago

(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
2 days ago

Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
2 days ago

1

Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
2 days ago

The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
2 days ago

|
show 4 more comments

2

There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
2 days ago

(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
2 days ago

Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
2 days ago

1

Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
2 days ago

The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
2 days ago

There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
2 days ago

(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
2 days ago

Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
2 days ago

Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
2 days ago

The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
2 days ago

|
show 4 more comments

1 Answer
1

active

oldest

votes

This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.

Performance

It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.

That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move() into 4 different functions, you can allocate t1-t4 on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.

Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.

But if you want to get real performance gains I recommend one or more of the following:

Write a SIMD version of the code that operates on more than one particle at a time.

Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.

Data Hiding

All the members of your tree class are public. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud and obj classes suffer from the same problem.

Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.

Naming

You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment and total_mass, so it's a little odd that arguments to your functions have names like a and i. Why do that?

Your class names are better than 1 character variable names, but could be expanded. For example, tree is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree) to distinguish it from a typical binary search tree or a red-black tree?

The class name obj is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj is entirely free from meaning. Why not call it particle? Or body?

Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().

Data Structures

It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct or class unto itself. And instead of manually keeping all three in sync by doing this:

    i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

    i->tv = i->pv = i->v;

    i->ta = i->pa = i->a;

You could do something like this:

struct movement {

    float3 pos;

    float3 velocity;

    float3 acceleration;

};

...

new_movement = movement;

prev_movement = movement;

So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.

answered yesterday

user1118321

10.8k11145

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

Callum is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210936%2fvery-slow-frame-rate-in-c-n-body-simulation-using-barnes-hutt-and-rk4%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.

Performance

Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.

But if you want to get real performance gains I recommend one or more of the following:

Write a SIMD version of the code that operates on more than one particle at a time.

Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.

Data Hiding

Naming

Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().

Data Structures

    i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

    i->tv = i->pv = i->v;

    i->ta = i->pa = i->a;

You could do something like this:

struct movement {

    float3 pos;

    float3 velocity;

    float3 acceleration;

};

...

new_movement = movement;

prev_movement = movement;

answered yesterday

user1118321

10.8k11145

add a comment |

This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.

Performance

Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.

But if you want to get real performance gains I recommend one or more of the following:

Write a SIMD version of the code that operates on more than one particle at a time.

Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.

Data Hiding

Naming

Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().

Data Structures

    i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

    i->tv = i->pv = i->v;

    i->ta = i->pa = i->a;

You could do something like this:

struct movement {

    float3 pos;

    float3 velocity;

    float3 acceleration;

};

...

new_movement = movement;

prev_movement = movement;

answered yesterday

user1118321

10.8k11145

add a comment |

This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.

Performance

Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.

But if you want to get real performance gains I recommend one or more of the following:

Write a SIMD version of the code that operates on more than one particle at a time.

Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.

Data Hiding

Naming

Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().

Data Structures

    i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

    i->tv = i->pv = i->v;

    i->ta = i->pa = i->a;

You could do something like this:

struct movement {

    float3 pos;

    float3 velocity;

    float3 acceleration;

};

...

new_movement = movement;

prev_movement = movement;

answered yesterday

user1118321

10.8k11145

This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.

Performance

Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.

But if you want to get real performance gains I recommend one or more of the following:

Write a SIMD version of the code that operates on more than one particle at a time.

Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.

Data Hiding

Naming

Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().

Data Structures

    i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration

    i->tv = i->pv = i->v;

    i->ta = i->pa = i->a;

You could do something like this:

struct movement {

    float3 pos;

    float3 velocity;

    float3 acceleration;

};

...

new_movement = movement;

prev_movement = movement;

answered yesterday

user1118321

10.8k11145

answered yesterday

user1118321

10.8k11145

answered yesterday

user1118321

10.8k11145

answered yesterday

user1118321

10.8k11145

add a comment |

Callum is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Callum is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Gfrktyl

Very slow frame rate in C++ N-body simulation using Barnes-Hutt and RK4

1 Answer
1

Performance

Data Hiding

Naming

Data Structures

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Performance

Data Hiding

Naming

Data Structures

Performance

Data Hiding

Naming

Data Structures

Performance

Data Hiding

Naming

Data Structures

Performance

Data Hiding

Naming

Data Structures

Post as a guest

Popular posts from this blog

Terni

A new problem with tex4ht and tikz

Sun Ra

Very slow frame rate in C++ N-body simulation using Barnes-Hutt and RK4

1 Answer 1

Performance

Data Hiding

Naming

Data Structures

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Performance

Data Hiding

Naming

Data Structures

Performance

Data Hiding

Naming

Data Structures

Performance

Data Hiding

Naming

Data Structures

Performance

Data Hiding

Naming

Data Structures

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Terni

A new problem with tex4ht and tikz

Sun Ra

1 Answer
1

1 Answer
1

1 Answer
1