Python and SQL - Query takes too long to execute [closed]

I've recently begun to work with database queries: I was asked to develop a program that reads the last month of data from a Firebird DB with almost 100M rows.

After stumbling a little, I finally managed to make the code work, using Python (more specifically, the pandas library), but the code takes more than 8 hours just to filter the data, and it has to be executed every day.

The rest of the code runs really quickly, since I only need around the last 3000 rows of the dataset.

So far, the function responsible for executing the query is:



import time

import pandas as pd
import pyodbc

def read_query(access):
    start_time = time.time()  # timing start; nothing reads this yet
    conn = pyodbc.connect(access)
    # Pull every column for the trailing month into a single DataFrame
    df = pd.read_sql_query(
        "SELECT * FROM TABLE WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'",
        conn)
    return df


Is there a better (and quicker) approach to filtering this data? Perhaps reading rows from the DB one at a time, starting from the bottom? If so, what could I do to optimize the run time?
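
For illustration, here is the rough direction I've been considering, though I haven't tested it against the real schema: push both the date filter and the row cap into Firebird itself with its FIRST clause, so the server returns only the ~3000 rows I actually need. I'm assuming DAY is indexed and that "last 3000 rows" means most recent by DAY; read_recent_rows is just a made-up name, and TABLE/DAY are the placeholders from the query above:

import pandas as pd
import pyodbc

def read_recent_rows(access, limit=3000):
    # A sketch, not a tested implementation: let Firebird filter and cap
    # the result server-side. FIRST n limits the rows returned, and
    # ORDER BY DAY DESC makes "first" mean "most recent".
    conn = pyodbc.connect(access)
    try:
        sql = ("SELECT FIRST {n} * FROM TABLE "
               "WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP) "
               "ORDER BY DAY DESC").format(n=int(limit))  # int() keeps the cap safe
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()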

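And if the whole month genuinely has to come across, a fallback (also untested) would be to stream it with the chunksize argument of pd.read_sql_query instead of materializing one giant DataFrame, keeping only a running tail; tail_of_month is likewise a made-up name:

import pandas as pd
import pyodbc

def tail_of_month(access, keep=3000):
    # Also a sketch: stream the month in 50k-row chunks and keep only a
    # running tail, so memory stays bounded even over ~100M source rows.
    # Assumes rows come back in roughly chronological order.
    conn = pyodbc.connect(access)
    try:
        sql = ("SELECT * FROM TABLE "
               "WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP)")
        tail = pd.DataFrame()
        for chunk in pd.read_sql_query(sql, conn, chunksize=50_000):
            tail = pd.concat([tail, chunk]).tail(keep)
        return tail
    finally:
        conn.close()

Either way, listing only the columns actually needed instead of SELECT * should cut the transfer further.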

python python-3.x sql

edited Dec 12 at 20:05
asked Dec 12 at 20:00
Helena Martins

closed as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast Dec 13 at 0:29

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.

  • Hello! Some questions: How big is your table? What does the data look like? Is the table indexed in any way? Where's the rest of the code? How can I run it? Is using Firebird a requirement? There's a start_time = time.time(), but where does it end? Why did you choose it? Summary: please add more context :)
    – яүυк
    Dec 12 at 20:25

  • Unfortunately, it was my company's clients' choice to use Firebird, and yes, I believe running the analysis on it is the best way to find a solution. Most of the data is confidential, unfortunately, but it basically consists of dates. About the code, there isn't much more, actually: just printing the selected rows. And the size is described in the question. Hope it helps ^^
    – Helena Martins
    Dec 12 at 20:29