Python and SQL - Query takes too long to execute [closed]

I've recently begun to work with database queries: I was asked to develop a program that reads the last month of data from a Firebird DB with almost 100M rows.

After stumbling a little, I finally managed to make the code work, using Python (more specifically, the pandas library), but the code takes more than 8 hours just to filter the data, and it has to be executed every day.

The rest of the code runs really quickly, since I only need around the last 3000 rows of the dataset.

So far, the function responsible for executing the query is:



import time

import pandas as pd
import pyodbc

def read_query(access):
    start_time = time.time()  # timing start; nothing reads this yet
    conn = pyodbc.connect(access)
    # Pull every column for the trailing month into a single DataFrame
    df = pd.read_sql_query(
        "SELECT * FROM TABLE WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'",
        conn)
    return df


Is there a better (and quicker) approach to filtering this data? Perhaps reading rows from the DB one at a time, starting from the bottom? If so, what could I do to optimize the run time?
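
For illustration, here is the rough direction I've been considering, though I haven't tested it against the real schema: push both the date filter and the row cap into Firebird itself with its FIRST clause, so the server returns only the ~3000 rows I actually need. I'm assuming DAY is indexed and that "last 3000 rows" means most recent by DAY; read_recent_rows is just a made-up name, and TABLE/DAY are the placeholders from the query above:

import pandas as pd
import pyodbc

def read_recent_rows(access, limit=3000):
    # A sketch, not a tested implementation: let Firebird filter and cap
    # the result server-side. FIRST n limits the rows returned, and
    # ORDER BY DAY DESC makes "first" mean "most recent".
    conn = pyodbc.connect(access)
    try:
        sql = ("SELECT FIRST {n} * FROM TABLE "
               "WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP) "
               "ORDER BY DAY DESC").format(n=int(limit))  # int() keeps the cap safe
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()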

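And if the whole month genuinely has to come across, a fallback (also untested) would be to stream it with the chunksize argument of pd.read_sql_query instead of materializing one giant DataFrame, keeping only a running tail; tail_of_month is likewise a made-up name:

import pandas as pd
import pyodbc

def tail_of_month(access, keep=3000):
    # Also a sketch: stream the month in 50k-row chunks and keep only a
    # running tail, so memory stays bounded even over ~100M source rows.
    # Assumes rows come back in roughly chronological order.
    conn = pyodbc.connect(access)
    try:
        sql = ("SELECT * FROM TABLE "
               "WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP)")
        tail = pd.DataFrame()
        for chunk in pd.read_sql_query(sql, conn, chunksize=50_000):
            tail = pd.concat([tail, chunk]).tail(keep)
        return tail
    finally:
        conn.close()

Either way, listing only the columns actually needed instead of SELECT * should cut the transfer further.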

python python-3.x sql

edited Dec 12 at 20:05
asked Dec 12 at 20:00
Helena Martins

closed as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast Dec 13 at 0:29

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.

  • Hello! Some questions: How big is your table? What does the data look like? Is the table indexed in any way? Where's the rest of the code? How can I run it? Is using Firebird a requirement? There's a start_time = time.time(), but where does it end? Why did you choose it? Summary: please add more context :)
    – яүυк
    Dec 12 at 20:25

  • Unfortunately, it was my company's clients' choice to use Firebird, and yes, I believe running the analysis on it is the best way to find a solution. Most of the data is confidential, unfortunately, but it basically consists of dates. About the code, there isn't much more, actually: just printing the selected rows. And the size is described in the question. Hope it helps ^^
    – Helena Martins
    Dec 12 at 20:29