Comparing two large data sources with Python 3 [on hold]
I have 11 large text files of raw data on an FTP server. A few sample rows:
2016-04-10 00:00:00| 000102840111|4987043|4845485
2018-04-10 00:00:00| 000102840687|4987043|4845485
2018-04-10 00:00:00| 000102840687|4987043|4845485
2018-04-10 00:00:00| 000102840687|4987043|4845485
I also have a second source in Google BigQuery (11 tables).
I need to build a process that compares the two sources (the files on the FTP server and the tables in BigQuery) to find the new records. The process needs to run daily, because the FTP server updates the files every morning. This is the approach I used:
- I exported all the data (11 tables) from BigQuery, downloaded it to my local machine, and loaded it into MySQL as historical data.
- After that, I read the data from the FTP server and compare it with the historical data to find the new records (using a simple query).
- I send the new records to BigQuery and also insert them into the historical data.
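As a rough illustration of the comparison step only (a minimal sketch, not the actual implementation): the function names `parse_line` and `find_new_records` are hypothetical, an in-memory set stands in for the MySQL historical table, and it assumes the whole row identifies a record, so repeated rows within a file collapse to one.

```python
from typing import Iterable, Iterator, Set, Tuple

# One row of the pipe-delimited file: timestamp plus three ID-like fields.
Record = Tuple[str, str, str, str]


def parse_line(line: str) -> Record:
    """Split one pipe-delimited line into four whitespace-stripped fields."""
    ts, key, a, b = (field.strip() for field in line.split("|"))
    return ts, key, a, b


def find_new_records(lines: Iterable[str], seen: Set[Record]) -> Iterator[Record]:
    """Yield records not yet in `seen`, adding each one to `seen` as it is yielded."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_line(line)
        if record not in seen:
            seen.add(record)
            yield record


# Assumption: `history` stands in for the historical data kept in MySQL.
history = {("2016-04-10 00:00:00", "000102840111", "4987043", "4845485")}
todays_lines = [
    "2016-04-10 00:00:00| 000102840111|4987043|4845485",
    "2018-04-10 00:00:00| 000102840687|4987043|4845485",
    "2018-04-10 00:00:00| 000102840687|4987043|4845485",
]
new_records = list(find_new_records(todays_lines, history))
```

Because the function consumes lines lazily, it could be fed a file object streamed from the FTP download rather than a list, which matters when the files are large. Whether a set of full rows fits in memory depends on the real data volume.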
Is this the best approach? Could anyone recommend another solution?
python-3.x
put on hold as off-topic by Jamal♦ Dec 26 at 8:08
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – Jamal
If this question can be reworded to fit the rules in the help center, please edit the question.
asked Dec 26 at 8:01
Han Van Pham
254