Compare two big data sources with Python 3 [on hold]












I have 11 big text files of raw data on an FTP server. A sample of the rows:



2016-04-10 00:00:00| 000102840111|4987043|4845485
2018-04-10 00:00:00| 000102840687|4987043|4845485
2018-04-10 00:00:00| 000102840687|4987043|4845485
2018-04-10 00:00:00| 000102840687|4987043|4845485
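
To make the row layout above concrete, here is a minimal sketch of parsing one pipe-delimited line in Python 3. The question does not name the columns, so `Record` and its field names (`timestamp`, `record_id`, `value_a`, `value_b`) are hypothetical, as is the assumption that the second field is an identifier whose leading zeros must be preserved.

```python
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    # Field names are assumptions; the source does not name the columns.
    timestamp: datetime
    record_id: str   # kept as text so leading zeros survive
    value_a: int
    value_b: int


def parse_line(line: str) -> Record:
    """Parse one 'timestamp| id|int|int' row into a Record."""
    # Strip whitespace around each field; the sample has a space after the first '|'.
    parts = [part.strip() for part in line.rstrip("\n").split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    ts, record_id, a, b = parts
    return Record(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        record_id,
        int(a),
        int(b),
    )
```

Reading a file line by line through `parse_line` keeps memory use flat, which may matter for files of this size.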


I also have a second source in Google BigQuery (11 tables).



I need to build a process that compares the two sources (the data on the FTP server and the data in BigQuery) to find new records. The process has to run daily, because the FTP server updates its files every morning. This is the approach I have used so far:




  • I export all the data (11 tables) from BigQuery, download it to my local machine, and load it into MySQL as historical data.

  • After that, I read the data from the FTP server and compare it with the historical data to find the new records (using a simple query).

  • I send the new records to BigQuery and also insert them into the historical data.
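
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the standard-library `sqlite3` module stands in for MySQL so the example is self-contained, the table names `history` and `staging` and the column names are hypothetical, and a row counts as "new" when no identical row exists in the historical table. The upload of the new rows to BigQuery is left out.

```python
import sqlite3


def find_new_rows(conn, rows):
    """Return the rows not yet in `history`, then record them there.

    `rows` is an iterable of (ts, record_id, a, b) tuples read from the
    daily file. sqlite3 is a stand-in for MySQL; names are hypothetical.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "ts TEXT, record_id TEXT, a INTEGER, b INTEGER, "
        "PRIMARY KEY (ts, record_id, a, b))"
    )
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging ("
        "ts TEXT, record_id TEXT, a INTEGER, b INTEGER)"
    )
    # Reload the staging table with today's file contents.
    cur.execute("DELETE FROM staging")
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)
    # Anti-join: keep staged rows with no identical row in history.
    # DISTINCT collapses repeats inside the same daily file.
    new_rows = cur.execute(
        """
        SELECT DISTINCT s.ts, s.record_id, s.a, s.b
        FROM staging AS s
        LEFT JOIN history AS h
          ON h.ts = s.ts AND h.record_id = s.record_id
         AND h.a = s.a AND h.b = s.b
        WHERE h.ts IS NULL
        ORDER BY s.ts, s.record_id
        """
    ).fetchall()
    # Remember the new rows so tomorrow's run skips them.
    cur.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", new_rows)
    conn.commit()
    return new_rows
```

Calling `find_new_rows(sqlite3.connect(":memory:"), rows)` twice with the same rows returns them the first time and an empty list the second time. Bulk-loading into a staging table and letting the database do the anti-join avoids issuing one lookup query per row.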


Is this the best approach? Could anyone recommend another solution?










put on hold as off-topic by Jamal Dec 26 at 8:08


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – Jamal

If this question can be reworded to fit the rules in the help center, please edit the question.


















      python-3.x
















      asked Dec 26 at 8:01 by Han Van Pham



























