Customer Segmentation Using RFM Analysis - Procedural to Functional Python Script

Situation: This is my first question in this Q&A section of Stack Exchange, so please bear with me. My code currently works perfectly well, but I would like to make it cleaner for other users by moving the duplicated and near-duplicated lines into functions or for loops. Because I am still new to Python, I have not yet got the hang of functions and for loops. My data frame rfm includes the following columns:
  • Max_Date (date of the latest transaction)
  • Id (unique identifier)
  • Member_id (also used in the code below)
  • Recency (today's date minus the latest transaction date)
  • Frequency (total number of transactions per Id since its subscription)
  • Monetary (total amount in $ spent per Id since its subscription)




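For context, here is one way the rfm frame itself could be built from raw transactions, following the column definitions above. It is a minimal sketch under assumptions: the transaction table, its column names (`Id`, `Date`, `Amount`) and the fixed `today` date are made up for illustration, it leaves out `Member_id`, and the named-aggregation syntax needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase (column names are assumptions)
transactions = pd.DataFrame({
    'Id': [1, 1, 2, 3, 3, 3],
    'Date': pd.to_datetime(['2019-01-05', '2019-01-20', '2018-12-01',
                            '2019-01-25', '2019-01-28', '2019-01-30']),
    'Amount': [20.0, 35.0, 100.0, 5.0, 15.0, 10.0],
})
today = pd.Timestamp('2019-02-01')  # fixed "today" so the example is reproducible

# One row per Id: latest transaction date, number of transactions, total spent
rfm = transactions.groupby('Id').agg(
    Max_Date=('Date', 'max'),
    Frequency=('Date', 'count'),
    Monetary=('Amount', 'sum'),
).reset_index()

# Recency in days: today's date minus the latest transaction date
rfm['Recency'] = (today - rfm['Max_Date']).dt.days
print(rfm)
```
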
I split the main data frame into three data frames because each metric needs its own sort order. The Frequency and Monetary data frames go through identical calculations:



import pandas as pd

# Recency: smaller values are better, so sort ascending
rfm_recency = rfm[['Max_Date', 'Id', 'Member_id', 'Recency']].copy()
rfm_recency = rfm_recency.sort_values(['Recency'], ascending=True)

# Frequency: sort descending, then compute each row's cumulative share of all transactions
rfm_frequency = rfm[['Id', 'Member_id', 'Frequency']].copy()
rfm_frequency = rfm_frequency.sort_values(['Frequency'], ascending=False)
rfm_frequency['cum_sum'] = rfm_frequency['Frequency'].cumsum()
rfm_frequency['cum_sum_perc'] = rfm_frequency['cum_sum'] / rfm_frequency['Frequency'].sum()

# Monetary: the same calculation applied to the amount spent
rfm_monetary = rfm[['Id', 'Member_id', 'Monetary']].copy()
rfm_monetary = rfm_monetary.sort_values(['Monetary'], ascending=False)
rfm_monetary['cum_sum'] = rfm_monetary['Monetary'].cumsum()
rfm_monetary['cum_sum_perc'] = rfm_monetary['cum_sum'] / rfm_monetary['Monetary'].sum()

def scorefm(x):
    """Function for separating data into 5 bins for Frequency & Monetary df"""
    if x <= 0.20:
        return 5
    elif x <= 0.40:
        return 4
    elif x <= 0.60:
        return 3
    elif x <= 0.80:
        return 2
    else:
        return 1


# Split Recency into 5 equal-sized quantile bins; the smallest Recency (most recent) gets r_score 5
rfm_recency['r_score'] = 5 - pd.qcut(rfm_recency['Recency'], q=5, labels=False)

# Create scores from cum_sum_perc for Frequency and Monetary
rfm_frequency['f_score'] = rfm_frequency['cum_sum_perc'].apply(scorefm)
rfm_monetary['m_score'] = rfm_monetary['cum_sum_perc'].apply(scorefm)

# Sort by Id; this sets the row order of the result, while join itself aligns rows on the index inherited from rfm
rfm_recency = rfm_recency.sort_values('Id')
rfm_frequency = rfm_frequency.sort_values('Id')
rfm_monetary = rfm_monetary.sort_values('Id')

# Merging data frames together
result = rfm_recency.copy()  # copy() takes a `deep` flag, not a list of columns, so every column is kept
result = result.join(rfm_frequency[['Frequency', 'f_score']])
result = result.join(rfm_monetary[['Monetary', 'm_score']])

# Create an FM and RFM score based on the individual R, F, M scores.
result['FM'] = (result['f_score'] + result['m_score']) / 2
result['RFM_Score'] = result['r_score'] * 10 + result['FM']

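The Frequency and Monetary blocks differ only in the column name, so the duplication can be removed with a small function called from a for loop. A possible sketch, where `add_cumulative_share` is a hypothetical helper name and the toy `rfm` frame stands in for the real data:

```python
import pandas as pd

def add_cumulative_share(df, column):
    """Return a copy of df sorted by `column` (descending) with cum_sum and cum_sum_perc added."""
    out = df.sort_values(column, ascending=False).copy()
    out['cum_sum'] = out[column].cumsum()
    out['cum_sum_perc'] = out['cum_sum'] / out[column].sum()
    return out

# Toy stand-in for the real rfm frame (values are made up)
rfm = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Member_id': [11, 12, 13, 14],
    'Frequency': [1, 4, 2, 3],
    'Monetary': [50.0, 10.0, 30.0, 20.0],
})

# One loop replaces the two copy-pasted blocks
frames = {}
for column in ['Frequency', 'Monetary']:
    frames[column] = add_cumulative_share(rfm[['Id', 'Member_id', column]], column)

rfm_frequency = frames['Frequency']
rfm_monetary = frames['Monetary']
print(rfm_frequency)
```

Keeping the results in a dict avoids a separately named variable per metric; scoring another metric the same way only means adding its name to the list.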

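The two scoring steps can also be written as small vectorized functions. `pd.cut` with the right-closed bins (-inf, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, inf] gives the same result as the `<=` chain in `scorefm` without a per-row `apply`, and wrapping the `qcut` line makes the "5 = most recent" convention explicit. The names `score_cum_share` and `score_recency` are hypothetical, and the sample values are made up:

```python
import numpy as np
import pandas as pd

def score_cum_share(cum_share):
    """Vectorized scorefm: 5 for shares <= 0.2, 4 for <= 0.4, ..., 1 for shares > 0.8."""
    bins = [-np.inf, 0.2, 0.4, 0.6, 0.8, np.inf]  # right-closed, matching the <= tests
    return pd.cut(cum_share, bins=bins, labels=[5, 4, 3, 2, 1]).astype(int)

def score_recency(recency):
    """Quintile score: the fifth of customers with the smallest Recency gets 5."""
    return 5 - pd.qcut(recency, q=5, labels=False)

shares = pd.Series([0.1, 0.2, 0.35, 0.6, 0.81, 1.0])
print(score_cum_share(shares).tolist())

recency = pd.Series([3, 40, 10, 90, 1, 200, 25, 60, 5, 120])
print(score_recency(recency).tolist())
```

One caveat: `pd.qcut` raises an error when so many customers share the same Recency that the quantile edges are not unique; passing `duplicates='drop'` avoids the error but can leave fewer than five bins.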

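Because `rfm_recency`, `rfm_frequency` and `rfm_monetary` are all slices of `rfm`, they keep `rfm`'s index, and `DataFrame.join` matches rows on that index rather than on row position. So the three `sort_values('Id')` calls are not needed for a correct merge; only the order of the calling frame decides the order of the result. A small check on toy data (the frames here are made up for illustration):

```python
import pandas as pd

base = pd.DataFrame({'Id': [10, 20, 30], 'Recency': [5, 1, 9], 'Frequency': [2, 7, 4]})

left = base[['Id', 'Recency']].sort_values('Recency')                  # index order 1, 0, 2
right = base[['Frequency']].sort_values('Frequency', ascending=False)  # index order 1, 2, 0

# join aligns on the shared index, so each Id still gets its own Frequency
joined = left.join(right)
print(joined.sort_values('Id'))
```
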
Goal: This is one of my first Python scripts, and I would like to see how it could be written with functions or for loops.


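Putting the pieces together, the whole script could be wrapped in one function that loops over the Frequency and Monetary columns. This is a sketch rather than a drop-in replacement: `rfm_scores` and `cum_share_score` are hypothetical names, it assumes `rfm` has the `Recency`, `Frequency` and `Monetary` columns described above with non-negative Frequency and Monetary values, and it returns one frame with `rfm`'s original index instead of three intermediate frames. The scoring rules are the same as in the code above: quintiles for Recency, cumulative-share bins for Frequency and Monetary.

```python
import numpy as np
import pandas as pd

def cum_share_score(values):
    """Score 5..1 from each row's cumulative share of the column total (largest values first)."""
    ordered = values.sort_values(ascending=False)
    cum_share = ordered.cumsum() / ordered.sum()
    bins = [-np.inf, 0.2, 0.4, 0.6, 0.8, np.inf]  # same thresholds as scorefm
    scores = pd.cut(cum_share, bins=bins, labels=[5, 4, 3, 2, 1]).astype(int)
    return scores.reindex(values.index)  # back to the original row order

def rfm_scores(rfm):
    """Return a copy of rfm with r_score, f_score, m_score, FM and RFM_Score columns added."""
    result = rfm.copy()
    result['r_score'] = 5 - pd.qcut(result['Recency'], q=5, labels=False)
    for column, score_name in [('Frequency', 'f_score'), ('Monetary', 'm_score')]:
        result[score_name] = cum_share_score(result[column])
    result['FM'] = (result['f_score'] + result['m_score']) / 2
    result['RFM_Score'] = result['r_score'] * 10 + result['FM']
    return result

# Made-up sample data
rfm = pd.DataFrame({
    'Id': range(1, 11),
    'Recency': [3, 40, 10, 90, 1, 200, 25, 60, 5, 120],
    'Frequency': [12, 1, 8, 2, 30, 1, 5, 3, 20, 1],
    'Monetary': [300.0, 15.0, 120.0, 40.0, 900.0, 10.0, 80.0, 35.0, 500.0, 20.0],
})
print(rfm_scores(rfm).sort_values('RFM_Score', ascending=False))
```

Scoring each Series in place keeps every score aligned with `rfm`'s index, so no re-sorting or joining is needed. Note that with cumulative-share bins, a single customer holding more than 20% of the total means nobody receives a 5; that happens for Frequency in this sample data.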


Thank you all for your time and help on this review.

asked 14 mins ago by Roger Steinberg