Count appearances of a value until it changes to another value











I have the following DataFrame:



df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])


I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.



I tried:



df['values'].value_counts()


but it gives me



10    6
9     3
23    2
12    1


The desired output is



10:2 
23:2
9:3
10:4
12:1


How can I do this?










  • You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
    – Buhb
    Nov 29 at 21:36
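As the comment suggests, this is essentially run-length encoding; a minimal plain-Python sketch (the function name is mine):

```python
def run_length_encode(seq):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for x in seq:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([x, 1])       # start a new run
    return [(v, c) for v, c in runs]

run_length_encode([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12])
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```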















Tags: python, pandas, count, frequency






asked Nov 29 at 15:43 by Mischa; edited Nov 29 at 20:01 by Alex Riley
6 Answers






Use:



df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()


Or:



df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()




print (df)
values  values
1       10        2
2       23        2
3       9         3
4       10        4
5       12        1
Name: values, dtype: int64


Finally, to remove the first index level:



df = df.reset_index(level=0, drop=True)
print (df)
values
10    2
23    2
9     3
10    4
12    1
dtype: int64


Explanation:



Compare the original column with its shifted version using ne (not equal), then apply cumsum to build the helper Series. With the intermediate steps spelled out:

a = df['values'].shift()
b = df['values'].ne(a)
c = b.cumsum()

print (pd.concat([df['values'], a, b, c],
                 keys=('orig', 'shifted', 'not_equal', 'cumsum'), axis=1))
    orig  shifted  not_equal  cumsum
0     10      NaN       True       1
1     10     10.0      False       1
2     23     10.0       True       2
3     23     23.0      False       2
4      9     23.0       True       3
5      9      9.0      False       3
6      9      9.0      False       3
7     10      9.0       True       4
8     10     10.0      False       4
9     10     10.0      False       4
10    10     10.0      False       4
11    12     10.0       True       5





– jezrael (answered Nov 29 at 15:45, edited Nov 29 at 15:51)
  • i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
    – Mischa
    Nov 29 at 15:52










    @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
    – jezrael
    Nov 29 at 15:53










  • @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
    – RavinderSingh13
    Nov 30 at 12:34
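Putting this answer together with the .rename fix from the comments gives a self-contained sketch (the level name 'block' is my choice, any name not equal to 'values' works):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# A new block starts wherever a value differs from the previous row;
# renaming the helper avoids the "Duplicated level name" error.
block = df['values'].ne(df['values'].shift()).cumsum().rename('block')
out = df.groupby([block, 'values']).size().reset_index(level=0, drop=True)
print(out)
```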


















You can keep track of where the changes in df['values'] occur:



changes = df['values'].diff().ne(0).cumsum()
print(changes)

0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     4
8     4
9     4
10    4
11    5


Then groupby the changes together with df['values'] (to keep the values as an index), computing the size of each group:



df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

values
10    2
23    2
9     3
10    4
12    1
dtype: int64
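One caveat worth noting: diff() only works on numeric data, while the ne/shift comparison also handles other dtypes. A quick sketch:

```python
import pandas as pd

s = pd.Series(['a', 'a', 'b', 'b', 'a'])
# s.diff() would raise a TypeError on strings; ne/shift does not.
changes = s.ne(s.shift()).cumsum()
print(changes.tolist())  # [1, 1, 2, 2, 3]
```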





– nixon (answered Nov 29 at 15:55, edited Nov 29 at 16:01)
itertools.groupby

from itertools import groupby

pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

10    2
23    2
9     3
10    4
12    1
dtype: int64




Or with a generator:



def f(x):
    count = 1
    for this, that in zip(x, x[1:]):
        if this == that:
            count += 1
        else:
            yield count, this
            count = 1
    yield count, [*x][-1]

pd.Series(*zip(*f(df['values'])))

10    2
23    2
9     3
10    4
12    1
dtype: int64
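The pd.Series(*zip(*...)) idiom works because pd.Series(data, index) receives the counts as data and the values as index; unpacked step by step (the intermediate names are mine):

```python
from itertools import groupby

import pandas as pd

vals = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]
pairs = [(len(list(g)), k) for k, g in groupby(vals)]  # [(count, value), ...]
counts, keys = zip(*pairs)                             # transpose into two tuples
s = pd.Series(counts, index=keys)
print(s)
```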





– piRSquared (answered Nov 29 at 15:59, edited Nov 29 at 16:38)
Using crosstab

df['key'] = df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'], df['values'])
Out[353]:
values  9   10  12  23
key
1       0   2   0   0
2       0   0   0   2
3       3   0   0   0
4       0   4   0   0
5       0   0   1   0


Slightly modifying the result above:



pd.crosstab(df['key'], df['values']).stack().loc[lambda x: x.ne(0)]
Out[355]:
key  values
1    10        2
2    23        2
3    9         3
4    10        4
5    12        1
dtype: int64




Based on Python's itertools.groupby:



from itertools import groupby

[(k, len(list(g))) for k, g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
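If the goal is literally the 10:2-style lines shown in the question, the same groupby result can be formatted directly; a small sketch:

```python
from itertools import groupby

vals = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]
lines = [f"{k}:{len(list(g))}" for k, g in groupby(vals)]
print("\n".join(lines))
# 10:2
# 23:2
# 9:3
# 10:4
# 12:1
```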





This is far from the most time- or memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Note that it keeps only the longest run seen for each value. Please feel encouraged to suggest improvements on this method.



import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

dict_count = {}
for v in df['values'].unique():
    dict_count[v] = 0

curr_val = df.iloc[0]['values']
count = 1
for i in range(1, len(df)):
    if df.iloc[i]['values'] == curr_val:
        count += 1
    else:
        if count > dict_count[curr_val]:
            dict_count[curr_val] = count
        curr_val = df.iloc[i]['values']
        count = 1
if count > dict_count[curr_val]:
    dict_count[curr_val] = count

df_count = pd.DataFrame(dict_count, index=[0])
print(df_count)
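Because the loop above keeps only the longest run per value, it doesn't quite match the desired output. If every run is wanted, a small variation collects (value, count) pairs instead — a sketch, not the original author's code:

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

vals = df['values'].tolist()
runs = []
curr_val, count = vals[0], 1
for v in vals[1:]:
    if v == curr_val:
        count += 1
    else:
        runs.append((curr_val, count))   # close the finished run
        curr_val, count = v, 1
runs.append((curr_val, count))           # close the final run
print(runs)  # [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```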





The function groupby from itertools can help; for example, with a string:



>>> from itertools import groupby
>>> string = 'aabbaacc'
>>> for char, freq in groupby(string):
...     print(char, len(list(freq)), sep=':')
[out]:
a:2
b:2
a:2
c:2


It also works for a list:



>>> df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
>>> for char, freq in groupby(df['values'].tolist()):
...     print(char, len(list(freq)), sep=':')
[out]:
10:2
23:2
9:3
10:4
12:1


Note: for a DataFrame, always select the column as df['values'] rather than df.values, because DataFrame already has a values attribute.






          share|improve this answer





















            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53542668%2fcount-appearances-of-a-value-until-it-changes-to-another-value%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            6 Answers
            6






            active

            oldest

            votes








            6 Answers
            6






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            12
            down vote













            Use:



            df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()


            Or:



            df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()




            print (df)
            values values
            1 10 2
            2 23 2
            3 9 3
            4 10 4
            5 12 1
            Name: values, dtype: int64


            Last for remove first level:



            df = df.reset_index(level=0, drop=True)
            print (df)
            values
            10 2
            23 2
            9 3
            10 4
            12 1
            dtype: int64


            Explanation:



            Compare original column by shifted with not equal ne and then add cumsum for helper Series:



            print (pd.concat([df['values'], a, b, c], 
            keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
            orig shifted not_equal cumsum
            0 10 NaN True 1
            1 10 10.0 False 1
            2 23 10.0 True 2
            3 23 23.0 False 2
            4 9 23.0 True 3
            5 9 9.0 False 3
            6 9 9.0 False 3
            7 10 9.0 True 4
            8 10 10.0 False 4
            9 10 10.0 False 4
            10 10 10.0 False 4
            11 12 10.0 True 5





            share|improve this answer























            • i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
              – Mischa
              Nov 29 at 15:52






            • 1




              @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
              – jezrael
              Nov 29 at 15:53










            • @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
              – RavinderSingh13
              Nov 30 at 12:34















            up vote
            12
            down vote













            Use:



            df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()


            Or:



            df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()




            print (df)
            values values
            1 10 2
            2 23 2
            3 9 3
            4 10 4
            5 12 1
            Name: values, dtype: int64


            Last for remove first level:



            df = df.reset_index(level=0, drop=True)
            print (df)
            values
            10 2
            23 2
            9 3
            10 4
            12 1
            dtype: int64


            Explanation:



            Compare original column by shifted with not equal ne and then add cumsum for helper Series:



            print (pd.concat([df['values'], a, b, c], 
            keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
            orig shifted not_equal cumsum
            0 10 NaN True 1
            1 10 10.0 False 1
            2 23 10.0 True 2
            3 23 23.0 False 2
            4 9 23.0 True 3
            5 9 9.0 False 3
            6 9 9.0 False 3
            7 10 9.0 True 4
            8 10 10.0 False 4
            9 10 10.0 False 4
            10 10 10.0 False 4
            11 12 10.0 True 5





            share|improve this answer























            • i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
              – Mischa
              Nov 29 at 15:52






            • 1




              @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
              – jezrael
              Nov 29 at 15:53










            • @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
              – RavinderSingh13
              Nov 30 at 12:34













            up vote
            12
            down vote










            up vote
            12
            down vote









            Use:



            df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()


            Or:



            df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()




            print (df)
            values values
            1 10 2
            2 23 2
            3 9 3
            4 10 4
            5 12 1
            Name: values, dtype: int64


            Last for remove first level:



            df = df.reset_index(level=0, drop=True)
            print (df)
            values
            10 2
            23 2
            9 3
            10 4
            12 1
            dtype: int64


            Explanation:



            Compare original column by shifted with not equal ne and then add cumsum for helper Series:



            print (pd.concat([df['values'], a, b, c], 
            keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
            orig shifted not_equal cumsum
            0 10 NaN True 1
            1 10 10.0 False 1
            2 23 10.0 True 2
            3 23 23.0 False 2
            4 9 23.0 True 3
            5 9 9.0 False 3
            6 9 9.0 False 3
            7 10 9.0 True 4
            8 10 10.0 False 4
            9 10 10.0 False 4
            10 10 10.0 False 4
            11 12 10.0 True 5





            share|improve this answer














            Use:



            df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()


            Or:



            df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()




            print (df)
            values values
            1 10 2
            2 23 2
            3 9 3
            4 10 4
            5 12 1
            Name: values, dtype: int64


            Last for remove first level:



            df = df.reset_index(level=0, drop=True)
            print (df)
            values
            10 2
            23 2
            9 3
            10 4
            12 1
            dtype: int64


            Explanation:



            Compare original column by shifted with not equal ne and then add cumsum for helper Series:



            print (pd.concat([df['values'], a, b, c], 
            keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
            orig shifted not_equal cumsum
            0 10 NaN True 1
            1 10 10.0 False 1
            2 23 10.0 True 2
            3 23 23.0 False 2
            4 9 23.0 True 3
            5 9 9.0 False 3
            6 9 9.0 False 3
            7 10 9.0 True 4
            8 10 10.0 False 4
            9 10 10.0 False 4
            10 10 10.0 False 4
            11 12 10.0 True 5






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 29 at 15:51

























            answered Nov 29 at 15:45









            jezrael

            317k22257336




            317k22257336












            • i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
              – Mischa
              Nov 29 at 15:52






            • 1




              @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
              – jezrael
              Nov 29 at 15:53










            • @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
              – RavinderSingh13
              Nov 30 at 12:34


















            • i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
              – Mischa
              Nov 29 at 15:52






            • 1




              @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
              – jezrael
              Nov 29 at 15:53










            • @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
              – RavinderSingh13
              Nov 30 at 12:34
















            i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
            – Mischa
            Nov 29 at 15:52




            i got an error : Duplicated level name: "values", assigned to level 1, is already used for level 0.
            – Mischa
            Nov 29 at 15:52




            1




            1




            @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
            – jezrael
            Nov 29 at 15:53




            @Mischa - Then add .rename like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
            – jezrael
            Nov 29 at 15:53












            @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
            – RavinderSingh13
            Nov 30 at 12:34




            @jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() as it is not clear, will be grateful to you.
            – RavinderSingh13
            Nov 30 at 12:34












            up vote
            6
            down vote













            You can keep track of where the changes in df['values'] occur:



            changes = df['values'].diff().ne(0).cumsum()
            print(changes)

            0 1
            1 1
            2 2
            3 2
            4 3
            5 3
            6 3
            7 4
            8 4
            9 4
            10 4
            11 5


            And groupby the changes and also df['values'] (to keep them as index) computing the size of each group



            df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

            values
            10 2
            23 2
            9 3
            10 4
            12 1
            dtype: int64





            share|improve this answer



























              up vote
              6
              down vote













              You can keep track of where the changes in df['values'] occur:



              changes = df['values'].diff().ne(0).cumsum()
              print(changes)

              0 1
              1 1
              2 2
              3 2
              4 3
              5 3
              6 3
              7 4
              8 4
              9 4
              10 4
              11 5


              And groupby the changes and also df['values'] (to keep them as index) computing the size of each group



              df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

              values
              10 2
              23 2
              9 3
              10 4
              12 1
              dtype: int64





              share|improve this answer

























                up vote
                6
                down vote










                up vote
                6
                down vote









                You can keep track of where the changes in df['values'] occur:



                changes = df['values'].diff().ne(0).cumsum()
                print(changes)

                0 1
                1 1
                2 2
                3 2
                4 3
                5 3
                6 3
                7 4
                8 4
                9 4
                10 4
                11 5


                And groupby the changes and also df['values'] (to keep them as index) computing the size of each group



                df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

                values
                10 2
                23 2
                9 3
                10 4
                12 1
                dtype: int64





                share|improve this answer














                You can keep track of where the changes in df['values'] occur:



                changes = df['values'].diff().ne(0).cumsum()
                print(changes)

                0 1
                1 1
                2 2
                3 2
                4 3
                5 3
                6 3
                7 4
                8 4
                9 4
                10 4
                11 5


                And groupby the changes and also df['values'] (to keep them as index) computing the size of each group



                df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

                values
                10 2
                23 2
                9 3
                10 4
                12 1
                dtype: int64






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 29 at 16:01

























                answered Nov 29 at 15:55









                nixon

                2,9351221




                2,9351221






















                    up vote
                    5
                    down vote













                    itertools.groupby



                    from itertools import groupby

                    pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

                    10 2
                    23 2
                    9 3
                    10 4
                    12 1
                    dtype: int64




                    It's a generator



                    def f(x):
                    count = 1
                    for this, that in zip(x, x[1:]):
                    if this == that:
                    count += 1
                    else:
                    yield count, this
                    count = 1
                    yield count, [*x][-1]

                    pd.Series(*zip(*f(df['values'])))

                    10 2
                    23 2
                    9 3
                    10 4
                    12 1
                    dtype: int64





                    share|improve this answer



























                      up vote
                      5
                      down vote













                      itertools.groupby



                      from itertools import groupby

                      pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

                      10 2
                      23 2
                      9 3
                      10 4
                      12 1
                      dtype: int64




                      It's a generator



                      def f(x):
                      count = 1
                      for this, that in zip(x, x[1:]):
                      if this == that:
                      count += 1
                      else:
                      yield count, this
                      count = 1
                      yield count, [*x][-1]

                      pd.Series(*zip(*f(df['values'])))

                      10 2
                      23 2
                      9 3
                      10 4
                      12 1
                      dtype: int64





                      share|improve this answer

























                        up vote
                        5
                        down vote










                        up vote
                        5
                        down vote









                        itertools.groupby



                        from itertools import groupby

                        pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

                        10 2
                        23 2
                        9 3
                        10 4
                        12 1
                        dtype: int64




                        It's a generator



                        def f(x):
                        count = 1
                        for this, that in zip(x, x[1:]):
                        if this == that:
                        count += 1
                        else:
                        yield count, this
                        count = 1
                        yield count, [*x][-1]

                        pd.Series(*zip(*f(df['values'])))

                        10 2
                        23 2
                        9 3
                        10 4
                        12 1
                        dtype: int64





                        share|improve this answer














                        itertools.groupby



                        from itertools import groupby

                        pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

                        10 2
                        23 2
                        9 3
                        10 4
                        12 1
                        dtype: int64




                        It's a generator



                        def f(x):
                        count = 1
                        for this, that in zip(x, x[1:]):
                        if this == that:
                        count += 1
                        else:
                        yield count, this
                        count = 1
                        yield count, [*x][-1]

                        pd.Series(*zip(*f(df['values'])))

                        10 2
                        23 2
                        9 3
                        10 4
                        12 1
                        dtype: int64






                        share|improve this answer














                        share|improve this answer



                        share|improve this answer








                        edited Nov 29 at 16:38

























                        answered Nov 29 at 15:59









                        piRSquared

                        151k22141283




                        151k22141283






















                            up vote
                            4
                            down vote













                            Using crosstab



                            df['key']=df['values'].diff().ne(0).cumsum()
                            pd.crosstab(df['key'],df['values'])
                            Out[353]:
                            values 9 10 12 23
                            key
                            1 0 2 0 0
                            2 0 0 0 2
                            3 3 0 0 0
                            4 0 4 0 0
                            5 0 0 1 0


                            Slightly modify the result above



                            pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
                            Out[355]:
                            key values
                            1 10 2
                            2 23 2
                            3 9 3
                            4 10 4
                            5 12 1
                            dtype: int64




                            Base on python groupby



                            from itertools import groupby

                            [ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
                            Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]

edited Nov 29 at 15:59
answered Nov 29 at 15:48
W-B

                                    up vote
                                    0
                                    down vote













This is far from the most time/memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Note that it keeps the longest run for each value rather than every run. Please feel encouraged to suggest improvements on this method.

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

dict_count = {}
for v in df['values'].unique():
    dict_count[v] = 0

curr_val = df.iloc[0]['values']
count = 1
for i in range(1, len(df)):
    if df.iloc[i]['values'] == curr_val:
        count += 1
    else:
        if count > dict_count[curr_val]:
            dict_count[curr_val] = count
        curr_val = df.iloc[i]['values']
        count = 1
if count > dict_count[curr_val]:
    dict_count[curr_val] = count

df_count = pd.DataFrame(dict_count, index=[0])
print(df_count)
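If the goal is the ordered run list from the question (rather than one number per distinct value), the same single pass can collect (value, length) pairs instead. A minimal sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Walk the column once, appending a (value, run_length) pair each
# time the value changes; flush the final run after the loop.
pairs = []
curr_val, count = df.iloc[0]['values'], 1
for v in df['values'].iloc[1:]:
    if v == curr_val:
        count += 1
    else:
        pairs.append((int(curr_val), count))
        curr_val, count = v, 1
pairs.append((int(curr_val), count))

print(pairs)  # → [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```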

answered Nov 30 at 19:22
UBears

                                            up vote
                                            0
                                            down vote













The function groupby in itertools can help you. For a str:

>>> from itertools import groupby
>>> string = 'aabbaacc'
>>> for char, freq in groupby(string):
...     print(char, len(list(freq)), sep=':')
[out]:
a:2
b:2
a:2
c:2

This function also works for a list:

>>> df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
>>> for char, freq in groupby(df['values'].tolist()):
...     print(char, len(list(freq)), sep=':')
[out]:
10:2
23:2
9:3
10:4
12:1

Note: for a DataFrame, always take the column as df['values'], because DataFrame already has a values attribute, so df.values would return the underlying array instead of the column.
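A quick way to see the distinction the note is pointing at; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23], columns=['values'])

# Attribute access resolves to DataFrame.values (the underlying
# numpy array), not the column that happens to be named 'values'.
print(type(df.values))      # numpy ndarray
print(type(df['values']))   # pandas Series
```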

answered Dec 7 at 2:58
TimeSeam
