Count appearances of a value until it changes to another value
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not the overall count: I want the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10 6
9 3
23 2
12 1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
python pandas count frequency
edited Nov 29 at 20:01 by Alex Riley
asked Nov 29 at 15:43 by Mischa
You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
– Buhb
Nov 29 at 21:36
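To make the run-length encoding idea from the comment concrete, here is a minimal plain-Python sketch. The function name run_length_encode is an assumption made for illustration; it is not part of pandas or of the original post.

```python
def run_length_encode(items):
    """Return a list of (value, run_length) pairs for consecutive equal items."""
    runs = []
    for item in items:
        if runs and runs[-1][0] == item:
            # Same value as the current run: extend it.
            runs[-1] = (item, runs[-1][1] + 1)
        else:
            # Value changed (or this is the first item): start a new run.
            runs.append((item, 1))
    return runs


values = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]
for value, length in run_length_encode(values):
    print(f"{value}:{length}")
```

Each pair corresponds to one line of the desired output in the question, in the same order as the runs appear in the data.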
6 Answers
Use:
df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
Or:
df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print (df)
values values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
Name: values, dtype: int64
Finally, remove the first index level:
df = df.reset_index(level=0, drop=True)
print (df)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
Explanation:
Compare the original column with its shifted copy (a) using ne, "not equal" (b), and then apply cumsum (c) to build a helper Series that labels each run:
print (pd.concat([df['values'], a, b, c],
keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
orig shifted not_equal cumsum
0 10 NaN True 1
1 10 10.0 False 1
2 23 10.0 True 2
3 23 23.0 False 2
4 9 23.0 True 3
5 9 9.0 False 3
6 9 9.0 False 3
7 10 9.0 True 4
8 10 10.0 False 4
9 10 10.0 False 4
10 10 10.0 False 4
11 12 10.0 True 5
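The table above can be reproduced step by step. This is a sketch on a fresh copy of the question's DataFrame; the names shifted, not_equal, run_id and steps are illustrative and stand in for the intermediate Series of the explanation.

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

shifted = df['values'].shift()          # previous row's value (NaN for the first row)
not_equal = df['values'].ne(shifted)    # True wherever a new run starts
run_id = not_equal.cumsum()             # one increasing label per run

# Put the intermediate steps side by side, one column per step.
steps = pd.concat([df['values'], shifted, not_equal, run_id],
                  keys=('orig', 'shifted', 'not_equal', 'cumsum'), axis=1)
print(steps)
```

Because every True in not_equal adds 1 to the running sum, all rows of the same run share one label, and grouping by that label separates the two runs of 10.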
edited Nov 29 at 15:51
answered Nov 29 at 15:45 by jezrael
I got an error: Duplicated level name: "values", assigned to level 1, is already used for level 0.
– Mischa
Nov 29 at 15:52
@Mischa - Then add .rename, like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
– jezrael
Nov 29 at 15:53
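A sketch of the rename suggested in this comment, applied to the second variant of the answer. The name 'val1' comes from the comment; counts and result are illustrative variable names, and the assumption is only that the helper Series needs a name different from 'values'.

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Give the helper Series its own name so the two group keys do not collide.
run_id = df['values'].ne(df['values'].shift()).cumsum().rename('val1')

counts = df.groupby([run_id, 'values']).size()
print(counts.index.names)   # the two index levels now have distinct names

# Drop the helper level to keep only the values as the index.
result = counts.reset_index(level=0, drop=True)
print(result)
```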
@jezrael, ++ve for nice code sir, could you please explain it by dividing it into parts: df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
It is not clear to me; I will be grateful to you.
– RavinderSingh13
Nov 30 at 12:34
You can keep track of where the changes in df['values'] occur:
changes = df['values'].diff().ne(0).cumsum()
print(changes)
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
Then group by the changes together with 'values' (to keep the values in the index) and compute the size of each group:
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
values
10 2
23 2
9 3
10 4
12 1
dtype: int64
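One caveat worth illustrating: diff() relies on subtraction, so it suits numeric columns. The sketch below assumes a second, hypothetical Series of strings (letters) and shows that comparing against the shifted Series gives the same run labels on numbers and also works where subtraction is not meaningful.

```python
import pandas as pd

numbers = pd.Series([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12])
letters = pd.Series(list('aabbaacc'))   # hypothetical non-numeric data

# On numeric data both ways of detecting a change give the same run labels.
by_diff = numbers.diff().ne(0).cumsum()
by_shift = numbers.ne(numbers.shift()).cumsum()
print(by_diff.equals(by_shift))

# For strings, compare against the shifted Series instead of subtracting.
letter_runs = letters.ne(letters.shift()).cumsum()
print(letters.groupby(letter_runs).agg(['first', 'size']))
```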
edited Nov 29 at 16:01
answered Nov 29 at 15:55 by nixon
itertools.groupby
from itertools import groupby
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
10 2
23 2
9 3
10 4
12 1
dtype: int64
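The one-liner above is dense, so here is the same computation unpacked into steps. The intermediate names pairs, lengths, run_values and result are illustrative only.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# One (run_length, value) pair per run of consecutive equal values.
pairs = [(len(list(run)), value) for value, run in groupby(df['values'])]

# zip(*pairs) transposes the pairs into two tuples: all lengths, then all values.
lengths, run_values = zip(*pairs)

# The first positional argument of pd.Series is the data, the second the index.
result = pd.Series(lengths, run_values)
print(result)
```

Unlike the pandas groupby, itertools.groupby only groups consecutive equal items, which is why no helper column is needed here.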
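Using a generator:
def f(x):
    count = 1
    # Walk over neighbouring pairs and emit a run whenever the value changes
    for this, that in zip(x, x[1:]):
        if this == that:
            count += 1
        else:
            yield count, this
            count = 1
    # The last run is never followed by a change, so emit it here
    yield count, [*x][-1]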
pd.Series(*zip(*f(df['values'])))
10 2
23 2
9 3
10 4
12 1
dtype: int64
edited Nov 29 at 16:38
answered Nov 29 at 15:59 by piRSquared
Using crosstab
df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]:
values  9   10  12  23
key
1        0   2   0   0
2        0   0   0   2
3        3   0   0   0
4        0   4   0   0
5        0   0   1   0
Slightly modifying the result above to drop the zero counts:
pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]:
key values
1 10 2
2 23 2
3 9 3
4 10 4
5 12 1
dtype: int64
Based on Python's itertools.groupby:
from itertools import groupby
[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
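If a tabular result is preferred over a list of tuples, the pairs can be handed to a DataFrame. This is only a sketch; the column names 'value' and 'count' are illustrative choices, not something the answer specifies.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# (value, run_length) for each run of consecutive equal values.
pairs = [(k, len(list(g))) for k, g in groupby(df['values'].tolist())]

# One row per run, in the order the runs occur.
runs = pd.DataFrame(pairs, columns=['value', 'count'])
print(runs)
```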
edited Nov 29 at 15:59
answered Nov 29 at 15:48 by W-B
This is far from the most time- or memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Please feel encouraged to suggest improvements on this method.
import pandas as pd
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
# Longest run seen so far for each distinct value
dict_count = {}
for v in df['values'].unique():
    dict_count[v] = 0
curr_val = df.iloc[0]['values']
count = 1
for i in range(1, len(df)):
    if df.iloc[i]['values'] == curr_val:
        count += 1
    else:
        # The run ended: keep its length only if it is the longest for this value
        if count > dict_count[curr_val]:
            dict_count[curr_val] = count
        curr_val = df.iloc[i]['values']
        count = 1
# Account for the final run
if count > dict_count[curr_val]:
    dict_count[curr_val] = count
df_count = pd.DataFrame(dict_count, index=[0])
print(df_count)
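The loop above keeps, for each distinct value, the length of its longest run. For comparison, the same per-value maximum can be sketched with the run-labelling idea used in the other answers. The names run_id, run_sizes and longest are illustrative, and 'run' is an arbitrary label for the helper Series.

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Label each run, then count the rows in every (run, value) group.
run_id = df['values'].ne(df['values'].shift()).cumsum().rename('run')
run_sizes = df.groupby([run_id, 'values']).size()

# Longest run for each distinct value; 10 occurs in runs of 2 and 4.
longest = run_sizes.groupby(level='values').max()
print(longest)
```

Stopping at run_sizes instead gives one count per run, which is the output the question asks for.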
answered Nov 30 at 19:22 by UBears
The function groupby in itertools can help you. For a str:
>>> from itertools import groupby
>>> string = 'aabbaacc'
>>> for char, freq in groupby(string):
...     print(char, len(list(freq)), sep=':', end='\n')
[out]:
a:2
b:2
a:2
c:2
This function also works for a list:
>>> df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
>>> for char, freq in groupby(df['values'].tolist()):
...     print(char, len(list(freq)), sep=':', end='\n')
[out]:
10:2
23:2
9:3
10:4
12:1
Note: for a DataFrame, always use df['values'] rather than df.values to select the 'values' column, because DataFrame already has an attribute named values.
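The difference described in the note can be checked directly. This is a small sketch on a shortened, made-up DataFrame with the same column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 10, 23], columns=['values'])

# Attribute access returns the whole frame as a NumPy array, not the column.
print(type(df.values))
print(df.values.shape)      # two-dimensional: (rows, columns)

# Bracket access returns the column as a Series.
print(type(df['values']))
```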
answered Dec 7 at 2:58 by TimeSeam