Groupby class and count missing values in features

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

I want to group by the label(CLASS) and display the number of NaN-Values that are counted in every feature so that it looks like this. The purpose of this is to get a general idea how missing values are distributed over the different classes.

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

add a comment |

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

add a comment |

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

python pandas dataframe group-by nan

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

edited Dec 27 '18 at 16:11

coldspeed

120k19119194

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

asked Dec 27 '18 at 15:15

FelTry2

955

asked Dec 27 '18 at 15:15

FelTry2

955

New contributor

FelTry2 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

3 Answers
3

active

oldest

votes

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

2

Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

W-B

101k73163

2

That's a good one. +1
– Scott Boston
Dec 27 '18 at 15:26

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

FelTry2 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53947196%2fgroupby-class-and-count-missing-values-in-features%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

2

Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37

add a comment |

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

2

Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37

add a comment |

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

answered Dec 27 '18 at 15:16

coldspeed

120k19119194

2

Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37

add a comment |

2

Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37

Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

answered Dec 27 '18 at 15:18

Scott Boston

51.3k72955

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

W-B

101k73163

2

That's a good one. +1
– Scott Boston
Dec 27 '18 at 15:26

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

W-B

101k73163

2

That's a good one. +1
– Scott Boston
Dec 27 '18 at 15:26

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

W-B

101k73163

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

W-B

101k73163

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

W-B

101k73163

answered Dec 27 '18 at 15:19

W-B

101k73163

answered Dec 27 '18 at 15:19

W-B

101k73163

2

That's a good one. +1
– Scott Boston
Dec 27 '18 at 15:26

add a comment |

2

That's a good one. +1
– Scott Boston
Dec 27 '18 at 15:26

That's a good one. +1
– Scott Boston
Dec 27 '18 at 15:26

add a comment |

FelTry2 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

FelTry2 is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Gfrktyl