Count data divided by year and by region in R
I have a very large (too big to open in Excel) biological dataset that looks something like this:
year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
          1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
          1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
             'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
             'C', 'C', 'C', 'A')
region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
            3, 2, 2, 1, 1, 1)
df <- data.frame(year, species, region)
df
year species region
1 1990 A 1
2 1980 A 1
3 1985 B 1
4 1980 B 3
5 1990 B 2
6 1990 C 3
7 1980 C 3
8 1985 C 2
9 1985 A 1
10 1990 A 1
11 1980 A 3
12 1985 B 3
13 1980 B 3
14 1990 B 2
15 1990 C 2
16 1980 C 1
17 1985 C 1
18 1985 A 1
19 1990 A 1
20 1980 A 3
21 1985 B 3
22 1980 B 3
23 1990 B 2
24 1990 C 2
25 1980 C 1
26 1985 C 1
27 1985 A 1
What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).
I'd like to end up with a dataset along these lines:
region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
1 1 0 0 0 0 0 0 0 0 0
2 2 1 1 1 1 1 1 1 1 1
3 3 2 2 2 2 2 2 2 2 2
such that each row represents a region and each column holds the count of one species in one particular year. I've tried to do this using tidyr's spread function in conjunction with dplyr's group_by function, but I couldn't get it to do anything close to what I want.
Does anyone have any suggestions?
r grouping tidyverse data-management
edited Nov 18 at 0:54 by m0nhawk
asked Nov 18 at 0:34 by cb14
2 Answers
Accepted answer (10 votes)
Something like this?
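library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
df2 <- df %>%
  mutate(sp_year = paste(species, year, sep = "_")) %>%  # e.g. "A_1980"
  group_by(region) %>%
  count(sp_year) %>%    # number of rows per region/sp_year pair
  spread(sp_year, n)    # one column per sp_year; missing combinations become NA
df2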
Which gives this:
# A tibble: 3 x 10
# Groups: region [3]
region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 3 3 NA 1 NA 2 2 NA
2 2 NA NA NA NA NA 3 NA 1 2
3 3 2 NA NA 3 2 NA 1 NA 1
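The "# Groups: region [3]" line shows that df2 is still grouped by region, since count() keeps the grouping created by group_by(). A minimal sketch, assuming you want an ordinary ungrouped table for later steps, appends ungroup() to the same pipeline (the question's data are re-created so the block runs on its own):

```r
library(dplyr)
library(tidyr)

# Toy data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)

df2 <- df %>%
  mutate(sp_year = paste(species, year, sep = "_")) %>%
  group_by(region) %>%
  count(sp_year) %>%
  spread(sp_year, n) %>%
  ungroup()  # drop the leftover region grouping

is_grouped_df(df2)  # FALSE
```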
answered Nov 18 at 0:59 by wl1234
Also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least. – Shree, Nov 18 at 1:36
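To illustrate the comment, here is a small sketch on a hypothetical three-row data frame (not the question's data): unite() builds the same strings as paste(), but by default (remove = TRUE) it also drops the columns it combines.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row example
toy <- data.frame(year    = c(1990, 1980, 1985),
                  species = c("A", "B", "C"),
                  region  = c(1, 3, 2))

with_paste <- toy %>% mutate(sp_year = paste(species, year, sep = "_"))
with_unite <- toy %>% unite(sp_year, species, year, sep = "_")

with_paste$sp_year                          # "A_1990" "B_1980" "C_1985"
c("species", "year") %in% names(with_unite) # FALSE FALSE (source columns removed)
```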
Answer (5 votes)
Similar to wl1234's answer, but more concise. We can use unite to combine the species and year columns. We can also pass both variables to count instead of calling group_by first. Finally, we can set fill = 0 in spread to replace NA with 0.
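library(tidyverse)
df2 <- df %>%
  unite(sp_year, species, year, sep = "_") %>%  # replaces species and year with e.g. "A_1980"
  count(sp_year, region) %>%                     # count per sp_year/region pair, no group_by() needed
  spread(sp_year, n, fill = 0)                   # missing combinations become 0 instead of NA
df2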
# # A tibble: 3 x 10
# region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 3 3 0 1 0 2 2 0
# 2 2 0 0 0 0 0 3 0 1 2
# 3 3 2 0 0 3 2 0 1 0 1
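In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same reshaping with pivot_wider(), assuming tidyr 1.1.0 or later (needed for names_sort and a single-value values_fill): names_from takes species and year directly, so no unite() or paste() step is needed, and values_fill = 0 does the job of fill = 0.

```r
library(dplyr)
library(tidyr)

# Toy data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)

df2 <- df %>%
  count(region, species, year) %>%             # one row per region/species/year combination
  pivot_wider(names_from  = c(species, year),  # column names such as "A_1980"
              names_sep   = "_",
              names_sort  = TRUE,              # A_1980, A_1985, ..., C_1990
              values_from = n,
              values_fill = 0)                 # absent combinations become 0, not NA
df2
```

names_sort = TRUE orders the columns the way spread() does; without it, pivot_wider() orders them by first appearance in the counted data.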
answered Nov 18 at 1:35 by www (edited Nov 18 at 1:40)
This is awesome, and I love the NA => 0 addition as well! Thank you! – cb14, Nov 18 at 1:53
I didn't know about unite. I will use that instead of paste next time. – wl1234, Nov 18 at 3:45