Count data divided by year and by region in R











up vote
8
down vote

favorite












I have a very large (too big to open in Excel) biological dataset that looks something like this



    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
'C', 'C', 'C', 'A')
region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
3, 2, 2, 1, 1, 1)
df <- data.frame(year, species, region)

df
year species region
1 1990 A 1
2 1980 A 1
3 1985 B 1
4 1980 B 3
5 1990 B 2
6 1990 C 3
7 1980 C 3
8 1985 C 2
9 1985 A 1
10 1990 A 1
11 1980 A 3
12 1985 B 3
13 1980 B 3
14 1990 B 2
15 1990 C 2
16 1980 C 1
17 1985 C 1
18 1985 A 1
19 1990 A 1
20 1980 A 3
21 1985 B 3
22 1980 B 3
23 1990 B 2
24 1990 C 2
25 1980 C 1
26 1985 C 1
27 1985 A 1


What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



I'm looking to end up with a dataset that looks something along the lines of this,



      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
1 1 0 0 0 0 0 0 0 0 0
2 2 1 1 1 1 1 1 1 1 1
3 3 2 2 2 2 2 2 2 2 2


such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



Does anyone have any suggestions?










share|improve this question




























    up vote
    8
    down vote

    favorite












    I have a very large (too big to open in Excel) biological dataset that looks something like this



        year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
    1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
    1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
    'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
    'C', 'C', 'C', 'A')
    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
    3, 2, 2, 1, 1, 1)
    df <- data.frame(year, species, region)

    df
    year species region
    1 1990 A 1
    2 1980 A 1
    3 1985 B 1
    4 1980 B 3
    5 1990 B 2
    6 1990 C 3
    7 1980 C 3
    8 1985 C 2
    9 1985 A 1
    10 1990 A 1
    11 1980 A 3
    12 1985 B 3
    13 1980 B 3
    14 1990 B 2
    15 1990 C 2
    16 1980 C 1
    17 1985 C 1
    18 1985 A 1
    19 1990 A 1
    20 1980 A 3
    21 1985 B 3
    22 1980 B 3
    23 1990 B 2
    24 1990 C 2
    25 1980 C 1
    26 1985 C 1
    27 1985 A 1


    What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



    I'm looking to end up with a dataset that looks something along the lines of this,



          region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
    1 1 0 0 0 0 0 0 0 0 0
    2 2 1 1 1 1 1 1 1 1 1
    3 3 2 2 2 2 2 2 2 2 2


    such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



    Does anyone have any suggestions?










    share|improve this question


























      up vote
      8
      down vote

      favorite









      up vote
      8
      down vote

      favorite











      I have a very large (too big to open in Excel) biological dataset that looks something like this



          year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
      1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
      1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
      species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
      'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
      'C', 'C', 'C', 'A')
      region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
      3, 2, 2, 1, 1, 1)
      df <- data.frame(year, species, region)

      df
      year species region
      1 1990 A 1
      2 1980 A 1
      3 1985 B 1
      4 1980 B 3
      5 1990 B 2
      6 1990 C 3
      7 1980 C 3
      8 1985 C 2
      9 1985 A 1
      10 1990 A 1
      11 1980 A 3
      12 1985 B 3
      13 1980 B 3
      14 1990 B 2
      15 1990 C 2
      16 1980 C 1
      17 1985 C 1
      18 1985 A 1
      19 1990 A 1
      20 1980 A 3
      21 1985 B 3
      22 1980 B 3
      23 1990 B 2
      24 1990 C 2
      25 1980 C 1
      26 1985 C 1
      27 1985 A 1


      What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



      I'm looking to end up with a dataset that looks something along the lines of this,



            region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
      1 1 0 0 0 0 0 0 0 0 0
      2 2 1 1 1 1 1 1 1 1 1
      3 3 2 2 2 2 2 2 2 2 2


      such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



      Does anyone have any suggestions?










      share|improve this question















      I have a very large (too big to open in Excel) biological dataset that looks something like this



          year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990, 
      1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
      1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
      species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',
      'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
      'C', 'C', 'C', 'A')
      region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,
      3, 2, 2, 1, 1, 1)
      df <- data.frame(year, species, region)

      df
      year species region
      1 1990 A 1
      2 1980 A 1
      3 1985 B 1
      4 1980 B 3
      5 1990 B 2
      6 1990 C 3
      7 1980 C 3
      8 1985 C 2
      9 1985 A 1
      10 1990 A 1
      11 1980 A 3
      12 1985 B 3
      13 1980 B 3
      14 1990 B 2
      15 1990 C 2
      16 1980 C 1
      17 1985 C 1
      18 1985 A 1
      19 1990 A 1
      20 1980 A 3
      21 1985 B 3
      22 1980 B 3
      23 1990 B 2
      24 1990 C 2
      25 1980 C 1
      26 1985 C 1
      27 1985 A 1


      What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).



      I'm looking to end up with a dataset that looks something along the lines of this,



            region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
      1 1 0 0 0 0 0 0 0 0 0
      2 2 1 1 1 1 1 1 1 1 1
      3 3 2 2 2 2 2 2 2 2 2


      such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.



      Does anyone have any suggestions?







      r grouping tidyverse data-management






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 18 at 0:54









      m0nhawk

      14.9k83160




      14.9k83160










      asked Nov 18 at 0:34









      cb14

      434




      434
























          2 Answers
          2






          active

          oldest

          votes

















          up vote
          10
          down vote



          accepted










          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer

















          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            Nov 18 at 1:36


















          up vote
          5
          down vote













          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer



















          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            Nov 18 at 1:53












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            Nov 18 at 3:45











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














           

          draft saved


          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53356871%2fcount-data-divided-by-year-and-by-region-in-r%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          10
          down vote



          accepted










          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer

















          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            Nov 18 at 1:36















          up vote
          10
          down vote



          accepted










          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer

















          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            Nov 18 at 1:36













          up vote
          10
          down vote



          accepted







          up vote
          10
          down vote



          accepted






          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1





          share|improve this answer












          Something like this?



          library(dplyr)

          df2 <- df %>%
          mutate(sp_year = paste(species, year, sep = "_")) %>%
          group_by(region) %>%
          count(sp_year) %>%
          spread(sp_year,n)

          df2


          Which gives this:



          # A tibble: 3 x 10
          # Groups: region [3]
          region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
          1 1 1 3 3 NA 1 NA 2 2 NA
          2 2 NA NA NA NA NA 3 NA 1 2
          3 3 2 NA NA 3 2 NA 1 NA 1






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 18 at 0:59









          wl1234

          183211




          183211








          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            Nov 18 at 1:36














          • 1




            also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
            – Shree
            Nov 18 at 1:36








          1




          1




          also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
          – Shree
          Nov 18 at 1:36




          also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
          – Shree
          Nov 18 at 1:36












          up vote
          5
          down vote













          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer



















          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            Nov 18 at 1:53












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            Nov 18 at 3:45















          up vote
          5
          down vote













          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer



















          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            Nov 18 at 1:53












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            Nov 18 at 3:45













          up vote
          5
          down vote










          up vote
          5
          down vote









          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1





          share|improve this answer














          Similar to wl1234's answer but more concise. We can use unite to combine columns. We can also use count without group_by the variable. Finally, we can set fill = 0 in the spread function to replace NA with 0.



          library(tidyverse)

          df2 <- df %>%
          unite(sp_year, species, year, sep = "_") %>%
          count(sp_year, region) %>%
          spread(sp_year, n, fill = 0)
          df2
          # # A tibble: 3 x 10
          # region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
          # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
          # 1 1 1 3 3 0 1 0 2 2 0
          # 2 2 0 0 0 0 0 3 0 1 2
          # 3 3 2 0 0 3 2 0 1 0 1






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 18 at 1:40

























          answered Nov 18 at 1:35









          www

          25.5k102240




          25.5k102240








          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            Nov 18 at 1:53












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            Nov 18 at 3:45














          • 1




            This is awesome, and I love the NA => 0 addition as well! Thank you!
            – cb14
            Nov 18 at 1:53












          • I didn't know about unite. I will use that instead of paste next time.
            – wl1234
            Nov 18 at 3:45








          1




          1




          This is awesome, and I love the NA => 0 addition as well! Thank you!
          – cb14
          Nov 18 at 1:53






          This is awesome, and I love the NA => 0 addition as well! Thank you!
          – cb14
          Nov 18 at 1:53














          I didn't know about unite. I will use that instead of paste next time.
          – wl1234
          Nov 18 at 3:45




          I didn't know about unite. I will use that instead of paste next time.
          – wl1234
          Nov 18 at 3:45


















           

          draft saved


          draft discarded



















































           


          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53356871%2fcount-data-divided-by-year-and-by-region-in-r%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Сан-Квентин

          Алькесар

          Josef Freinademetz