library(tidyverse)

Here is a very typical data format for community ecologists: a matrix with rows= species, columns= sites, and entries = abundances:

comDat <- tribble(
  ~Species, ~Site1, ~Site2, ~Site3, ~Site4,
  #----------------------------------------
  "SpeciesA", 1,0,0,2,
  "SpeciesB", 3,0,0,0,
  "SpeciesC", 0,2,1,0,
  "SpeciesD", 0,0,0,1
  #----------------------------------------
)
  print(comDat)
## # A tibble: 4 x 5
##   Species  Site1 Site2 Site3 Site4
##   <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 SpeciesA     1     0     0     2
## 2 SpeciesB     3     0     0     0
## 3 SpeciesC     0     2     1     0
## 4 SpeciesD     0     0     0     1

Using dplyr or the older melt and recast functions, we can convert this to the long format:

longDat <- comDat %>%
  gather(Site1:Site4, key="Site",value="Abundance")
print(longDat)
## # A tibble: 16 x 3
##    Species  Site  Abundance
##    <chr>    <chr>     <dbl>
##  1 SpeciesA Site1         1
##  2 SpeciesB Site1         3
##  3 SpeciesC Site1         0
##  4 SpeciesD Site1         0
##  5 SpeciesA Site2         0
##  6 SpeciesB Site2         0
##  7 SpeciesC Site2         2
##  8 SpeciesD Site2         0
##  9 SpeciesA Site3         0
## 10 SpeciesB Site3         0
## 11 SpeciesC Site3         1
## 12 SpeciesD Site3         0
## 13 SpeciesA Site4         2
## 14 SpeciesB Site4         0
## 15 SpeciesC Site4         0
## 16 SpeciesD Site4         1

That’s fine, and it is the transformation you need if you want to start grouping and summarizing by species or by site.

But suppose you want to set this up as a tibble to add data columns that give information for each individual organism? Examples might be sequence data or measures of body mass or other individual traits. In this case, we would like the data to look like this:

Species Site
SpeciesA Site1
SpeciesB Site1
SpeciesB Site1
SpeciesB Site1
SpeciesC Site2
SpeciesC Site2
SpeciesC Site3
SpeciesA Site4
SpeciesA Site4
SpeciesD Site4

In this way, there would be a row for each individual collected in the study, with columns indicating the species identity and the site. You could then add the other columns of information you collect on these individuals.

But how can we do this using dplyr or any other tools? Yes, I could hack my way to this by stripping out 0 rows, and then copying and binding rows multiple times, but that seems ugly.

Any ideas??

Here is a first try:

# first set up an empty data frame with no rows
specimenData <- data.frame(ID=numeric(),Species=character(),Site=character(),stringsAsFactors=FALSE)
rowNum <- 1

for(i in seq_len(nrow(longDat))) {
 for(j in seq_len(longDat$Abundance[i])) {
 specimenData <- rbind(specimenData,list(
                       ID=rowNum,
                       Species=longDat$Species[i],
                       Site=longDat$Site[i]))
 rowNum <- rowNum + 1
 }
}

print(specimenData)
##    ID  Species  Site
## 1   1 SpeciesA Site1
## 2   2     <NA> Site1
## 3   3     <NA> Site1
## 4   4     <NA> Site1
## 5   5     <NA>  <NA>
## 6   6     <NA>  <NA>
## 7   7     <NA>  <NA>
## 8   8 SpeciesA  <NA>
## 9   9 SpeciesA  <NA>
## 10 10     <NA>  <NA>