22  R Reordering Exercise

22.1 Load Libraries

library(tidyverse)
library(tidylog)

22.2 Create some example data

Here we set up a data dictionary dd and some corresponding data ds. However, it is better if the order of the rows in the data dictionary dd match the order of the columns in the data ds.

set.seed(1562345)
# Set up a data dictionary
dd <- data.frame(VARNAME = sample(letters, 26), TYPE = "numeric")
# Set up data
ds <- as.data.frame(t(dd %>%
    arrange(VARNAME)))
names(ds) <- letters
rownames(ds) <- NULL
ds[1, ] <- rnorm(26)
ds[2, ] <- runif(26)
ds$ID <- c(1, 2)
ds <- ds %>%
    select(ID, everything())
select: columns reordered (ID, a, b, c, d, …)
# Randomly rearrange the columns
idx <- sample(letters, 26)
idx <- c("ID", idx)
ds <- ds %>%
    select(all_of(idx))
select: columns reordered (ID, b, z, a, p, …)
dd <- bind_rows(dd, data.frame(VARNAME = "ID", TYPE = "string"))
dim(dd)
[1] 27  2
head(dd)
  VARNAME    TYPE
1       c numeric
2       m numeric
3       f numeric
4       e numeric
5       a numeric
6       d numeric
dim(ds)
[1]  2 27
head(ds[1:3])
  ID                 b                 z
1  1  1.02333343074042  0.47956883003516
2  2 0.858655267162248 0.136965574463829
names(ds)
 [1] "ID" "b"  "z"  "a"  "p"  "f"  "u"  "m"  "q"  "n"  "d"  "o"  "s"  "k"  "e" 
[16] "x"  "c"  "h"  "i"  "g"  "j"  "r"  "t"  "y"  "l"  "w"  "v" 

22.3 Task: Reorder rows in dd in the order of ds’s columns

colnames(ds)
 [1] "ID" "b"  "z"  "a"  "p"  "f"  "u"  "m"  "q"  "n"  "d"  "o"  "s"  "k"  "e" 
[16] "x"  "c"  "h"  "i"  "g"  "j"  "r"  "t"  "y"  "l"  "w"  "v" 
dd$VARNAME
 [1] "c"  "m"  "f"  "e"  "a"  "d"  "v"  "h"  "k"  "t"  "p"  "j"  "l"  "x"  "w" 
[16] "y"  "b"  "o"  "s"  "r"  "i"  "z"  "u"  "n"  "g"  "q"  "ID"

This assumes that every row of dd is in colnames(ds) and every colnames(ds) value is represented in dd. Perhaps that should be checked first.

22.4 Assumption Check Question

How would you check that every variable listed in the data dictionary dd is named in colnames(ds) and every colnames(ds) value is represented in the data dictionary dd?

table(dd$VARNAME %in% colnames(ds))

TRUE 
  27 
table(colnames(ds) %in% dd$VARNAME)

TRUE 
  27 

Note that we should also check to see if the VARNAME’s are unique and the colnames of ds are unique.

sum(duplicated(dd$VARNAME))
[1] 0
sum(duplicated(colnames(ds)))
[1] 0

22.5 Task: Reorder rows in dd to match the order of the columns in ds

Task: Reorder rows in the data dictionary dd to match the order of the columns in the data ds

  • What are various ways you could rearrange the rows of a data frame?
# Assign VARNAME to be the rownames of dd
rownames(dd) <- dd$VARNAME
# Rearrange by row names:
dd2 <- dd[colnames(ds), ]
# Check if this worked:
all.equal(dd2$VARNAME, colnames(ds))
[1] TRUE

We can use match also:

# match returns a vector of the positions of (first) matches of its first
# argument in its second.
dd3 <- dd[match(colnames(ds), dd$VARNAME), ]
# Check if this worked:
all.equal(dd3$VARNAME, colnames(ds))
[1] TRUE

22.6 Question: use arrange?

Question: Is there a way to do this using arrange?

This does not work, because tidyverse wants to work on columns of data within dd:

dd4 <- dd %>%
    arrange(colnames(ds))
# Check if this worked:
all.equal(dd4$VARNAME, colnames(ds))
[1] "26 string mismatches"

22.7 Question: use arrange?

Question: Is there a way to do this using arrange?

arrange() orders the rows of a data frame by the values of selected columns.

dd4 <- dd %>%
    mutate(neworder = match(.$VARNAME, colnames(ds))) %>%
    arrange(neworder) %>%
    select(-neworder)
mutate: new variable 'neworder' (integer) with 27 unique values and 0% NA
select: dropped one variable (neworder)
all.equal(dd4$VARNAME, colnames(ds))
[1] TRUE

22.8 Question: use slice

Question: Is there a way to do this using the slice command?

slice() lets you index rows by their (integer) locations.

dd6 <- dd %>%
    slice(match(colnames(ds), .$VARNAME))
slice: no rows removed
all.equal(dd6$VARNAME, colnames(ds))
[1] TRUE

22.9 Question: use select?

Question: Is there a way to do this by transposing and then using select?

# Transpose so rows become columns, and then we can use 'select' to rearrange
# those columns, and then transpose back, and rename columns as needed.
dd5 <- dd %>%
    t() %>%
    as_tibble(.name_repair = "unique") %>%
    select(colnames(ds)) %>%
    t() %>%
    as.data.frame() %>%
    rename(VARNAME = "V1", TYPE = "V2")
select: columns reordered (ID, b, z, a, p, …)
rename: renamed 2 variables (VARNAME, TYPE)
all.equal(dd5$VARNAME, colnames(ds))
[1] TRUE

22.10 Question: use row names

Question: What about using row names?

“While a tibble can have row names (e.g., when converting from a regular data frame), they are removed when subsetting with the [ operator. A warning will be raised when attempting to assign non-NULL row names to a tibble. Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column.”

From: https://tibble.tidyverse.org/reference/rownames.html