For overview read


Create /data-raw

This creates the folder /data-raw and adds a file DATASET.R


Create data processing scripts

Create script file

usethis::use_data_raw(name = "human_gene_lengths")

Get the data

In this case I’m using data from the 2nd edition of Whitlock and Schluter’s The Analysis of Biological Data, downloaded from the book’s website.

human_gene_lengths <- read.csv(url(""))

names(human_gene_lengths) <- "gene_length"
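Before saving, it is worth checking that the imported object has the shape you expect. A minimal sketch, using a small made-up data frame in place of the downloaded file (the values and the original column name V1 are placeholders, not the book's data):

```r
# Stand-in for the downloaded data; these values are invented for illustration
human_gene_lengths <- data.frame(V1 = c(2069, 1563, 5039, 1361))

# Rename the single column, as in the script above
names(human_gene_lengths) <- "gene_length"

# Basic checks before saving: one numeric column with the expected name
stopifnot(is.data.frame(human_gene_lengths),
          identical(names(human_gene_lengths), "gene_length"),
          is.numeric(human_gene_lengths$gene_length))

# Print a compact summary of the structure
str(human_gene_lengths)
```

If any check fails, stopifnot() stops the script before a malformed object is written into the package.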

Create the .RData file

This converts an R object into a .RData file (saved with the shorter .rda extension) that can be loaded with data().

If necessary it creates a data/ folder for the package.

usethis::use_data(human_gene_lengths,
                  overwrite = TRUE)
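To see what this step amounts to, here is a rough base-R sketch of the same idea: save the object to an .rda file and load it back. It writes to a temporary folder rather than the package's data/ folder, and the data values are placeholders. usethis::use_data() also handles details this sketch ignores, such as compression settings and refusing to overwrite an existing file unless asked.

```r
# Placeholder data standing in for the real dataset
human_gene_lengths <- data.frame(gene_length = c(2069, 1563, 5039))

# Temporary stand-in for the package's data/ folder
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# Save the object under its own name
rda_file <- file.path(data_dir, "human_gene_lengths.rda")
save(human_gene_lengths, file = rda_file)

# Load into a fresh environment to confirm the round trip;
# load() returns the names of the objects it restored
env <- new.env()
loaded <- load(rda_file, envir = env)
print(loaded)
print(all.equal(env$human_gene_lengths, human_gene_lengths))
```

The file name matters: the object is restored under the name it was saved with, which is why the dataset and its .rda file share the name human_gene_lengths.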

Create documentation file for the data

Every dataset needs a .R file that goes in the R/ folder, alongside any .R files that define functions.

usethis::use_r(name = "human_gene_lengths", open = TRUE)

This file is where you can provide full documentation for the dataset. A minimal helpfile could look like this:

#' Dataset helpfile header . . .
#' Short description of data . . .
#' @format A data frame with x rows and y column(s)
#' \describe{
#'   \item{column1}{Describe column here . . .}
#'   \item{column2}{Describe column here . . .}
#'  ...
#' }
#' @source \url{}
"dataset_name"

You can also add additional things that appear in R help files such as full citation information and examples.
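For instance, a fuller version of the template for this dataset might add a title, a citation, and an example. This is only a sketch: the row count is left as a placeholder, the description wording is my own guess based on the dataset name, and the @source URL is left empty as in the template.

```r
#' Human gene lengths
#'
#' Lengths of human genes, from Whitlock and Schluter's
#' The Analysis of Biological Data (2nd edition).
#'
#' @format A data frame with x rows and 1 column:
#' \describe{
#'   \item{gene_length}{Describe column here . . .}
#' }
#' @source \url{}
#' @references Whitlock, M. C. and Schluter, D. The Analysis of
#'   Biological Data, 2nd edition.
#' @examples
#' data(human_gene_lengths)
#' summary(human_gene_lengths$gene_length)
"human_gene_lengths"
```

The quoted dataset name on the last line is what ties the roxygen2 comments to the dataset.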

For other examples see

The function defined below can be used to build this helpfile template automatically. It is also found in my biodata package.

# Function to build a template for a dataset helpfile
make_dateset_helpfile <- function(dataset,
                                  dataset_name  = "temp"){

  # Path of the .R file in R/ that will hold the template
  to_sink <- paste(dataset_name,"R",sep = ".")
  to_sink_with_dir <- here::here("R",to_sink)

  cat("#' Dataset helpfile header . . .\n")
  cat("#' Short description of data . . . \n")
  cat("#' @format A data frame with ", dim(dataset)[1], " rows and ",dim(dataset)[2]," column(s)\n", sep = "")
  cat("#' \\describe{\n", sep = "")

  # Write one \item line per column of the dataset
  for(i in seq_len(ncol(dataset))){
    colname.i <- names(dataset)[i]
    cat("#'   \\item{",colname.i,"}{Describe column ",colname.i, " here . . .}\n",sep = "")
  }

  cat("#' }\n")
  cat("#' @source \\url{}\n")
  cat("'",dataset_name,"'", sep = "")