A Little Book of R for Bioinformatics
Published with bookdown
A Little Book of R for Bioinformatics 2.0
Chapter 17
A complete bioinformatics workflow in R
By: Nathan L. Brouwer