1.1 What is data science?

Data analysis includes “procedures for analyzing data, techniques for interpreting the results…, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.” (John Tukey, “The Future of Data Analysis”, Annals of Mathematical Statistics, 1962.)

  • There is no single, agreed-upon definition of data science.
  • What Tukey calls “data analysis” is now termed “data science” by many.
  • Some define data science as closely allied with computer science and associate it most strongly with topics such as “big data”, data mining, machine learning, and artificial intelligence.
  • Others, such as RStudio’s Hadley Wickham (creator of ggplot2, dplyr, and most of the infrastructure of the tidyverse of R packages), define it more broadly to encompass all aspects of the data life cycle.
  • (Wickham has also quipped: “A data scientist is a statistician who is wearing a bow tie.” https://twitter.com/hadleywickham/status/906146116412039169?lang=en)