csv files
loading data
Reading a CSV file is fairly simple: all we need is the file path. Luckily, this has already been taken care of, and the file was downloaded to the Virtual File System (VFS) that webR uses. The classic approach is the read.csv() function from base R. It is not the most convenient option, but you do not need to install or download any package to use it. Check out the content of the variable using the commands from the section on data shipped with R!
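Here is a minimal sketch of the base R approach. The actual path of the prepared file in the VFS is not given here, so the example reads a small, made-up CSV from a string through the `text` argument of read.csv(). The column names `origin` and `dep_delay` are assumptions for illustration. With the prepared file, you would pass its path as the first argument instead.

```r
# Made-up CSV content standing in for the prepared file (hypothetical values)
csv_text <- "origin,dep_delay
EWR,2
JFK,-4
LGA,15"

# read.csv() ships with base R (utils package), so no installation is needed.
# With the real file you would call read.csv() with its path instead.
flights_data <- read.csv(text = csv_text)

head(flights_data)     # first rows
str(flights_data)      # column names and their types
summary(flights_data)  # per-column summary statistics
```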
In the tidyverse, we also have the readr package. It gives you more flexibility, for example when specifying which columns to read and what types they have. We do not need to care about that now, but it is good to know.
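As a sketch of that flexibility, the example below fixes the column types explicitly with readr's `col_types` argument instead of letting readr guess them. It again uses made-up inline data rather than the prepared file, assumes readr 2.0 or later (where `I()` marks a string as literal data rather than a path), and reuses the variable name `flights_data_readr` from the exercise below; the columns `origin` and `dep_delay` are assumptions.

```r
library(readr)

# Made-up CSV content standing in for the prepared file (hypothetical values)
csv_text <- "origin,dep_delay
EWR,2
JFK,-4
LGA,15"

# I() tells read_csv() that csv_text is the data itself, not a file path.
# col_types pins the type of each column instead of guessing from the data.
flights_data_readr <- read_csv(
  I(csv_text),
  col_types = cols(
    origin = col_character(),
    dep_delay = col_double()
  )
)

spec(flights_data_readr)  # shows the column specification that was used
```

Unlike read.csv(), read_csv() returns a tibble, which prints more compactly in the console.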
plotting data
OK, the data is there, but what can we do with it? First, use some of the commands from before to explore the dataset.
Consider using the glimpse() function from dplyr.
flights_data_readr |> glimpse()
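To see what the pipe does, here is a small sketch on a made-up stand-in for `flights_data_readr` (the columns and values are hypothetical). The native pipe `|>` requires R 4.1 or later and passes its left-hand side as the first argument of the function on its right.

```r
library(dplyr)

# Small stand-in for flights_data_readr (made-up values)
flights_data_readr <- tibble(
  origin = c("EWR", "JFK", "LGA"),
  dep_delay = c(2, -4, 15)
)

# Equivalent to glimpse(flights_data_readr): one line per column,
# showing its type and the first few values.
flights_data_readr |> glimpse()

# Base R counterpart with a similar column-wise overview.
flights_data_readr |> str()
```

glimpse() returns its input invisibly, so it can also sit in the middle of a longer pipeline.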
Nice use of the pipe there! Next, we will do some plotting using the ggplot2 package, which is based on the Grammar of Graphics (Wilkinson July). It is an extensive plotting framework, and we will not cover all of its details here. It is intimidating at first, but it gets better (I promise). Here, we will use ggplot2 to look at departure delays at U.S. airports.
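A minimal sketch of such a plot, assuming the departure delay sits in a column called `dep_delay` (in minutes) and using made-up values; the plot in the exercise may use a different geom or different column names. It shows the basic Grammar of Graphics pattern: data and aesthetic mappings go into ggplot(), and layers are added with `+`.

```r
library(ggplot2)

# Made-up delays (minutes) for three origin airports
flights_data <- data.frame(
  origin = rep(c("EWR", "JFK", "LGA"), each = 4),
  dep_delay = c(-5, 0, 12, 40, -2, 3, 8, 25, 1, -6, 18, 55)
)

# ggplot() sets the data and maps dep_delay to the x axis;
# geom_histogram() bins the delays and draws the count per bin.
p <- ggplot(flights_data, aes(x = dep_delay)) +
  geom_histogram(bins = 10)

p
```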
The plot is fairly simple, and we will not customize it any further. Just look at it and admire its beauty. Keep in mind that practically every aspect of it is editable. Since it is hard to make out much in a single panel, we will now use a technique called faceting to split the plot by the categorical variable origin, which is the airport each flight departed from.
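Faceting in ggplot2 can be sketched with facet_wrap(), which splits the data by a variable and draws one panel per value. The data below is made up, and `dep_delay` is an assumed column name; only `origin` comes from the text above.

```r
library(ggplot2)

# Made-up delays (minutes) for three origin airports
flights_data <- data.frame(
  origin = rep(c("EWR", "JFK", "LGA"), each = 4),
  dep_delay = c(-5, 0, 12, 40, -2, 3, 8, 25, 1, -6, 18, 55)
)

# facet_wrap(~ origin) draws one histogram per origin airport;
# by default all panels share the same x and y scales, so they are comparable.
p <- ggplot(flights_data, aes(x = dep_delay)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ origin)

p
```

facet_grid() is the alternative when you want to facet by two variables arranged as rows and columns.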
Below, you can see a highly customized plot that shows what ggplot2 is capable of.
It also tries to maximize the data-ink ratio, the share of a graphic's ink that is spent on showing data rather than on decoration. The concept was introduced by Edward Tufte, a statistician known for his work on statistical graphics and information design. We will not dive into his work, but let this be said: plotting is an art, and making good plots takes time and effort (even when they look simple). A good plot gets its message across without the need for further explanation.
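One way to raise the data-ink ratio in ggplot2 is to strip non-data elements through the theme. This is a sketch on made-up data with an assumed `dep_delay` column; the specific choices are illustrative and not a reconstruction of the customized plot above.

```r
library(ggplot2)

# Made-up delays (minutes) for three origin airports
flights_data <- data.frame(
  origin = rep(c("EWR", "JFK", "LGA"), each = 4),
  dep_delay = c(-5, 0, 12, 40, -2, 3, 8, 25, 1, -6, 18, 55)
)

p <- ggplot(flights_data, aes(x = dep_delay)) +
  geom_histogram(bins = 10, fill = "grey30") +
  facet_wrap(~ origin, ncol = 1) +
  labs(
    x = "Departure delay (minutes)",
    y = NULL,  # drop the redundant "count" label
    title = "Departure delays by origin airport"
  ) +
  theme_minimal() +  # white background instead of the default grey panel
  theme(
    panel.grid.minor = element_blank(),   # remove minor grid lines
    panel.grid.major.x = element_blank()  # keep only horizontal reference lines
  )

p
```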