Planning Methods: Loading and examining data

This page describes how to load and examine data in R. First, we will download a CSV of UN population estimates for central African cities to import into R: Central Africa West.

The first thing to do is to let R know where to find the CSV. You can do this by navigating through the bottom right window in RStudio (Files/More/Set As Working Directory) or by setting your working directory with the setwd command.
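
As a minimal sketch of the setwd approach: the folder path below is a hypothetical placeholder, not a path from this page, so substitute the folder where you saved the CSV.

```r
# Hypothetical folder: replace with the folder where you saved the CSV,
# then remove the leading "#" to run the line.
# setwd("~/Downloads")

# Show the folder R is currently using as its working directory
getwd()

# Returns TRUE if the CSV is in the working directory, FALSE otherwise
file.exists("Central Africa West.csv")
```

If file.exists returns FALSE, the working directory is probably not the folder that holds the file.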

Then read the CSV file: read.csv("Central Africa West.csv")
Note how the data are printed directly to the command console. Next we will create a new object called pop so that we can call on the data more easily and interact with it.

Try: pop <- read.csv("Central Africa West.csv")

NB: you can use = instead of <-, but it is better coding practice to save the = sign for other uses, such as naming arguments in function calls.
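
A short illustration of the two uses, with made-up vectors rather than the population data:

```r
# Both lines create an object; <- is the conventional assignment operator
x <- c(95, 117, 153)
y = c(95, 117, 153)
identical(x, y)   # TRUE: the two objects hold the same values

# Inside a function call, = names an argument instead of assigning.
# Here the values are passed to the argument called x of mean(),
# and the object x created above is left unchanged.
m <- mean(x = c(1, 2, 3))
m
x
```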

You can also load data in RStudio using a point-and-click interface. To set your working directory, click on Session at the top of the window and then click Set Working Directory/Choose Directory. To import a data file, click on File/Import Dataset.

To look at the data, try copying and pasting the following commands into the command console and pressing Enter.

str(pop)
## 'data.frame':    17 obs. of  12 variables:
##  $ Year                   : int  1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 ...
##  $ Period                 : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Douala                 : int  95 117 153 205 298 433 571 740 940 1184 ...
##  $ Yaounde                : int  32 49 75 112 183 292 415 578 777 1025 ...
##  $ Libreville             : int  15 17 29 49 77 155 234 293 366 439 ...
##  $ Brazzaville            : int  83 92 124 172 238 329 446 596 704 830 ...
##  $ Pointe_Noire           : int  16 35 64 89 116 154 217 300 363 439 ...
##  $ Sub_Saharan_300kplus   : int  3923 4909 7083 10779 16335 24143 34813 47767 62327 75996 ...
##  $ Central_Africa_300kplus: int  3660 4521 5651 7047 8921 11495 14465 18043 22566 28525 ...
##  $ Cameroon_urban         : int  417 557 747 1011 1375 2112 2851 3761 4787 5930 ...
##  $ Congo_urban            : int  201 253 320 408 522 672 860 1086 1295 1535 ...
##  $ Gabon_urban            : int  54 68 87 126 189 279 397 515 655 814 ...

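This tells you the structure of the data: the number of observations (17) and variables (12), plus the type and first few values of each variable. In this case, all of the variables are integers, but it is also common to see characters, factors, and other types of data.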
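
To see how other types show up, here is a sketch with a small made-up data frame. The object name cities and its column names are illustrative and not part of the CSV; the population values are the 1950 figures shown above.

```r
# Illustrative data frame mixing three common column types
cities <- data.frame(
  city = c("Douala", "Yaounde", "Libreville"),            # character
  country = factor(c("Cameroon", "Cameroon", "Gabon")),   # factor
  pop_1950 = c(95L, 32L, 15L),                            # integer (thousands)
  stringsAsFactors = FALSE
)

# str labels each column with its type: chr, Factor, int
str(cities)

# class of every column, returned as a named character vector
sapply(cities, class)
```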

names(pop)
##  [1] "Year"                    "Period"                 
##  [3] "Douala"                  "Yaounde"                
##  [5] "Libreville"              "Brazzaville"            
##  [7] "Pointe_Noire"            "Sub_Saharan_300kplus"   
##  [9] "Central_Africa_300kplus" "Cameroon_urban"         
## [11] "Congo_urban"             "Gabon_urban"

This command gives the names of all the variables in the data. The city, country, and region columns contain population estimates in thousands.
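
Variable names are also how you pull out individual columns. The sketch below rebuilds the first three rows of three columns from the output above so that it runs without the CSV; the name pop_small is made up for this example. With the full file loaded, pop$Douala works the same way.

```r
# First three rows of Year, Douala, and Yaounde, copied from the output above
pop_small <- data.frame(
  Year = c(1950L, 1955L, 1960L),
  Douala = c(95L, 117L, 153L),
  Yaounde = c(32L, 49L, 75L)
)

# Select one column by name with $ or with [[ ]]
pop_small$Douala
pop_small[["Yaounde"]]

# The values are in thousands, so multiply by 1000 to get persons
pop_small$Douala * 1000
```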

summary(pop)
##       Year          Period       Douala          Yaounde      
##  Min.   :1950   Min.   : 1   Min.   :  95.0   Min.   :  32.0  
##  1st Qu.:1970   1st Qu.: 5   1st Qu.: 205.0   1st Qu.: 112.0  
##  Median :1990   Median : 9   Median : 571.0   Median : 415.0  
##  Mean   :1990   Mean   : 9   Mean   : 804.8   Mean   : 693.8  
##  3rd Qu.:2010   3rd Qu.:13   3rd Qu.:1184.0   3rd Qu.:1025.0  
##  Max.   :2030   Max.   :17   Max.   :2361.0   Max.   :2349.0  
##                              NA's   :4        NA's   :4       
##    Libreville     Brazzaville      Pointe_Noire   Sub_Saharan_300kplus
##  Min.   : 15.0   Min.   :  83.0   Min.   : 16.0   Min.   :  3923      
##  1st Qu.: 49.0   1st Qu.: 172.0   1st Qu.: 89.0   1st Qu.: 16335      
##  Median :234.0   Median : 446.0   Median :217.0   Median : 62327      
##  Mean   :258.5   Mean   : 575.3   Mean   :293.1   Mean   : 91980      
##  3rd Qu.:439.0   3rd Qu.: 830.0   3rd Qu.:439.0   3rd Qu.:137283      
##  Max.   :631.0   Max.   :1574.0   Max.   :815.0   Max.   :300153      
##  NA's   :4       NA's   :4        NA's   :4                           
##  Central_Africa_300kplus Cameroon_urban   Congo_urban    Gabon_urban    
##  Min.   :  3660          Min.   :  417   Min.   : 201   Min.   :  54.0  
##  1st Qu.:  8921          1st Qu.: 1375   1st Qu.: 522   1st Qu.: 189.0  
##  Median : 22566          Median : 4787   Median :1295   Median : 655.0  
##  Mean   : 34795          Mean   : 6835   Mean   :1723   Mean   : 819.6  
##  3rd Qu.: 51883          3rd Qu.:10625   3rd Qu.:2600   3rd Qu.:1334.0  
##  Max.   :107747          Max.   :20492   Max.   :4804   Max.   :2122.0  
## 
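
The NA's rows in the summary output mark missing values: four of the 17 entries are missing for each city. Many R functions return NA when their input contains missing values unless told to skip them. A small sketch with a made-up vector (not the actual city data):

```r
# Made-up vector with one missing value
x <- c(95, 117, NA, 205)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # mean of the non-missing values only
sum(is.na(x))           # number of missing values
```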

Another good way to get a sense for the data is to look at the first or last entries, using the head or tail commands.

head(pop)
##   Year Period Douala Yaounde Libreville Brazzaville Pointe_Noire
## 1 1950      1     95      32         15          83           16
## 2 1955      2    117      49         17          92           35
## 3 1960      3    153      75         29         124           64
## 4 1965      4    205     112         49         172           89
## 5 1970      5    298     183         77         238          116
## 6 1975      6    433     292        155         329          154
##   Sub_Saharan_300kplus Central_Africa_300kplus Cameroon_urban Congo_urban
## 1                 3923                    3660            417         201
## 2                 4909                    4521            557         253
## 3                 7083                    5651            747         320
## 4                10779                    7047           1011         408
## 5                16335                    8921           1375         522
## 6                24143                   11495           2112         672
##   Gabon_urban
## 1          54
## 2          68
## 3          87
## 4         126
## 5         189
## 6         279
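
Both head and tail accept an n argument that sets how many rows are returned. The sketch below uses the six rows printed above for three of the columns so that it runs without the CSV; the name pop_small is made up for this example.

```r
# Six rows of Year, Douala, and Yaounde, copied from the head output above
pop_small <- data.frame(
  Year = c(1950L, 1955L, 1960L, 1965L, 1970L, 1975L),
  Douala = c(95L, 117L, 153L, 205L, 298L, 433L),
  Yaounde = c(32L, 49L, 75L, 112L, 183L, 292L)
)

head(pop_small, n = 2)   # first two rows
tail(pop_small, n = 2)   # last two rows
```

With the full data loaded, tail(pop) shows the last six rows in the same way.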

For help with any of these commands, use the help function by typing ? before the command name. For example, try typing ?head into the command console.
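
A few equivalent ways to reach the documentation, as a quick sketch:

```r
?head              # open the help page for head
help("read.csv")   # the same lookup written as a function call
example("head")    # run the examples from the head help page
```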

EXERCISE

Walk through this brief introduction to R and RStudio.

 

 
