Planning Methods: Predicting changes in jobs by Zip Code





For this lab, upload the Zip Code data from the previous lab. This should include the following files with the following dimensions. You can also download the data here.

dim(zbp02)
## [1] 2207   24
dim(zbp12)
## [1] 2125   22

The purpose of this lab is to reinforce the regression and visualization skills learned in lab 2 and lab 3 using the Zip Business Patterns data.

Start by summarizing the 2002 data, which you will use to predict job changes in 2012.

summary(zbp02)
##       ZIP         jobs_plus10         jobs.tot           jobs.23      
##  Min.   :15001   Min.   :    2.5   Min.   :    2.50   Min.   :   0.0  
##  1st Qu.:16010   1st Qu.:   71.0   1st Qu.:   72.25   1st Qu.:   2.5  
##  Median :17235   Median :  411.0   Median :  387.50   Median :  22.0  
##  Mean   :17318   Mean   : 2654.3   Mean   : 2597.33   Mean   : 138.6  
##  3rd Qu.:18618   3rd Qu.: 2327.0   3rd Qu.: 2234.25   3rd Qu.: 126.5  
##  Max.   :19980   Max.   :79883.5   Max.   :77848.50   Max.   :4476.5  
##                  NA's   :82        NA's   :40         NA's   :40      
##     jobs.31           jobs.42          jobs.44           jobs.48       
##  Min.   :    0.0   Min.   :   0.0   Min.   :    0.0   Min.   :   0.00  
##  1st Qu.:    0.0   1st Qu.:   0.0   1st Qu.:    2.5   1st Qu.:   0.00  
##  Median :   34.5   Median :   7.5   Median :   31.0   Median :   7.00  
##  Mean   :  357.2   Mean   : 128.8   Mean   :  377.0   Mean   :  73.68  
##  3rd Qu.:  341.5   3rd Qu.:  74.5   3rd Qu.:  273.2   3rd Qu.:  47.25  
##  Max.   :10076.0   Max.   :5025.5   Max.   :10491.0   Max.   :5435.00  
##  NA's   :40        NA's   :40       NA's   :40        NA's   :40       
##     jobs.51           jobs.52            jobs.53           jobs.54       
##  Min.   :   0.00   Min.   :    0.00   Min.   :   0.00   Min.   :    0.0  
##  1st Qu.:   0.00   1st Qu.:    0.00   1st Qu.:   0.00   1st Qu.:    0.0  
##  Median :   0.00   Median :    7.00   Median :   0.00   Median :    5.0  
##  Mean   :  72.96   Mean   :  155.58   Mean   :  39.46   Mean   :  158.4  
##  3rd Qu.:  14.50   3rd Qu.:   50.25   3rd Qu.:  17.50   3rd Qu.:   57.0  
##  Max.   :4203.50   Max.   :15671.50   Max.   :1739.50   Max.   :22421.0  
##  NA's   :40        NA's   :40         NA's   :40        NA's   :40       
##     jobs.56          jobs.61           jobs.62          jobs.71       
##  Min.   :   0.0   Min.   :   0.00   Min.   :   0.0   Min.   :   0.00  
##  1st Qu.:   0.0   1st Qu.:   0.00   1st Qu.:   0.0   1st Qu.:   0.00  
##  Median :   5.0   Median :   0.00   Median :  16.5   Median :   0.00  
##  Mean   : 145.6   Mean   :  66.51   Mean   : 362.8   Mean   :  41.55  
##  3rd Qu.:  56.0   3rd Qu.:   8.50   3rd Qu.: 208.2   3rd Qu.:  17.00  
##  Max.   :9158.5   Max.   :3734.50   Max.   :8389.5   Max.   :2858.00  
##  NA's   :40       NA's   :40        NA's   :40       NA's   :40       
##     jobs.72          jobs.81          jobs.95           jobs.99       
##  Min.   :   0.0   Min.   :   0.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:   0.0   1st Qu.:   2.5   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :  19.5   Median :  19.5   Median :   0.00   Median :  0.000  
##  Mean   : 213.5   Mean   : 139.4   Mean   :  26.79   Mean   :  1.355  
##  3rd Qu.: 154.2   3rd Qu.: 114.8   3rd Qu.:   0.00   3rd Qu.:  0.000  
##  Max.   :5445.5   Max.   :4624.5   Max.   :1961.00   Max.   :214.500  
##  NA's   :40       NA's   :40       NA's   :40        NA's   :40       
##     jobs.21            jobs.11          jobs.22           jobs.55      
##  Min.   :   0.000   Min.   :  0.00   Min.   :   0.00   Min.   :   0.0  
##  1st Qu.:   0.000   1st Qu.:  0.00   1st Qu.:   0.00   1st Qu.:   0.0  
##  Median :   0.000   Median :  0.00   Median :   0.00   Median :   0.0  
##  Mean   :   8.461   Mean   :  2.05   Mean   :  18.45   Mean   :  69.4  
##  3rd Qu.:   0.000   3rd Qu.:  0.00   3rd Qu.:   0.00   3rd Qu.:   2.5  
##  Max.   :1538.500   Max.   :194.00   Max.   :1785.50   Max.   :6602.0  
##  NA's   :40         NA's   :40       NA's   :40        NA's   :40

As with the census tract data, not all of the Zip Codes match across the two years, resulting in missing data.
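A quick way to see how much data is lost is to count the missing values directly. The sketch below uses a small made-up data frame called toy (a stand-in, not the lab data) to show the idea. The same two calls should work on zbp02.

```r
# Toy stand-in for zbp02; the values are made up for illustration.
toy <- data.frame(
  ZIP         = c(15001, 15003, 15004, 15005),
  jobs.tot    = c(120, NA, 45, 300),
  jobs_plus10 = c(110, 60, NA, 310)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Number of rows that a regression of jobs_plus10 on jobs.tot could use
usable <- sum(complete.cases(toy[, c("jobs.tot", "jobs_plus10")]))
usable
```

In the summary above, jobs_plus10 has 82 missing values and jobs.tot has 40. The first regression later in the lab drops 122 observations, which is the sum of the two, so the two sets of missing rows do not appear to overlap.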

Plot the relationship between the number of jobs in 2002 and 2012.

plot(zbp02$jobs.tot, zbp02$jobs_plus10)

And look at the distribution of jobs across Zip Codes. Most Zip Codes have very few jobs, with a long tail of job-rich Zip Codes.

plot(density(zbp02$jobs_plus10, na.rm = T))

hist(zbp02$jobs_plus10)

Look at a specific Zip Code (19104).

plot(zbp02$jobs_plus10, zbp02$jobs.tot, col = "tan")
points(zbp02$jobs_plus10[zbp02$ZIP == 19104], zbp02$jobs.tot[zbp02$ZIP==19104], col="red")

And all Philadelphia Zip Codes, plus a few others. These Zip Codes do not look particularly different from others in PA.

plot(zbp02$jobs_plus10, zbp02$jobs.tot, col = "tan")
points(zbp02$jobs_plus10[zbp02$ZIP > 19018 & zbp02$ZIP < 19256], 
       zbp02$jobs.tot[zbp02$ZIP> 19018 & zbp02$ZIP < 19256], col="red")

Now, predict the 2012 jobs as a function of the number of jobs in 2002. The regression line fits the data almost perfectly.

reg1 <- lm(jobs_plus10 ~ jobs.tot, zbp02) 
summary(reg1)
## 
## Call:
## lm(formula = jobs_plus10 ~ jobs.tot, data = zbp02)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7887.2  -158.2   -82.6    26.7 10332.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 82.743743  26.051260   3.176  0.00151 ** 
## jobs.tot     0.967799   0.004028 240.296  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1081 on 2083 degrees of freedom
##   (122 observations deleted due to missingness)
## Multiple R-squared:  0.9652, Adjusted R-squared:  0.9652 
## F-statistic: 5.774e+04 on 1 and 2083 DF,  p-value: < 2.2e-16

The R-squared statistic indicates that the number of jobs in 2002 predicts the number of jobs in 2012 quite well.

plot(zbp02$jobs.tot, zbp02$jobs_plus10, col = "tan")
abline(reg1)

It also looks like there might be some systematic differences in prediction quality by the number of jobs.

plot(predict(reg1), resid(reg1))
abline(h=0,col=3,lty=3)
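
One rough way to put a number on what the residual plot suggests is to compare the spread of the residuals across groups of fitted values. The sketch below uses simulated data (x, y, and fit are hypothetical, not the lab data) in which the noise is built to grow with the predictor, so the pattern is visible by construction.

```r
# Simulated stand-in for the lab data; names and values are hypothetical.
set.seed(3)
n <- 500
x <- rexp(n, rate = 1 / 1000)              # skewed, like job counts
y <- 50 + x + rnorm(n, sd = 0.2 * x + 10)  # noise grows with x
fit <- lm(y ~ x)

# Split the fitted values into quartile groups and compare residual spread
breaks <- quantile(fitted(fit), probs = 0:4 / 4)
grp <- cut(fitted(fit), breaks = breaks, include.lowest = TRUE)
spread <- tapply(resid(fit), grp, sd)
spread
```

If the standard deviation of the residuals rises steadily from the lowest group to the highest, the errors are not constant across the range of predicted values.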

Try to see which types of jobs in 2002 are associated with job losses or job gains by 2012.

summary(
  lm(jobs_plus10 ~jobs.23 +jobs.31 +jobs.42 +jobs.44 + jobs.48+ jobs.51 + jobs.52 +
       jobs.53 + jobs.54 +jobs.56  + jobs.61 + jobs.62 + jobs.71  + jobs.72 + 
       jobs.81 + jobs.95 + jobs.99 + jobs.21 + jobs.11 + jobs.22   + jobs.55, zbp02)
)
## 
## Call:
## lm(formula = jobs_plus10 ~ jobs.23 + jobs.31 + jobs.42 + jobs.44 + 
##     jobs.48 + jobs.51 + jobs.52 + jobs.53 + jobs.54 + jobs.56 + 
##     jobs.61 + jobs.62 + jobs.71 + jobs.72 + jobs.81 + jobs.95 + 
##     jobs.99 + jobs.21 + jobs.11 + jobs.22 + jobs.55, data = zbp02)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7705.6  -159.0   -51.4    39.6  9224.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.75265   25.06739   1.905 0.056923 .  
## jobs.23      1.48264    0.11445  12.954  < 2e-16 ***
## jobs.31      0.86957    0.03842  22.633  < 2e-16 ***
## jobs.42      1.75718    0.11463  15.329  < 2e-16 ***
## jobs.44      0.90575    0.05998  15.100  < 2e-16 ***
## jobs.48      0.66433    0.10652   6.237 5.41e-10 ***
## jobs.51      0.68900    0.13370   5.153 2.80e-07 ***
## jobs.52      0.59305    0.06731   8.811  < 2e-16 ***
## jobs.53      2.82325    0.31606   8.933  < 2e-16 ***
## jobs.54      1.51710    0.07039  21.552  < 2e-16 ***
## jobs.56      0.17040    0.08420   2.024 0.043112 *  
## jobs.61      1.47967    0.11901  12.433  < 2e-16 ***
## jobs.62      0.85000    0.04409  19.281  < 2e-16 ***
## jobs.71      0.75343    0.20036   3.760 0.000174 ***
## jobs.72      1.52875    0.12252  12.478  < 2e-16 ***
## jobs.81      0.44943    0.18831   2.387 0.017095 *  
## jobs.95      0.74847    0.19179   3.903 9.82e-05 ***
## jobs.99      5.25699    4.88099   1.077 0.281591    
## jobs.21      1.13877    0.36736   3.100 0.001962 ** 
## jobs.11      3.63400    2.12568   1.710 0.087495 .  
## jobs.22      0.88365    0.22376   3.949 8.11e-05 ***
## jobs.55      0.11145    0.11360   0.981 0.326664    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 984.3 on 2063 degrees of freedom
##   (122 observations deleted due to missingness)
## Multiple R-squared:  0.9714, Adjusted R-squared:  0.9711 
## F-statistic:  3339 on 21 and 2063 DF,  p-value: < 2.2e-16

Most of the parameter estimates are statistically different from zero with a high degree of confidence, but that is not a very useful finding. You really want to know whether the estimates are statistically different from 1 (i.e., are certain job types more or less likely to be decreasing over time?). You can roughly approximate this by looking at the coefficient estimate and the standard error. If the coefficient estimate plus or minus two standard errors crosses the number 1, then the estimate is not statistically different from one with 95% confidence. For example, the estimate for jobs.44 (0.906 plus or minus 2 x 0.060) crosses 1, while the estimate for jobs.56 (0.170 plus or minus 2 x 0.084) does not.
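
The two-standard-error rule can also be computed directly. The following is a minimal sketch on simulated data (toy, jobs.a, and jobs.b are hypothetical names, not columns in zbp02). It calculates a t statistic against 1 instead of 0 and pairs it with the 95% confidence interval from confint().

```r
# Simulated stand-in for the lab data; names and values are hypothetical.
set.seed(1)
n <- 200
toy <- data.frame(jobs.a = rpois(n, 50), jobs.b = rpois(n, 30))
toy$jobs_plus10 <- 1.5 * toy$jobs.a + 1.0 * toy$jobs.b + rnorm(n, sd = 5)

fit <- lm(jobs_plus10 ~ jobs.a + jobs.b, data = toy)
est <- summary(fit)$coefficients[-1, ]  # drop the intercept row

# t statistic and two-sided p-value for H0: coefficient = 1 (not 0)
t_vs_1 <- (est[, "Estimate"] - 1) / est[, "Std. Error"]
p_vs_1 <- 2 * pt(abs(t_vs_1), df = df.residual(fit), lower.tail = FALSE)

# 95% confidence intervals; an interval that contains 1 means the
# coefficient is not statistically different from 1 at the 5% level
ci <- confint(fit)[-1, ]
result <- cbind(ci, t_vs_1, p_vs_1)
result
```

The p-value falls below 0.05 exactly when the 95% interval excludes 1, so the two columns give the same answer in different forms.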

You can also set up the regression to test each coefficient against 1 directly by including the total number of jobs in 2002 as an offset, which fixes its coefficient at 1.

summary(
  lm(jobs_plus10 ~ jobs.23 +jobs.31 +jobs.42 +jobs.44 + jobs.48+ jobs.51 + jobs.52 +
       jobs.53 + jobs.54 +jobs.56  + jobs.61 + jobs.62 + jobs.71  + jobs.72 + 
       jobs.81 + jobs.95 + jobs.99 + jobs.21 + jobs.11 + jobs.22   + jobs.55, zbp02, offset= 1.00*jobs.tot)
)
## 
## Call:
## lm(formula = jobs_plus10 ~ jobs.23 + jobs.31 + jobs.42 + jobs.44 + 
##     jobs.48 + jobs.51 + jobs.52 + jobs.53 + jobs.54 + jobs.56 + 
##     jobs.61 + jobs.62 + jobs.71 + jobs.72 + jobs.81 + jobs.95 + 
##     jobs.99 + jobs.21 + jobs.11 + jobs.22 + jobs.55, data = zbp02, 
##     offset = 1 * jobs.tot)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7705.6  -159.0   -51.4    39.6  9224.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.75265   25.06739   1.905 0.056923 .  
## jobs.23      0.48264    0.11445   4.217 2.58e-05 ***
## jobs.31     -0.13043    0.03842  -3.395 0.000699 ***
## jobs.42      0.75718    0.11463   6.605 5.03e-11 ***
## jobs.44     -0.09425    0.05998  -1.571 0.116262    
## jobs.48     -0.33567    0.10652  -3.151 0.001650 ** 
## jobs.51     -0.31100    0.13370  -2.326 0.020107 *  
## jobs.52     -0.40695    0.06731  -6.046 1.76e-09 ***
## jobs.53      1.82325    0.31606   5.769 9.20e-09 ***
## jobs.54      0.51710    0.07039   7.346 2.93e-13 ***
## jobs.56     -0.82960    0.08420  -9.853  < 2e-16 ***
## jobs.61      0.47967    0.11901   4.031 5.77e-05 ***
## jobs.62     -0.15000    0.04409  -3.402 0.000681 ***
## jobs.71     -0.24657    0.20036  -1.231 0.218608    
## jobs.72      0.52875    0.12252   4.316 1.67e-05 ***
## jobs.81     -0.55057    0.18831  -2.924 0.003497 ** 
## jobs.95     -0.25153    0.19179  -1.311 0.189836    
## jobs.99      4.25699    4.88099   0.872 0.383223    
## jobs.21      0.13877    0.36736   0.378 0.705657    
## jobs.11      2.63400    2.12568   1.239 0.215437    
## jobs.22     -0.11635    0.22376  -0.520 0.603127    
## jobs.55     -0.88855    0.11360  -7.822 8.22e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 984.3 on 2063 degrees of freedom
##   (122 observations deleted due to missingness)
## Multiple R-squared:  0.9714, Adjusted R-squared:  0.9711 
## F-statistic:  3339 on 21 and 2063 DF,  p-value: < 2.2e-16

Each job in Real Estate in a Zip Code in 2002 (sector 53) correlates with another 1.82 jobs in 2012. Better yet, use the types of jobs in 2002 to predict the net change in jobs from 2002 to 2012.

reg2 <- lm(jobs_plus10-jobs.tot ~ jobs.23 +jobs.31 +jobs.42 +jobs.44 + jobs.48+ jobs.51 + jobs.52 +
       jobs.53 + jobs.54 +jobs.56  + jobs.61 + jobs.62 + jobs.71  + jobs.72 + 
       jobs.81 + jobs.95 + jobs.99 + jobs.21 + jobs.11 + jobs.22   + jobs.55, zbp02)
summary(reg2)
## 
## Call:
## lm(formula = jobs_plus10 - jobs.tot ~ jobs.23 + jobs.31 + jobs.42 + 
##     jobs.44 + jobs.48 + jobs.51 + jobs.52 + jobs.53 + jobs.54 + 
##     jobs.56 + jobs.61 + jobs.62 + jobs.71 + jobs.72 + jobs.81 + 
##     jobs.95 + jobs.99 + jobs.21 + jobs.11 + jobs.22 + jobs.55, 
##     data = zbp02)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7705.6  -159.0   -51.4    39.6  9224.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.75265   25.06739   1.905 0.056923 .  
## jobs.23      0.48264    0.11445   4.217 2.58e-05 ***
## jobs.31     -0.13043    0.03842  -3.395 0.000699 ***
## jobs.42      0.75718    0.11463   6.605 5.03e-11 ***
## jobs.44     -0.09425    0.05998  -1.571 0.116262    
## jobs.48     -0.33567    0.10652  -3.151 0.001650 ** 
## jobs.51     -0.31100    0.13370  -2.326 0.020107 *  
## jobs.52     -0.40695    0.06731  -6.046 1.76e-09 ***
## jobs.53      1.82325    0.31606   5.769 9.20e-09 ***
## jobs.54      0.51710    0.07039   7.346 2.93e-13 ***
## jobs.56     -0.82960    0.08420  -9.853  < 2e-16 ***
## jobs.61      0.47967    0.11901   4.031 5.77e-05 ***
## jobs.62     -0.15000    0.04409  -3.402 0.000681 ***
## jobs.71     -0.24657    0.20036  -1.231 0.218608    
## jobs.72      0.52875    0.12252   4.316 1.67e-05 ***
## jobs.81     -0.55057    0.18831  -2.924 0.003497 ** 
## jobs.95     -0.25153    0.19179  -1.311 0.189836    
## jobs.99      4.25699    4.88099   0.872 0.383223    
## jobs.21      0.13877    0.36736   0.378 0.705657    
## jobs.11      2.63400    2.12568   1.239 0.215437    
## jobs.22     -0.11635    0.22376  -0.520 0.603127    
## jobs.55     -0.88855    0.11360  -7.822 8.22e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 984.3 on 2063 degrees of freedom
##   (122 observations deleted due to missingness)
## Multiple R-squared:  0.2035, Adjusted R-squared:  0.1954 
## F-statistic:  25.1 on 21 and 2063 DF,  p-value: < 2.2e-16

Note how all the parameter estimates are the same, but there is now a much lower and much more useful R-squared. Instead of showing that Zip Codes with a lot of jobs in 2002 tend to have a lot of jobs in 2012, the model now shows that the number and types of jobs in each Zip Code can be used to predict whether a Zip Code gained or lost jobs. Instead of explaining 97% of the variation in 2012 jobs, the model now explains about 20% of the variation in the change in jobs.
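
The reason the estimates match while the R-squared drops can be checked on a small simulated example. In the sketch below (toy, jobs.a, and jobs.b are hypothetical), the sector counts are assumed to sum exactly to jobs.tot. Under that assumption, subtracting jobs.tot from the dependent variable lowers every slope by exactly 1 and leaves the residuals untouched.

```r
# Simulated stand-in for the lab data; names and values are hypothetical.
set.seed(2)
n <- 300
toy <- data.frame(jobs.a = rpois(n, 40), jobs.b = rpois(n, 60))
toy$jobs.tot <- toy$jobs.a + toy$jobs.b  # assumption: sectors sum to the total
toy$jobs_plus10 <- 1.3 * toy$jobs.a + 0.8 * toy$jobs.b + rnorm(n, sd = 5)

level  <- lm(jobs_plus10 ~ jobs.a + jobs.b, data = toy)
change <- lm(I(jobs_plus10 - jobs.tot) ~ jobs.a + jobs.b, data = toy)

# The slopes differ by 1 because jobs.tot = jobs.a + jobs.b
slope_gap <- coef(level)[-1] - coef(change)[-1]
slope_gap

# The residuals are the same, so the unexplained variation is the same;
# only the total variation being explained is different
all.equal(unname(resid(level)), unname(resid(change)))
c(level = summary(level)$r.squared, change = summary(change)$r.squared)
```

Because the residual sum of squares is identical in both models, the lower R-squared comes entirely from the smaller total variation in the change in jobs compared with the level of jobs.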

The distribution of this dependent variable also looks a lot more normal with most Zip Codes not having changed very much.

plot(density(zbp02$jobs_plus10- zbp02$jobs.tot, na.rm=T))

The residual plot also looks a bit more homoscedastic.

plot(predict(reg2), resid(reg2))
abline(h=0,col=3,lty=3)

EXERCISE

  1. Try to make the most parsimonious model that does a good job of predicting the change in jobs from 2002 to 2012. Use a full model vs. reduced model ANOVA test to compare this model to the fully specified model (with all job types) and describe the results.

  2. Try predicting the percent change in jobs instead of the absolute change. Compare and contrast the two models. Which model do you prefer and why? Hint: use the I() function, which tells R to evaluate the arithmetic as written, to generate the percent change variable inside of your regression: I(jobs_plus10/jobs.tot-1). Input industry codes as a percentage of total jobs: I(jobs.23/jobs.tot).

  3. What types of jobs are the best indicators of job growth and job losses? Describe the quantitative relationship as expressed by the two models in question 2. Does this contradict or concur with your expectations?

  4. Look at the differences in jobs in sector 99 across the two time periods. What do you think is happening?
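
For the full vs. reduced model comparison in question 1, the mechanics might look something like the sketch below. It runs on simulated data (toy, change, and the jobs.a to jobs.c columns are hypothetical), with jobs.c built to have no effect, so it illustrates the test and not the answer for the lab data.

```r
# Simulated stand-in for the lab data; names and values are hypothetical.
set.seed(4)
n <- 250
toy <- data.frame(
  jobs.a = rpois(n, 40),
  jobs.b = rpois(n, 60),
  jobs.c = rpois(n, 20)
)
# jobs.c is deliberately left out of the data-generating process
toy$change <- 0.4 * toy$jobs.a - 0.3 * toy$jobs.b + rnorm(n, sd = 5)

full    <- lm(change ~ jobs.a + jobs.b + jobs.c, data = toy)
reduced <- lm(change ~ jobs.a + jobs.b, data = toy)

# F-test of H0: the dropped terms add nothing to the model.
# A large p-value favors keeping the reduced model.
cmp <- anova(reduced, full)
cmp
```

anova() needs both models to be fitted to the same rows, which is worth checking when some variables have missing values.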
