pct.a.surf: Percentage of asphalt in surface layer
pct.a.base: Percentage of asphalt in base layer
fines: Percentage of fines in surface layer
voids: Percentage of voids in surface layer
rut.depth: Change in rut depth per million vehicle passes
viscosity: Viscosity of asphalt
run: 2 data collection periods: 1 for run 1, 0 for run 2.
rut.depth response. Depends on other variables, how?
Make sure to load MASS before tidyverse (for annoying technical reasons), or to load MASS excluding its select (as above).
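A minimal sketch of the second loading option, assuming R 3.6 or later (where `library()` accepts `exclude`) and a fresh session in which MASS is not yet attached. The variable names here are illustrative only.

```r
# Attach MASS without its select(), so it cannot mask dplyr's select()
library(MASS, exclude = "select")

# Other MASS functions such as boxcox() are attached as usual
attached <- ls("package:MASS")
has_boxcox <- "boxcox" %in% attached
has_select <- "select" %in% attached   # FALSE when the exclusion took effect

# MASS's own select() stays reachable by its qualified name if ever needed
mass_select_available <- is.function(MASS::select)
```

With MASS attached this way, the order in which tidyverse is loaded no longer matters for `select`.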
Same idea as for plotting separate predictions on one plot:
“collect all the x-variables together into one column called x, with another column xname saying which x they were, then plot these x’s against rut.depth, a separate facet for each x-variable.”
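The recipe quoted above might be sketched as follows. Because the asphalt data are not reproduced here, the code uses `asphalt_sim`, a simulated stand-in with the same column names (an assumption for illustration, not the real measurements), and loads `tidyr` and `ggplot2` directly.

```r
library(tidyr)
library(ggplot2)

# Simulated stand-in with the same column names as the asphalt data.
# These are NOT the real measurements.
set.seed(1)
n <- 31
asphalt_sim <- data.frame(
  pct.a.surf = runif(n, 4, 6),
  pct.a.base = runif(n, 4, 5),
  fines      = runif(n, 6, 9),
  voids      = runif(n, 4, 6),
  viscosity  = exp(runif(n, 0, 6)),
  run        = rep(c(1, 0), length.out = n)
)
asphalt_sim$rut.depth <- exp(
  -1 + 0.5 * asphalt_sim$pct.a.surf + 0.25 * asphalt_sim$voids -
    0.6 * log(asphalt_sim$viscosity) + rnorm(n, sd = 0.3)
)

# All x-variables into one column x, with xname recording which one it was
asphalt_long <- pivot_longer(
  asphalt_sim,
  cols = -rut.depth,
  names_to = "xname",
  values_to = "x"
)

# One facet per x-variable; free scales because the x's have different ranges
g <- ggplot(asphalt_long, aes(x = x, y = rut.depth)) +
  geom_point() +
  facet_wrap(~xname, scales = "free")
```

Storing the plot in `g` rather than printing it mirrors saving the graph to display later.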
I saved this graph to plot later (on the next page).
viscosity has strong but non-linear trend.
run has effect but variability bigger when run is 1.
voids.
rut.depth-viscosity relationship should concern us.
viscosity: more nearly linear?
viscosity:
(plot overleaf)
Call:
lm(formula = rut.depth ~ pct.a.surf + pct.a.base + fines + voids +
    log(viscosity) + run, data = asphalt)

Residuals:
    Min      1Q  Median      3Q     Max
-4.1211 -1.9075 -0.7175  1.6382  9.5947

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -12.9937    26.2188  -0.496   0.6247
pct.a.surf       3.9706     2.4966   1.590   0.1248
pct.a.base       1.2631     3.9703   0.318   0.7531
fines            0.1164     1.0124   0.115   0.9094
voids            0.5893     1.3244   0.445   0.6604
log(viscosity)  -3.1515     0.9194  -3.428   0.0022 **
run             -1.9655     3.6472  -0.539   0.5949
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.324 on 24 degrees of freedom
Multiple R-squared:  0.806,  Adjusted R-squared:  0.7575
F-statistic: 16.62 on 6 and 24 DF,  p-value: 1.743e-07
R-squared 81%, not so bad.
P-value in glance asserts that something is helping to predict rut.depth.
Table of coefficients says log(viscosity).
But confused by clearly non-significant variables: remove those to get clearer picture of what is helpful.
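As a cross-check on how the summary figures relate to each other, the numbers printed in the output above can be recombined by hand. This is only a sketch of the standard formulas. The sample size n = 31 is inferred from 24 residual degrees of freedom plus 7 estimated coefficients, and the variable names are illustrative.

```r
# Values copied from the regression output
r2      <- 0.806     # Multiple R-squared
df_res  <- 24        # residual degrees of freedom
p       <- 6         # number of explanatory terms (excluding intercept)
n       <- df_res + p + 1

# Adjusted R-squared penalizes R-squared for the number of terms
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# t value for log(viscosity) is estimate / standard error,
# and its two-sided P-value comes from a t distribution on 24 df
t_logvisc <- -3.1515 / 0.9194
p_logvisc <- 2 * pt(-abs(t_logvisc), df = df_res)

# Overall P-value: upper tail of F on 6 and 24 df
p_overall <- pf(16.62, p, df_res, lower.tail = FALSE)
```

Small discrepancies from the printed values are expected because the inputs here are already rounded.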
Problem fixes:
asphalt.augment that combines these two together so that they can later be plotted: start with a model first, and then augment with a data frame:
 [1] "pct.a.surf" "pct.a.base" "fines"      "voids"      "rut.depth"
 [6] "viscosity"  "run"        ".fitted"    ".resid"     ".hat"
[11] ".sigma"     ".cooksd"    ".std.resid"
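To show what the extra columns listed above contain, here is a base-R sketch that builds a similar data frame by hand. It is not the broom implementation; in practice a call of the form `augment(model, data)` from broom does this in one step. The data are simulated stand-ins (`asphalt_sim`, not the real measurements), and the mapping of `.sigma` to the leave-one-out residual standard deviation is stated as my understanding of broom's output.

```r
# Simulated stand-in with the asphalt column names (NOT the real data)
set.seed(1)
n <- 31
asphalt_sim <- data.frame(
  pct.a.surf = runif(n, 4, 6),
  pct.a.base = runif(n, 4, 5),
  fines      = runif(n, 6, 9),
  voids      = runif(n, 4, 6),
  viscosity  = exp(runif(n, 0, 6)),
  run        = rep(c(1, 0), length.out = n)
)
asphalt_sim$rut.depth <- exp(
  -1 + 0.5 * asphalt_sim$pct.a.surf + 0.25 * asphalt_sim$voids -
    0.6 * log(asphalt_sim$viscosity) + rnorm(n, sd = 0.3)
)

fit <- lm(rut.depth ~ pct.a.surf + pct.a.base + fines + voids +
            log(viscosity) + run, data = asphalt_sim)

# Data plus per-observation model quantities, one row per observation
aug_by_hand <- cbind(
  asphalt_sim,
  .fitted    = fitted(fit),            # predicted response
  .resid     = resid(fit),             # observed minus fitted
  .hat       = hatvalues(fit),         # leverage
  .sigma     = influence(fit)$sigma,   # residual SD with this row left out
  .cooksd    = cooks.distance(fit),    # Cook's distance
  .std.resid = rstandard(fit)          # standardized residual
)
```

Having residuals and explanatory variables in the same data frame is what makes plots of residuals against fitted values, or against each x, straightforward.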
From package MASS:
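A hedged sketch of a Box-Cox analysis with `MASS::boxcox`, run on simulated stand-in data (`asphalt_sim`, not the real measurements). The simulated response is generated on the log scale, so the suggested power should land somewhere near 0 here; nothing below says what the real data would show.

```r
# Simulated stand-in with the asphalt column names (NOT the real data)
set.seed(1)
n <- 31
asphalt_sim <- data.frame(
  pct.a.surf = runif(n, 4, 6),
  pct.a.base = runif(n, 4, 5),
  fines      = runif(n, 6, 9),
  voids      = runif(n, 4, 6),
  viscosity  = exp(runif(n, 0, 6)),
  run        = rep(c(1, 0), length.out = n)
)
asphalt_sim$rut.depth <- exp(
  -1 + 0.5 * asphalt_sim$pct.a.surf + 0.25 * asphalt_sim$voids -
    0.6 * log(asphalt_sim$viscosity) + rnorm(n, sd = 0.3)
)

# Profile log-likelihood over a grid of powers lambda (default -2 to 2);
# plotit = FALSE returns the grid instead of drawing the usual plot
bc <- MASS::boxcox(
  rut.depth ~ pct.a.surf + pct.a.base + fines + voids +
    log(viscosity) + run,
  data = asphalt_sim,
  plotit = FALSE
)

# Power with the highest log-likelihood; lambda = 0 corresponds to log
lambda_hat <- bc$x[which.max(bc$y)]
```

Calling the function as `MASS::boxcox` avoids attaching MASS at all, which sidesteps the `select` conflict.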
log(rut.depth)) against other explanatory variables, all in one shot:
log.viscosity.
log.rut.depth for each run have same spread.
log.rut.depth in terms of everything else, see what can be removed:
tidy from broom to display just the coefficients.
rut.3.
pct.a.base, not significant.
tidy is itself a data frame, thus:
pct.a.base
lm code and remove what you’re removing:
rut.4 <- lm(log(rut.depth) ~ pct.a.surf + fines + voids +
  log(viscosity) + run, data = asphalt)
tidy(rut.4) %>% arrange(p.value) %>% select(term, p.value)
fines is next to go, P-value 0.32.
Another way to do the same thing:
fines is the one to go. (Output identical as it should be.)
Can’t take out intercept, so run, with P-value 0.36, goes next.
Again, can’t take out intercept, so largest P-value is for voids, 0.044. But this is significant, so we shouldn’t remove voids.
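One elimination step of this kind can be sketched in base R with `drop1` and `update`. This is an illustration on simulated stand-in data (`asphalt_sim`, not the real measurements), so which term comes out as least significant here says nothing about the real analysis.

```r
# Simulated stand-in with the asphalt column names (NOT the real data)
set.seed(1)
n <- 31
asphalt_sim <- data.frame(
  pct.a.surf = runif(n, 4, 6),
  pct.a.base = runif(n, 4, 5),
  fines      = runif(n, 6, 9),
  voids      = runif(n, 4, 6),
  viscosity  = exp(runif(n, 0, 6)),
  run        = rep(c(1, 0), length.out = n)
)
asphalt_sim$rut.depth <- exp(
  -1 + 0.5 * asphalt_sim$pct.a.surf + 0.25 * asphalt_sim$voids -
    0.6 * log(asphalt_sim$viscosity) + rnorm(n, sd = 0.3)
)

full <- lm(log(rut.depth) ~ pct.a.surf + pct.a.base + fines + voids +
             log(viscosity) + run, data = asphalt_sim)

# F-test for dropping each term in turn; the intercept is never a candidate
d <- drop1(full, test = "F")

# Term with the largest P-value (the "<none>" row has NA and is skipped)
worst <- rownames(d)[which.max(d[["Pr(>F)"]])]

# Refit without that term, keeping everything else the same
smaller <- update(full, as.formula(paste(". ~ . -", worst)))
```

Repeating these three steps until the largest P-value is significant gives the stopping rule described above.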
pct.a.surf, voids and log.viscosity would all make fit significantly worse if removed. So they stay.
   (Intercept)     pct.a.surf          voids log(viscosity)
    -1.0207945      0.5554686      0.2447934     -0.6464911
   (Intercept)     pct.a.surf log(viscosity)
     0.9001389      0.3911481     -0.6185628
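Because the response is log(rut.depth), the slopes in the three-predictor fit printed above act multiplicatively on rut depth itself. The arithmetic below is a sketch of that reading, using the printed coefficients and assuming natural logs throughout (as R's `log` uses); the variable names are illustrative.

```r
# Slopes copied from the fit with pct.a.surf, voids and log(viscosity)
b_pct_surf <- 0.5554686
b_voids    <- 0.2447934
b_log_visc <- -0.6464911

# A one-unit increase in x multiplies predicted rut depth by exp(slope)
mult_pct_surf <- exp(b_pct_surf)   # factor greater than 1: rut depth goes up
mult_voids    <- exp(b_voids)      # factor greater than 1: rut depth goes up

# Viscosity enters as a log, so doubling viscosity adds log(2) to
# log(viscosity) and multiplies predicted rut depth by 2^slope
mult_double_visc <- 2^b_log_visc   # factor less than 1: rut depth goes down
```

Each factor applies with the other explanatory variables held fixed.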
step that does backward selection, like this:
Gets same answer as we did (by removing least significant x).
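A sketch of such a call on simulated stand-in data (`asphalt_sim`, not the real measurements). Note that `step` decides by AIC rather than by P-values; `test = "F"` only adds F-tests to the tables it prints, so agreement with by-hand elimination is usual but not guaranteed.

```r
# Simulated stand-in with the asphalt column names (NOT the real data)
set.seed(1)
n <- 31
asphalt_sim <- data.frame(
  pct.a.surf = runif(n, 4, 6),
  pct.a.base = runif(n, 4, 5),
  fines      = runif(n, 6, 9),
  voids      = runif(n, 4, 6),
  viscosity  = exp(runif(n, 0, 6)),
  run        = rep(c(1, 0), length.out = n)
)
asphalt_sim$rut.depth <- exp(
  -1 + 0.5 * asphalt_sim$pct.a.surf + 0.25 * asphalt_sim$voids -
    0.6 * log(asphalt_sim$viscosity) + rnorm(n, sd = 0.3)
)

full <- lm(log(rut.depth) ~ pct.a.surf + pct.a.base + fines + voids +
             log(viscosity) + run, data = asphalt_sim)

# Backward selection starting from the full model;
# trace = 0 suppresses the step-by-step printout
final <- step(full, direction = "backward", test = "F", trace = 0)

kept <- names(coef(final))
```

Setting `trace = 1` (the default) shows each removal in turn, which is closer to what a slide would display.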
install.packages("leaps") first).
Uses package leaps:
data.frame rather than tibble because there are several columns in outmat.
rut.6:
pct.a.surf increases and voids increases. This more or less checks out with our scatterplots against log.viscosity.
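The all-subsets idea behind the leaps approach can be illustrated without the package by brute force: fit every subset of the six explanatory terms and compare adjusted R-squared. This is a base-R sketch on simulated stand-in data (`asphalt_sim`, not the real measurements). It is not how leaps searches internally, and it is only practical for a small number of candidate terms.

```r
# Simulated stand-in with the asphalt column names (NOT the real data)
set.seed(1)
n <- 31
asphalt_sim <- data.frame(
  pct.a.surf = runif(n, 4, 6),
  pct.a.base = runif(n, 4, 5),
  fines      = runif(n, 6, 9),
  voids      = runif(n, 4, 6),
  viscosity  = exp(runif(n, 0, 6)),
  run        = rep(c(1, 0), length.out = n)
)
asphalt_sim$rut.depth <- exp(
  -1 + 0.5 * asphalt_sim$pct.a.surf + 0.25 * asphalt_sim$voids -
    0.6 * log(asphalt_sim$viscosity) + rnorm(n, sd = 0.3)
)

terms_all <- c("pct.a.surf", "pct.a.base", "fines", "voids",
               "log(viscosity)", "run")

# Every non-empty subset of the candidate terms: 2^6 - 1 = 63 of them
subsets <- unlist(
  lapply(seq_along(terms_all),
         function(k) combn(terms_all, k, simplify = FALSE)),
  recursive = FALSE
)

# Adjusted R-squared of the regression of log(rut.depth) on each subset
adj_r2 <- sapply(subsets, function(s) {
  f <- as.formula(paste("log(rut.depth) ~", paste(s, collapse = " + ")))
  summary(lm(f, data = asphalt_sim))$adj.r.squared
})

# Terms in the subset with the highest adjusted R-squared
best_terms <- subsets[[which.max(adj_r2)]]
```

Adjusted R-squared is used rather than plain R-squared because the latter never decreases when a term is added.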
Comments and next steps