pct.a.surf: Percentage of asphalt in surface layer
pct.a.base: Percentage of asphalt in base layer
fines: Percentage of fines in surface layer
voids: Percentage of voids in surface layer
rut.depth: Change in rut depth per million vehicle passes
viscosity: Viscosity of asphalt
run: 2 data collection periods: 1 for run 1, 0 for run 2.
rut.depth is the response. Depends on other variables, how?
Make sure to load MASS before tidyverse (otherwise select from MASS masks the dplyr select used in pipelines), or to load MASS excluding its select (as above).
Same idea as for plotting separate predictions on one plot:
“collect all the x-variables together into one column called x, with another column xname saying which x they were, then plot these x’s against rut.depth, a separate facet for each x-variable.”
I saved this graph to plot later (on the next page).
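The plotting code itself is not reproduced in this text. A minimal sketch of the idea quoted above, assuming pivot_longer from tidyr and facet_wrap from ggplot2. The data frame demo holds simulated stand-in values, not the asphalt measurements, and the names demo, demo_long and g are placeholders:

```r
library(tidyverse)

# Simulated stand-in data (NOT the real asphalt measurements)
set.seed(1)
demo <- tibble(
  pct.a.surf = runif(31, 4, 6),
  voids      = runif(31, 4, 6),
  viscosity  = exp(runif(31, 0, 6)),
  rut.depth  = exp(1 - 0.6 * log(viscosity) + rnorm(31, sd = 0.3))
)

# Stack every x-variable into one column x, labelled by xname
demo_long <- demo %>%
  pivot_longer(-rut.depth, names_to = "xname", values_to = "x")

# One facet per x-variable; free scales because the x's have different ranges
g <- ggplot(demo_long, aes(x = x, y = rut.depth)) +
  geom_point() +
  facet_wrap(~xname, scales = "free")

g  # saved above, printed only when wanted
```

Saving the plot in g and printing it later is what makes it possible to show the graph on a separate page.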
viscosity has strong but non-linear trend.
run has effect but variability bigger when run is 1.
voids.
The rut.depth-viscosity relationship should concern us.
Log of viscosity: more nearly linear?
viscosity: (plot overleaf)
Call:
lm(formula = rut.depth ~ pct.a.surf + pct.a.base + fines + voids +
    log(viscosity) + run, data = asphalt)

Residuals:
    Min      1Q  Median      3Q     Max
-4.1211 -1.9075 -0.7175  1.6382  9.5947

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -12.9937    26.2188  -0.496   0.6247
pct.a.surf       3.9706     2.4966   1.590   0.1248
pct.a.base       1.2631     3.9703   0.318   0.7531
fines            0.1164     1.0124   0.115   0.9094
voids            0.5893     1.3244   0.445   0.6604
log(viscosity)  -3.1515     0.9194  -3.428   0.0022 **
run             -1.9655     3.6472  -0.539   0.5949
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.324 on 24 degrees of freedom
Multiple R-squared:  0.806,  Adjusted R-squared:  0.7575
F-statistic: 16.62 on 6 and 24 DF,  p-value: 1.743e-07
R-squared 81%, not so bad.
P-value in glance asserts that something is helping to predict rut.depth.
Table of coefficients says it is log(viscosity).
But confused by clearly non-significant variables: remove those to get clearer picture of what is helpful.
Problem fixes:
augment, from package broom, combines the fitted model (fitted values, residuals) with the data frame asphalt so that they can later be plotted: start with a model first, and then augment with a data frame:
 [1] "pct.a.surf" "pct.a.base" "fines"      "voids"      "rut.depth"
 [6] "viscosity"  "run"        ".fitted"    ".resid"     ".hat"
[11] ".sigma"     ".cooksd"    ".std.resid"
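The call that produced these names is not shown in this text. A minimal sketch of the "model first, then data frame" pattern with broom's augment, on simulated stand-in data (demo, fit and fit_aug are placeholder names, not objects from the slides):

```r
library(broom)
library(ggplot2)

# Simulated stand-in data (NOT the real asphalt measurements)
set.seed(1)
demo <- data.frame(x1 = runif(31), x2 = runif(31))
demo$y <- 2 + 3 * demo$x1 + rnorm(31)

fit <- lm(y ~ x1 + x2, data = demo)

# Model first, then the data frame: the original columns come back
# with .fitted, .resid and other diagnostics added alongside them
fit_aug <- augment(fit, demo)
names(fit_aug)

# Residuals and x-values now sit in one data frame, so they can be plotted
ggplot(fit_aug, aes(x = x1, y = .resid)) + geom_point()
```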
From package MASS:
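The MASS call used at this point is not preserved in this text. One MASS function commonly used to let the data suggest a transformation of the response is boxcox; that it is the function meant here is an assumption. A sketch on simulated stand-in data (demo, bc and best_lambda are placeholder names):

```r
library(MASS)

# Simulated stand-in data (NOT the real asphalt measurements);
# the response must be positive for Box-Cox
set.seed(1)
demo <- data.frame(x = runif(31, 1, 5))
demo$y <- exp(0.5 + 0.4 * demo$x + rnorm(31, sd = 0.3))

# plotit = FALSE returns the lambda grid (x) and the log-likelihood (y)
# instead of drawing the usual plot
bc <- boxcox(y ~ x, data = demo, plotit = FALSE)

# Power with the highest log-likelihood; a value near 0 points to a log transformation
best_lambda <- bc$x[which.max(bc$y)]
best_lambda
```

With the default plotit = TRUE, the same call draws the log-likelihood curve with a confidence interval for lambda, which is usually the more useful view.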
Plot the response (log(rut.depth)) against other explanatory variables, all in one shot:
log.viscosity.
log.rut.depth values for each run have same spread.
Model log.rut.depth in terms of everything else, see what can be removed:
Use tidy from broom to display just the coefficients.
rut.3.
pct.a.base, not significant.
The output of tidy is itself a data frame, thus:
Remove pct.a.base: copy and paste the lm code and remove what you’re removing:
rut.4 <- lm(log(rut.depth) ~ pct.a.surf + fines + voids +
  log(viscosity) + run, data = asphalt)
tidy(rut.4) %>% arrange(p.value) %>% select(term, p.value)
fines is next to go, P-value 0.32.
Another way to do the same thing:
fines is the one to go. (Output identical, as it should be.)
Can’t take out intercept, so run, with P-value 0.36, goes next.
Again, can’t take out intercept, so largest P-value is for voids, 0.044. But this is significant, so we shouldn’t remove voids.
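The code for "another way to do the same thing" is not preserved in this text. One standard alternative to copying and editing the lm call is update, which refits a model with a modified formula; that this is the method meant is an assumption. A sketch on simulated stand-in data (demo, full, smaller and retyped are placeholder names):

```r
# Simulated stand-in data (NOT the real asphalt measurements)
set.seed(1)
demo <- data.frame(x1 = runif(31), x2 = runif(31), x3 = runif(31))
demo$y <- exp(1 + 2 * demo$x1 + rnorm(31, sd = 0.3))

full <- lm(log(y) ~ x1 + x2 + x3, data = demo)

# ". ~ . - x3" means: same response, same explanatory variables, minus x3
smaller <- update(full, . ~ . - x3)

# Retyping the formula by hand gives the same fit
retyped <- lm(log(y) ~ x1 + x2, data = demo)
all.equal(coef(smaller), coef(retyped))
```

update avoids retyping the whole formula at each step of a backward elimination, so there is less room for copy-and-paste mistakes.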
pct.a.surf, voids and log.viscosity would all make fit significantly worse if removed. So they stay.
   (Intercept)     pct.a.surf          voids log(viscosity)
    -1.0207945      0.5554686      0.2447934     -0.6464911

   (Intercept)     pct.a.surf log(viscosity)
     0.9001389      0.3911481     -0.6185628
There is also a function step that does backward selection, like this:
Gets same answer as we did (by removing least significant x).
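The step call is not preserved in this text. A minimal sketch of backward selection with step from base R's stats package, on simulated stand-in data (demo, full and chosen are placeholder names). Note that step compares models by AIC by default rather than by P-values, so it will not always agree with removing the least significant x by hand:

```r
# Simulated stand-in data (NOT the real asphalt measurements)
set.seed(1)
demo <- data.frame(x1 = runif(31), x2 = runif(31), x3 = runif(31))
demo$y <- 1 + 2 * demo$x1 + rnorm(31, sd = 0.3)

full <- lm(y ~ x1 + x2 + x3, data = demo)

# Start from the full model and drop terms while doing so lowers AIC;
# trace = 0 suppresses the step-by-step printout
chosen <- step(full, direction = "backward", trace = 0)
formula(chosen)
```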
(You may need install.packages("leaps") first.)
Uses package leaps:
Use data.frame rather than tibble because there are several columns in outmat.
rut.6:
Rut depth increases as pct.a.surf increases and voids increases. This more or less checks out with our scatterplots against log.viscosity.
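The leaps code referred to above is not preserved in this text. A minimal sketch of an all-subsets search with regsubsets, showing why data.frame is convenient for the summary: outmat is a matrix with one column per explanatory variable, and data.frame spreads it into separate columns. The data are simulated stand-ins, and demo, subsets, s and results are placeholder names:

```r
library(leaps)

# Simulated stand-in data (NOT the real asphalt measurements)
set.seed(1)
demo <- data.frame(x1 = runif(31), x2 = runif(31), x3 = runif(31))
demo$y <- 1 + 2 * demo$x1 + rnorm(31, sd = 0.3)

# Best model of each size (nbest = 1)
subsets <- regsubsets(y ~ x1 + x2 + x3, data = demo, nbest = 1)
s <- summary(subsets)

# s$outmat marks with "*" which x's are in each model; data.frame()
# turns its columns into ordinary columns next to adjusted R-squared
results <- data.frame(adjr2 = s$adjr2, s$outmat)
results[order(-results$adjr2), ]
```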
Comments and next steps