pct.a.surf   Percentage of asphalt in surface layer
pct.a.base   Percentage of asphalt in base layer
fines        Percentage of fines in surface layer
voids        Percentage of voids in surface layer
rut.depth    Change in rut depth per million vehicle passes
viscosity    Viscosity of asphalt
run          2 data collection periods: 1 for run 1, 0 for run 2.

rut.depth is the response. It depends on the other variables, but how?

Make sure to load MASS before tidyverse (for annoying technical reasons), or to load MASS excluding its select (as above).
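A minimal sketch of the second option, assuming MASS and dplyr (part of the tidyverse) are installed. The exclude argument of library() exists in R 3.6.0 and later; it attaches MASS without its select, so select means dplyr's version whatever order the packages are loaded in.

```r
# Attach MASS but leave out its select(), which would otherwise
# clash with dplyr::select()
library(MASS, exclude = "select")
library(dplyr)

# select() now resolves to dplyr's version;
# MASS's is still reachable explicitly as MASS::select
sel_env <- environmentName(environment(select))
sel_env
```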
Same idea as for plotting separate predictions on one plot:
“collect all the x-variables together into one column called x, with another column xname saying which x they were, then plot these x’s against rut.depth, a separate facet for each x-variable.”
I saved this graph to plot later (on the next page).
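A sketch of that reshaping with tidyr's pivot_longer and ggplot2's facet_wrap. Since asphalt is not bundled with R, the built-in mtcars data stands in here, with mpg playing the role of rut.depth; for the lecture data you would swap in asphalt and rut.depth. The names long and g are only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in data: mtcars, with mpg as the response (rut.depth in the lecture).
# Every other column goes into x, and its name goes into xname.
long <- mtcars %>%
  pivot_longer(-mpg, names_to = "xname", values_to = "x")

# One facet per x-variable; scales = "free" gives each panel its own x range.
# Saved in g so it can be drawn later.
g <- ggplot(long, aes(x = x, y = mpg)) +
  geom_point() +
  facet_wrap(~xname, scales = "free")
```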
viscosity has a strong but non-linear trend.
run has an effect, but variability is bigger when run is 1.
voids: look at its panel too.
The non-linearity of the rut.depth-viscosity relationship should concern us.
Log of viscosity: more nearly linear? (plot overleaf)
summary(rut.1)
or glance(rut.1) from broom.
R-squared 81%, not so bad.
The P-value in glance asserts that something is helping to predict rut.depth.
The table of coefficients says log(viscosity) is significant.
But we are confused by clearly non-significant variables: remove those to get a clearer picture of what is helpful.
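Both views of the same fit, sketched on mtcars as a stand-in for asphalt (the model here is illustrative, not rut.1). glance from broom puts R-squared and the overall F-test P-value into a one-row data frame.

```r
library(broom)

# Illustrative stand-in model; in the lecture this is rut.1 on asphalt
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

summary(fit)             # coefficients table, R-squared, overall F-test
one_row <- glance(fit)   # the overall numbers as a one-row data frame
one_row[, c("r.squared", "adj.r.squared", "p.value")]
```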
Problem fixes:
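The residuals and fitted values come from the model; the explanatory variables are in asphalt. augment (from broom) combines these two together so that they can later be plotted: start with a model first, and then augment with a data frame. The result has the original columns plus the model's:
[1] "pct.a.surf" "pct.a.base" "fines" "voids" "rut.depth"
[6] "viscosity" "run" ".fitted" ".resid" ".hat"
[11] ".sigma" ".cooksd" ".std.resid"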
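A sketch of the augment-then-plot workflow, again with mtcars standing in for asphalt and an illustrative model. It builds (a) residuals against fitted values and (b) residuals against each explanatory variable, the latter with the same collect-the-x's-into-one-column trick used for the scatterplots.

```r
library(broom)
library(dplyr)
library(tidyr)
library(ggplot2)

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # stand-in for rut.1
aug <- augment(fit, mtcars)                      # model first, then data frame

# (a) residuals against fitted values
p_fitted <- ggplot(aug, aes(x = .fitted, y = .resid)) + geom_point()

# (b) residuals against each explanatory variable, one facet per x
p_x <- aug %>%
  select(wt, hp, qsec, .resid) %>%
  pivot_longer(-.resid, names_to = "xname", values_to = "x") %>%
  ggplot(aes(x = x, y = .resid)) + geom_point() +
  facet_wrap(~xname, scales = "free")
```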
From package MASS:
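Assuming the MASS tool in question is boxcox (the usual way to choose a power transformation of the response), a sketch on mtcars as a stand-in looks like this. plotit = FALSE returns the log-likelihood profile instead of drawing it; the default plot also marks an approximate 95% interval for lambda.

```r
library(MASS)

# Stand-in: mpg regressed on two mtcars columns.
# For the lecture, rut.depth would be on the left and data = asphalt.
bc <- boxcox(mpg ~ wt + hp, data = mtcars, plotit = FALSE)

# bc$x: candidate lambdas (default -2 to 2 in steps of 0.1)
# bc$y: profile log-likelihood at each lambda
best_lambda <- bc$x[which.max(bc$y)]
best_lambda
# lambda near 0 suggests a log transform; near 1, no transform
```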
Plot log(rut.depth) against the other explanatory variables, all in one shot, and look in particular at log.viscosity.
log.rut.depth for each run has the same spread.
Now regress log.rut.depth on everything else and see what can be removed. Use summary, or tidy from broom to display just the coefficients.
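A sketch of tidy on a log-response model, with mtcars standing in for asphalt (the model is illustrative). Each row of the result is one term, with columns term, estimate, std.error, statistic and p.value.

```r
library(broom)

# Illustrative stand-in for the log(rut.depth) model on asphalt
fit <- lm(log(mpg) ~ wt + hp + qsec + log(disp), data = mtcars)

coefs <- tidy(fit)   # one row per term, including the intercept
coefs
```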
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + pct.a.base + fines +
voids + log(viscosity) + run, data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.53072 -0.18563 -0.00003 0.20017 0.55079
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.57299 2.43617 -0.646 0.525
pct.a.surf 0.58358 0.23198 2.516 0.019 *
pct.a.base -0.10337 0.36891 -0.280 0.782
fines 0.09775 0.09407 1.039 0.309
voids 0.19885 0.12306 1.616 0.119
log(viscosity) -0.55769 0.08543 -6.528 9.45e-07 ***
run 0.34005 0.33889 1.003 0.326
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3088 on 24 degrees of freedom
Multiple R-squared: 0.961, Adjusted R-squared: 0.9512
F-statistic: 98.47 on 6 and 24 DF, p-value: 1.059e-15
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + log(viscosity), data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.61938 -0.21361 0.06635 0.14932 0.63012
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.90014 1.08059 0.833 0.4119
pct.a.surf 0.39115 0.21879 1.788 0.0846 .
log(viscosity) -0.61856 0.02713 -22.797 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3208 on 28 degrees of freedom
Multiple R-squared: 0.9509, Adjusted R-squared: 0.9474
F-statistic: 270.9 on 2 and 28 DF, p-value: < 2.2e-16
This model, with only pct.a.surf and log(viscosity), is rut.3.
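Back in the model with all the x’s, pct.a.base has the largest P-value (0.782), so it is not significant. The output of tidy is itself a data frame, so it can be sorted by p.value to find this.
Remove pct.a.base: copy the lm code and remove what you’re removing:
rut.4 <- lm(log(rut.depth) ~ pct.a.surf + fines + voids +
  log(viscosity) + run, data = asphalt)
summary(rut.4)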
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + fines + voids + log(viscosity) +
run, data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.51610 -0.18785 -0.02248 0.18364 0.57160
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.07850 1.60665 -1.294 0.2076
pct.a.surf 0.59299 0.22526 2.632 0.0143 *
fines 0.08895 0.08701 1.022 0.3165
voids 0.20047 0.12064 1.662 0.1091
log(viscosity) -0.55249 0.08184 -6.751 4.48e-07 ***
run 0.35977 0.32533 1.106 0.2793
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3031 on 25 degrees of freedom
Multiple R-squared: 0.9608, Adjusted R-squared: 0.953
F-statistic: 122.7 on 5 and 25 DF, p-value: < 2.2e-16
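Because tidy returns a data frame, the next candidate for removal can be found with ordinary dplyr verbs. A sketch on mtcars (a stand-in for asphalt; the model is illustrative), skipping the intercept since it is never removed:

```r
library(broom)
library(dplyr)

fit <- lm(log(mpg) ~ wt + hp + qsec + log(disp), data = mtcars)  # stand-in

# Least significant x: largest P-value among the non-intercept terms
next_to_go <- tidy(fit) %>%
  filter(term != "(Intercept)") %>%
  arrange(desc(p.value)) %>%
  slice(1)
next_to_go
```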
fines is next to go, P-value 0.32.
Another way to fit the same rut.4 model:
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + fines + voids + log(viscosity) +
run, data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.51610 -0.18785 -0.02248 0.18364 0.57160
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.07850 1.60665 -1.294 0.2076
pct.a.surf 0.59299 0.22526 2.632 0.0143 *
fines 0.08895 0.08701 1.022 0.3165
voids 0.20047 0.12064 1.662 0.1091
log(viscosity) -0.55249 0.08184 -6.751 4.48e-07 ***
run 0.35977 0.32533 1.106 0.2793
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3031 on 25 degrees of freedom
Multiple R-squared: 0.9608, Adjusted R-squared: 0.953
F-statistic: 122.7 on 5 and 25 DF, p-value: < 2.2e-16
(Output identical, as it should be.) So fines is the one to go:
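One way to drop a single term without retyping the whole model is update from base R. In this sketch, on mtcars as a stand-in, `. ~ . - qsec` keeps the response and the right-hand side but removes qsec; for the lecture the term removed would be fines. The last line checks that this gives the same fit as writing the smaller model out by hand.

```r
# Stand-in model on mtcars
fit_a <- lm(log(mpg) ~ wt + hp + qsec + log(disp), data = mtcars)

# Same response (.), same right-hand side (.), minus qsec
fit_b <- update(fit_a, . ~ . - qsec)

# The smaller model written out in full, for comparison
fit_c <- lm(log(mpg) ~ wt + hp + log(disp), data = mtcars)
all.equal(coef(fit_b), coef(fit_c))
```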
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + voids + log(viscosity) +
run, data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.57275 -0.20080 0.01061 0.17711 0.59774
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.25533 1.39147 -0.902 0.3753
pct.a.surf 0.54837 0.22118 2.479 0.0200 *
voids 0.23188 0.11676 1.986 0.0577 .
log(viscosity) -0.58039 0.07723 -7.516 5.59e-08 ***
run 0.29468 0.31931 0.923 0.3646
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3033 on 26 degrees of freedom
Multiple R-squared: 0.9592, Adjusted R-squared: 0.9529
F-statistic: 152.8 on 4 and 26 DF, p-value: < 2.2e-16
Can’t take out the intercept, so run, with P-value 0.36, goes next.
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + voids + log(viscosity),
data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.53548 -0.20181 -0.01702 0.16748 0.54707
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.02079 1.36430 -0.748 0.4608
pct.a.surf 0.55547 0.22044 2.520 0.0180 *
voids 0.24479 0.11560 2.118 0.0436 *
log(viscosity) -0.64649 0.02879 -22.458 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3025 on 27 degrees of freedom
Multiple R-squared: 0.9579, Adjusted R-squared: 0.9532
F-statistic: 204.6 on 3 and 27 DF, p-value: < 2.2e-16
Again, we can’t take out the intercept, so the largest P-value is for voids, 0.044. But this is significant, so we shouldn’t remove voids.
pct.a.surf, voids and log.viscosity would all make the fit significantly worse if removed. So they stay.
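drop1 (base R) does the "would the fit get significantly worse?" check for every x at once, and looping on it automates the remove-the-least-significant procedure followed here. backward_by_p is a hypothetical helper written for this sketch, not part of any package; mtcars stands in for asphalt, and alpha = 0.05 matches the cutoff used above. For a term with one degree of freedom, the F-test P-value from drop1 equals the t-test P-value in summary.

```r
# Hypothetical helper: backward elimination by P-value.
# Repeatedly drop the least significant x until all remaining x's
# have P-value <= alpha. drop1() never offers the intercept for removal.
backward_by_p <- function(fit, alpha = 0.05) {
  repeat {
    tab <- drop1(fit, test = "F")
    p <- setNames(tab[["Pr(>F)"]], rownames(tab))
    p <- p[!is.na(p)]                  # the "<none>" row has no P-value
    if (length(p) == 0 || max(p) <= alpha) return(fit)
    worst <- names(which.max(p))
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

full <- lm(log(mpg) ~ wt + hp + qsec + log(disp) + drat, data = mtcars)
drop1(full, test = "F")      # F-test for removing each x on its own
final <- backward_by_p(full)
formula(final)
```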
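Coefficients of this final model, then of the earlier model with only pct.a.surf and log(viscosity):
(Intercept) pct.a.surf voids log(viscosity)
-1.0207945 0.5554686 0.2447934 -0.6464911
(Intercept) pct.a.surf log(viscosity)
0.9001389 0.3911481 -0.6185628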
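Lining the two coefficient vectors up by name makes a side-by-side comparison easier to read, with NA where a model lacks a term. A sketch with two illustrative mtcars models standing in for the ones above:

```r
fit_big   <- lm(log(mpg) ~ wt + hp + qsec, data = mtcars)  # stand-in, larger model
fit_small <- lm(log(mpg) ~ wt + qsec, data = mtcars)       # stand-in, smaller model

# Every term in either model, in the larger model's order
all_terms <- union(names(coef(fit_big)), names(coef(fit_small)))

# Indexing by a missing name gives NA, which marks "not in this model"
cmp <- data.frame(
  big   = unname(coef(fit_big)[all_terms]),
  small = unname(coef(fit_small)[all_terms]),
  row.names = all_terms
)
cmp
```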
R’s step function does backward selection. It gets the same answer as we did (by removing the least significant x).
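A sketch of step on mtcars as a stand-in (the model is illustrative). Note that step chooses by AIC rather than by P-values, so its agreement with the P-value route is a finding for this data, not a guarantee; trace = 0 only suppresses the step-by-step printout.

```r
# Stand-in for the full log-response model on asphalt
full <- lm(log(mpg) ~ wt + hp + qsec + log(disp) + drat, data = mtcars)

# Backward selection: at each step, drop the term whose removal lowers AIC most;
# stop when no removal lowers AIC
chosen <- step(full, direction = "backward", trace = 0)
formula(chosen)
```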
Uses package leaps (you may need to do install.packages("leaps") first):
Use data.frame rather than tibble here, because outmat has several columns (one per x) and data.frame spreads them out into separate columns.
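A sketch of the all-possible-regressions table with regsubsets from leaps, with mtcars standing in for asphalt. nbest = 2 keeps the two best models of each size; rsq, adjr2, cp, rss and outmat are components of summary() on a regsubsets fit.

```r
library(leaps)

# Stand-in; for the lecture the formula would be
# log(rut.depth) ~ pct.a.surf + pct.a.base + fines + voids + log(viscosity) + run
all_fits <- regsubsets(log(mpg) ~ wt + hp + qsec + log(disp) + drat,
                       data = mtcars, nbest = 2)
s <- summary(all_fits)

# outmat is a character matrix of "*" marks showing which x's are in each model;
# data.frame() spreads it into one column per x
tab <- with(s, data.frame(rsq, adjr2, cp, rss, outmat))
tab
```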
Call:
lm(formula = log(rut.depth) ~ pct.a.surf + voids + log(viscosity),
data = asphalt)
Residuals:
Min 1Q Median 3Q Max
-0.53548 -0.20181 -0.01702 0.16748 0.54707
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.02079 1.36430 -0.748 0.4608
pct.a.surf 0.55547 0.22044 2.520 0.0180 *
voids 0.24479 0.11560 2.118 0.0436 *
log(viscosity) -0.64649 0.02879 -22.458 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3025 on 27 degrees of freedom
Multiple R-squared: 0.9579, Adjusted R-squared: 0.9532
F-statistic: 204.6 on 3 and 27 DF, p-value: < 2.2e-16
The regression slopes say that rut depth increases as log(viscosity) decreases, as pct.a.surf increases and as voids increases. This more or less checks out with our scatterplots against log.viscosity.
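The slopes are on the log scale, so they act multiplicatively on rut.depth itself. A worked check using the coefficients printed above, holding the other x's fixed: a one-unit increase in an x multiplies rut.depth by exp(slope), and doubling viscosity multiplies it by 2^slope.

```r
# Slopes copied from the final model's output (response is log(rut.depth))
b <- c(pct.a.surf = 0.55547, voids = 0.24479, log.viscosity = -0.64649)

# One-unit increase in an x adds b to log(rut.depth),
# so it multiplies rut.depth by exp(b)
mult_per_unit <- exp(b[c("pct.a.surf", "voids")])

# Doubling viscosity adds log(2) to log(viscosity),
# so it multiplies rut.depth by exp(b * log(2)) = 2^b
mult_double_visc <- 2^b[["log.viscosity"]]

round(mult_per_unit, 2)     # about 1.74 and 1.28
round(mult_double_visc, 2)  # about 0.64
```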
Comments and next steps