Plot DC_output and wind_velocity, with DC_output on vertical scale.
Call:
lm(formula = DC_output ~ wind_velocity, data = windmill)
Residuals:
     Min       1Q   Median       3Q      Max
-0.59869 -0.14099  0.06059  0.17262  0.32184
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.13088    0.12599   1.039     0.31
wind_velocity  0.24115    0.01905  12.659 7.55e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2361 on 23 degrees of freedom
Multiple R-squared: 0.8745, Adjusted R-squared: 0.869
F-statistic: 160.3 on 1 and 23 DF, p-value: 7.546e-12
broom has these: glance, showing that the R-squared is 87%, and tidy, showing the intercept and slope and their significance.
lm actually fits the regression. Store results in a variable. Then look at the results, eg. via summary or glance/tidy.
wind_velocity strongly significant, R-squared (87%) high.
Fit parabola using lm by adding \(x^2\) to right side of model formula with +.
I() necessary because ^ in model formula otherwise means something different (to do with interactions in ANOVA).
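A minimal sketch of these steps, for illustration only. The `windmill` data frame here is made up (it is not the real windmill data), so the numbers will not match the output shown in these notes. The model name `DC.1` is an assumption; `DC.2` is the name the notes use for the parabola model.

```r
library(broom)  # for glance() and tidy()

# Made-up stand-in for the windmill data (NOT the real measurements)
windmill <- data.frame(wind_velocity = seq(2.5, 10, by = 0.5))
windmill$DC_output <- 3 - 7 / windmill$wind_velocity +
  0.05 * sin(seq_along(windmill$wind_velocity))

# Straight line: fit, store in a variable, then look at the results
DC.1 <- lm(DC_output ~ wind_velocity, data = windmill)  # DC.1: assumed name
summary(DC.1)
glance(DC.1)  # one-row summary, including r.squared
tidy(DC.1)    # one row per coefficient: estimate, std.error, p.value

# Parabola: I() makes ^ mean arithmetic squaring
DC.2 <- lm(DC_output ~ wind_velocity + I(wind_velocity^2), data = windmill)
tidy(DC.2)

# Without I(), wind_velocity^2 is formula notation and adds no squared term
no_I <- lm(DC_output ~ wind_velocity + wind_velocity^2, data = windmill)
length(coef(DC.2))  # 3: intercept, linear term, squared term
length(coef(no_I))  # 2: intercept and linear term only
```

The last two lines show why I() matters: leaving it out silently fits the straight-line model again.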
Call:
lm(formula = DC_output ~ wind_velocity + I(wind_velocity^2),
data = windmill)
Residuals:
     Min       1Q   Median       3Q      Max
-0.26347 -0.02537  0.01264  0.03908  0.19903
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -1.155898   0.174650  -6.618 1.18e-06 ***
wind_velocity       0.722936   0.061425  11.769 5.77e-11 ***
I(wind_velocity^2) -0.038121   0.004797  -7.947 6.59e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1227 on 22 degrees of freedom
Multiple R-squared: 0.9676, Adjusted R-squared: 0.9646
F-statistic: 328.3 on 2 and 22 DF, p-value: < 2.2e-16
This distribution has long tails, which should worry us at least some.
The plot shows:
data points (geom_point);
straight line (geom_smooth, set to draw best-fitting line);
predicted DC_output values from the parabola model, joined by lines (with points not shown).
Trick in the geom_line is to use the predictions as the y-points to join by lines (from DC.2), instead of the original data points. Without the data and aes in the geom_line, original data points would be joined by lines.
Curve clearly fits better than line.
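One way such a plot might be built, as a sketch. The data are made up (not the real windmill measurements), and the helper data frame `fits` is an assumption: the notes take the predictions from DC.2, while here they are pulled out with fitted() into an explicit data frame.

```r
library(ggplot2)

# Made-up stand-in for the windmill data (NOT the real measurements)
windmill <- data.frame(wind_velocity = seq(2.5, 10, by = 0.5))
windmill$DC_output <- 3 - 7 / windmill$wind_velocity +
  0.05 * sin(seq_along(windmill$wind_velocity))

DC.2 <- lm(DC_output ~ wind_velocity + I(wind_velocity^2), data = windmill)

# Predictions from the parabola model, alongside the x-values (assumed helper)
fits <- data.frame(wind_velocity = windmill$wind_velocity,
                   fitted = fitted(DC.2))

g <- ggplot(windmill, aes(x = wind_velocity, y = DC_output)) +
  geom_point() +                            # the data
  geom_smooth(method = "lm", se = FALSE) +  # best-fitting straight line
  geom_line(data = fits, aes(y = fitted))   # predictions joined by lines
g
```

The geom_line gets its own data and its own y, so it joins the predictions. The x is inherited from the first line of the ggplot.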
There is a problem with parabolas, which we’ll see later.
Ask engineer, “what should happen as wind velocity increases?”:
Mathematically, asymptote. Straight lines and parabolas don’t have them, but eg. \(y = 1/x\) does: as \(x\) gets bigger, \(y\) approaches zero without reaching it.
What happens to \(y = a + b(1/x)\) as \(x\) gets large?
Fit this, call it asymptote model.
Fitting the model here because we have math to justify it.
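A short worked limit for the question above, assuming \(b \ne 0\) and \(x > 0\):

\[
\lim_{x \to \infty} \left( a + \frac{b}{x} \right) = a + b \lim_{x \to \infty} \frac{1}{x} = a + b \cdot 0 = a
\]

So \(y\) approaches \(a\) without reaching it: the intercept \(a\) is the asymptote. With \(x\) as wind velocity and \(y\) as DC output, the fitted intercept estimates the limiting DC output.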
Reciprocal of wind_velocity we call wind_pace.
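A sketch of the asymptote fit, on made-up data (not the real windmill measurements, so estimates will differ from the output in these notes). The model name `DC.3` is an assumption, and base R is used to add the column to keep the example short.

```r
# Made-up stand-in for the windmill data (NOT the real measurements)
windmill <- data.frame(wind_velocity = seq(2.5, 10, by = 0.5))
windmill$DC_output <- 3 - 7 / windmill$wind_velocity +
  0.05 * sin(seq_along(windmill$wind_velocity))

# New explanatory variable: reciprocal of wind velocity
windmill$wind_pace <- 1 / windmill$wind_velocity

# y = a + b(1/x) is an ordinary straight-line regression on wind_pace
DC.3 <- lm(DC_output ~ wind_pace, data = windmill)  # DC.3: assumed name
summary(DC.3)

asymptote_est <- coef(DC.3)[["(Intercept)"]]  # estimate of a, the asymptote
slope_est <- coef(DC.3)[["wind_pace"]]        # estimate of b
```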
Call:
lm(formula = DC_output ~ wind_pace, data = windmill)
Residuals:
     Min       1Q   Median       3Q      Max
-0.20547 -0.04940  0.01100  0.08352  0.12204
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.9789     0.0449   66.34   <2e-16 ***
wind_pace    -6.9345     0.2064  -33.59   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09417 on 23 degrees of freedom
Multiple R-squared: 0.98, Adjusted R-squared: 0.9792
F-statistic: 1128 on 1 and 23 DF, p-value: < 2.2e-16
Pretty straight. Blue actually smooth curve, not line.
One explanatory variable (wind_pace) vs. 2 for parabola model (wind_velocity and its square).
wind_pace (unsurprisingly) strongly significant.
This is skewed (left), but is not bad (and definitely better than the one for the parabola model).
w2
ggplot likes to have one column of \(x\)’s to plot, and one column of \(y\)’s, with another column for distinguishing things.
pivot_longer, then plot.
DC_output).
wind_velocity higher.
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
[14]  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5
[27] 14.0 14.5 15.0 15.5 16.0
Use predict, which requires what to predict for, as data frame. The data frame has to contain values, with matching names, for all explanatory variables in regression(s):
wind_velocity;
wind_pace (reciprocal of velocity).
Make data frame wv_new with those in; predictions go in my_fits.
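A sketch of the prediction step, on made-up data (not the real windmill measurements). The names wv_new, my_fits and DC.2 come from these notes. `DC.3` and the column names `parabola` and `asymptote` are assumptions.

```r
# Made-up stand-in for the windmill data (NOT the real measurements)
windmill <- data.frame(wind_velocity = seq(2.5, 10, by = 0.5))
windmill$DC_output <- 3 - 7 / windmill$wind_velocity +
  0.05 * sin(seq_along(windmill$wind_velocity))
windmill$wind_pace <- 1 / windmill$wind_velocity

DC.2 <- lm(DC_output ~ wind_velocity + I(wind_velocity^2), data = windmill)
DC.3 <- lm(DC_output ~ wind_pace, data = windmill)  # DC.3: assumed name

# What to predict for: velocities 1 to 16 in steps of 0.5,
# with column names matching the explanatory variables in both models
wv <- seq(1, 16, by = 0.5)
wv_new <- data.frame(wind_velocity = wv, wind_pace = 1 / wv)

# One column of predictions per model (column names are assumptions)
my_fits <- data.frame(wv_new,
                      parabola = predict(DC.2, newdata = wv_new),
                      asymptote = predict(DC.3, newdata = wv_new))
head(my_fits)
```

The squared term does not need its own column in wv_new: predict works it out from wind_velocity, because the square was written inside the model formula with I().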
DC_output between 0 and 3 from asymptote model. Add rectangle to graph around where the data were.
For large wind_velocity, asymptote model behaves reasonably, parabola model does not.
What happens as wind_velocity goes to zero? Should find DC_output goes to zero as well. Does it?
As wind_velocity heads to 0, wind_pace heads to \(+\infty\), so (slope being negative) DC_output heads to \(-\infty\)!
Need more data at small wind_velocity to understand relationship. (Is there a lower asymptote?)
For now, take DC_output to be zero for small wind_velocity.
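A sketch of how the comparison plot with the rectangle might be drawn, on made-up data (not the real windmill measurements). `DC.3`, `my_fits_long` and the column names `parabola`, `asymptote`, `model` and `fit` are assumptions. The rectangle is drawn with annotate, which is one of several ways to do it.

```r
library(ggplot2)
library(tidyr)

# Made-up stand-in for the windmill data (NOT the real measurements)
windmill <- data.frame(wind_velocity = seq(2.5, 10, by = 0.5))
windmill$DC_output <- 3 - 7 / windmill$wind_velocity +
  0.05 * sin(seq_along(windmill$wind_velocity))
windmill$wind_pace <- 1 / windmill$wind_velocity

DC.2 <- lm(DC_output ~ wind_velocity + I(wind_velocity^2), data = windmill)
DC.3 <- lm(DC_output ~ wind_pace, data = windmill)  # DC.3: assumed name

wv <- seq(1, 16, by = 0.5)
wv_new <- data.frame(wind_velocity = wv, wind_pace = 1 / wv)
my_fits <- data.frame(wv_new,
                      parabola = predict(DC.2, newdata = wv_new),
                      asymptote = predict(DC.3, newdata = wv_new))

# One column of x, one of y, one saying which model: what ggplot likes
my_fits_long <- pivot_longer(my_fits, c(parabola, asymptote),
                             names_to = "model", values_to = "fit")

g <- ggplot(my_fits_long, aes(x = wind_velocity, y = fit, colour = model)) +
  geom_line() +
  # rectangle around where the (made-up) data were
  annotate("rect",
           xmin = min(windmill$wind_velocity), xmax = max(windmill$wind_velocity),
           ymin = min(windmill$DC_output), ymax = max(windmill$DC_output),
           fill = NA, colour = "black")
g
```

Anything outside the rectangle is extrapolation, which is where the two models can disagree.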
Comments
geom_smooth smooths scatterplot trend. (Trend called “loess”, “locally weighted least squares”, which downweights outliers. Not constrained to be straight.)