DC_output and wind_velocity.
DC_output on vertical scale.
Call:
lm(formula = DC_output ~ wind_velocity, data = windmill)
Residuals:
     Min       1Q   Median       3Q      Max
-0.59869 -0.14099  0.06059  0.17262  0.32184
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.13088    0.12599   1.039     0.31
wind_velocity  0.24115    0.01905  12.659 7.55e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2361 on 23 degrees of freedom
Multiple R-squared: 0.8745, Adjusted R-squared: 0.869
F-statistic: 160.3 on 1 and 23 DF, p-value: 7.546e-12
broom has these: glance, showing that the R-squared is 87%, and
tidy, showing the intercept and slope and their significance.
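As a minimal sketch of the two broom functions named here, the following uses R's built-in cars data as a stand-in, because the windmill measurements are not reproduced in these notes. The model name fit.1 and the other variable names are hypothetical.

```r
library(broom)

# Stand-in data: built-in `cars` (stopping distance vs. speed), NOT the windmill data
fit.1 <- lm(dist ~ speed, data = cars)

model_summary <- glance(fit.1)  # one row of whole-model summaries (r.squared, sigma, ...)
coef_table <- tidy(fit.1)       # one row per term: estimate, std.error, statistic, p.value

model_summary$r.squared
coef_table
```

Both results are ordinary data frames, so they fit on a slide more easily than the output of summary and can be passed on to other tidyverse tools.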
lm actually fits the regression. Store results in a variable. Then look at the results, e.g. via summary or glance/tidy.
wind_velocity strongly significant, R-squared (87%) high.
Fit a parabola with lm by adding \(x^2\) to right side of model formula with +:
I() necessary because ^ in model formula otherwise means something different (to do with interactions in ANOVA).
Call:
lm(formula = DC_output ~ wind_velocity + I(wind_velocity^2),
    data = windmill)
Residuals:
     Min       1Q   Median       3Q      Max
-0.26347 -0.02537  0.01264  0.03908  0.19903
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -1.155898   0.174650  -6.618 1.18e-06 ***
wind_velocity       0.722936   0.061425  11.769 5.77e-11 ***
I(wind_velocity^2) -0.038121   0.004797  -7.947 6.59e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1227 on 22 degrees of freedom
Multiple R-squared: 0.9676, Adjusted R-squared: 0.9646
F-statistic: 328.3 on 2 and 22 DF, p-value: < 2.2e-16
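To see why I() matters in the formula, here is a small sketch on made-up data (not the windmill measurements; the names d, with_I and without_I are hypothetical). Without I(), the ^ is read as formula crossing, so x^2 collapses to plain x and no squared term is fitted.

```r
# Made-up illustration data (not the windmill measurements)
d <- data.frame(x = 1:10)
d$y <- 1 + 2 * d$x - 0.1 * d$x^2 + 0.1 * sin(d$x)

with_I <- lm(y ~ x + I(x^2), data = d)   # I() protects ^ as arithmetic squaring
without_I <- lm(y ~ x + x^2, data = d)   # ^ is formula crossing; x^2 collapses to x

names(coef(with_I))     # "(Intercept)" "x" "I(x^2)"
names(coef(without_I))  # "(Intercept)" "x"
```

The second fit is silently just the straight-line model, which is why the squared term has to be wrapped in I().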
The distribution of residuals from the parabola model has long tails, which should worry us at least somewhat.
The plot shows:
the data points (geom_point);
a straight line (geom_smooth, which draws best-fitting line);
the predicted DC_output values, joined by lines (with points not shown).
The trick in geom_line is to use the predictions as the y-points to join by lines (from DC.2), instead of the original data points. Without the data and aes in the geom_line, original data points would be joined by lines.
Curve clearly fits better than line.
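A sketch of a plot built this way, on made-up data because the windmill measurements are not reproduced here. The names d, fit.2 and fits are hypothetical, and method = "lm" is one way (an assumption about the original) to make geom_smooth draw a straight line.

```r
library(ggplot2)

# Made-up curved data (not the windmill measurements)
d <- data.frame(x = seq(2, 10, by = 0.5))
d$y <- 0.7 * d$x - 0.04 * d$x^2 + 0.05 * sin(5 * d$x)

# Parabola fit, and a separate data frame holding its predictions
fit.2 <- lm(y ~ x + I(x^2), data = d)
fits <- data.frame(x = d$x, fitted = fitted(fit.2))

g <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +                              # the data
  geom_smooth(method = "lm", se = FALSE) +    # best-fitting straight line
  geom_line(data = fits, aes(y = fitted))     # predictions joined by lines
g
```

The geom_line layer has its own data and its own y aesthetic, so it joins the fitted values; the x aesthetic is inherited from the ggplot call.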
There is a problem with parabolas, which we’ll see later.
Ask engineer, “what should happen as wind velocity increases?”:
Mathematically, an asymptote. Straight lines and parabolas don’t have them, but e.g. \(y = 1/x\) does: as \(x\) gets bigger, \(y\) approaches zero without reaching it.
What happens to \(y = a + b(1/x)\) as \(x\) gets large?
Fit this, call it asymptote model.
Fitting the model here because we have math to justify it.
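The question about \(y = a + b(1/x)\) can be answered with a one-line limit, assuming \(a\) and \(b\) are fixed constants:

\[
\lim_{x \to \infty} \left( a + \frac{b}{x} \right) = a + b \lim_{x \to \infty} \frac{1}{x} = a .
\]

So \(y\) approaches \(a\) without reaching it, which makes \(a\) the asymptote. When \(b < 0\), \(y\) approaches \(a\) from below.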
The reciprocal of wind_velocity we call wind_pace.
Call:
lm(formula = DC_output ~ wind_pace, data = windmill)
Residuals:
     Min       1Q   Median       3Q      Max
-0.20547 -0.04940  0.01100  0.08352  0.12204
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.9789     0.0449   66.34   <2e-16 ***
wind_pace    -6.9345     0.2064  -33.59   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09417 on 23 degrees of freedom
Multiple R-squared: 0.98, Adjusted R-squared: 0.9792
F-statistic: 1128 on 1 and 23 DF, p-value: < 2.2e-16
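To read these coefficients in terms of wind velocity, the fitted equation can be evaluated by hand. This is a sketch using the rounded estimates from the output above; the helper asymptote_pred and the name v_zero are hypothetical.

```r
# Coefficients copied (rounded) from the summary output above
a <- 2.9789    # intercept: the asymptote, upper limit of predicted DC_output
b <- -6.9345   # slope on wind_pace = 1 / wind_velocity

# Hypothetical helper: predicted DC_output at a given wind velocity
asymptote_pred <- function(wind_velocity) a + b / wind_velocity

asymptote_pred(c(5, 10, 100, 1000))  # climbs towards a, never reaching it

# Wind velocity where the fitted curve crosses zero: a + b / v = 0, so v = -b / a
v_zero <- -b / a
v_zero
```

The predictions level off just below the intercept as wind velocity grows, and the fitted curve goes negative for wind velocities below v_zero.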
Pretty straight. The blue trend is actually a smooth curve, not a line.
One explanatory variable (wind_pace) vs. 2 for parabola model (wind_velocity and its square).
wind_pace (unsurprisingly) strongly significant.
The distribution of residuals is skewed (left), but is not bad (and definitely better than the one for the parabola model).
w2
ggplot likes to have one column of \(x\)’s to plot, and one column of \(y\)’s, with another column for distinguishing things.
Use pivot_longer, then plot:
DC_output).
wind_velocity higher.
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
[14]  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5
[27] 14.0 14.5 15.0 15.5 16.0
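The printed values form a regular grid from 1 to 16 in steps of 0.5. A sketch of how such a grid could be built with seq and stored next to its reciprocal follows; the construction is an assumption, and only the values and the name wv_new come from the notes.

```r
# Grid matching the printed values: 1.0, 1.5, ..., 16.0
wind_velocity <- seq(1, 16, by = 0.5)

# One column per explanatory variable, named as in the regressions
wv_new <- data.frame(
  wind_velocity = wind_velocity,
  wind_pace = 1 / wind_velocity   # reciprocal, needed by the asymptote model
)
head(wv_new, 3)
```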
Get predictions with predict, which requires what to predict for, as data frame. The data frame has to contain values, with matching names, for all explanatory variables in regression(s).
Parabola model has wind_velocity.
Asymptote model has wind_pace (reciprocal of velocity).
Create data frame wv_new with those in:
wv_new
my_fits
DC_output between 0 and 3 from asymptote model. Add rectangle to graph around where the data were:
For high wind_velocity, asymptote model behaves reasonably, parabola model does not.
What happens as wind_velocity goes to zero? Should find DC_output goes to zero as well. Does it?
As wind_velocity heads to 0, wind_pace heads to \(+\infty\), so DC_output heads to \(-\infty\)!
Need more data for small wind_velocity to understand relationship. (Is there a lower asymptote?)
Predict DC_output to be zero for small wind_velocity.
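The behaviour of the two models outside the data can be checked by hand. With the stored model objects, predict would do this directly; because the data are not reproduced here, this sketch instead plugs the rounded coefficients from the two summary outputs into hypothetical helper functions parabola and asymptote.

```r
# Coefficients copied (rounded) from the two summary outputs earlier in these notes
parabola <- function(v) -1.155898 + 0.722936 * v - 0.038121 * v^2
asymptote <- function(v) 2.9789 - 6.9345 / v

v <- seq(1, 16, by = 0.5)
fits <- data.frame(
  wind_velocity = v,
  parabola = parabola(v),
  asymptote = asymptote(v)
)

# High wind velocity: the parabola has turned back down, the asymptote model has not
fits[fits$wind_velocity %in% c(10, 13, 16), ]
v[which.max(fits$parabola)]   # grid point where the parabola prediction peaks

# Low wind velocity: the asymptote model drops well below zero
fits[fits$wind_velocity <= 2, ]
```

The parabola's predictions peak near the middle of the grid and then fall, which is the problem with parabolas; the asymptote model keeps rising towards its intercept but gives negative predictions at the lowest wind velocities.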
Comments
geom_smooth smooths scatterplot trend. (Trend called “loess”, “locally weighted least squares”, which downweights outliers. Not constrained to be straight.)
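For small data sets, geom_smooth uses loess unless told otherwise. To illustrate "not constrained to be straight" without a plot, this sketch calls base R's loess directly on made-up curved data (not the windmill measurements) and compares it with a straight-line fit; all names are hypothetical.

```r
# Made-up curved data (not the windmill measurements)
d <- data.frame(x = seq(2, 10, by = 0.5))
d$y <- 3 - 7 / d$x + 0.05 * sin(5 * d$x)

line_fit <- lm(y ~ x, data = d)        # constrained to be straight
smooth_fit <- loess(y ~ x, data = d)   # locally weighted, free to bend

# Residual sums of squares: the smooth follows the curve more closely
sum(resid(line_fit)^2)
sum(resid(smooth_fit)^2)
```

A smooth trend that departs clearly from the straight line, as here, is a hint that a straight-line model is not appropriate.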