`library(MASS, exclude = "select")`

# STAD29

# Statistics for the Life and Social Sciences

## Ken Butler

Welcome to the home page for STA 1007 / STAD29. This is the place to look for things course-related (notes, code, old exams etc., linked above) except for lecture videos, assignment hand-ins and marks, which will be on Quercus.

## News (newest first)

2024-04-26 15:00: all done, and grades submitted. They will reach you about two business days after being approved.

2024-04-25 22:45: 90% done. A little of questions 1 and 2 is all that remains. All being well, grades tomorrow.

2024-04-24 23:15: 78% done. What’s left is a bit of question 1, all of question 2, and a bit of question 3.

2024-04-23 23:00: We have all the exams now (there had been a problem with uploading the other exams, but we now have them all). This means that the percent done has actually slipped back to 51% (out of a now larger number of questions to grade), but I’ve had a good day: I am caught up with questions 5 and 6 on all the exams, plus the parts of question 7 I marked yesterday. What remains for me is the last few parts of question 4 for the exams I got today, and the last few parts of question 7 for all the exams. I have worked hard at being consistent between the exams I had and the ones I got today. The course graders are marking questions 1 through 3; expect the marking to be done by about Friday, which would mean grades reaching you on Monday.

2024-04-22 23:10: I’m now up to 7(c) on the exams I have; what remains is the rest of question 7, the other exams that are coming, and questions 1 to 3 that are being graded by the TAs. We are currently at 53%.

2024-04-21 23:00: As I said yesterday, I don’t have all the exams scanned yet, but out of the ones I do have, we are now 25% done, and I’m done questions 4 through 5(c). I’m expecting to get the other scans tomorrow, at which point the percentage done will go down before it goes up again, but hopefully taking care of the rest of the questions I’ve already marked some of won’t take too long. My solutions as they stand now.

2024-04-20 22:55: I spent most of the day marking my share of the final exam in my other course, but I did make some progress on our exam after that; we are now at the grand total of 5%. Not all of the exams are scanned, so I have marked 4(a) and 4(b) on the ones I do have, and will go back and finish those same questions on the other exams when I have them (endeavouring to remember what I did so that I can be consistent).

2024-04-19 22:00: Most of the exams are scanned, and I plan to start marking over the weekend. Our TAs will also be marking some of it. My solutions, with the Figures at the back.

2024-04-15 12:00:

- Assignment 8 is graded, and I will post the marks shortly. Appeals by usual procedure between April 16 and 19.
- Rules for the exam are the same as for all my exams. See note 2024-02-22 11:50 for a guide if you are unsure. Expect questions from anywhere in the course, but with an emphasis on stuff that we did in lecture after the midterm.

2024-04-12 15:45: pre-exam office hours on *Monday* from 11:00am to about 1:00pm, depending how busy things are.

2024-04-10 17:55: Exam off for printing. It finished up as 7 questions with 40 parts and 87 points (I took a couple of things out). Coverage is everything from the lecture notes except for “durations, intervals, and periods” and also log-linear models as promised. The rules are the same as for the midterm: bring printed material as you wish (organize it or you will run out of time trying to find things), and a non-programmable non-communicating calculator if you wish. There is more explanation than coding (as is usual for this course).

2024-04-08 21:50: I have had a busy weekend finishing off your exam. In case you are curious:

- there is nothing on log-linear models as promised
- anything else from the course (that appeared in lectures) is fair game, with an emphasis on the material we talked about after the midterm
- there are currently 7 questions with a total of 42 parts worth a total of 90 points
- I probably will do a fair amount of editing between now and sending it off for printing, but don’t expect the number of parts or points to change much.

2024-04-05 13:20: Assignment 7 is marked, appeals as usual between Apr 7 and 12.

2024-04-03 21:45: I realized (after thinking about it) that we covered log-linear modelling at too high a speed this afternoon for it to be fair to put a question on that on the final exam (so I won’t). Expect, though, one or both of principal components and factor analysis to make an appearance.

2024-04-01 12:15: the last Monday update:

- I will let you know when assignments 7 and 8 are marked.
- the last tutorial this afternoon, with something on principal components and something (maybe) on maps.
- a note that leaflet maps and pdf don’t go well together (leaflet maps are actually html), so look at the html notes and PASIAS chapter for the stuff on maps.
- lecture this week: the end of principal components, factor analysis, log-linear models. (The last is like regression, but for tables of frequencies. The technique is new, but the ideas should be mainly familiar. I think this is a nice way to end the course.)
- Extra practice problems: PASIAS chapters 41 and 42. (We are not doing multidimensional scaling this time.)
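To make the "like regression, but for tables of frequencies" idea concrete, here is a minimal sketch of a log-linear model fitted as a Poisson `glm`. The frequencies, the variable names (`treatment`, `outcome`, `count`) and the choice of `glm` with `anova` are assumptions made up for illustration; they are not taken from the lecture notes.

```r
# Made-up frequencies for a 2 x 2 table (illustration only, not course data).
freqs <- data.frame(
  treatment = rep(c("A", "B"), each = 2),
  outcome   = rep(c("improved", "not"), times = 2),
  count     = c(30, 10, 18, 22)
)

# The response is the frequency in each cell; the explanatory variables
# are the categorical variables that define the table.
# Main effects only: treatment and outcome are taken to be independent.
fit_indep <- glm(count ~ treatment + outcome, data = freqs, family = poisson)

# Adding the interaction allows an association between treatment and outcome.
fit_assoc <- glm(count ~ treatment * outcome, data = freqs, family = poisson)

# Test whether the association term is needed; a small P-value says
# that treatment and outcome are related.
anova(fit_indep, fit_assoc, test = "Chisq")
```

The familiar part is the model-building: fit a bigger model and a smaller one, and test whether the extra term can come out.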

2024-03-25 10:35: Monday update:

- Assignment 8 (the last one!) was due last night.
- Tutorial this afternoon, with something on cluster analysis.
- Lecture this week: drawing maps, principal components.
- Extra practice problems: PASIAS chapter 39. I thought there was a chapter on drawing maps as well; I will have to see what happened to that.
- My solutions to Assignment 7.

2024-03-22 12:50: Assignment 4 is graded as well, marks coming your way. Appeals as usual between Mar 25 and Apr 1.

2024-03-21 11:45: Assignments 3 and 5 are graded, and I am about to release the grades. Appeals by usual process between Mar 24 and Mar 31.

2024-03-18 11:00: Monday update:

- Assignment 7 was due last night, and Assignment 8 (the last one) opens tonight.
- Tutorial this afternoon on zoom, something on discriminant analysis.
- Lecture this week on cluster analysis. (There is another example in the discriminant analysis notes, which you can read if you like.)
- Extra practice problems on cluster analysis: PASIAS chapters 36 and 37.

2024-03-13 16:30: before I forget, from today’s class, loading MASS like this: `library(MASS, exclude = "select")` loads the whole of `MASS` except for the one troublesome function `select` that gets tangled up with the `tidyverse` `select`, and if you load `MASS` this way, you don’t have to worry about the `conflicted` stuff. (You can also use `include.only` to say “load only these functions and no others”, which is in the same spirit as the Python `from x import y`.)
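A small sketch of what loading `MASS` with `exclude` does, assuming a fresh R session with `MASS` not yet attached. The checks using `ls()` are just one way to see the effect and are not part of what we did in class.

```r
# Attach MASS, leaving out its select() so that it cannot mask
# the select() from dplyr / the tidyverse.
library(MASS, exclude = "select")

# select is not among the attached MASS functions...
"select" %in% ls("package:MASS")   # FALSE

# ...but the rest of MASS, for example lda(), is available as usual.
"lda" %in% ls("package:MASS")      # TRUE

# If you ever do want the MASS version, name the package explicitly:
# MASS::select(...)

# The opposite approach, in a session where MASS is not already attached,
# attaches only the functions you name:
# library(MASS, include.only = c("lda", "qda"))
```

The `include.only` line is left as a comment because `library` does nothing further once a package is already attached.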

2024-03-13 11:20: my solutions to assignment 6.

2024-03-11 10:30: Monday update:

- The midterm is marked. Stats: Q1 32 (60%), median 38 (72%), Q3 43 (81%). I will be sharing the marks and the marked midterms shortly. The marks will be on Quercus, and you will be receiving an email from Crowdmark with instructions to see your marked exam.
- Appeals are the same procedure as for assignments, between Mar 14 and Mar 21 inclusive. *Do not expect me to overrule the grader’s judgment*; thus, the only thing worth appealing is an actual grading error, such as an answer where the grader didn’t mark something relevant that you wrote, after consideration of my solutions and any comments on your exam. If you have not received full marks on a question part, you should have received a comment on your exam, or it should be obvious from my solutions what is missing or in error, or both of those things. Your appeal must include a clear description of the grading error (not grading judgment). If you appeal a grading judgment, you run the risk that I regrade your whole exam and your mark may go down as well as up. My philosophy on appeals is similar to that of Jeff Rosenthal.
- Assignment 6 was due last night, and is open until Tuesday night (with late penalty). Assignment 7 opens tonight.
- Tutorial this afternoon on zoom, with something on repeated measures.
- Lecture this week: discriminant analysis, and if we have time, a start on cluster analysis.
- Extra practice problems on this week’s lecture material: PASIAS chapter 35, and Chapter 36 if we get to cluster analysis this week (which I think is unlikely).

2024-03-09 21:30: Up to 89% done on the midterm. The last thing is the marking of question 1, which is happening now. (The grader was invigilating other midterms today.)

2024-03-08 21:15: We have more exam-marking progress: we are up to 83% done. I have marked my part; we are now waiting on one of the TAs to finish their part. I have been updating my solutions as I’ve been marking. In 5(c), some of you found an answer that I wasn’t expecting but that does answer the question. If that was you, you got full credit for it.

2024-03-07 22:15: I spent altogether too much of the evening struggling through 4(d), but that is now marked. I have updated my solutions with what I have seen while marking. We are now 61% done.

2024-03-06 23:00: the midterms are scanned and marking is underway. We are 31% done (some of question 2, and I did most of question 3 until I ran out of juice tonight).

2024-03-04 11:00: Monday update:

- there is *no TA strike*. Last night, the university and the TA union came to a “tentative agreement”, which the TAs still have to vote on (so we are not safe yet), but for now there is no strike.
- My exam solutions.
- exam marking will begin once the exams have been scanned and uploaded to Crowdmark.
- tutorial this afternoon as usual, with some problems on ANCOVA and MANOVA (don’t you love these acronyms?)
- lecture this week: repeated measures, maybe the start of discriminant analysis.
- extra practice problems: PASIAS chapter 34 (repeated measures), 35 (discriminant analysis).
- reminder that Assignment 6 is due on March 10.

2024-03-01 12:20:

- my solutions to Assignment 5
- if you have to miss the midterm, the weight goes automatically onto the final exam. Complete an absence declaration so that there is a record. (added 2024-03-04 11:30: if you already used your absence declaration, I do not need to see any additional documentation, but any other professors affected by your absence might, so be prepared.)
- conversely, if your final exam is better than your midterm, the final exam counts instead of your midterm.

2024-02-26 11:30: Monday update:

- Midterm: location and date are in an announcement on Quercus labelled “Midterm”. My reply to that announcement tells you which room you need to be in.
- Assignment 5 was due last night. Assignment 6 opens tonight; you once again have 2 weeks to complete it, so that the midterm and an assignment are not due the same weekend. You may find it useful to start Assignment 6, however, as part of your midterm preparation (the material on it is on the exam).
- Tutorial this afternoon. I will find something from the lecture before reading week.
- Lecture this week: analysis of covariance (short), multivariate analysis of variance (longer), maybe some of repeated measures.
- If you have questions as you prepare for the exam, post them on the Quercus discussion board or catch me after lecture.

2024-02-22 11:50:

- As I think I mentioned at the end of the last lecture, coverage for the midterm includes the last lecture before reading week (the “ANOVA revisited” stuff, as much as we saw in class).
- The midterm has gone for printing. It has five questions, with a total of 23 parts worth 53 points altogether. Expect to be doing more explanation than coding (since the focus of this course is the understanding of the statistics).
- The exam is open book, same rules as STAC32, and with the same expectation that you will organize your materials before the exam (or else you can expect to run out of time).
- Suggestion: bring a calculator to the exam (an actual calculator, not your phone). You might find it helpful in a couple of places.
- There will be a lecture next week before the midterm. That material will not be on the midterm, but you can count on it being on the final exam (and it will help you understand what follows in the course).
- re Assignment 2, I have asked the grader to add some explanation to the grading so that you know where you didn’t get full marks. In the meanwhile, compare your answers with my solutions. The same applies to Assignment 3 and Assignment 4 that you don’t have back yet.

2024-02-20 14:00: Assignment 2 is graded, and I am about to post the marks. Appeals by the usual procedure between Feb 23 and Mar 1.

2024-02-18 22:15: A Sunday night “Monday update”, to remind you that the upcoming week is Reading Week, and there is therefore no tutorial on Monday or lecture on Wednesday. (I will be spending the time sorting out your midterm.) We resume on the 26th.

2024-02-15 13:30: the midterm is at the end of the first week back after reading week. With that in mind, it seems best to say that we have now covered everything that could appear on the midterm (up to the end of ANOVA Revisited that we did yesterday). The material in between now and the midterm (ANCOVA, MANOVA and possibly some of repeated measures) can certainly appear on the final exam, and provides the foundation for some of the other things we do after that, so you won’t want to miss it.

2024-02-14 13:30: a couple of things:

- My solutions to Assignment 4
- The lecture notes for ANOVA Revisited are rather long, and I won’t be talking about all of it. The beginning is review; today, I plan to start with the Rats and Vitamin B example.

2024-02-12 11:25: Monday update:

- Assignment 4 was due last night; Assignment 5 opens tonight. You will have two weeks to do Assignment 5, since it is due on Sunday at the end of reading week.
- Tutorial this afternoon on Zoom as usual. I will find a survival analysis example from PASIAS to talk about.
- Lecture: there is a little more material at the end of the Survival Analysis slides that you would probably do well to look through, but this week I’ll be moving on to ANOVA Revisited. This is followed (probably after reading week) by some other ideas that are based on ANOVA: MANOVA, repeated measures, discriminant analysis and so on.

2024-02-08 10:50: my solutions to assignment 3.

2024-02-07 20:10: I updated the lecture notes to reflect what I added for class today (the slides “behind the scenes” and the one after that). This now seems to be working for both the .html and the .pdf versions of the slides.

2024-02-05 15:15: assignment planning: there will be no assignments due during reading week or the weekend of the midterm. That means you’ll get two weeks to do each of Assignment 5 (due the Sunday night at the end of reading week) and Assignment 6 (due March 10). The eighth and final assignment will be due on March 24, so you’ll get a break at the end of the course.

2024-02-05 11:30: Monday update:

- Assignments: #1 is marked, appeals by usual procedure between Feb 8 and 15; #2 is being marked; #3 was due yesterday; #4 opens tonight.
- Tutorial on zoom this afternoon as usual (usual link)
- Lecture this week: survival analysis. (I’ll decide whether I want to do any more of the dates and times stuff, but I think we have all we need of that.) Content warning on the survival analysis: often the data is on people that will die of something (often cancer), and we will be investigating treatments that will help them live longer, but some of them will die of whatever-it-is. I think that will take up our two hours this week.

2024-02-01 10:30: My solutions to Assignment 2.

2024-01-29 13:20: Monday update:

- Tutorial on zoom this afternoon at 4 (link in Quercus announcement)
- Assignments: #2 was due yesterday; #3 opens tonight. I will let you know when #1 is graded.
- Lecture this week:
  - logistic regression with multi-category but *unordered* response
  - dates and times (in which you learn just how fiddly dates and times are to handle, and how useful it is to have packages that handle them for us)
- extra practice problems on this week’s material: PASIAS chapters 29 and 21 (respectively).

2024-01-25 20:30: After Wednesday’s class, I decided that you might like some extra practice on the stuff we did in the first half of class (the log-odds stuff), so I added a short question on that to the next assignment. If that material is still confusing you, we can talk about that on Monday. (It is probably not giving much away to say that the second question on that assignment is on the stuff we did after half-time in class this week.)

2024-01-25 12:20: my solutions to Assignment 1.

2024-01-24 11:25: We have a midterm date. See the announcement on Quercus for date, time, and place (not until early March).

2024-01-22 13:30: upcoming, this week:

- tutorial today at 4 on zoom, same coordinates as last week. As I write, Accuweather is telling me “periods of heavy snow for at least 60 minutes”, so I am quite happy to not be travelling today! I plan to pick a problem from PASIAS to talk about, and of course bring other questions if you have them.
- in lecture this week, some or all of:
  - log-odds and odds ratios and relative risk (partly in response to last week’s question about what those slope and intercept numbers actually *mean*)
  - logistic regression with a multi-category response when the response categories are ordered (the coal miners lung disease example)
  - logistic regression with a multi-category response when the response categories are *not* ordered (the brand preference example)
- extra examples: chapters 28 and 29 of PASIAS.
- Assignment 2, on the logistic regression stuff from last week’s lecture, opens tonight and is due next Sunday.
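As a rough illustration of what the slope and intercept in a logistic regression mean, here is a sketch with two groups. The counts and the names (`group`, `events`, `total`) are made up for this example and are not from the lecture data.

```r
# Hypothetical counts (illustration only): events out of 100 in each group.
d <- data.frame(
  group  = c("control", "exposed"),
  events = c(15, 30),
  total  = c(100, 100)
)

p <- d$events / d$total      # probability of the event in each group
odds <- p / (1 - p)          # odds of the event in each group
log_odds <- log(odds)        # the scale a logistic regression works on

relative_risk <- p[2] / p[1]     # ratio of probabilities
odds_ratio <- odds[2] / odds[1]  # ratio of odds (not the same thing)

# Logistic regression on the same data, one row per group.
fit <- glm(cbind(events, total - events) ~ group, data = d, family = binomial)

# The intercept is the log-odds for the baseline group (control), and the
# slope is the log of the odds ratio comparing exposed with control.
coef(fit)
c(intercept = log_odds[1], slope = log(odds_ratio))
```

The relative risk and the odds ratio come out different here, which is the reason for keeping the two ideas separate.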

2024-01-17 13:30: practice problems for this week’s material: PASIAS chapter 26 (edited 2024-01-18: we’ll get to chapter 27 next week).

2024-01-15 11:15: on the agenda this week:

- tutorial today at 4:00pm, on zoom (link in Quercus announcement), on the regression stuff we did in lecture last week.
- The stuff in the slides that I didn’t talk about is regression review. Read through those if you feel you need more review.
- Assignment 1 opens tonight, is due next Sunday night (the 21st), on the stuff we did in lecture last week.
- This week’s lecture is on logistic regression. The thing that distinguishes this from regular regression is that the response variable is categorical rather than quantitative. There are three parts (that we won’t do all of this week): when the outcome is a success/failure, when the outcome is several categories but ordered, when the outcome is several categories that are not ordered. There are also variations in how the data come to us (and therefore how we have to deal with it).
- Next week’s tutorial (the 22nd) is on the stuff in this week’s lecture, and Assignment 2 (opens on the 22nd and is due on the 28th) is on the same material.
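The three kinds of categorical response might be handled as in the sketch below. The functions (`glm`, `polr` from `MASS`, `multinom` from `nnet`) and the built-in datasets are my assumptions for illustration; the lecture examples may use different data and different code.

```r
# 1. Success/failure response: ordinary logistic regression.
#    mtcars$am is 0/1 (automatic or manual transmission).
fit_binary <- glm(am ~ wt, data = mtcars, family = binomial)

# 2. Several ordered categories: proportional-odds logistic regression.
#    housing$Sat is an ordered factor (Low < Medium < High), and the
#    data come as frequencies, hence the weights.
fit_ordered <- MASS::polr(Sat ~ Infl, weights = Freq, data = MASS::housing)

# 3. Several unordered categories: multinomial logistic regression.
#    iris$Species has three categories with no natural order.
fit_unordered <- nnet::multinom(Species ~ Sepal.Length, data = iris,
                                trace = FALSE)

# In each case, predicted probabilities come from predict().
head(predict(fit_binary, type = "response"), 3)
head(predict(fit_ordered, type = "probs"), 3)
head(predict(fit_unordered, type = "probs"), 3)
```

Note that the ordered example has one row per combination with a frequency column, while the other two have one row per individual; that is one of the variations in how the data come to us.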

2024-01-11 20:45: we have a Zoom meeting set up for Monday’s tutorial. The coordinates are in a Quercus announcement (I am not announcing it publicly).

2024-01-11 11:45: Here is a worksheet on the material we looked at in class yesterday. I’m not promising a worksheet every week, but I will point you at some practice problems for each section of the course. Look out for information about Monday’s session tomorrow (once I have sorted that out).

2024-01-08 12:45: Our course begins this week:

- expect things to be structured a lot like they were in C32.
- one two-hour lecture a week, on Wednesdays (I will give you a break in the middle). Your ACORN has the location.
- one hour of tutorial-slash-office hour on Mondays, starting *next* week, probably on zoom. You can also catch me after lecture, or post in the Quercus discussions.
- weekly assignments, with the first one going out on January 15 (out Monday night, due the following Sunday night), in the same style as for C32.
- D29 is more about the statistics and less about the coding than C32 was; you will be learning some statistical methods that are definitely new to you, and I will assume that you are keeping up with the coding part.

2024-01-03 11:00: lecture 1 is a week away:

- Quercus page is up (for assignments, discussion board etc)
- be ready to learn some new statistics (there will be stuff you haven’t seen before)
- I will assume that you are familiar with the R stuff, and the statistical ideas, that you learned in C32. (If you are not, be prepared to do some catching up.)
- we begin with some new ideas in regression.
- I don’t think there is anyone taking the course as STA 1007 (graduate course) this year; if you are, let me know.

2023-12-12 18:30: a bit more detail:

- lectures are once a week on Wednesdays for 2 hours
- there is a “practical” on Mondays at 4, starting in week 2, probably on zoom. This is a sort of combined office hour / tutorial to which you can bring questions about the previous week’s material.
- weekly assignments (that you are used to from C32), opening Monday night, due the following Sunday night.
- a 2-hour midterm and a 3-hour final as usual, on dates to be announced.

2023-12-08 14:30: here is this year’s version of the site. Class meets once a week for two hours, starting on Wed January 10, 2024.