STAD29

Statistics for the Life and Social Sciences

Ken Butler


Welcome to the home page for STA 1007 / STAD29. This is the place to look for things course-related (notes, code, old exams etc., linked above) except for lecture videos, assignment hand-ins and marks, which will be on Quercus.

News (newest first)

  • 2023-12-08 14:30: here is this year’s version of the site. Class meets once a week for two hours, starting on Wed January 10, 2024.

Last year

  • 2023-05-01 20:15: I have calculated the course grades you earned, and will submit them shortly. When they have been submitted and approved, they will come to you. Some things I need to say:

    • Some of the work done on the final exam was very good, and there will be several A+ grades.
    • I was rather generous in marking the final exam.
    • I have reviewed grades close to key grade boundaries and am happy that I have things in the right places.
    • Course grades are based only on the work you have done and the understanding you have displayed.
      • Your personal circumstances do not, and cannot, figure into your grade. It does not matter how hard you worked, or that you need a certain grade for a scholarship or to graduate. If you want those things, it is up to you to do work of sufficient quality to earn them.
      • If you don’t like your grade, you have exactly two options: a clerical check and, after obtaining a copy of your marked exam, a petition for re-reading of your final exam. Note that I use R to calculate course grades, so it is very unlikely that they are incorrectly calculated.
      • Contacting me about your grade is not one of your options, and to do so is a waste of your time. Doing so may also contravene the Code of Behaviour on Academic Matters; for example, if you ask for a grade you did not earn, you are asking me to commit an offence under B.I.3(a) on your behalf.
    • It is no longer possible to appeal any course work, such as assignments or the midterm.
    • Enjoy your summer!
  • 2023-05-01 17:30: the exams are marked! Next thing, maybe tonight, is course grades.

  • 2023-04-30 22:55: into the last question, with 6(b) now being done, and I am 88% done. Four more parts of question 6 for tomorrow. My solutions as they stand now. I guess the terrible weather helped this weekend; I didn’t exactly feel like going outside and doing anything.

  • 2023-04-29 22:45: out of juice at the end of day 4. I am now 69% done, but that’s a bit of a cheat because I sped through 5(a) and the one-point 5(b) just now to get something done, having gotten bogged down on 4(d). I don’t like leaving questions half-done, but I will go back to it tomorrow and be as consistent as I can with what I was able to do today. Crowdmark tells me that I have over 3,000 question parts to read through altogether, of which I have now done just over 2,000. (Tonight, I had to resort to randomly sampling mp3 files from my music collection; what came up while I was working on question 5 was Mozart’s piano concerto no. 24, one of my favourites.) My solutions as they stand now.

  • 2023-04-28 23:50: day 3. I powered through 3(h) just now so that I could say I’m 50% done (which I am, exactly). I guess that means three more days of marking, plus getting course grades together. The laptop-on-couch method seems to be working well.

  • 2023-04-27 23:15: day 2, and I am now up to 32% done. My solutions so far. I have done all of the first two questions plus 3(a) and 3(e) from the long discriminant analysis question. It seems to be much quicker marking if I do it on my laptop sitting on the couch (!), minimizing the comments on the exams themselves and putting relevant comments in my solutions.

  • 2023-04-26 23:30: day 1 of exam marking. I am done through question 2(a), which makes me 15% done. I had an exam in my other course yesterday, so couldn’t get started until today. I am updating my solutions as I go with other things I notice, and will share them with you later.

  • 2023-04-23 12:00: reminder that the final exam is tomorrow, Monday April 24, at 9:00am in HW 216.

  • 2023-04-18 14:30: I have office hours tomorrow, Wed 19th, 11:00am-12:00pm and 2:00-3:00pm or longer if needed. There are two parts to the office hours in case you have an exam in one of them.

  • 2023-04-17 15:45: the final exam has been sent for printing. For those that care about such things: it has 6 questions on 9 pages, with a total of 34 parts worth a total of 84 points. The guideline is therefore 5 minutes per part or 2 minutes per point. The exam rules are the same as for the midterm (and, indeed, for all my exams): bring whatever you think may be helpful, but you will need to be able to quickly find what you are looking for in whatever you bring.

  • 2023-04-14 11:45: assignments 7 and 8 have been marked. Appeals, usual process, Apr 17-20.

  • 2023-04-09 13:00: happy Easter all. Here is a worksheet on the last week’s material. I remembered I had an old map problem that hasn’t made it into PASIAS yet, so I thought I would share it with you.

  • 2023-04-05 16:10:

    • it seems that I forgot to release marks for assignment 6. This I am about to do now. Appeals by usual process between Apr 7 and Apr 13 inclusive.
    • Assignments 7 and 8 will be released when marked.
    • My solutions to Assignment 8.
  • 2023-03-29 20:30:

  • 2023-03-23 11:30: My solutions to Assignment 6.

  • 2023-03-20 20:45: the help desk folks eventually sorted me out, but not in time for tutorial. I feel bad about this, so I made some recordings of the problems from PASIAS that I would have talked about in tutorial, and as soon as Zoom has finished processing them, I’ll share them with you:

    • MANOVA example with Australian athletes (and a very shaky transcript).
    • repeated measures example with rats and drinking water. The videos are about 15 mins each. (added 2023-03-22 13:30: if you need a passcode, this one should work: 1XrqhB=2*B starting at the 1 and ending at the B.)
  • 2023-03-20 16:10: sorry, folks, I’m not able to connect to zoom. (I got a new phone, which is apparently not set up for 2-factor authentication to get into zoom.)

  • 2023-03-20 13:15: we are running out of weeks! Here’s what I plan to talk about the rest of the way:

    • the rest of repeated measures
    • discriminant analysis
    • cluster analysis
    • maps / principal components / frequency tables as time permits.

    I have, as is usually the way, more stuff that I would like to talk about than we have weeks left, so the above seems like a decent compromise. Also, I am planning 8 assignments, so the last two should be due Mar 26 and Apr 2, with nothing due in the last week.

  • 2023-03-15 13:15:

    • your midterm mark should be up on Quercus. If you wrote the midterm, please check that your mark is there, and email me if it is not. I want to make sure that I have everybody.
    • lecture today:
    • PASIAS for these:
      • chapter 32 for MANOVA, e.g. problem 32.2 (solution in 32.5)
      • chapter 33 for repeated measures, e.g. problem 33.2 (solution in 33.8)
  • 2023-03-11 21:45:

    • the midterm is graded! I am about to release the marks, and also those of assignment 5, which I apparently forgot to do. Appeals on both by the usual procedure between March 18 and 25 inclusive.
    • exam stats, out of 57: Q1 34 (59%), median 41 (71%), Q3 45 (79%). These are very much in line with typical values for this course.
    • I have been (a) as consistent and (b) as generous as I reasonably can in grading the midterm. It is therefore at least as likely that your midterm mark will go down as up if you appeal (I may easily see somewhere where I have been too generous), and if you think I have graded you harshly somewhere, you can be confident that the same work earned the same grade for everybody there.
    • There was a lot of good work, and some very nice answers that displayed clear understanding.
    • My midterm solutions.
    • You will be getting an email from Crowdmark with your graded exam. Read through this and my solutions to see what you did well and where you could have done better.
  • 2023-03-10 22:30: grading has now reached the dizzying heights of 91%. Just 5(a) and (d) to go. I didn’t have the brain cells for 5(a) today.

  • 2023-03-09 23:45: after I combed through the various different ways people had of doing it, 4(c) is now done, and I am 73% done, and ready for bed. My solutions have been updated with my additional grading comments from today.

  • 2023-03-08 22:45: reached my limit for tonight. I have marked through question 3(a), which makes me now 45% done. I have a feeling 3(b) will be an adventure, but question 4 (the coding one) ought to be quick going to mark. My midterm solutions as they stand now (with additional comments added while grading).

  • 2023-03-07 23:10: progress report: around my lecture in my other course, and grading some of that exam, I have now completed 1(e) and have made a first pass at 1(f), namely the answers that looked like what I was expecting. Tomorrow I will make a call about the rest. I am currently 24% done.

  • 2023-03-06 22:45: the midterm was scanned late this afternoon, so this evening I have made a start on grading it. I have graded 1(a) through 1(c), and I am now apparently 14% done. (This feels rather like being at the gym, where you set up a certain workout, and it tells you what percentage of the way through you are so far, and you keep going because mainly you want it to be over.)

  • 2023-03-06 13:00: Monday update:

    • no tutorial today (as below)
    • lecture this week: analysis of covariance, multivariate ANOVA.
    • Assignment 6 will be due on Mar 19 (so you get an extra week off). There should be 8 assignments altogether.
    • PASIAS:
      • ANOVA review chapter 30. Problems 30.3 through 30.6 are about contrasts. There seem to be a lot of those.
      • Analysis of covariance chapter 31.
      • Multivariate ANOVA chapter 32.
  • 2023-03-06 09:00: no tutorial today. I will need to be either marking your exam, or marking my other exam, or preparing for one or both of those things.

  • 2023-03-04 11:30: I am in my office, so the exam goes ahead. If you are sick, or it is truly impossible for you to get to campus despite your best efforts, mark yourself absent on Acorn, and the weight of your midterm goes to the final exam (course policy).

  • 2023-03-04 06:00: according to the campus status page, campus is open today, and so our midterm goes ahead. Now, if you’ll excuse me, I’m going back to bed!

  • 2023-03-03 13:55 (edit 14:30): I tried to see whether there was a good way to reschedule our midterm, but it turns out that there isn’t. My thought was to hold it in class time on Wed, but our lecture room isn’t big enough for everybody for an exam (especially not for an open-book exam), and there are no other appropriately-sized rooms available at a time reasonably close to our class time. So the best we can do is (try) to hold the midterm tomorrow as scheduled. If it gets cancelled because campus is closed, the registrar’s office will re-schedule everything that was missed, and they will do it to minimize conflicts as they normally do. (Better to have the registrar’s office reschedule things than me, I figure.) I am expecting campus to be closed sometime tonight (added: tonight’s 7:00pm midterms are cancelled), and a decision to be made about reopening tomorrow morning.

  • 2023-03-03 11:15: weather report: the snowstorm is supposed to start tonight and, according to the Weather Network, it will taper off to flurries at 8:00 am tomorrow and stop snowing by 10. This means that if campus is open, our midterm goes ahead (and if campus is closed, it will be rescheduled). Keep an eye on the campus status page. I don’t know what the plans are (and there undoubtedly are some): my guess is either that the campus opens at noon, or that the campus stays closed all day.

  • 2023-03-02 15:05: Campus status page. Our campus is on the right. Check before you come to campus, or before your exam if you are already here.

  • 2023-03-02 13:30: my solutions to assignment 5.

  • 2023-03-01 13:15: My slides for course review (part 2 today).

  • 2023-03-01 13:10: to summarize about the midterm:

    • in-person hand-written exam in HLB 101 Saturday Mar 4, 1:00-3:00pm
    • open book: you can bring any notes or other course materials, printed. Bear in mind that what you bring needs to be organized so that you can quickly find what you’re looking for.
    • exam comes with a booklet of Figures to refer to (that might contain code or output or graphs).
    • questions may ask for code to perform a task or for explanations of what you see in a Figure, or occasionally a description of the output some code will produce.
    • questions will be in similar style to what you see on assignments or old midterms.
    • coverage is all the material before reading week (up to and including simple effects in ANOVA). The new material in today’s class is not on the midterm, but will be on Assignment 6 (due March 19).
    • there is no assignment due March 5, to allow you to study for the midterm, and there is likely no assignment due March 12 either.
    • as I recall, the midterm now contains 5 questions with 22 parts altogether worth a total of 54 points.
    • If you are sick on midterm day (especially if it is something that other people might catch from you), do not come to the midterm. In that case, fill out an absence declaration. The weight of the midterm goes onto the final exam (no makeups: this is course policy).
  • 2023-02-24 11:30: An end-of-reading-week Friday update:

    • Don’t forget that Assignment 5 is due on Sunday night.
    • Our midterm is on Saturday March 4, 1:00-3:00pm in HLB 101. Coverage is anything up to the end of the last lecture before reading week.
    • There is a lecture this coming Wednesday on the rest of the ANOVA material, as well as analysis of covariance.
    • Assignment 6 will not be due until March 12 or so (depending how long the midterm takes to mark, because I will be dividing my time between marking our midterm, marking the midterm for my other course, and putting Assignment 6 together for you).
  • 2023-02-23 11:45: Assignment 4 is graded. Appeals by the usual procedure between March 2 and March 8.

  • 2023-02-20 21:30: the midterm is nearly ready to go to printing. As it stands now, it has 5 questions with a total of 22 parts worth a total of 54 points. I may yet decide to modify that, however.

  • 2023-02-16 16:00: Apparently a Thursday update:

    • my solutions to Assignment 4.
    • next week is Reading Week. No tutorial or lecture.
    • Assignment 5 is not due until the end of Reading Week (the 26th).
    • our midterm is on March 4 (see details below). The coverage is up to and including the lecture we had yesterday (that is, ANOVA up to and including simple effects). There is, according to current plans, a lecture on March 1, the material in which will not be on the midterm (but will no doubt be on the final exam). The midterm rules are as for C32: bring any course materials printed, and prepare for a handwritten in-person exam. Expect less code-writing and more interpretation and explanation than in C32, since this course is more about the statistics and less about the coding.
  • 2023-02-13 20:00: Assignment 3 has been graded. Appeals by usual process between Feb 20 and Feb 26.

  • 2023-02-12 12:45: there appears to be a Sunday update this week:

    • PASIAS problems to work through from last week’s lecture: dates and times 21.2, 21.4, 21.5; survival analysis, any of the (rather long) problems in chapter 29. (In the survival analysis problems, you may see me using crossing to create dataframes of combinations of values for predictions; think about how you would use datagrid to achieve the same thing.)
    • lecture this week: a short revisit of survival analysis (I raced through this last week), followed by another look at analysis of variance. Some of the ANOVA material will be familiar from C32 and B27, and some of it will be new. (Coming up after reading week are some things related to ANOVA, such as analysis of covariance, MANOVA, repeated measures, and discriminant analysis; this leads into multivariate methods where you have several response variables to analyze at once).
    • Assignment 5 goes out this week, but you have until Feb 26 to get it done (see below for why).
    • The week of Feb 20-24 that includes Family Day is our Reading Week. There is no tutorial or lecture that week.
    • There is a lecture on March 1, right before our midterm. The material in this lecture will not be on the midterm, but you can be sure that it will be on the final exam, so miss it at your own risk.
    • There will be no assignment due on March 5, since you will have just written a midterm, and I will be in the middle of grading it!
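The crossing-versus-datagrid comparison mentioned above can be tried out on a small scale. This is only a sketch under assumptions: the built-in mtcars data and an ordinary lm fit stand in for a survival model, and the object names are made up for illustration.

```r
library(tidyr)
library(marginaleffects)

# Stand-in model (an assumption for illustration, not a survival model from PASIAS)
fit <- lm(mpg ~ hp + wt, data = mtcars)

# The older way: crossing() builds all combinations of the values you supply,
# and you have to supply every explanatory variable yourself
new_crossing <- crossing(hp = c(100, 150), wt = c(2.5, 3.5))

# The newer way: datagrid() builds the same combinations, but takes the model
# as input, so any variable you leave out is filled in with a typical value
new_datagrid <- datagrid(model = fit, hp = c(100, 150), wt = c(2.5, 3.5))

new_crossing
new_datagrid
```

Both data frames should contain the same four combinations of hp and wt; either one can then be passed as newdata to predictions.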
  • 2023-02-08 23:30: I had a burst of activity tonight working on your Assignment 5, which it seemed needed to be due on Feb 19. This, however, is on the beginning of Reading Week (Feb 20 is Family Day), so I am making it due a week later, on Feb 26. This gives you the option of getting it done before Reading Week, or of finishing it off after Reading Week. Looking further ahead, the following Sunday is March 5, which is the weekend of our midterm, so there will be no assignment due that week either. I still have to decide whether we have an assignment due on March 12.

  • 2023-02-08 13:30: My solutions to assignment 3, with extra discussion about predictions.

  • 2023-02-08 12:00: Survival analysis slides updated.

  • 2023-02-06 13:00: Assignment 2 has been graded, and I am about to release the grades. Appeals by the usual procedure, between Feb 13 and 19 inclusive.

  • 2023-02-05 16:10: There is something funny going on with predictions from the marginaleffects package, possibly at my end (though possibly not). Some people, possibly working on jupyter or r.datatools, are getting a different layout of the result from predictions, without the values being predicted from. A look at the documentation suggests that what I’m getting is not what should be happening, and what you might be getting is what should be happening. To make things work for now, follow my suggestion in the discussion thread on Quercus, summarized here as:

    • run cbind(predictions(...)) instead of predictions(...) if you need to, in order to get the predictions displayed next to the values they are predictions for
    • the column with the predictions in it might be called estimate instead; bear this in mind if you select the columns you want to display.

    While I figure out what is happening, you can help by contributing to the Quercus discussion thread to let me know whether your output from predictions looks like mine, or like the student’s output at the top of the thread.
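A minimal sketch of the workaround described above, under stated assumptions: the model below is a logistic regression on the built-in mtcars data (a stand-in, not a course data set), and the column-name check covers both names the prediction column might have.

```r
library(marginaleffects)

# Stand-in model (an assumption for illustration): logistic regression on built-in data
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Values to predict for; hp is not specified, so datagrid() holds it at a typical value
new <- datagrid(model = fit, wt = c(2.5, 3.0, 3.5))

# Wrapping predictions() in cbind() should give an ordinary data frame,
# with the predictions sitting next to the values they are predictions for
p <- cbind(predictions(fit, newdata = new))

# The prediction column may be called "estimate" or "predicted",
# depending on the installed version of the package
pred_col <- intersect(c("estimate", "predicted"), names(p))

# Display only the columns of interest
p[, c("wt", "hp", pred_col)]
```

Looking up the column name rather than hard-coding it means the same code should run whichever layout your installation produces.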

  • 2023-02-03 11:30: Friday update:

    • We have a midterm date: Saturday March 4, 1:00-3:00pm in HLB 101.
    • Assignment 3 due on Sunday night. You may find this one rather long. (Assignment 4 will be shorter.)
    • Next week’s lecture: dates and times (part 1), survival analysis (part 2). Survival analysis is a regression-like analysis of the time until something happens; often this is of patients in hospital, and the time of admission and the time of event are recorded as dates (or dates and times), so it is as well to be able to handle dates and times.
    • Hint: if you’re going to watch the video lectures, watch them with the current slides. Some things have changed between when I made the videos and now, and I am expecting you to do things the new way. (The same may be true of PASIAS in places; there may be some predictions done the old way, for example).
  • 2023-02-01 20:30:

    • I finally got around to publicizing the correct assignment solutions (in the last note).
    • (edited 2023-02-03 11:40) Relevant PASIAS problems for today’s lecture are from chapter 27 (ordinal response) and chapter 28 (nominal response). I already suggested 27.3 and 27.4 for the former; you might like 28.3 and the rather large 28.4 (the data set is very big), or possibly 28.6, which is a data set you know already. (I have a feeling the predictions in 28.6 and maybe elsewhere are done the old way, not the datagrid and predictions way we saw in lecture today. R is a “moving target” sometimes, and things change faster than I can update everything. I can talk about predictions some more in tutorial on Monday if you wish, but for the assignment, do the best you can.)
  • 2023-02-01 11:55: two things:

  • 2023-01-30 12:40: Assignment 1 has been graded, and I am about to release the marks. Appeal process is the same as for C32: you need to be able to say that there has been a mistake in the grading (disagreeing with the grader’s judgement is not enough). Window for appeals is February 6-12.

  • 2023-01-27 15:30: Friday update:

    • Assignment 2 is due on Sunday.
    • My solutions to Assignment 1.
    • In lecture next week: the rest of ordinal logistic regression, and multinomial logistic regression. The distinction between these two is whether the response categories are or are not ordered. One of the problems on Assignment 3 is on ordinal logistic regression, so you might benefit from reading ahead or waiting until Wednesday to tackle it. The slides for this are all here. I recognize that Assignment 3 will be long; I will make sure that Assignment 4 is shorter.
    • Problems to work through from this week’s lecture: the ones listed in the note from two days ago.
  • 2023-01-25 20:00:

    • We got part of the way through ordinal logistic regression in today’s abbreviated class. I glossed over some things, and will fill in the details next week.
    • I hope everyone got home safely and not too slowly.
    • I don’t have a worksheet for you this week, because there are lots of problems in PASIAS to work through. For example, 26.4 and the rather complex 26.5 get you into multiple logistic regression (my solutions in 26.14 and 26.15), and 27.3 and 27.4 get you into ordinal logistic regression. For 27.3, you might like to either do 27.2 first, or find out about drop_na to get rid of the missing values.
    • Next week, I plan to finish what I started today, and get into multinomial logistic regression (where the response categories are not ordered). There is an Assignment 2 out now, on last week’s material, and there will be an Assignment 3 next week for which some of the material in next Wednesday’s class may be helpful to you. As I recall, Assignment 3 is on the long side, but Assignment 4 is likely to be short.
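For anyone finding out about drop_na, as suggested above for problem 27.3, here is a small sketch. The data frame is made up for illustration and has nothing to do with the PASIAS data.

```r
library(tidyr)

# Made-up data with some missing values (hypothetical, for illustration only)
d <- data.frame(
  id = 1:4,
  score = c(3, NA, 5, 2),
  group = c("a", "b", NA, "a")
)

# Drop every row that has a missing value in any column
complete_rows <- drop_na(d)

# Drop only the rows that have a missing value in score
score_present <- drop_na(d, score)

complete_rows
score_present
```

Naming columns in drop_na keeps rows that are missing only in variables you are not using, which can matter when a data set has many columns.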
  • 2023-01-25 12:45: if you don’t make it to class today, you can watch the old lecture videos. On the lecture video page on Quercus, the ones you want are the second half of lecture 2a, and (in principle) all of lecture 2b. Next week, I will pick up from wherever I leave off today.

  • 2023-01-25 10:30: campus is open, and I am in my office, so class will go ahead (unless you hear otherwise). According to the last weather forecast I saw, the worst of the snow will start around 5, and we’ll be done class by then. (Added 11:45: looking out of my office window and watching the snow come down, my inclination is to only go to about 3:30 today, so that if you want to get off campus before 4, you’ll be able to do so.)

  • 2023-01-24 14:00: I just looked at the weather forecast for tomorrow, and it is supposed to start snowing here right when our class begins. It would probably be wise to check the campus status page before you head to campus. Sometimes UTSC pre-emptively closes campus if bad weather is forecast. If we do miss class, then it’s the same as an old-fashioned snow day; this week’s class would not be made up or taken online, and next week’s class would pick up from where we left off last week.

  • 2023-01-20 12:45: we seem to have a Friday update this week:

    • Monday tutorial: bring questions, confusions etc about what’s been happening so far in the course. I can talk about worksheet 2 if that would be helpful.
    • next week’s lectures: see the Jan 18 20:40 note. You might like to read through the slides (or even watch the videos) before lecture, so that you have a sense of where we are going.
  • 2023-01-18 23:50: I have spent rather too much of my evening putting together a more suitable question for your assignment 2 (next week), one that is actually based on what we did today. This means that I already have one question ready for your assignment 3 (the one that was until tonight on assignment 2), and that will save me some time next week.

  • 2023-01-18 20:40:

    • a smallish worksheet for practice on what we did in class today. I realized that the second problem I put on next week’s assignment is something we didn’t really get to today, so I will be shuffling things about between now and next week to make it fairer for you. (Figuring out how far I will get in class is as much an art as a science.)
    • Next week’s class begins with multiple logistic regression, where the response side still has a success/failure, but the explanatory side contains several things, like a multiple regression, and we want to figure out which ones of them to keep.
    • After that, we have to deal with the response side having more than two categories, and this breaks up into two parts according to whether they have a natural order (like levels of severity of disease), or not (like favourite colour or brand of cellphone someone prefers). I’m expecting that we’ll get into ordered response next week.
  • 2023-01-16 11:30: on the agenda for this week:

    • lectures: the rest of the regression stuff (part 1 on Wed), the start of logistic regression (part 2).
    • Assignment 1 (just one question: there will usually be two) is posted at midnight on the beginning of Tuesday, and will be due on Sunday night.
  • 2023-01-13 12:00: There seems to be a Friday update:

    • earlier this week I learned about a new version of the online access to R Studio (click this link), the one that you may have been accessing through jupyter.utoronto.ca. The new link has the newest version of R and R Studio, so you might like to switch to it. (The old one will continue to exist at least for a while.) If you are using R Studio on your own computer, this does not affect you at all.
    • lecture code: the tab by that name on this website now links to my most up-to-date project of lecture notes, so that what you see there is consistent with what you will see in lecture, and tells you how you can get a copy for yourself should you wish to.
    • I will be setting up a zoom meeting for Monday’s “tutorial”. The actual link will be shown on Quercus, since I only want it to be accessible to students in this class (and not be public).
  • 2023-01-11 11:45 (edited 2023-01-12 18:00): a worksheet to go through (for yourself) after today’s class. There is one longish question (after editing). I can’t promise that you’ll get a worksheet every week, but there didn’t seem to be a good PASIAS problem on this stuff, so you get a worksheet this week.

  • 2023-01-08 20:00: almost time to begin. Our first class is on Wednesday afternoon, and our first tutorial (or practical or whatever it is called) will be on Monday Jan 16. We should probably make some arrangements for that on Wednesday.

  • 2023-01-03 14:50: this year’s course outline.

  • 2022-12-31 21:30: Happy new year, everyone!

    • This course follows on from STAC32, and assumes mastery of the statistical and coding material you learned there. The focus here is less on the coding and more on the statistical methods. These will be almost entirely new to you, but a lot of them build on the regression and ANOVA ideas you learned in previous courses.
    • This course is cross-listed as the graduate course STA 1007. If you are enrolled in that course, you should know or be willing to learn about what was covered in STAC32 (statistical methods up to multiple regression, and R coding for them).
    • The Quercus page for this course is now up and running. If you are in STA 1007, you should get redirected to the STAD29 one.
    • I suspect I will have to edit the course outline to reflect how we are doing things this year. There will be a midterm and final exam, assignments, and (for the STA 1007 people only) a project. (Last year, we were still online at this point.)