Worksheet 9

Published

March 6, 2025

Packages

library(tidyverse)
library(MASS, exclude = "select")

Diabetes

According to the Mayo Clinic,

Diabetes mellitus refers to a group of diseases that affect how the body uses blood sugar (glucose). Glucose is an important source of energy for the cells that make up the muscles and tissues. It’s also the brain’s main source of fuel. The main cause of diabetes varies by type. But no matter what type of diabetes you have, it can lead to excess sugar in the blood. Too much sugar in the blood can lead to serious health problems.

The data in http://ritsokiguess.site/datafiles/diabetes1.csv are from 145 non-obese adult patients classified into three groups (types of diabetes): “normal”, “overt”, and “chemical”. For each patient, five other variables were also recorded:

  • rw: relative weight, the ratio of actual weight to ideal weight for the person’s height.
  • fpg: fasting plasma glucose
  • glucose: area under plasma glucose curve after 3-hour glucose tolerance test
  • insulin: area under plasma insulin curve after 3-hour glucose tolerance test
  • sspg: steady-state plasma glucose

These variables are recorded here as \(z\)-scores (they were originally measured on vastly different scales).
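As a reminder of what a \(z\)-score is, here is a minimal sketch using made-up numbers (not values from the diabetes data). It assumes the usual definition, subtract the mean and divide by the sample standard deviation; the worksheet does not say exactly how the scores in the data file were computed.

```r
# Illustrative values only: these are NOT taken from the diabetes data
x <- c(80, 95, 110, 300)

# z-score by hand: centre on the mean, scale by the sample SD
z <- (x - mean(x)) / sd(x)

# scale() does the same thing (it returns a one-column matrix)
z_scale <- as.numeric(scale(x))

all.equal(z, z_scale)
mean(z)  # zero, up to rounding error
sd(z)
```

After standardizing, each variable has mean 0 and SD 1, so variables originally measured on very different scales become directly comparable.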

Our aim is to investigate any association between the five measured variables and the type of diabetes (stored in the column group).

  1. Read in and display (some of) the data.
  2. Using manova, demonstrate that the group has some kind of effect on the other variables.
  3. Run a discriminant analysis and display the results.
  4. Comment briefly on the relative importance of the linear discriminants.
  5. Which two of the original quantitative variables play the largest role in LD1? What kind of values on those variables would make the LD1 score large (very positive)?
  6. Obtain and save a dataframe containing the predicted group memberships, posterior probabilities, and discriminant scores for each individual, along with the original data. Display (some of) your dataframe.
  7. Obtain a table counting the number of individuals who actually had each type of diabetes, cross-classified by the type of diabetes they were predicted to have. Does the classification appear to be good or bad? Explain briefly.
  8. Find an individual that was misclassified (it doesn’t matter which one). For your chosen individual, was the misclassification clear-cut or a close thing? Explain briefly.
  9. Make a plot of LD1 and LD2 scores for each individual, distinguished by the group they belong to. There are too many points on this plot to label individually.
  10. Which group is on the right on your plot? What does that say about this group’s values on the original quantitative variables?
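The mechanical steps in the questions above might be sketched as follows. This is one possible outline, not a model answer: it assumes the columns in the file are named group, rw, fpg, glucose, insulin and sspg as described above, the object names (diabetes, diabetes.1, diabetes.2, p, d) are arbitrary choices, and the interpretation asked for in the questions is left to you.

```r
library(tidyverse)
library(MASS, exclude = "select")

# Read the data from the URL given in the worksheet (needs internet access)
my_url <- "http://ritsokiguess.site/datafiles/diabetes1.csv"
diabetes <- read_csv(my_url)
diabetes

# MANOVA: glue the five quantitative variables into a response matrix
# (column names assumed to be as listed in the worksheet)
response <- with(diabetes, cbind(rw, fpg, glucose, insulin, sspg))
diabetes.1 <- manova(response ~ group, data = diabetes)
summary(diabetes.1)

# Discriminant analysis: group is predicted from the quantitative variables
diabetes.2 <- lda(group ~ rw + fpg + glucose + insulin + sspg, data = diabetes)
diabetes.2

# Predictions: p$class (predicted group), p$posterior (posterior
# probabilities), p$x (discriminant scores), glued onto the original data
p <- predict(diabetes.2)
d <- cbind(diabetes, class = p$class, p$posterior, p$x)
head(d)

# Actual group cross-classified by predicted group
with(d, table(obs = group, pred = class))

# Individuals whose predicted group differs from their actual group
d %>% filter(as.character(group) != as.character(class))

# Discriminant scores, coloured by actual group
ggplot(d, aes(x = LD1, y = LD2, colour = group)) + geom_point()
```

With three groups there are at most two linear discriminants, which is why the scores in p$x are named LD1 and LD2.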