John Holcomb (Cleveland State University)
Angela Spalsbury (Youngstown State University)
Journal of Statistics Education Volume 13, Number 3 (2005), www.amstat.org/publications/jse/v13n3/datasets.holcomb.html
The objective of the study for which you will analyze the data was to determine whether significant gender differences existed among subjects 65 years of age and older with regard to calcium, inorganic phosphorus, and alkaline phosphatase levels (Boyd et al., 1998). The researchers performed a retrospective chart review of laboratory procedures performed in 6 different physician practices. The data consisted of 178 subjects, representing 92 males and 86 females aged 65 or older. The dataset contains three discrete variables: sex, lab, and agegroup. The coding is as follows:
Variable | Code |
---|---|
Sex | 1 = Male; 2 = Female |
Lab | 1 = Metpath; 2 = Deyor; 3 = St. Elizabeth's; 4 = CB Rouche; 5 = YOH; 6 = Horizon |
Agegroup | 1 = 65-69; 2 = 70-74; 3 = 75-79; 4 = 80-84; 5 = 85-89 |
The other variables are continuous: age (years), alkphos (alkaline phosphatase, IU/L), cammol (calcium, mmol/L), and phosmmol (inorganic phosphorus, mmol/L).
The first task of the assignment is to check the validity of the data. Determine if this is a “messy” dataset with variable values that appear incorrect. Attempt to recover the correct values by looking up the true values from the actual data records. Copies of these can be found on bigtable.htm. Be sure to catalogue the problem values in the data and the changes that were made to clean the dataset. Include a paragraph detailing the steps taken to clean the dataset.
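One possible way to start the validity check is to test each record against the coding table above. The sketch below is only an illustration: the records are made up (they are not rows from the dataset), the function name `find_problems` is hypothetical, and the rule that age must fall inside its agegroup range is an assumption about what counts as "incorrect".

```python
# Allowed codes are taken from the coding table; the consistency rule
# between age and agegroup is an assumption made for this sketch.
VALID_SEX = {1, 2}
VALID_LAB = {1, 2, 3, 4, 5, 6}
AGEGROUP_BOUNDS = {1: (65, 69), 2: (70, 74), 3: (75, 79), 4: (80, 84), 5: (85, 89)}


def find_problems(record):
    """Return a list of readable problems found in one record (a dict)."""
    problems = []
    if record["sex"] not in VALID_SEX:
        problems.append(f"sex code {record['sex']} is not 1 or 2")
    if record["lab"] not in VALID_LAB:
        problems.append(f"lab code {record['lab']} is not in 1-6")
    bounds = AGEGROUP_BOUNDS.get(record["agegroup"])
    if bounds is None:
        problems.append(f"agegroup code {record['agegroup']} is not in 1-5")
    elif not bounds[0] <= record["age"] <= bounds[1]:
        problems.append(f"age {record['age']} is outside agegroup {record['agegroup']}")
    return problems


# Made-up records for illustration only.
records = [
    {"obsno": 1, "age": 72, "sex": 1, "lab": 3, "agegroup": 2},
    {"obsno": 2, "age": 771, "sex": 2, "lab": 1, "agegroup": 3},
    {"obsno": 3, "age": 68, "sex": 12, "lab": 4, "agegroup": 1},
]

# Catalogue only the observations that have at least one problem.
catalogue = {}
for r in records:
    problems = find_problems(r)
    if problems:
        catalogue[r["obsno"]] = problems

for obsno, problems in catalogue.items():
    print(obsno, problems)
```

A catalogue keyed by observation number makes it straightforward to look up each flagged patient in the original records and to document every change.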
Once the data are “clean”, perform a summary analysis of the three discrete variables (sex, lab, and agegroup). For the variables alkphos, cammol, and phosmmol, report the mean, median, standard deviation, minimum, and maximum broken down by sex. Then summarize alkphos, cammol, and phosmmol in the same way using lab as the factor variable.
Construct side-by-side boxplots of alkphos, cammol, and phosmmol using sex as the factor variable. Next, construct side-by-side boxplots of the same three continuous variables using lab as the factor variable.
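The grouped summaries can be produced with any statistics package. As a minimal sketch in Python, assuming the cleaned data have already been loaded into a list of dictionaries, the hypothetical helper `summarize_by` below reports the requested statistics for one variable broken down by one factor. The rows shown are made up and are not values from the dataset.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize_by(rows, factor, variable):
    """Summarize `variable` separately for each level of `factor`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[variable])
    return {
        level: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "sd": stdev(values),  # sample standard deviation; needs n >= 2
            "min": min(values),
            "max": max(values),
        }
        for level, values in sorted(groups.items())
    }


# Made-up rows for illustration only.
rows = [
    {"sex": 1, "cammol": 2.30},
    {"sex": 1, "cammol": 2.40},
    {"sex": 1, "cammol": 2.50},
    {"sex": 2, "cammol": 2.35},
    {"sex": 2, "cammol": 2.45},
]

summary = summarize_by(rows, "sex", "cammol")
for level, stats in summary.items():
    print(level, stats)
```

Calling the same helper with `"lab"` as the factor gives the breakdown by laboratory.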
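Before drawing the boxplots, it can help to see the numbers a boxplot displays. The sketch below computes a five-number summary per group and flags points beyond 1.5 times the interquartile range, which is one common boxplot convention; software packages differ in how they compute quartiles, so results may not match a given package exactly. The helper name `box_summary` and the values are made up for illustration.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary plus points outside the 1.5 * IQR fences."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(values),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Made-up cammol-like values keyed by sex code (1 = Male, 2 = Female).
by_sex = {
    1: [2.2, 2.3, 2.4, 2.5, 2.6],
    2: [2.3, 2.4, 2.4, 2.5, 3.9],
}

boxes = {sex: box_summary(vals) for sex, vals in by_sex.items()}
for sex, box in boxes.items():
    print(sex, box)
```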
Compare the mean and standard deviation of age, alkphos, cammol and phosmmol from the messy dataset with the mean and standard deviation from your cleaned dataset. Does cleaning the data make a difference? Explain.
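To see why this comparison matters, consider how a single mistyped value can affect both statistics. The numbers below are made up for illustration (they are not taken from the dataset), and the keying error is hypothetical.

```python
from statistics import mean, stdev

# Made-up alkphos-like values for illustration only.
clean = [78, 85, 92, 70, 88]
# The same values with one hypothetical keying error (85 entered as 850).
messy = [78, 850, 92, 70, 88]

for label, values in (("messy", messy), ("clean", clean)):
    print(f"{label:>5}: mean = {mean(values):.1f}, sd = {stdev(values):.1f}")
```

Because the mean and standard deviation both depend on every observation, one extreme value pulls the mean toward it and inflates the standard deviation.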
Using your summary statistics and your side-by-side boxplots, do you believe a significant difference exists in alkphos, cammol and phosmmol levels with respect to sex? Why or why not? Do you believe a significant difference exists in alkphos, cammol and phosmmol levels with respect to lab? Why or why not?
Suppose Mr. and Mrs. Contrarian are married and Mrs. Contrarian has lower calcium than Mr. Contrarian. She refuses to believe the study's finding that men tend to have lower calcium than women, because she has lower calcium than her husband. Using your results from question #3, explain to Mrs. Contrarian the flaw in her thinking.
One of the objectives of this research was to propose a reference range of values that are to be considered “normal” for calcium, inorganic phosphorus, and alkaline phosphatase. Looking at the results for cammol alone for each of the labs, explain why a single reference range is so difficult to establish.
The file calcium.dat.txt contains the data with the problem values. The file calciumgood.dat.txt contains the data with the problem values corrected. The observation grid can be found at bigtable.htm. The file calcium.txt is a documentation file that contains a brief description of the dataset and the purpose of the assignment.
Calcium.dat.txt
Columns | Variable | Comment |
---|---|---|
9-11 | OBSNO | Patient Observation Number |
21-22 | AGE | Years |
33 | SEX | 1=Male, 2=Female |
42-44 | ALKPHOS | Alkaline Phosphatase International Units/Liter |
55 | Lab | Lab: 1=Metpath; 2=Deyor; 3=St. Elizabeth's; 4=CB Rouche; 5=Youngstown Osteopathic Hospital; 6=Horizon |
63-66 | CAMMOL | Calcium mmol/L |
74-77 | PHOSMMOL | Inorganic Phosphorus mmol/L |
88 | AGEGROUP | Age group 1=65-69; 2=70-74; 3=75-79; 4=80-84; 5=85-89 Years |
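The column layout above can be turned into a simple fixed-width reader. This is a sketch under stated assumptions: the column positions are read as 1-indexed and inclusive, the helper names `parse_line` and `build_line` are hypothetical, and the sample line is built from made-up values rather than copied from the file. Fields are kept as strings because a messy file may contain entries that do not convert cleanly to numbers.

```python
# Column positions from the Calcium.dat.txt layout table,
# assumed to be 1-indexed and inclusive.
LAYOUT = {
    "obsno": (9, 11),
    "age": (21, 22),
    "sex": (33, 33),
    "alkphos": (42, 44),
    "lab": (55, 55),
    "cammol": (63, 66),
    "phosmmol": (74, 77),
    "agegroup": (88, 88),
}
LINE_WIDTH = 88


def parse_line(line):
    """Slice one fixed-width line into a dict of stripped strings."""
    padded = line.rstrip("\n").ljust(LINE_WIDTH)
    return {
        name: padded[start - 1:end].strip()
        for name, (start, end) in LAYOUT.items()
    }


def build_line(values):
    """Demo helper: place values right-aligned in their columns."""
    chars = [" "] * LINE_WIDTH
    for name, (start, end) in LAYOUT.items():
        width = end - start + 1
        chars[start - 1:end] = list(str(values[name]).rjust(width))
    return "".join(chars)


# Made-up record for illustration only; not a row from the dataset.
sample = build_line({
    "obsno": 1, "age": 72, "sex": 1, "alkphos": 85,
    "lab": 3, "cammol": "2.35", "phosmmol": "1.10", "agegroup": 2,
})
parsed = parse_line(sample)
print(parsed)
```

To read the real file, one would apply `parse_line` to each line of calcium.dat.txt and convert fields to numbers only after checking that they are valid.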
Calciumgood.dat.txt
Columns | Variable | Comment |
---|---|---|
9-11 | OBSNO | Patient Observation Number |
20-22 | AGE | Years |
32-33 | SEX | 1=Male, 2=Female |
42-44 | ALKPHOS | Alkaline Phosphatase International Units/Liter |
54-55 | Lab | Lab: 1=Metpath; 2=Deyor; 3=St. Elizabeth's; 4=CB Rouche; 5=Youngstown Osteopathic Hospital; 6=Horizon |
62-66 | CAMMOL | Calcium mmol/L |
74-77 | PHOSMMOL | Inorganic Phosphorus mmol/L |
88 | AGEGROUP | Age group 1=65-69; 2=70-74; 3=75-79; 4=80-84; 5=85-89 Years |