Math 204 Assignment 3
Deadline: 11.59pm, Thursday 2nd April, 2015
Submit via myCourses
Comp and Circumstance
A McGill statistics professor (name withheld for legal reasons) is being sued by a number of former
study participants. The volunteers are demanding compensation for being forced to follow a strict
diet of poutine, hamburgers, or fish and chips, drink large amounts of coffee and – worst of all – attend
statistics lectures. Each volunteer is claiming a different amount of compensation, with the figure
appearing to vary depending on which diet they were subscribed to, how much coffee they drank,
and how many statistics lectures they were forced to attend. Despite the seriousness of the situation,
the professor is determined to have a fun time, and so is interested in using statistical methods to
analyze these data.
In the zip file “Assignment3Files” you will find datasets with file names of the form xxxxxxxxx.csv.
You should find the dataset with your McGill ID number as the file name, and only use that particular
file for this assignment. If you can’t find your ID number in the list, email me as soon as possible!
Note that each file contains a unique dataset, so make sure you use the correct one!
Your dataset contains five variables, with each row corresponding to one ‘volunteer’. The variables
are as follows:
• comp: the compensation being sought (in Canadian dollars).
• diet: the diet the subject was forced to follow, coded as “CAN”, “USA”, and “UK” for poutine,
hamburgers, and fish and chips, respectively.
• coff: the amount of coffee the subject was forced to drink (in millilitres).
• lect: the number of statistics lectures the subject was forced to attend.
• mood: the subject’s reported mood after completion of their diet, coded as “:)” and “:(” for
‘happy’ and ‘unhappy’, respectively.
For the purposes of this assignment, you may treat comp, coff and lect as continuous covariates,
while diet and mood are categorical. You should use these data to answer the questions that follow.
You may assume that all necessary model assumptions are satisfied.
Late work will be penalized by loss of marks, and should be submitted to me by email.
The time of your submission will be determined by my McGill email client (rounded to the nearest
minute). Work submitted from 0000 to 0159 (inclusive) on Friday 3rd April will lose 1 mark. Work
submitted from 0200 to 0359 (inclusive) will lose 2 marks. This pattern will continue, with 1 mark
lost for every additional 2 hours, until your grade is a non-positive integer.
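The penalty rule above can be sketched as a short calculation. This is only an illustration: the function name marks_lost is hypothetical, ties at exactly 30 seconds are assumed to round up, and no cap is applied once the grade reaches zero.

```python
from datetime import datetime, timedelta

# Start of the first penalty window: 00:00 on Friday 3rd April 2015.
PENALTY_START = datetime(2015, 4, 3, 0, 0)
WINDOW = timedelta(hours=2)  # each additional 2 hours costs 1 more mark


def marks_lost(received: datetime) -> int:
    """Marks deducted for a submission received at the given time."""
    # Round to the nearest minute (assumption: exactly 30 seconds rounds up).
    rounded = (received + timedelta(seconds=30)).replace(second=0, microsecond=0)
    if rounded < PENALTY_START:
        return 0  # received by 23:59 on Thursday, so no penalty
    # Whole 2-hour windows elapsed since midnight, plus 1 for the current window.
    return (rounded - PENALTY_START) // WINDOW + 1


print(marks_lost(datetime(2015, 4, 3, 1, 59)))  # 1
print(marks_lost(datetime(2015, 4, 3, 2, 0)))   # 2
```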
Part I: 13 marks
(a) Fit a linear regression model that supposes mean compensation is a linear combination of diet
followed, coffee drunk and lectures attended (without any interaction terms). Summarize the results
of your model in a manner similar to the following table. Your answer must clearly indicate each
parameter estimate and the p-value associated with a test of the null hypothesis that the true
parameter value is equal to zero against the alternative that it is not equal to zero. Note that your
table may vary slightly depending on what is chosen to act as the ‘baseline’ diet, and that some
marks are available for presentation. [4 marks]
Parameter      Estimate      p-value
Intercept
Diet (UK)
Diet (USA)
Coffee
Lectures
(b) According to your model, discuss how predicted compensation varies depending on diet. [3 marks]
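As a rough illustration of how the diet terms in an additive model like this enter a prediction, the sketch below codes diet with two indicator variables, taking CAN as the baseline (an assumption; your software may pick a different baseline). The estimates are made up for illustration and are not results from any dataset, and the name predict_comp is hypothetical.

```python
# Made-up estimates for illustration only, NOT results from any real dataset.
estimates = {
    "Intercept": 100.0,
    "Diet (UK)": 20.0,
    "Diet (USA)": -10.0,
    "Coffee": 0.5,
    "Lectures": 30.0,
}


def predict_comp(diet: str, coff: float, lect: float) -> float:
    """Predicted mean compensation under the additive model, baseline diet CAN."""
    uk = 1.0 if diet == "UK" else 0.0    # indicator for the UK diet
    usa = 1.0 if diet == "USA" else 0.0  # indicator for the USA diet
    return (estimates["Intercept"]
            + estimates["Diet (UK)"] * uk
            + estimates["Diet (USA)"] * usa
            + estimates["Coffee"] * coff
            + estimates["Lectures"] * lect)


# Same coffee and lectures, different diets: the gap equals the diet coefficient.
base = predict_comp("CAN", 200, 3)
print(predict_comp("UK", 200, 3) - base)   # 20.0
print(predict_comp("USA", 200, 3) - base)  # -10.0
```

With no interaction terms, the difference between two diets is the same whatever the values of coffee and lectures.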
Next, consider the following four models, where compensation is modeled as a linear combination of
the corresponding terms. Note that Model 1 is the model fit in part (a).
• Model 1: Diet, coffee, and lectures.
• Model 2: Diet.
• Model 3: Diet and lectures.
• Model 4: Diet, coffee, lectures, and an interaction between diet and coffee.
(c) Use backward selection based on the ANOVA F-test to choose the best of these four models.
Summarize your results in a manner similar to the table below. At each stage, make clear which
models are being compared, which is the ‘complete’ model and which is the ‘reduced’ model
(as per the terminology used in class), the sum of squared errors for each model, and the resulting
F statistic and p-value. Follow your table with a sentence stating clearly which model the
procedure recommends. [6 marks]
Step   Complete Model   Reduced Model   SSE_C   SSE_R   F   p-value
1
2
3
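For one step of the comparison, the F statistic can be computed from the two sums of squared errors and the residual degrees of freedom of each model. The sketch below uses the standard nested-model F-test, which may be written differently from the notation used in class; the function name nested_f and all numbers are made up for illustration.

```python
def nested_f(sse_c: float, sse_r: float, df_c: int, df_r: int) -> float:
    """F statistic comparing a reduced model against a complete model.

    df_c and df_r are the residual degrees of freedom (sample size minus
    number of fitted parameters) of the complete and reduced models.
    """
    extra_params = df_r - df_c                  # parameters dropped in the reduced model
    numerator = (sse_r - sse_c) / extra_params  # extra error per dropped parameter
    denominator = sse_c / df_c                  # mean squared error of the complete model
    return numerator / denominator


# Made-up numbers: n = 50, complete model with 7 parameters, reduced with 5.
f_stat = nested_f(sse_c=860.0, sse_r=1000.0, df_c=43, df_r=45)
print(f_stat)  # 3.5
```

The p-value is the upper-tail probability of an F distribution with (df_r - df_c, df_c) degrees of freedom, which statistical software can supply (for example scipy.stats.f.sf, if SciPy is available).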
Warning: this isn’t the end of the assignment – more statistical fun over the page!
Part II: 7 marks
In preparing his defense, the professor decides to investigate whether a volunteer’s mood was independent
of their diet. He hopes that, if he finds evidence against the null hypothesis of independence
between these variables, he can persuade the academic community (and a jury) that his research is
worthwhile and not, as one of the volunteers put it, “the work of a man driven insane by the power
of having a PhD in statistics”.
(d) Produce a table like the one below of the observed counts of mood and diet, along with row and
column totals. [2 marks]
                   Diet
Mood      CAN      UK      USA      Total
Total
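One way to tabulate counts like these is sketched below with Python's standard library. The (mood, diet) pairs are invented stand-ins for two columns of a dataset, not real data.

```python
from collections import Counter

# Invented (mood, diet) pairs standing in for two columns of a dataset.
rows = [(":)", "CAN"), (":(", "CAN"), (":)", "UK"),
        (":(", "USA"), (":)", "USA"), (":)", "CAN")]

counts = Counter(rows)                          # cell counts by (mood, diet)
row_totals = Counter(mood for mood, _ in rows)  # totals per mood
col_totals = Counter(diet for _, diet in rows)  # totals per diet

diets = ("CAN", "UK", "USA")
for mood in (":)", ":("):
    cells = [counts[(mood, diet)] for diet in diets]
    print(mood, cells, row_totals[mood])
print("Total", [col_totals[diet] for diet in diets], len(rows))
```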
(e) If the null hypothesis were true, how many of the subjects who followed the Canadian diet would
you expect to be happy? Give your answer to 1 decimal place. (You do not need to show your
working – a correct answer is worth full marks – but if your final answer is wrong you could be
awarded 1 mark for appropriate working.) [2 marks]
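Under independence, the expected count in a cell is the row total multiplied by the column total, divided by the grand total. A minimal sketch with made-up totals (the function name expected_count is hypothetical):

```python
def expected_count(row_total: int, col_total: int, grand_total: int) -> float:
    """Expected cell count under independence of the row and column variables."""
    return row_total * col_total / grand_total


# Made-up totals: 27 happy subjects, 18 on the Canadian diet, 50 subjects overall.
print(round(expected_count(27, 18, 50), 1))  # 9.7
```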
(f) Carry out an appropriate Chi-squared test of the null hypothesis that mood and diet are independent,
summarizing the results of your analysis in 100 words or fewer. Note that your answer should
be written in complete sentences, such as for the conclusion of a report. (Hyphenated words, such
as p-value, count as one word, as do any numbers. If you exceed the word limit only the first 100
words of your answer will be marked.) [3 marks]
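The test statistic sums (observed - expected)^2 / expected over all cells, and for a table with 2 moods and 3 diets it has (2 - 1)(3 - 1) = 2 degrees of freedom. The sketch below uses made-up counts; the closed-form p-value exp(-X^2 / 2) is exact only for 2 degrees of freedom, so other table sizes would need statistical software.

```python
import math

# Made-up observed counts: rows are moods (":)", ":("), columns are CAN, UK, USA.
observed = [[10, 20, 30],
            [20, 20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi2 += (obs - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (2 - 1) * (3 - 1) = 2
p_value = math.exp(-chi2 / 2)  # upper-tail probability, valid ONLY when df == 2

print(round(chi2, 2), df, round(p_value, 3))
```

For these invented counts the statistic is 16/3, about 5.33, with a p-value of roughly 0.069.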