---
title: "Homework 03"
author: "James H. Steiger"
output: html_document
---
Maximum Score = 100 points
1. (35 points). In Chapter 6 of their textbook *Methods Matter*, Murnane and Willett (2011) give a simplified, conceptual introduction to power calculation, sample size estimation, and effect size. The chapter is available for download from the course website. The authors develop a technical example that is fully set out by page 87, where footnote 5 purports to give a formula for the mean of the noncentral $t$ distribution when the null hypothesis is false. The example concerns the two-sample, independent-samples (between-subjects) design described in detail in their Chapter 4. In this study, some randomly selected subjects received a school voucher (V) and the others did not (NV). The null hypothesis is that
$$H_0:~~\mu_V - \mu_{NV} = 0$$
The authors define
$$\delta = \mu_{V} - \mu_{NV}$$
then state in the footnote that the noncentral $t$ has a mean of
$$\delta \sqrt{\nu/2} \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}$$
with degrees of freedom $\nu = n_1 + n_2 -2$.
Read the chapter, and then answer the following questions.
+ a. (5 points). The formula in footnote 5 has typos (mismatched parentheses, which I fixed in the version above), but it is also obviously incorrect. Why? A well-known statistical quantity, denoted by a single Greek letter, is absent from the formula, and any student with a sound general understanding of power analysis should notice that it is missing. What is this missing quantity that is a "dead giveaway" that the formula is incorrect? (Note: You cannot simply insert this quantity into the formula and make it correct. More surgery is needed.)
+ b. (20 points). Suppose that the population parameters match the sample statistics, i.e., $\mu_{V} - \mu_{NV} = 4.899$ and $\sigma = 19.209$. Following the system we used in class, perform a simulation experiment with $n_1 = n_2 = 500$. Perform 100,000 replications of a two-sample $t$-test, and report the mean of the resulting $t$ statistics. Compare it to the mean calculated from the Murnane-Willett formula and to the mean from your own formula. To help you, I include R code to calculate the mean. Note how I use the `lgamma` function rather than the `gamma` function: the former returns the natural logarithm of the gamma function. Note also how I compute the ratio of two gammas by exponentiating the difference between two lgammas. This avoids overflow, because $\Gamma(n) = (n-1)!$ for positive integers $n$, so the gamma function becomes very large very quickly. Show your code.
You will find that the simulation results fail to match the Murnane-Willett formula.
+ c. (10 points). Go to Wikipedia and look up the mean (expected value) of a noncentral $t$ variable. Compare it to the results in my class notes on *The t distribution and its application* (slide 27) and to the formula in Murnane and Willett. Rewrite the Murnane-Willett formula so that it is correct for the two-sample $t$. Write an R function to compute the correct mean and, using it, show that your simulation results match the corrected formula.
2. (10 points). You wish to perform a two-sample, independent-samples $t$-test of the hypothesis
$$H_0:~~\mu_1 = \mu_2$$
in a situation in which $\alpha = 0.05$, and the standardized effect size
$$
E_s = \frac{\mu_1 - \mu_2}{\sigma}
$$
is believed to be 0.50. Assuming equal $n$ per group, how large a sample size *per group* would you need to guarantee power of at least 0.90?
3. (25 points). In the same situation described in the previous problem, suppose the standardized effect size $E_s$ is to be estimated from sample data, based on sample sizes $n_1 = 20$ and $n_2 = 30$. You gather your data and obtain a $t$ statistic value of $2.922$.
+ a. (5 points). What are the degrees of freedom?
+ b. (10 points). Give a 95\% confidence interval for $\delta$, the noncentrality parameter of the noncentral $t$ statistic's distribution.
+ c. (10 points). Convert this into a 95\% confidence interval on $E_s$, the population standardized effect size (or "standardized mean difference").
4. (30 points). You are testing the hypothesis that $\mu_1 - \mu_2 - \mu_3 + \mu_4 = 0$ with four *independent* samples, each of size $n = 25$. You observe a $t$ statistic of 2.098.
+ a. (5 points). What are the degrees of freedom?
+ b. (10 points). Give a 95\% confidence interval for $\delta$, the noncentrality parameter of the noncentral $t$ statistic's distribution.
+ c. (15 points). Convert this into a 95\% confidence interval on $E_s$, the population standardized effect size (or "standardized mean difference"). $E_s$ is defined in this case as
$$
E_s = \frac{\mu_1-\mu_2-\mu_3+\mu_4}{\sigma}
$$
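The `lgamma` approach described in Problem 1(b) can be sketched in R as follows. This is a minimal sketch, not necessarily identical to the code distributed in class; the function name `mw.mean` is hypothetical. It evaluates the Murnane-Willett footnote formula exactly as printed (with $\delta = \mu_V - \mu_{NV}$), so it reproduces their formula rather than correcting it.

```r
# Mean of the t statistic according to the Murnane-Willett footnote 5 formula:
#   delta * sqrt(nu/2) * Gamma((nu - 1)/2) / Gamma(nu/2),
# with delta = mu_V - mu_NV (a raw mean difference) and nu = n1 + n2 - 2.
# mw.mean is a hypothetical helper name used only for this sketch.
mw.mean <- function(delta, n1, n2) {
  nu <- n1 + n2 - 2
  # Ratio of two gammas computed as exp(difference of two log-gammas),
  # which stays finite even when each gamma alone would overflow
  gamma.ratio <- exp(lgamma((nu - 1) / 2) - lgamma(nu / 2))
  delta * sqrt(nu / 2) * gamma.ratio
}

# Why not use gamma() directly? With n1 = n2 = 500, nu = 998, and
# Gamma(498.5) is far larger than the largest double-precision number.
nu <- 500 + 500 - 2
suppressWarnings(gamma((nu - 1) / 2))  # Inf (overflow)

# The lgamma-based version returns a finite value
mw.mean(delta = 4.899, n1 = 500, n2 = 500)
```

For small degrees of freedom, where `gamma()` does not overflow, the two approaches agree, which is a quick check that the log-gamma rewriting is algebraically equivalent.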