1 Probability

1.1 z score

The area under the standard normal density \(\phi\) to the left of \(z = 1.96\) is the probability \(\Pr(Z \le 1.96)\).

\[ \Pr (Z \le 1.96) = \int_{- \infty}^{1.96} \phi (z) \, dz \approx 0.975 \]

The function name pnorm comes from p (probability) + norm (normal distribution). Given a z-score \(q\), pnorm returns the cumulative probability \(\Pr(Z \le q)\), that is, the area to the left of \(q\).

pnorm(q = 1.96, mean = 0, sd = 1)
## [1] 0.9750021
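To connect the integral above with pnorm, here is a small sketch that integrates the density dnorm (R's \(\phi\)) numerically with base R's integrate() and compares the result with pnorm(1.96). The 1e-4 tolerance used when checking agreement is an arbitrary choice, not a documented accuracy bound.

```r
# Area under the standard normal density (dnorm) to the left of 1.96
area <- integrate(dnorm, lower = -Inf, upper = 1.96)$value

# pnorm returns the same lower-tail probability Pr(Z <= 1.96) directly
p <- pnorm(q = 1.96)

print(c(integral = area, pnorm = p))

# The upper tail Pr(Z > 1.96) is the complement, about 0.025
p_upper <- pnorm(q = 1.96, lower.tail = FALSE)
print(p_upper)
```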

Conversely, qnorm returns the z-score corresponding to a given cumulative probability; it is the inverse of pnorm.

qnorm(p = 0.975, mean = 0, sd = 1)
## [1] 1.959964
qnorm(p = 0.975)  # for N(0,1), the mean and sd arguments may be omitted 
## [1] 1.959964
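A short sketch showing that qnorm undoes pnorm, and how the familiar 1.96 arises as the two-sided critical value qnorm(1 - alpha / 2) for alpha = 0.05. The variable names (alpha, z_crit, and so on) are our own, chosen for illustration.

```r
# qnorm inverts pnorm: mapping 1.96 to a probability and back recovers 1.96
z_back <- qnorm(pnorm(1.96))

# Two-sided critical value for significance level alpha: Pr(|Z| <= z) = 1 - alpha
alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)

# By symmetry of N(0, 1), the lower 2.5% point is the negative of the upper one
z_lower <- qnorm(alpha / 2)

print(c(z_back = z_back, z_crit = z_crit, z_lower = z_lower))
```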

1.2 t score

Just as with the z-score, we can find the \(t\) value corresponding to a given probability.

\[ \Pr(T < t) = 0.975 \]

Because the shape of the \(t\) distribution depends on its degrees of freedom, the df (degrees of freedom) argument must be specified.

qt(p = 0.975, df = 10)
## [1] 2.228139
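Because the \(t\) distribution is symmetric about 0, the lower 2.5% point is the negative of the upper one, and pt (the \(t\) counterpart of pnorm) maps both points back to probabilities. A minimal check with 10 degrees of freedom; the variable names are illustrative.

```r
nu <- 10  # degrees of freedom

t_upper <- qt(p = 0.975, df = nu)  # upper 2.5% point
t_lower <- qt(p = 0.025, df = nu)  # lower 2.5% point, -t_upper by symmetry

# pt inverts qt: the probability between the two points should be 0.95
coverage <- pt(t_upper, df = nu) - pt(t_lower, df = nu)

print(c(t_lower = t_lower, t_upper = t_upper, coverage = coverage))
```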

As the degrees of freedom increase, the \(t\) quantiles approach those of the standard normal distribution; with df = 1000 the value is already very close to the z-score 1.959964.

qt(p = 0.975, df = 1000)
## [1] 1.962339
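To see this convergence for several degrees of freedom at once, a vector can be passed to df, since qt is vectorized. The df values below are an arbitrary selection.

```r
# Upper 2.5% points of t for increasing degrees of freedom
dfs <- c(5, 10, 30, 100, 1000)
t_crit <- qt(p = 0.975, df = dfs)

# Gap to the standard normal value qnorm(0.975); it shrinks as df grows
gap <- t_crit - qnorm(0.975)

print(data.frame(df = dfs, t_crit = round(t_crit, 4), gap = round(gap, 4)))
```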

2 Confidence interval

2.1 Population mean with unknown \(\sigma\)

For a sample mean of 100, a sample (not population) standard deviation of 2, and a sample size of 100, the 95% confidence interval is calculated as follows:

\[ \bar{x} \pm t \frac{s}{\sqrt{n}} = 100 \pm 1.98 \times \frac{2}{\sqrt{100}} = 100 \pm 0.396, \quad \mbox{95\% CI}: [99.60, 100.40] \]

Note that we use the \(t\) value with \(n - 1 = 99\) degrees of freedom, not the \(z\) value, because the unknown population standard deviation \(\sigma\) is replaced by the sample standard deviation \(s\).

qt(p = 0.025, df = 100-1)  # P(T<t) = 0.025
## [1] -1.984217
100 + qt(p = 0.025, df = 100-1) * 2 / sqrt(100)  # lower limit
## [1] 99.60316
100 + qt(p = 0.975, df = 100-1) * 2 / sqrt(100)  # upper limit
## [1] 100.3968
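The same calculation can be wrapped in a small function. ci_mean is a hypothetical helper written for this illustration, not part of base R. It subtracts the upper quantile qt(1 - alpha / 2, n - 1) for the lower limit, which by symmetry matches the qt(p = 0.025, ...) form above.

```r
# Hypothetical helper: t-based confidence interval for a mean from summary statistics
ci_mean <- function(xbar, s, n, conf = 0.95) {
  alpha <- 1 - conf
  t_crit <- qt(p = 1 - alpha / 2, df = n - 1)  # upper alpha/2 point of t(n - 1)
  margin <- t_crit * s / sqrt(n)               # t * s / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

ci_mean(xbar = 100, s = 2, n = 100)
```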

2.1.1 Using the t.test function

The t.test function, which performs a \(t\)-test, also reports a confidence interval as part of its output.

wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
t.test(wage)  # default: 95% CI
## 
##  One Sample t-test
## 
## data:  wage
## t = 22.841, df = 9, p-value = 2.806e-09
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1108.179 1351.821
## sample estimates:
## mean of x 
##      1230
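The interval reported by t.test can be reproduced with the formula from the previous subsection. This sketch assumes the default 95% level and reads the interval from the conf.int component of the object returned by t.test.

```r
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
n <- length(wage)

# xbar -/+ t(0.975, n - 1) * s / sqrt(n)
margin <- qt(p = 0.975, df = n - 1) * sd(wage) / sqrt(n)
ci_manual <- mean(wage) + c(-1, 1) * margin

# t.test stores the interval in conf.int; as.numeric drops its conf.level attribute
ci_ttest <- as.numeric(t.test(wage)$conf.int)

print(rbind(manual = ci_manual, t.test = ci_ttest))
```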

The confidence level (\(1 - \alpha\)) is 95% by default. To use a different confidence level, specify the conf.level argument.

t.test(wage, conf.level = 0.99)  # 99% CI
## 
##  One Sample t-test
## 
## data:  wage
## t = 22.841, df = 9, p-value = 2.806e-09
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
##  1054.991 1405.009
## sample estimates:
## mean of x 
##      1230
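Higher confidence levels give wider intervals, as the 99% output above shows. A small sketch comparing interval widths at a few levels; the levels 0.90, 0.95 and 0.99 are arbitrary choices.

```r
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)

# Width (upper - lower) of the t.test interval at each confidence level
conf_levels <- c(0.90, 0.95, 0.99)
widths <- sapply(conf_levels, function(cl) diff(t.test(wage, conf.level = cl)$conf.int))

print(data.frame(conf.level = conf_levels, width = round(widths, 2)))
```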

3 Statistical hypothesis testing

3.1 One-sample \(t\) test for population mean

Null and alternative hypotheses:

\[ H_0 : \mu = \mu_0, \quad H_1 : \mu \ne \mu_0 \]

where \(\mu_0\) is the hypothesized value of the population mean under the null hypothesis.

Test statistic \(t\):

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} . \]

Specify it as t.test(vector of sample data, mu = mu_0). The mu argument defaults to 0, which is why the earlier t.test(wage) output tests whether the true mean equals 0.

wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
t.test(wage, mu = 1100)  # default: two sided ... H1 is [mu != 1100]
## 
##  One Sample t-test
## 
## data:  wage
## t = 2.414, df = 9, p-value = 0.03899
## alternative hypothesis: true mean is not equal to 1100
## 95 percent confidence interval:
##  1108.179 1351.821
## sample estimates:
## mean of x 
##      1230
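The statistic and p-value reported above can be reproduced by hand from the formula for \(t\). This sketch assumes the default two-sided alternative and compares its results with the statistic and p.value components returned by t.test.

```r
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
mu0 <- 1100
n <- length(wage)

# t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
t_stat <- (mean(wage) - mu0) / (sd(wage) / sqrt(n))

# Two-sided p-value: probability under H0 of |T| at least as large as |t|
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(c(t = t_stat, p = p_value))
```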

3.2 Two-sample \(t\) test for population mean

3.2.1 Independent \(t\) test (Welch’s \(t\) test)

Null and alternative hypotheses:

\[ H_0 : \mu_1 = \mu_2, \quad H_1 : \mu_1 \ne \mu_2 \]

Test statistic \(t\):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} . \]

Specify it as t.test(x = sample 1, y = sample 2), where each sample is a vector. By default t.test does not assume equal variances (var.equal = FALSE), so it runs Welch's test.

wage_jp <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)  # Japan
wage_us <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950)  # US
t.test(wage_jp, wage_us)  # default: Welch's test (assuming unequal variance)
## 
##  Welch Two Sample t-test
## 
## data:  wage_jp and wage_us
## t = 1.4041, df = 11.205, p-value = 0.1874
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -87.42307 397.42307
## sample estimates:
## mean of x mean of y 
##      1230      1075
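A by-hand version of Welch's test following the formula for \(t\) above. The non-integer degrees of freedom (11.205) in the output come from the Welch-Satterthwaite approximation, which this sketch also computes and compares with the statistic and parameter components of the t.test result.

```r
wage_jp <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
wage_us <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950)

# Squared standard errors of each sample mean: s_i^2 / n_i
se2_jp <- var(wage_jp) / length(wage_jp)
se2_us <- var(wage_us) / length(wage_us)

# Welch's t statistic
t_stat <- (mean(wage_jp) - mean(wage_us)) / sqrt(se2_jp + se2_us)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_jp + se2_us)^2 /
  (se2_jp^2 / (length(wage_jp) - 1) + se2_us^2 / (length(wage_us) - 1))

print(c(t = t_stat, df = df_welch))
```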

3.2.2 Paired (dependent) \(t\) test

For a paired \(t\)-test, add the argument paired = TRUE to t.test. The two vectors must have the same length, with the i-th elements forming a pair.

wage_w <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)  # wife
wage_h <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950, 1200, 1400)  # husband
t.test(wage_w, wage_h, paired = TRUE)
## 
##  Paired t-test
## 
## data:  wage_w and wage_h
## t = 1.1638, df = 9, p-value = 0.2744
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -103.8108  323.8108
## sample estimates:
## mean difference 
##             110

This is equivalent to a one-sample \(t\)-test on the element-wise differences.

t.test(wage_w - wage_h)  # same as an ordinary "one-sample t test"
## 
##  One Sample t-test
## 
## data:  wage_w - wage_h
## t = 1.1638, df = 9, p-value = 0.2744
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -103.8108  323.8108
## sample estimates:
## mean of x 
##       110
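For completeness, the paired statistic can be written out explicitly as \(t = \bar{d} / (s_d / \sqrt{n})\), where \(d_i\) are the within-pair differences and \(s_d\) is their standard deviation. A minimal sketch comparing this hand computation with t.test(..., paired = TRUE):

```r
wage_w <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
wage_h <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950, 1200, 1400)

# Element-wise differences d_i = wife_i - husband_i
d <- wage_w - wage_h
n <- length(d)

# One-sample t statistic for H0: mean difference = 0, and its two-sided p-value
t_stat <- mean(d) / (sd(d) / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(c(mean_d = mean(d), t = t_stat, p = p_value))
```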