The area under the standard normal density (PDF: \(\phi\)) to the left of \(z = 1.96\) is the probability \(\Pr(Z \le 1.96)\).
\[ \Pr (Z \le 1.96) = \int_{- \infty}^{1.96} \phi (z) dz \approx 0.975 \]
pnorm is a function that returns the cumulative probability \(\Pr(Z \le z)\) corresponding to a given z-score. The function name comes from p (probability) + norm (normal distribution).
pnorm(q = 1.96, mean = 0, sd = 1)
## [1] 0.9750021
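As a quick check of the integral above, the same area can be obtained by numerically integrating the standard normal density dnorm with base R's integrate. This is only a minimal sketch, and the variable names (area, prob, central) are illustrative.

```r
# Area under the standard normal PDF to the left of z = 1.96,
# computed by numerical integration of dnorm (the N(0,1) density).
area <- integrate(dnorm, lower = -Inf, upper = 1.96)$value

# pnorm returns the same probability directly.
prob <- pnorm(1.96)

# The two values agree up to numerical integration error.
abs(area - prob) < 1e-4

# By symmetry, the central area between -1.96 and 1.96 is about 0.95.
central <- pnorm(1.96) - pnorm(-1.96)
central
```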
Conversely, qnorm is a function that returns the z-score corresponding to a given probability.
qnorm(p = 0.975, mean = 0, sd = 1)
## [1] 1.959964
qnorm(p = 0.975) # for N(0,1), the mean and sd arguments may be omitted
## [1] 1.959964
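Because qnorm is the inverse of pnorm, applying one after the other should return the starting value. The sketch below checks this and shows how the familiar 1.96 arises from a two-sided significance level; alpha and z_crit are names introduced here for illustration.

```r
# qnorm and pnorm are inverse functions of each other.
p <- 0.975
z <- qnorm(p)                    # z-score with Pr(Z <= z) = 0.975
isTRUE(all.equal(pnorm(z), p))   # TRUE

# Two-sided critical value for a significance level alpha (illustrative name).
alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)   # same as qnorm(0.975)
z_crit
```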
Just as with the z-score, we can find the \(t\) value corresponding to a given probability.
\[ \Pr(T < t) = 0.975 \]
Since the \(t\) distribution depends on the degrees of freedom, you need to specify the df (degrees of freedom) argument.
qt(p = 0.975, df = 10)
## [1] 2.228139
As the degrees of freedom increase, the \(t\) value approaches the corresponding value for the standard normal distribution (about 1.96 here).
qt(p = 0.975, df = 1000)
## [1] 1.962339
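To see this convergence more directly, one can tabulate the 97.5th percentile of the \(t\) distribution for several degrees of freedom next to the normal value. This is a small sketch; the particular df values are arbitrary choices for illustration.

```r
# 97.5th percentiles of the t distribution for increasing degrees of freedom.
dfs <- c(5, 10, 30, 100, 1000)
t_crit <- qt(p = 0.975, df = dfs)   # qt is vectorized over df
z_crit <- qnorm(p = 0.975)          # standard normal counterpart

# Side-by-side view: t critical value and its gap from the normal value.
data.frame(df = dfs, t = t_crit, gap = t_crit - z_crit)
```

The gap column shrinks toward zero as df grows, while the t value always stays above the normal one.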
For a sample mean of 100, a sample standard deviation of 2, and a sample size of 100, the 95% confidence interval is calculated as follows:
\[ \bar{x} \pm t \frac{s}{\sqrt{n}} = 100 \pm 1.98 \frac{2}{\sqrt{100}} = 100 \pm 0.396, \quad \mbox{95\% CI}: [99.60, 100.40] \]
Note that we use the t-value with \(n - 1 = 99\) degrees of freedom, not the z-value.
qt(p = 0.025, df = 100-1) # P(T<t) = 0.025
## [1] -1.984217
100 + qt(p = 0.025, df = 100-1) * 2 / sqrt(100) # lower limit
## [1] 99.60316
100 + qt(p = 0.975, df = 100-1) * 2 / sqrt(100) # upper limit
## [1] 100.3968
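The calculation above can be wrapped in a small function when only summary statistics are available. The helper ci_mean below is hypothetical (it is not part of base R) and is meant only as a sketch of the formula \(\bar{x} \pm t \, s / \sqrt{n}\).

```r
# Hypothetical helper (not part of base R): t-based confidence interval
# for a mean, computed from summary statistics.
ci_mean <- function(xbar, s, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  t_crit <- qt(p = 1 - alpha / 2, df = n - 1)  # upper critical value
  margin <- t_crit * s / sqrt(n)               # half-width of the interval
  c(lower = xbar - margin, upper = xbar + margin)
}

# Same inputs as the example: mean 100, sd 2, n 100.
ci <- ci_mean(xbar = 100, s = 2, n = 100)
ci
```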
t.test function

The function t.test, which performs a \(t\)-test, also reports a confidence interval.
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
t.test(wage) # default: 95% CI
##
## One Sample t-test
##
## data: wage
## t = 22.841, df = 9, p-value = 2.806e-09
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 1108.179 1351.821
## sample estimates:
## mean of x
## 1230
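The interval reported by t.test can be reproduced by hand from the sample mean, the sample standard deviation, and qt. The sketch below also reads the interval from the conf.int component of the object returned by t.test; the variable names are illustrative.

```r
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)

n <- length(wage)
xbar <- mean(wage)   # sample mean
s <- sd(wage)        # sample standard deviation (n - 1 denominator)

# 95% CI by hand: xbar +/- t * s / sqrt(n)
manual_ci <- xbar + c(-1, 1) * qt(p = 0.975, df = n - 1) * s / sqrt(n)

# CI stored in the object returned by t.test
test_ci <- t.test(wage)$conf.int

manual_ci
all.equal(manual_ci, as.numeric(test_ci))
```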
The confidence level (\(1 - \alpha\)) is set to 95% by default. We can change it to a different confidence level by specifying the conf.level argument.
t.test(wage, conf.level = 0.99) # 99% CI
##
## One Sample t-test
##
## data: wage
## t = 22.841, df = 9, p-value = 2.806e-09
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
## 1054.991 1405.009
## sample estimates:
## mean of x
## 1230
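A higher confidence level gives a wider interval. As a rough illustration, the sketch below computes the interval width at a few confidence levels; the chosen levels and the variable names are assumptions made for this example.

```r
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)

# Width of the t-based CI at several confidence levels.
conf_levels <- c(0.90, 0.95, 0.99)
widths <- sapply(conf_levels, function(cl) {
  ci <- t.test(wage, conf.level = cl)$conf.int
  ci[2] - ci[1]   # upper limit minus lower limit
})

data.frame(conf.level = conf_levels, width = widths)
```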
Null and alternative hypotheses:
\[ H_0 : \mu = \mu_0, \quad H_1 : \mu \ne \mu_0 \]
where \(\mu_0\) is the hypothesized value of the mean under the null hypothesis.
Test statistic \(t\):
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} . \]
Call the function as t.test(x, mu = mu_0), where x is the vector of sample data and mu_0 is the hypothesized value under the null hypothesis.
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
t.test(wage, mu = 1100) # default: two sided ... H1 is [mu != 1100]
##
## One Sample t-test
##
## data: wage
## t = 2.414, df = 9, p-value = 0.03899
## alternative hypothesis: true mean is not equal to 1100
## 95 percent confidence interval:
## 1108.179 1351.821
## sample estimates:
## mean of x
## 1230
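The test statistic and the two-sided p-value can also be computed directly from the formula for \(t\). This is a minimal sketch that uses pt for the tail probability; mu_0, t_stat and p_value are illustrative names.

```r
wage <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
mu_0 <- 1100                      # hypothesized mean under H0
n <- length(wage)

# t = (xbar - mu_0) / (s / sqrt(n))
t_stat <- (mean(wage) - mu_0) / (sd(wage) / sqrt(n))

# Two-sided p-value: probability in both tails beyond |t|, with n - 1 df.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

c(t = t_stat, p.value = p_value)
```

Rounded, these should match the t and p-value lines in the t.test output above.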
Null and alternative hypotheses:
\[ H_0 : \mu_1 = \mu_2, \quad H_1 : \mu_1 \ne \mu_2 \]
Test statistic \(t\):
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} . \]
Call the function as t.test(x = sample 1, y = sample 2), where each sample is a vector.
wage_jp <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350) # Japan
wage_us <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950) # US
t.test(wage_jp, wage_us) # default: Welch's test (assuming unequal variance)
##
## Welch Two Sample t-test
##
## data: wage_jp and wage_us
## t = 1.4041, df = 11.205, p-value = 0.1874
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -87.42307 397.42307
## sample estimates:
## mean of x mean of y
## 1230 1075
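The non-integer df in the output comes from the Welch-Satterthwaite approximation, which is not spelled out in the text above. The sketch below computes the statistic from the formula for \(t\) and the degrees of freedom from that approximation; the names v1, v2 and df_welch are introduced here for illustration.

```r
wage_jp <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350)
wage_us <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950)
n1 <- length(wage_jp)
n2 <- length(wage_us)

# Per-sample variance of the mean: s_i^2 / n_i
v1 <- var(wage_jp) / n1
v2 <- var(wage_us) / n2

# Welch test statistic
t_stat <- (mean(wage_jp) - mean(wage_us)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

c(t = t_stat, df = df_welch, p.value = p_value)
```

If equal variances can be assumed, t.test also accepts var.equal = TRUE, which switches to the pooled-variance test.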
A paired t-test is computed by adding the argument paired = TRUE to the t.test function.
wage_w <- c(1000, 1200, 1300, 1200, 1150, 1000, 1450, 1500, 1150, 1350) # wife
wage_h <- c(900, 1300, 1200, 800, 1600, 850, 1000, 950, 1200, 1400) # husband
t.test(wage_w, wage_h, paired = TRUE)
##
## Paired t-test
##
## data: wage_w and wage_h
## t = 1.1638, df = 9, p-value = 0.2744
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -103.8108 323.8108
## sample estimates:
## mean difference
## 110
It is equivalent to a one-sample test on the element-wise differences.
t.test(wage_w - wage_h) # same as an ordinary "one-sample t test"
##
## One Sample t-test
##
## data: wage_w - wage_h
## t = 1.1638, df = 9, p-value = 0.2744
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -103.8108 323.8108
## sample estimates:
## mean of x
## 110