The t-test and robustness to non-normality
The t-test is one of the most commonly used tests in statistics. The two-sample t-test allows us to test the null hypothesis that the population means of two groups are equal, based on samples from each of the two groups. In its simplest form, it assumes that in the population, the variable/quantity of interest X follows a normal distribution $N(\mu_1, \sigma^2)$ in the first group and $N(\mu_2, \sigma^2)$ in the second group. That is, the variance is assumed to be the same in both groups, and the variable is normally distributed around the group mean. The null hypothesis is then that $\mu_1 = \mu_2$.
A simple extension allows for the variances to be different in the two groups, i.e. that in the first group the variable of interest X is distributed $N(\mu_1, \sigma_1^2)$ and in the second group $N(\mu_2, \sigma_2^2)$. Since variances often differ between the two groups being tested, it is generally advisable to allow for this possibility.
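As a rough illustration of the two variants in R, the sketch below runs both on made-up data (the group sizes, means and standard deviations are assumptions chosen only for the example). In `t.test`, `var.equal = TRUE` gives the equal-variance test, while the default `var.equal = FALSE` gives the Welch version that allows the variances to differ.

```r
# Hypothetical data: two groups with the same mean but different spreads.
set.seed(1)
group1 <- rnorm(30, mean = 5, sd = 1)
group2 <- rnorm(40, mean = 5, sd = 3)

# Classic two-sample t-test: assumes a common variance in both groups.
pooled <- t.test(group1, group2, var.equal = TRUE)

# Welch's version (R's default, var.equal = FALSE): allows unequal variances.
welch <- t.test(group1, group2)

# Compare the p-values and degrees of freedom of the two versions.
c(pooled = pooled$p.value, welch = welch$p.value)
c(pooled = unname(pooled$parameter), welch = unname(welch$parameter))
```

The equal-variance test uses n1 + n2 - 2 degrees of freedom, whereas the Welch test uses an approximate (Satterthwaite) value that is usually not an integer.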
So, as constructed, the two-sample t-test assumes normality of the variable X in the two groups. On the face of it then, we would worry if, upon inspection of our data, say using histograms, we were to find that our data looked non-normal. In particular, we would worry that the t-test will not perform as it should, i.e. that when the null hypothesis is true it will not falsely reject the null the intended 5% of the time (I’m assuming we are using the usual 5% significance level).
In fact, as the sample size in the two groups gets large, the t-test is valid (i.e. the type 1 error rate is controlled at 5%) even when X doesn’t follow a normal distribution. I think the most direct route to seeing why this is so is to recall that the t-test is based on the two group means $\bar{X}_1$ and $\bar{X}_2$. Because of the central limit theorem, the distribution of these, in repeated sampling, converges to a normal distribution, irrespective of the distribution of X in the population (provided X has finite variance). Also, the estimator that the t-test uses for the standard error of the sample means is consistent irrespective of the distribution of X, and so this too is unaffected by non-normality. As a consequence, the test statistic continues to follow a $N(0,1)$ distribution, under the null hypothesis, when the sample size tends to infinity.
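The argument can be summarised in one line. The notation here is an assumption introduced for illustration: $n_1, n_2$ are the group sample sizes and $S_1^2, S_2^2$ the group sample variances, with the unequal-variance form of the statistic shown.

```latex
\[
T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}
\;\xrightarrow{d}\; N(0,1)
\quad \text{under } H_0 : \mu_1 = \mu_2, \text{ as } n_1, n_2 \to \infty .
\]
```

The numerator is asymptotically normal by the central limit theorem, the denominator is a consistent estimator of its standard error, and Slutsky's theorem combines the two.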
What does this mean in practice? Provided our sample size isn’t too small, we shouldn’t be overly concerned if our data appear to violate the normality assumption. Also, for the same reasons, the 95% confidence interval for the difference in group means will have correct coverage, even when X is not normal (again, when the sample size is sufficiently large). Of course, for small samples, or highly skewed distributions, the above asymptotic result may not give a very good approximation, and so the type 1 error rate may deviate from the nominal 5% level.
Let’s now use R to examine how quickly the sample mean’s distribution (in repeated samples) converges to a normal distribution. We will simulate data from a log-normal distribution, that is, log(X) follows a normal distribution. We can generate random samples from this distribution by exponentiating random draws from a normal distribution. First we will draw a large (n=100000) sample and plot its distribution to see what it looks like:
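A minimal sketch of this step, assuming log(X) is standard normal (the text does not state the parameters, and the seed and number of histogram breaks are likewise arbitrary choices):

```r
set.seed(1234)  # assumed seed, for reproducibility only
n <- 100000

# Log-normal draws: exponentiate standard normal draws,
# so that log(x) ~ N(0, 1) (assumed parameters).
x <- exp(rnorm(n))

# Histogram of the large sample.
hist(x, breaks = 100, main = "Log-normal sample", xlab = "x")

# A mean well above the median is a quick numeric sign of right skew.
c(mean = mean(x), median = median(x))
```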
We can see that its distribution is highly skewed. On the face of it, we would be concerned about using the t-test for such data, since the test is derived assuming X is normally distributed.
To see what the sampling distribution of $\bar{X}$ looks like, we will choose a sample size n, repeatedly take draws of size n from the log-normal distribution, calculate the sample mean each time, and then plot the distribution of these sample means. The following shows a histogram of the sample means for n=3 (from 10,000 repeated samples):
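One way such a histogram could be produced is sketched below, again assuming log(X) is standard normal. The variable names and the seed are illustrative choices.

```r
set.seed(1234)   # assumed seed, for reproducibility only
n <- 3           # size of each sample
n_sims <- 10000  # number of repeated samples

# For each repetition, draw n log-normal values and keep their mean.
sample_means <- replicate(n_sims, mean(exp(rnorm(n))))

hist(sample_means, breaks = 100,
     main = "Sample means, n = 3", xlab = "Sample mean")
```

Changing `n` to 10 or 100 and re-running gives the corresponding histograms for larger samples.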
Here the sampling distribution of $\bar{X}$ is skewed. With such a small sample size, if one of the sampled values is a high value from the tail of the distribution, this will give a sample mean which is quite far from the true mean. If we repeat, but now with n=10:
It is now starting to look more normal, but it is still skewed: the sample mean is occasionally large. Note that the x-axis range is now smaller, because the variability of the sample mean is smaller than with n=3. Lastly, we try n=100:
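A sketch of this last step, under the same assumed log-normal parameters (log(X) standard normal). The helper names `sim_means` and `skewness` are illustrative and not part of the original analysis. As a rough numeric companion to the histograms, it also computes the sample skewness of the simulated means for n = 3, 10 and 100.

```r
set.seed(1234)   # assumed seed, for reproducibility only
n_sims <- 10000  # number of repeated samples

# Sample skewness: third central moment over the cubed standard deviation.
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Simulate n_sims sample means, each from n (assumed) log-normal draws.
sim_means <- function(n) replicate(n_sims, mean(exp(rnorm(n))))

means_100 <- sim_means(100)
hist(means_100, breaks = 100,
     main = "Sample means, n = 100", xlab = "Sample mean")

# Skewness of the simulated sampling distribution for each n;
# values closer to 0 indicate a more symmetric distribution.
skews <- sapply(c(3, 10, 100), function(n) skewness(sim_means(n)))
names(skews) <- c("n=3", "n=10", "n=100")
skews
```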
Now the sample mean’s distribution (in repeated samples from the population) looks pretty much normal. When n is large, even though one of our observations might be in the tail of the distribution, all the other observations near the centre of the distribution keep the mean down. This suggests that the t-test should be ok with n=100, for this particular X distribution. A more direct way of checking this would be to perform a simulation study where we empirically estimate the type 1 error rate of the t-test, applied to this distribution with a given choice of n.
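Such a simulation study might look like the following sketch. It assumes both groups have size n=100 and are drawn from the same log-normal distribution (log(X) standard normal), so the null hypothesis is true by construction. The number of simulations and the seed are arbitrary choices.

```r
set.seed(1234)   # assumed seed, for reproducibility only
n <- 100         # per-group sample size
n_sims <- 10000  # number of simulated datasets

# Under the null, both groups come from the same (assumed) log-normal
# distribution, so every rejection is a false rejection.
p_values <- replicate(n_sims, {
  x1 <- exp(rnorm(n))
  x2 <- exp(rnorm(n))
  t.test(x1, x2)$p.value  # Welch t-test, R's default
})

# Estimated type 1 error rate: proportion of p-values below 0.05.
type1_error <- mean(p_values < 0.05)
type1_error
```

If the asymptotic approximation is working well for this n, the estimate should be close to the nominal 0.05. Re-running with a smaller `n` shows how the rate behaves in small samples.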
Of course if X isn’t normally distributed, even if the type 1 error rate for the t-test assuming normality is close to 5%, the test will not be optimally powerful. That is, there will exist alternative tests of the null hypothesis which have greater power to detect alternative hypotheses.
For more on the large sample properties of hypothesis tests, robustness, and power, I would recommend looking at Chapter 3 of ’Elements of Large-Sample Theory’ by Lehmann. For more on the specific question of the t-test and robustness to non-normality, I’d recommend looking at this paper by Lumley and colleagues.
Addition – 1st May 2017
Below Teddy Warner queries in a comment whether the t-test ‘assumes’ normality of the individual observations. The following image is from the book Statistical Inference by Casella and Berger, and is provided just to illustrate the point that the t-test is, by its construction, based on assuming normality for the individual (population) values: