Statistical significance and p value: What do they really mean?

Journals are more likely to publish articles that have statistically significant findings, that is, articles in which the test of significance yields a p value less than or equal to 0.05 (p ≤ 0.05).

Not surprisingly, researchers the world over chase after the elusive p value (p<0.05).

What does the p value really mean?

In the last two posts we looked at the null hypothesis and at Type I and Type II errors. There are a few ways of looking at the p value.

The commonest one uses the null hypothesis to define it:

“It is the probability of obtaining a result as extreme as, or more extreme than the one obtained, assuming the null hypothesis is true.”

I’ll try to explain this using an example:

Let us assume we are trying to decide who the fastest martial arts expert of all time is: Bruce Lee or Chuck Norris. So a contest is set up between the two.

The rules are simple: since both are fearsome martial arts exponents, they will not fight each other. However, they will have to complete a complex set of moves in the fastest possible time. Each person will get 100 chances. The fighter with the fastest time (average of 100) will be crowned the victor.

Chuck Norris goes first, and clocks an average of 2 minutes (120 seconds). Bruce Lee clocks an average of 1 minute 30 seconds (90 seconds). Bruce Lee wins the contest by a handsome margin. But is he indisputably the fastest? In order to determine that, we run a test of significance, comparing the mean times of the two fighters.

Before proceeding, we need to set up the null and alternative hypotheses.

Null Hypothesis (H0): There is no difference in speed between Bruce Lee and Chuck Norris.

Alternative Hypothesis (Ha)[two sided]: There is a difference in speed between Bruce Lee and Chuck Norris.

Level of significance (alpha, the accepted probability of a Type I error): 0.05

The test is run, and the p value obtained was 0.02 (p=0.02).

What does the p value indicate?

It tells us that if the null hypothesis were true, the probability of obtaining such a difference (or more extreme difference) in timing between the two fighters is 2 in 100, or 0.02.

[If there is truly no difference in speed between Bruce Lee and Chuck Norris, the probability of obtaining a difference in timing of 30 seconds or more between them is 2/100.]
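The definition can be made concrete with a permutation test. This is only one of several possible tests of significance, and not necessarily the one behind the p value quoted above. The sketch below uses Python's standard library; the function name `permutation_p_value`, the timings, and their spread are all invented for illustration, so the printed p value will not match 0.02.

```python
import random
import statistics


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    If the null hypothesis is true, the labels "a" and "b" are
    interchangeable, so we shuffle them and count how often the
    shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction: the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical timings in seconds, 100 attempts per fighter.
# The means follow the example; the spread (40 s) is an assumption.
rng = random.Random(42)
chuck = [rng.gauss(120, 40) for _ in range(100)]
bruce = [rng.gauss(90, 40) for _ in range(100)]

p = permutation_p_value(chuck, bruce)
print(f"estimated p value: {p:.4f}")
```

The returned value is the share of label shuffles that produced a difference as extreme as, or more extreme than, the one observed, which is the definition of the p value applied directly.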

Since we had already set the level of significance at 0.05, any p value less than or equal to 0.05 is considered statistically significant. That is, a result this extreme would be unlikely if the null hypothesis were true, so we reject the null hypothesis.

[The cut-off value (p=0.05) is an arbitrary level adopted by convention. It marks the point at which chance is judged an unlikely explanation for the findings.]

The important thing to note is that statistical significance does not mean practical significance. Statistical significance merely suggests that the findings are unlikely to have been obtained by chance alone.

The cut-off value of 0.05 simply means that, when the null hypothesis is true, there is a 1/20 possibility of obtaining findings this extreme purely by chance. This does not mean that chance has been eliminated. 5 out of 100 times we could get significant values purely by chance.
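The "5 out of 100" claim can be checked by simulation. The sketch below, in Python with the standard library only, repeatedly compares two samples drawn from the same distribution (so the null hypothesis is true by construction) and counts how often the result is declared significant. It uses a simple z-test with a known standard deviation; the function names, the mean of 100, and the standard deviation of 15 are assumptions chosen for illustration.

```python
import math
import random


def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_experiments=2000, n=100, sd=15.0, alpha=0.05, seed=1):
    """Share of experiments declared significant when there is no real difference."""
    rng = random.Random(seed)
    # Standard error of the difference between two means of n values each.
    se = sd * math.sqrt(2 / n)
    hits = 0
    for _ in range(n_experiments):
        # Both samples come from the same distribution: the null is true.
        mean_a = sum(rng.gauss(100, sd) for _ in range(n)) / n
        mean_b = sum(rng.gauss(100, sd) for _ in range(n)) / n
        z = (mean_a - mean_b) / se
        if two_sided_p_from_z(z) <= alpha:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"share of 'significant' results under a true null: {rate:.3f}")
```

The printed share should land close to 0.05: roughly one experiment in twenty looks significant even though there is nothing to find.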

Returning to the discussion about statistical significance and practical significance, let us consider what happened when Chuck Norris and Bruce Lee went head-to-head in competitive fighting.

Both fighters won an equal number of bouts.

In this case, if the null hypothesis had stated that they are equally good fighters, we would not have been able to reject the null hypothesis.

How is this possible, you wonder?

Speed is only one of many skills required to be a successful fighter. Speed alone may not have been enough for Bruce Lee to overcome Chuck Norris. Although there was a statistically significant difference in speed, in practical terms their comparable skill levels meant that they were similar in overall fighting prowess.
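Another way to see the gap between statistical and practical significance is to hold a trivially small difference fixed and let the number of attempts grow. The Python sketch below does this with a simple z-test; the half-second difference, the standard deviation of 15 seconds, and the function name `p_value_for_difference` are assumptions made up for illustration.

```python
import math


def p_value_for_difference(diff, sd, n):
    """Two-sided z-test p value for a difference between two means.

    Assumes two groups of n observations each, with a known,
    common standard deviation sd.
    """
    se = sd * math.sqrt(2 / n)
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))


# The same half-second gap, judged at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    p = p_value_for_difference(diff=0.5, sd=15.0, n=n)
    print(f"n = {n:>9,}: p = {p:.4g}")
```

The difference never changes, yet the p value shrinks as the sample grows. A large enough sample can make a practically meaningless gap statistically significant.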

Bottom line:

1. Statistical significance is determined by the results of a test of significance.

2. Statistical significance implies the results are unlikely to have occurred by chance alone. It does not eliminate chance.

3. A p value of 0.05 indicates that if the null hypothesis were true, one would obtain results at least as extreme 5/100 times. As the p value decreases, so does the probability of obtaining such results (assuming the null hypothesis is true).

4. Statistical significance does not imply practical significance. The two are independent considerations. Use common sense when interpreting the results of significance testing.
