In the previous post we looked at the meaning of the p value and statistical significance. The larger question is: “Is it always useful or desirable to obtain a statistically significant result?”
The answer is “No.”
Let us explore the notion of statistical significance in some more detail using two illustrative examples.
You have conducted a large study involving 35,453 subjects. The study compared the decrease in systolic blood pressure from baseline between subjects on lifestyle modification only and subjects on anti-hypertensive medication only. At the end of the study, a t test was performed to determine whether there was a statistically significant difference in mean systolic blood pressure between the two groups. The difference in mean systolic blood pressure between the two groups was 0.5 mm of Hg and was statistically significant (p = 0.0001), with the lifestyle-only group showing the greater decrease in systolic blood pressure.
Considering the highly significant findings, would you insist on switching all patients with hypertension from anti-hypertensives to lifestyle modification only?
For those unfamiliar with the measurement of blood pressure, a regular (manual) sphygmomanometer (BP apparatus) cannot detect a difference of less than 2 mm of Hg.
Although digital sphygmomanometers can detect small differences, they cost considerably more.
I would not change anything, as the difference between the two interventions is not practically useful. However significant the findings, when I cannot measure the 0.5 mm of Hg difference using a manual sphygmomanometer, why would I even consider a change?
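To see how a difference this small can still come out as highly significant, here is a minimal sketch using only the Python standard library. The post does not report the standard deviation or the group sizes, so the common SD of 12 mm of Hg and the near-equal split of the 35,453 subjects are assumptions made purely for illustration, as is the helper name `two_sample_z_test`. With groups this large the t distribution is practically identical to the normal distribution, so the sketch uses a normal approximation rather than an exact t test.

```python
import math

def two_sample_z_test(mean_diff, sd, n1, n2):
    """Return (z, two-sided p) for a difference in means.

    Normal approximation to the two-sample t test, assuming both
    groups share the same standard deviation `sd`.
    """
    se = sd * math.sqrt(1.0 / n1 + 1.0 / n2)  # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided tail probability
    return z, p

MEAN_DIFF = 0.5    # mm of Hg, from the example above
ASSUMED_SD = 12.0  # mm of Hg, hypothetical (not given in the post)

# Hypothetical near-equal split of the 35,453 subjects
z_large, p_large = two_sample_z_test(MEAN_DIFF, ASSUMED_SD, 17726, 17727)

# The same 0.5 mm of Hg difference in a much smaller hypothetical study
z_small, p_small = two_sample_z_test(MEAN_DIFF, ASSUMED_SD, 100, 100)

# Standardised effect size: the difference expressed in SD units
effect_size = MEAN_DIFF / ASSUMED_SD

print(f"large study: z = {z_large:.2f}, p = {p_large:.6f}")
print(f"small study: z = {z_small:.2f}, p = {p_small:.3f}")
print(f"effect size: {effect_size:.3f} SD")
```

Under these assumptions the large study gives a p value well below 0.001, while the identical difference in the small study is nowhere near significant. The effect size is the same tiny fraction of a standard deviation in both cases: the sample size changed the p value, not the practical importance of the difference.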
You are investigating whether there is a significant difference in case detection between three sputum samples and two sputum samples for the diagnosis of pulmonary tuberculosis. For those unfamiliar with the diagnosis of pulmonary tuberculosis, it involves obtaining a sputum sample from the patient and testing it for the presence of acid-fast bacilli (AFB). Typically three sputum samples are obtained; now you are investigating whether two sputum samples are adequate.
After obtaining the necessary permissions, you conduct a record-based study. You obtain patient details from the sputum registers of all labs in the study area and note down how many new cases were detected by obtaining two sputum samples. Next, you count how many cases were detected by obtaining three sputum samples. You calculate the mean number of cases detected by each method.
In order to test whether there is a significant difference in case detection using the two methods, you employ the paired t test (since the observations were not independent). In this case the null and alternative hypotheses are as follows:
Null hypothesis (H0): There is no difference in the mean number of cases detected by the two methods
Alternative hypothesis (Ha): There is a difference in the mean number of cases detected by the two methods.
The test of significance returns a p value of 0.13.
Therefore, you have failed to reject the null hypothesis that there is no difference in mean number of cases detected by the two methods.
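The mechanics of the paired t test can be sketched in a few lines of standard-library Python. The per-lab counts below are invented for illustration (the post does not give the raw data), so the resulting t statistic will not reproduce the p value of 0.13 quoted above. Since the standard library has no t distribution, the sketch compares the statistic with the tabulated two-sided 5% critical value for 9 degrees of freedom (2.262) instead of computing an exact p value.

```python
import math
import statistics

# Hypothetical number of new cases detected at each of 10 labs
cases_two_samples   = [12, 15, 9, 20, 14, 11, 17, 13, 10, 16]
cases_three_samples = [13, 15, 9, 20, 14, 12, 17, 13, 11, 16]

# The paired t test works on the within-lab differences
diffs = [c3 - c2 for c3, c2 in zip(cases_three_samples, cases_two_samples)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)        # sample SD (n - 1 in the denominator)
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

# Two-sided critical value of t for alpha = 0.05 and df = 9 (from a t table)
T_CRITICAL_DF9 = 2.262
reject_h0 = abs(t_stat) > T_CRITICAL_DF9

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

With these made-up counts the t statistic falls short of the critical value, so the null hypothesis is not rejected, which mirrors the outcome described above. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns the exact p value.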
Now that you have generated some evidence that the number of cases detected by obtaining two sputum samples is not significantly different from that by obtaining three sputum samples, how do you feel?
Are you disappointed?
I would be very pleased with this outcome. After all, it suggests that obtaining two sputum samples might be enough. This translates into fewer patient visits for diagnosis; lower burden on the lab technicians; and lower cost to diagnose each case of tuberculosis.
1. Statistically significant results are not always useful or desirable.
2. Sometimes failing to reject the null hypothesis is a desirable outcome.
3. Statistical results must be interpreted with caution and common sense.