A study in the December issue of Pediatrics reports that an intensive intervention to reduce children's secondhand smoke exposure in households in Armenia decreased exposure by 17% compared to minimal intervention. Based on this finding, the study concludes that the intensive intervention is effective.
The intensive intervention consisted of motivational interviewing and follow-up telephone calls to try to discourage smoking in the household. The minimal intervention consisted of simply providing an educational brochure about the hazards of secondhand smoke exposure. At baseline, all households contained at least one daily smoker and at least one child between the ages of 2 and 6. Child exposure to secondhand smoke was measured by determining hair nicotine concentrations at baseline and at four-month follow-up.
According to the paper, the chief result of the study was as follows: "Multiple linear regression analysis demonstrated that after adjusting for the baseline hair nicotine concentration, child's age, and child's gender, the follow-up GM (geometric mean) of hair nicotine concentration was 17% lower in the intervention group compared with the control group (P = .239)."
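For readers unfamiliar with geometric means, the quantity being compared can be sketched in a few lines of Python. The concentrations below are made-up illustrative values, not the study's data; the point is only to show what "the follow-up GM was 17% lower" is computed from:

```python
import numpy as np

# Hypothetical hair nicotine concentrations (made-up values,
# NOT taken from the Pediatrics paper).
control = np.array([2.0, 4.0, 8.0, 1.0, 3.0])
intervention = np.array([1.5, 3.5, 7.0, 0.9, 2.4])

def geometric_mean(x):
    # GM = exp(mean(log(x))); commonly used for right-skewed
    # biomarker data such as nicotine concentrations.
    return np.exp(np.log(x).mean())

gm_control = geometric_mean(control)
gm_intervention = geometric_mean(intervention)
pct_lower = (1 - gm_intervention / gm_control) * 100

print(f"GM control = {gm_control:.2f}, GM intervention = {gm_intervention:.2f}")
print(f"Intervention GM is {pct_lower:.0f}% lower")
```

The study's actual 17% figure came from a multiple linear regression adjusting for baseline concentration, age, and gender, but the underlying comparison is of this form.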
The study concludes that: "The results of this trial suggest that adjusted follow-up GM of hair nicotine concentration in the intervention group was 17% lower compared with the control group."
The Rest of the Story
There is an important flaw in this paper which is critical to point out to readers. It is also something that I will use to teach my public health students about common pitfalls in epidemiology and biostatistics and about investigator bias in epidemiologic studies.
Prior to conducting any statistical analysis in an epidemiologic study, researchers set an a priori level of statistical significance for their findings. Typically, this is set at a level of 5%.
Let us assume that the intervention group is found to have lower exposure than the control group at follow-up. One needs to know whether this finding of decreased exposure is an actual effect of the intervention, or whether it could have occurred simply due to chance. A 5% level of statistical significance means that if a result as extreme as the one observed would be expected to occur by chance alone less than 5% of the time, the result is deemed "statistically significant," and one can conclude that there really was a decreased exposure in the intervention group. The probability that the study findings would have occurred by chance alone if there were no true effect is known as the "p-value."
If the p-value is less than 0.05, the finding is deemed statistically significant. If the p-value is greater than 0.05, the finding is not statistically significant: the result could well have occurred by chance alone, and the probability of its occurring by chance is higher than the threshold the researchers decided, in advance, they would require before concluding that the effect is real.
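This decision rule is mechanical, which is exactly the point: the threshold is fixed before the data are analyzed. A minimal sketch in Python, using simulated data (not the study's) in which there is no true difference between groups:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical log-transformed exposure measurements; BOTH groups
# are drawn from the same distribution, i.e., the intervention
# has no true effect in this simulation.
control = rng.normal(loc=0.0, scale=1.0, size=30)
intervention = rng.normal(loc=0.0, scale=1.0, size=30)

ALPHA = 0.05  # a priori significance level, set before the analysis
t_stat, p_value = ttest_ind(intervention, control)
significant = p_value < ALPHA

if significant:
    print(f"p = {p_value:.3f}: statistically significant at alpha = {ALPHA}")
else:
    print(f"p = {p_value:.3f}: not significant; "
          f"the observed difference could be chance")
```

The test statistic here is a two-sample t-test rather than the paper's adjusted regression, but the accept/reject logic against the pre-set alpha is the same.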
In this study, the p-value was 0.239. This means that the findings were not even close to statistical significance. In other words, one cannot conclude that there was any reduction in exposure associated with the intervention. The observation of a lower average exposure level in the intervention group could well have occurred by chance alone.
What all of this means is that it is not correct or appropriate to report that the study found the intervention to be effective in reducing secondhand smoke exposure by 17%. The more appropriate reporting of this finding is that the study failed to find that the intervention had any significant effect on follow-up exposure levels.
In other words, one cannot conclude from this research that the intervention reduced follow-up exposure levels compared to the minimal intervention in the control group.
Interestingly, the paper does note that this key finding was not statistically significant. However, instead of taking the scientifically appropriate action, which is to report that the evidence does not support the conclusion that the intervention reduced exposure, the paper dismissed the lack of statistical significance by arguing that it "could be due to insufficient power, as fewer numbers of families provided hair samples at follow-up."
There are times when one might still conclude that an intervention is effective despite a lack of statistical significance, because of low study power. But this is usually reserved for situations in which the p-value is very close to 0.05. I have never before seen a paper conclude that an effect is real when the observed p-value is as high as 0.24.
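The relationship between power, sample size, and chance findings can be illustrated by simulation (again with made-up normally distributed data, not the study's). Low power makes a *real* effect harder to detect, but when there is *no* effect, "significant" results still appear only about 5% of the time no matter how small the sample, so a large p-value is not rehabilitated by pointing to low power:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

def power(n_per_group, effect=0.0, sims=2000, alpha=0.05):
    """Fraction of simulated two-group trials reaching p < alpha."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect, 1.0, n_per_group)
        if ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / sims

# No true effect: false-positive rate stays near alpha (~5%),
# regardless of sample size.
print("no effect, n=30:", power(30, effect=0.0))

# A real effect of 0.5 SD: detection improves as n grows --
# this is what "insufficient power" actually refers to.
print("effect=0.5, n=30: ", power(30, effect=0.5))
print("effect=0.5, n=100:", power(100, effect=0.5))
```

The effect sizes and sample sizes here are arbitrary assumptions chosen for illustration; the qualitative pattern is what matters.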
In my view, this is a great example of investigator bias influencing the reporting of a result.
Imagine what would happen if, every time a result was statistically insignificant, the paper simply dismissed the lack of significance by arguing that it was due to low study power. This would undermine scientific research and essentially negate the need to conduct any research in the first place. Why set a pre-determined level of significance if one is simply going to ignore it and conclude that any observed effect is real, regardless of the significance level?
I can understand the desire to report a positive effect of a public health intervention, especially one designed to protect the health of children. We all want to see our interventions succeed. But in my view, the desire to see an intervention succeed does not justify the dismissal of well-accepted scientific standards to try to show that the intervention is effective.
Incidentally, the paper does the same thing with its examination of whether there was a significant difference in less than daily exposure for children in the intervention group at follow-up, compared to the control group. Using either the mothers' reports or the household smokers' reports, there was no significant difference in less than daily exposure at follow-up (see Table 6 in the paper).
In fact, every analysis in the study of the difference in exposure between the intervention and control group shows no significant difference in exposure.
This doesn't stop the paper from concluding that: "The findings of this study emphasize the importance of motivational interviewing and providing immediate personalized feedback for addictive behavior change."
If that is going to be your conclusion from a study that fails to find any significant difference between the intensive and minimal interventions, then I would argue that there is no point in conducting an evaluation study at all. Why not just deliver the intervention and skip the time and expense of evaluating it?
After all, if you are going to conclude that the intervention works even if it fails to significantly reduce exposure compared to the minimal intervention, then why bother going through the motions of conducting a statistical analysis?
My greatest concern about this lack of scientific rigor, which I repeatedly see cropping up in the tobacco control literature, is that it will erode the scientific credibility of the tobacco control movement. If we cannot be trusted to report scientific results objectively in one area of inquiry, then what reason is there to trust us in other areas? We risk not only losing the effort to convince people that this particular intervention is effective; in the process, we may also lose credibility on the very issue of the harms of secondhand smoke in the first place.