How to Analyze a Study: Lithium for prevention of suicide, the VA RCT

Dec 11, 2021 | Blog posts

The recent VA study on lithium is a good example of why it is important to analyze research studies for yourself. Don’t rely on what the authors tell you, and don’t rely on the journal’s reviewers to have vetted the study sufficiently.

The VA study concluded that lithium had no benefit for suicidality in general in patients with bipolar illness or “major depression.” The accompanying commentary pointed out that there is a large literature to the contrary, and commented on the lithium levels, which were at the lower end of the usual therapeutic range, and on attrition in the study.

What the paper did not describe is a simple observation: there were two groups in the study, those with bipolar illness and those with “major depression.” The study reported the outcomes in each group separately, but it did not analyze those outcomes; had they been analyzed, they would have shown a clear benefit in the bipolar subgroup on one of the outcomes.

So let’s review how the study should have been analyzed, beginning with the general approach I take to all studies, whether they report positive results or, as in this case, negative ones.

The key is to look at all the tables and figures. Forget the abstract and introduction; I don’t even read them. I come back to the methods later, after seeing the results. Go straight to the results, and within the results, skip the text and head straight for the tables and figures.

There are two basic types of content in tables and figures: predictors and outcomes. Predictors are the baseline characteristics of the sample. Outcomes are the endpoints of the study.

The whole point of randomization is to equalize predictors, and most studies of sufficient size do so. Sometimes, though, studies have inclusion criteria that bias the result through the predictors. For instance, the drug group may be treatment-naïve, never treated before with the drug, while the comparison group might include non-responders to a standard comparator. The methods section should be assessed for such differences between groups, which bias a study before it even begins.

Outcomes reflect the effects of treatment, either with drug or placebo. If randomization was successful and predictors are similar in both groups, then outcome differences can be interpreted as causal.

So let’s apply these principles to this study:

I went straight to the tables and figures. First there is Table 1, the baseline demographic and clinical characteristics of the sample. There were few differences there. There were more “other mental disorders” in the lithium group (30% vs 19% for placebo), but it was not clear what those conditions were, and I let it slide.

Next was Table 2, the patient outcomes. The first line was “Primary outcomes, first and subsequent events,” and it was immediately divided by diagnosis: bipolar illness versus “major depressive disorder” (MDD). A screenshot of the table is provided at the top of this post.

Scanning the table overall, we can see that within both the bipolar and MDD subgroups, the lithium and placebo arms are equal or similar on all outcomes down the line, except for the very first line, where there is a clear difference in the bipolar subgroup: 10 for lithium and 20 for placebo. That’s a two-fold difference. That’s a major effect. From there, I simply went to the methods to see how many patients with bipolar illness were included in the study. There were 80, distributed as 37 on lithium and 43 on placebo. Now, that is a small percentage (15%) of the overall sample of over 500 subjects, but the effect size is large enough, a doubling, that it was worth running an analysis to see whether the difference was statistically significant despite the small subgroup size.

In the bipolar subgroup, 10/37 lithium patients versus 20/43 placebo patients had the primary outcome counting first and subsequent events: 27% for lithium vs 46.5% for placebo. Standard statistical software produces a RR of 1.44, with a 95% confidence interval of 0.98 to 2.15: a 44% increased risk without lithium, with an interval that only barely includes the null value.
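
For readers who want to run this kind of check themselves, here is a minimal Python sketch of the standard unadjusted calculation from the counts quoted above. This is not the authors’ analysis, and exact point estimates and intervals will vary with the method a given package uses (for example, whether it adjusts for follow-up time or counts events rather than patients):

```python
# A rough check of the bipolar-subgroup contrast from the counts
# quoted above. Not the paper's analysis; unadjusted 2x2 methods only.
import math
from scipy.stats import fisher_exact

# Counts from Table 2, bipolar subgroup, first and subsequent events
events_placebo, n_placebo = 20, 43
events_lithium, n_lithium = 10, 37

risk_placebo = events_placebo / n_placebo   # ~0.465
risk_lithium = events_lithium / n_lithium   # ~0.270

# Unadjusted risk ratio (placebo vs lithium) with a log-method 95% CI
rr = risk_placebo / risk_lithium
se_log_rr = math.sqrt(
    1 / events_placebo - 1 / n_placebo + 1 / events_lithium - 1 / n_lithium
)
ci_lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
ci_hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI {ci_lo:.2f} to {ci_hi:.2f}")

# Fisher's exact test on the same 2x2 table (events vs non-events)
table = [[events_placebo, n_placebo - events_placebo],
         [events_lithium, n_lithium - events_lithium]]
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher exact p = {p_value:.3f}")
```

With counts this small, a Fisher exact test is a reasonable quick check; the survival-type analyses used in the paper itself would require the underlying time-to-event data, which readers do not have.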

That’s it.

Readers will note that this analysis concerns “first and subsequent events” of the primary outcome. The next row in Table 2 gives the primary outcome as a first event only, for which there are no meaningful differences between the groups. But the paper does not comment on why subsequent events shouldn’t count, or why differences appear when subsequent events are included. It seems to me that when such a big difference is seen, it ought to be explained in some way, not ignored; it is relevant as a positive outcome. One rationale may be that including subsequent events increases the number of outcome events, and thus the statistical power to show differences, as sketched below.
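
To illustrate that rationale with made-up numbers: holding the denominators fixed, doubling the event counts shrinks the standard error of the log risk ratio, which narrows the confidence interval and makes a real difference easier to detect. A rough sketch (the “first events only” counts here are hypothetical, chosen only for illustration):

```python
import math

def se_log_rr(e1, n1, e2, n2):
    # Large-sample standard error of log(RR) from a 2x2 table:
    # sqrt(1/e1 - 1/n1 + 1/e2 - 1/n2)
    return math.sqrt(1 / e1 - 1 / n1 + 1 / e2 - 1 / n2)

# Hypothetical "first events only" counts vs the first-and-subsequent
# counts quoted above (placebo: e1/n1, lithium: e2/n2)
print(se_log_rr(10, 43, 5, 37))   # fewer events -> larger SE (~0.50)
print(se_log_rr(20, 43, 10, 37))  # more events  -> smaller SE (~0.32)
```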

One could go on to the other tables and figures. Figure 2 showed the primary outcome of all suicidal phenomena, with no difference between groups, but the subgroup effect in bipolar illness would not be visible in that figure. Table 3 had hazard ratios for outcomes from models using different predictors, including different types of suicidal phenomena. In that table, we could also discuss the authors’ inappropriate exclusion of 3 suicides on placebo from their analysis while including one suicide on lithium, but for now we will leave that question aside.

In general, in Table 3, the authors tried different predictors and subgroups and kept finding no differences. But they never bothered to look at the bipolar subgroup. Why? The difference was obvious; it leaps out of the first row of Table 2, yet there is no comment on it anywhere in the text.

As noted, the study reported the outcomes in each diagnostic group separately but did not analyze them. When one does so, there is a clear benefit in the bipolar subgroup, right at the edge of statistical significance. Why did the authors not do this analysis? Why did the peer reviewers not ask for it?

It’s important to note that most clinical researchers in psychiatry are not formally trained in statistics, for example with a public-health biostatistics degree. Unfortunately, neither the main authors nor the reviewers usually have the statistical expertise to notice such issues. Authors tend to rely on a statistician for their study, but statisticians do not have the clinical knowledge base to identify the important issues of concern. Things easily fall through the cracks.

As a reader of the scientific literature, you have to fill those cracks. Don’t read the text. Study the tables and figures, and look for differences between groups. Then you will either confirm the authors’ findings or reinterpret the study more accurately than the authors themselves have done.