1 Dukree

BMG Research Review Assignment

WESTMINSTER was agog. On May 31st, eight days before Britain’s general election, the Times splashed on YouGov’s forecast of a hung parliament. Other pollsters were predicting an average lead of eight percentage points for the incumbent Conservatives. Party grandees, sure that Theresa May, the prime minister, would secure a big majority, rubbished the prediction—as did officials from the opposition Labour Party, convinced they were heading for defeat. Jim Messina, a former campaign manager for Barack Obama who flew in to advise the Conservatives, tweeted that he had “spent the day laughing at yet another stupid poll”.

On the eve of the election, the polling average put the Conservatives at 44% of the vote, and Labour at 36%. In the event, Labour beat expectations by five percentage points, gaining 30 seats and denying Mrs May a majority. YouGov was vindicated. Mr Messina has not tweeted since.

Critics of polling spy a pattern. They cite a series of surprise results leading up to the latest: the Conservatives’ narrow win in Britain in 2015, after predictions of a hung parliament; last year’s vote for Britain to leave the European Union, after every big political party campaigned to stay; and Donald Trump’s successful insurgent campaign for the American presidency.

Sam Wang, a neuroscience professor at Princeton and part-time psephologist, kept a pre-election promise to eat an insect on live television if Mr Trump won more than 240 electoral-college votes. Some Britons also made foolhardy food wagers. In 2015 Paddy Ashdown, a former leader of the Liberal Democrats, a small party, said he would eat his hat after pooh-poohing the exit poll (one specially made of marzipan was later presented to him). Last week Matthew Goodwin, a political scientist, went one better by eating a copy of his book about Brexit (again, on live television) after he insisted that Labour had no chance of getting 38% of the vote.

Statistical models of election outcomes attempt to quantify the uncertainty in polls’ central findings by generating probability estimates for various outcomes. Some put Hillary Clinton’s chance of victory against Mr Trump above 99% (Mr Wang came to grief because his model almost totally discounted the chance the polls in battleground states were all askew). Among the model-makers, Nate Silver, an American journalist, was a shining success. He came to prominence by using polling averages to call every state correctly in the presidential contest of 2012. Indeed, that success may have encouraged misplaced faith in statistical models. He did as badly as the pollsters before Britain’s election in 2015. But he rightly spied uncertainty in the Trump-Clinton race, and stuck to his guns despite much ridicule.

Predicting the outcome of elections is an inherently chancy endeavour. “If you look into the crystal ball,” says an experienced pollster, “you’ve got to be ready to eat ground glass.” In fact, the accuracy of polling in developed countries has not declined over the past half-century. American pollsters’ predictions for presidential races are even improving (see chart 1). Last week’s five-point average error in Britain was not far from the average of 4.3 points in general elections since 1979.

But pollsters’ job is getting harder. The number of people willing to answer their questions is plummeting. Of every ten people in rich countries they contact by telephone, at least nine now refuse to talk. New political faultlines are complicating their efforts to find representative groups to question, and voters’ changing behaviour blindsides them as they try to discern the truth behind polling responses. Old political allegiances are weakening and public opinion is becoming more fickle. Confidence in polling has been shaken. Pollsters are scrambling to regain it.

One of the problems they face is beyond them to fix: electoral systems that confound shares of the total vote. Mrs Clinton defeated Mr Trump in the popular vote by 2.1 percentage points—within one point of the average polling prediction—but lost because of the rules of the electoral college. Britain’s first-past-the-post system regularly produces parliaments that only hazily reflect national vote shares; in 2015 the nativist UK Independence Party got 12.6% of the vote, but just one of 650 seats. Though pollsters urge caution in translating vote shares into final results, that warning often goes ignored.

In such systems, knife-edge local contests can be decisive. Just 77,747 extra votes distributed suitably across Michigan, Wisconsin and Pennsylvania would have netted Mrs Clinton 46 more electoral-college votes, enough to take the White House. A total of just 75 British voters switching to the Conservatives, in the seats where they lost by the narrowest margins, could have given Mrs May a working majority. British pollsters would still have got the vote share badly wrong. But they would have come in for less criticism, since their central prediction would have fallen on the right side. Like servants and goalies, pollsters are noticed only when they fail.

As for the Brexit referendum, more polls had put Leave than Remain ahead. “The message of the polls was, it’s very much a toss-up,” says John Curtice of Strathclyde University. But that got lost as the two big parties campaigned for Remain, and newspaper columnists simply could not believe that so many British voters would really plump for the upheaval of leaving the EU.

The widespread impression that polls are bunk may also have been partly due to the much-publicised betting odds offered online. Earlier this century, online betting exchanges beat pollsters before several big elections. Economists argued that the forecasts made by punters with money on the line were likely to be more considered than the sometimes offhand responses given to pollsters. But the betting markets have flunked their recent tests. Bettors favoured a Remain victory, a Clinton presidency and a Conservative parliamentary majority, with closing odds of more than 80%.

Last week’s election in Britain weakened the evidence for the theory that campaigns have little effect on voting behaviour, advanced by many political scientists. Mrs May’s support seems to have plunged during her dismal campaign: Survation, the pollster that most accurately predicted the final result on the eve of the election, and YouGov both gave her party double-digit leads just three weeks before election day. Picking up such rapid changes in public sentiment is straightforward, though not cheap: it requires larger sample sizes and more frequent surveys. These also help with the “noise” found in any random sample, which pollsters refer to as sampling error.

Far more intractable is the bias that creeps in when samples are not representative of the electorate. Taking bigger samples does not help. The margins of error cited by pollsters refer to the caution appropriate to sampling error, not to this flaw, which is revealed only on polling day.

A striking example came in 1936, when Literary Digest, a weekly American magazine, asked its affluent readers whom they would vote for in that year’s presidential election. Nearly 2m replied. But the sample, though large, was horribly biased. Based on it, Literary Digest forecast a landslide for Alf Landon. He went on to lose all but two states to Franklin Roosevelt.
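The gap between sampling error and bias can be made concrete with a little arithmetic. The sketch below assumes a simple random sample and the usual 95% normal approximation for a proportion. The function name, the sample sizes and the 15-point bias are illustrative assumptions, not figures from any pollster.

```python
import math

def margin_of_error(share, n, z=1.96):
    """Half-width of the 95% interval for a proportion from a simple random sample."""
    return z * math.sqrt(share * (1.0 - share) / n)

# Sampling error shrinks as the sample grows...
small_poll = margin_of_error(0.5, 1_000)       # roughly 3.1 percentage points
huge_poll = margin_of_error(0.5, 2_000_000)    # under 0.1 percentage points

# ...but a bias in who responds is untouched by sample size.
# Hypothetical: true support is 40%, and respondents overstate it by 15 points.
true_share = 0.40
response_bias = 0.15
estimate_from_biased_sample = true_share + response_bias  # the same at any n
```

A poll of 2m replies has a tiny margin of error, yet under these assumed numbers it would still land 15 points from the truth.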

Poll another day

When Mrs May announced this year’s snap election, British pollsters had not yet got their houses fully in order after their failure in 2015. An inquiry by the British Polling Council, an industry group, blamed unrepresentative samples: British polls have long tended to overstate support for Labour and understate support for the Conservatives (see chart 2).

Faced with an election much sooner than they had expected, they made rushed tweaks in the hope of correcting this bias. That led to a wide variation in their predictions. On the eve of the election they pegged the Conservative lead as anywhere between one point and 13. One pollster, whose firm predicted a double-digit lead, says that his “golden rule” was to adopt any plausible adjustment that would take a point or two off Labour and reallocate that share to the Conservatives.

Such adjustments seem to have contributed to the latest miss. Preliminary estimates by Will Jennings and Patrick Sturgis of Southampton University suggest that fixes intended to account for variable turnout—in previous elections, declared Labour supporters have been less likely than others to end up casting a vote—increased the average estimate of the Conservative vote share by five percentage points. Survation credits its success to sticking closer to the raw numbers. “It’s the ultimate Greek tragedy, isn’t it?” says Michael Turner of BMG Research, the pollster that gave the Conservatives the largest lead. “What you do to correct the error ends up causing it.”

Internet-polling companies try to sidestep sampling bias by recruiting large, stable “panels” made up of the right numbers of the educated, the young and so on, from which they pick representative samples each time they run a poll. But this can still produce poor results. After finding that its internet polling in 2015 oversampled politically engaged voters, who tend to be leftish, YouGov tried hard to recruit less-engaged voters to its panel.

For telephone and face-to-face pollsters, who try to avoid bias by choosing randomly from a list of telephone numbers or addresses, another problem looms. Across the rich world, they are struggling to find anyone willing to talk to them. In 1980, 72% of Americans responded to a phone call seeking their opinion. That share had plummeted to 8% by 2012, and has kept falling. Last year, less than 1% of calls received a reply. Essential government statistics, such as figures on consumer confidence, unemployment and household income, are also being undermined by fading willingness to respond to official surveys.

Pollsters would not worry so much if everyone were equally unlikely to respond. But some types of people are more reluctant than others. Pollsters refer to this variation as non-response bias. According to Matt Lackey of Civis Analytics, a data-science firm, it now takes an American pollster 350 calls to find a young Latino man willing to answer questions—21 times as many calls as required for an elderly white woman. Low response rates contributed to the failures of predictions in individual states before last year’s presidential election. “The biggest misses…were in places with low-educated voters,” says Mr Lackey. “And those were also the places that had the lowest response rates.”

Weight, weight, don’t tell me

To deal with non-response bias, pollsters try to correct their samples by a process known as weighting. The idea is simple: if one group is likelier to respond to a survey than another, giving a lower weight to the first group’s answers ought to set matters right. The procedure is well-established and respectable: all pollsters weight their samples to correct for the differences in response rates between large demographic groups, and usually by similar amounts to each other.
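A minimal sketch of the weighting idea described above, assuming just two demographic groups and known population shares. The group names, counts and function names are hypothetical; real pollsters weight on many more characteristics at once.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = its population share / its share of the sample."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

def weighted_share(responses, weights):
    """responses: list of (group, supports_party) pairs; returns weighted support."""
    weighted_yes = sum(weights[group] for group, yes in responses if yes)
    weighted_all = sum(weights[group] for group, _ in responses)
    return weighted_yes / weighted_all

# Hypothetical sample: older people answer far more readily than younger ones.
sample_counts = {"young": 200, "old": 800}
population_shares = {"young": 0.4, "old": 0.6}
weights = poststratification_weights(sample_counts, population_shares)
# weights come out at roughly 2.0 for "young" and 0.75 for "old"

responses = (
    [("young", True)] * 120 + [("young", False)] * 80
    + [("old", True)] * 240 + [("old", False)] * 560
)
raw = sum(yes for _, yes in responses) / len(responses)   # 36% before weighting
adjusted = weighted_share(responses, weights)             # about 42% after weighting
```

The over-represented group is weighted down and the under-represented one up, which moves the estimate by six points in this made-up sample.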

But adjusting weights is also one of the ways pollsters can do what political scientists call “herding”. If one weighting scheme produces a seemingly outlandish result, the temptation is to tweak it. “There’s an enormous pressure for conformity,” says Ann Selzer, an American pollster. Polls can thus narrow around a false consensus, creating unwarranted certainty about the eventual outcome.

The British Polling Council tries to discourage herding by requiring its members to publicise any changes they make to their methodologies. Before the most recent election, British pollsters largely managed to resist the temptation—though YouGov’s final prediction, which relied on different methods from those used for the one in the Times, put the Conservatives’ lead at seven points, close to the average for other pollsters. And seven of the eight pollsters who predicted the outcome of the Brexit referendum adjusted their methods late in the campaign. All of those revisions favoured Remain by at least one percentage point.

To make weighting work, pollsters must pull off two difficult tricks. The first is to divide their samples into appropriate subgroups. Age, sex, ethnicity, social class and party affiliation are perennial favourites. The second is to choose appropriate weights for each group. This is usually done with the help of a previous election’s exit poll, or the most recent census.

But the old political dividing lines are being replaced by new ones. Increasingly, samples must be weighted to match the voting population for a much larger set of characteristics than was previously needed. Levels of education, household income and vaguer measures such as people’s feelings of connection to their communities have all started to be salient. Before the Brexit vote, both the Conservatives and Labour supported remaining in the EU, but their supporters split. Well-educated people voted heavily for Remain. Those with authoritarian leanings split for Leave by 66%, according to an analysis by NatCen, a social-research organisation. Age, always a factor in voting behaviour, is becoming more important. Young Britons seem to have plumped for Labour by an overwhelming 40-point margin last week, while the oldest were even keener than usual on the Conservatives.

The latest dividing line is disaffection. Unusually high turnout by white Americans living in rural areas, most of whom have low levels of education and a long history of political disengagement, helped propel Mr Trump to his narrow victory. Voters with poorer health and lower social cohesion, as measured by low expressed willingness to co-operate with others, also favoured Mr Trump. Many Britons who did not bother to vote in 2015 turned out for the EU referendum; they favoured Leave by a 20-point margin.

Even when pollsters do break their samples into appropriate groups, voters’ changing behaviour can still trip them up. Most British pollsters, for example, assigned lower weights to young people’s responses to reflect their habitually low turnout: just 43% of under-24s voted in the previous general election, compared with 66% across all age groups. But those that most heavily discounted the young portion of their samples did worst in their predictions this time round, suggesting that the youth vote rose. The past is also little help in deciding how to weight samples before one-off votes like the referendums in Britain, Italy and Colombia last year.

Spotting new electoral rifts and changing electoral habits will require much more data (and data science) than pollsters now use. And picking up changing social attitudes means measuring them, too—which will take never-ending checks and adjustments, since those measurements will suffer from the same problems as pre-election polls. Pollsters will also have to improve their handling of differential turnout and undecided voters. Most accept self-reported intention to vote, which turns out to be a poor guide. And they often assume that undecided voters will either stay away or eventually split the same way as everyone else, which seems not to have been the case in recent contests.

And dealing with declining response rates will probably require new ways to contact prospective voters. During the early days of internet polling, many feared that online samples were bound to be unrepresentative, mainly because they would include too few older people. But Britain’s online pollsters silenced their critics in the Brexit vote, where they came two percentage points closer than telephone pollsters to the result. Some startups are now testing what they call “programmatic sampling”: advertising very short surveys to smartphone users. Google, which runs bespoke market surveys for companies, tries to ensure representative samples by using browsing history to guess respondents’ demographics.

Finally, pollsters will have to become more statistically sophisticated. Sampling 1,000-2,000 people and massaging their responses to correct for past errors looks increasingly antiquated. YouGov’s recent success was based on rolling questionnaires administered daily to 7,000 people from a 50,000-strong online panel, with the results combined using advanced number-crunching known as “multilevel regression and post-stratification”.

Whither forecasting

Perhaps pollsters’ strongest defence is that no one else does better. In 2012 Peggy Noonan, an American columnist, contended that Mitt Romney would defeat Mr Obama because she had seen more Romney yard signs. Other commentators have based election predictions on nothing more than attendance at rallies or the volume of partisan posts on social media.

If such guesswork was all there was to go on, many more election results would be shocks. They would routinely cause market turmoil. From one vote to another, politicians would have no way to gauge the public mood. Turnout would suffer: a recent study of Swiss referendums found that it rose in close votes, but only when there were pre-vote polls. Pollsters sometimes deserve a kicking. But without them, democracies would fare worse.


Strengths and limitations of this study

  • National large-scale data from a network of integrated health systems.

  • Employed a new user design and developed a number of analytical approaches where we consistently found a significant association between PPI exposure and risk of death.

  • Cohort included mostly older white male US veterans, which may limit the generalisability.

  • Did not include information on the cause of death.


Proton pump inhibitors (PPI) are widely prescribed and are also available for sale over the counter without prescription in several countries.1 2 Several observational studies suggest that PPI use is associated with increased risk of a number of adverse health outcomes.1 A number of studies have shown that PPI use is associated with significant risk of acute interstitial nephritis.3–5 Recent studies established an association between exposure to PPI and risk of chronic kidney disease (CKD), kidney disease progression and end-stage renal disease.2 6 7 Results from a large prospective observational German cohort suggest that patients receiving PPI had a higher risk of incident dementia.8 Several reports highlighted a rare but potentially fatal risk of hypomagnesemia among users of PPI.9–11 PPI use has been associated with increased risk of both incident and recurrent Clostridium difficile infections.12 Several observational analyses have shown that PPI use was also associated with increased risk of osteoporotic fractures, including hip and spine fractures.13 14 Less convincing—and to some extent inconsistent—evidence suggests a relationship between PPI use and risks of community-acquired pneumonia and cardiovascular events.15–17 Emerging—and far from conclusive—in vitro evidence suggests that PPI results in inhibition of lysosomal acidification and impairment of proteostasis, leading to increased oxidative stress, endothelial dysfunction, telomere shortening and accelerated senescence in human endothelial cells.18 The experimental work provides a putative mechanistic link to explain some of the adverse events associated with PPI use.18

The adverse outcomes associated with PPI use are serious, and each is independently associated with higher risk of mortality. Evidence from several small cohort studies of older adults who were recently discharged from the hospital or institutionalised in long-term care facilities suggests inconsistently that PPI use may be associated with increased risk of 1 year mortality.19–22 Whether PPI use is associated with excess risk of death is not known and has not been examined in large epidemiological studies spanning a sufficiently long duration of follow-up. We hypothesised that owing to the consistently observed associations between PPI use and risk of adverse health outcomes, PPI use is associated with excess risk of death, and that the risk of death would be more pronounced with increased duration of use. We therefore used the Department of Veterans Affairs national databases to build a longitudinal cohort of incident users of acid suppression therapy, including PPI and histamine H2 receptor antagonists (H2 blockers), to examine the association between PPI use and risk of all-cause mortality and to determine whether risk of death is increased with prolonged duration of use.


Cohort participants

Primary cohort

Using administrative data from the US Department of Veterans Affairs, we identified patients who received an outpatient H2 blocker or PPI prescription between 1 October 2006 and 30 September 2008 (n=1 762 908). In order to select new users of acid suppression therapy (incident user design), we excluded 1 356 948 patients who received any outpatient H2 blocker or PPI prescription between 1 October 1998 and 30 September 2006. To account for patients’ kidney function, only patients with at least one outpatient serum creatinine value before the first acid suppression therapy prescription were selected in the cohort, yielding an analytic cohort of 349 312 patients. Patients whose first acid suppression therapy was PPI (n=275 977) were considered to be in the PPI group during follow-up. Patients who received H2 blockers as their first acid suppression therapy (n=73 335) served as the reference group before they received any PPI prescription (see online supplementary figure 1). Within the reference group, those who received a PPI prescription later (n=33 136) were considered to be in the PPI group from the date of their first PPI prescription until the end of follow-up.23 Time zero (T0) for the primary cohort was defined as the first acid suppression therapy prescription date.


Secondary cohorts

We additionally built two secondary cohorts to examine the association of PPI use and risk of death in (a) PPI versus no PPI users and (b) PPI versus non-users of acid suppression therapy. Patients with no PPI prescription between 1 October 1998 and 30 September 2006, and with at least one outpatient eGFR value before 1 October 2006, were selected to evaluate the risk of death associated with PPI use versus no PPI use (n=3 288 092) (see online supplementary figure 2a). Patients with no PPI prescription between 1 October 1998 and 30 September 2006, with no H2 blockers before the first PPI prescription and at least one outpatient eGFR value before 1 October 2006, were selected to evaluate the risk of death associated with PPI use versus no acid suppression therapy (n=2 887 030) (see online supplementary figure 2b). T0 for secondary cohorts was defined as 1 October 2006.

Patients in both primary and secondary cohorts were followed until 30 September 2013 or death. The study was approved by the Institutional Review Board of the VA Saint Louis Health Care System, Saint Louis, Missouri.

Data sources

We used the Department of Veterans Affairs databases, including inpatient and outpatient medical SAS data sets (that include utilisation data related to all inpatient and outpatient encounters within the VA system), to ascertain detailed patient demographic characteristics and comorbidity information based on inpatient and outpatient encounters.2 24 The VA Managerial Cost Accounting System Laboratory Results (a comprehensive database that includes VA-wide results for selected laboratory tests obtained in the clinical setting) provided information on outpatient and inpatient laboratory results. The VA Corporate Data Warehouse Production Outpatient Pharmacy domain provided information on outpatient prescriptions. The VA Vital Status and Beneficiary Identification Records Locator Subsystem files provided demographic characteristics and death.

Primary predictor variable

PPI use was the primary predictor. Once cohort participants received a PPI prescription, they were considered exposed to PPI until the end of follow-up. Medications that contain esomeprazole, lansoprazole, omeprazole, pantoprazole or rabeprazole were counted as PPI. Medications including ranitidine, cimetidine and famotidine were counted as H2 blockers.

Outcomes


The primary outcome in survival analyses was time to death. Death information is routinely collected by the Veterans Benefit Administration for all United States Veterans.

Covariates


Covariates included age, race, gender, eGFR, number of outpatient serum creatinine measurements, number of hospitalisations, diabetes mellitus, hypertension, cardiovascular disease, peripheral artery disease, cerebrovascular disease, chronic lung disease, cancer, hepatitis C, HIV, dementia and diseases associated with acid suppression therapy use such as gastro-oesophageal reflux disease (GERD), upper gastrointestinal (GI) tract bleeding, ulcer disease, Helicobacter pylori infection, Barrett's oesophagus, achalasia, stricture and oesophageal adenocarcinoma.25–28 eGFR was calculated using the abbreviated four-variable CKD epidemiology collaboration equation based on age, sex, race and outpatient serum creatinine.29 Race/ethnicity was categorised as white, black or other (Latino, Asian, Native American or other racial/ethnic minority groups). Comorbidities except for hepatitis C and HIV were assigned on the basis of relevant ICD-9-CM (the International Classification of Diseases, Ninth Revision, Clinical Modification) diagnostic and procedure codes and Current Procedural Terminology (CPT) codes in the VA Medical SAS data sets.2 30–33 Hepatitis C and HIV were assigned based on laboratory results.
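As an illustration of how eGFR can be derived from age, sex, race and serum creatinine, the sketch below assumes that the equation referred to above is the 2009 CKD-EPI creatinine equation. The coefficients are those of that published equation, not values stated in this paper, and the function name is hypothetical.

```python
def ckd_epi_2009_egfr(serum_creatinine_mg_dl, age_years, female, black):
    """eGFR in mL/min/1.73 m2, assuming the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold
    alpha = -0.329 if female else -0.411    # exponent applied below the threshold
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

# Hypothetical patient: a 50-year-old non-black man with creatinine 0.9 mg/dL.
example_egfr = ckd_epi_2009_egfr(0.9, 50, female=False, black=False)
```

Under this equation, higher creatinine or greater age lowers the estimate, which is why both enter the covariate definition.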

Baseline covariates were ascertained from 1 October 1998 until T0. All covariates except age, race and gender were treated as time-varying: in patients who did not have a PPI prescription at T0, they were additionally assessed until the date of the first PPI prescription. Any comorbidity occurring during the assessment period was considered present during the remaining follow-up. eGFR was the outpatient eGFR value within and most proximate to the end of the assessment period. Number of outpatient serum creatinine measurements and number of hospitalisations were accumulated during the assessment period.

Statistical analysis

Means, SD and t-tests are presented for normally distributed continuous variables; medians, interquartile ranges and Wilcoxon-Mann-Whitney tests are presented for non-normally distributed continuous variables; and counts, percentages and χ² tests are presented for categorical variables. Incidence rates per 100 person-years were computed for death, and CIs were estimated based on the normal distribution. The Simon and Makuch method for survival curves was used for time-dependent covariates.34
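A rate per 100 person-years with a normal-approximation CI can be sketched as follows. The paper does not give its exact formula, so the Poisson-based standard error used here is an assumption, and the counts are invented for illustration.

```python
import math

def incidence_rate_per_100py(deaths, person_years, z=1.96):
    """Rate per 100 person-years with a normal-approximation 95% CI.

    Assumes the death count is Poisson, so SE(rate) = sqrt(deaths) / person_years.
    """
    rate = deaths / person_years
    se = math.sqrt(deaths) / person_years
    return (100 * rate, 100 * (rate - z * se), 100 * (rate + z * se))

# Hypothetical figures, not from the study: 400 deaths over 10 000 person-years.
rate, lower, upper = incidence_rate_per_100py(400, 10_000)
# rate is 4.0 per 100 person-years, with a CI of roughly 3.61 to 4.39
```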

Cox regression models with time-dependent covariates were used to assess the association between PPI exposure and risk of death; in these models, patients could switch from H2 blockers to PPI. To account for a potentially delayed effect of PPI, patients were considered exposed from the first PPI prescription until the end of follow-up. In addition, time-dependent Cox models were conducted in subgroups where patients had no GI conditions and where patients had no GI conditions except for GERD and in the secondary cohorts.

Because exposure in this observational cohort is time dependent, we undertook 1:1 propensity score matching for the primary cohort where time-dependent propensity scores were calculated based on time-dependent Cox regression with all covariates35 (details are provided in online supplementary methods). After matching, all covariates except for age had an absolute standardised difference of less than 0.1, which indicated that all covariates except age were well balanced. Age had a standardised difference equal to 0.13. Doubly robust estimation was applied after matching, where all covariates were additionally controlled for in the model to obtain an unbiased effect estimator.36
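The balance check described above can be illustrated with the common definition of the absolute standardised difference for a continuous covariate. The paper does not state which formula it used, so this definition is an assumption, and the ages and SDs below are invented.

```python
import math

def absolute_standardised_difference(mean_a, sd_a, mean_b, sd_b):
    """|difference in means| divided by the pooled SD of the two groups."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd

def is_balanced(std_diff, threshold=0.1):
    """A covariate counts as well balanced when the difference is below 0.1."""
    return std_diff < threshold

# Hypothetical matched groups: mean age 62.3 vs 61.0 years, SD 10 in both.
age_difference = absolute_standardised_difference(62.3, 10.0, 61.0, 10.0)
age_balanced = is_balanced(age_difference)   # about 0.13, so not below the threshold
```

A covariate that stays above the threshold after matching, as age did here, is a reason to adjust for it again in the outcome model.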


In order to optimise control of confounding, we additionally built high-dimensional propensity score-adjusted survival models following the multistep algorithm described by Schneeweiss et al37 (details are provided in online supplementary methods). We also applied a two-stage residual inclusion estimation based on instrumental variable approach (see online supplementary methods).38

In addition, we evaluated the association between duration of PPI prescription and risk of death among new users of PPI. Duration was defined in cumulative days of use and categorised as ≤30, 31–90, 91–180, 181–360 and 361–720, where ≤30 days was considered as the reference group. To avoid immortal time bias (by definition, cohort participants must be alive to receive prescription hence introducing a bias commonly referred to as immortal time bias), time of cohort entry was defined as the date of last PPI prescription plus days’ supply.39 40 In order to ensure sufficient length of follow-up time following T0, we excluded cohort participants with cumulative duration of exposure exceeding 720 days (because of limited overall cohort timeline, and because T0 starts at the end of last prescription, those with long exposure will necessarily have limited follow-up time). In regression analyses, a 95% CI of an HR that does not include unity was considered statistically significant. All analyses were performed using SAS Enterprise Guide version 7.1.
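The duration categories and the cohort entry rule described above can be sketched as two small helpers. This is a simplified illustration with hypothetical function names, not the authors' SAS code.

```python
from datetime import date, timedelta

# Upper bound (cumulative days of use) and label for each category;
# the first category is the reference group.
DURATION_BINS = [
    (30, "<=30"),
    (90, "31-90"),
    (180, "91-180"),
    (360, "181-360"),
    (720, "361-720"),
]

def duration_category(cumulative_days):
    """Return the exposure category, or None when the participant is excluded (>720 days)."""
    for upper_bound, label in DURATION_BINS:
        if cumulative_days <= upper_bound:
            return label
    return None

def cohort_entry_date(last_prescription_date, days_supply):
    """Time of cohort entry: date of last PPI prescription plus days' supply."""
    return last_prescription_date + timedelta(days=days_supply)

# Hypothetical participant: last prescription on 1 January 2007 with a 30-day supply.
entry = cohort_entry_date(date(2007, 1, 1), 30)   # 31 January 2007
```

Starting follow-up only once exposure has ended means no participant is credited with survival time that was needed just to accumulate that exposure.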

Sensitivity analyses

In order to further evaluate the consistency and robustness of study findings, we examined the observed associations in a less contemporary cohort (dating back to an era where PPI prescription and use were far less frequent) of patients without acid suppression therapy prescriptions between 1 October 1998 and 30 September 2000 (washout period) and with acid suppression therapy prescription between 1 October 2000 and 30 September 2002 and at least one outpatient serum creatinine value before that. Patients in this cohort were followed till 30 September 2007 or death. To examine the impact of potential residual confounding on study results, we conducted additional sensitivity analyses as described by Schneeweiss41: (a) we used the rule-out approach to identify the strength of the residual confounding that could fully explain the association observed in primary analyses, and (b) we applied an external adjustment approach using external information (prevalence and risk estimates from published literature) to evaluate potential net confounding bias due to unmeasured confounders.2 41–44 Methods are described elegantly by Schneeweiss.41 In addition, to remove death events that were less likely to be related to PPI exposure, we excluded cohort participants who died within 90 days after the first PPI or H2 blocker prescription.

We conducted analyses based on a three-level classification of exposure, where a patient's status at time t could be current use (using PPI or finished last PPI prescription within 90 days before t), past use (used PPI after T0 but finished more than 90 days before t) and never use. We conducted additional sensitivity analyses, which included haemoglobin as a covariate in cohort participants with available data. We also undertook analyses that stratified the cohort based on cardiovascular disease, history of pneumonia, CKD (eGFR <60 and ≥60 mL/min/1.73 m²) or age (<65 and ≥65 years old) at T0. Finally, and in order to ascertain the specificity of the findings, we examined the association between PPI exposure and the risk of a motor vehicle accident as a tracer outcome where a priori knowledge suggests an association is not likely to exist.
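The three-level exposure classification at the start of this paragraph can be sketched as a single function. It is a simplification that treats PPI use as one interval from the first prescription to the end of the last one observed so far, and the function and argument names are hypothetical.

```python
from datetime import date

def exposure_status(t, first_ppi_start, last_ppi_end, grace_days=90):
    """Classify PPI exposure at date t as 'never', 'current' or 'past'.

    first_ppi_start and last_ppi_end are the start of the first and the end of
    the last PPI prescription observed up to t, or None if there were none.
    """
    if first_ppi_start is None or t < first_ppi_start:
        return "never"
    # Still on PPI (negative gap) or finished within the 90-day window.
    if (t - last_ppi_end).days <= grace_days:
        return "current"
    return "past"

# Hypothetical patient: PPI from 1 January 2007 to 1 March 2007.
start, end = date(2007, 1, 1), date(2007, 3, 1)
status_april = exposure_status(date(2007, 4, 1), start, end)      # "current"
status_december = exposure_status(date(2007, 12, 1), start, end)  # "past"
```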

Patient involvement

No patients were involved in developing the hypothesis, the specific aims or the research questions, nor were they involved in developing plans for design or implementation of the study. No patients were involved in the interpretation of study results or write up of the manuscript. There are no plans to disseminate the results of the research to study participants or the relevant patient community.


The demographic and health characteristics of the overall primary cohort of new users of acid suppression therapy (n=349 312), by type of acid suppressant drug at time of cohort entry (H2 blockers n=73 335; PPI n=275 977), and those who were ever exposed to PPI (n=309 113) are provided in table 1. There were significant baseline differences in that cohort participants who were treated with PPI were older and were more likely to have comorbid conditions, including diabetes, hypertension, cardiovascular disease and hyperlipidaemia. Cohort participants treated with PPI were also more likely to have upper GI tract bleeding, ulcer disease, H. pylori infection, Barrett's oesophagus, achalasia, stricture and oesophageal adenocarcinoma (table 1). Survival curves for PPI and H2 blockers are presented in figure 1.

Figure 1

Survival curves for PPI and H2 blockers. PPI, proton pump inhibitor.

Association between PPI use and risk of death

Among new users of acid suppression therapy (n=349 312), and over a median follow-up of 5.71 years (IQR 5.11–6.37), where exposure was treated as a time-dependent covariate, PPI use was associated with increased risk of death compared with H2 blockers use (HR 1.25, CI 1.23 to 1.28) (table 2). Among new users of acid suppression therapy (n=349 312), in high-dimensional propensity score-adjusted models, new PPI users had increased risk of death compared with new users of H2 blockers (HR 1.16, CI 1.13 to 1.18); based on two-stage residual inclusion estimation, risk of death was higher in new PPI users when compared with new users of H2 blockers (HR 1.21, CI 1.16 to 1.26). In a 1:1 time-dependent propensity score-matched cohort of new users of PPI and H2 blockers (n=146 670), PPI users had significantly increased risk of death (HR 1.34, CI 1.29 to 1.39).

We examined the relationship of PPI and risk of death in secondary cohorts (as described in the Methods section) where we considered risk associated with PPI use versus no known exposure to PPI (no PPI use ±H2 blockers use) (n=3 288 092); the results suggest that PPI use was associated with increased risk of death (HR 1.15, CI 1.14 to 1.15) (table 2). Assessment of risk of death associated with PPI use versus no known exposure to any acid suppression therapy (no PPI use and no H2 blockers use) (n=2 887 070) suggests increased risk of death with PPI use (HR 1.23, CI 1.22 to 1.24).

Table 2

Association between PPI use and risk of death

Association between PPI use and risk of death in those without GI conditions

We then analysed the association between PPI use and risk of death in a cohort that excluded participants with documented medical conditions generally considered as indications for treatment with PPI, including GERD, upper GI tract bleeding, ulcer disease, H. pylori infection, Barrett's oesophagus, achalasia, stricture and oesophageal adenocarcinoma. The intent of this analysis was to examine the putative association of PPI use and risk of death in a lower risk cohort. Examination of risk of death associated with use of acid suppression therapy (PPI vs H2 blockers) suggests that risk of death was increased with PPI use (HR 1.24, CI 1.21 to 1.27) (table 2). Examination of the risk of death associated with PPI use versus no known exposure to PPI (no PPI use ±H2 blockers use) suggests a higher risk of death associated with PPI use (HR 1.19, CI 1.18 to 1.20). Results were consistent where we examined risk of death associated with PPI use versus no known exposure to any acid suppression therapy (no PPI use and no H2 blockers use) (HR 1.22, CI 1.21 to 1.23). Analyses in cohort participants without GI conditions, but including participants with GERD, yielded consistent results (PPI vs H2 blockers: HR 1.24, CI 1.21 to 1.27; PPI vs no PPI: HR 1.14, CI 1.13 to 1.14; PPI vs no PPI and no H2 blockers: HR 1.22, CI 1.21 to 1.22) (table 2).

Duration of exposure and excess risk of death

We examined the association between duration of PPI exposure and risk of death among new users of PPI (n=166 098). Compared with those exposed for ≤30 days, there was a graded association between duration of exposure and risk of death among those exposed for 31–90, 91–180, 181–360 and 361–720 days (table 3, figure 2).
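As a small illustration of the duration categories listed above, the following sketch bins cumulative days of exposure into those bands. The function name and labels are hypothetical, and because the text does not list a band beyond 720 days, the sketch returns `None` for longer durations instead of guessing one.

```python
from bisect import bisect_left

# Upper bounds (in days) of the exposure bands named in the text.
UPPER_BOUNDS = [30, 90, 180, 360, 720]
LABELS = ["<=30", "31-90", "91-180", "181-360", "361-720"]


def duration_category(days):
    """Map cumulative days of PPI exposure to its band (reference: <=30)."""
    if days < 0:
        raise ValueError("days of exposure cannot be negative")
    i = bisect_left(UPPER_BOUNDS, days)
    # Durations above 720 days are not described in the text.
    return LABELS[i] if i < len(LABELS) else None


for d in (30, 31, 90, 91, 720, 721):
    print(d, duration_category(d))
```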

Figure 2

Duration of PPI exposure and risk of death among new PPI users (n=166 098). PPI, proton pump inhibitor.

Table 3

Duration of exposure to PPI and risk of death among new users of PPI (n=166 098)

Table 1

Baseline demographic and health characteristics of overall primary cohort of new users of acid suppression therapy, by type of acid suppressant at the time of cohort entry, and those who were ever exposed to PPI

Sensitivity analyses

We tested the robustness of study results in sensitivity analyses where we built a less contemporary cohort as described in the Methods section; demographic and health characteristics of this cohort are provided in online supplementary table 1. Where exposure was treated as time dependent, PPI use was associated with increased risk of death compared with H2 blockers use (HR 1.17, CI 1.15 to 1.19). In a 1:1 time-dependent propensity score-matched cohort of PPI and H2 blockers, PPI users had significantly increased risk of death (HR 1.21, CI 1.19 to 1.24). We also observed a graded association between cumulative duration of exposure to PPI and risk of death (see online supplementary table 2 and online supplementary figure 3).

To examine the potential impact of residual confounding on study results, we used rule-out and external adjustment approaches as described by Schneeweiss.41 Using the rule-out approach, we characterised a set of parameters (OR for relationship of PPI and confounder and HR for relationship of confounder and death) with sufficient strength to fully explain the association observed in primary analyses (see online supplementary figure 4). For example, if the confounder was two times as likely among PPI users (OR=2), and the HR of death associated with the uncontrolled confounder exceeded 4.0, then the uncontrolled confounder would fully explain the observed association between PPI and death (see online supplementary figure 4). Our analyses accounted for most known strong independent risk factors of death and employed an active comparator group. To nullify the results, an uncontrolled confounder of the required prevalence (OR 2 or more in the example above) and strength (HR 4 or more in the example above) would also have to be independent of the confounders already adjusted for. Such a confounder is unlikely to exist; thus, the results cannot be fully explained by this putative uncontrolled confounder.
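The logic of a rule-out analysis can be sketched with the classical bias formula for a single binary unmeasured confounder, treating the HR as a risk ratio. This is an illustrative sketch, not the study's code: the function names are hypothetical, and the confounder prevalence among the comparator group (0.2) is an assumed input that the text does not report.

```python
def prevalence_in_exposed(p_c0, or_ec):
    """Confounder prevalence among the exposed, given its prevalence among
    the unexposed (p_c0) and the exposure-confounder odds ratio (or_ec)."""
    odds1 = or_ec * p_c0 / (1 - p_c0)
    return odds1 / (1 + odds1)


def bias_factor(p_c1, p_c0, rr_cd):
    """Multiplicative bias in the apparent RR caused by the confounder."""
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def min_rr_to_explain(observed_rr, p_c0, or_ec):
    """Smallest confounder-outcome RR whose bias factor equals observed_rr,
    i.e. that would fully explain the observed association."""
    p_c1 = prevalence_in_exposed(p_c0, or_ec)
    gap = p_c1 - observed_rr * p_c0
    if gap <= 0:
        return float("inf")  # no confounder strength is sufficient
    return 1 + (observed_rr - 1) / gap


# Assumed prevalence of 0.2; OR=2 and observed HR 1.25 come from the text.
print(min_rr_to_explain(observed_rr=1.25, p_c0=0.2, or_ec=2.0))
```

Under these assumed inputs the threshold comes out at about 4, in line with the worked example in the text, although the prevalence actually used in the study may differ.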

External adjustment to estimate the impact of three unmeasured confounders (obesity, smoking and use of therapeutics, including anticoagulants, antiplatelet agents and non-steroidal anti-inflammatory drugs) shows a net confounding bias of 9.66% (see online supplementary figure 5). The total bias could move a null association between PPI and death from HR 1.00 to HR 1.10 (reflecting the net positive bias of 9.66% rounded up to 10.0%). The observed HR of 1.25 exceeds 1.10 and therefore cannot be fully attributed to bias from unmeasured confounding.
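The arithmetic behind this comparison can be written out as a short sketch. It assumes the net bias acts multiplicatively on the hazard ratio; the bias-adjusted estimate it prints is an illustration under that assumption and is not a figure reported in the study.

```python
NET_BIAS_PCT = 9.66   # net confounding bias from external adjustment (text)
OBSERVED_HR = 1.25    # primary PPI vs H2 blocker estimate (text)

bias_multiplier = 1 + NET_BIAS_PCT / 100

# A truly null association (HR 1.00) would appear as this apparent HR.
apparent_null_hr = 1.00 * bias_multiplier

# Illustrative only: the observed HR with the assumed bias divided out.
adjusted_hr = OBSERVED_HR / bias_multiplier

print(f"apparent HR under the null: {apparent_null_hr:.4f}")
print(f"illustrative bias-adjusted HR: {adjusted_hr:.2f}")
print("bias alone explains the finding:", OBSERVED_HR <= apparent_null_hr)
```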

In analyses where time-dependent exposure was classified as current use (within 90 days), past use (use prior to 90 days) and never use of PPI, compared with use of H2 blockers and never use of PPI (the reference group), current use of PPI and past use of PPI were associated with increased risk of death (HR 1.23, CI 1.21 to 1.26, and HR 1.53, CI 1.50 to 1.57, respectively).

The association between PPI and death remained significant after excluding cohort participants who died within 90 days after the first PPI or H2 blocker prescription (HR 1.23, CI 1.20 to 1.26), or additionally controlling for haemoglobin levels (HR 1.25, CI 1.23 to 1.28). In models stratified for the presence of cardiovascular disease, history of pneumonia, CKD and age at T0, there was increased risk of death associated with PPI use in those with and without cardiovascular disease (HR 1.19, CI 1.15 to 1.23, and HR 1.30, CI 1.27 to 1.34, respectively), with and without history of pneumonia (HR 1.39, CI 1.32 to 1.45, and HR 1.21, CI 1.18 to 1.24, respectively), with and without CKD (HR 1.18, CI 1.14 to 1.22, and HR 1.29, CI 1.26 to 1.33, respectively) and above and below age 65 years (HR 1.17, CI 1.13 to 1.20, and HR 1.44, CI 1.39 to 1.50, respectively). As a test of specificity, among users of acid suppression therapy, PPI use was not associated with increased risk of the tracer outcome of a motor vehicle accident (HR 0.99, CI 0.89 to 1.10).


This study provides insights into the excess risk of death associated with PPI use. In a large primary cohort of new users of acid suppression therapy followed for a median of 5.71 years, we show a significant association between PPI use and risk of all-cause mortality. Risk was increased among those with no documented medical indications for PPI use and with prolonged duration of use. The results were consistent in multiple analyses and robust to changes in epidemiological design and statistical specifications, and were reproduced in an earlier and less contemporary cohort from an era where PPI use was far less frequent.45

PPI are widely used by millions of people for indications and durations that were never tested or approved; they are available over the counter (without prescription) in several countries and are generally perceived as a safe class of therapeutics. They are often overprescribed, rarely deprescribed and frequently started inappropriately during a hospital stay, with their use then extended long term without appropriate medical indication.46–50 Results of nationally representative data from the National Health and Nutrition Examination Survey, where analyses were weighted to represent the US adult population, showed that the use of prescription PPI increased from 3.9% to 7.8% from 1999–2000 to 2011–2012, representing a doubling of prevalence.45 Studies estimate that between 53% and 69% of PPI prescriptions are for inappropriate indications46 51 where benefits of PPI use may not justify the risks for many users.51–53 The findings in our study highlight a potential excess risk of death among users of PPI, in particular among cohort participants without GI comorbidities, and show that risk is increased with prolonged duration of PPI exposure. Although our results should not deter prescription and use of PPI where medically indicated, they may be used to encourage and promote pharmacovigilance and emphasise the need to exercise judicious use of PPI and limit use and duration of therapy to instances where there is a clear medical indication and where benefit outweighs potential risk.1 Standardised guidelines for initiating PPI prescription may lead to reduced overuse,54 and regular review of prescription and over-the-counter medications, with deprescription where a medical indication for PPI treatment ceases to exist, may be a meritorious approach.52

The biologic mechanism underpinning the association of PPI use and risk of death is not clear. Experimental evidence in rats suggests that PPI administration limits the regenerative capacity of livers following partial hepatectomy.55 Administration of PPI upregulates expression of mRNA and protein level and results in increased activity of the heme oxygenase-1 enzyme in gastric and endothelial cells.56 Heme oxygenase-1 is generally seen as salutary, but its beneficial properties are vitiated at higher doses, and with sustained duration of expression.57 PPI treatment impairs lysosomal acidification and proteostasis and results in increased oxidative stress, dysfunction, telomere shortening and accelerated senescence of human endothelial cells.18 58 Wu and collaborators undertook a systematic toxicity mechanism analysis using a high-throughput in silico analysis of microarray data; they reported that PPI upregulated genes in the cellular retinol metabolism pathway and downregulated genes in the complement and coagulation cascades pathway, and that PPI may block pathways of antigen presentation and abrogate the synthesis and secretion of cytokines and complement component proteins and coagulation factors.58 59 How the changes in gene expression contribute to excess risk of death is not yet entirely clear. The plausible clinical course leading to heightened risk of death is likely mediated by the occurrence of one or more of the adverse events associated with PPI use (kidney disease, dementia, hypomagnesemia, C. difficile infection, osteoporotic fracture and so on). Further studies are needed to characterise the biologic mechanisms that might explain the epidemiological findings in this report.

The constellation of findings in this report must be interpreted with full cognizance of the observational study design, where confounding by indication and selection bias may represent limitations. We employed an analytic strategy to evaluate the risk of death among users of acid suppression therapy (PPI and H2 blockers), a class of therapeutics generally prescribed for similar indications, a strategy that may lessen but does not completely eliminate the possibility of confounding by indication. We additionally built a time-dependent propensity score-matched cohort and high-dimensional propensity score-adjusted models, and we used an instrumental variable approach to reduce potential confounding bias. Although we accounted for known covariates in our analyses, it is possible that there are residual confounders (either unmeasured or unknown) that may still confound the association of PPI and risk of death. However, we evaluated the impact of residual confounding in quantitative bias analyses, and the results suggest that even with the application of an unlikely (and exaggerated) set of assumptions, the risk cannot be fully explained by residual confounding. In our analyses, we defined drug exposure as having a prescription for it. Because PPIs (and H2 blockers) are available over the counter in the USA, it is possible that some patients in this cohort may have obtained and used PPI without prescription. However, owing to financial considerations, this is not highly likely, and if it occurred in some patients, it will have biased the results against the primary hypothesis and resulted in underestimation of risk. The cohort included mostly older white male US veterans, which may limit the generalisability of study results to a broader population. Our data sets did not include information on the cause of death.

The study has a number of strengths, including the use of national large-scale data from a network of integrated health systems; the data were captured during routine medical care, which minimises selection bias. We employed a new user (incident user) approach and evaluated the association between PPI use and risk of death using a number of analytical approaches where we consistently found a significant association between PPI use and increased risk of death. The consistency of study findings in our report and the growing body of evidence in the literature showing a host of adverse events associated with PPI use are compelling, and because of the high prevalence of PPI use, these findings may have public health implications. Exercising pharmacovigilance and limiting PPI use to instances and durations where it is medically indicated may be warranted.


