Background

Immune checkpoint blockers (ICBs), specifically antibodies against programmed cell death-1 (PD-1), programmed cell death ligand-1 (PD-L1), and cytotoxic T-lymphocyte-associated antigen-4 (CTLA-4), have been approved for use in several cancer types and have demonstrated improved overall survival (OS) compared with standard therapy (AstraZeneca Pharmaceuticals 2017; Bristol-Myers Squibb 2013; Bristol-Myers Squibb 2017; Genentech 2017; Merck and Company Inc. 2017; Pfizer 2017). The US Food and Drug Administration (FDA) supports the use of surrogate endpoints in oncology trials, especially for accelerated approvals (Johnson et al. 2003). In fact, several chemotherapies have relied on response endpoints, such as objective response rate (ORR) with or without duration data, as the basis for regular approvals (Johnson et al. 2003).

Although improvement in OS is generally the most desirable endpoint in oncology clinical trials, there is ongoing interest in identifying and validating surrogate endpoints that can better predict OS, to improve the design of clinical studies and potentially expedite the approval of novel agents (Booth and Eisenhauer 2012; Foster et al. 2011; Kemp and Prasad 2017). Correlations between OS and endpoints such as progression-free survival (PFS), ORR, disease control rate (DCR), and time to progression have been investigated, but not thoroughly for ICBs (Flaherty et al. 2014; Prasad et al. 2015).

Endpoints such as ORR and PFS traditionally used to evaluate the effect of drugs that act directly on the tumor may not be the most appropriate for ICBs that are characterized by a very different mechanism of action. Unlike other cancer therapies that act directly on tumor cells, ICBs act indirectly by enhancing antitumor immune responses and eliciting lymphocyte infiltration into the tumor, thereby frequently resulting in an initial tumor enlargement of varying degrees depending on the tumor type, and possibly the appearance of new lesions, with subsequent reduction in tumor size and number of lesions mediated by ongoing immunologic mechanisms (Hersh et al. 2011; Hodi et al. 2016b; Seymour et al. 2017; Wolchok et al. 2009).

Among several randomized trials investigating ICBs vs standard therapies in patients with non-small-cell lung cancer (NSCLC), renal-cell carcinoma (RCC), and head and neck squamous-cell carcinoma (HNSCC) using the conventional response assessment tool, Response Evaluation Criteria in Solid Tumors (RECIST), it was found that, whereas PFS was similar between ICBs and standard treatment, OS associated with ICBs was statistically superior (Borghaei et al. 2015; Ferris et al. 2016; Herbst et al. 2016; Motzer et al. 2015; Rittmeyer et al. 2017). Only a few randomized trials testing ICBs vs chemotherapy in patients with melanoma and NSCLC have shown an OS benefit associated with ICBs in conjunction with both ORR and PFS benefit, as evaluated by standard RECIST criteria (Reck et al. 2016; Robert et al. 2015a). Furthermore, in a study investigating pembrolizumab in patients with melanoma, response patterns were evaluated using both RECIST and an alternative response assessment tool, called immune-related response criteria (irRC) (Hodi et al. 2016b). RECIST did not appear to capture the true benefit associated with ICBs, and it was suggested that use of irRC may prevent premature discontinuation of ICB therapy.

Relationships between clinical endpoints may vary with the use of single-agent ICBs or combination ICBs, and with the tumor type being treated. For example, in a recent study conducted in patients with RCC treated with ipilimumab/nivolumab combination therapy vs sunitinib, the combination regimen provided a superior OS benefit, but failed to provide a statistically significant PFS benefit over targeted therapy in either the intent-to-treat (ITT) population or the patients with intermediate/poor risk. Although the PFS curves started to diverge in favor of ipilimumab/nivolumab after the 6-month timepoint, this trend was not statistically significant (Escudier et al. 2017). In contrast, in a study conducted in patients with NSCLC with a high PD-L1 expression level (tumor proportion score ≥ 50%) treated with pembrolizumab monotherapy vs chemotherapy, the single-agent ICB provided both PFS and OS benefit over chemotherapy (Reck et al. 2016).

The aim of this systematic literature review and meta-analysis was to assess whether any of these endpoints that are typically used in cancer studies could function as surrogates for OS in studies involving ICBs. We identified relevant randomized controlled trials (RCTs) of ICBs over the past 12 years and analyzed both arm- and comparison-level data to explore the relationship between OS and clinical endpoints (ORR, DCR, and PFS) in patients with solid tumors treated with single-agent ICBs or ICBs in combination with chemotherapy, compared with patients treated with chemotherapy.

Methods

Literature selection

A systematic literature review was conducted, using Medline, Embase, and CENTRAL (indexed databases) to identify RCTs published between January 2005 and March 2017. Congress proceedings from the American Society of Clinical Oncology, the European Society for Medical Oncology, the American Head and Neck Society, the European Lung Cancer Conference, and the Society for Melanoma Research published between 2014 and 2016 were also searched, as well as 2 clinical trials registries (clinicaltrials.gov and clinicaltrialsregister.eu). Studies that assessed the efficacy of agents targeting PD-1 (nivolumab, pembrolizumab, pidilizumab, MEDI0680, REGN2810, PDR001), PD-L1 (atezolizumab, avelumab, durvalumab), or CTLA-4 (ipilimumab, tremelimumab) in adult patients with melanoma, NSCLC, HNSCC, RCC, or urothelial carcinoma (UC) were selected. The detailed search strategies for this analysis are included in Tables 1 and 2.

Table 1 Search terms and yield
Table 2 Conference proceedings’ search strategy

Publications were initially screened by title and abstract by a single investigator, with 10% of the selections reviewed by a second investigator and discrepancies resolved by consensus or by a third investigator. Once selected as relevant, the full-text articles of the publications were reviewed. For inclusion in this analysis, the study had to report OS in addition to at least one other clinical endpoint [i.e., PFS, ORR, or DCR (complete response + partial response + stable disease), per RECIST or modified World Health Organization (WHO) criteria] as determined by review of the full article.

Studies were excluded if the investigation focused on another class of immunotherapy, such as a vaccine or cytokine-based agents, or if the ICB was delivered concurrently with radiotherapy and/or surgery, or if non-pharmacologic interventions were used as comparators. To minimize risk of bias, case studies, case series, and case reports were excluded from the analysis in favor of RCTs.

Data source

In the arm-level analysis, studies were included if each treatment arm’s absolute effects were reported or could be derived for ORR, DCR, 6- and 9-month PFS, median PFS, median OS, or OS at 12 or 18 months. For the comparison-level analyses, studies were included if the treatments’ relative effects [odds ratios (ORs) on ORR and DCR, or hazard ratios (HRs) on PFS and OS] were reported or could be derived. When HRs for PFS/OS or PFS/OS rates at specific timepoints were unavailable, the Kaplan–Meier graphs were digitized to manually calculate this information (Guyot et al. 2012; Hoyle and Henley 2011). Likewise, when ORs for ORR/DCR were unavailable, these values were calculated.
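As an illustration of reading a landmark rate off a digitized Kaplan–Meier curve, the following minimal Python sketch treats the digitized points as a right-continuous step function. It is only a simple read-off under stated assumptions, not the reconstruction algorithm of Guyot et al. (2012) or Hoyle and Henley (2011); the function name `km_rate_at` and the data points are hypothetical and do not come from any included trial.

```python
from bisect import bisect_right


def km_rate_at(times, survival, t):
    """Return the survival probability at time t from a digitized KM step curve.

    times:    strictly increasing times (months) at which the curve drops
    survival: survival probability immediately after each drop
    """
    if len(times) != len(survival):
        raise ValueError("times and survival must have equal length")
    idx = bisect_right(times, t)  # number of drops at or before t
    # Before the first drop, the curve is still at 1.0
    return 1.0 if idx == 0 else survival[idx - 1]


# Hypothetical digitized points (illustration only)
times = [1.5, 3.0, 5.5, 7.0, 10.0]
surv = [0.90, 0.72, 0.55, 0.48, 0.40]

pfs_6m = km_rate_at(times, surv, 6.0)  # last drop at or before 6 months is 5.5
pfs_9m = km_rate_at(times, surv, 9.0)  # last drop at or before 9 months is 7.0
print(pfs_6m, pfs_9m)
```

A step-function lookup is used rather than linear interpolation because a Kaplan–Meier estimate is constant between observed events.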

Relevant data were directly extracted from studies as reported. Arm-level data for each of the outcomes were extracted, including the number of patients with the event, the number of patients evaluated for the event, and the ITT or modified ITT results. Comparison-level data comparing the 2 treatments on any of the outcomes were also extracted, including the relative effect estimate (OR or HR).
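When only arm-level counts are available, an OR for ORR or DCR can be derived from the number of patients with the event and the number evaluated in each arm. The sketch below shows one conventional way to do this, with a Woolf (log-scale) 95% confidence interval; the source does not state which formula was used, so this is an assumption, and the function name `odds_ratio` and the example counts are hypothetical.

```python
import math


def odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Odds ratio (treatment vs control) with a Woolf 95% confidence interval."""
    a, b = events_trt, n_trt - events_trt  # treatment: events, non-events
    c, d = events_ctl, n_ctl - events_ctl  # control: events, non-events
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive; "
                         "a continuity correction would be needed otherwise")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper


# Hypothetical counts: 40/200 responders on treatment vs 20/200 on control
or_, lower, upper = odds_ratio(40, 200, 20, 200)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For DCR, the event count would be the sum of complete responses, partial responses, and stable disease in each arm.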

Data from full publications were extracted by one investigator. Data presented at congresses for the same study were reviewed and any unique, additional data were identified and captured. Data extraction was independently validated by a second investigator, and a third investigator was consulted to resolve disagreements, if necessary.

Arm-level correlative analysis provides initial insights into the absolute effect of a therapy on any given endpoint, although this analysis is limited by the inherent association between the selected candidate surrogates and OS in the same treatment arm, and is likely to be confounded by variations in baseline patient characteristics across different studies (Prasad et al. 2015). For example, a study with a patient population with poorer performance status is more likely to have both lower PFS and OS at any given timepoint than a study with healthier patients. Thus, a strong correlation identified by arm-level analysis can be an artifact, to some degree, of differences in patient populations across studies. Correlations found with arm-level analysis are less reliable compared with correlations found using data from multiple trials (comparison-level/trial-level correlative analysis) (Prasad et al. 2015). Comparison-level/trial-level correlative analysis provides insights into the relative effect of a therapy on a given endpoint, establishing the most reliable surrogates (Prasad et al. 2015). However, if a variable is a treatment-effect modifier, such that the relative effect is greatly dependent on factors that vary across studies, findings obtained with comparison-level analysis may not be reliable or generalizable.

Statistical analysis

A pooled analysis was conducted for single-agent ICBs and ICBs in combination with chemotherapy vs chemotherapy. A separate analysis was also conducted for studies including only ICBs as single agents vs chemotherapy. Weighted linear regression models were fitted with adjusted R2 values calculated to estimate the total amount of variation explainable by the predictor. Unlike standard R2 values, the adjusted R2 values account for the number of predictors in the model, and the closer the adjusted R2 is to 1, the stronger the correlation. We stratified analyses by treatment regimen (single-agent ICB or ICB in combination with chemotherapy), by type of ICB (PD-1/PD-L1 or CTLA-4), and by indication, where data permitted. For both levels of analysis, regression scatter plots were used to present results. In the arm-level analyses, we evaluated the predictive values of PFS rate at 6 months and at 9 months relative to the OS rate at 18 months for all studies (various solid tumors) with a separate analysis limited to NSCLC-only studies. For each of these investigations, subanalyses were conducted to assess effects linked to type of therapy (single-agent ICBs vs both single-agent ICBs plus ICBs in combination with chemotherapy) and class of ICB (PD-1/PD-L1 vs CTLA-4). In the comparison-level analyses, we evaluated whether the ORR OR and the DCR OR could predict the OS HR in the pooled analysis as well as in the single-agent ICB analysis. In addition, we evaluated whether the PFS HR at 6 months could predict the OS HR in the pooled analysis as well as in the single-agent ICB analysis. In this analysis, data from patients treated with a combination of 2 ICBs were not included.
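To make the regression step concrete, the following Python sketch fits a weighted least-squares line with a single predictor and reports both R2 and adjusted R2. The source does not state what the weights were; weighting each point by trial sample size is an assumption made here for illustration, and the function name `weighted_linear_fit` and all data values are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r2, adj_r2) for a single-predictor model.
    """
    n = len(x)
    if not (n == len(y) == len(w)) or n < 3:
        raise ValueError("need at least 3 points with matching weights")
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    syy = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ss_res = sum(wi * (yi - intercept - slope * xi) ** 2
                 for wi, xi, yi in zip(w, x, y))
    r2 = 1.0 - ss_res / syy
    p = 1  # number of predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return slope, intercept, r2, adj_r2


# Hypothetical trial-level data: PFS HR (x), OS HR (y), sample size (weight)
pfs_hr = [0.60, 0.75, 0.90, 1.00, 1.10]
os_hr = [0.62, 0.70, 0.74, 0.80, 0.95]
n_patients = [300, 450, 600, 250, 500]
slope, intercept, r2, adj_r2 = weighted_linear_fit(pfs_hr, os_hr, n_patients)
print(f"slope={slope:.3f} intercept={intercept:.3f} "
      f"R2={r2:.3f} adjusted R2={adj_r2:.3f}")
```

Because the adjustment penalizes the number of predictors relative to the number of data points, adjusted R2 is always at most R2 and can fall below zero when the fit is poor and the number of trials is small.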

To determine surrogacy, we used the following threshold criteria (Kemp and Prasad 2017; Validity of surrogate endpoints in oncology Executive summary of rapid report A10-05, Version 1.1 2005): low correlation was indicated by r ≤ 0.7 (corresponding to R2 ≤ 0.49), medium strength correlation was indicated by r > 0.7 to r < 0.85 (corresponding to R2 > 0.49 to R2 < 0.72), and high correlation was indicated by r ≥ 0.85 (corresponding to R2 ≥ 0.72).
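The thresholds above can be expressed as a small classification rule. The Python sketch below applies the R2 cut-offs exactly as stated in the text (0.49 and 0.72, the latter being 0.85 squared, 0.7225, rounded down); the function name `correlation_strength` is hypothetical, and the values passed to it are the adjusted R2 values reported in the Results.

```python
def correlation_strength(r2):
    """Classify an (adjusted) R2 value using the thresholds stated in the text.

    low:    R2 <= 0.49
    medium: 0.49 < R2 < 0.72
    high:   R2 >= 0.72
    """
    if r2 >= 0.72:
        return "high"
    if r2 > 0.49:
        return "medium"
    return "low"  # also covers negative adjusted R2 values


# Adjusted R2 values reported in the Results
reported = {
    "ORR OR vs OS HR, pooled": -0.069,
    "DCR OR vs OS HR, pooled": 0.271,
    "PFS HR vs OS HR, pooled": 0.366,
    "PFS HR vs OS HR, single-agent": 0.452,
    "6-month PFS HR vs OS HR, single-agent": 0.907,
}
for label, value in reported.items():
    print(f"{label}: {correlation_strength(value)}")
```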

Results

We identified 32 publications that met the inclusion criteria for a total of 27 RCTs involving 61 treatment arms and 10,300 patients (Fig. 1; Table 3). Most of these studies were conducted in patients with melanoma (52%; 14/27), followed by NSCLC (33%; 9/27); 2 trials were conducted in patients with UC, one study in patients with RCC, and one study in patients with HNSCC. Most studies (59%; 16/27) evaluated the efficacy of ICBs vs chemotherapy, and among these, 81% (13/16) investigated single-agent ICBs; 11% (3/27) of studies investigated ICBs in combination with chemotherapy. Ipilimumab monotherapy and nivolumab monotherapy were investigated in eight studies each (30%), pembrolizumab monotherapy in five studies (19%), atezolizumab monotherapy in two studies, and tremelimumab monotherapy in one study. Ipilimumab was tested as part of a combination regimen or a sequential therapy approach in nine studies, nivolumab-based combination regimens were assessed in two studies, and pembrolizumab-based combination regimens were assessed in one study. The analysis plan included assessments that stratified by PD-L1 expression of patient tumor samples, and while 48% (13/27) of studies reported PD-L1 expression status, the testing methods and thresholds for PD-L1 expression status were not uniform, thereby precluding a meaningful stratified analysis. From digitized Kaplan–Meier curves, 24 arms of virtual arm-level data were generated for PFS HRs at 6 and 9 months. Rates for PFS at 6 and 9 months and OS at 12 and 18 months were calculated from Kaplan–Meier curves of 11 RCTs.

Fig. 1

PRISMA flow diagram. Graphical representation of the flow of citations reviewed in the course of this systematic review, including number of records identified, included and excluded, and the reasons for exclusions. *Melanoma: CA184-022 (NCT00289640), CA184-025 (NCT00162123), CA184-004 (NCT00261365), CA184-013 (NCT00050102), E1608 (NCT01134614), NCT01740297, CheckMate 069 (NCT01927419), CheckMate 064 (NCT01783938), KEYNOTE-002 (NCT01704287), CA184-024 (NCT00324155), NCT00257205, CheckMate 066 (NCT01721772), KEYNOTE-006 (NCT01866319), CheckMate 037 (NCT01721746). NSCLC: CA184-041 (NCT00527735), POPLAR (NCT01903993), KEYNOTE-021 (NCT02039674), KEYNOTE-010 (NCT01905657), CheckMate 017 (NCT01642004), CheckMate 057 (NCT01673867), CheckMate 026 (NCT02041533), KEYNOTE-024 (NCT02142738), OAK (NCT02008227). UC: CheckMate 032 (NCT01928394), KEYNOTE-045 (NCT02256436). RCC: CA209-010 (NCT01354431). HNSCC: CheckMate 141 (NCT02105636). HNSCC head and neck squamous cell carcinoma, MA meta-analysis, NSCLC non-small cell lung cancer, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, RCC renal cell carcinoma, SLR systematic literature review, UC urothelial carcinoma

Table 3 RCTs and related publications included in meta-analysis

Overall, the relationship between absolute effects for ICBs was similar to chemotherapy in that higher PFS rates at 6 months predicted higher OS rates at 18 months (Fig. 2). However, compared with chemotherapy arms, the ICB arms had a higher average OS rate for any given PFS rate (Fig. 2). The relationships between variables were the same in the pooled and in the single-agent ICB analyses (Figs. 2a, 3a). Results by type of ICB revealed stronger correlations for PD-1/PD-L1 than for CTLA-4 between potential surrogates and OS (Fig. 2c). Although there were very few anti-CTLA-4 studies, some analyses suggest that there may be a lower or nonexistent relationship at the absolute level between surrogates and OS in studies investigating antibodies against CTLA-4. This would need to be confirmed with a larger sample size. The relationship between PFS rates at 6 months and OS rates at 18 months was similar in studies conducted in various tumor types (Figs. 2a, 3a), as well as in those conducted in NSCLC only (Figs. 2b, 3b). For the NSCLC analysis, although the lines do cross, the slopes of the lines are not significantly different for the anti-PD-1/PD-L1 agents vs chemotherapy and indicate no distinct relationship particular to NSCLC patients. In addition to the correlation observed between 6-month PFS and 18-month OS, a similar correlation between 9-month PFS and 18-month OS was found (Figs. 4, 5).

Fig. 2

Arm-level analyses of the correlation between PFS at 6 months and OS in the pooled ICB studies. PFS rate at 6 months predicting OS rate at 18 months in the pooled analysis (a), in NSCLC-only studies included in the pooled analysis (b), in the pooled analysis stratified by class of ICB therapy (c), and in NSCLC-only studies included in the pooled analysis stratified by class of ICB therapy (d). CTLA-4 cytotoxic T-lymphocyte-associated antigen-4, ICB immune checkpoint blocker, NE not estimable, NSCLC non-small cell lung cancer, OS overall survival, PD-1 programmed cell death-1, PD-L1 programmed cell death ligand-1, PFS progression-free survival

Fig. 3

Arm-level analyses of the correlation between PFS and OS in the single-agent ICB studies. PFS rate at 6 months predicting OS rate at 18 months in the single-agent ICB analysis (a), and in NSCLC-only studies included in the single-agent ICB analysis (b). CTLA-4 cytotoxic T-lymphocyte-associated antigen-4, ICB immune checkpoint blocker, NSCLC non-small-cell lung cancer, OS overall survival, PD-1 programmed cell death-1, PD-L1 programmed cell death ligand-1, PFS progression-free survival

Fig. 4

Arm-level analyses of the correlation between PFS at 9 months and OS in the pooled ICB studies. PFS rate at 9 months predicting OS rate at 18 months in the pooled analysis (a), in NSCLC-only studies included in the pooled analysis (b), in the pooled analysis stratified by class of ICB therapy (c), and in NSCLC-only studies included in the pooled analysis stratified by class of ICB therapy (d). CTLA-4 cytotoxic T-lymphocyte-associated antigen-4, ICB immune checkpoint blocker, NSCLC non-small-cell lung cancer, OS overall survival, PD-1 programmed cell death-1, PD-L1 programmed cell death ligand-1, PFS progression-free survival

Fig. 5

Arm-level analyses of the correlation between PFS at 9 months and OS in the single-agent ICB studies. PFS rate at 9 months predicting OS rate at 18 months in the single-agent ICB analysis (a), and in NSCLC-only studies included in the single-agent ICB analysis (b). CTLA-4 cytotoxic T-lymphocyte-associated antigen-4, ICB immune checkpoint blocker, NE not estimable, NSCLC non-small-cell lung cancer, OS overall survival, PD-1 programmed cell death-1, PD-L1 programmed cell death ligand-1, PFS progression-free survival

The comparison-level analysis shows that, across the included studies, treatment superiority on some surrogate endpoints is weakly predictive of treatment superiority on the final outcome (OS). There was largely a weak or nonsignificant correlation between either ORR OR or DCR OR and OS HR, and this held true even when the data were stratified by treatment type (Fig. 6). In the pooled analysis, there was no significant correlation between ORR OR and OS HR (adjusted R2 = −0.069; P = 0.866; Fig. 6a). Likewise, the relationship between DCR OR and OS HR was not statistically significant (adjusted R2 = 0.271; P = 0.107; Fig. 6c). In the single-agent ICB analysis, there was no significant correlation between ORR OR and OS HR (adjusted R2 = −0.084; P = 0.799; Fig. 6b). The correlation between DCR OR and OS HR was statistically significant (n = 4; adjusted R2 = 0.964; P = 0.012; Fig. 6d), but it is based on only 4 studies and the slope is nearly flat (close to zero), so it would be of limited utility even if it were confirmed with additional data.

Fig. 6

Comparison-level analyses of the correlation between ORR/DCR and OS. ORR and DCR odds ratio predicting OS hazard ratio in the pooled analysis (a, c, respectively), and in the single-agent ICB analysis (b, d, respectively). ᵃPlease note that this analysis is based on only 4 studies; hence, we cannot draw conclusions based on the R2 and P values. CTLA-4 cytotoxic T-lymphocyte-associated antigen-4, DCR disease control rate, HNSCC head and neck squamous cell carcinoma, HR hazard ratio, ICB immune checkpoint blocker, NSCLC non-small cell lung cancer, OR odds ratio, ORR objective response rate, OS overall survival, PD-1 programmed cell death-1, PD-L1 programmed cell death ligand-1, UC urothelial carcinoma

There was a weak to moderate correlation between the PFS HR and the OS HR (Fig. 7). In the pooled analysis, the PFS HR correlated weakly with OS HR (adjusted R2 = 0.366; P = 0.005; Fig. 7a) and this correlation remained consistent in the single-agent ICB analysis (adjusted R2 = 0.452; P = 0.005; Fig. 7b). The PFS HR at 6 months was highly predictive of OS HR in the single-agent ICB analysis (adjusted R2 = 0.907; P < 0.001; Fig. 7d), but was weakly predictive in the pooled analysis (adjusted R2 = 0.333; P = 0.023; Fig. 7c).

Fig. 7

Comparison-level analyses of the correlation between PFS and OS. PFS hazard ratio predicting OS hazard ratio in the pooled analysis (a), as well as in the single-agent ICB analysis (b), and 6-month PFS HR predicting OS hazard ratio in the pooled analysis (c), as well as in the single-agent ICB analysis (d). CTLA-4 cytotoxic T-lymphocyte-associated antigen-4, HNSCC head and neck squamous cell carcinoma, HR hazard ratio, ICB immune checkpoint blocker, NSCLC non-small cell lung cancer, OS overall survival, PD-1 programmed cell death-1, PD-L1 programmed cell death ligand-1, PFS progression-free survival, UC urothelial carcinoma

Discussion

The arm-level analysis indicated that higher PFS rates at 6 months predicted better OS rates at 18 months regardless of therapy. The comparison-level analysis showed that, among anti-PD-1/PD-L1 studies, PFS was an imperfect surrogate (low-to-moderate correlation) for OS, whereas ORR was not correlated with OS. DCR was not correlated with OS in the pooled analysis, but was correlated with OS in the single-agent ICB analysis. The predictive value of PFS HR at 6 months for OS HR in the single-agent ICB analysis was the strongest. Unfortunately, although this correlation is statistically significant, it has little clinical value. For the majority of included studies, the PFS HRs cluster around 0.9–1.0, indicating little to no treatment effect of single-agent ICBs on PFS compared with chemotherapy. This corresponds to an OS HR of approximately 0.7, indicating an OS advantage for a single-agent ICB vs chemotherapy. Although the minimal impact of a single-agent ICB on PFS may still underestimate the OS benefit, in a registrational trial of a new ICB it would be illogical to predict an OS benefit of a single-agent ICB by this standard, as it would imply that a PFS HR near 1.0 would yield an OS HR of 0.7, possibly strong enough to declare success.

In a recent meta-analysis of 25 RCTs including a total of 20,013 patients with metastatic NSCLC (only 6 trials involved ICBs), a moderate association was found between OS rate at 12 months and OS HR (R2 = 0.80) and a modest association was found between OS rate at 9 months and OS HR (R2 = 0.67) (Blumenthal et al. 2017). The meta-analysis from Blumenthal et al. also found modest associations between ORR at 6 months and PFS HR (R2 = 0.70), and between PFS rate at 9 months and PFS HR (R2 = 0.62) (Blumenthal et al. 2017). Our study was not designed to investigate correlations between OS rate and OS HR, between ORR and PFS HR, or between PFS rate and PFS HR. However, both studies analyzed correlations between PFS and OS HR, and between ORR and OS HR. According to the meta-analysis from Blumenthal et al., there were no associations between the ORR at 6 months and OS HR, or between PFS rate at 9 months and OS HR, by trial-level analysis (Blumenthal et al. 2017). Our study, which involved 27 RCTs, all of which included ICBs, also found no association between ORR and OS HR by trial-level analysis, but did find an association between PFS HR at 6 months and OS HR, which was limited to the single-agent ICB analysis (adjusted R2 = 0.907).

Although used in ICB clinical trials, RECIST 1.1 criteria may not be the best metric to determine the clinical benefit associated with ICBs (Bellmunt et al. 2017; Hodi et al. 2016a; Ribas et al. 2013; Rittmeyer et al. 2017; Robert et al. 2015b). Chemotherapeutic agents are cytotoxic and act directly on rapidly dividing tumor cells, so these agents can quickly shrink tumors, translating into an antitumor response as determined by RECIST 1.1 criteria (Eisenhauer et al. 2009). However, this antitumor response may not be sustained over time, so that an initial PFS benefit may not translate into an OS benefit (Booth and Eisenhauer 2012; Gatzemeier et al. 2000). In contrast, ICBs act on the immune system, whereby infiltration of the tumor by lymphocytes and other immune cells may lead to an initial increase in tumor size. This phenomenon is referred to as “pseudoprogression”, because by RECIST 1.1 standards, the apparent increase in tumor size indicates disease progression (Wolchok et al. 2009). With ICBs, the size increase may not be an increase in tumor burden, but rather an artifact of the inflammatory response that can be followed by subsequent tumor shrinkage, translating into a durable antitumor response (Hersh et al. 2011; Hodi et al. 2016b; Seymour et al. 2017). To better assess antitumor response associated with ICBs, the iRECIST criteria have been developed, which modify RECIST 1.1 criteria to account for unusual patterns of immune-based responses observed with ICBs (Seymour et al. 2017).

As novel ICB-based combination approaches are being evaluated in clinical trials, alternative endpoints that may fully capture the potential benefit associated with ICBs are being explored and validated (Checkpoint Inhibitors Spur Changes in Trial Design 2017). Therefore, in the near future, analyses of correlations between OS and novel endpoints may be possible. Potential endpoints for consideration may include classical clinical endpoints defined by the novel irRC/irRECIST/iRECIST criteria (Nishino et al. 2013; Seymour et al. 2017; Wolchok et al. 2009), which allow for progression prior to response, or newer endpoints defined by these criteria, such as durable response rate or sustained reduction or stability in overall tumor burden (Checkpoint Inhibitors Spur Changes in Trial Design 2017; Kaufman et al. 2017).

Study limitations

This study used only aggregate summary data from published studies and no patient-level data; therefore, we cannot assume that statistical associations observed between trial-level variables translate to associations at the individual level, and our findings cannot be used to predict any outcome for individual patients. Analyses at arm level are limited by the inherent associations between different clinical endpoints and outcomes assessed. The relationship between potential clinical surrogate endpoints and OS may be further obscured by study crossover, wherein patients are allowed to switch from the control arm to the active treatment arm, thereby altering the disease course (Flaherty et al. 2014); the extent of crossover is not always reported in the published studies, nor are crossover-unadjusted and crossover-adjusted OS results. In addition, due to the paucity of data, stratified analyses by indication or treatment type have limited power to detect substantive relationships. Another limitation of this study concerns the analysis conducted on DCR, because the duration requirement for stable disease, as part of the definition of DCR, differs across trials. Furthermore, this analysis is based on published data for ICBs that ultimately gained FDA approval; therefore, it remains uncertain how the surrogate endpoints assessed correlate with OS in the context of other ICBs not proven to impact OS. Several studies are underway that, once completed, will provide additional data for analysis and may support more robust associations.

Conclusions

This study and previous meta-analyses have failed to identify a clinical endpoint that is suitable as a surrogate for OS in studies involving ICBs, although PFS HR at 6 months was a moderately strong predictor of OS for studies involving single-agent ICBs. Although identification of baseline gene signatures predictive of response has gained some traction (e.g., tumor mutational burden or interferon gamma gene signatures), there are few publications that attempt to identify markers (biologic, radiologic, or otherwise) that correlate with OS and can be measured at early timepoints after the onset of ICB therapy. As none of the classical clinical endpoints used in oncology trials was found to be a potential surrogate for OS, it is of paramount importance that efforts to identify novel surrogates for efficacy be supported and encouraged in academia and in the biotech/pharma industry to expedite the development of life-saving drugs.