Introduction

Antiretroviral therapy (ART) effectively suppresses HIV viral load and reduces morbidity and mortality in people living with HIV (PLH). However, ART alone cannot eradicate the infection and lifelong treatment is needed [1]. The success of the first two individuals cured of HIV [2, 3] and new discoveries of multiple broadly neutralizing antibodies (bNAbs) and therapeutic vaccines have renewed interest in developing strategies for sustaining long-term viral control without ART [4•, 5]. The main obstacle to eradicating the virus is HIV reservoirs, cells where HIV is able to remain “latent” by being inactive [6]. Multiple laboratory assays have been developed to measure the size of the HIV reservoir or other metrics of HIV persistence in PLH on treatment (levels of residual viremia, cell-associated HIV RNA, integrated total HIV DNA, infectious units per million cells (IUPM) determined by quantitative viral outgrowth assay, and intact proviral DNA assay (IPDA)) [7, 8]. These biomarkers have been primary endpoints of trials aiming to evaluate HIV remission strategies. However, studies show these reservoir measures do not correlate well with each other, and can under- or over-estimate reservoir size (i.e., the replication-competent virus population that starts new rounds of infection) [9, 10]. Furthermore, these biomarker levels and the real clinical endpoint, absence of viral rebound after stopping ART, are often inconsistent [11, 12]. Thus, ART interruption remains the essential component of clinical trials to evaluate new strategies or interventions aiming to achieve viral control without treatment. Analytical treatment interruption (ATI) makes endpoints like time to viral rebound or viral control feasible using plasma HIV RNA measurement, the only FDA-approved clinical measure, in trials assessing efficacy of interventions aimed at achieving HIV remission or viral control.

A review by Lau et al. [13••] found that 159 clinical studies (with and without interventions beyond ART) incorporated ATI from 2000 to 2017, with significant variation in duration of ATI, monitoring strategies, and thresholds for restarting ART. A 2018 forum at the Ragon Institute of MGH, MIT, and Harvard gathered clinical researchers in Cambridge, MA, to formulate recommendations for conducting ATI trials that covered scientific value, risks/benefits, and ATI methodologies, including ethical and community perspectives. The major points of discussion and the consensus viewpoints reached were published [14••]. In this review, we focus on key study design aspects of ATI trials from the perspective of statisticians, including choosing efficacy outcome measures, ART resumption criteria, and single-arm versus placebo-controlled design.

Types of ATI Trials

When ATI was first introduced in HIV clinical trials in the 1990s and early 2000s, the only antiviral treatment was ART, and stopping treatment meant stopping all ART. Recently, the term ATI has been used in trial designs that stop ART but continue other investigational agents with potential antiviral activity, such as bNAbs. In several trials evaluating the safety and effect of one (VRC01 or 3BNC117) or multiple (3BNC117 and 10-1074) bNAbs in delaying viral rebound, participants received a first bNAb(s) infusion and then stopped ART, while continuing to receive two more doses of bNAb(s) [15,16,17]. Because trial participants are under coverage of bNAbs during the ATI phase after stopping ART, these trials assess the antiviral activity of other agents without ART and should be considered switch or maintenance studies.

Another design type assesses anti-reservoir activity or ability to induce prolonged HIV remission by stopping all interventions, including ART and other investigational agents that may have antiviral activity. These trials begin their ATI phase after all agents are cleared from participants’ systems, which may require a washout period under ART coverage to allow clearance of such investigational agents. An element of ATI study design thus may involve estimation of the pharmacokinetics and half-life of the experimental agents, often based on data from prior studies but potentially reassessed in the first set of study participants before they reach the ATI time point. For example, the nonhuman primate study of bNAb ± TLR7 agonist included a 16-week period on ART after the interventions to allow bNAb washout prior to the ATI [18]. A washout period also helps to delineate the safety of ATI from that of investigational agents which will commonly be evaluated prior to initiating ATI. In this review, we will focus on the second type: trials that evaluate ATI after discontinuation of all interventions and if necessary, a washout period.

Evolution of ATI Trial Design

As initially used in HIV clinical studies in the late 1990s and early 2000s, ATI was a mechanism for examining therapeutic vaccine effects or reducing ART exposure [19, 20]. With these objectives, typical ATI trials in that era were designed with one or more pre-determined treatment interruption periods, but with less stringent ART resumption criteria than most current ATI studies. The primary outcome measures for these trials were viral load set point, peak viral load, and viral burden [21,22,23,24].

The 2006 CD4-guided ART management trial known as the SMART study assessed the benefits of treatment interruption and reduced ART exposure. SMART results showed that participants with prolonged treatment interruption had significantly increased risk of opportunistic disease, cardiovascular and other non-AIDS-defining events, and death [25], which raised concerns about ATI trials.

In 2009, Timothy Brown became the first individual to have been cured of HIV after stopping ART [2]. ATI trials have since come back into focus as an approach to test the efficacy of strategies to control HIV viral replication without treatment. Contemporary ATI trial designs have addressed the safety concerns raised by the SMART study by evolving towards shorter treatment interruption phases and frequent participant monitoring [13••], aiming to reduce prolonged viremia risks through frequent viral load monitoring and then restarting ART when viral load thresholds are reached [11]. These studies use different primary outcome measures (e.g., time to viral rebound, viral control post-rebound) to understand viral rebound kinetics and identify sustained HIV suppression [26]. Their main design characteristics include a single, potentially shorter-duration ATI phase, frequent HIV RNA monitoring, and immediate ART restart when the viremia threshold is reached.

Choices of Virologic Outcome Measures

Outcome measures should reflect the anticipated mechanism of action of the intervention and the associated scientific questions. ATI trials have used various outcome measures as virologic endpoints, e.g., peak viral load, rate of initial viral load increase during ATI, time-averaged area under the curve during viral rebound, time to viral rebound, and viral set point [27]. These outcome measures quantify viral rebound kinetics, and trials using them have provided valuable knowledge and experience in understanding those kinetics. Here we focus on three common outcome measures in ATI trials: time to viral rebound, viral control, and viral set point.

Time to Viral Rebound

Time to viral load rebound quantifies the time an individual is off ART and maintains viral control below a prespecified HIV RNA threshold. Viral rebound is confirmed when two consecutive HIV RNA levels exceed the threshold. Several ATI studies have used time to viral rebound as a primary outcome measure [28,29,30,31]. Time to viral rebound may be the safest endpoint in an ATI trial as participants resume ART promptly once the rebound is confirmed and the endpoint is observed. This approach minimizes time off ART with viral load above the threshold.

Time to viral rebound is a time-to-event outcome measure and is observed with interval censoring, as rebound is detectable only when measuring HIV RNA during study visits. The exact rebound time will likely occur between visits. This outcome measure can be analyzed using classical approaches accommodating interval censoring [32]. However, these approaches either rely heavily on parametric assumptions difficult to verify in practice, or are computationally challenging [33]. Approaches relying on different parametric assumptions may make it challenging to compare results from different studies. Hence, analyses in most trials with a time to viral rebound outcome measure use the standard approach for right-censored data instead of interval-censored analysis approaches [18, 30]. Comparison between treatment groups can use median time to viral rebound or proportion without rebound at certain time points. The advantage of approaches for right-censored data is that they require fewer assumptions than interval-censored approaches and are easy to implement with available statistical computing software. Participants who resume ART for reasons other than viral rebound can be censored as non-informative; i.e., their time to resume ART for other reasons is plausibly independent of their (unobserved) viral rebound. However, methods for dealing with right-censored data are prone to potential bias introduced by monitoring frequency, as less-frequent HIV RNA monitoring would cause estimation bias toward later time to viral rebound than more frequent monitoring. For example, suppose study A has a monthly visit schedule while study B has a weekly visit schedule. Viral rebounds for some participants in study B might be detected during the extra weekly visits that fall between the monthly visits, and therefore study B will likely estimate a shorter time to rebound. The different HIV RNA monitoring schedules require consideration when comparing results between trials and highlight the importance of standardizing visit schedules when designing future ATI trials.
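The monitoring-frequency effect described above can be illustrated with a minimal sketch. It assumes hypothetical true rebound days, idealized visit schedules (every 7 or every 28 days), 24 weeks of follow-up, and detection at the first scheduled visit on or after the true rebound; the confirmatory second measurement is ignored for simplicity. All function names and numbers are illustrative and are not taken from any trial. The median is estimated with a standard Kaplan-Meier analysis for right-censored data.

```python
import math
from fractions import Fraction

def detection_day(true_day, visit_interval, last_visit):
    """Return (day, observed): the first scheduled visit on or after the true
    rebound day, or (last_visit, False) if rebound is not seen by last_visit."""
    visit = math.ceil(true_day / visit_interval) * visit_interval
    if visit > last_visit:
        return last_visit, False  # right-censored at end of follow-up
    return visit, True

def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(time, survival)] at each event time.
    Exact fractions avoid floating-point noise when locating the median."""
    at_risk = len(times)
    surv = Fraction(1)
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for tt, e in zip(times, events) if tt == t and e)
        n_leaving = sum(1 for tt in times if tt == t)
        if n_events:
            surv *= Fraction(at_risk - n_events, at_risk)
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve

def km_median(curve):
    """Smallest event time at which the survival estimate is <= 1/2."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None  # median not reached during follow-up

def estimated_median(true_days, visit_interval, last_visit=168):
    observed = [detection_day(d, visit_interval, last_visit) for d in true_days]
    times = [t for t, _ in observed]
    events = [e for _, e in observed]
    return km_median(kaplan_meier(times, events))

# Hypothetical true rebound days for ten participants (illustration only).
TRUE_REBOUND_DAYS = [9, 12, 16, 19, 30, 33, 37, 45, 60, 200]

weekly_median = estimated_median(TRUE_REBOUND_DAYS, visit_interval=7)
monthly_median = estimated_median(TRUE_REBOUND_DAYS, visit_interval=28)
print("weekly schedule median:", weekly_median)
print("4-weekly schedule median:", monthly_median)
```

With identical underlying rebound times, the sparser schedule yields the later estimated median, which is the direction of bias described in the text.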

Viral Control

Viral control during ATI is assessed at a prespecified time point after participants stop ART: a participant achieves viral control if HIV RNA is below a predetermined threshold at that time point while the participant remains off ART. Depending on the evaluation time point and threshold level, participants can achieve viral control before or after viral rebound. The importance of this outcome measure has been recognized when interventions with immunotherapy are being investigated for viral control. Moreover, this endpoint is expected to be necessary for regulatory approval of any interventions aiming to induce viral suppression in the absence of ART. Several recently developed studies evaluating combination interventions with immunotherapy are using viral control as the outcome measure (NCT04340596; NCT04357821; NCT03588715). Because of the immunologic nature of these study treatments, where a period of viremia may be needed to generate host mechanisms of immunologic control, the primary efficacy outcome assesses viral control at a prespecified time point after discontinuation of ART instead of time to viral rebound.

Viral control during ATI is a binary outcome. Participants remaining off ART but with HIV RNA above the threshold at evaluation are defined as failures. Participants meeting viral rebound criteria and resuming ART before the evaluation week are also failures. Analytic approaches for this outcome are straightforward. Normally, the probability of viral control is estimated using the proportion of participants achieving viral control, with confidence intervals. Comparisons between arms use methods for testing two independent samples, either with asymptotic approximation for trials with moderate-to-large sample sizes or exact methods for smaller trials. Direct comparisons between studies are possible. Investigators need to consider how to deal with participants who restart ART for other reasons, e.g., participant choice or pregnancy, before the evaluation time point [34]. Counting these participants as failures might lead to underestimating the true effect of the intervention, as these participants might have been viral control successes had they remained off ART. However, excluding these ART restarts will decrease the number included in the analysis, causing loss of precision and potential for bias, which may have a significant impact on early exploratory trials with relatively small sample sizes. One way to handle this missing data situation is to use methods for time-to-event data, estimating the cumulative probability of not achieving viral control at the evaluation week, and assuming the other reasons for restarting ART are non-informative for viral control at the later evaluation time point. Participants who restart ART for reasons other than virologic criteria can be censored when restarting ART.
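For small trials, the exact methods mentioned above might look like the following sketch: an exact (Clopper-Pearson) confidence interval for the proportion achieving viral control in one arm, and a two-sided Fisher exact test comparing two arms. The counts (6 of 20 versus 1 of 20) are hypothetical, the function names are ours, and participants who met rebound criteria and restarted ART are assumed to have already been counted as failures.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve_decreasing(f, target):
    """Bisection for p in [0, 1] with f(p) = target, f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    # Lower limit: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum of probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: controllers / non-controllers at the evaluation week.
active_success, active_failure = 6, 14
placebo_success, placebo_failure = 1, 19

n_active = active_success + active_failure
ci_low, ci_high = clopper_pearson(active_success, n_active)
p_value = fisher_exact_two_sided(active_success, active_failure,
                                 placebo_success, placebo_failure)
print(f"active arm: {active_success / n_active:.2f} "
      f"(95% CI {ci_low:.3f} to {ci_high:.3f}); Fisher p = {p_value:.3f}")
```

How participants who restart ART for non-virologic reasons enter these counts is a separate design decision, as discussed above; this sketch does not resolve it.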

Viral Set Point

Viral set point is an individual’s HIV viral load stabilized after a period of initial viremia. Participants reach the set point when their immune systems develop HIV-specific cytotoxic T cells and begin to fight the virus. Viral set point has been identified as an important predictor of HIV disease progression before ART initiation. Higher set point viral loads resulted in faster progression to AIDS; lower set point viral loads meant individuals remained in clinical latency longer. Viral set point was a primary outcome measure in early therapeutic vaccine trials with ATI, with the main objective of inducing host immune responses to lower viral set point [22, 35,36,37]. Viral set point as a primary outcome measure may expose participants to a longer period of uncontrolled viremia, typically 12–16 weeks or longer. Studies have shown that individuals might reach viral steady state as early as 4–6 weeks after acute infection [38] and participants achieved new viral set points 8–12 weeks after ATI [21, 22]. Generally, however, trial participants will still need to tolerate viral peaks as high as 100,000 copies/mL for several weeks before the endpoint is observed if viral set point is the primary endpoint.

Viral set point is a continuous outcome, with comparisons between treatment groups using, for example, rank-based two-sample tests. To reduce missing evaluations, the primary outcome measure is usually the average of two measurements, or a single value when the other is missing. Participants with missing measurements are assigned the worst rank.
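A minimal sketch of this analysis follows, under several assumptions of ours: set points are on the log10 copies/mL scale, a higher value is worse (so the worst rank is the highest), all participants with no measurement share a tied worst rank, and the Wilcoxon rank-sum test uses a normal approximation with tie correction. The data are hypothetical. With samples this small, an exact or permutation version of the test would usually be preferred.

```python
import math

def primary_set_point(v1, v2):
    """Average of two log10 measurements, the single available value if one
    is missing, or None if both are missing."""
    vals = [v for v in (v1, v2) if v is not None]
    return sum(vals) / len(vals) if vals else None

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test; None entries get the (tied) worst rank.
    Returns (rank sum of group_a, z statistic, two-sided p-value)."""
    pooled = list(group_a) + list(group_b)
    worst = max(v for v in pooled if v is not None) + 1.0
    filled = [worst if v is None else v for v in pooled]
    ranks = midranks(filled)
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    counts = {}
    for v in filled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w, z, p

# Hypothetical set points (log10 copies/mL); None = no measurement available.
active = [3.1, 3.4, 2.9, 3.8, None]
placebo = [4.2, 4.5, 3.9, 4.8, 4.1]

w, z, p = rank_sum_test(active, placebo)
print(f"rank sum (active) = {w}, z = {z:.2f}, two-sided p = {p:.3f}")
```

In this example the participant with the missing value receives the highest rank in the active arm, which pulls the comparison toward the null relative to simply excluding that participant.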

In general, clinical trial outcome measures and analysis methods are kept simple as discussed above but more complicated statistical modeling of viral rebound kinetics can also be applied [39, 40]. However, these methods were developed using older ATI studies; substantive modeling complications might arise if applied to ATI trials with relatively short durations off ART such as studies with ART restart directly after viral rebound. Furthermore, these methods have potential for informative censoring, where ART restart censors the observations of subsequent off-ART viral loads.

ART Resumption Criteria

As with ATI trial design, ART resumption criteria have evolved as knowledge of viral rebound kinetics accumulated. Early ATI trials that evaluated the therapeutic vaccine effect commonly used viral set point as a primary outcome measure and many designs specified when to resume ART. All participants would restart ART at the same predetermined study visit to allow the observation of viral set point. Participants could restart ART sooner if their CD4 counts declined substantially, if they had very high viral loads, or if they experienced clinical events. Following the SMART study results, ART resumption criteria evolved to minimize time off ART while viral load rebounded. Participants would restart ART when detectable viral load was confirmed.

A 2013 study found that elite or post-treatment controllers achieved viral control after a period of high viral load [41]. Several successful trials in non-human primates also suggest a period of rebound viremia after ATI might be necessary to achieve viral control [18, 42]. Expanding immune-mediated mechanisms, such as antibody-dependent cellular cytotoxicity or enhancing cytotoxic T-cell responses, can lead to sustained viral control, but requires viral replication and viral antigen expression. Thus, resuming ART promptly after confirming viral rebound may miss post-rebound viral controllers, especially if interventions depend on host-virus interaction after stopping ART [43].

The Ragon Institute ATI forum reached a consensus that when considering ART resumption criteria, viremia duration might be more important than the level of viremia [14••]. The AIDS Clinical Trials Group (ACTG) Reservoirs Remission and Cure Transformative Science Group and its bNAbs Working Group developed criteria for ACTG studies with ATI components. The key virologic criterion is plasma HIV RNA that is ≥1000 copies/mL for ≥4 consecutive weeks and has not dropped 0.2 log10 from the previous week. Other criteria include confirmed CD4 decline (< 350 cells/mm3 or CD4% < 15%); acute retroviral syndrome; clinical disease progression; and reasons not necessarily related to HIV disease, such as participant choice, pregnancy, a sexually transmitted infection, SARS-CoV-2, and unprotected sex.
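One possible reading of the key virologic criterion can be written as a small check. This is a sketch under our own assumptions, not the ACTG operational definition: HIV RNA is measured weekly, "≥4 consecutive weeks" is taken to mean the four most recent weekly values, and "has not dropped 0.2 log10" is taken to mean the latest value is less than 0.2 log10 below the previous week's value. The function and parameter names are hypothetical.

```python
import math

def meets_virologic_restart(weekly_rna, threshold=1000.0, min_weeks=4,
                            min_drop_log10=0.2):
    """Return True if the most recent `min_weeks` weekly HIV RNA values
    (copies/mL, oldest first) are all >= threshold and the latest value has
    not fallen by at least `min_drop_log10` from the previous week."""
    if len(weekly_rna) < min_weeks:
        return False
    recent = weekly_rna[-min_weeks:]
    if any(v < threshold for v in recent):
        return False
    # All recent values are >= threshold > 0, so the logarithms are defined.
    drop = math.log10(recent[-2]) - math.log10(recent[-1])
    return drop < min_drop_log10

# Hypothetical weekly viral loads (copies/mL) for three participants.
sustained = [50, 1500, 3000, 5000, 4800]   # high and not declining
declining = [50, 1500, 3000, 5000, 2500]   # about 0.3 log10 drop in last week
interrupted = [1500, 3000, 800, 5000, 4800]  # dipped below threshold

for label, series in [("sustained", sustained), ("declining", declining),
                      ("interrupted", interrupted)]:
    print(label, meets_virologic_restart(series))
```

A trial protocol would also need to specify how missed visits and confirmatory measurements are handled, which this check does not address.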

These ART resumption criteria serve as a starting point for clinicians to consider when designing trials. The mechanism of action of the interventions under study and other design factors may require and justify modifying these criteria. For example, 12–16 weeks of uncontrolled viremia could be supported when viral set point is the primary endpoint [14••]. As studies using these criteria proceed, they will shed new light on the validity and utility of ART resumption criteria, which will likely evolve further.

Single Arm versus Placebo Controlled

Randomized placebo-controlled design is the gold standard for evaluating treatment efficacy in clinical research. However, single-arm trials are useful in early phase proof of concept studies, as they can assess the efficacy, safety, and tolerability of novel interventions before larger-scale randomized clinical trials [44]. The availability of investigational products and enrollment feasibility often determine sample sizes for early phase clinical studies evaluating HIV remission strategies. A growing number of studies aim to enroll individuals treated during acute infection and require them to maintain viral suppression on ART for extended periods before entering the study. This rarer population has less diverse viral reservoirs than those who started ART during chronic infection, and may have a lower chance of viral breakthrough. Thus, single-arm design might be more feasible, with all participants receiving the investigational products. The focus of single-arm studies is to provide an accurate point estimate of the efficacy of the study treatment. Generally, the efficacy is compared with historical data. Investigators may set an efficacy threshold that serves as a go/no-go criterion for moving to the next stage of investigation. With historical data providing information comparable to a control arm, specifically for individuals who started ART during chronic infection, trials can devote more resources to the novel investigational treatment and minimize risk to participants. Early ATI trials provided ample historical data and characterized the viral rebound kinetics in chronically treated individuals [13••, 45••], but much less data on acutely treated populations on modern, potent ART regimens. Recent studies suggest that ATI kinetics are similar after stopping older versus modern ART, and in the vast majority of cases, viral replication is rapidly re-suppressed upon resumption of ART [46, 47].
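The go/no-go idea for a single-arm study can be sketched with an exact one-sided binomial calculation against a historical viral control rate. The historical rate (5%), sample size (20), and significance level used here are hypothetical placeholders, not values from the cited studies, and the function names are ours.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def smallest_go_count(n, p0, alpha=0.05):
    """Smallest number of viral controllers among n participants for which
    the one-sided exact p-value against the historical rate p0 is <= alpha.
    Returns None if no count among 0..n reaches that level."""
    for k in range(n + 1):
        if binomial_upper_tail(k, n, p0) <= alpha:
            return k
    return None

# Hypothetical design: 20 participants, assumed historical control rate 5%.
n_participants = 20
historical_rate = 0.05
go_count = smallest_go_count(n_participants, historical_rate)
print(f"'go' if at least {go_count} of {n_participants} achieve viral control")
print(f"one-sided p at that count: "
      f"{binomial_upper_tail(go_count, n_participants, historical_rate):.4f}")
```

The threshold is only as credible as the assumed historical rate, so it inherits every comparability concern that applies to historical controls.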

The key assumption for single-arm design is that historical controls are sufficiently similar to a concurrent control arm. Potential drawbacks of single-arm studies include limited generalizability to populations not included in the study or limited comparability to other studies. Viral control rates can be due to factors other than the investigational agents. For example, a high viral control rate in a single-arm study may be due to enrollment of a population with less resistant or diverse virus and not the anti-reservoir efficacy of the investigational agent. Sneller et al. observed a spontaneous suppression rate higher than previously reported in the placebo participants following ATI [30]. Other potential confounding and bias factors include different study implementation or assay testing procedures in the historical control study, and different participant characteristics, such as CD4 count at ART start, which may impair the interpretability of study results. Different ART resumption criteria, study designs, and visit schedules could also introduce confounding.

In early phase proof of concept studies, the value of placebo-controlled trials is less for well-powered randomized comparison than for descriptive understanding of the treatment effect. Definitive answers to address efficacy objectives will still need larger scale, randomized phase III clinical trials. Placebo-controlled trials also have the advantage of providing investigators with an unbiased assessment of safety by clinicians being blinded to treatment assignment when evaluating the association between adverse events and study treatment. Treatment effects between blinded study groups are compared directly and results are less prone to confounding factors, including the “white coat” effect of enrolling in the clinical trial setting.

The perspective of people living with well-controlled HIV also merits consideration. Some people whose main motivation for participating in HIV remission trials is receiving an experimental agent that may reduce reservoir size or induce immune control might be reluctant to participate if they could be placed in a placebo group and still undergo ATI. Investigators should therefore engage community representatives to assess openness to placebo-controlled trials. An uneven allocation ratio, e.g., 2:1 to active:placebo, may be considered in these situations.

Despite their limitations, single-arm studies have a unique role in clinical trials when a randomized placebo-controlled clinical trial is not feasible or desirable. They can provide critical pilot efficacy and safety data on novel treatments aiming to reduce viral reservoirs or induce post-rebound viral control. However, if interpreting study findings properly requires a placebo group—as when there are no historical controls—then a study should use placebo-controlled design, or potentially an open-label or partially blinded randomized control arm. Ultimately, the science and study constraints should be the main factors that drive design choices.

Ethical Considerations

Ethical issues surrounding the use of ATIs in HIV clinical trials assessing strategies for treatment-free HIV remission have been extensively discussed [14••, 48, 49]. Recent studies have suggested that ATI is safe, and participants were able to re-suppress virus after ART re-initiation. Evidence also indicates that short-term ATI does not increase viral diversity or reservoir size [50,51,52]. However, ATI is not without risks, which can include potential transmission to sex partners and acute retroviral syndrome [14••]. Prolonged viral exposure during ATI may also alter immune status in seronegative participants who initiated ART during acute infection. In a small trial evaluating viremic control after ATI in individuals treated during very early stage of HIV infection, four out of seven Fiebig I participants seroconverted after ATI [53].

Trial investigators must make sure participants understand the potential physical, social, financial, and psychological risks of ATI studies by carefully addressing risks in informed consent documents. Most ATI studies are early stage trials, and the interventions currently under investigation are not expected to lead to cure or remission. Prospective participants should be given realistic expectations of trial outcomes and experiences [54, 55].

It is thus crucial that investigators determine if they can justify including ATI in an early stage trial, and they should explore alternatives or novel study designs. One option is to first show on-ART activity of the investigational agents or combinations, such as boosting anti-HIV immunity. A study design may include predefined go/no-go criteria for incorporating ATI based on evidence of treatment effects on on-ART virologic or immunologic outcomes. A staged design is another option, in which enrollment of additional participants is decided based on the efficacy signal from an early stage of the same trial, to minimize the number of participants who undergo ATI unnecessarily [53].

Conclusion

ATI is essential in evaluating novel strategies aiming to achieve HIV treatment-free remission or long-term viral suppression. Design of the ATI component in HIV clinical trials is driven by the scientific question and the mechanism of action of the intervention being investigated, e.g., choosing outcome measures and ART resumption criteria. Single-arm design may be a viable option for proof of concept early phase studies when appropriate historical control data are available. However, investigators need to understand how a single-arm design may affect interpretation of the trial results.

At the time of authorship, the global SARS-CoV-2 pandemic has had major impacts on daily life, medical care, and clinical research [56]. Conducting HIV clinical trials with ATI during the pandemic faces new challenges and requires re-evaluating the risks and benefits together with pragmatic mitigation strategies [57, 58].

This paper focused on designing ATI studies in adult populations. The review by Lau et al. [13••] found only one ATI study out of 59 that investigated interventions beyond ART in a pediatric population. Investigators should consider whether similar design choices, with potentially more constraints, apply to trials in children and adolescents. Design challenges include limited prior pharmacologic and safety data on investigational agents, reduced sample volume for measuring HIV reservoirs, potentially more restrictive ATI eligibility criteria, and low viral load thresholds for ART restart criteria [59, 60].

The search for treatments for HIV remission without ART continues and more trials involving ATI are proceeding. It may not be feasible to frame standardized approaches for conducting ATI trials, given the wide variety of interventions being studied. ATI trial design will thus continue to evolve to reflect the ever-changing clinical and scientific landscape for HIV remission and cure.