Laparoscopic surgical box model training for surgical trainees with no prior laparoscopic experience

Abstract

Background

Surgical training has traditionally been one of apprenticeship, where the surgical trainee learns to perform surgery under the supervision of a trained surgeon. This is time consuming, costly, and of variable effectiveness. Training using a box model physical simulator ‐ either a video box or a mirrored box ‐ is an option to supplement standard training. However, the impact of this modality on trainees with no prior laparoscopic experience is unknown.

Objectives

To compare the benefits and harms of box model training versus no training, another box model, animal model, or cadaveric model training for surgical trainees with no prior laparoscopic experience.

Search methods

We searched the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE, and Science Citation Index Expanded to May 2013.

Selection criteria

We included all randomised clinical trials comparing box model trainers versus no training in surgical trainees with no prior laparoscopic experience. We also included trials comparing different methods of box model training.

Data collection and analysis

Two authors independently identified trials and collected data. We analysed the data with both the fixed‐effect and the random‐effects models using Review Manager for analysis. For each outcome, we calculated the standardised mean difference (SMD) with 95% confidence intervals (CI) based on intention‐to‐treat analysis whenever possible.

Main results

Twenty‐five trials contributed data to the quantitative synthesis in this review. All but one trial were at high risk of bias. Overall, 16 trials (464 participants) provided data for meta‐analysis of box training (248 participants) versus no supplementary training (216 participants). All the 16 trials in this comparison used video trainers. Overall, 14 trials (382 participants) provided data for quantitative comparison of different methods of box training. There were no trials comparing box model training versus animal model or cadaveric model training.

Box model training versus no training: The meta‐analysis showed that the time taken for task completion was significantly shorter in the box trainer group than the control group (8 trials; 249 participants; SMD ‐0.48; 95% CI ‐0.74 to ‐0.22). Compared with the control group, the box trainer group also had a lower error score (3 trials; 69 participants; SMD ‐0.69; 95% CI ‐1.21 to ‐0.17), a better accuracy score (3 trials; 73 participants; SMD 0.67; 95% CI 0.18 to 1.17), and better composite performance scores (SMD 0.65; 95% CI 0.42 to 0.88). Three trials reported movement distance, but the data were not in a format suitable for meta‐analysis. Movement distance was significantly lower with box model training than with no training in one trial, and did not differ significantly between the two groups in the other two trials. None of the remaining secondary outcomes (mortality and morbidity when animal models were used for assessment of training, error in movements, and trainee satisfaction) were reported in the trials.

Different methods of box training: One trial (36 participants) found a significantly shorter time taken to complete the task when box training was performed using a simple cardboard box trainer compared with the standard pelvic trainer (SMD ‐3.79; 95% CI ‐4.92 to ‐2.65). There was no significant difference in the time taken to complete the task in the remaining three comparisons (reverse alignment versus forward alignment box training; box trainer suturing versus box trainer drills; and single incision versus multiport box model training). There were no significant differences in the error score between the two groups in any of the comparisons (box trainer suturing versus box trainer drills; single incision versus multiport box model training; Z‐maze box training versus U‐maze box training). The only trial that reported accuracy score found a significantly higher accuracy score with Z‐maze box training than with U‐maze box training (1 trial; 16 participants; SMD 1.55; 95% CI 0.39 to 2.71). One trial (36 participants) found a significantly higher composite score with the simple cardboard box trainer compared with the conventional pelvic trainer (SMD 0.87; 95% CI 0.19 to 1.56). Another trial (22 participants) found a significantly higher composite score with reverse alignment compared with forward alignment box training (SMD 1.82; 95% CI 0.79 to 2.84). There were no significant differences in the composite score between the intervention and control groups in any of the remaining comparisons. None of the secondary outcomes were adequately reported in the trials.

Authors' conclusions

The results of this review are threatened by both risks of systematic errors (bias) and risks of random errors (play of chance). Laparoscopic box model training appears to improve technical skills compared with no training in trainees with no previous laparoscopic experience. The impacts of this improvement in technical skills on patients and healthcare funders in terms of improved outcomes or decreased costs are unknown. There appear to be no significant differences in the improvement of technical skills between different methods of box model training. Further well‐designed trials at low risk of bias and random errors are necessary. Such trials should assess the impacts of box model training on surgical skills in both the short and long term, as well as clinical outcomes when the trainee becomes competent to operate on patients.

Plain language summary

Laparoscopic surgical box model training for surgical trainees with no prior laparoscopic experience

Background

Surgical training has traditionally been one of apprenticeship, where the surgical trainee learns to perform the surgery under the supervision of a trained surgeon. This is costly, time consuming, and of variable effectiveness. Laparoscopic surgery involves the use of instruments through a key‐hole incision and is generally considered more difficult than open surgery. Training using box models (physical simulation) is an option to supplement standard laparoscopic surgical training. The impact of box model training in surgical trainees with no prior laparoscopic experience is unknown. We sought to determine whether box model training is useful in such trainees in terms of improving technical outcomes by performing a thorough search of the medical literature for randomised clinical trials. Randomised clinical trials are commonly called randomised controlled trials and are the best study design to answer such questions. If conducted well, they provide the most accurate answer. Two review authors searched the medical literature available to May 2013 and obtained the information from the identified trials. The use of two review authors to identify studies and obtain information decreases the errors in obtaining the information. We identified and included 25 trials in the review.

Study characteristics

The trials compared box model training versus no training (16 trials; 464 participants) or versus different types of box model training (14 trials; 382 participants) (some trials and participants were included in both comparisons as the trials compared different methods of box training versus no training).

Key results

The primary outcomes investigated in this review were time taken to perform the task, error score, accuracy score, and a composite (total summed) performance score. Box model training appears to decrease the time required to perform a laparoscopic task, improve the accuracy, decrease the errors, and improve the overall performance. This suggests that box model training improves the technical skills of surgical trainees with no previous experience in laparoscopic surgery. There do not appear to be any significant differences between different methods of box model training. The impact of the improved surgical skills on patients or healthcare funders in terms of improved health or decreased costs is unknown.

Quality of evidence

All but one of the trials were of high risk of bias (defects in study design that can lead to arriving at incorrect conclusions with overestimation of benefits and underestimation of harms). Furthermore, our results are prone to risks of random errors. Overall, the quality of evidence was very low.

Future research

Further well‐designed trials with less risk of bias because of poor study design or because of chance are necessary.

Authors' conclusions

Implications for practice

Laparoscopic box model training appears to improve technical skills compared to no training in trainees with no previous laparoscopic experience. The impact of this improvement in technical skills on patients and healthcare funders in terms of improved outcomes or decreased costs is unknown. There appear to be no significant differences in the improvement of technical skills between different methods of box model training.

Implications for research

Further well‐designed trials of low risk of bias and random errors are necessary. Such trials should assess the impact of box model training on clinical outcomes.

The conduct and reporting of trials using the SPIRIT statement (www.spirit‐statement.org/) and the CONSORT statement (www.consort‐statement.org/) are likely to result in better information from the trials.

Summary of findings

Summary of findings for the main comparison. Box model training compared with no training or standard surgical training for trainees with no prior laparoscopic experience

Patient or population: trainees with no prior laparoscopic experience.

Settings: secondary care.

Intervention: box model training.

Comparison: no training or standard surgical training.

Outcomes | Video‐box model training | No of participants (studies) | Quality of the evidence (GRADE) | Comments
--- | --- | --- | --- | ---
Time taken for task completion (seconds) | The mean time taken for task completion in the intervention group was 0.54 standard deviations lower (0.27 to 0.81 lower). | 229 (7) | ⊕⊝⊝⊝ very low¹,² |
Error score | The mean error score in the intervention group was 0.69 standard deviations lower (0.17 to 1.21 lower). | 69 (3) | ⊕⊝⊝⊝ very low¹,² | Results became non‐significant in sensitivity analysis (after exclusion of trials with an imputed standard deviation).
Accuracy score | The mean accuracy score in the intervention group was 0.67 standard deviations higher (0.18 to 1.17 higher). | 73 (3) | ⊕⊝⊝⊝ very low¹,² |
Composite score | The mean composite performance score in the intervention group was 0.49 standard deviations higher (0.25 to 0.73 higher). | 321 (9) | ⊕⊝⊝⊝ very low¹,²,³ | Significant heterogeneity in magnitude of effect was present.

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

¹ The trial(s) was (were) of high risk of bias.
² There were too few trials to assess publication bias.
³ There was significant heterogeneity.

Background

Description of the condition

Surgical training has traditionally been one of apprenticeship, where the surgical trainee learns to perform surgery under the supervision of a trained surgeon. Different procedures have different learning curves (Herrell 2005; Tekkis 2005a; Tekkis 2005b). Surgeons experienced in one procedure may not be experienced in another, and results improve with experience in an individual procedure (Herrell 2005; Tekkis 2005a; Tekkis 2005b).

An increasing number of surgical procedures are being done laparoscopically (abdominal key‐hole surgery). This includes laparoscopic cholecystectomy (removal of gallbladder), laparoscopic anti‐reflux procedures (surgery for heartburn), laparoscopic hysterectomy (removal of uterus), and laparoscopic nephrectomy (removal of kidney) (Ghezzi 2006; Keus 2006; Salminen 2007; Venkatesh 2007). The different methods of laparoscopic surgical training include live animal model training, human and animal cadaver training, training using a box trainer (also called video trainer), and virtual reality training (training using computer simulation) (Munz 2004).

With the decreasing time to train surgeons because of the European Working Time Directive (Chikwe 2004), and modernising medical careers (MMC) initiative by the UK Department of Health (Payne 2005), training structured to improve the surgical skills in the least time with maximum efficiency is necessary. This is applicable to surgical trainees with no prior experience in laparoscopic surgery and in those who have started their laparoscopic career but have not achieved proficiency. Because of the shortened working hours, the trainees may be exposed to fewer surgical procedures and hence may lack experience.

The price of the simulators can vary. Traditional training is not without costs. The operating time increases significantly for junior surgeons compared to senior surgeons (Farnworth 2001; Babineau 2004; Wilkiemeyer 2005; Kauvar 2006; Harrington 2007). Bridges and Diamond reported the mean costs of this increased operating time to be about USD12,000 per year per resident during the period 1993 to 1997 (Bridges 1999). The complication rate is also higher for junior surgeons compared to senior surgeons (Wilkiemeyer 2005; Kauvar 2006). Bridges and Diamond did not include the cost of the complications in their cost analysis. Thus, the cost of the simulators has to be balanced against the effectiveness in training, cost of increased operating time and complication rates during traditional surgical training, and the costs of traditional training.

Description of the intervention

Training using a box model involves performance of tasks that are encountered in laparoscopic surgery using animal tissues, plastic models, foam, cloth, or other materials. The images can be obtained using a laparoscope (camera) and viewed on monitors. This is called a video‐box trainer. Another type of box trainer is the mirrored‐box trainer, in which mirrors are used to show the working field and direct vision of the working field is prevented (Keyser 2000).

How the intervention might work

Laparoscopic surgery differs from open surgery in several ways: there is an increased need for hand‐eye co‐ordination, because tasks are performed while looking at a screen rather than under direct vision; there is an increased need for manual dexterity, because the long instruments amplify small movements and hence any error in movement, so the fine motor skills required are greater than in open surgery; the body wall produces a fulcrum effect, that is, when the surgeon moves his or her hand to the patient's right, the operating end of the instrument moves to the patient's left on the monitor (Gallagher 1999); there is no sensation of touch using the hands; and the images are not three‐dimensional. Training with a box trainer may work through repeated practice, which may improve hand‐eye co‐ordination and manual dexterity.

Why it is important to do this review

In one previous systematic review, we have shown that virtual reality training can supplement standard laparoscopic training (Gurusamy 2008; Gurusamy 2009a). Sutherland et al concluded in one systematic review that there was no evidence that a box trainer was effective in laparoscopic training (Sutherland 2006). There have been no other systematic reviews and there are no other Cochrane reviews on this topic. This review provides evidence as to whether laparoscopic surgical box model training is beneficial for surgical trainees with no prior laparoscopic experience.

Objectives

To compare the benefits and harms of box model training versus no training, another box model, animal model, or cadaveric model training for surgical trainees with no prior laparoscopic experience.

Methods

Criteria for considering studies for this review

Types of studies

We included randomised clinical trials irrespective of blinding, language, publication status, or sample size. We excluded quasi‐randomised studies (eg, allocation by date of birth, day of the week, etc.) and observational studies for reported benefit, but we planned to include them for the report on intervention‐related harms.

Types of participants

We included surgical trainees with no prior laparoscopic experience in the review. We have considered the effectiveness of box model training for surgical trainees with limited prior laparoscopic experience in another review (Gurusamy 2014).

Types of interventions

We planned to include the following comparisons.

  • Box model training alone or supplementing standard surgical training versus standard surgical training.

  • Box model training versus animal model training.

  • Box model training versus cadaveric model training.

  • Video‐box trainer versus mirrored‐box trainer.

  • One type of video‐box trainer versus another type of video‐box trainer.

  • One type of mirrored‐box trainer versus another type of mirrored‐box trainer.

As mentioned in the Background section, the box model trainer has been compared with a virtual model trainer in another Cochrane review (Gurusamy 2008; Gurusamy 2009a).

We allowed co‐interventions if used equally in all intervention groups and the control group of the trial.

Types of outcome measures

Primary outcomes

  1. Time taken to complete the task.

  2. Error score (however defined by authors).

  3. Accuracy (however defined by authors).

  4. Composite score of the above.

Secondary outcomes

  1. Mortality and morbidity (when animal model was used to assess the trainees).

  2. Movements:

    1. distance;

    2. error.

  3. Trainee satisfaction (however defined by authors).

We have presented all the above outcomes that were reported in the summary of findings Table for the main comparison created using GRADEpro 3.6 (ims.cochrane.org/revman/other‐resources/gradepro).

Search methods for identification of studies

Electronic searches

We searched the Cochrane Central Register of Controlled Trials (CENTRAL) (Issue 4, 2013), MEDLINE, EMBASE, and Science Citation Index Expanded (Royle 2003) to May 2013. We have given the search strategies with the time spans of the searches in Appendix 1.

Searching other resources

We searched the references of the identified trials to identify further relevant trials. We also searched the metaRegister of Controlled Trials (mRCT) (www.controlled‐trials.com/mrct/) and the World Health Organization (WHO) Clinical Trials Platform. The meta‐register includes the ISRCTN Register and the NIH ClinicalTrials.gov Register, among others.

Data collection and analysis

We performed the systematic review following the instructions given in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011) and the Cochrane Hepato‐Biliary Group Module (Gluud 2013).

Selection of studies

Two review authors (MN and KG) independently identified the trials for inclusion. We have listed the excluded studies with the reasons for the exclusion (Characteristics of excluded studies). We resolved any differences through discussion.

Data extraction and management

Two review authors (MN and CT) independently extracted the following data.

  1. Year and language of publication.

  2. Country.

  3. Year of conduct of the trial.

  4. Inclusion and exclusion criteria.

  5. Sample size.

  6. Details of the previous experience of surgical trainees.

  7. Details of the box trainer used.

  8. Details of the training regimen used.

  9. Outcomes (described in Primary outcomes and Secondary outcomes sections).

  10. Risk of bias (described in the Assessment of risk of bias in included studies section).

We sought any unclear or missing information by contacting the authors of the individual trials. If there was any doubt whether trials shared the same participants, completely or partially (judged by identifying common authors and centres), we contacted the authors of the trials to clarify whether the trial report had been duplicated. We resolved any differences in opinion through discussion.

Assessment of risk of bias in included studies

We followed the instructions given in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011) and the Cochrane Hepato‐Biliary Group Module (Gluud 2013). According to empirical evidence (Schulz 1995; Moher 1998; Kjaergard 2001; Wood 2008; Lundh 2012; Savovic 2012; Savovic 2012a), the risk of bias of the trials was assessed based on the following bias risk domains.

Allocation sequence generation

  • Low risk of bias: sequence generation was achieved using computer random number generation or a random number table. Drawing lots, tossing a coin, shuffling cards, and throwing dice were adequate if performed by an independent person not otherwise involved in the trial.

  • Uncertain risk of bias: the method of sequence generation was not specified.

  • High risk of bias: the sequence generation method was not random.

Allocation concealment

  • Low risk of bias: the participant allocations could not have been foreseen in advance of, or during, enrolment. Allocation was controlled by a central and independent randomisation unit. The allocation sequence was unknown to the investigators (eg, if the allocation sequence was hidden in sequentially numbered, opaque, and sealed envelopes).

  • Uncertain risk of bias: the method used to conceal the allocation was not described so that intervention allocations may have been foreseen in advance of, or during, enrolment.

  • High risk of bias: the allocation sequence was likely to be known to the investigators who assigned the participants.

Blinding of participants and personnel*

  • Low risk of bias: blinding was performed adequately, or the assessment of outcomes was not likely to be influenced by lack of blinding.

  • Uncertain risk of bias: there was insufficient information to assess whether blinding was likely to introduce bias on the results.

  • High risk of bias: no blinding or incomplete blinding, and the assessment of outcomes was likely to be influenced by lack of blinding.

*It is impossible to blind the surgical trainees and any assisting personnel. Provided that the outcome assessors were blinded, we considered that there was low risk of bias due to lack of blinding of participants and any assisting personnel for all outcomes except for surgical trainee satisfaction.

Blinding of outcome assessors

  • Low risk of bias: blinding was performed adequately, or the assessment of outcomes was not likely to be influenced by lack of blinding.

  • Uncertain risk of bias: there was insufficient information to assess whether blinding was likely to induce bias on the results.

  • High risk of bias: no blinding or incomplete blinding, and the assessment of outcomes was likely to be influenced by lack of blinding.

Incomplete outcome data

  • Low risk of bias: missing data were unlikely to make treatment effects depart from plausible values. Sufficient methods, such as multiple imputation, had been employed to handle missing data.

  • Uncertain risk of bias: there was insufficient information to assess whether missing data in combination with the method used to handle missing data were likely to induce bias on the results.

  • High risk of bias: the results were likely to be biased due to missing data.

Selective outcome reporting

  • Low risk of bias: all outcomes were pre‐defined and reported, or all clinically relevant and reasonably expected outcomes were reported.

  • Uncertain risk of bias: it is unclear whether all pre‐defined and clinically relevant and reasonably expected outcomes were reported.

  • High risk of bias: one or more clinically relevant and reasonably expected outcomes were not reported, and data on these outcomes were likely to have been recorded.

For this purpose, the trial should have been registered either on the www.clinicaltrials.gov website or a similar register, or there should be a protocol (eg, published in a paper journal). In the case when the trial was run and published in the years when trial registration was not required, we carefully scrutinised all publications reporting on the trial to identify the trial objectives and outcomes and determine whether usable data are provided in the publications results section on all outcomes specified in the trial objectives.

For‐profit bias

  • Low risk of bias: the trial appeared to be free of industry sponsorship or other type of for‐profit support that may manipulate the trial design, conduct, or results of the trial.

  • Uncertain risk of bias: the trial may or may not be free of for‐profit bias as no information on clinical trial support or sponsorship was provided.

  • High risk of bias: the trial was sponsored by the industry or had received another type of for‐profit support.

We considered trials judged with low risk of bias in all the above domains as trials with low risk of bias and the remaining as trials with high risk of bias.
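The overall judgement described above can be sketched as a simple rule: a trial is at low risk of bias only if every domain is rated low. The domain keys and rating strings below are hypothetical labels chosen for illustration; they are not taken from the review's data files.

```python
# Hypothetical domain keys, mirroring the seven domains listed above.
DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "blinding_participants_personnel",
    "blinding_outcome_assessors",
    "incomplete_outcome_data",
    "selective_outcome_reporting",
    "for_profit_bias",
)


def overall_risk_of_bias(ratings):
    """Return 'low' only if every domain is rated 'low'; otherwise 'high'."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    return "low" if all(ratings[d] == "low" for d in DOMAINS) else "high"


if __name__ == "__main__":
    trial = {d: "low" for d in DOMAINS}
    print(overall_risk_of_bias(trial))  # low
    # A single 'uncertain' or 'high' domain makes the whole trial high risk.
    trial["allocation_concealment"] = "uncertain"
    print(overall_risk_of_bias(trial))  # high
```

Under this rule an 'uncertain' rating counts the same as a 'high' rating for the overall judgement, which is consistent with all but one trial being classified as high risk of bias.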

Measures of treatment effect

For dichotomous variables, we planned to calculate the risk ratio (RR) with 95% confidence interval (CI). RR calculations do not include trials in which no events occurred in either group, whereas risk difference calculations do. We planned to report the risk difference if the results using this association measure were different from RR. For continuous variables, we planned to calculate the mean difference (MD) with 95% CI for outcomes such as hospital stay, and calculated the standardised mean difference (SMD) with 95% CI for quality of life and other outcomes (where different scales might be used).
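As an illustration of how an SMD and its 95% CI can be obtained from group summaries, the sketch below assumes the Hedges' adjusted g form that Review Manager is commonly documented as using; the review itself does not state the formula. The function name and the example numbers are hypothetical.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference (Hedges' adjusted g) with SE and 95% CI.

    Group 1 is the intervention group, group 2 the control group.
    """
    n = n1 + n2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n - 2))
    # Small-sample correction factor.
    correction = 1 - 3 / (4 * n - 9)
    g = (mean1 - mean2) / pooled_sd * correction
    # Standard error in the form assumed here for the adjusted g.
    se = math.sqrt(n / (n1 * n2) + g**2 / (2 * (n - 3.94)))
    return g, se, (g - 1.96 * se, g + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical task times in seconds: intervention 100 (SD 20), control 110 (SD 20).
    g, se, (lower, upper) = hedges_g(100, 20, 15, 110, 20, 15)
    print(f"SMD {g:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
```

Because the difference in means is divided by a standard deviation, the SMD is expressed in standard deviation units rather than in the units of the original scale.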

Unit of analysis issues

The unit of analysis was the surgical trainee who underwent training according to the randomised group.

Dealing with missing data

We performed an intention‐to‐treat analysis (Newell 1992) whenever possible. We planned to impute data for binary outcomes using various scenarios such as good outcome analysis, bad outcome analysis, best‐case scenario, and worst‐case scenario (Gurusamy 2009b; Gluud 2013).

For continuous outcomes, we used available‐case analysis. We imputed the standard deviation from P values according to the instructions given in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011), and we used the median for the meta‐analysis when the mean was not available. If it was not possible to calculate the standard deviation from the P value or the CI, we imputed the standard deviation as the highest standard deviation in the other trials included under that outcome, fully recognising that this form of imputation would decrease the weight of the study for calculation of MDs and bias the effect estimate to no effect in the case of SMDs (Higgins 2011).
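The imputation of a standard deviation from a P value can be written out as follows. This is a sketch of the usual approach for a two‐group comparison of means, assuming the P value comes from a two‐sided t test and that both groups share a common standard deviation; the symbols are introduced here for illustration.

```latex
% t: the t value with n_1 + n_2 - 2 degrees of freedom that corresponds
%    to the reported two-sided P value
% MD: the reported difference in means between the two groups
\mathrm{SE} = \frac{|\mathrm{MD}|}{t},
\qquad
\mathrm{SD} = \frac{\mathrm{SE}}{\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
```

Since the SMD is the difference in means divided by the standard deviation, imputing the highest standard deviation from the other trials can only shrink the absolute SMD, which is why this form of imputation biases the effect estimate towards no effect.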

Assessment of heterogeneity

We explored heterogeneity using the Chi² test with significance set at a P value of 0.10, and measured the quantity of heterogeneity using the I² statistic (Higgins 2002). We also used overlapping of CIs on the forest plot to determine heterogeneity.
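The two heterogeneity measures can be illustrated with a minimal sketch, assuming inverse‐variance weights and the usual definition of I² as the proportion of the heterogeneity statistic Q in excess of its degrees of freedom. The function name and example values are hypothetical; the P value for the Chi² test, which requires the chi‐squared distribution, is omitted.

```python
def heterogeneity(effects, standard_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (%)."""
    weights = [1 / se**2 for se in standard_errors]
    # Inverse-variance weighted average of the trial effect estimates.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q is smaller than df.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    # Hypothetical SMDs and standard errors from three trials.
    q, df, i2 = heterogeneity([-0.2, -0.5, -0.9], [0.3, 0.3, 0.3])
    print(f"Q = {q:.2f} on {df} df; I2 = {i2:.0f}%")
```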

Assessment of reporting biases

We planned to use visual asymmetry on a funnel plot to explore reporting bias (Egger 1997; Macaskill 2001). We planned to perform the linear regression approach described by Egger 1997 to determine the funnel plot asymmetry. Selective reporting was also considered as evidence for reporting bias.

Data synthesis

We performed the meta‐analyses using the software package Review Manager 5 (RevMan 2012), following the recommendations of The Cochrane Collaboration (Higgins 2011), and the Cochrane Hepato‐Biliary Group Module (Gluud 2013). We used both random‐effects model (DerSimonian 1986) and fixed‐effect model (DeMets 1987) meta‐analyses. In the case of discrepancy between the two models, we have reported both results; otherwise, we have reported the results of the fixed‐effect model. We planned to use the generic inverse method to combine the hazard ratios for time‐to‐event outcomes.
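To show how the two models can disagree, the sketch below pools trial estimates with an inverse‐variance fixed‐effect model and with the DerSimonian and Laird random‐effects model. It is an illustration under those assumptions rather than a reproduction of the Review Manager computation; the function name and example inputs are hypothetical.

```python
import math


def pool(effects, standard_errors):
    """Pool estimates under fixed-effect and DerSimonian-Laird random-effects models."""
    w = [1 / se**2 for se in standard_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Method-of-moments estimate of the between-trial variance (tau squared).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each trial's variance.
    w_re = [1 / (se**2 + tau2) for se in standard_errors]
    random_effects = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": (fixed, math.sqrt(1 / sum(w))),
        "random": (random_effects, math.sqrt(1 / sum(w_re))),
        "tau2": tau2,
    }


if __name__ == "__main__":
    result = pool([-0.2, -0.5, -0.9], [0.3, 0.3, 0.3])
    for model in ("fixed", "random"):
        estimate, se = result[model]
        print(f"{model}: {estimate:.2f} "
              f"(95% CI {estimate - 1.96 * se:.2f} to {estimate + 1.96 * se:.2f})")
```

When the estimated between‐trial variance is zero the two models coincide; otherwise the random‐effects CI is wider, which is one way the models can lead to different conclusions about significance.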

Trial sequential analysis

We planned to use the trial sequential analysis to control for random errors due to sparse data and repetitive testing of the accumulating data for the primary outcomes (CTU 2011; Thorlund 2011). We planned to add the trials according to the year of publication, and if more than one trial was published in a year, add the trials in alphabetical order according to the last name of the first author. We planned to construct the trial sequential monitoring boundaries on the basis of the required information size (Brok 2008; Wetterslev 2008; Brok 2009; Thorlund 2009; Wetterslev 2009; Thorlund 2010).

We planned to apply trial sequential analysis (CTU 2011; Thorlund 2011) using a required sample size calculated from an alpha error of 0.05, a beta error of 0.20, a control group proportion obtained from the results, and a relative risk reduction of 20% for binary outcomes with two or more trials to determine whether more trials are necessary on this topic (if the trial sequential monitoring boundary and the required information size is reached or the futility zone is crossed, then more trials are unnecessary) (Brok 2008; Wetterslev 2008; Brok 2009; Thorlund 2009; Wetterslev 2009; Thorlund 2010). For hospital stay, we planned to calculate the required sample size from an alpha error of 0.05, a beta error of 0.20, the variance estimated from the meta‐analysis results of low risk of bias trials, and a minimal clinically relevant difference of one day. Trial sequential analysis cannot be performed for SMD. Therefore, we did not plan to perform trial sequential analysis for the outcomes where SMDs were calculated.

Since SMD was the effect estimate used for the reported outcomes in this review, we could not conduct trial sequential analyses.

Subgroup analysis and investigation of heterogeneity

We planned to perform the following subgroup analyses.

  • Trials with low risk of bias compared to trials with high risk of bias.

  • Different types of box trainers (video trainer or mirror trainer).

  • Different levels of prior open surgical experience.

We planned to use the 'test for subgroup differences' available in Review Manager 5 (RevMan 2012) to identify the differences between subgroups.

Sensitivity analysis

We planned to perform a sensitivity analysis by imputing data for binary outcomes using various scenarios such as good outcome analysis, bad outcome analysis, best‐case scenario, and worst‐case scenario (Gurusamy 2009b; Gluud 2013). We performed a sensitivity analysis by excluding the trials in which the mean and the standard deviation were imputed.

Results

Description of studies

Results of the search

We identified 1240 references through electronic searches of CENTRAL (n = 169), MEDLINE (n = 376), EMBASE (n = 358), Science Citation Index Expanded (n = 317), and randomised controlled trials registers (n = 20). We excluded 470 duplicates and 702 clearly irrelevant references through screening titles and reading abstracts. We retrieved 68 references for further assessment. We identified no references through scanning reference lists of the identified randomised trials. We excluded 34 references for the reasons listed in the Characteristics of excluded studies table. In total, 34 references of 32 completed randomised clinical trials met the inclusion criteria. This is summarised in the study flow diagram (Figure 1).


Study flow diagram.

Included studies

Of the 32 trials, 23 trials were two‐arm trials (Torkington 2001; Munz 2004; Korndorffer 2005; Youngblood 2005; Chandrasekera 2006; Robinson 2006; Stefanidis 2006; Madan 2007; Stefanidis 2008; Tanoue 2008; Dunnican 2010; Prabhu 2010; Bennett 2011; Cox 2011; Gill 2011; Santos 2011; Fransen 2012; Holznecht 2012; Kannappan 2012; Stefanidis 2012; Supe 2012; Alaraimi 2013; Horeman 2013). Of these 23 two‐arm trials, 11 trials compared box training versus no training (Torkington 2001; Munz 2004; Korndorffer 2005; Youngblood 2005; Madan 2007; Stefanidis 2008; Tanoue 2008; Prabhu 2010; Bennett 2011; Stefanidis 2012; Supe 2012). The remaining 12 trials compared different types of box training (Chandrasekera 2006; Robinson 2006; Stefanidis 2006; Dunnican 2010; Cox 2011; Gill 2011; Santos 2011; Fransen 2012; Holznecht 2012; Kannappan 2012; Alaraimi 2013; Horeman 2013). Eight trials were three‐arm trials (Jordan 2001; Bruynzeel 2007; Stefanidis 2007; O'Connor 2008; Muresan 2010; Kolozsvari 2011; Mulla 2012; Van Bruwaene 2013). Seven of these trials compared two methods of box training versus no training (Jordan 2001; Bruynzeel 2007; Stefanidis 2007; O'Connor 2008; Muresan 2010; Kolozsvari 2011; Mulla 2012). One trial compared three different methods of box training (Van Bruwaene 2013). One trial was a four‐arm trial (Rivas 2010). In this trial, two methods of box training were compared with two corresponding methods of no training (Rivas 2010).

Of the 32 trials, seven trials did not contribute to quantitative synthesis (Bruynzeel 2007; O'Connor 2008; Cox 2011; Kannappan 2012; Mulla 2012; Alaraimi 2013; Horeman 2013). Of these seven trials, two trials comparing different methods of box training (Cox 2011; Horeman 2013), and one trial comparing two different methods of box training and no training (O'Connor 2008), did not report any of the outcomes of interest. One trial that compared two methods of box training and control reported outcomes selectively and so no information was obtained from this trial (Mulla 2012). Two trials that compared different methods of box training (Kannappan 2012; Alaraimi 2013), and one trial that compared two different methods of box training and no training (Bruynzeel 2007), did not state the number of participants in each group and so could not be included for quantitative synthesis. The results from these three trials are presented in the form of narrative synthesis under the relevant outcomes under the relevant comparisons.

Overall, 16 trials (464 participants) provided data for meta‐analysis of box training (248 participants) versus no training (216 participants) (Jordan 2001; Torkington 2001; Munz 2004; Korndorffer 2005; Youngblood 2005; Madan 2007; Stefanidis 2007; Stefanidis 2008; Tanoue 2008; Muresan 2010; Prabhu 2010; Rivas 2010; Bennett 2011; Kolozsvari 2011; Stefanidis 2012; Supe 2012). All the 16 trials in this comparison used video trainers. Overall, 14 trials (382 participants) provided data for quantitative comparison of different methods of box training (Jordan 2001; Chandrasekera 2006; Robinson 2006; Stefanidis 2006; Stefanidis 2007; Dunnican 2010; Muresan 2010; Rivas 2010; Gill 2011; Kolozsvari 2011; Santos 2011; Fransen 2012; Holznecht 2012; Van Bruwaene 2013). Where reported, the mean age of trainees in the studies ranged from 21 to 26 years and the proportion of females ranged from 32% to 69%. The details of the trials, such as inclusion and exclusion criteria, details of the intervention and control (including the training regimen), and the outcomes measured, are shown in the Characteristics of included studies table.

There were no trials comparing box model training versus either animal model or cadaveric model training. There were also no trials comparing one type of mirrored‐box trainer against another type of mirrored‐box trainer.

Excluded studies

We excluded 34 studies that did not meet the inclusion criteria. The reasons for exclusion are shown in the Characteristics of excluded studies table.

Risk of bias in included studies

We considered only one trial to be at low risk of bias in all domains (Chandrasekera 2006). The risk of bias in the included trials is summarised in the 'Risk of bias' summary (Figure 2) and 'Risk of bias' graph (Figure 3).


Risk of bias summary: review authors' judgements about each risk of bias item for each included study.


Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Effects of interventions

See: Summary of findings for the main comparison Box model training compared with no training or standard surgical training for trainees with no prior laparoscopic experience

The findings are summarised in summary of findings Table for the main comparison.

Box model training alone versus no training

Time taken for task completion

Eight trials (249 participants) were included for the meta‐analysis (Torkington 2001; Munz 2004; Korndorffer 2005; Youngblood 2005; Madan 2007; Tanoue 2008; Muresan 2010; Bennett 2011). There were two comparisons in the trial by Muresan 2010 (box trainer drills and box trainer suturing versus open suturing). The meta‐analysis showed that the time taken for task completion was significantly shorter in the box trainer group than the control group (SMD ‐0.48; 95% CI ‐0.74 to ‐0.22) (Analysis 1.1). There was moderate heterogeneity (I² = 53%; Chi² test for heterogeneity P value = 0.03). There was no change in the results by using fixed‐effect or random‐effects model meta‐analyses. The standard deviation was imputed in two trials (Munz 2004; Muresan 2010). Exclusion of these trials did not alter the results (SMD ‐0.42; 95% CI ‐0.70 to ‐0.13) (Analysis 1.5). One trial reported change in time before and after intervention (Torkington 2001). Exclusion of this trial did not alter the results (SMD ‐0.54; 95% CI ‐0.81 to ‐0.27). One trial could not be included in the meta‐analysis as the number of participants in each group was not reported (Bruynzeel 2007). In this trial, the time taken to complete the task was significantly shorter in the box training groups compared with the no training group (Bruynzeel 2007).
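As an illustration of how a standardised mean difference and its 95% CI are obtained for a single trial, the sketch below uses Hedges' adjusted g, one common formulation of the SMD. The function name and the trial data in the usage note are hypothetical and are not taken from any included trial.

```python
from math import sqrt


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference (Hedges' adjusted g) with 95% CI.

    Group 1 is the intervention group and group 2 the control group.
    The result is in pooled standard deviation units, not in the units
    of the original measurement.
    """
    n = n_1 + n_2
    pooled_sd = sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n - 2))
    correction = 1 - 3 / (4 * n - 9)          # small-sample correction
    g = correction * (mean_1 - mean_2) / pooled_sd
    se = sqrt(n / (n_1 * n_2) + g ** 2 / (2 * (n - 3.94)))
    return g, se, g - 1.96 * se, g + 1.96 * se
```

For a hypothetical trial with a mean task time of 100 seconds (SD 20, n = 15) after box training and 115 seconds (SD 20, n = 15) without training, `hedges_g(100, 20, 15, 115, 20, 15)` gives an SMD of about ‐0.73, that is, the trained group was about 0.73 standard deviations faster.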

Error score

Three trials (69 participants) were included for the meta‐analysis (Jordan 2001; Munz 2004; Muresan 2010). There were two comparisons in the trial by Jordan 2001 (both Z‐ and U‐maze tracking trainers were compared with a control group) and by Muresan 2010 (box trainer drills and box trainer suturing versus open suturing). The meta‐analysis showed that the error score in the box trainer group was significantly lower than the control group (SMD ‐0.69; 95% CI ‐1.21 to ‐0.17) (Analysis 1.2). There was no significant heterogeneity (I² = 0%; Chi² test for heterogeneity P value = 0.85). There was no change in the results by using the fixed‐effect or the random‐effects model. The standard deviation was imputed in two trials (Munz 2004; Muresan 2010). Exclusion of these trials resulted in the results becoming not statistically significant (SMD ‐0.90; 95% CI ‐1.81 to 0.02) (Analysis 1.6). However, it should be noted that the point estimate of effect became even larger. The standard deviation in one trial could not be imputed from data provided in the study report (Madan 2007). This trial did not feature in the meta‐analysis (the standard deviation was not imputed from the next highest study as the errors were reported on different scales by different studies). In this trial, there were no significant differences in the error score between the box trainer and no training groups.

Accuracy score

Three trials (73 participants) were included for the meta‐analysis (Jordan 2001; Korndorffer 2005; Youngblood 2005). There were two comparisons in the trial by Jordan 2001 (both Z‐ and U‐maze tracking tasks were compared with a control group). The meta‐analysis showed that the accuracy score in the box trainer group was significantly better than the control group (SMD 0.67; 95% CI 0.18 to 1.17) (Analysis 1.3). There was no significant heterogeneity (I² = 35%; Chi² test for heterogeneity P value = 0.20). There was no significant change in the results by using the fixed‐effect or the random‐effects model meta‐analyses.

Composite score

Ten trials (373 participants) could be included for the meta‐analysis (Youngblood 2005; Stefanidis 2007; Stefanidis 2008; Muresan 2010; Prabhu 2010; Rivas 2010; Bennett 2011; Kolozsvari 2011; Stefanidis 2012; Supe 2012). There were two comparisons in the trials by Kolozsvari 2011 (standard box model training and box model over‐training versus no training), Muresan 2010 (box trainer drills and box trainer suturing versus open suturing), Rivas 2010 (operating room box model training and classroom box model training versus no training), and Stefanidis 2007 (box model training without distraction and box model training with distraction versus no training). The meta‐analysis showed that the composite performance score in the box trainer group was significantly better than the control group (SMD 0.65; 95% CI 0.42 to 0.88) (Analysis 1.4). However, there was significant heterogeneity (I² = 78%; Chi² test for heterogeneity P value < 0.00001). This heterogeneity existed only in magnitude of effect (not in direction of effect). There was no overlapping of CIs. The overall effect of box training remained significant whether the fixed‐effect model or the random‐effects model was used. The standard deviation was imputed in four trials (Stefanidis 2007; Stefanidis 2008; Muresan 2010; Stefanidis 2012). Exclusion of these trials did not alter the significance of the results (SMD 0.56; 95% CI 0.30 to 0.83) (Analysis 1.7).
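The relationship between the fixed‐effect estimate, the random‐effects estimate, and the I² statistic can be sketched with generic inverse‐variance pooling. The random‐effects step below uses the DerSimonian and Laird estimate of between‐trial variance, which is an assumption about the method; the function name is hypothetical and the inputs in the usage note are invented.

```python
from math import sqrt


def pool_inverse_variance(effects, standard_errors):
    """Fixed-effect and random-effects (DerSimonian-Laird) pooled
    estimates, with Cochran's Q and I-squared (as a percentage)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Cochran's Q and the proportion of variation beyond chance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-trial variance (tau-squared), truncated at zero.
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    total_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, effects)) / total_re

    return {
        "fixed": fixed, "fixed_se": sqrt(1.0 / total_w),
        "random": random, "random_se": sqrt(1.0 / total_re),
        "q": q, "i2": i2, "tau2": tau2,
    }
```

With three hypothetical trials, `pool_inverse_variance([0.2, 0.9, 1.5], [0.3, 0.3, 0.3])`, the effects all point in the same direction but differ in size, I² is about 79%, and the random‐effects standard error is roughly twice the fixed‐effect one. This is why high heterogeneity widens the random‐effects confidence interval even when the direction of effect is consistent.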

Mortality and morbidity

None of the trials reported the mortality or morbidity when animal models were used for assessment.

Movements

Three trials reported movement distance (Torkington 2001; Munz 2004; Tanoue 2008). However, the information was not presented in a format that could be used in the meta‐analysis. There was significantly less movement distance in the box model training group than in the no training group in one trial (Munz 2004). There was no significant difference in the movement distance between the box model training group and the no training group in two trials (Torkington 2001; Tanoue 2008). None of the trials reported the error in movements.

Trainee satisfaction

None of the trials reported trainee satisfaction in both the intervention and control groups.

Subgroup analysis

No subgroup analysis was performed since there were no trials of low risk of bias; all the trainers used were video trainers; and the participants did not differ significantly in surgical experience from the information provided.

Reporting bias

Reporting bias was assessed only for composite score since this was the only outcome with at least 10 trials. Trials with high standard error (generally indicative of smaller sample size or heterogeneous participants) seemed to show a greater effect in favour of box model training (Figure 4). Egger's regression method of assessment of reporting bias was statistically significant (P value = 0.0216).
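Egger's regression method can be sketched as an ordinary least squares regression of the standardised effect (effect divided by its standard error) on precision (one divided by the standard error); an intercept that differs from zero suggests funnel plot asymmetry. The function name is hypothetical and the data in the usage note are invented, not those of the included trials.

```python
from math import sqrt


def egger_regression(effects, standard_errors):
    """Egger's regression: intercept, its standard error, the t statistic,
    and degrees of freedom (number of trials minus 2)."""
    x = [1.0 / se for se in standard_errors]                 # precision
    y = [e / se for e, se in zip(effects, standard_errors)]  # standardised effect
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance and the standard error of the intercept.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (k - 2)
    se_intercept = sqrt(s2 * (1.0 / k + mean_x ** 2 / sxx))
    return {
        "intercept": intercept,
        "se_intercept": se_intercept,
        "t": intercept / se_intercept,
        "df": k - 2,
    }
```

In the hypothetical call `egger_regression([0.30, 0.55, 0.60, 0.95, 1.00], [0.1, 0.2, 0.3, 0.4, 0.5])`, the trials with larger standard errors show larger effects and the intercept is positive. The t statistic is compared against a t distribution with the returned degrees of freedom to obtain a P value.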


Funnel plot of comparison: 1 Box model training alone or supplementing standard surgical training versus standard surgical training, outcome: 1.4 Composite score.

Different methods of box model training

Time taken for task completion

Six trials (180 participants) were included under four comparisons (Chandrasekera 2006; Dunnican 2010; Muresan 2010; Gill 2011; Fransen 2012; Holznecht 2012). One trial (36 participants) found significantly shorter time taken to complete the task when box training was performed using a simple cardboard box trainer compared with the standard pelvic trainer (Chandrasekera 2006) (SMD ‐3.79; 95% CI ‐4.92 to ‐2.65) (Analysis 2.1). One hundred and four participants were randomised in three trials to reverse alignment and forward alignment (Dunnican 2010; Gill 2011; Holznecht 2012). The time taken to complete the task was significantly shorter in the reverse alignment group than the forward alignment group (SMD ‐1.28; 95% CI ‐2.20 to ‐0.35) using the fixed‐effect model (Analysis 2.1). There was significant heterogeneity (I² = 99%; Chi² test for heterogeneity P value < 0.00001) in magnitude and direction of effect. There was no significant difference between the groups using the random‐effects model (SMD 3.79; 95% CI ‐7.13 to 14.70). There was no significant difference in the time taken to complete the tasks in the remaining two comparisons (box trainer suturing versus box trainer drills and single incision versus multiport box model training) (Analysis 2.1). Two trials could not be included in the quantitative synthesis as the numbers of participants randomised to each group were not stated (Bruynzeel 2007; Kannappan 2012). There was no significant difference in the time taken to complete the task using a mirror box trainer compared with a video box trainer (Bruynzeel 2007) or using box training incorporating positive feedback compared with box training incorporating negative feedback (Kannappan 2012).

Error score

Fifty‐six participants randomised to six different methods of box training in three trials (Jordan 2001; Muresan 2010; Fransen 2012) were included for quantitative synthesis of error score. There was no significant difference in the error score between the two groups in any of the comparisons (box trainer suturing versus box trainer drills; single incision versus multiport box model training; Z‐maze box training versus U‐maze box training) (Analysis 2.2). Two trials could not be included in the quantitative synthesis as the numbers of participants randomised to each group were not stated (Kannappan 2012; Alaraimi 2013). There was no significant difference in the error score using a three‐dimensional box trainer compared with a two‐dimensional box trainer (Alaraimi 2013) or using box training incorporating positive feedback compared with box training incorporating negative feedback (Kannappan 2012).

Accuracy score

One trial (16 participants), in which trainees were randomised to Z‐maze box training (eight participants) or U‐maze box training (eight participants), was included for quantitative synthesis of accuracy score (Jordan 2001). The accuracy score was significantly higher with Z‐maze box training than U‐maze box training (SMD 1.55; 95% CI 0.39 to 2.71) (Analysis 2.3).

Composite score

Two hundred and sixty‐one participants in 11 trials were included in 10 comparisons for quantitative synthesis of composite score. One trial (36 participants) found a significantly higher composite score with simple cardboard box trainer than conventional pelvic trainer (Chandrasekera 2006) (SMD 0.87; 95% CI 0.19 to 1.56) (Analysis 2.4). Another trial (22 participants) found a significantly higher composite score with reverse alignment compared with forward alignment box training (Holznecht 2012) (SMD 1.82; 95% CI 0.79 to 2.84) (Analysis 2.4). There were no significant differences in the composite score between the intervention and control groups in any of the remaining comparisons (Analysis 2.4).

Other outcomes

None of the secondary outcomes were adequately reported in the trials.

Discussion

Summary of main results

In this systematic review, we reviewed the effect of laparoscopic surgical box model training on four different technical outcomes for trainees with no prior laparoscopic experience. When compared against no training, box trainers appear to significantly reduce both the time taken to complete a laparoscopic task and the error score while increasing the accuracy and composite performance scores. This suggests that box model training may improve the technical skills of the surgical trainee. However, the results of these meta‐analyses must be interpreted with caution in the light of several issues. All the trials included in this comparison were at high risk of bias. In addition, there was heterogeneity in magnitude of effect in the composite performance score: while we know that the composite scores are better after box model training than after no supplementary training, we do not know by how much they are better. The quality of evidence was very low (summary of findings Table for the main comparison), and the impact of this improvement in technical skills on laparoscopic operative procedures in patients cannot be determined since none of the trials reported mortality and morbidity, even in the animal model. It should be noted that obtaining mortality and morbidity data in the animal model may be difficult. For example, it is illegal in the UK to train on live animals even if they are anaesthetised.

One possibility to assess the impact of box model training on patients is to follow these trainees for a longer period until they become competent to operate on humans and assess the impact of box model training on patient outcomes. However, this can be difficult since it may be several years before the impact on patients can be assessed and the trainees may have received other forms of training. This is acceptable if the trainees receive only standard surgical training in an apprenticeship model, as box model training should not be considered as a replacement for standard surgical training but only as a supplementary training. Another problem with assessing the long‐term impact of box model training is whether the skills are retained after a long period. The issue is whether the training by these box trainers lasts only for a short time and, if so, whether there is any benefit to the trainee or the patient. Surgical training by apprenticeship is an ongoing learning process and it may be necessary to allow the trainees allocated to box model training to continue training on the box model until they become eligible to operate on humans.

A surgical career can easily last 30 to 35 years. Several skills, including technical skills, decision‐making skills, and the ability to think clearly in stressful situations, are necessary for a good surgeon. Training by box model, virtual reality model, hybrid model (augmented reality model), animal model, cadaveric model, or a combination of these are all options for improving the technical skills of surgeons. Given the importance of the issue at stake, large multicentre trials in which the trainees are followed up for at least one year after they complete training should be considered by national governments and surgical training colleges. Patient‐oriented outcomes, such as patient mortality and morbidity, patient quality of life, hospital stay, and cost‐effectiveness of the different training methods, will be suitable outcomes for such trials.

Although there were some significant differences in some outcomes between different methods of box training, most of the comparisons involved only one trial. There was significant inconsistency in some of the comparisons that involved more than one trial. There was also inconsistency across the outcomes, that is, the composite score would be expected to improve if the accuracy score improved and errors decreased. However, this was not observed in the various comparisons between different methods of box model training. One has to be sceptical about the validity of the findings of a single trial that has not been replicated. Thus, there is no evidence that any specific method of box training is better than other methods of box training.

The potential advantages of box model training over virtual reality training include: cheaper cost of the model, which enables the training of multiple trainees simultaneously in short training courses, and better realism (use of real tissue and presence of haptic feedback) compared to currently evaluated virtual reality models (Madan 2005).

It may be that a mixed modality training regimen that involves box model training and virtual reality training is preferable at the current time although we did not specifically look at this comparison in this review. The hybrid simulators with camera trackers to follow the instruments (Botden 2007) combine some of the advantages of the virtual reality training and box model training and further research is needed to determine whether such hybrid simulators are better than sole use of either box trainers or virtual reality trainers.

Overall completeness and applicability of evidence

The results of this review are applicable only in surgical trainees with no previous laparoscopic experience. In practice, almost all of the trials in this review made use of medical students. The evidence only shows that the technical parameters assessed in this review (time taken, errors, accuracy, composite performance) were improved in box model training compared with standard surgical training (control). Given that participants were students, they were not permitted to operate on humans and there is thus no evidence available in this review of the effect of box model training on patient outcomes.

Quality of the evidence

All the trials except for one (Chandrasekera 2006) were at high risk of bias. Furthermore, there are risks of random errors (play of chance) in all comparisons. Because of the data structure, we could not apply trial sequential analysis to control for random errors. Overall, the quality of evidence is very low as indicated in summary of findings Table for the main comparison. Nevertheless, this is the best evidence that is currently available.

Potential biases in the review process

Although study selection and data collection were performed in a non‐blinded manner, the potential for bias and errors was largely reduced by the use of two independent data extractors. We were unable to formally assess the publication bias by funnel plot for most comparisons as there were fewer than 10 trials. However, there was significant evidence of selective outcome reporting where the authors tended to report only the positive aspects of box model training. The inclusion of data from such trials may result in changes in conclusions. We imputed the standard deviation when it was not available from the studies. A sensitivity analysis demonstrated that such imputation did not alter the conclusions. The alternative to imputation of standard deviation was to exclude the information from the trial, which would have made the interpretation of data even more difficult given the paucity of evidence.

Agreements and disagreements with other studies or reviews

Sutherland et al concluded, based on a systematic review of the literature, that there was no evidence that a box trainer was effective in laparoscopic training (Sutherland 2006). We have concluded that box model training improves the technical skills of surgical trainees with no prior laparoscopic experience, based on the knowledge that all assessments of training modalities, such as virtual reality, in surgical trainees without prior laparoscopic experience are likely to be about short‐term impact on technical skills rather than long‐term impact on patient outcomes.

Figure 1. Study flow diagram.

Figure 2. Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 3. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 4. Funnel plot of comparison: 1 Box model training alone or supplementing standard surgical training versus standard surgical training, outcome: 1.4 Composite score.

Analysis 1.1. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 1 Time taken for task completion (seconds).

Analysis 1.2. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 2 Error score.

Analysis 1.3. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 3 Accuracy score.

Analysis 1.4. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 4 Composite score.

Analysis 1.5. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 5 Time taken for task completion (sensitivity analysis).

Analysis 1.6. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 6 Error score (sensitivity analysis).

Analysis 1.7. Comparison 1 Box model training alone or supplementing standard surgical training versus standard surgical training, Outcome 7 Composite score (sensitivity analysis).

Analysis 2.1. Comparison 2 Different methods of box model training, Outcome 1 Time taken for task completion (seconds).

Analysis 2.2. Comparison 2 Different methods of box model training, Outcome 2 Error score.

Analysis 2.3. Comparison 2 Different methods of box model training, Outcome 3 Accuracy score.

Analysis 2.4. Comparison 2 Different methods of box model training, Outcome 4 Composite score.

Summary of findings for the main comparison. Box model training compared with no training or standard surgical training for trainees with no prior laparoscopic experience

Patient or population: trainees with no prior laparoscopic experience.

Settings: secondary care.

Intervention: box model training.

Comparison: no training or standard surgical training.

| Outcomes | Video‐box model training | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- |
| Time taken for task completion (seconds) | The mean time taken for task completion in the intervention group was 0.54 standard deviations lower (0.27 to 0.81 lower). | 229 (7) | ⊕⊝⊝⊝ very low¹,² | |
| Error score | The mean error score in the intervention group was 0.69 standard deviations lower (0.17 to 1.21 lower). | 69 (3) | ⊕⊝⊝⊝ very low¹,² | Results became non‐significant in sensitivity analysis (after exclusion of trials with an imputed standard deviation). |
| Accuracy score | The mean accuracy score in the intervention group was 0.67 standard deviations higher (0.18 to 1.17 higher). | 73 (3) | ⊕⊝⊝⊝ very low¹,² | |
| Composite score | The mean composite performance score in the intervention group was 0.49 standard deviations higher (0.25 to 0.73 higher). | 321 (9) | ⊕⊝⊝⊝ very low¹,²,³ | Significant heterogeneity in magnitude of effect was present. |

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

¹ The trial(s) was (were) of high risk of bias.
² There were too few trials to assess publication bias.
³ There was significant heterogeneity.

Comparison 1. Box model training alone or supplementing standard surgical training versus standard surgical training

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Time taken for task completion (seconds) | 8 | 249 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.48 [‐0.74, ‐0.22] |
| 2 Error score | 3 | 69 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.69 [‐1.21, ‐0.17] |
| 3 Accuracy score | 3 | 73 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.67 [0.18, 1.17] |
| 4 Composite score | 10 | 373 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.65 [0.42, 0.88] |
| 5 Time taken for task completion (sensitivity analysis) | 6 | 204 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.42 [‐0.70, ‐0.13] |
| 6 Error score (sensitivity analysis) | 1 | 24 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.90 [‐1.81, 0.02] |
| 7 Composite score (sensitivity analysis) | 6 | 268 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.56 [0.30, 0.83] |

Comparison 2. Different methods of box model training

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Time taken for task completion (seconds) | 6 | | Std. Mean Difference (IV, Fixed, 95% CI) | Subtotals only |
| 1.1 Box suturing versus box drills | 1 | 20 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.78 [‐1.70, 0.14] |
| 1.2 Simple cardboard box versus standard box | 1 | 36 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐3.79 [‐4.92, ‐2.65] |
| 1.3 Single incision versus multiport | 1 | 20 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.11 [‐0.76, 0.99] |
| 1.4 Reverse alignment versus forward alignment | 3 | 104 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐1.28 [‐2.20, ‐0.35] |
| 2 Error score | 3 | | Std. Mean Difference (IV, Fixed, 95% CI) | Totals not selected |
| 2.1 Box suturing versus box drills | 1 | | Std. Mean Difference (IV, Fixed, 95% CI) | 0.0 [0.0, 0.0] |
| 2.2 Single incision versus multiport | 1 | | Std. Mean Difference (IV, Fixed, 95% CI) | 0.0 [0.0, 0.0] |
| 2.3 Z‐maze box versus U‐maze box | 1 | | Std. Mean Difference (IV, Fixed, 95% CI) | 0.0 [0.0, 0.0] |
| 3 Accuracy score | 1 | | Std. Mean Difference (IV, Fixed, 95% CI) | Totals not selected |
| 3.1 Z‐maze box versus U‐maze box | 1 | | Std. Mean Difference (IV, Fixed, 95% CI) | 0.0 [0.0, 0.0] |
| 4 Composite score | 10 | | Std. Mean Difference (IV, Fixed, 95% CI) | Subtotals only |
| 4.1 Mirror trainer versus video trainer | 1 | 26 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.69 [‐1.49, 0.10] |
| 4.2 Simple cardboard box versus standard box | 1 | 36 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.87 [0.19, 1.56] |
| 4.3 Single incision versus multiport | 1 | 20 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.34 [‐0.55, 1.23] |
| 4.4 Reverse alignment versus forward alignment | 1 | 22 | Std. Mean Difference (IV, Fixed, 95% CI) | 1.82 [0.79, 2.84] |
| 4.5 Basic task overtrained versus basic task trained | 1 | 49 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.53 [‐0.04, 1.10] |
| 4.6 Box suturing versus box drills | 1 | 20 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.59 [‐0.31, 1.50] |
| 4.7 Operating room trained versus classroom trained | 1 | 16 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.21 [‐0.77, 1.20] |
| 4.8 Trained with distraction versus trained without distraction | 1 | 25 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.06 [‐0.73, 0.84] |
| 4.9 Ongoing training versus no initial training only | 2 | 47 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.41 [‐1.02, 0.20] |
| 4.10 Distributed versus massed maintenance practice | 1 | 20 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.02 [‐0.89, 0.86] |