1 Introduction

Across a number of high-performance environments, there is a proliferation of technology designed to augment existing performance solutions. One such technology is virtual reality (VR). Virtual reality has attracted much interest as a training solution because it not only allows safe, repeatable training tasks but also affords complete control over the training environment. A simulation can be used to augment the training task by varying task constraints (e.g., Lammfromm and Gopher 2011), controlling feedback (e.g., Sigrist et al. 2015), or providing adaptive difficulty (e.g., Gray 2017). At the pragmatic level, VR training enables a trainee to engage in learning, practice, or rehearsal that may otherwise have been impractical, for example rehearsing particularly dangerous skills without real risk, or practicing skills while physically distant from the real performance environment (e.g., learning to fly in a simulator without access to an aeroplane). Virtual reality has intuitive appeal as a training tool and has been implemented in a number of environments including sport (Gray 2019; Neumann et al. 2018), surgery (Gurusamy et al. 2008), aviation (Vine et al. 2015), and rehabilitation (Adamovich et al. 2009). However, the use of VR in these environments raises questions about whether the current systems are sufficiently realistic to provide an effective training stimulus. With the exception of aviation, where there is a large evidence base, all of these areas have adopted VR simulations for training without the necessary evidence of the efficacy of the VR tool and the training effectiveness.

In developing an appropriate evidence base for the use of VR training, one of the biggest challenges is demonstrating real-world transfer (Michalski et al. 2019). While evidence for positive transfer in sporting tasks is limited, there is a body of evidence from surgical skills training which suggests simulators can facilitate task expertise in the real world (Gurusamy et al. 2008; Haque and Srinivasan 2006; Lerner et al. 2010). Simulation in surgical education is more advanced than in many other areas (probably due to the amount of investment and research in this area), but this nonetheless demonstrates the potential benefits of VR training. In order to achieve good real-world transfer, it is imperative that simulations demonstrate appropriate levels of validity and fidelity. Validity is the degree to which the simulation provides an accurate representation of core features of the task (Gray 2019). Fidelity refers to the extent to which a simulation reproduces the state or behavior of the real-world system (Burdea and Coiffet 2003). Gray (2019) subdivided fidelity into physical fidelity (i.e., how similar is the appearance of the task to the real world), affective fidelity (i.e., how similar are the emotions experienced to those experienced in the real world), biomechanical fidelity (i.e., how similar are task-specific movements elicited compared to the real world), and psychological fidelity (i.e., how similar are the perceptual-cognitive skills needed compared to the real world). The greater the congruence between these aspects of the simulation and the real world, the more likely that simulation training will provide benefits in real-world performance contexts (Gray 2019). An analysis of the validity and fidelity of VR sport simulators is therefore critical before practitioners begin to use them in training contexts.

In the context of the validation of VR simulators, construct validity describes how well the simulator is able to distinguish participants of differing levels of expertise (Roberts et al. 2017). Although there are no recognized standards for the evaluation of construct validity in VR simulators, the examination of expertise differences has been extensively used for this purpose, particularly in clinical contexts. For example, previous work in surgery (see Vine et al. 2014; Bright et al. 2012) has validated VR simulators by comparing the performances of expert performers of the task with novice performers of the task. In these contexts, if the performance markers that the VR simulation provides are valid and reliable, then a consistent and predictable distinction between expert and novice users of the device should be evident (as in Vine et al. 2014; Bright et al. 2012). In sport, therefore, if the training drills in the VR environment are a true representation of the skills needed in the real world, then those who excel at the sport in the real world should also excel in the virtual one (Gray 2019). As such, the examination of expertise differences in VR simulators has proved to be an important first step in the evaluation of their effectiveness.

Despite the increase in the use of VR within sport science (for reviews see Neumann et al. 2018; Michalski et al. 2019), little work has been carried out that has examined expertise differences in simulated environments. In a recent study, Harris et al. (2019a) examined the construct validity of a VR golf-putting simulator using elite and novice golfers. Elite-level golfers out-performed novices in both the real and virtual golf putting task, and there was a moderate positive correlation between real and virtual putting performance. Another study by Dessing and Craig (2010) examined how soccer goalkeepers caught curved free kicks in VR. They showed that an elite goalkeeper performed better by waiting significantly longer before initiating movement, thus gaining more information about the ball trajectory, much as they do in the real world. Finally, the virtual reality batting simulator used in a training study by Gray (2017) was previously shown to predict the playing level of baseball batters based on metrics of spatial and temporal hitting accuracy (Gray 2002). Although these studies report encouraging results, the tasks used are limited to basic interception and aiming tasks. No study has yet examined the use of VR simulators that replicate more dynamic sport contexts dependent on higher-order perceptual-cognitive processes (e.g., decision-making, anticipation, and visual search), which are arguably more important and more difficult to replicate in the virtual world. For example, in soccer contexts, perceptual-cognitive expertise has been related to superior advance visual cue utilization, pattern recall and recognition, visual search behavior and the knowledge of situational probabilities (see Casanova et al. 2009 for a review). As such, it is important to examine the validity of simulations that aim to train these skills.

The aim of this experiment, therefore, was to examine the construct validity of one of the most advanced soccer-specific VR simulators on the market. If the VR simulator provides a representative environment in relation to the perceptual-cognitive processes needed for expertise in soccer (Pinder et al. 2011), then it should be able to differentiate expert (professional soccer players), intermediate (academy soccer players), and novice soccer players based on performance in the simulation. While the manufacturers claim that this virtual environment could be used to help train soccer players, the validity of such claims has yet to be tested. This study therefore represents a critical first step before establishing its potential for training perceptual-cognitive skills in soccer.

2 Methods

2.1 Participants

Seventeen professional soccer players (13 male, 4 female; mean age = 28.41 years, SD 6.11), seventeen academy players (14 male, 3 female; mean age = 14.47 years, SD 2.00), and seventeen novice players (9 male, 8 female; mean age = 21.53 years, SD 3.54) took part in the experiment. Professional players were recruited from the English Premier League (n = 6), English Championship (n = 4), English League One (n = 1), American Major League Soccer (n = 2), American National Women’s Soccer League (n = 2), and the English F.A. Women’s Super League (n = 2). Six professionals were full international players. Academy players were youth players from the academies of professional clubs in the English Premier League (n = 4), American Major League Soccer (n = 6), Russian Premier League (n = 4), and American National Women’s Soccer League (n = 3). Novice players were recruited from the student cohort at the lead author’s institution. These players had minimal competitive playing experience at the recreational level (mean = 3.32 years, SD 1.24).

Sample sizes were based on an a priori examination of effect sizes from similar studies (Harris et al. 2019a), which showed that a sample size of ten in each group was necessary to achieve power of 0.80. All players were free from injury and had normal or corrected-to-normal vision. Informed consent and parental consent were given by all players, and institutional ethical approval was sought prior to the commencement of this study. All procedures performed in this study were in accordance with the ethical standards of the institutional committee and with the 1964 Helsinki Declaration and its later amendments.

2.2 Virtual reality equipment

The MiHiepa Sports Rezzil VR platform (https://mihiepa.com/) consists of an HTC Vive Pro head-mounted display (HTC Inc., Taoyuan City, Taiwan) with a resolution of 2880 × 1600 pixels, a refresh rate of 90 Hz, and a horizontal and vertical field of view of 110°. Participants wore bespoke training shoes (matched to their shoe size) and shin guards that had four detachable HTC Tracker 2.0 sensors (HTC Inc., Taoyuan City, Taiwan) connected to each shoe and shin guard (Fig. 1). These sensors were tracked using two HTC Lighthouse 2.0 trackers (HTC Inc., Taoyuan City, Taiwan) positioned 2 m high and 3 m diagonally to the left and right of the participants.

Fig. 1 Sensors placed on the shin guards and feet of the players (left), and the accuracy (center) and speed (right) calibration drills

This system has a fully automated calibration procedure that is supported by written and audio instructions in the virtual world. The calibration process consisted of a targeting task, in which players were required to successfully hit five shots into highlighted thirds of a full-sized virtual goal, and a self-calibrating passing power drill, in which players had to aim to get the ball to stop on the goal line when kicked from the penalty mark (Fig. 1). Once calibrated, participants completed four basic drills in the following order.

2.3 Virtual reality soccer drills

2.3.1 Rondo scan

In this drill, participants were surrounded by 10 virtual mini goals aligned with ball feeder machines. A ball was fired randomly at the participant from one of these ball feeders, and they then had to pass the ball into a randomly highlighted goal (Fig. 2a). The performance score for the rondo scan was derived from the number of balls passed into the correct goal in a three-minute time limit.

Fig. 2 A visual representation of the rondo scan (a), color combo (b), shoulder sums (c), and pressure pass (d) VR drills taken from the VR environment

2.3.2 Color combo

In this drill, each half of the player’s virtual boots was colored differently (see Fig. 2b). Colored balls were then fired out of four virtual ball feeder machines, and players had to intercept each ball with the matching colored part of their virtual boot (e.g., red balls needed to be intercepted with the outside of the right foot). In addition, silver balls could be intercepted by any side of either foot and gave the player three points. Gray balls were to be avoided or a life would be lost. The drill progressed through five levels that gradually increased in speed (from 25 to 51 km/h) and in the number of balls presented. The performance score for the color combo was derived from the number of balls intercepted with the correct side of the foot, the number of silver balls intercepted, and the number of gray balls avoided. The drill continued until participants lost three ‘lives’ by touching the gray balls.

2.3.3 Shoulder sums

In this drill, players were faced with four full-sized virtual goals with virtual ball feeders between each goal. As a ball was passed to the participant, a number of players (colored red and yellow) appeared over their left and right shoulders. On-screen instructions asked the participant to count the total number of players appearing over both shoulders and then pass the ball to the segment of the goal that matched the number of players (Fig. 2c). As the drill progressed to Level 2, players were required to count only the red or yellow players that matched the color of the ball coming toward them (e.g., only yellow players were counted if the ball was yellow). The performance score for the shoulder sums was derived from the number of correct sums and the accuracy of the pass (i.e., how close it was to the center of the goal) into the related goal.

2.3.4 Pressure pass

This was a dynamic passing drill where each player was surrounded by three teammates (in yellow) who were marked by three opposing red players. The opposing players moved toward and away from the participant, creating dynamic passing angles and passing opportunities. Players were required to pass to all three teammates in yellow, in any order, without hitting the opposing players. If an opposing player was hit, then the number of teammates already hit was reset (Fig. 2d). The performance score for the pressure pass drill was derived from the longest passing streak achieved and the accuracy of these passes (i.e., how close the ball hit to the center of the player).

The VR platform gives a performance score for each of the four VR soccer drills. It also provides four separate ‘process’ scores relating to how efficient and effective the player’s performance was across all the drills performed. These relate to (1) passing accuracy (number of correct passes and the accuracy of these passes), (2) reaction time (how long players dwelled on the ball before making a passing decision), (3) composure (maintaining performance level despite increases in task difficulty), and (4) adaptability (the number of touches with both feet). From the interaction of these performance and process scores, the algorithm within the system then provides an overall diagnostic score concerning the ability of the player. This is termed the ‘Rezzil Index’ score and is purported to reflect the overall ability of the player being tested, based on an algorithm that measures the use of both feet, passing and receiving the ball under pressure, spatial awareness, speed of decision-making, and the accuracy of these decisions.

2.4 Procedure

Novice players attended the laboratory individually and, after providing written informed consent, were set up on the equipment, calibrated using the automatic calibration protocol, and completed the four VR drills. Testing lasted approximately 30 min per participant and was conducted in a laboratory or sports hall environment. For the professional and academy players, anonymized player data were taken from the cloud service of MiHiepa under the terms and conditions of their licensing agreement with the professional clubs. The authors were given a random sample of 250 players (125 academy and 125 professional) and used a random number generator to select 17 players from each group in order to match the novice sample. The data obtained represented the first time that each player had used the VR platform. All players were taken through the same VR drills, in the same order, and these were administered by the same person. Novice players were tested at the lead author’s institution, and academy and professional players were tested at their associated soccer clubs.
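The subsampling step described above can be sketched as follows. This is a minimal illustration only: the player identifiers, the seeds, and the function name `draw_subsample` are hypothetical, and it assumes that 17 records were drawn without replacement from each pool of 125, as the participant counts suggest.

```python
import random


def draw_subsample(player_ids, k=17, seed=None):
    """Return k distinct IDs drawn uniformly at random without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(list(player_ids), k))


# Hypothetical anonymized IDs standing in for the 125 academy and
# 125 professional records supplied to the authors.
academy_ids = [f"A{i:03d}" for i in range(1, 126)]
professional_ids = [f"P{i:03d}" for i in range(1, 126)]

academy_sample = draw_subsample(academy_ids, k=17, seed=1)
professional_sample = draw_subsample(professional_ids, k=17, seed=2)
```

Recording the seed alongside the selected IDs would allow the same subsample to be regenerated later.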

2.5 Data analysis

A MANOVA was conducted to assess the effect of group on each VR drill score (rondo scan, color combo, shoulder sums, and pressure pass), process score (passing accuracy, composure, reaction time, and adaptability), and the overall Rezzil score. A series of one-way ANOVAs was then conducted, and post hoc Bonferroni-corrected pairwise comparisons were carried out to explore significant effects. Partial eta squared is reported as the effect size for main effects and Cohen’s d for pairwise comparisons (0.2 'small,' 0.5 'medium,' and 0.8 'large' effect size).
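The two pairwise quantities named above, Cohen's d and the Bonferroni correction, can be illustrated with a short sketch. It assumes d is computed with the pooled standard deviation, which the text does not specify, and the scores and raw p-values are made up for illustration; they are not data from this study.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def label(d):
    """Label |d| with the 0.2 / 0.5 / 0.8 thresholds quoted in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # label chosen here; the text gives no name for d < 0.2


# Made-up drill scores for illustration only.
scores = {
    "novice": [10, 12, 11, 13, 9],
    "academy": [12, 14, 13, 15, 11],
    "professional": [18, 20, 19, 21, 17],
}

# d for every pair of groups, with the second-listed group minus the first.
effect_sizes = {
    (low, high): cohens_d(scores[high], scores[low])
    for low, high in combinations(scores, 2)
}

# Hypothetical raw p-values for the three pairwise comparisons.
adjusted = bonferroni([0.004, 0.020, 0.400])
```

With three groups there are three pairwise comparisons, so each raw p-value is multiplied by three before being compared with the 0.05 threshold.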

3 Results

The results of the MANOVA showed a significant difference between groups, Pillai’s Trace = 0.91, F(18, 82) = 3.82, p < 0.001, ηp² = 0.46, on scores generated by the VR platform. Consequently, a series of one-way ANOVAs were conducted to explore differences between skill levels (novice vs academy vs professional) for each VR performance metric (see Table 1).
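As a rough consistency check on the multivariate statistics above, the sketch below applies the standard F approximation for Pillai's trace and the usual multivariate partial eta squared (trace divided by s). It assumes nine dependent variables (four drill scores, four process scores, and the overall index), three groups, and 51 participants. The function name is hypothetical, and this is not necessarily how the authors' software computed the values.

```python
def pillai_summary(trace, n_dvs, n_groups, n_total):
    """Approximate F, its degrees of freedom, and partial eta squared
    from Pillai's trace in a one-way MANOVA."""
    df_h = n_groups - 1             # hypothesis degrees of freedom
    df_e = n_total - n_groups       # error degrees of freedom
    s = min(n_dvs, df_h)
    m = (abs(n_dvs - df_h) - 1) / 2
    n = (df_e - n_dvs - 1) / 2
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    f_value = (df2 / df1) * trace / (s - trace)
    partial_eta_sq = trace / s
    return f_value, df1, df2, partial_eta_sq


f_value, df1, df2, eta = pillai_summary(trace=0.91, n_dvs=9, n_groups=3, n_total=51)
```

Under these assumptions the degrees of freedom come out as 18 and 82, matching the reported values, and the partial eta squared rounds to 0.46. The approximate F is about 3.80 rather than 3.82, a gap that would be expected if the reported trace was rounded to two decimals.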

Table 1 The statistics for performance and process scores across each VR drill and the overall ‘Rezzil Index’ score

3.1 Performance scores

Professional players significantly out-performed novice and academy players on all but one of the VR soccer drills. There were no significant differences in performance scores between novice and academy players on any of the drills. No significant differences between players were revealed for the color combo drill. These data are presented in Fig. 3.

Fig. 3 Individual data points for each player and the mean (95% CI) for each VR drill (rondo scan, color combo, shoulder sums, and pressure pass), process score (passing accuracy, composure, reaction time, and adaptability), and the overall Rezzil Index

3.2 Process scores

Professional players out-performed novice and academy players on passing accuracy, composure, and adaptability process measures (see Table 1). Academy players significantly out-performed novice players on reaction time and adaptability, but there were no significant differences between these groups on passing accuracy and composure. These data are presented in Fig. 3.

3.3 Overall ‘Rezzil Index’ score

Critically, the overall Rezzil index score significantly differentiated between all groups (see Table 1). Based on the interaction between performance and process scores, the VR system showed that professional players scored significantly higher than both academy and novice players and that academy players scored significantly higher than novice players (see Fig. 3).

4 Discussion

Given the growing interest in VR simulation for sports training, more rigorous assessments of the validity of simulations are required if VR training is to be successful. Consequently, the aim of this experiment was to examine the construct validity of a soccer-specific VR simulator. Results suggested that, in terms of performance, the VR simulator differentiated professional players from both academy and novice players on every VR soccer drill except the color combo drill. The skills required to perform well in the color combo drill are arguably less representative of real-life soccer than those required in the other drills, so the fact that it did not differentiate between groups is perhaps to be expected. Instead, this drill mainly focuses on the ability of the player to maintain goal-directed attentional control in the face of increased task difficulty and largely relies on the executive functions of the player. Although there is some evidence that such abilities are associated with success in youth soccer players (Vestberg et al. 2017) and top professionals (Vestberg et al. 2012), there is little evidence to suggest that differences in executive functioning can predict soccer performance (e.g., Furley and Memmert 2015).

Academy players did not out-perform novices in any of the drills assessed. Academy players, who have received extensive soccer training and have been selected on their soccer ability, would probably be expected to out-perform novices with little soccer experience. This suggests that the VR simulator was not sensitive enough to differentiate expertise at this lower level, where performance variability is generally higher. Alternatively, the lack of disparity between these lower-level players could reflect the age difference between the groups and the associated developmental differences in constructs like working memory (Furley and Wood 2016). However, it must be remembered that performance on these drills is only related to outcome and is no reflection on the quality of performance. For example, it is possible that academy players performed the task quicker and with greater technical ability, but performed similarly in terms of performance outcome (e.g., how many targets were hit). To uncover whether this is the case, an examination of the process measures is needed.

In fact, the process measures did successfully differentiate between expertise levels. The professional players demonstrated superior passing accuracy, composure, reaction time, and adaptability compared to the other groups. The academy players showed superior reaction time and adaptability compared to the novice players. As these measures are designed to measure the ‘quality’ of the performance in the VR simulator, it was expected that more expertise-based differences would be evident in these metrics. However, the most important finding was related to the overall diagnostic ability (i.e., Rezzil Index) score produced by the VR platform. Based on an algorithm that calculates the interaction between performance and process, the VR platform significantly differentiated across all expertise groups. Furthermore, interpretation of effect sizes showed that the VR system differentiated between novice and academy players with a probability of 76%, between academy and professional players with a probability of 85%, and between novice and professional players with a probability of 97%. From this, it can be concluded that, although expertise differences were not evident across all VR metrics, construct validity of the overall diagnostic score was shown. In short, the system could successfully differentiate between expertise levels.
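The text does not state how effect sizes were converted into the probabilities quoted above. One common approach is the common-language effect size (probability of superiority), which for two normally distributed groups with equal variance equals Φ(d/√2). The sketch below assumes that conversion and back-calculates the Cohen's d each probability would imply. These d values are illustrative and are not taken from Table 1.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probability_of_superiority(d):
    """P(a random score from the higher group exceeds one from the lower group),
    assuming normal distributions with equal variance."""
    return STANDARD_NORMAL.cdf(d / sqrt(2))


def d_from_probability(p):
    """Invert the relationship: the Cohen's d implied by a given probability."""
    return sqrt(2) * STANDARD_NORMAL.inv_cdf(p)


# Probabilities quoted in the text for the three pairwise comparisons.
for comparison, p in [
    ("novice vs academy", 0.76),
    ("academy vs professional", 0.85),
    ("novice vs professional", 0.97),
]:
    print(f"{comparison}: P = {p:.2f}, implied d = {d_from_probability(p):.2f}")
```

Under this assumption, all three quoted probabilities would correspond to effects above the 0.8 'large' threshold used in the data analysis.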

While these findings are encouraging, not only for the validity of this VR system but also for the feasibility of using VR in sport more broadly, there are a number of issues that need a great deal of consideration when assessing the implications of this work. For example, just because experts performed better on the simulator, it does not automatically mean that training on the simulator would have positive transfer to real-world soccer skills. Rather, we are interpreting these results as evidence that there is at least partial overlap between the perceptual-cognitive and motor skills needed to perform well in both the VR task and the real-world equivalent. The extent of this overlap (i.e., the extent to which the perceptual-cognitive and motor demands of soccer tasks are replicated by the VR simulation) will determine its efficacy as a training tool. As such, further work is needed to explore the validity and fidelity of the simulation before its adoption as a training device can be fully endorsed.

A logical next step in this pursuit would be to examine the psychological fidelity of the simulator to establish the extent to which perceptual-cognitive processes used in the virtual world are similar to those that are critical for real-world performance. Examples of this type of work could include examination of visual search, cue utilization, quiet eye durations (e.g., Vine et al. 2014), and head movements during visual exploration (McGuckian et al. 2018; 2019) across real and virtual environments. If performers do not utilize perceptual information in the simulation in the same manner as they do in the real world, it would suggest that the simulation provides limited psychological fidelity, which would likely impair transfer (Harris et al. 2019a, b, c). For example, when evaluating the effectiveness of a VR golf putting simulator, Harris et al. (2019b) also examined changes in gaze behavior (i.e., quiet-eye durations) of golfers across virtual and real putting environments. Despite improvements in real-world performance as a result of virtual training, there was no corresponding change in gaze behavior. In fact, prior use of VR actually caused temporary disruptions to gaze control. One potential reason for this may be the perceptual irregularities of stereoscopic VR, which creates conflict between cues to depth and a subsequent reduction in visual acuity due to presenting objects at varying depths on a fixed depth screen (Hoffman et al. 2008; Kramida 2016). These disruptions have previously been shown to disrupt performance in soccer tasks (Dicks et al. 2010). As perceptual-cognitive skills have been shown to be a significant predictor of sporting expertise (Mann et al. 2007), findings from these types of studies might be useful for establishing the predictive validity of the simulator that could also have implications for talent identification in soccer.

As an extension to this, future work could also test the emotional (affective) fidelity of these types of simulators to examine if they prepare performers for pressurized competitive environments (Argelaguet Sanz et al. 2015). While VR has been used extensively in exposure therapy for anxiety disorders in clinical contexts (Carl et al. 2019), only a few studies have examined the utility of VR environments for replicating pressurized situations in sport. In one example, Stinson and Bowman (2014) created a virtual soccer penalty kick task involving goalkeepers attempting to save 15 penalty kicks. Dependent measures of anxiety included heart rate, galvanic skin response, self-reported state anxiety, and the number of saves made. The results demonstrated that a VR system can induce increased anxiety (based on physiological and subjective measures) compared to a baseline condition. Previous experimental work has documented the disruptive effect that anxiety can have on penalty takers, manifesting as an attentional bias toward the goalkeeper (Wilson et al. 2009), particularly when the goalkeeper exhibits distracting behaviors (Wood and Wilson 2010), which impacts negatively upon their performance. If the VR simulator can replicate anxiety and distraction to a similar extent as that induced in this laboratory-based experimental research, then a similar disruption in attentional control and performance could be expected. The examination of these types of questions would validate the feasibility of using VR for sporting scenarios where anxiety and distraction are prevalent and athletes need to maintain attentional control in order to avoid performance disruptions. Training in such situations is likely to desensitize players to threatening stimuli and provide a greater sense of perceived control (Wood et al. 2015).

In conclusion, these results demonstrate the construct validity of this soccer-specific VR simulator. From an applied perspective, this VR platform may have potential for use during the rehabilitation of soccer players who are injured and need to maintain a level of perceptual-cognitive skill while avoiding the physical load experienced in real environments. In such circumstances, the use of VR has been shown to increase enjoyment, adherence to rehabilitation exercises, and confidence (Sveistrup 2004), something also reported by professional clubs currently using this VR platform. What is clear is that while VR sport platforms are developing at an astonishing pace, the evaluation of these simulators is lagging behind. Before sport-related VR training can be implemented as a realistic and effective supplement to real-world sport training, further scientific examination of the claims made regarding their efficacy is required. To achieve this, greater collaboration is needed between the designers of such technologies and those with the expertise to test and validate them independently and rigorously.