
Abstract

Objective:

The purpose of this study was to test the reliability and validity of the First-Episode Psychosis Services Fidelity Scale (FEPS-FS) and compare it with similar scales.

Methods:

A fidelity scale was developed from previously identified essential components of first-episode psychosis services. The scale was tested in six programs in two countries and compared with three existing scales.

Results:

Program data collection from multiple sources indicated the feasibility and reliability of the FEPS-FS (intraclass correlation coefficient for interrater reliability=.842; 95% confidence interval=.795–.882). Satisfactory programs scored an average of 86% of the maximum total score; the single unsatisfactory program scored 70%. Compared with the other scales, the FEPS-FS has fewer items, but it has the highest proportion of items common to all scales.

Conclusions:

The FEPS-FS is a feasible, compact, reliable, and valid measure of adherence to evidence-based practices for first-episode psychosis services that can be applied to any first-episode psychosis service.

The purpose of this study was to develop and test the feasibility, reliability, and validity of the First-Episode Psychosis Services Fidelity Scale (FEPS-FS). Fidelity refers to the degree of implementation of an evidence-based practice (1). Fidelity scales provide a list of objective criteria for judging how closely a program or intervention adheres to a reference-standard intervention. Such scales have multiple applications in research, quality management, and accreditation (2–4).

The application of fidelity scales in first-episode psychosis services (FEPS) has been limited. In the United Kingdom, the EDEN study (Evaluating the Development and Impact of Early Intervention Services) developed a fidelity scale by using an expert clinician consensus process, and the scale was refined by researchers (5). In the United States, EASA (Oregon Early Assessment and Support Alliance) developed a fidelity scale by using a process of expert committees; the scale has been used in support of program implementation and quality control (6). In the United States, the RAISE Connections program (Recovery After an Initial Schizophrenia Episode) reported on fidelity by using routinely collected program data from two program sites (7). None of these three scales were developed with a three-step knowledge synthesis process comprising systematic reviews, evidence ratings, and international expert consensus.

Fidelity scales can be developed on the basis of a successful program with proven effectiveness or from systematic reviews of the literature, provided there is sufficient available research (8). We determined that there was sufficient evidence to develop a scale that is based on the evidence for the effectiveness of individual components of FEPS and on evidence from robust large-scale randomized controlled trials of the effectiveness of FEPS (9–12). In contrast to a scale developed from a single effective program, a scale that is derived from a comprehensive review of the literature can be more easily applied to a broad range of programs. We designed this study to develop, refine, and test the reliability and face validity of a fidelity scale for FEPS that was based on systematic review, ratings of evidence, and international expert consensus.

Methods

We first identified 32 essential components of FEPS (13) and then identified characteristics of effective treatment teams from a systematic review of the mental health team literature. To convert these essential components into a useful fidelity scale, a group consisting of a fidelity scale expert, a health services researcher, experts in first-episode psychosis, and an epidemiologist transformed each component into an operational definition linked to specified anchor points on a 5-point scale. This was achieved through an iterative process of reviews of evidence on team functioning and component integration and, where possible, reviews of evidence that supported the “dosing” of interventions. This process resulted in a 32-component scale comprising 22 individual treatment components and 10 team-based components. Each item is rated on a 5-point scale (1 to 5). The total fidelity score is obtained by summing the individual item ratings.
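To make the scoring concrete, the following is a minimal Python sketch, not part of the original report, of summing 1-to-5 item ratings into a total fidelity score; the item names are hypothetical placeholders.

```python
# Minimal sketch of FEPS-FS scoring as described above (not the authors' software):
# each item is rated 1-5, and the total fidelity score is the sum of item ratings.

def total_fidelity_score(item_ratings):
    """Sum item-level ratings after checking that each falls on the 1-5 scale."""
    for item, rating in item_ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"rating for {item!r} must be 1-5, got {rating}")
    return sum(item_ratings.values())

# Hypothetical example with three placeholder items (the real scale has many more).
ratings = {
    "assigned case manager": 5,
    "family education": 4,
    "CBT for psychosis": 3,
}
print(total_fidelity_score(ratings))  # 12
```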

We also developed a version of the scale for use with an individual patient that includes only the intervention components. In this version, the component descriptors remain the same, but the component ratings change from the percentage of patients who received the service to the number of services received by an individual patient. The individual scale is designed to allow comparisons between team-based and non–team-based services. Finally, we developed a manual to guide the process of fidelity assessment.

The next step in development was a study of the feasibility, reliability, and face validity of the draft FEPS-FS. The project received ethics approval from the institutional review board of the University of Calgary.

We first assessed one program with the first draft of the scale and the draft manual. Information on the program had been obtained by video-recording interviews with key informants, collecting administrative data, and reviewing ten randomly selected health records. The results were presented to and discussed with the other investigators at a two-day meeting. This resulted in significant modifications of both the scale and the rating manual—for example, adding that clinical nurse practitioners as well as psychiatrists can prescribe medications and that cognitive-behavioral therapy (CBT) should be delivered by a therapist with formal training in CBT or by a therapist trained to follow a formal manual based on CBT principles, such as Individual Resiliency Training (a component of the RAISE protocol [14]).

The scale was further tested during fidelity site visits in the first six months of 2015 by using combinations of two or three raters, all of whom had participated in the assessment of the first program. Site visits have been established as a best practice for fidelity assessment (15). The four U.S. sites included two small rural programs embedded in community mental health centers and two large downtown urban programs. Each U.S. site assessment was conducted by the same three raters. In the U.S. sites, the assessment included the administration of the FEPS-FS and an existing first-episode fidelity scale, the EASA scale (6), which is used routinely as part of a statewide quality improvement process conducted by a state-funded technical assistance center. Two Canadian sites were located in urban areas, and each fidelity visit was conducted in one day.

All fidelity visits included a review of program documents, including Web sites, policies and practices, presentations, and handouts for community partner education and for client and family education. Fidelity raters examined administrative data, discovering that although the content varied across programs, the data always included current staffing, annual admissions and discharges, and time from referral to face-to-face meetings. During the fidelity visit, the raters observed a team meeting; reviewed ten health records; and met with the program manager and senior administrator, clinicians, psychiatrists or other prescribers, family members, and clients. Raters completed their ratings independently during fidelity visits. Within a week of the site visit, raters held a teleconference, reported their individual ratings, and ultimately reached a consensus rating. The interrater reliability data were based on the independent ratings completed before consensus ratings were determined. The process of arriving at a consensus often generated suggestions for modifying item component descriptors or rating scales. Interrater reliability was assessed by calculating the intraclass correlation between each rater’s independent ratings on individual items.

Interrater reliability of the FEPS-FS was determined by intraclass correlation coefficient (ICC) by using a two-way random-effects model. We first calculated the ICC over the 31 items for each site. Because the raters for the four U.S. sites were the same, we also calculated the ICC of the overall fidelity scores of the four sites. The ICC was calculated by using Stata, version 14.0.
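For readers who want to reproduce the reliability analysis outside Stata, the following is a minimal Python sketch of a two-way random-effects, single-measure, absolute-agreement ICC (Shrout-Fleiss ICC[2,1]) computed from an items-by-raters matrix; the specific ICC variant and the simulated data are assumptions for illustration, not the study's data or code.

```python
# Sketch of a two-way random-effects, single-measure, absolute-agreement ICC
# (Shrout-Fleiss ICC(2,1)) for an items x raters matrix of independent ratings.
# The report's analysis was run in Stata 14; this numpy version is illustrative only.
import numpy as np

def icc_2_1(x):
    """ICC(2,1) for an (n_items, n_raters) matrix of ratings."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape                                                   # items, raters
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)    # between items
    ms_cols = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)    # between raters
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical data: 31 items rated independently by 3 raters on the 1-5 scale.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(31, 3))
print(round(icc_2_1(ratings), 3))
```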

The comparison between the final version of the FEPS-FS and the other fidelity scales—the EDEN fidelity scale (5), the EASA fidelity scale (6), and the RAISE-C monitoring tool (7)—was a descriptive comparison of the development processes used, the number of components included, the items or domains assessed that were common to all scales, and the number of items or domains shared by the FEPS-FS and each of the other scales. The component descriptions across the scales were not identical; therefore, the term “domain” is used here because the same concept or domain is being addressed.

Results

The basic structure of the scale did not change during the study; however, two items were dropped and one was added, the wording of component descriptors was clarified, and some changes to the rating descriptors were made. [The full final version of the 31-item scale is available in an online supplement to this report.] A manual was developed to support the raters in making reliable ratings of the descriptors and is available on request from the first author.

Collecting data from multiple sources in order to score the FEPS-FS proved feasible, and raters integrated all sources of data to reach their best estimate for each component. The interrater reliability of ratings calculated at an item level was high; the ICC across items and based on the independent ratings before the discussion of consensus was .842 (95% confidence interval [CI]=.795–.882) and ranged from .741 (CI=.590–.854) to .972 (CI=.951–.986). These results were achieved for the four programs rated by the same three raters. The ICC across the overall scores of the four U.S. sites was .928 (CI=.581–.995). [Means and standard deviations for each item are presented in the online supplement.]

Programs that were considered to have adequate fidelity had a mean consensus-rated score of 86% (range 81% to 89%) of the total possible score. The single program considered not to meet standards according to an established fidelity scale (EASA) scored 70%. We took this result to indicate that the FEPS-FS was capable of differentiating between programs of different quality. We also considered that this finding provided early support for a potential cutoff of 80% of the maximum score, equivalent to an average rating of 4 on each 5-point item, as the threshold for a satisfactory rating.
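The proposed cutoff follows from simple arithmetic: an average rating of 4 on each 5-point item equals 80% of the maximum possible total. A minimal sketch of that conversion, with illustrative numbers rather than program data, is shown below.

```python
# Convert a total FEPS-FS score to a percentage of the maximum possible score and
# apply the provisional 80% cutoff discussed above (illustrative numbers only).

N_ITEMS = 31      # items in the final scale
MAX_RATING = 5    # each item is rated 1-5

def percent_of_maximum(total_score):
    return 100.0 * total_score / (N_ITEMS * MAX_RATING)

def meets_provisional_cutoff(total_score, cutoff=80.0):
    # 80% of the maximum is equivalent to an average rating of 4 per item.
    return percent_of_maximum(total_score) >= cutoff

total = 133  # hypothetical total, i.e., an average item rating of about 4.3
print(round(percent_of_maximum(total), 1))   # 85.8
print(meets_provisional_cutoff(total))       # True
```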

Content validity was supported by comparison with other fidelity scales (Table 1). A total of 17 components are assessed by all four scales. Because of its shorter length, the FEPS-FS has the highest proportion of components common to all scales (53%). In addition, compared with the other three measures, the FEPS-FS has the highest proportion of overlap with each of the other scales (average of 75%); that is, the FEPS-FS has more components in common with each of the other scales than the other scales have in common with one another. Each fidelity scale was developed through a different process and uses different methods for conducting the fidelity assessment [see online supplement].

TABLE 1. Item content of four scales for assessing fidelity of first-episode psychosis services^a

Characteristic                          FEPS-FS   EASA   RAISE-C   EDEN
N of items                                   31     97        41     64
N of items shared by all scales              17     17        17     17
Percentage of items shared^b
  With all other scales                      55     17        41     27
  With FEPS-FS                              100     25        54     39
  With EASA                                  77    100        50     43
  With RAISE-C                               74     22       100     22
  With EDEN                                  80     43        22    100

^a FEPS-FS, First-Episode Psychosis Services Fidelity Scale; EASA, scale developed for Oregon Early Assessment and Support Alliance; RAISE-C, scale developed for Recovery After an Initial Schizophrenia Episode Connections program; EDEN, scale developed for Evaluating the Development and Impact of Early Intervention Services

^b FEPS-FS has the highest proportion of items common to all measures and the highest proportion of items shared with other measures.
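The overlap percentages in Table 1 amount to set arithmetic over the domains each scale covers. The sketch below illustrates the computation with hypothetical domain labels; the actual matching of items to shared domains was a judgment made by the authors, not an automated computation.

```python
# Set-based sketch of the overlap statistics summarized in Table 1. Domain labels
# are hypothetical placeholders, and only three scales are shown for brevity.

scales = {
    "FEPS-FS": {"case management", "family education", "CBT", "supported employment"},
    "EASA":    {"case management", "family education", "CBT", "community outreach"},
    "RAISE-C": {"case management", "CBT", "supported employment"},
}

def percent_shared(scale_a, scale_b):
    """Percentage of scale_a's domains that also appear in scale_b."""
    a, b = scales[scale_a], scales[scale_b]
    return 100.0 * len(a & b) / len(a)

common_to_all = set.intersection(*scales.values())
print(sorted(common_to_all))                       # domains shared by every scale
print(round(percent_shared("FEPS-FS", "EASA")))    # 75
```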


Conclusions

We have described the development and testing of the 31-item FEPS-FS, which can be applied to any program because it was developed through a formal knowledge synthesis process rather than modeled on a single program. We have demonstrated that a program can be reliably rated in a one-day visit by using the FEPS-FS and that the scale can be used to rate a variety of programs, from small rural programs to large urban and academic programs. Compared with the three other published fidelity scales, the FEPS-FS is more compact and has a higher proportion of components shared with the other scales.

Fidelity scale development is an iterative process in which testing the scale in a variety of real-world settings is an essential step. We have described the initial steps in the development and assessment of the FEPS-FS. The results suggest that the FEPS-FS is ready for prospective studies of predictive validity, which could also test whether the 80% cutoff differentiates satisfactory from unsatisfactory services. The FEPS-FS could also be used to assess the quality of implementation of new programs, as well as in randomized controlled studies to quantify the quality of the services delivered in various treatment arms.

Dr. Addington, Ms. McKenzie, and Dr. Wang are with the Department of Psychiatry, University of Calgary, Calgary, Alberta, Canada (e-mail: ). Dr. Norman is with the Department of Health Outcomes and Health Services Research, London Health Sciences Centre, London, Ontario, Canada. Dr. Bond is with the Dartmouth Psychiatric Research Center, Lebanon, New Hampshire. Ms. Sale and Dr. Melton are with the EASA Center for Excellence, Regional Research Institute, Graduate School of Social Work, Portland State University, Portland, Oregon.

The project was funded with the aid of a Pilot Research Fund Program grant from the Hotchkiss Brain Institute/Mathison Centre at the University of Calgary.

The authors report no financial relationships with commercial interests.

References

1 Bond GR, Evans L, Salyers MP, et al.: Measurement of fidelity in psychiatric rehabilitation. Mental Health Services Research 2:75–87, 2000

2 Corbière M, Bond GR, Goldner EM, et al.: Brief reports: the fidelity of supported employment implementation in Canada and the United States. Psychiatric Services 56:1444–1447, 2005

3 Orwin RG: Assessing program fidelity in substance abuse health services research. Addiction 95(suppl 3):S309–S327, 2000

4 Teague GB, Bond GR, Drake RE: Program fidelity in assertive community treatment: development and use of a measure. American Journal of Orthopsychiatry 68:216–232, 1998

5 Lester H, Birchwood M, Jones-Morris N, et al.: EDEN: Evaluating the Development and Impact of Early Intervention Services (EISs) in the West Midlands. Manchester, United Kingdom, National Primary Care Research and Development Centre, 2006

6 Practice Guidelines for Oregon Early Assessment and Support Alliance. Salem, Oregon Health Authority, 2012. Available at www.oregon.gov/oha/amh/ReportingReqs/Practice%20guidelines%20for%20Oregon%20EASA.pdf

7 Essock SM, Nossel IR, McNamara K, et al.: Practical monitoring of treatment fidelity: examples from a team-based intervention for people with early psychosis. Psychiatric Services 66:674–676, 2015

8 Mowbray CT, Holter MC, Teague GB, et al.: Fidelity criteria: development, measurement, and validation. American Journal of Evaluation 24:315–340, 2003

9 Kane JM, Robinson DG, Schooler NR, et al.: Comprehensive versus usual community care for first-episode psychosis: 2-year outcomes from the NIMH RAISE Early Treatment Program. American Journal of Psychiatry (Epub ahead of print, Oct 20, 2015)

10 Petersen L, Jeppesen P, Thorup A, et al.: A randomised multicentre trial of integrated versus standard treatment for patients with a first episode of psychotic illness. British Medical Journal 331:602, 2005

11 Ruggeri M, Bonetto C, Lasalvia A, et al.: Feasibility and effectiveness of a multi-element psychosocial intervention for first-episode psychosis: results from the cluster-randomized controlled GET UP PIANO trial in a catchment area of 10 million inhabitants. Schizophrenia Bulletin 41:1192–1203, 2015

12 Srihari VH, Tek C, Kucukgoncu S, et al.: First-episode services for psychotic disorders in the US public sector: a pragmatic randomized controlled trial. Psychiatric Services 66:705–712, 2015

13 Addington DE, McKenzie E, Norman R, et al.: Essential evidence-based components of first-episode psychosis services. Psychiatric Services 64:452–457, 2013

14 Mueser KT, Penn DL, Addington J, et al.: The NAVIGATE program for first-episode psychosis: rationale, overview, and description of psychosocial components. Psychiatric Services 66:680–690, 2015

15 McHugo GJ, Drake RE, Whitley R, et al.: Fidelity outcomes in the National Implementing Evidence-Based Practices Project. Psychiatric Services 58:1279–1284, 2007