Reliability and validity of the Static-99R in sex offenders with intellectual disabilities

Claudia Pouls and Inge Jeandarme (both based at the Knowledge Centre Forensic Psychiatric Care, OPZC Rekem, Rekem, Belgium)

Journal of Intellectual Disabilities and Offending Behaviour

ISSN: 2050-8824

Article publication date: 7 December 2021

Issue publication date: 20 January 2022


Abstract

Purpose

Risk assessment studies involving recidivism in sex offenders with intellectual disabilities (SOIDs) remain scarce and limited in scope, and have produced mixed results. This study aims to test the ability of one such instrument, the Static-99R, to predict intramural sexual and violent incidents involving members of this group.

Design/methodology/approach

The Static-99R was prospectively scored for 38 SOIDs. Occurrences of any violent or sexual incident and/or illegal sexual behaviour were recorded during a minimum period of six months. Predictive accuracy was analysed using several performance indicators.

Findings

The Static-99R significantly predicted sexual incidents (area under the curve = 0.70) but failed to predict violent and illegal sexual incidents. Regarding illegal sexual incidents, the instrument was better at detecting low-risk individuals than high-risk offenders.

Originality/value

Risk assessment studies, both in offenders with and without an intellectual disability (ID), rarely use multiple accuracy estimates. The current study used both discrimination and calibration indicators to evaluate the ability of the Static-99R to detect low- and high-risk offenders.

Keywords

Citation

Pouls, C. and Jeandarme, I. (2022), "Reliability and validity of the Static-99R in sex offenders with intellectual disabilities", Journal of Intellectual Disabilities and Offending Behaviour, Vol. 13 No. 1, pp. 20-31. https://doi.org/10.1108/JIDOB-08-2021-0013

Publisher: Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited


To date, three systematic reviews have addressed risk assessment in offenders with an intellectual disability (OIDs) (Camilleri and Quinsey, 2011; Pouls and Jeandarme, 2015; Hounsome et al., 2018). In sum, they indicated that several risk assessment instruments predicting violent reoffending have been tested successfully in OIDs (e.g. the Violence Risk Appraisal Guide [Quinsey et al., 2006]), suggesting that the same risk factors predict violent behaviour in OIDs as in non-ID offenders. This was confirmed in a meta-analysis of 14 studies including 1,390 OIDs (Lofthouse et al., 2017). The reviews further revealed that some instruments include adaptations and clarification guidelines for this specific population (e.g. the Psychopathy Checklist-Revised [Hare, 1991, 2003], the Psychopathy Checklist: Screening Version [Hart et al., 1995], the Historical Clinical Risk Management-20 Version 2 [Webster et al., 1997], the Sexual Violence Risk-20 (SVR-20) [Boer et al., 1997] and the Structured Assessment of PROtective Factors for violence risk [de Vogel et al., 2007]). However, these additional guidelines improved neither the reliability nor the validity of the instruments (Verbrugge et al., 2011).

A limitation of existing risk instruments is that, having been developed for mainstream offender populations, they do not consider issues specific to an ID population. Given the relatively greater dependence of people with ID on external structures and support, environmental variables seem of crucial importance (Boer et al., 2007). Staff attitudes towards OIDs, communication among staff and consistency of supervision are some of the contextual factors that Boer et al. (2004) identified as relevant in OIDs. As Boer et al. (2007) have argued, these contextual factors, in combination with dynamic client factors, may lead to a more accurate prediction of dynamic risk. Following this reasoning, Boer et al. (2013) developed an instrument specifically for individuals with a borderline or mild intellectual disability who have offended sexually or have displayed sexually offensive behaviour: the Assessment of Risk and Manageability of Individuals with Developmental and Intellectual Limitations who Offend – Sexually (ARMIDILO-S). In addition, some authors argue that a few items may not discriminate well among (sex) offenders with ID (Wilcox et al., 2009). For example, few people with ID have the opportunity for a live-in or marital relationship, which may inflate their scores on this risk factor.

Notwithstanding some evidence supporting the instruments examined, the reviews pointed out that risk assessment in OIDs is clearly under-researched, with very few studies found in this population (7 studies in Camilleri and Quinsey, 2011; 24 in Pouls and Jeandarme, 2015; and 14 in Hounsome et al., 2018), compared with the extensive risk assessment literature on general offender populations. This is even more true of sex offenders with ID (SOIDs).

Sexual violence risk assessment

Since Pouls and Jeandarme (2015) and Hounsome et al. (2018) reviewed predictive validity studies of four risk assessment instruments (Static-99/Static-99R/Static-2002/Static-2002R [Harris et al., 2003; Phenix et al., 2008; Phenix et al., 2016a; Phenix et al., 2016b], Rapid Risk Assessment for Sex Offense Recidivism [RRASOR; Hanson, 1997], Risk Matrix 2000 [Thornton et al., 2003] and SVR-20 [Boer et al., 1997]), only one further article (on the Static-99R and Stable-2007) has been published (Delforterie et al., 2019). Results of studies in SOIDs were mixed at best and mostly involved the Static-99 and related versions (eight studies) and the RRASOR (four studies). Details of the Static-99(R) predictive validity studies are shown in Table 1 (cf. also Endnote 1). Outcome measures included both adjudicated sexual recidivism and broader assessments of non-adjudicated sexual incidents.

In Tough’s (2001) study, Static-99 scores failed to discriminate recidivists from non-recidivists (r = 0.08; cf. Endnote 2), nor did they correlate positively with recidivism. However, extending the follow-up period changed the d-value from 0.22 (a small effect; Tough, 2001) to 0.57 (a medium effect; Harris and Tough, 2004, cited in Hanson et al., 2013, p. 10; cf. Endnote 3). By contrast, Lindsay et al. (2008) and Lofthouse et al. (2013) found significant predictive validity, with an area under the curve (AUC) of 0.71 and 0.75, respectively. This again contrasts with the study of Wilcox et al. (2009), in which the Static-99 failed to significantly predict sexual recidivism.
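The conversion from d to the AUC scale referred to above can be sketched in a few lines. This is a minimal illustration assuming the binormal, equal-variance model (risk scores normally distributed with equal variances among recidivists and non-recidivists), under which AUC = Φ(d/√2); we assume, rather than know, that this relation underlies the Rice and Harris (2005) conversion table. The function name `d_to_auc` is hypothetical.

```python
import math


def d_to_auc(d: float) -> float:
    """Convert Cohen's d to an AUC under a binormal, equal-variance model.

    Assumes scores are normally distributed with equal variances in both
    groups, so AUC = Phi(d / sqrt(2)). Because
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), this reduces to 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))


if __name__ == "__main__":
    # d-values quoted for the shorter and the extended follow-up
    for d in (0.22, 0.57):
        print(f"d = {d:.2f} -> AUC = {d_to_auc(d):.2f}")
    # Cohen's small / medium / large thresholds expressed as AUCs
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d:.1f} -> AUC = {d_to_auc(d):.2f}")
```

Under this assumption the sketch returns AUCs of 0.56 and 0.66 for d = 0.22 and d = 0.57, matching Endnote 3, and maps Cohen’s thresholds of 0.2, 0.5 and 0.8 onto approximately 0.56, 0.64 and 0.71, the AUC cut-offs used in the Statistical analyses.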

Stephens et al. (2017) extended these studies by examining the predictive accuracy of the Static-99R across five intelligence groups. For the group with an intelligence quotient (IQ) below 80, Harrell’s C (Harrell et al., 1996; cf. Endnote 4) was 0.60 for sexual recidivism and 0.62 for violent recidivism. The instrument slightly under-predicted risk (expected–observed [E/O] index = 0.84), although the difference between the expected and observed numbers of recidivists was not statistically significant. Finally, Delforterie et al. (2019) supported the applicability of the Static-99R to ID samples by showing that people with sexual offence histories, with or without ID or borderline intellectual functioning, did not differ on Static-99R total and item scores.

Individual studies on predictive validity thus show mixed findings, but the meta-analysis of Hanson et al. (2013) found moderate to large effects for both the Static-99(R) and the RRASOR. There was a trend towards the superiority of the Static-99(R) over the RRASOR, but the difference could not be tested for significance.

Inter-rater reliability was tested in only three studies (Table 1), but it was consistently excellent according to Fleiss’ criteria (range = 0.96–0.97), although the measures used to calculate it differed.

Based on this literature, more research is needed into the utility and predictive validity of static (and dynamic) risk assessment instruments for people with ID who sexually offend. The current study therefore aimed to validate two risk assessment instruments in a Flemish sample of SOIDs, the Static-99R for static risk factors and the ARMIDILO-S for dynamic risk factors, thereby adding to the scarce research in this area. Secondly, because most of the abovementioned studies in SOIDs evaluated only global accuracy (such as the AUC), the current study also aimed to evaluate different dimensions of predictive validity using multiple performance indicators. This article presents the data on the Static-99R; the data on the ARMIDILO-S will be discussed in another article.

Method

Participants

The study was conducted in six specialised forensic psychiatric and prison units for (sex) offenders with ID or borderline intellectual functioning. At the time of the study, these were the only OID-specific projects in Flanders. All sex offenders within these units were asked to participate, resulting in a sample of 50 male sex offenders with ID or borderline intellectual functioning. For seven patients, Static-99R scores could not be obtained, either because they lacked a Category A offence (two patients) or because of missing information (five patients). Because of the limited sample size, only the five further patients with a follow-up of less than six months were excluded, leaving a final sample of 38. Exploratory analyses involving the full sample (n = 43) and only those individuals with a follow-up of one year (n = 33) showed similar results (cf. Endnote 5).

The mean sample IQ score was 63 (standard deviation [SD] = 11.13, range = 45–85). Three patients had an IQ between 35 and 50, 21 between 50 and 70 and 12 between 70 and 85. Based on clinical Diagnostic and Statistical Manual of Mental Disorders diagnoses, 18 were classified as having a mild intellectual disability, six a moderate intellectual disability and two borderline intellectual functioning. A further 12 patients had not received an official diagnosis of intellectual disability but had nevertheless been admitted to a unit for SOIDs and had IQs at or below 85 (range = 53–85). A paraphilic disorder was present in 27 participants (more than two-thirds of the sample), a developmental disorder in eight, a personality disorder in seven and a substance abuse disorder in seven. Eighteen participants (almost half of the sample) had more than one psychiatric diagnosis. All had committed a sexual offence (35 as an index offence and 18 as a prior offence). In 35 cases, these involved hands-on offences (29 against children, two against adults and four against both an adult and a child). Hands-off offences (e.g. possession of child pornography, indecent exposure) had occurred in 16 cases. The mean age of participants at assessment was 44 years (SD = 12.14, range = 24–74). The group’s mean length of treatment and/or imprisonment was close to three years (SD = two years and nine months).

Measures

The Static-99 (Hanson and Thornton, 1999) is an actuarial instrument designed to assess the likelihood of sexual and violent recidivism in sex offenders. More recently, revisions were carried out, resulting in the Static-99R with revised age weights (Helmus et al., 2012b; Phenix et al., 2016a). The Static-99 family is among the best-known and most widely used instruments in structured risk assessment (Archer et al., 2006; Neal and Grisso, 2014). It has considerable research support, with meta-analyses showing moderate predictive accuracy, with AUCs around 0.70 (Hanson and Morton-Bourgon, 2009; Singh et al., 2011; Helmus et al., 2012a; Helmus et al., 2021). Although offenders with low intellectual functioning/ID are sometimes included in validation samples, their numbers are usually too small to support strong conclusions (Harris et al., 2003; Hanson et al., 2015; Baudin et al., 2021). The Static-99R consists of 10 static items relating to demographic, offence and victim information. The total score is the sum of the item scores and ranges from –3 to 12; it is further divided into five nominal risk categories: Level I – very low risk (scores of –3 to –2), Level II – below average risk (scores of –1 to 0), Level III – average risk (scores of 1–3), Level IVa – above average risk (scores of 4–5) and Level IVb – well above average risk (scores of 6 or above).
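The score bands listed above amount to a simple lookup, sketched below. The function names are hypothetical and the code is an illustration, not part of the official Static-99R scoring materials; the helper `is_high_risk` encodes the high- versus low-risk split (Levels IVa–IVb versus Levels I–III) used in the performance-indicator analyses.

```python
def static99r_level(total: int) -> str:
    """Return the nominal Static-99R risk level for a total score (-3 to 12)."""
    if not -3 <= total <= 12:
        raise ValueError(f"Static-99R total must lie between -3 and 12, got {total}")
    if total <= -2:
        return "Level I (very low risk)"
    if total <= 0:
        return "Level II (below average risk)"
    if total <= 3:
        return "Level III (average risk)"
    if total <= 5:
        return "Level IVa (above average risk)"
    return "Level IVb (well above average risk)"


def is_high_risk(total: int) -> bool:
    """Hypothetical helper: True for Levels IVa and IVb (total >= 4)."""
    return static99r_level(total).startswith("Level IV")


if __name__ == "__main__":
    for score in (-2, 0, 3, 4, 9):
        print(score, static99r_level(score), is_high_risk(score))
```

Deriving the dichotomy from the level label, rather than hard-coding a second cut-off, keeps the two in step if the bands are ever edited.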

Procedure

The study was prospective in design. The Dutch version of the Static-99R (Smid et al., 2014) was scored by a trained and experienced assessor (the first author) on the basis of judicial and clinical files. These typically included criminal responsibility evaluations, criminal records, information concerning the index offence, prison reports, probation reports and treatment plans, although not all of these documents were available in every case. Second ratings were obtained for 20 participants by a bachelor’s student in applied psychology and a master’s student in criminology, both supervised by the first author. To enhance rater consistency, several files were scored before data collection began.

Violent and sexual incidents occurring within one year of scoring were registered by the research team on the basis of observational notes by the nursing staff. Because of the prospective design, the start of registration differed between institutions, ranging from December 2016 to May 2020.

Ethical approval was obtained from the Ethics Committee of Antwerp University Hospital. Permission to conduct the study was also sought from the Belgian Federal Government of Justice. Furthermore, informed consent was obtained from each respondent and his guardian or treating physician.

Outcome measures

Predictive accuracy was assessed using three outcome measures: illegal sexual incidents, sexual incidents and violent incidents. A violent incident was defined as physical non-sexual violence against another person: e.g. uttering threats, grabbing by the throat, kicking, hitting, biting or throwing objects at a person with the purpose of inflicting pain. A sexual incident was defined as a broad category of sexually inappropriate behaviour, either legal or illegal, with a sexual motive: e.g. sexual assault, sexual touching, gross indecency, indecent exposure, grooming, expression of fetish, making sexual comments, making obscene gestures and loitering. The operationalization of illegal sexual incidents was stricter, including only illegal sexual behaviour.

Statistical analyses

Analyses were conducted using SPSS 22© (IBM Corp., 2013) and MedCalc (Garber, 1998). Inter-rater reliability (IRR) was evaluated using a two-way random-effects, absolute-agreement, single-measures intraclass correlation coefficient (ICC(2,1)), interpreted according to Fleiss’s (1986) critical values: ≥0.75 = excellent, ≥0.60 = good, ≥0.40 = moderate and <0.40 = poor. Predictive validity was assessed using multiple accuracy estimates, both discriminative and calibratory. Discrimination refers to how well an instrument separates those who went on to be violent from those who did not. Calibration refers to how well the predicted risk (expected recidivism) agrees with the observed risk (Singh, 2013). The global effect size was calculated using receiver operating characteristic (ROC) analysis, which assesses the predictive accuracy of a model by plotting its sensitivity against 1 – specificity across the complete range of possible cut-off scores (Fawcett, 2006). The corresponding AUC value indicates the probability that a randomly selected recidivist has a higher risk classification than a randomly selected non-recidivist. AUCs were evaluated according to the classification of Rice and Harris (2005): ≥0.56 = small effect, ≥0.64 = moderate effect and ≥0.71 = large effect.
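The probabilistic reading of the AUC given above translates directly into a pairwise comparison, sketched below. This is an illustration rather than the SPSS or MedCalc routine the authors used; it counts ties as one half, which makes it equivalent to the Mann–Whitney formulation of the AUC. The function names and the example scores are hypothetical, not study data.

```python
from itertools import product


def auc_concordance(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen person with an incident
    has a higher score than a randomly chosen person without one.

    Every (incident, no-incident) pair is compared; ties count as one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    credit = 0.0
    for pos, neg in product(scores_pos, scores_neg):
        if pos > neg:
            credit += 1.0
        elif pos == neg:
            credit += 0.5
    return credit / (len(scores_pos) * len(scores_neg))


def rice_harris_label(auc):
    """Label an AUC using the Rice and Harris (2005) thresholds quoted above."""
    if auc >= 0.71:
        return "large"
    if auc >= 0.64:
        return "moderate"
    if auc >= 0.56:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Hypothetical Static-99R totals, not data from this study
    with_incident = [4, 6, 7, 2]
    without_incident = [1, 3, 4, 5]
    auc = auc_concordance(with_incident, without_incident)
    print(f"AUC = {auc} ({rice_harris_label(auc)})")  # AUC = 0.71875 (large)
```

The pairwise loop is quadratic in sample size, which is irrelevant for a sample of 38 but would call for a rank-based computation in large datasets.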

Other parameters examined included sensitivity (the proportion of recidivists judged to be at high risk of reoffending), specificity (the proportion of non-recidivists judged to be at low risk of reoffending), positive predictive value (PPV; the proportion of high-risk participants who reoffended), negative predictive value (NPV; the proportion of low-risk participants who did not reoffend), number needed to detain (NND; the number of high-risk participants who would need to be detained to prevent one incident or offence) and number safely discharged (NSD; the number of low-risk participants who could be discharged before a single incident or offence occurred). These performance indicators were computed with MedCalc, which, among other things, calculates how accurately the Static-99R identified high-risk (“rule in”: PPV and NND) and low-risk (“rule out”: NPV and NSD) individuals. For these analyses, participants classified as being at above or well above average risk of reoffending (Levels IVa and IVb) were compared with participants classified as being at very low, below average or average risk (Levels I–III). The E/O index could not be calculated because the follow-up period did not match that of the normative sample.
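The rule-in and rule-out indicators defined above can be derived from sensitivity, specificity and the base rate of the outcome, as sketched below. We assume, without confirmation from MedCalc’s documentation, that predictive values are computed this way and that NND = 1/PPV and NSD = NPV/(1 – NPV); with the illegal-sexual-incident sensitivity and specificity from Table 4 and the rounded base rate of 10.5% reported in the Results, this sketch reproduces the tabulated PPV, NPV, NND and NSD. The function name is hypothetical.

```python
def rule_in_rule_out(sensitivity, specificity, base_rate):
    """Predictive values and 'number needed' indices from sensitivity,
    specificity and the base rate of the outcome (all as proportions).

    Assumes NND = 1 / PPV and NSD = NPV / (1 - NPV).
    """
    if not 0 < base_rate < 1:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return {
        "PPV": ppv,
        "NPV": npv,
        # high-risk people detained for each incident prevented
        "NND": 1 / ppv,
        # low-risk people who stay incident-free per low-risk person with an incident
        "NSD": npv / (1 - npv),
    }


if __name__ == "__main__":
    # Illegal sexual incidents: sensitivity and specificity from Table 4,
    # base rate of 10.5% from the Results
    res = rule_in_rule_out(sensitivity=0.75, specificity=0.4706, base_rate=0.105)
    print(f"PPV = {res['PPV']:.2%}, NPV = {res['NPV']:.2%}")
    print(f"NND = {round(res['NND'])}, NSD = {round(res['NSD'])}")
```

Because the base rate enters both formulas, the same sensitivity and specificity yield very different PPVs and NPVs for rare and common outcomes, which is why the illegal-sexual-incident results diverge so sharply from the others.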

Results

The base rates were as follows: 19 participants were involved in a sexual incident (50%), 4 in an illegal sexual incident (10.5%) and 16 in a non-sexual violent incident (42.1%).

Mean total Static-99R score was 3.86 (SD = 2.68; median = 4; range = –2 to 9). Only one participant was classified at Level I (very low risk), 3 participants at Level II (below-average risk), 13 participants at Level III (average risk), 8 participants at Level IVa (above-average risk) and 13 participants at Level IVb (well above average risk). Table 2 shows in more detail the presence and nature of the risk factors involved.

Overall IRR (ICC(2,1); cf. Fleiss, 1986) was 0.78 (CI = 0.53–0.91). IRR per item is presented in Table 3 and shows excellent agreement among raters for all items except item 5 (“Prior sex offences” – moderate IRR).
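The overall and item-level values above rest on the two-way random-effects, absolute-agreement, single-measures ICC. The sketch below computes it from the usual two-way ANOVA mean squares and applies the Fleiss (1986) labels; it is an illustration, not the SPSS routine the authors used, the function names are hypothetical, and the rating table is a small textbook-style example rather than study data (the study itself had two raters per case).

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    `ratings` holds one row per participant (target) and one column per
    rater. Uses the two-way ANOVA mean squares.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 targets and >= 2 raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def fleiss_label(icc):
    """Fleiss (1986) critical values for single-measure reliability."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Illustrative 6 targets x 4 raters table (not data from this study)
    table = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
             [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    icc = icc_2_1(table)
    print(f"ICC(2,1) = {icc:.2f} ({fleiss_label(icc)})")  # ICC(2,1) = 0.29 (poor)
```

The rater term `k * (ms_cols - ms_error) / n` in the denominator is what makes this an absolute-agreement coefficient: systematic differences between raters (for instance, one rater consistently scoring higher) lower the ICC, whereas a consistency-type ICC would ignore them.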

The Static-99R significantly predicted sexual incidents (AUC = 0.70), but not illegal sexual incidents or violent incidents. Of those involved in sexual incidents (n = 19), 12 patients (63%) had been classified as high risk, that is, at above or well above average risk of future sexual offending (sensitivity). Of those not involved in sexual incidents (n = 19), 10 patients (53%) had been judged to be at low risk (specificity). Of those judged to be at above or well above average risk (n = 21), 12 patients (57%) had been involved in a sexual incident (PPV), meaning that approximately two high-risk patients would need to be detained to prevent one sexual incident (NND). Of those judged to be at low risk (n = 17), 10 patients (59%) did not display any problematic sexual behaviour (NPV), meaning that approximately one low-risk patient could be discharged before a single sexual incident occurred (NSD). The results were similar for violent incidents. For illegal sexual incidents, PPV (14%) and NND (7 patients) were clearly worse, whereas NPV (94%) and NSD (16 patients) were considerably better. An overview of all performance indicators is shown in Table 4.
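The counts for sexual incidents reported in this paragraph define a 2 × 2 table: 12 true positives, 21 – 12 = 9 false positives, 19 – 12 = 7 false negatives and 10 true negatives. The sketch below recomputes the indicators from those counts as a check on the arithmetic; the function name is hypothetical.

```python
def indicators_from_counts(tp, fp, fn, tn):
    """Rule-in/rule-out indicators from a 2 x 2 table of risk group by outcome.

    tp: high risk, incident      fp: high risk, no incident
    fn: low risk, incident       tn: low risk, no incident
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": ppv,
        "NPV": npv,
        "NND": 1 / ppv,          # equals (tp + fp) / tp
        "NSD": npv / (1 - npv),  # equals tn / fn
    }


if __name__ == "__main__":
    # Sexual incidents: 19 with and 19 without an incident;
    # 21 high-risk (Levels IVa-IVb) and 17 low-risk (Levels I-III) participants
    res = indicators_from_counts(tp=12, fp=9, fn=7, tn=10)
    for name in ("sensitivity", "specificity", "PPV", "NPV"):
        print(f"{name}: {res[name]:.2%}")
    print(f"NND = {res['NND']:.2f}, NSD = {res['NSD']:.2f}")
```

The sketch gives 63.16%, 52.63%, 57.14% and 58.82%, with NND = 1.75 and NSD ≈ 1.43; rounded to whole patients, these correspond to the NND of 2 and NSD of 1 in Table 4.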

Discussion

The current study provides a comprehensive evaluation of the predictive accuracy of the Static-99R in sex offenders with ID or borderline intellectual functioning. Because offending behaviour in OIDs is often seen as challenging behaviour and is therefore not always reported or criminally sanctioned (Hounsome et al., 2018), particularly within a forensic institution, unofficial incidents could be a better outcome measure than official recidivism. We therefore used several outcome measures. The Static-99R predicted intramural sexual incidents, but not illegal sexual incidents or violent incidents. According to the classifications of both Rice and Harris (2005) and Sjöstedt and Grann (2002), the Static-99R showed a moderate effect in predicting sexual incidents. It is important to bear in mind that an instrument with good overall discriminative accuracy is not necessarily able to identify high-risk offenders. Calibration indicators can help in understanding the practical impact of release decisions. However, there are currently no guidelines for interpreting PPV/NPV and NND/NSD, so their interpretation remains partly a moral judgement.

So far, only one study of (sex) offenders with ID has calculated a calibration indicator, namely the E/O index, in which the (E)xpected number of recidivists is divided by the (O)bserved number of recidivists to create a ratio statistic (a score of 1 reflects perfect calibration; Stephens et al., 2017). Interestingly, calibration indices are also rarely reported for non-disabled offender groups. The limited findings show that instruments are generally able to detect low-risk offenders but not high-risk offenders (Singh et al., 2011; Fazel et al., 2012; Declue and Campbell, 2013). Although there are no cut-offs for determining whether PPV/NPV or NND/NSD are sufficient, the results for sexual and violent incidents in our study show that the Static-99R did not prospectively predict who would go on to reoffend. If an offender is judged to be at high risk of reoffending, there is still roughly a 50% chance that he will not reoffend. The rule-in results were even worse for illegal sexual incidents. These results are in accordance with Singh et al. (2011), who also reported a marked difference between PPV (0.18–0.33) and NPV (0.82–0.95) for the Static-99. When the base rate was low (i.e. for illegal sexual incidents), the instrument was good at detecting low-risk individuals.
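The E/O index described above can be sketched as follows. No E/O index was computed in the present study, so the predicted probabilities and outcomes below are hypothetical; the approximate 95% interval E/O × exp(±1.96 × √(1/O)) is an assumption of this sketch, included as one commonly used option rather than a method reported here or by Stephens et al. (2017). The function name is hypothetical.

```python
import math


def eo_index(predicted, observed):
    """Expected/observed (E/O) calibration index.

    predicted: each person's predicted probability of the outcome
    observed:  1 if the person had the outcome, else 0
    Returns (E/O, lower, upper); the approximate 95% interval
    E/O * exp(+/-1.96 * sqrt(1 / O)) is an assumption of this sketch.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    expected = sum(predicted)
    n_observed = sum(observed)
    if n_observed == 0:
        raise ValueError("E/O is undefined when no outcomes were observed")
    ratio = expected / n_observed
    half_width = 1.96 * math.sqrt(1.0 / n_observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


if __name__ == "__main__":
    # Hypothetical: 10 people each assigned a 20% predicted risk; 3 reoffend.
    ratio, lo, hi = eo_index([0.2] * 10, [1, 1, 1] + [0] * 7)
    # E/O below 1: fewer recidivists expected than observed (under-prediction)
    print(f"E/O = {ratio:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In this toy example the expected count is 2 against 3 observed, giving an E/O of about 0.67, the same direction of under-prediction as the E/O of 0.84 reported by Stephens et al. (2017). The very wide interval illustrates how little a small number of observed events can say about calibration.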

Overall, the clinical applicability of the Static-99R in the current sample seems limited, as it did not prospectively predict who would go on to reoffend, producing high numbers of both false positives and false negatives. Overestimating risk can have significantly damaging effects on the individual in terms of loss of personal liberty and restricted community access, whereas underestimating it poses a risk to society. Overestimation could be due to the fact that participants had been living in a secure setting for a long period (close to three years on average) and were therefore strictly monitored, with frequent use of risk assessments. They thus had few opportunities to display illegal sexual behaviour.

Another possible explanation for the poor results relates to the relatively short follow-up period. The Static-99 and related instruments were developed to predict long-term recidivism during time at risk, whereas the current study analysed intramural incidents over a minimum period of six months. However, as shown by Lindsay et al. (2008) and Lofthouse et al. (2013), the use of incident data over a short follow-up did result in significant predictions in SOIDs.

Following the critical values of Fleiss (1986), inter-rater reliability (for the total score and almost all items) was excellent (ICC = 0.84–0.95), although lower than in previous studies of non-ID sex offender populations (Phenix et al., 2016a) and ID offender populations (Stephens et al., 2017). This could be because students acted as second raters. They had no formal training in scoring the Static-99R (or any other risk assessment instrument) and no experience beyond the cases scored before the study began. At the item level, the IRR of item 5 (prior sex offences) was only moderate, in line with another Belgian study in which information concerning previous charges and convictions was inconsistently recorded in hospital records (Ducro and Pham, 2006). Although the Static-99R seems to be a clear-cut, easy-to-score instrument, these results underscore the importance of adequate file information and formal risk assessment training.

The mean score of 3.9 in our study was somewhat higher than the mean score of 2.3 in the normative sample (Phenix et al., 2016b). This could be in line with the assumption that some characteristics (i.e. risk factors) are more prevalent in SOIDs, resulting in elevated risk scores and reduced predictive validity (Wilcox et al., 2009). However, descriptive analyses indicated that this is most likely not the case. For example, almost half of the sample had lived with an intimate partner. Furthermore, the proportion of offences against male victims is in line with that of non-ID sex offenders in the study of Rice et al. (2008), although there was no comparison group in the current study. There thus seems to be no overrepresentation of offences against male victims, contrary to the suggestion of some research (Murrey et al., 1992; Rice et al., 2008). In addition, the Static-99R showed predictive accuracy for sexual incidents, which is evidence against this assumption. The lack of a control sample of non-ID sex offenders, however, prevents strong conclusions. Bolt et al. (2018) compared SOIDs with sex offenders without ID and found no differences in Static-99R risk factors or total score.

Limitations

The small sample size is a significant limitation of the present study, although all people with ID or borderline intellectual functioning and sexual offence histories in OID-specific projects in Flanders were included. Small samples are also a problem in previous studies of risk assessment instruments in SOIDs. Of the research discussed above, only the studies of Lindsay et al. (2008) and Stephens et al. (2017) had large samples (212 and 454, respectively). Other studies report sample sizes ranging from 27 to 76 (Tough, 2001; Wilcox et al., 2009; Lofthouse et al., 2013; Delforterie et al., 2019).

Further, because calibration indicators are base-rate dependent and vary with the population, time at risk and outcome of interest, these results cannot be extrapolated to different populations or to methodologically different studies.

Thirdly, although our study was prospective, we did not use a standardized registration form such as the Staff Observation Aggression Scale – Revised (Nijman et al., 1999), for two reasons. Firstly, such standardized forms often offer limited or no possibility of registering sexual incidents. Secondly, the reliability of incident registration is at risk when different people from six institutions are involved in the scoring process. However, because we used a non-official outcome measure rather than, for example, reconviction data, rater subjectivity in determining what constitutes appropriate sexual behaviour cannot be ruled out. Moreover, we were unaware of possible formal allegations or convictions. Because a single researcher recorded all incidents on the basis of the observational data, however, scoring consistency was ensured. Underreporting may have occurred, particularly for potential incidents off campus. Because unsupervised leave was limited for most patients, this risk is possibly negligible.

Future research

Following the risk principle of the risk-need-responsivity model (Andrews and Bonta, 2010), risk assessment based on static risk factors provides a baseline risk from which the intensity of treatment and supervision can be determined. The need principle implies assessing and addressing dynamic risk factors (criminogenic needs) through treatment and/or other interventions. Future research should therefore also focus on the assessment of dynamic risk factors. As part of the current study, the psychometric properties of the ARMIDILO-S will be evaluated. This structured professional judgment instrument is designed to assess dynamic risk factors in SOIDs and is recommended for use in conjunction with the Static-99(R).

Secondly, this study included mainly offenders in the low to borderline intelligence range, as is the case in most of the work in this field (Pouls and Jeandarme, 2015). Future research should look at predictive accuracy depending on the severity of the intellectual disability. The sample of the current study was too small to make these comparisons.

Thirdly, more research on risk assessment in OIDs is still needed, preferably with larger samples than those used in most studies so far and with multiple performance indicators.

Conclusion

The current study showed mixed results for the prediction of sexual and violent incidents, possibly because of the limited sample size. Overall, the Static-99R seems more appropriate for identifying low-risk individuals. Empirical evidence on risk assessment in (S)OIDs is progressing only slowly, in contrast to research in mainstream offender populations. Further research in larger samples with distinct ID groups is therefore recommended.

Endnotes

[1] Unpublished raw data from Harris and Tough (2004) and McGrath et al. (2012), both cited in Hanson et al. (2013), were not accessible to the researchers.

[2] Pearson’s correlation coefficient r and Cohen’s d are both effect size measures, expressing the magnitude of an observed effect in a standardized way. Pearson’s r measures the strength of the relationship between two variables, whereas Cohen’s d expresses the difference between two means in standard deviation units: a d of 1 indicates that the two groups differ by 1 standard deviation, a d of 2 that they differ by 2 standard deviations and so on. A Cohen’s d of 0.2 is commonly considered a “small” effect size, 0.5 a “medium” effect size and 0.8 a “large” effect size (Cohen, 1988; https://www.socscistatistics.com/effectsize/default3.aspx).

[3] These correspond to AUCs of 0.56 and 0.66, respectively (Rice and Harris, 2005).

[4] Harrell’s C is a measure of predictive discrimination that represents the proportion of usable pairs of individuals for whom the predictions and outcomes are concordant. When an outcome is dichotomous (e.g. recidivism coded as “yes” or “no”), it is interpreted in the same way as the AUC (Harrell et al., 1996).

[5] Data available on request.

Table 1. Predictive accuracy (AUC) of the Static-99(R) in offenders with intellectual disabilities

Study | IQ range | N | Follow-up (years) | IRR | AUC sexual incidents/sexual recidivism
Static-99
Tough (2001) | – | 76 | 0.2–19.2 | – | 0.54 (a)
Harris and Tough (2004), as cited in Hanson et al. (2013)° | – | 81 | 7.7 | – | 0.66 (b, e)
Lindsay et al. (2008) | 43–89 | 212 | 1 | 0.97 | 0.71*
Wilcox et al. (2009)° | <80 | 27 | 6.3 | – | 0.64
Lofthouse et al. (2013)° | 54–75 | 64 | 6 | 0.96 (c) | 0.75*
Hanson et al. (2013)° | – | 52 | 7.8 | – | 0.75*
Static-99R
Hanson et al. (2013)° | – | 52 | 7.8 | – | 0.79*
Stephens et al. (2017) | <80 | 78 | 10.5 | 0.96 | 0.60 (d, e)
McGrath et al. (2012), as cited in Hanson et al. (2013)° | – | 14 | 5 | – | 0.64 (f)

Notes:
(a) Equivalent of r = 0.08 using the conversion table of Rice and Harris (2005).
(b) Equivalent of d = 0.57 using the conversion table of Rice and Harris (2005).
(c) Calculated using r.
(d) Harrell’s C.
(e) p value not reported.
(f) Equivalent of d = 0.53 using the conversion table of Rice and Harris (2005).
° Included in the meta-analysis of Hanson et al. (2013).
* p < 0.05.

Table 2. Presence of the different risk factors

Item | Static-99R item | Frequency | Percentage
1 | Age at release (<35 years) | 11 | 28.9
2 | Never lived with an intimate partner | 20 | 52.6
3 | Index non-sexual violence | 10 | 26.3
4 | Prior non-sexual violence | 9 | 23.7
5 | Prior sex offences (6+ charges or 4+ convictions) | 9 | 23.7
6 | Prior sentencing dates | 8 | 21.1
7 | Any convictions for non-contact sex offences | 19 | 50.0
8 | Any unrelated victims | 29 | 76.3
9 | Any stranger victims | 17 | 44.7
10 | Any male victims | 11 | 28.9

Inter-rater reliability (ICC) on item level

Item | Static-99R item | ICC
1 | Age at release | 0.99
2 | Ever lived with an intimate partner | 0.80
3 | Index non-sexual violence | 1.00
4 | Prior non-sexual violence | 0.80
5 | Prior sex offences | 0.44
6 | Prior sentencing dates | 0.78
7 | Any convictions for non-contact sex offences | 0.81
8 | Any unrelated victims | 1.00
9 | Any stranger victims | 0.80
10 | Any male victims | 0.86
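The text does not state which ICC model was used for these item-level values. Purely as an illustration, the sketch below assumes a two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)), computed from ANOVA mean squares. `icc_2_1` is a hypothetical helper. The example uses the six-subject, four-rater data from Shrout and Fleiss (1979), not the present study’s ratings.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per rated individual and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two individuals and two raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into subjects, raters and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Six subjects rated by four raters (Shrout and Fleiss, 1979).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Raters who always agree exactly yield an ICC of 1. Systematic differences between raters lower ICC(2,1), because this model treats rater bias as disagreement.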

Performance indicators of the Static-99R for the prediction of sexual and (non-sexual) violent incidents

Performance indicator | Sexual incidents | Illegal sexual incidents | Violent incidents
AUC (95% CI) | 0.70* (0.52–0.87) | 0.70 (0.46–0.93) | 0.62 (0.44–0.80)
Sensitivity (95% CI) | 63.16% (38.36–83.71) | 75.00% (19.41–99.37) | 62.50% (35.43–84.80)
Specificity (95% CI) | 52.63% (28.86–75.55) | 47.06% (29.78–64.87) | 50.00% (28.22–71.78)
PPV (95% CI) | 57.14% (42.61–70.54) | 14.25% (7.99–24.12) | 47.61% (34.07–61.51)
NPV (95% CI) | 58.82% (40.85–74.72) | 94.13% (73.90–98.91) | 64.71% (46.21–79.65)
NND | 2 | 7 | 2
NSD | 1 | 16 | 2
Note:

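* p < 0.05. PPV = positive predictive value; NPV = negative predictive value; NND = number needed to detain; NSD = number safely discharged.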
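The indicators in this table can be derived from a 2 × 2 table that cross-tabulates the risk classification (high or low) against the outcome (incident or none). The sketch below assumes NND = 1/PPV and NSD = NPV/(1 − NPV). When rounded, these two formulas reproduce the NND and NSD rows above for all three outcomes. The cell counts are hypothetical and were chosen only to match the percentages reported for sexual incidents; the text does not report them. `performance_indicators` is a hypothetical helper, and confidence intervals are not computed here.

```python
def performance_indicators(tp, fp, fn, tn):
    """Indicators from a 2x2 table.

    tp: high risk with incident   fp: high risk without incident
    fn: low risk with incident    tn: low risk without incident
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Number needed to detain: high-risk individuals per one who has an incident.
        "nnd": 1 / ppv,
        # Number safely discharged: low-risk individuals without an incident
        # per low-risk individual who does have one.
        "nsd": npv / (1 - npv),
    }


# Hypothetical counts consistent with the sexual-incident percentages (N = 38).
ind = performance_indicators(tp=12, fp=9, fn=7, tn=10)
print({k: round(100 * ind[k], 2) for k in ("sensitivity", "specificity", "ppv", "npv")})
print("NND", round(ind["nnd"]), "NSD", round(ind["nsd"]))
```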

References

Andrews, D.A. and Bonta, J. (2010), “Rehabilitating criminal justice policy and practice”, Psychology, Public Policy and Law, Vol. 16 No. 1, pp. 39-55.

Archer, R.P., Buffington-Vollum, J.K., Stredny, R.V. and Handel, R.W. (2006), “A survey of psychological test use patterns among forensic psychologists”, Journal of Personality Assessment, Vol. 87 No. 1, pp. 84-94.

Baudin, C., Nilsson, T., Sturup, J., Wallinius, M. and Andiné, P. (2021), “A Static-99R validation study on individuals with mental disorders: 5 to 20 years of fixed follow-up after sexual offenses”, Frontiers in Psychology, Vol. 12, p. 140, doi: 10.3389/fpsyg.2021.625996.

Boer, D.P., Tough, S. and Haaven, J. (2004), “Assessment of risk manageability of intellectually disabled sex offenders”, Journal of Applied Research in Intellectual Disabilities, Vol. 17 No. 4, pp. 275-283.

Boer, D.P., McVilly, K.R. and Lambrick, F. (2007), “Contextualizing risk in the assessment of intellectually disabled individuals”, Sexual Offender Treatment, Vol. 2 No. 2, pp. 1-4.

Boer, D.P., Hart, S.D., Kropp, P.R. and Webster, C.D. (1997), Manual for the Sexual Violence Risk-20: Professional Guidelines for Assessing Risk of Sexual Violence, The Mental Health, Law, & Policy Institute, Vancouver.

Boer, D.P., Haaven, J., Lambrick, F., Lindsay, W.R., McVilly, K.R. and Sakdalan Frize, M.C.J. (2013), “ARMIDILO-S: the assessment of risk and manageability of individuals with developmental and intellectual limitations who offend sexually”, available at: www.armadilo.net/files (accessed 4 October 2021).

Bolt, B., van den Berg, J.W., Delforterie, M., van den Hazel, T. and Didden, R. (2018), “Verschil moet er zijn? Een vergelijkend onderzoek naar risicofactoren voor recidive bij seksueel delinquenten met een licht verstandelijke beperking [Different or the same? Comparing scores on STATIC-99R and STABLE-2007 between sexual offenders with and without an intellectual disability]”, Tijdschrift voor Seksuologie [Journal of Sexology], Vol. 42 No. 3.

Camilleri, J.A. and Quinsey, V.L. (2011), “Appraising the risk of sexual and violent recidivism among intellectually disabled offenders”, Psychology, Crime & Law, Vol. 17 No. 1, pp. 59-74.

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd ed., L. Erlbaum Associates, Hillsdale, NJ.

de Vogel, V., de Ruiter, C., Bouman, Y. and de Vries Robbé, M. (2007), SAPROF. Richtlijnen Voor Het Beoordelen Van Beschermende Factoren Voor Gewelddadig Gedrag [SAPROF. Guidelines for the Assessment of Protective Factors for Violence Risk], Forum Educatief, Utrecht.

Declue, G. and Campbell, T. (2013), “Calibration performance indicators of the Static-99R: 2013 update”, Open Access Journal of Forensic Psychology, Vol. 5, pp. 82-88.

Delforterie, M., van den Berg, J.W., Bolt, B., van den Hazel, T., Craig, L. and Didden, R. (2019), “Comparing STATIC-99R and STABLE-2007 between persons with and without intellectual disabilities”, Journal of Intellectual Disabilities and Offending Behaviour, Vol. 10 No. 3, pp. 58-68.

Ducro, C. and Pham, T. (2006), “Evaluation of the SORAG and the Static-99 on Belgian sex offenders committed to a forensic facility”, Sexual Abuse, Vol. 18 No. 1, pp. 15-26.

Fawcett, T. (2006), “An introduction to ROC analysis”, Pattern Recognition Letters, Vol. 27 No. 8, pp. 861-874.

Fazel, S., Singh, J.P., Doll, H. and Grann, M. (2012), “Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24 827 people: systematic review and meta-analysis”, BMJ, Vol. 345, doi: 10.1136/bmj.e4692, available at: www.bmj.com/content/bmj/345/bmj.e4692.full.pdf (accessed 6 June 2021).

Fleiss, J.L. (1986), The Design and Analysis of Clinical Experiments, John Wiley, New York, NY.

Garber, C. (1998), “MedCalc software for statistics in medicine”, Clinical Chemistry, Vol. 44 No. 6, p. 1370.

Hanson, R.K. (1997), The Development of a Brief Actuarial Risk Scale for Sexual Offense Recidivism, Department of the Solicitor General of Canada, Ottawa, ON.

Hanson, R.K. and Thornton, D. (1999), Static-99: Improving Actuarial Assessments for Sex Offenders, Department of the Solicitor General of Canada, Ottawa, ON.

Hanson, R.K. and Morton-Bourgon, K.E. (2009), “The accuracy of recidivism risk assessments for sexual offenders: a meta-analysis of 118 prediction studies”, Psychological Assessment, Vol. 21 No. 1, pp. 1-21.

Hanson, R.K., Sheahan, C.L. and VanZuylen, H. (2013), “STATIC-99 and RRASOR predict recidivism among developmentally delayed sexual offenders: a cumulative meta-analysis”, Sexual Offender Treatment, Vol. 8 No. 1, pp. 1-14.

Hanson, R.K., Helmus, L.-M. and Harris, A.J.R. (2015), “Assessing the risk and needs of supervised sexual offenders: a prospective study using STABLE-2007”, Criminal Justice and Behavior, Vol. 42 No. 12, pp. 1205-1224.

Hare, R.D. (1991), The Hare Psychopathy Checklist-Revised (PCL-R), Multi-Health Systems, Toronto, ON.

Hare, R.D. (2003), Manual for the Revised Psychopathy Checklist, 2nd ed., Multi-Health Systems, Toronto, ON.

Harrell, F.E. Jr., Lee, K.L. and Mark, D.B. (1996), “Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy and measuring and reducing errors”, Statistics in Medicine, Vol. 15 No. 4, pp. 361-387.

Harris, A.J.R. and Tough, S.E. (2004), “Recidivism follow-up of 81 sexual offenders with developmental delays from York Central Hospital”, Unpublished raw data.

Harris, A., Phenix, A., Hanson, R.K. and Thornton, D. (2003), Static-99 Coding Rules Revised - 2003, Department of the Solicitor General of Canada, Ottawa, ON.

Hart, S.D., Cox, D.N. and Hare, R.D. (1995), The Hare Psychopathy Checklist: Screening Version, Multi-Health Systems, Toronto, ON.

Helmus, L.M., Hanson, R.K., Murrie, D.C. and Zabarauckas, C.L. (2021), “Field validity of Static-99R and STABLE-2007 with 4,433 men serving sentences for sexual offences in British Columbia: new findings and meta-analysis”, Psychological Assessment, Vol. 33 No. 7, pp. 581-595.

Helmus, L., Hanson, R.K., Thornton, D., Babchishin, K.M. and Harris, A.J.R. (2012a), “Absolute recidivism rates predicted by Static-99R and Static-2002R sex offender risk assessment tools vary across samples: a meta-analysis”, Criminal Justice and Behavior, Vol. 39 No. 9, pp. 1148-1171.

Helmus, L., Thornton, D., Hanson, R.K. and Babchishin, K.M. (2012b), “Improving the predictive accuracy of Static-99 and Static-2002 with older sex offenders: revised age weights”, Sexual Abuse, Vol. 24 No. 1, pp. 64-101.

Hounsome, J., Whittington, R., Brown, A., Greenhill, B. and McGuire, J. (2018), “The structured assessment of violence risk in adults with intellectual disability: a systematic review”, Journal of Applied Research in Intellectual Disabilities, Vol. 31 No. 1, pp. e1-e17, doi: 10.1111/jar.12295.

IBM Corp (2013), IBM SPSS Statistics for Windows, Version 22.0, IBM Corp., Armonk, NY.

Lindsay, W.R., Hogue, T.E., Taylor, J.L., Steptoe, L., Mooney, P., O'Brien, G., Johnston, S. and Smith, A.H. (2008), “Risk assessment in offenders with intellectual disability: a comparison across three levels of security”, International Journal of Offender Therapy and Comparative Criminology, Vol. 52 No. 1, pp. 90-111.

Lofthouse, R.E., Lindsay, W.R., Totsika, V., Hastings, R.P., Boer, D.P. and Haaven, J.L. (2013), “Prospective dynamic assessment of risk of sexual reoffending in individuals with an intellectual disability and a history of sexual offending behaviour”, Journal of Applied Research in Intellectual Disabilities, Vol. 26 No. 5, pp. 394-403.

Lofthouse, R., Golding, L., Totsika, V., Hastings, R. and Lindsay, W. (2017), “How effective are risk assessments/measures for predicting future aggressive behaviour in adults with intellectual disabilities (ID): a systematic review and meta-analysis”, Clinical Psychology Review, Vol. 58, pp. 76-85.

McGrath, R.J., Lasher, M.P. and Cumming, G.F. (2012), “Static-99R scores of 14 developmentally delayed sexual offenders from Vermont who were either recidivists or non-recidivists”, Unpublished raw data.

Murrey, G.J., Briggs, D. and Davis, C. (1992), “Psychopathic disordered, mentally ill and mentally handicapped sex offenders: a comparative study”, Medicine, Science and the Law, Vol. 32 No. 4, pp. 331-336.

Neal, T.M.S. and Grisso, T. (2014), “Assessment practices and expert judgment methods in forensic psychology and psychiatry: an international snapshot”, Criminal Justice and Behavior, Vol. 41 No. 12, pp. 1406-1421.

Nijman, H.L.I., Muris, P., Merckelbach, H.L.G.J., Palmstierna, T., Wistedt, B., Vos, A.M., van Rixtel, A. and Allertz, W. (1999), “The staff observation aggression scale–revised (SOAS-R)”, Aggressive Behavior, Vol. 25 No. 3, pp. 197-209.

Phenix, A., Doren, D., Helmus, L., Hanson, R.K. and Thornton, D. (2008), “Coding rules for Static-2002”, available at: www.static99.org/pdfdocs/static2002codingrules.pdf (accessed 19 October 2021).

Phenix, A., Fernandez, Y., Harris, A.J.R., Helmus, L.M., Hanson, R.K. and Thornton, D. (2016a), “Static-99R coding rules revised - 2016”, available at: www.static99.org/pdfdocs/Coding_manual_2016_v2.pdf (accessed 9 June 2021).

Phenix, A., Helmus, L.M. and Hanson, R.K. (2016b), “Static-99R & Static-2002R evaluators' workbook”, available at: www.static99.org/pdfdocs/Evaluators_Workbook_2016-10-19.pdf (accessed 9 June 2021).

Pouls, C. and Jeandarme, I. (2015), “Risk assessment and risk management in offenders with intellectual disabilities: are we there yet?”, Journal of Mental Health Research in Intellectual Disabilities, Vol. 8 No. 3/4, pp. 213-236.

Quinsey, V.L., Harris, G.T., Rice, M.E. and Cormier, C.A. (2006), Violent Offenders: Appraising and Managing Risk, American Psychological Association, Washington, DC.

Rice, M.E. and Harris, G.T. (2005), “Comparing effect sizes in follow-up studies: ROC area, Cohen's d and r”, Law and Human Behavior, Vol. 29 No. 5, pp. 615-620.

Rice, M.E., Harris, G.T., Lang, C. and Chaplin, T.C. (2008), “Sexual preferences and recidivism of sex offenders with mental retardation”, Sexual Abuse, Vol. 20 No. 4, pp. 409-425.

Singh, J.P. (2013), “Predictive validity performance indicators in violence risk assessment: a methodological primer”, Behavioral Sciences & the Law, Vol. 31 No. 1, pp. 8-22.

Singh, J.P., Grann, M. and Fazel, S. (2011), “A comparative study of violence risk assessment tools: a systematic review and meta regression analysis of 68 studies involving 25,980 participants”, Clinical Psychology Review, Vol. 31 No. 3, pp. 499-513.

Sjöstedt, G. and Grann, M. (2002), “Risk assessment: what is being predicted by actuarial prediction instruments?”, The International Journal of Forensic Mental Health, Vol. 1 No. 2, pp. 179-183.

Smid, W., Koch, M. and van den Berg, J.W. (2014), “STATIC-99R scorehandleiding [Static-99R scoring manual]”, De Forensische Zorgspecialisten [The Forensic Care Specialists], Utrecht.

Stephens, S., Newman, J., Cantor, J. and Seto, M. (2017), “The Static-99R predicts sexual and violent recidivism for individuals with low intellectual functioning”, Journal of Sexual Aggression, Vol. 24 No. 1, pp. 1-11.

Thornton, D., Mann, R., Webster, S., Blud, L., Travers, R., Friendship, C. and Erikson, M. (2003), “Distinguishing and combining risks for sexual and violent recidivism”, Annals of the New York Academy of Sciences, Vol. 989 No. 1, pp. 225-235.

Tough, S. (2001), “Validation of two standard assessments (RRASOR, 1997; STATIC-99, 1999) on a sample of adult males who are intellectually disabled with significant cognitive deficits”, Master's Thesis, University of Toronto, Toronto, ON.

Verbrugge, H.M., Goodman-Delahunty, J. and Frize, M.C.J. (2011), “Risk assessment in intellectually disabled offenders: validation of the suggested ID supplement to the HCR-20”, International Journal of Forensic Mental Health, Vol. 10 No. 2, pp. 83-91.

Webster, C.D., Douglas, K.S., Eaves, D. and Hart, S.D. (1997), HCR-20: Assessing Risk for Violence (Version 2), Simon Fraser University and Forensic Psychiatric Services Commission of British Columbia, Burnaby, BC.

Wilcox, D.T., Beech, A., Markall, H.F. and Blacker, J. (2009), “Actuarial risk assessment and recidivism in a sample of UK intellectually disabled sexual offenders”, Journal of Sexual Aggression, Vol. 15 No. 1, pp. 97-106.

Acknowledgements

We would like to thank the participating clients and institutions: A.B.A.G.G. (’t Zwart Goor), Amanis (’t Zwart Goor), Itinera (Sint-Idesbald), Limes (Sint-Ferdinand), Ontgrendeld (OBRA), KFP (APZ Sint-Lucia), and Forensische Zorg 4 (OPZC Rekem). Furthermore, we would like to thank the Federal Government of Justice.

Funding: This project was funded by the Public Psychiatric Care Centre Rekem (OPZC Rekem).

Corresponding author

Claudia Pouls can be contacted at: claudia.pouls@opzcrekem.be
