Modeling behavioral patterns of family violence aggressors

Background: The presumption that family violence will repeat and escalate is embedded in practices including risk assessment and case management. However, there is limited evidence that further episodes are inevitable, or that subsequent episodes will increase in severity. We therefore need to better understand temporal patterns in aggressor behavior to inform how risk is conceptualized in practice.
Methods: For a sample of 2115 family violence aggressors who came to police attention in Integrated Safety Response catchment areas in Aotearoa New Zealand, we collected the information New Zealand Police routinely recorded about reported harm between 2018 and 2020. We used a hidden Markov model to estimate the latent (i.e., not directly observed) states underlying the information reported to police, and modeled aggressors' movement between those states over time.
Results: We identified three latent states. The first involved virtually no reported harm, the second involved low probabilities of reported harm, and the third involved a high probability of reported verbal abuse and a moderate probability of reported physical violence. We identified four pathways through the latent states over the two-year follow-up period, which we called No reported harm, High reported harm, Low reported harm, and De-escalation.
Conclusions: The findings add to the body of research indicating that family violence aggressors do not inevitably repeat or escalate their harmful behavior, and that a small subset of cases accounts for a large proportion of reported harm. This study demonstrates how information that police routinely collect can be used to estimate aggressors' latent behavioral states and to model pathways describing the probability that they will continue to come to police attention for family violence, contributing to improved risk assessment and practice.


Introduction
Legislation in Aotearoa New Zealand defines family violence (FV) as physical, sexual, or psychological violence inflicted against a person 'whom [the aggressor] is, or has been, in a family relationship with' (Family Violence Act, 2018, § 9), broadly interpreted with reference to the Māori concept of whānau. In contrast, research on FV overwhelmingly focuses on intimate partner violence (IPV), or on one type of FV in isolation (e.g., child maltreatment). Yet when one type of FV occurs, it is often accompanied by others (Chan et al., 2021; Dixon et al., 2007). Police officers responding to FV calls for service therefore cannot simply examine types of FV in isolation; they must respond quickly to harm occurring within multiple, complex, and overlapping relationships between family members (McEwan et al., 2018; Saxton et al., 2022). Hence, there is a need for practice-relevant research encompassing all types of FV to inform judgements made on the front line (Dixon & Browne, 2003; McEwan et al., 2018).
The broad conceptualization of FV in New Zealand, encompassing all forms of harm in any kind of family-like or whānau relationship (Family Violence Act, 2018, § 9), gives a large remit for what is reported to police. Indeed, police recorded 177,452 family harm investigations between June 2022 and June 2023, 49% more than in 2017 (New Zealand Police, 2023). Beyond the initial police response, the multi-agency Integrated Safety Response (ISR) brings together representatives from governmental organizations such as Police, Ministry of Health, Oranga Tamariki (Ministry for Children), and Ara Poutama Aotearoa (Department of Corrections), and from non-governmental mainstream and kaupapa Māori organizations (e.g., Tuu Oho Mai Services), to respond to families with episodes reported to police in the Waikato and Canterbury areas. All families that come to police attention in the ISR's catchment areas undergo multi-agency triaging and risk assessment, and the case management response each family receives differs according to the risk category and needs that triage team members identify ([authors]). Low-risk cases usually receive a follow-up phone call from a single organization, whereas higher-risk cases are offered a multi-agency or 'wrap-around' response that could include interventions for victims, aggressors, or the entire whānau (Mossman et al., 2017). Some of these interventions are provided by specialist kaupapa Māori practitioners; the ISR is committed to improving cultural responsiveness to help address the overrepresentation of Māori experiencing FV in New Zealand (Ministry of Justice, 2021, 2022; Mossman et al., 2017; New Zealand Government, 2022).
People have long presumed that, without intervention from external parties (e.g., law enforcement), FV aggressors will repeat and escalate their use of harmful acts (Bland & Ariel, 2015). The concept of escalation can refer to an increase in either severity or frequency (Barnham et al., 2017; Bland & Ariel, 2015), but is often understood as an increase in the severity of harm (e.g., a progression from verbal abuse to physical violence). Retrospective analysis of the histories of people involved in FV-related homicides supports the idea that FV may escalate, sometimes to lethality (NZFVDRC, 2021a, 2021b). In some cases, the perpetrators or victims of these homicides had previous contact with agencies, raising questions about whether those agencies could have detected the likelihood of escalation and intervened (NZFVDRC, 2021a). Consequently, there has been a widespread push for law enforcement and other agencies to identify the risk of continued harm or escalation and to mitigate that risk to prevent further homicides (Henning et al., 2021). Indeed, the idea that agencies can detect and intervene to prevent escalation of FV is part of public discourse; for example, news media have credited the ISR with saving lives through its risk assessment and case management practices (e.g., Ensor & Cooke, 2019).
However, there is limited evidence that continuation and escalation of FV are inevitable. Longitudinal research examining patterns of FV has highlighted that the severity and frequency of harmful acts vary both within and across cases over time (Barnham et al., 2017; Bland & Ariel, 2015). Some cases may be characterized by a single highly harmful event, while others show diverse patterns in both the frequency and degree of harm. For example, Piquero and colleagues (2006) examined rearrest data from the United States Spouse Assault Replication Program and found that IPV aggressors showed heterogeneous behavioral patterns, escalating, de-escalating, or both, within short follow-up periods. Jones and colleagues (2010) examined information victims reported about the harmful behaviors of IPV aggressors referred to non-violence programs in four United States cities, and used a hidden Markov model to estimate the latent (i.e., not directly observed) states that gave rise to aggressor behavior (Jones et al., 2010, p. 6). They then examined how aggressors moved between the latent states over time: the most common trajectory involved aggressors remaining in the least severe state, others escalated and de-escalated between states, and only 2% persistently remained in the most severe state.
Taken together, longitudinal research predominantly based on calls for police service suggests that the most common FV patterns involve repeated low-level harm or no further reported episodes (Barnham et al., 2017; Bland & Ariel, 2015; Heckert & Gondolf, 2005; Jones et al., 2010; Piquero et al., 2006; Swartout et al., 2012). At the same time, in a small proportion of cases the frequency or severity of harm escalates (Bland & Ariel, 2015), and these cases account for a disproportionate amount of harm overall (e.g., 3% accounting for 90% of harm, Barnham et al., 2017; 1% accounting for 62% of harm, Mossman et al., 2017). Acknowledging this, researchers have developed targeted instruments for identifying the highest-risk aggressors (e.g., Robinson & Clancy, 2020). To support these efforts, we need to understand temporal patterns in aggressor behavior; such research may inform how FV risk is conceptualized in practice.
The hidden Markov modeling approach employed by Jones and colleagues (2010) has several features that make it appropriate for describing temporal processes with noisy data collected over time (Glennie et al., 2022; Zhang et al., 2010; Zucchini et al., 2016), such as the behavior of FV aggressors. First, it models latent states representing the unobserved behaviors (i.e., FV) associated with observed outcomes (i.e., reported harm). These latent states can be thought of as analogous to the different levels of harm or risk that practitioners consider when responding to FV (Hanson et al., 2017; Jones et al., 2010). Second, each latent state has its own probability distribution over those observed outcomes, delineating the connection between behavior and the information at hand (Zucchini et al., 2016). Third, under the Markov property, the probability of moving to a given state at the next time point depends only on the present state, not on earlier states, and aggressors can move from their present state to any other state. This makes hidden Markov models better suited than alternatives such as linear regression, time series, or growth curve models for capturing behavioral patterns that could include periods of escalation, de-escalation, or both (Jones et al., 2010; Zucchini et al., 2016). Finally, hidden Markov models are specifically designed for sequential data and are relatively robust to missing information (Glennie et al., 2022; Zucchini et al., 2016), both characteristic features of archival data derived from police reports.
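The three ingredients described above (an initial-state distribution, per-state probabilities for each observed outcome, and Markov transitions) can be made concrete with the forward algorithm, which sums over every possible latent path to give the likelihood of one aggressor's sequence of observations. The Python sketch below is a minimal illustration, not the LMest implementation: the two-state parameters are hypothetical toy values, the function names are our own, binary indicators are assumed to be conditionally independent within a state, and a missing indicator (coded None) is simply skipped, which is how the recursion marginalizes over unobserved values.

```python
import math


def emission_lik(probs, obs):
    """P(observed indicators | state), assuming binary indicators are
    conditionally independent given the state. None marks a missing value."""
    lik = 1.0
    for p, y in zip(probs, obs):
        if y is None:
            continue  # missing indicator is marginalized out (factor of 1)
        lik *= p if y == 1 else 1.0 - p
    return lik


def forward_loglik(obs_seq, init, trans, emit):
    """Scaled forward algorithm: log P(obs_seq), summed over all latent paths."""
    n = len(init)
    alpha = [init[s] * emission_lik(emit[s], obs_seq[0]) for s in range(n)]
    loglik = 0.0
    for t, obs in enumerate(obs_seq):
        if t > 0:
            # Markov step: the next state depends only on the current state
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emission_lik(emit[s], obs)
                for s in range(n)
            ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return loglik


# Hypothetical toy parameters (NOT estimates from this study):
# state 0 = low harm, state 1 = high harm; indicators = (verbal, physical)
INIT = [0.7, 0.3]
TRANS = [[0.9, 0.1], [0.4, 0.6]]
EMIT = [[0.05, 0.01], [0.9, 0.4]]

seq = [[1, 1], [1, None], [0, 0], [0, 0]]  # second interval missing one indicator
print(round(forward_loglik(seq, INIT, TRANS, EMIT), 4))
```

Setting every indicator to None returns a log-likelihood of 0 (probability 1), a quick check that missing data are marginalized rather than penalized.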

This study
In this study, we sought to address some of the limitations of previous research on temporal patterns in the behavior of FV aggressors and to produce findings relevant to FV practitioners in New Zealand. First, previous research predominantly focuses on IPV (e.g., Jones et al., 2010; Piquero et al., 2006) or examines discrete types of FV in isolation. There is a need for practice-relevant research that includes all the overlapping types of FV to inform judgements made on the front line (Dixon & Browne, 2003; McEwan et al., 2018). Second, research tends to focus on physical violence or criminal offending (Heckert & Gondolf, 2005). Yet only a third of FV investigations conducted by New Zealand Police involve behaviors that can be classified as criminal offenses (e.g., physical harm or property damage; Crimes Act, 1961),⁴ and even fewer result in an arrest (New Zealand Police, 2022). More than half of episodes reported to police in other jurisdictions also involve forms of antisocial behavior that are not criminalized (e.g., verbal abuse) but that nevertheless have long-term cumulative consequences (Ansara & Hindin, 2011; Dichter et al., 2018; Wiener, 2017).
Therefore, building on the work of Jones and colleagues (2010), in this study we generate a hidden Markov model from the harm-related information that New Zealand Police routinely collect. The sample includes 2115 FV aggressors with FV episodes reported to police in ISR catchment areas; hence, the findings may be specific to the ISR's multi-agency approach. First, we estimate latent (i.e., not directly observed) states, each describing the probability of different types of harm being reported while aggressors are in that state, and examine the demographic and risk characteristics of aggressors in each initial state. We then model aggressors' movements between latent states at three-month intervals over a two-year period. Finally, we identify common patterns in aggressor behavior and examine the demographic and risk characteristics of aggressors with similar pathways, to explore whether the findings can inform how FV risk is conceptualized in practice.

Research questions
1. Can we estimate latent states from information reported to police about FV aggressors' harmful behaviors?
2. To what extent are there common patterns in FV aggressors' movements between latent states over two years?
4 Although verbal and psychological harm are included in the definition of family violence in the Family Violence Act (2018), at present there are no corresponding charges in the Crimes Act (1961) that can be used to arrest aggressors for those behaviors.

Data source
The data for this study came from a larger project on FV in New Zealand [authors]. The index sample (i.e., the starting point for modeling) contained all FV episodes reported to New Zealand Police in the Integrated Safety Response (ISR) catchment areas of Waikato and parts of Canterbury between 1 November and 9 December 2018 (N = 2115). The data were collected from the ISR's Family Safety System (FSS) database.

Archival information
Police reports
Police reports included the demographic characteristics of people in the report and descriptive variables about the episode itself (see Table 1). Five dichotomous indicators for the presence of harm were available in FSS: verbal abuse (present in 92.2% of index episodes), physical harm (30.9%), sexual harm (0.3%), threats of harm (13.8%), and property damage (14.2%). We used these harm indicators to create the hidden Markov model. FSS did not contain indicators for other forms of harm (e.g., psychological harm or coercive control).

Static assessment of family violence recidivism
Police reports in the FSS also included risk categories from New Zealand Police's two bespoke risk assessment instruments (Bissielo & Knight, 2016; New Zealand Police, n.d.). The Static Assessment of Family Violence Recidivism (SAFVR) is an actuarial instrument designed to predict the likelihood that an aggressor will be convicted of an FV-related offense within the next two years (Bissielo & Knight, 2016). The SAFVR contains eight static variables about the aggressor: number of prior FV episodes, age at first FV offense, gender, presence of a prior prison sentence of 30 days or more, presence of a prior conviction, number of prior offenses of any type, number of prior breaches of criminal or family court conditions, and presence of an FV episode in the past year (Bissielo & Knight, 2016; New Zealand Police, n.d.). These variables come from information held by New Zealand Police and the Ministry of Justice and are updated daily by a computerized algorithm. The SAFVR has good predictive ability for FV offenses within two years (AUC of 0.77; Bissielo & Knight, 2016) and an AUC of 0.64 for predicting FV-related calls for police service within six months ([authors]), but tends to over-predict recurrence. Raw SAFVR scores were unavailable; instead, risk was automatically categorized as 'low', 'moderate', or 'high'. Some SAFVR scores were missing (i.e., 'no score'), usually because the person had no criminal history in New Zealand.

Dynamic risk assessment
The Dynamic Risk Assessment (DYRA; New Zealand Police, n.d.) is an actuarial instrument designed to estimate the likelihood of an aggressor committing further FV-related harm and to inform the 3-day safety plan police compose at each call-out. Police complete the DYRA by entering victim responses to a series of questions into an iPhone app that calculates the total score and risk category ('low', 'moderate', or 'high'). The DYRA consists of ten questions asked in all cases, two questions asked if the episode involves IPV, and a further four questions asked if children normally reside at the scene address (New Zealand Police, 2018). The questions are not publicly available but concern the aggressor's behavior, including their mental health and substance use, and current stressors affecting the family (New Zealand Police, 2018). A recent study found that the predictive ability of the DYRA for FV-related calls for police service within six months (AUC of 0.54) is statistically inferior to that of the SAFVR ([authors]), and the DYRA also tends to over-predict recurrence. We could access DYRA scores but chose to use the risk categories ('low', 'moderate', or 'high') to remain consistent with the SAFVR and ISR risk categories.

ISR risk categories
Triage teams, composed of representatives from the ISR's partner organizations, meet daily to undertake collaborative risk assessments for cases recently reported to police in their catchment area (Integrated Safety Response, 2019). For each case, team members read the police report for the most recent FV episode, share relevant information from their organization's databases, identify risk factors, and collaboratively determine a categorical risk level: 'low', 'medium', or 'high' (Integrated Safety Response, 2019). This risk category expresses the level of concern about future episodes and the harm expected within them, and guides the intensity of interventions offered to the family. A recent study found the ISR triage teams' risk categories have an AUC of 0.60 for predicting FV-related calls for police service within six months: statistically inferior to the SAFVR, but superior to the DYRA [authors]. Like the DYRA and SAFVR instruments, the ISR risk categories also tend to over-predict recurrence.
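The AUC values quoted for the SAFVR, DYRA, and ISR categories can be read as the probability that a randomly chosen person who came back to police attention was placed in a higher risk tier than a randomly chosen person who did not, with ties counting one half. The sketch below computes that quantity directly from three-tier categories. The tier coding, the toy data, and the function name are hypothetical, and this is not necessarily the procedure used in the cited studies. With only three tiers many pairs are tied, and each tied pair contributes only one half.

```python
def auc(scores, outcomes):
    """AUC for an ordinal score: P(score of a recurrence case > score of a
    non-recurrence case), with tied pairs counted as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without recurrence")
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5  # ties are frequent when there are only three tiers
    return total / (len(pos) * len(neg))


# Hypothetical ordinal coding of the categories ('medium' is the ISR label)
TIER = {"low": 0, "moderate": 1, "medium": 1, "high": 2}

# Hypothetical toy data: assigned tier and whether a further FV call occurred
tiers = ["high", "high", "moderate", "low", "moderate", "high"]
recur = [1, 1, 1, 0, 0, 0]
print(auc([TIER[t] for t in tiers], recur))
```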

Sample characteristics
We identified the predominant aggressor in each index episode based on the role police assigned to people in their report (e.g., perpetrator, suspect; New Zealand Police, 2018). In cases where two or more individuals were labelled as 'mutual participants', we identified the aggressor as the 'person posing risk' in the DYRA completed by attending officers ([authors]). Table 1 shows that around three quarters of aggressors were male, and most were identified as Māori or New Zealand European. Half of index episodes were between current intimate partners, and a further fifth were between former intimate partners, meaning that over two-thirds of aggressors in the sample had IPV index episodes. A further fifth of index episodes were between parents and children, with the remainder between siblings or people in other familial relationships. More than half of aggressors were categorized as high-risk on the SAFVR, a fifth as low-risk, and 13% as moderate-risk; 14.9% had no SAFVR score. A third of cases were categorized as high-risk on the DYRA for the index episode, around a quarter each as low- and moderate-risk, and the DYRA questions were not answered in 14% of cases. In contrast, the most common ISR risk category was medium, applying to more than half of cases. The ISR triaged two-fifths of cases as low-risk and rated fewer than one in ten as high-risk.

Procedure
We collected all FV-related episodes reported to police during the two years following the index episode (i.e., up to December 2020 at the latest) in which the index aggressor was again in the role of perpetrator, suspect, or mutual participant. We divided the two-year follow-up period into eight three-month intervals and recorded the presence of each of the five harm indicators (verbal abuse, physical harm, sexual harm, threats of harm, and property damage) during each interval. If an aggressor had no episodes reported in an interval, they scored 0 for all five types of harm in that interval. Each type of harm could only be scored once per interval, even if an aggressor had multiple episodes reported during that interval (e.g., a person with two episodes in one interval, one involving verbal abuse and one involving verbal abuse and physical harm, would score 1 for verbal abuse and 1 for physical harm for that three-month interval).
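The interval-scoring rule above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: episode timing is assumed to be recorded as days since the index episode, the two-year window is approximated as 730 days split into eight equal intervals, and the harm labels and function name are hypothetical.

```python
HARM_TYPES = ("verbal", "physical", "sexual", "threats", "property")
N_INTERVALS = 8        # eight three-month intervals
FOLLOW_UP_DAYS = 730   # assumption: two years approximated as 730 days


def score_intervals(episodes):
    """Return an 8 x 5 grid of 0/1 harm indicators for one aggressor.

    episodes: iterable of (days_since_index, set_of_harm_types) for follow-up
    episodes in which the aggressor was again perpetrator, suspect, or mutual
    participant. Day 0 (the index episode itself) is excluded.
    """
    width = FOLLOW_UP_DAYS / N_INTERVALS
    grid = [[0] * len(HARM_TYPES) for _ in range(N_INTERVALS)]
    for day, harms in episodes:
        if not 0 < day <= FOLLOW_UP_DAYS:
            continue  # outside the two-year follow-up window
        k = min(int(day // width), N_INTERVALS - 1)  # day 730 falls in the last interval
        for j, harm in enumerate(HARM_TYPES):
            if harm in harms:
                grid[k][j] = 1  # scored at most once per interval, however many episodes
    return grid


# The worked example from the text: two episodes in the same interval
example = [(20, {"verbal"}), (45, {"verbal", "physical"})]
print(score_intervals(example)[0])  # [1, 1, 0, 0, 0]
```

Intervals with no reported episodes keep a row of zeros, matching the rule that an aggressor with no episodes scores 0 for all five types of harm.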

Analysis
Our first research question was: can we estimate latent states from information reported to police about FV aggressors' harmful behaviors? To answer this question, we used the five harm indicators from police reports of FV episodes (verbal abuse, physical harm, sexual harm, threats of harm, and property damage) to fit discrete-time hidden Markov models with two to eight latent states, using the LMest package in R (Bartolucci et al., 2017). No other information was used to generate the models.
From the fitted models, we selected the best using the Bayesian Information Criterion (BIC; Jones et al., 2010; Zhang et al., 2010; Zucchini et al., 2016). Then, for each state, we described the probability of each of the five types of harm being reported while aggressors were in that state (Heckert & Gondolf, 2005; Jones et al., 2010). Next, we used the Viterbi algorithm from the LMest package to estimate each aggressor's most likely latent state at each of the eight three-month intervals (Bartolucci et al., 2017). Focusing on the initial latent state (i.e., during the first three-month interval after the index episode), we described aggressors' demographic and risk characteristics by initial latent state using means, percentages, and chi-square tests of independence. Then, we described how likely a given aggressor was to move from one state to another, or to remain in the same state, given their most recent state (Zucchini et al., 2016). Because hidden Markov models allow for non-linear movement between latent states over time (Jones et al., 2010; Zhang et al., 2010), the probability of moving from one state to another varied at each transition between intervals (see supplemental materials). We reported the average of these transition probabilities and the accompanying standard errors to describe the overall pattern in aggressors' movement between latent states. Finally, we described the distribution of aggressors across the three latent states over the two-year follow-up period.
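The Viterbi decoding step finds, for each aggressor, the single most probable sequence of latent states given their eight interval-level observations. The sketch below is a minimal log-space implementation, not LMest's own decoding routine. The per-state harm probabilities are taken from the three-state results reported in this paper (values reported as "< 0.01" are replaced by a placeholder of 0.005), but the initial and transition probabilities are hypothetical, so the decoded paths illustrate the mechanics only.

```python
import math

HARM_TYPES = ("verbal", "physical", "sexual", "threats", "property")

# Per-state probabilities of each harm being reported (three-state results);
# "< 0.01" is replaced by a placeholder of 0.005.
EMIT = [
    [0.02, 0.005, 0.005, 0.005, 0.005],  # state 1
    [0.18, 0.02, 0.005, 0.01, 0.005],    # state 2
    [0.96, 0.43, 0.01, 0.23, 0.22],      # state 3
]
# Hypothetical initial and transition probabilities (NOT the study's estimates)
INIT = [0.60, 0.05, 0.35]
TRANS = [
    [0.95, 0.03, 0.02],
    [0.10, 0.70, 0.20],
    [0.15, 0.15, 0.70],
]


def log_emission(probs, obs):
    """log P(obs | state) for conditionally independent binary indicators."""
    return sum(math.log(p if y == 1 else 1.0 - p) for p, y in zip(probs, obs))


def viterbi(obs_seq, init, trans, emit):
    """Most likely sequence of latent states (numbered from 1), in log space."""
    n = len(init)
    delta = [math.log(init[s]) + log_emission(emit[s], obs_seq[0]) for s in range(n)]
    backptr = []
    for obs in obs_seq[1:]:
        ptr, new_delta = [], []
        for s in range(n):
            # best previous state for arriving in state s at this interval
            r = max(range(n), key=lambda q: delta[q] + math.log(trans[q][s]))
            ptr.append(r)
            new_delta.append(delta[r] + math.log(trans[r][s]) + log_emission(emit[s], obs))
        backptr.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return [s + 1 for s in reversed(path)]


quiet = [[0, 0, 0, 0, 0]] * 8
harm_then_quiet = [[1, 1, 0, 0, 0]] + [[0, 0, 0, 0, 0]] * 7
print(viterbi(quiet, INIT, TRANS, EMIT))
print(viterbi(harm_then_quiet, INIT, TRANS, EMIT))
```

With these assumed parameters, a sequence with no reported harm decodes to state 1 throughout, whereas one interval with verbal abuse and physical harm followed by quiet intervals decodes to state 3 and then state 1.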
The second research question was: to what extent are there common patterns in FV aggressors' movements between latent states over two years? To answer this question, we performed k-means clustering in IBM SPSS Statistics version 27 to find the most common pathways between the latent states during the two-year follow-up period. Finally, we described aggressors' demographic and risk characteristics by pathway using means, percentages, and chi-square tests of independence.
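The pathway clustering step can be illustrated with a plain implementation of Lloyd's k-means algorithm, treating each aggressor's eight decoded states as a point in eight-dimensional space. This is a sketch, not the SPSS procedure: coding the states as the numbers 1 to 3 (so that squared distances are meaningful) is an assumption, the toy paths are hypothetical, and the starting centers are passed in explicitly to keep the example deterministic, whereas SPSS chooses its own starting values. The study's solution had four clusters; the toy example uses three.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_idx, max_iter=100):
    """Lloyd's k-means; init_idx picks the rows used as starting centers."""
    centers = [list(map(float, points[i])) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # assignment step: each pathway joins its nearest center
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no pathway changed cluster
        labels = new_labels
        # update step: each center becomes the mean state at each interval
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical decoded pathways (eight intervals, states coded 1-3)
paths = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [3, 2, 2, 2, 2, 2, 1, 1],
    [3, 2, 2, 2, 2, 2, 1, 1],
    [3, 2, 3, 2, 3, 2, 3, 2],
]
labels, centers = kmeans(paths, init_idx=[0, 2, 4])
print(labels)
```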

Results
We compared the BIC for hidden Markov models with two to eight latent states (see supplemental materials) and chose to proceed with the three-state model because it had the smallest BIC (Jones et al., 2010). Figure 1 shows the probability of each of the five harm variables being reported or not reported to police in each of the three latent states. The first state had a very small probability of verbal abuse (0.02) and virtually no probability of any other type of harm (< 0.01). The second state appeared to be intermediate in terms of severity: it had a small probability of verbal abuse (0.18), very small probabilities of physical harm (0.02) and threats (0.01), and virtually no probability of property damage or sexual harm (< 0.01). The third state included an almost certain probability of verbal abuse (0.96), a moderate probability of physical harm (0.43), small probabilities of threats (0.23) and property damage (0.22), and a very small probability of sexual harm (0.01). Thus, this state was viewed as the most severe in terms of reported harm.
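The model-selection rule can be sketched as computing BIC = -2 log L + p log n for each candidate number of states and keeping the smallest value. Several details below are assumptions rather than facts about the LMest fit: n is taken as the number of aggressors (2115), the parameter count p assumes one Bernoulli probability per harm indicator and state plus a separate transition matrix for each of the seven transitions, and the log-likelihoods are invented for illustration only.

```python
import math


def hmm_n_params(k, n_items, n_times, time_varying=True):
    """Free parameters of a k-state HMM with binary indicators (assumed setup):
    k - 1 initial probabilities, k * n_items emission probabilities, and
    k * (k - 1) transition probabilities per transition matrix."""
    n_trans = (n_times - 1) if time_varying else 1
    return (k - 1) + k * n_items + n_trans * k * (k - 1)


def bic(loglik, n_params, n_units):
    """BIC = -2 log L + (number of parameters) * log(sample size); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_units)


def choose_k(logliks, n_items, n_times, n_units):
    """Return the number of states whose fitted model has the smallest BIC."""
    return min(logliks, key=lambda k: bic(logliks[k], hmm_n_params(k, n_items, n_times), n_units))


# Hypothetical maximized log-likelihoods, for illustration only (NOT the study's values)
fits = {2: -9000.0, 3: -8800.0, 4: -8790.0}
print(choose_k(fits, n_items=5, n_times=8, n_units=2115))
```

In this toy case the four-state model fits only slightly better than the three-state model, so its extra parameters are penalized and the three-state model has the smaller BIC.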
Table 2 describes the demographic and risk characteristics of aggressors according to their initial latent state (i.e., their state in the first three months following the index episode). There was an association between initial latent state and the SAFVR, DYRA, and ISR risk categories, whereby state 1 (the least severe) had a greater proportion of aggressors in the lower risk tiers than state 2 and state 3 (the most severe). Initial latent state was also associated with aggressor gender; a greater proportion of aggressors were men in states 2 and 3 than in state 1. There were also patterns in aggressors' relationship to the index victim: the largest proportion of aggressors who were the current intimate partner of the index victim was in state 2, the largest proportion of aggressors who were former intimate partners was in state 3, and the largest proportion of aggressors with parent/child index episodes was in state 1. Finally, initial state was associated with aggressor ethnicity; a greater proportion of Māori aggressors were in states 2 and 3 than in state 1, and a greater proportion of New Zealand European aggressors were in states 1 or 3 than in state 2.
Table 3 shows how likely a given aggressor was, on average across all seven transitions between the eight intervals in the two-year follow-up period, to move from one state to another, given their most recent state. The rows represent the most recent latent state, and the columns represent the latent state at the next interval. If a case was in state 1 at a given interval, the most likely outcome was that it remained in that state, with a very small probability of escalating directly to state 3. If a case was in state 2, it was most likely to remain in that state at the next interval, with a moderate probability of escalating to state 3 and a small probability of de-escalating to state 1. If a case was in state 3, it was most likely to remain in state 3, followed by lower probabilities of de-escalating to state 1 or state 2.
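Because each of the seven estimated transition matrices is row-stochastic (every row sums to one), their element-wise average is too, so the averaged values in Table 3 can be read like a single transition matrix. The sketch below shows the averaging with two hypothetical matrices. The standard error shown (the standard deviation of each entry across transitions divided by the square root of their number) is one possible definition and an assumption, not necessarily how the reported standard errors were obtained.

```python
import math
import statistics


def average_transitions(matrices):
    """Element-wise mean of a list of row-stochastic transition matrices."""
    n_mats = len(matrices)
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n_mats for j in range(size)] for i in range(size)]


def transition_se(matrices):
    """Standard error of each averaged entry across transitions (assumed definition)."""
    size = len(matrices[0])
    k = len(matrices)
    return [[statistics.stdev([m[i][j] for m in matrices]) / math.sqrt(k) for j in range(size)]
            for i in range(size)]


def row_sums(matrix):
    return [sum(row) for row in matrix]


# Two hypothetical transition matrices (rows: current state, columns: next state)
P1 = [[0.9, 0.05, 0.05], [0.2, 0.6, 0.2], [0.3, 0.2, 0.5]]
P2 = [[0.8, 0.15, 0.05], [0.1, 0.7, 0.2], [0.1, 0.2, 0.7]]
avg = average_transitions([P1, P2])
print(avg)
print(row_sums(avg))  # each row still sums to (approximately) 1
```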
Figure 2 shows the proportion of aggressors in the sample who were classified as being in each latent state at each interval during the two-year follow-up period (see supplemental materials for the corresponding table). These proportions differ from the average transition probabilities described in the previous paragraph because they describe the overall distribution of aggressors across states at each interval, whereas transition probabilities are conditional on an aggressor's most recent state and are averaged across all transitions. The proportion of aggressors in state 3 decreased from 34% at time 1 to 16% at time 2 and 15% at time 3. The proportion of aggressors in state 2 increased from 5% to 19%, then steadily decreased, reaching 0% by time 8. As would be expected given the declining proportions of aggressors in states 2 and 3, there was a corresponding increase in the proportion of aggressors in state 1, which reached 85% by time 8. The raw figures support this pattern: the proportion of aggressors with any further episodes reported to police declined in each three-month interval, from 68% at time 1 to 61% at time 2, 56% at time 3, 51% at time 4, 46% at time 5, 39% at time 6, 29% at time 7, and 19% at time 8 (see supplemental materials).
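The difference between these proportions and the transition probabilities can be made concrete. The share of aggressors in each state at each interval is a simple tally over decoded paths, whereas a transition matrix describes where aggressors go conditional on where they are; the shares at the next interval are obtained by weighting each row of the matrix by the current shares. In the hypothetical sketch below (toy paths and matrix, our own function names), the share in state 1 rises from one interval to the next even though the transition matrix itself is fixed.

```python
def proportions_from_paths(paths, n_states=3):
    """Share of aggressors in each latent state (1..n_states) at each interval."""
    n = len(paths)
    n_intervals = len(paths[0])
    return [
        [sum(1 for path in paths if path[t] == s) / n for s in range(1, n_states + 1)]
        for t in range(n_intervals)
    ]


def propagate(p, trans):
    """One Markov step for the state shares: p_next[j] = sum_i p[i] * trans[i][j]."""
    return [sum(p[i] * trans[i][j] for i in range(len(p))) for j in range(len(trans[0]))]


# Hypothetical decoded paths over three intervals
paths = [[3, 2, 1], [1, 1, 1], [2, 2, 1], [1, 1, 1]]
print(proportions_from_paths(paths))

# Hypothetical fixed transition matrix: state 1 absorbing, state 2 halves toward state 1
trans = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(propagate([0.5, 0.5, 0.0], trans))  # [0.75, 0.25, 0.0]
```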
Recall the second research question: to what extent are there common patterns in FV aggressors' movements between latent states over two years? We performed cluster analysis with two to eight clusters and chose to proceed with the four-cluster solution because the patterns in the cluster centers were sufficiently different from one another to interpret. Figure 3 shows the four pathways identified in this cluster analysis. In the first pathway, which we called De-escalation of reported harm (n = 371, 17.5%), aggressors were in state 3 in the first interval, then de-escalated to state 2 for five intervals, and then de-escalated further to state 1 for the remaining two intervals. In the second pathway, High reported harm (n = 326, 15.4%), aggressors moved between states 3 and 2 continuously during the follow-up period. In the third, least common, pathway, Low reported harm (n = 114, 5.4%), aggressors were in state 1 for the first three intervals, then escalated to state 2 for the remaining five intervals. We chose to call this pathway Low reported harm rather than Escalation because state 2 was still relatively low in severity, characterized by a small probability of reported verbal abuse and very small or no probability of other types of harm being reported to police. Lastly, in the fourth, and most common, pathway, No reported harm (n = 1304, 61.7%), aggressors remained for the entire follow-up period in state 1, which had a very small probability of reported verbal abuse.
Finally, Table 4 describes the demographic and risk characteristics of aggressors by pathway. There were significant associations between the pathways and both New Zealand Police's existing risk assessment instruments (SAFVR and DYRA) and the ISR triage teams' risk categories, but the associations were not straightforward. For the SAFVR (i.e., New Zealand Police's static risk instrument), the greatest proportion of high-risk aggressors was in the High reported harm pathway, followed by the Low reported harm and De-escalation pathways. However, almost half of aggressors in the No reported harm pathway were also rated as high-risk. The High reported harm and De-escalation pathways had higher proportions of aggressors assigned medium- and high-risk categories by the ISR after the index episode than the No reported harm and Low reported harm pathways. The De-escalation pathway had the greatest proportion of cases assigned the high-risk category of the DYRA (i.e., New Zealand Police's dynamic risk instrument) at the index episode. That said, the distribution of cases across the DYRA risk categories did not appear to differ greatly between the pathways, and the chi-square statistic for this comparison may only have been statistically significant because of the large sample size.
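The caveat that a chi-square statistic may be significant simply because the sample is large can be checked with an effect size such as Cramér's V, which rescales the statistic by the sample size. The pure-Python sketch below uses hypothetical toy tables and omits p-values; in practice, scipy.stats.chi2_contingency returns the statistic, p-value, degrees of freedom, and expected counts (note that it applies a continuity correction to 2 x 2 tables unless correction=False). Multiplying every cell by ten multiplies chi-square by ten but leaves V unchanged.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def cramers_v(table):
    """Effect size in [0, 1] that, unlike chi-square, does not grow with sample size."""
    stat, _ = chi_square(table)
    n = sum(map(sum, table))
    return math.sqrt(stat / (n * (min(len(table), len(table[0])) - 1)))


small = [[10, 20], [30, 40]]
large = [[100, 200], [300, 400]]  # same proportions, ten times the sample
print(chi_square(small)[0], chi_square(large)[0])
print(cramers_v(small), cramers_v(large))
```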
There were also associations between the pathways and aggressor gender and ethnicity; a greater proportion of men and Māori were in the Low reported harm, High reported harm, and De-escalation pathways than in the No reported harm pathway. Finally, the High reported harm pathway had the largest proportion of aggressors who were the current intimate partner of the index victim, whereas the No reported harm pathway had the largest proportion of aggressors who were former partners of the index victim.

Discussion
In this study, we constructed latent states based on information New Zealand Police collected about reported harm, modeled aggressors' movement between those states during a two-year period, and identified common pathways describing the probability that people would continue to come to police attention for FV. We found that aggressors who were in the least severe state at a given interval were also the least likely to change states, and that the most common of the four pathways, No reported harm, involved virtually no reported harm over the two-year follow-up period.
Taken together, the findings support other research, predominantly based on calls for police service, indicating that FV aggressors do not inevitably repeat or escalate their harmful behavior, and that a small subset of cases accounts for a disproportionately large share of reported harm (Bland & Ariel, 2015; Piquero et al., 2006; Swartout et al., 2012; Walker et al., 2013). Our results also align with Gulliver and Fanslow's (2015) research describing women's experiences of IPV victimization in New Zealand. In that study, the first and most common class involved no or low harm, and the second class involved less severe and less frequent harm than the third class, which involved high rates of severe IPV (Gulliver & Fanslow, 2015).
On the other hand, because an estimated two-thirds of FV episodes are not reported to police (Ministry of Justice, 2022), conclusions about the pathways generated in this study should be understood as based on reported harm rather than actual harm; police records inevitably represent only part of the harm experienced (Ministry of Justice, 2022). Consistent with this view, the latent states we found contained lower probabilities of harm than those found by Jones et al. (2010) among men referred to non-violence programs, where data were obtained through phone calls with victims rather than from archived police reports.
Alternatively, the lower overall rates of reported harm in this study may reflect the practice, during this period, of supporting the reporting of low-harm and no-harm episodes to police more generally.

Implications
The findings produced in this study may be specific to the multi-agency approach in ISR catchment areas, wherein all cases received triaging, and cases assessed as medium or high risk were allocated some form of multi-agency case management that could have included interventions for victims, aggressors, or the entire whānau. If these services were (a) delivered and (b) effective, they may have altered aggressors' pathways from what would otherwise have occurred and could have produced the De-escalation and Low reported harm pathways. Hence, the findings may not generalize to other contexts; for example, research on aggressor behavior in the context of another multi-agency case management response would likely identify more severe harm on average if performed in a jurisdiction where that response only receives referrals for cases already classified as high risk (resulting in a truncated distribution; e.g., Robinson & Tregidga, 2007).
Furthermore, the results should be understood within New Zealand's social context. Due to colonization, Māori are over-represented in FV statistics in New Zealand (Ministry of Justice, 2021, 2022) and disproportionately experience risk factors for FV (Dobbs & Eruera, 2014). Consequently, the ISR has a commitment to improving the cultural responsiveness of its services (Ara Poutama Aotearoa, 2019; Mossman et al., 2017; New Zealand Government, 2022). In this study, we found that Māori were disproportionately represented in the states and pathways with higher probabilities of reported harm. One explanation for this result may be differences in reporting to police; however, the New Zealand Crime and Victims Survey has found no ethnicity-related differences in reporting rates (Ministry of Justice, 2022).
This research suggests changes to how practitioners conceptualize and assess risk for FV. The results align with evidence challenging the longstanding presumption that escalation is inevitable (Barnham et al., 2017; Bland & Ariel, 2015). Rather than assuming escalation, the task becomes identifying the small group of cases that do escalate or that involve consistent, ongoing harm (Robinson & Clancy, 2020). The risk assessment instruments New Zealand Police currently use are designed to triage out low-risk cases and otherwise over-predict the likelihood of recurrence [authors]. This tendency to overestimate risk can have unintended negative consequences, subjecting people to unwarranted sanctions (e.g., arrest and imprisonment) and contributing to the over-representation of Indigenous people in the criminal justice system (e.g., Thorburn & Weatherburn, 2018). Given the over-representation of Māori in FV episodes reported to police (Ministry of Justice, 2021, 2022), Māori bear a disproportionate burden of these consequences. It is, however, somewhat reassuring that within the context of the ISR, over-prediction leads to people receiving intervention and support services rather than punitive measures.
We examined the ISR, SAFVR, and DYRA risk categories assigned to aggressors in different pathways and found that the ISR overestimated high risk to a lesser degree than the other instruments. The SAFVR had the strongest relationship with the pathways, but even then, two-fifths of aggressors in the No reported harm pathway had high SAFVR risk categories (i.e., were likely over-classified). This result suggests the SAFVR does not discriminate adequately between the patterns of harm examined in this study, which makes sense because it was primarily designed to assess the likelihood of an aggressor being convicted of an FV-related offense within the next two years (Bissielo & Knight, 2016). Both the ISR and SAFVR risk categories outperformed the DYRA, which likely had a significant relationship with the pathways only because of the large sample size. We suggest that existing instruments need to be refined so they can better identify cases at the highest risk of ongoing harm; incorporating multi-agency information into these instruments may provide a promising opportunity for improvement (Robinson & Clancy, 2020).
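The suggestion that the DYRA's relationship with the pathways was significant only because of the large sample size reflects a general property of tests of association. Assuming a Pearson chi-square test of independence on a risk-category-by-pathway table (the test actually used is not specified here), the chi-square statistic grows in proportion to the sample size, whereas an effect size such as Cramér's V does not. The function names and the counts below are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: chi-square rescaled to 0-1 so it does not grow with sample size."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))


# Hypothetical counts: rows are risk categories (low, medium, high);
# columns are four pathways. Not study data.
table = [[30, 25, 20, 25],
         [28, 27, 22, 23],
         [25, 26, 26, 23]]
bigger = [[10 * cell for cell in row] for row in table]  # same proportions, 10x the sample

for label, t in (("original", table), ("10x sample", bigger)):
    print(label, round(chi_square(t), 2), round(cramers_v(t), 3))
```

Multiplying every cell by ten multiplies the chi-square statistic by ten, making a weak association easier to declare significant, while Cramér's V is unchanged. Reporting an effect size alongside significance gives a clearer sense of how strongly an instrument's categories track the pathways.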

Strengths and limitations
This study employed a novel approach to describe patterns in the behavior of FV aggressors over two years, and it has limitations that should be addressed in further research. Because most risk assessment for FV occurs on the front line, the ability of practitioners to complete those assessments with routinely collected information is an important aspect of ecological validity. We used police-recorded information in this study; however, that approach excludes unreported information (which accounts for the bulk of FV; Ministry of Justice, 2022), and including it could have produced different results. This limitation is why we opted for a hidden Markov model in the design of this study: it uses observed outcomes to model states representing unobserved behaviors (Jones et al., 2010; Zhang et al., 2010; Zucchini et al., 2016). Nevertheless, temporal misalignment can occur between observed outcomes and unobserved behaviors (Glennie et al., 2022). In other words, the reported harm indicators at a particular point in time may not accurately reflect aggressors' true behavioral states at that time, for example because of inconsistent underreporting or delays in reporting. Further research could mitigate this issue by interviewing victims and combining natural language processing with the hidden Markov approach to more accurately infer latent states representing aggressor behavior and victims' experiences of harm.
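To make concrete how a hidden Markov model links observed outcomes to unobserved states, the sketch below implements the standard scaled forward algorithm for a three-state model in which each state emits several binary harm indicators, assumed to be independent given the state. All parameter values, and the reduction to two indicators (e.g., verbal abuse and physical violence), are hypothetical. This is a generic textbook algorithm (see Zucchini et al., 2016), not the software or estimates used in this study.

```python
import math

# Hypothetical three-state model; all values are illustrative only.
INITIAL = [0.6, 0.3, 0.1]
TRANSITION = [[0.85, 0.10, 0.05],   # rows: current state, columns: next state
              [0.30, 0.60, 0.10],
              [0.20, 0.30, 0.50]]
# Probability that each binary indicator (verbal abuse, physical violence) is reported, per state.
EMISSION = [[0.01, 0.01],
            [0.15, 0.05],
            [0.80, 0.40]]


def emission_prob(state, obs):
    """P(observed binary indicators | state), assuming indicators are independent given the state."""
    p = 1.0
    for prob, present in zip(EMISSION[state], obs):
        p *= prob if present else 1.0 - prob
    return p


def forward(observations):
    """Scaled forward algorithm: returns the log-likelihood and filtered state probabilities."""
    n_states = len(INITIAL)
    log_lik = 0.0
    filtered = []
    alpha = []
    for t, obs in enumerate(observations):
        if t == 0:
            alpha = [INITIAL[s] * emission_prob(s, obs) for s in range(n_states)]
        else:
            alpha = [emission_prob(s, obs) * sum(alpha[r] * TRANSITION[r][s] for r in range(n_states))
                     for s in range(n_states)]
        scale = sum(alpha)          # P(obs_t | obs_1..t-1)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # normalize to filtered probabilities
        filtered.append(alpha)
    return log_lik, filtered


# One hypothetical aggressor over four intervals: (verbal abuse, physical violence) reported?
sequence = [(1, 1), (1, 0), (0, 0), (0, 0)]
log_lik, filtered = forward(sequence)
for t, probs in enumerate(filtered):
    print(t, [round(p, 2) for p in probs])
```

The filtered probabilities show how belief about the aggressor's state shifts as intervals without reported harm accumulate. Because the state is inferred only from reports, a delay or gap in reporting feeds directly into these probabilities.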
Another limitation of this research is that the police harm indicators did not include psychological harm, coercive control, or stalking (Robinson et al., 2018). These are important components of FV that detrimentally affect victims but are often not directly measured, despite legislative attempts to criminalize them in some jurisdictions (Ansara & Hindin, 2011; Dichter et al., 2018; Wiener, 2017). Although the hidden Markov model employed in this study could not identify coercive control, controlling behaviors may plausibly be present in the High reported harm pathway (accompanying reported harm) and in the No or Low reported harm pathways (preventing reporting). Research that uses victim accounts to capture coercive controlling behavior may improve our understanding of families' experiences; for example, there may be a distinct state involving coercive controlling behaviors, or those behaviors may precede, accompany, or follow physical violence.
There were several factors that we could not account for in the design and data sources used in this study. For example, by focusing on the harm that individual aggressors committed, we could not account for bi-directional harm or for dyads that switched between the roles of aggressor and victim within and between episodes (Straus, 2015; Straus & Gozjolko, 2016). Because this study included a broad and relatively low-risk sample compared with other studies of FV (e.g., fewer than a third of initial episodes involved physical harm), bi-directional harm and role switching may have been more common than is observed in high-risk samples (Straus & Gozjolko, 2016); this is a point for further investigation. We also could not tell whether episodes reported to police during the follow-up period involved the same victim as the index event, nor who reported them. Moreover, we could not account for other contextual factors associated with the results, such as the end of relationships, relocation, and the removal of children, or the impact of community-based sentences, protection orders, and imprisonment. Given the disproportionate number of Māori experiencing FV (Dobbs & Eruera, 2014) and the disproportionate treatment Indigenous people receive in criminal justice systems (Thorburn & Weatherburn, 2018), this limitation would disproportionately affect findings for Māori people in the sample. However, because this study used a broad and relatively low-risk sample, the removal of children and imprisonment would likely have occurred in very few cases, if any. Integrating a wider range of information about people's roles, contextual factors, and other family members' characteristics and behavior would further improve the holistic and descriptive nature of the model presented herein.
Finally, we did not know whether aggressors and their families were offered interventions during the follow-up period. This is an important limitation because, in theory, some cases could have engaged with interventions that changed the pathway they would otherwise have followed. However, the effect of this limitation is likely small: even in high-risk cases, which receive the most attention, the rate of initial engagement with interventions is around 63% for victims and 38.2% for aggressors ([authors]), and research on ongoing engagement and intervention efficacy is lacking. We also could not determine whether the quality of the response affected how likely families were to report further harm. Further research should account for these interventions to show how they align with patterns of harm, exploring whether engagement with a given intervention (or combination of interventions) is associated with a corresponding de-escalation in an aggressor's behavioral state.

Future directions
The hidden Markov modeling approach could be developed and incorporated into efforts to better predict and communicate risk for FV. In practice, these efforts would involve identifying latent states from a larger sample of information about aggressor behavior and investigating whether existing risk assessment instruments can predict the likelihood of those states. Conceptualizing risk in this way would align the outcomes that instruments are designed to predict with the outcomes that practitioners responding to FV have in mind (i.e., states; Heckert & Gondolf, 2005). For instance, one state may describe a situation in which a family experiences ongoing verbal abuse, whereas another may describe a more volatile situation with a greater probability of multiple forms of harm. Alternatively, instruments could be created to predict a given aggressor's most likely pathway (Jones et al., 2010; Zhang et al., 2010). These changes would preserve the simple design of risk assessment instruments and allow agencies to predict the likelihood of the state or pathway most relevant to their focus (e.g., ensuring immediate physical safety versus longer-term case management and preventing cumulative harm).
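One standard way to obtain an individual's most likely sequence of latent states from a fitted hidden Markov model is Viterbi decoding. The sketch below applies it, in log space, to hypothetical parameters. How this study derived its four named pathways from state sequences is not described here, so mapping a decoded sequence to a named pathway is left out; the parameter values and function names are illustrative assumptions.

```python
import math

# Hypothetical three-state model with two binary harm indicators (illustrative only).
INITIAL = [0.6, 0.3, 0.1]
TRANSITION = [[0.85, 0.10, 0.05],
              [0.30, 0.60, 0.10],
              [0.20, 0.30, 0.50]]
EMISSION = [[0.01, 0.01],
            [0.15, 0.05],
            [0.80, 0.40]]


def log_emission(state, obs):
    """Log P(binary indicators | state), assuming independence given the state."""
    return sum(math.log(p if present else 1.0 - p)
               for p, present in zip(EMISSION[state], obs))


def viterbi(observations):
    """Most probable sequence of latent states (0-indexed) for one aggressor."""
    n = len(INITIAL)
    score = [math.log(INITIAL[s]) + log_emission(s, observations[0]) for s in range(n)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n):
            # Best predecessor state for arriving in state s at this interval.
            best_prev = max(range(n), key=lambda r: score[r] + math.log(TRANSITION[r][s]))
            new_score.append(score[best_prev] + math.log(TRANSITION[best_prev][s])
                             + log_emission(s, obs))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the highest-scoring final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


sequence = [(1, 1), (1, 0), (0, 0), (0, 0)]
print("Most likely states (1-indexed):", [s + 1 for s in viterbi(sequence)])
```

Working in log space avoids numerical underflow when sequences are long, which matters once follow-up extends over many intervals.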
More broadly, conceptualizing outcomes as latent states could contribute to a different way of thinking about risk. The risk categories presently used to communicate scores from risk assessments are abstract labels (e.g., low, medium, or high) that are interpreted differently by different people and applied inconsistently across types of offending (Hanson et al., 2017). Instead of predicting a single outcome, the transition matrix from a hidden Markov model communicates the probabilities of multiple outcomes (e.g., the situation staying the same, de-escalating to a less severe latent state, or escalating to a more severe latent state), given the aggressor's most recent state (Jones et al., 2010; Zhang et al., 2010; Zucchini et al., 2016). Hence, risk categories could be replaced by a description of a given aggressor's most likely state and the types of harm most probable within that state, together with a quantification of the uncertainty around that estimate. Such a description would be both informative and consistently interpreted, and thus could provide a more complete picture of the risk posed in each case than the risk categories in current use (Hanson et al., 2017; Heckert & Gondolf, 2005).
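A transition matrix turns an estimate of an aggressor's current state into probabilities for the next interval, and repeated multiplication extends the forecast over several intervals. The sketch below propagates a state distribution forward using a hypothetical matrix (not the estimates reported in Table 3). The starting distribution stands in for uncertainty about the current state, and the three-month interval length follows the intervals described for Table 3.

```python
# Hypothetical transition matrix for three latent states (rows: current state,
# columns: next state). These values are illustrative, not the study's estimates.
TRANSITION = [[0.85, 0.10, 0.05],
              [0.30, 0.60, 0.10],
              [0.20, 0.30, 0.50]]


def step(dist, matrix):
    """Propagate a state distribution one interval forward (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[r] * matrix[r][s] for r in range(n)) for s in range(n)]


def forecast(dist, matrix, intervals):
    """State distribution after the given number of intervals."""
    for _ in range(intervals):
        dist = step(dist, matrix)
    return dist


# Hypothetical current estimate: most probably in state 3, with some uncertainty.
current = [0.05, 0.25, 0.70]

for label, k in (("next interval (3 months)", 1), ("four intervals (about 1 year)", 4)):
    probs = forecast(current, TRANSITION, k)
    print(label, [round(p, 2) for p in probs])
```

Starting from a distribution rather than a single state carries uncertainty about the current state into the forecast, which is one way to attach a quantified uncertainty to a state-based description of risk.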

Conclusion
In this study, we modeled the behavioral patterns of 2115 FV aggressors using police-recorded information about reported harm. We estimated three latent states and identified four pathways through those states during a two-year period. Taken together, the findings support other research showing that FV aggressors do not inevitably repeat or escalate their harmful behavior, with a small subset of cases accounting for a large proportion of reported harm. This study demonstrates how information police routinely collect about FV cases can be used to model latent states describing the probability that people will continue to come to police attention for FV, and it generates ideas for further work that could improve how risk for FV is conceptualized, assessed, and communicated in practice.

Fig. 1
Fig. 1 Probabilities of the Presence of Five Different Types of Harm in Each of the Latent States. Within each state, the probability of a given type of harm being reported and the probability of it not being reported sum to 1. The probabilities of the different types of harm being present do not sum to 1 within or across latent states because each type of harm was a separate dichotomous variable.

Fig. 2
Fig. 2 Distribution of Aggressors Across the Three Latent States During the Two-Year Follow-up Period

Table 1
Descriptive statistics for sample (N = 2115). DYRA = Dynamic Risk Assessment; SAFVR = Static Assessment for Family Violence Recidivism; ISR = Integrated Safety Response. ᵃ People received no SAFVR score when they lacked the necessary criminal history for a score to be calculated (D. Scott, personal communication with first author, January 28, 2020).

Table 3
The Probability of Aggressors' Next Latent State, Given Their Current Latent State, with Standard Errors in Brackets. Bayesian Information Criterion = 29,312.19. The table shows average transition probabilities and standard errors, calculated by averaging the probabilities and standard errors for each transition between three-month intervals (see supplemental materials).