
Need to go further: using INLA to discover limits and chances of burglaries’ spatiotemporal prediction in heterogeneous environments

Abstract

Near-repeat victimization patterns have made predictive models for burglaries possible. While such models have been implemented in different countries, the results obtained have not always met initial expectations, to the point that their real effectiveness has been called into question. The ability to predict crime to improve preventive policing strategies is still under study. This study aims to identify the limitations and successes of models that attempt to predict burglaries based on spatiotemporal patterns in which the risk of break-ins spreads in geographical proximity to the initial break-ins. A spatiotemporal log-Gaussian Cox process is used to model the generic near-repeat victimization scenario and is fitted using the Integrated Nested Laplace Approximation (INLA) methodology. This approach is highly suitable for studying and describing the near-repeat phenomenon. However, the predictions obtained with INLA are rather uniform, show low variability, and do not reproduce well the local, short-term dynamics of burglaries for predictive purposes. The conclusion is that predictive models cannot rely exclusively on distance-decay risk; they must be designed to detect other types of spatiotemporal patterns, which, among other things, would make it possible to correlate distant events and clusters. Although other studies have already highlighted this problem, the proposal here is to go one step further and explicitly extend near-repeat spatial patterns to achieve better prediction results.

Introduction

It has been a decade since a group of researchers from Los Angeles surprised criminologists and police by proposing a crime prediction model (Mohler et al., 2011). The analogy between the cluster-generating processes of earthquake aftershocks and of crimes led to a mathematical formulation capable of predicting the dynamics of crime clusters in space and time. The key to the analogy, and to the predictive model itself, is the criminological theory of repeat victimization (Farrell & Pease, 1993), which, in the case of break-ins, establishes that each burglary increases the risk of new burglaries in the same area over the subsequent days or weeks. This theory is known as near-repeat victimization (Townsley, 2003). By summing these local increments in risk, which are normally radial and modelled with functions that decay with distance, these mathematical models update the risk map for the following days or weeks. Using the Knox test (Briz-Redón et al., 2020; Knox & Bartlett, 1964), this theory has been empirically confirmed for residential burglaries in many countries (Johnson et al., 2007; Kikuchi et al., 2010; Vijaya Kumar, 2011; Wang & Liu, 2017), despite the apparent geographical, social, economic, and criminal differences between them. This behaviour, common to burglars around the world, appears to increase their profits and chances of success, and can be explained within the general framework of crime opportunity theory (Gottfredson & Hirschi, 1990).

The starting point of any prediction for residential burglaries is a spatiotemporal pattern that generates dynamic crime clusters, i.e., the near-repeat pattern. In a daily or weekly sequence, these concentrations of crime have a lifetime of two or three weeks (from the first burglary to the last), in what is often called a wave of burglaries. Between waves, several weeks can pass with very few or no break-ins. The goal of predictive programs is to detect waves from their very beginning in order to deter them.

While different police, business and academic initiatives have pooled ideas to develop crime prediction programs, they all have different focuses and methodological peculiarities, which are not always explained or transparent. Furthermore, business interests and confidentiality agreements make it difficult to know the predictive machinery in detail. Concepts such as Big Data, Machine Learning or Artificial Intelligence are often invoked to describe the technologies used, which are frequently seen as black boxes operating on protected and personal data. This has sparked criticism over aspects such as the ethics of the predictions and data security. Criticism has also been raised about prediction biases due to the use of historical police data. The lack of empirical evidence for such predictive programs, or the opacity of that evidence, has also made it harder for the models and their predictions to be accepted in day-to-day policing, where officers are used to being guided by their experience and intuition rather than by the results of a "machine" (Bennett Moses & Chan, 2018; Egbert & Krasmann, 2020; EUCPN, 2016; Gerstner, 2018; Meijer & Wessels, 2019; Ratcliffe et al., 2020; Seidensticker et al., 2018; Townsley, 2018; Weathington, 2020; Yang, 2020).

This paper will not enter these debates but rather will focus on the methodology and theory of predicting home burglaries. Predictive policing continues to advance as a tool for the future that will be able to help police improve their prevention strategies and reduce crime. While new programs are emerging to implement predictive policing, none have yet managed to gain general acceptance. The entire predictive process, from its founding ideas and approaches to its methods and results, must continue to be constructed and reviewed.

Following an earlier study in Catalonia (Spain) that found the near-repeat pattern on a large scale (Boqué et al., 2020), this study aims to determine to what extent a prediction can be made in Catalonia with the near-repeat pattern operating only in geographical proximity, before addressing any possible extension or generalization of the pattern.

Two main objectives are considered. The first is to test whether residential burglaries can be predicted in heterogeneous environments under the classic hypotheses of near-repeat victimization, i.e., distance-decay risk in geographical proximity and temporal autocorrelation. The large-scale patterns of waves of burglaries observed in Catalonia (Boqué et al., 2020), although compatible with the theory of near-repeat victimization, raise questions about several aspects of burglars' decision-making when choosing their targets. The hypothesis is that, within the general framework of optimal foraging theory (Bernasco, 2009; Johnson, 2014; Krebs & Davies, 1993), environments with small, heterogeneous and scattered residential areas would force thieves to move and target several areas within the same criminal initiative. The resulting impact on crime patterns would be a lower spatial concentration of near-repeats. To test this hypothesis, a model based on risk decay in geographical proximity is proposed and analysed using real data.

The second objective is to discuss the role that the two main spatiotemporal risk factors (static and dynamic) play in these predictive models, which often overlap and cause confusion. Static risk is related to the crime opportunity of the area and the flag hypothesis (Tseloni & Pease, 2003), while dynamic risk is related to burglars’ rational decision-making model and the boost hypothesis (Bowers & Johnson, 2004). More recently, a degree of interaction between the two hypotheses called the Flag and Boost Interaction (FBI) has been proposed (Farrell & Pease, 2017), in addition to a theory that extends the boost hypothesis to a network approach (Lantz & Ruback, 2017).

Static risk is easier to model thanks to the long-term stability of the data, which allows it to be represented and estimated well by averaging. The question, however, is to what extent models based on this component can be fitted, and to what extent they capture the dynamic factor of burglaries. In this regard, patterns of risk spreading around recent burglaries have been verified (Johnson et al., 2007). Meanwhile, whether repeat burglaries should continue to be related solely through geographical proximity is being questioned (Wang & Zhang, 2020). In fact, even the idea that an initial burglary triggers the whole process of repetitions may be considered questionable.

Proposals will be made concerning the questions outlined above, allowing us to deepen our understanding of the spatiotemporal patterns of burglaries. We will also attempt to identify the most suitable model to improve predictions in environments that are not strictly urban but heterogeneous and with relatively low or irregular intensity. The results obtained will allow us to discuss some of the problems observed in previous models and to propose explanatory and methodological alternatives.

Data and methodology

Data

The territory: Catalonia

Catalonia is an autonomous community of Spain. It lies in Mediterranean Europe along the east coast of the Iberian Peninsula, occupying 5.5% of the peninsula's land area. It is bordered by the Pyrenees and France to the north, the Mediterranean Sea to the east, and the rest of Spain to the west and south. The Catalan population (7.7 million inhabitants) is concentrated in 30% of the territory, mainly on the coastal plains. Two thirds of the population live in the urban area of its capital city, Barcelona. Apart from the capital and its metropolitan area, other cities and residential areas are generally small or medium-sized, with diverse and heterogeneous urban planning (Fig. 1).

Fig. 1 Typical heterogeneous urbanism of Catalan towns. Source: Cartographic and Geological Institute of Catalonia – ICGC

The territory is divided into nine police regions (Fig. 2). Each region includes a set of Basic Police Areas (ABP), which are the territory’s primary units defined by geographical and policing criteria and usually encompassing several municipalities. These regions are geographically diverse. For instance, the territory of the Barcelona Metropolitan Police Region is almost entirely urban, while the Pirineu Occidental Police Region is mountainous with a few, generally small, villages.

Fig. 2 Location of Police Regions (PR) in Catalonia. Source: Author’s own elaboration

Crime data

The data used in this study are forced-entry burglaries reported to the Catalan police from 2014 to 2019. For each burglary, the location in UTM coordinates and the time window (generally less than a day) in which it is assumed to have occurred are recorded. The type of residence broken into (a flat, house, or farmhouse) and whether it is a first or second home are also specified.

The burglary series were segmented in space into square cells of different sizes (5 km, 1 km, 500 m, 250 m, 100 m) and in time into weekly intervals. The profile characteristics used were the type of building (house or flat), the type of residence (first or second), the time window of the burglary (morning, afternoon, or night), the day of the week (mid-week or weekend), and the monthly distribution.
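A minimal sketch of the space–time segmentation described above, in Python, assuming each record is a tuple of UTM easting and northing (in metres) plus the date of the burglary, and that weeks are consecutive seven-day blocks counted from a chosen start date. The function name `cell_week_counts` and the sample records are hypothetical; the original processing pipeline is not described in the text.

```python
from collections import Counter
from datetime import date


def cell_week_counts(events, cell_size_m, start):
    """Count burglaries per (column, row, week).

    events: iterable of (easting_m, northing_m, day) tuples, with UTM
            coordinates in metres and day a datetime.date (assumed format).
    cell_size_m: side of the square cell, e.g. 5000, 1000, 500, 250 or 100.
    start: date that opens week 0; weeks are consecutive 7-day blocks.
    """
    counts = Counter()
    for easting, northing, day in events:
        col = int(easting // cell_size_m)   # cell index along the easting axis
        row = int(northing // cell_size_m)  # cell index along the northing axis
        week = (day - start).days // 7      # weekly interval index
        counts[(col, row, week)] += 1
    return counts


# Hypothetical records, for illustration only.
events = [
    (430120.0, 4582310.0, date(2019, 1, 2)),
    (430180.0, 4582390.0, date(2019, 1, 5)),
    (431900.0, 4582310.0, date(2019, 1, 9)),
]
counts = cell_week_counts(events, 500, start=date(2018, 12, 31))
print(dict(counts))  # {(860, 9164, 0): 2, (863, 9164, 1): 1}
```

Because the cell index is simply the floor of the coordinate divided by the cell size, the same function serves every resolution; when a coarse size is a whole multiple of a fine one (for example 500 m inside 5 km), the coarse counts equal the sum of the fine cells they contain.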

These data were provided by the police of the Government of Catalonia (Mossos d’Esquadra, PG-ME), but they were not extracted using the same criteria as the official data published annually by the same police force (Mossos d’Esquadra—Portal Dades Obertes, n.d.). In addition, some records that were not correctly geolocated had to be discarded. Therefore, while the data are approximately the same, the total number of burglaries stated in this paper does not exactly match the official data, although this does not affect the results of the analysis.

Methods

Methods are divided into two types. First is a set of simple techniques to verify criminal data stability over time and space, and second is a methodology for modelling the spatiotemporal dynamics of the data.

To test stability, annual discrepancies are measured using the standard deviation (SD) and the coefficient of variation (CV). We divide the territory into 5 km square cells, take a variable of interest (for example, the number of burglaries in flats or in second residences), and calculate its percentage of incidence for each of the six years analysed. For each cell, we obtain the mean, standard deviation, and coefficient of variation of these percentages. SD values below 0.1 (±10 percentage points) or CV values below 0.3 can be considered indicators of data stability. The results of this analysis are summarized as the percentage of cells that meet these criteria. Cells with very few burglaries are discarded from this analysis.

Using the same criteria, the stability of the spatial distribution of burglaries inside the 5 km cells is determined by analysing the percentage distribution across each cell's twenty-five 1 km sub-cells.
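The stability criteria above can be sketched as follows. This is an illustration, not the authors' code: it assumes yearly shares are expressed as proportions (so SD < 0.1 means ±10 percentage points), uses the population standard deviation (the text does not specify which), and borrows the threshold of a mean above 35 burglaries per year, reported in the Results, as the rule for discarding sparse cells. Function names and data are hypothetical.

```python
from statistics import mean, pstdev


def profile_stability(subset_by_year, total_by_year, sd_max=0.1, cv_max=0.3):
    """Stability of the yearly share of a burglary subset within one cell.

    subset_by_year: yearly counts of the variable of interest (e.g. flats).
    total_by_year: yearly total burglaries in the same cell (assumed > 0).
    Shares are proportions, so sd_max=0.1 means +/-10 percentage points.
    """
    shares = [s / t for s, t in zip(subset_by_year, total_by_year)]
    m = mean(shares)
    sd = pstdev(shares)  # population SD (assumption; the text does not say)
    cv = sd / m if m > 0 else float("inf")
    return {"mean": m, "sd": sd, "cv": cv, "sd_ok": sd < sd_max, "cv_ok": cv < cv_max}


def percent_of_stable_cells(cells, min_mean_total=35):
    """Percentage of retained cells meeting the SD criterion and the CV criterion.

    cells: list of (subset_by_year, total_by_year) pairs.
    Cells whose mean yearly total is not above min_mean_total are discarded.
    """
    kept = [c for c in cells if mean(c[1]) > min_mean_total]
    results = [profile_stability(sub, tot) for sub, tot in kept]
    if not results:
        return None
    n = len(results)
    pct_sd = 100 * sum(r["sd_ok"] for r in results) / n
    pct_cv = 100 * sum(r["cv_ok"] for r in results) / n
    return pct_sd, pct_cv


# Hypothetical six-year data for three 5 km cells.
flats = [40, 45, 38, 42, 50, 44]
totals = [100, 110, 95, 100, 120, 105]
r = profile_stability(flats, totals)
print(r["sd_ok"], r["cv_ok"])  # True True

cells = [
    (flats, totals),                          # stable share of flats
    ([10, 60, 15, 70, 5, 65], [100] * 6),     # unstable share
    ([1] * 6, [10] * 6),                      # too few burglaries: discarded
]
print(percent_of_stable_cells(cells))  # (50.0, 50.0)
```

The same functions apply to the 1 km sub-cells by passing, for each sub-cell, its yearly counts and the yearly totals of the enclosing 5 km cell.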

Runs tests are also applied to different cells and spatial configurations to check the validity and extent of the stability of the non-random wave pattern. The runs test is based on transforming the weekly time series of burglaries into a series of zeros and ones using different activation levels: if the level is not reached in a given period, the value is 0; otherwise it is 1. The test rejects randomness in favour of clustering, at a given significance level, when the number of runs in the sample is close to the minimum possible given the numbers of zeros and ones available. We call a clustered run of burglaries a wave.
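A sketch of the runs test as described, using the Wald–Wolfowitz statistic with a one-sided normal approximation. The text does not say whether an exact or an approximate null distribution was used, so the normal approximation is an assumption, as are the function name and the hypothetical 40-week series; in practice the test would be repeated for several activation levels.

```python
import math


def runs_test_clustering(weekly_counts, level):
    """One-sided Wald-Wolfowitz runs test for clustering (too few runs).

    Weeks with at least `level` burglaries become 1, the rest 0.
    Returns (runs, z, p) using the normal approximation, or None when the
    binary series is constant or too short for the test.
    """
    x = [1 if c >= level else 0 for c in weekly_counts]
    n = len(x)
    n1 = sum(x)
    n0 = n - n1
    if n0 == 0 or n1 == 0:
        return None
    runs = 1 + sum(1 for a, b in zip(x, x[1:]) if a != b)
    expected = 2 * n0 * n1 / n + 1
    variance = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        return None
    z = (runs - expected) / math.sqrt(variance)
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # lower-tail normal probability
    return runs, z, p


# Hypothetical 40-week series with two short waves of burglaries.
weeks = [0] * 10 + [3] * 5 + [1] * 10 + [2] * 5 + [0] * 10
runs, z, p = runs_test_clustering(weeks, level=2)
print(runs, round(z, 2))  # 5 -4.74
```

A small p-value means fewer runs than expected under randomness, i.e., weeks above the activation level tend to come in consecutive blocks, which is the wave behaviour the test is meant to detect.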

The second and central part of the methodology consists of applying a spatiotemporal log-Gaussian Cox process model (Diggle et al., 2013; Serra et al., 2014) to predict weekly burglaries. The model is fitted using the Integrated Nested Laplace Approximation (INLA) methodology (Blangiardo et al., 2013; Lindgren & Rue, 2015), available at https://www.r-inla.org/home (R-INLA Project, n.d.). The validity of this continuous approach for discrete models in which the study variable is a count has been demonstrated (Simpson et al., 2015).

The model starts from a discrete–continuous approximation of the phenomenon of home burglaries. This approximation assumes that there is a latent risk of burglary in space (and time) or, in other words, that places close together in space (and time) have similar risk, following Tobler’s first law of geography (Tobler, 1970). In this way, any point in space (and time) can be assigned a risk by interpolating from nearby points.

Mathematically, this is formulated as a latent space–time process:

$$Z\left({s}_{ij};{t}_{j}\right)= Y\left({s}_{ij};{t}_{j}\right)+ \epsilon \left({s}_{ij};{t}_{j}\right)$$

where \(t \in \left\{{t}_{1}, \dots ,{t}_{T} \right\}\subset {D}_{t}\) are the times at which the observations are made in the time domain \({D}_{t}\), \(s \in \left\{{s}_{1}, \dots ,{s}_{M} \right\} \subset {D}_{s}\) is the set of points in the spatial domain \({D}_{s}\), \(Z\left({s}_{ij};{t}_{j}\right)\) represents the number of burglaries, \(Y\left({s}_{ij};{t}_{j}\right)\) is the latent process, and \(\epsilon \left({s}_{ij};{t}_{j}\right)\) is an i.i.d. error term with mean 0 and variance \({\sigma }_{\epsilon }^{2}\).

The objective is to predict the value of the so-called latent field \(Y\left({s}_{0};{t}_{0}\right)\) at the space–time location \(\left({s}_{0},{t}_{0}\right)\), given the vector of known observations \(Z\).

This latent field can be decomposed into a static and a dynamic part, \(Y\left(s;t\right)=\mu \left(s;t\right)+\eta \left(s;t\right), \forall \left(s;t\right)\in {D}_{s}\times {D}_{t}\), where \(\mu \left(s;t\right)\) represents the mean of the process, interpretable as the static, non-random component, and \(\eta \left(s;t\right)\) represents the random or dynamic part of the process, with mean 0 and with spatial and temporal dependence. The predictor of the latent field at the point \(\left({s}_{0};{t}_{0}\right)\), \(\widehat{Y}\left({s}_{0};{t}_{0}\right)\), is obtained by finding the prediction function that minimizes the mean squared error, \(E\left[{\left(Y\left({s}_{0};{t}_{0}\right)- \widehat{Y}\left({s}_{0};{t}_{0}\right)\right)}^{2}\right]\).

An adaptation of this approach for cases where the variable of interest is a count, as with burglaries, is a hierarchical formulation with conditional probabilities and a link function:

$$Z\left( {s;t} \right){\text{|}}Y\left( {s;t} \right),\gamma \sim \pi \left( {Y\left( {s;t} \right),\gamma } \right),s \in D_{s} ,t \in D_{t}$$
$$g\left(Y\left(s;t\right)\right)=x{\left(s;t\right)}^{^{\prime}}\beta +\eta \left(s;t\right) , s\in {D}_{s} , t\in {D}_{t}$$

where \(\pi\) is a probability distribution with scale parameter \(\gamma\) and mean \(Y\left(s;t\right)\). The link function g(.) relates the mean of the response variable to a linear predictor that contains the fixed effects \(x{\left(s;t\right)}^{^{\prime}}\beta\) (which can include covariates) and the random effects \(\eta \left(s;t\right)\), which can be modelled using a space–time covariance matrix.

This covariance matrix models the dependence between points in space and time, so that both the covariance in space (spatial dependence) and the covariance in time (temporal dependence) are captured.

The model for forced-entry burglaries is formulated assuming that the counts follow a Poisson distribution and that the link function is the logarithm (log-Gaussian Cox process):

$$Z\left( {s;t} \right){\text{|}}Y\left( {s;t} \right),\gamma \sim Poiss\left( {Y\left( {s;t} \right),\gamma } \right)$$
$$log\left(Y\left(s;t\right)\right)=x{\left(s;t\right)}{^{\prime}}\beta +\eta \left(s;t\right)$$
$$s\in {D}_{s} , t\in {D}_{t}$$

The random effect is written as \(\eta \left( {s;t} \right) = {\varvec{\Phi}} \left( {{\text{s}};{\text{t}}} \right){\varvec{\alpha}} \left( {{\text{s}};{\text{t}}} \right)\), where \({\varvec{\Phi}}\left(\mathrm{s};\mathrm{t}\right)\) is an \({m}_{t }\times {n}_{\alpha }\) matrix of spatial basis functions and \({\varvec{\upalpha}}\left(\mathrm{s};\mathrm{t}\right) \sim Gau(0,{C}_{\alpha })\) are the associated random coefficients, whose covariance matrix \({C}_{\alpha }\) is assumed to be separable, i.e., the (Kronecker) product of the spatial and temporal covariance matrices.
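To make the separability assumption concrete, the following sketch builds a space–time correlation matrix as the Kronecker product of an AR(1) temporal correlation and a spatial correlation, so that the correlation between site i in week a and site j in week b is the product of the two. For brevity, the spatial part uses the exponential function, which is the Matérn case ν = 1/2, rather than the SPDE basis representation used by INLA; the coordinates, κ, and ρ = 0.76 are hypothetical, and multiplying by a marginal variance σ² would turn the correlations into covariances.

```python
import math


def ar1_correlation(n_weeks, rho):
    """Stationary AR(1) correlation between weeks a and b: rho**|a - b|."""
    return [[rho ** abs(a - b) for b in range(n_weeks)] for a in range(n_weeks)]


def exponential_correlation(sites, kappa):
    """Spatial correlation exp(-kappa * d), the Matern case nu = 1/2 (illustrative choice)."""
    return [[math.exp(-kappa * math.dist(p, q)) for q in sites] for p in sites]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


sites = [(0.0, 0.0), (1000.0, 0.0)]       # hypothetical coordinates in metres
C_time = ar1_correlation(3, rho=0.76)
C_space = exponential_correlation(sites, kappa=1 / 1000)
C = kron(C_time, C_space)                 # row/column index = week * n_sites + site

# Correlation between site 0 in week 1 and site 1 in week 2: 0.76 * exp(-1).
print(round(C[1 * 2 + 0][2 * 2 + 1], 4))  # 0.2796
```

Separability keeps the model cheap: only one temporal and one spatial structure are estimated, at the price of assuming that spatial dependence has the same shape in every week.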

The Matérn function is used as the spatial covariance. If the distance between two points is \(d = \Vert {s}_{i}- {s}_{j}\Vert\), their covariance (up to the marginal variance) is \(C_{v} \left( d \right) = \frac{{2^{{1 - v}} }}{{\Gamma (v)}}(kd)^{v} K_{v} (kd)\), where Γ is the gamma function, \({K}_{v}\) is the modified Bessel function of the second kind, \(k>0\) is a scale parameter, and \(v>0\) is a smoothness parameter. This covariance is stationary and isotropic; in other words, it depends only on the distance between the points.

Once the parameters are fitted, the degree of spatial dependence between two points is determined by estimating the distance reached by the radial spread of the risk caused by the burglaries in a specific cell. INLA provides an estimate of the posterior distribution of the parameter \(l\), called the practical range, where \(l= \sqrt{8v}/k\); this is the distance at which the spatial correlation has fallen to approximately 0.1.
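The Matérn correlation and the practical range can be checked numerically. The sketch below uses only the Python standard library, so it evaluates K_ν through its integral representation with the trapezoidal rule; with SciPy available, `scipy.special.kv` would be the usual choice. It assumes ν = 1, which corresponds to R-INLA's default SPDE setting (α = 2 in two dimensions, since ν = α − d/2), and a hypothetical practical range of 1.6 km. At d = l the correlation comes out at about 0.14, consistent with reading the practical range as the distance where correlation is roughly 0.1.

```python
import math


def bessel_k(nu, x, t_max=30.0, steps=30000):
    """Modified Bessel function of the second kind K_nu(x), for x > 0 and moderate nu.

    Uses K_nu(x) = integral_0^inf exp(-x cosh t) cosh(nu t) dt, truncated at t_max
    and evaluated with the trapezoidal rule.
    """
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern_correlation(d, kappa, nu=1.0):
    """C_nu(d) = 2^(1-nu) / Gamma(nu) * (kappa d)^nu * K_nu(kappa d), with C(0) = 1."""
    if d == 0:
        return 1.0
    u = kappa * d
    return 2 ** (1 - nu) / math.gamma(nu) * u ** nu * bessel_k(nu, u)


def practical_range(kappa, nu=1.0):
    """Practical range l = sqrt(8 nu) / kappa."""
    return math.sqrt(8 * nu) / kappa


def kappa_from_range(l, nu=1.0):
    """Inverse of practical_range: kappa = sqrt(8 nu) / l."""
    return math.sqrt(8 * nu) / l


kappa = kappa_from_range(1600.0)  # hypothetical practical range of 1.6 km
for d in (200.0, 800.0, 1600.0, 3200.0):
    print(d, round(matern_correlation(d, kappa), 3))
```

For ν = 1/2 the function reduces to exp(−κd), which is a convenient check of the numerical Bessel evaluation.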

For the temporal covariance, an AR(1) process is used (Geurts et al., 1977), so that a single parameter, \(\rho\), indicates the degree of temporal correlation between successive intervals. Since the data have been shown to cluster into waves of burglaries, this parameter is expected to be close to 1 when weekly time intervals are considered.
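The effect of ρ on wave persistence can be illustrated by simulating a stationary Gaussian AR(1) series, which here stands for the weekly latent log-risk of one location. This is a standalone sketch, not the INLA fit: the unit marginal variance, seed, and series length are assumptions, and ρ = 0.76 is borrowed from one of the posterior means reported in the Results. The mean length of runs on the same side of the series mean grows as ρ approaches 1, which is how a high ρ translates into waves.

```python
import math
import random


def simulate_ar1(n_weeks, rho, seed=0):
    """Stationary Gaussian AR(1): x_t = rho * x_{t-1} + sqrt(1 - rho^2) * e_t, unit variance."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1 - rho ** 2)
    for _ in range(n_weeks - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def lag1_autocorrelation(x):
    """Sample correlation between consecutive weeks."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def mean_run_length(x):
    """Average length of runs of consecutive weeks on the same side of the series mean."""
    m = sum(x) / len(x)
    signs = [v > m for v in x]
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return len(x) / runs


for rho in (0.2, 0.76, 0.95):
    x = simulate_ar1(20000, rho)
    print(rho, round(lag1_autocorrelation(x), 2), round(mean_run_length(x), 1))
```

Exponentiating such a series gives a weekly multiplicative risk factor, so long runs above the mean correspond to consecutive weeks of elevated expected burglary counts.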

Specifics of applying INLA

The INLA method requires a mesh to approximate the latent field (Krainski et al., 2019). The choice of mesh matters, as it can condition the parameters estimated by the model, e.g., the temporal or spatial autocorrelation. The mesh should be fine enough to capture spatiotemporal effects and coarse enough to avoid redundant calculations that unnecessarily increase computational time. It also depends on the spatial and temporal accuracy of the input data and on how meaningful a continuous approximation is for the phenomenon studied. For home burglaries in Catalonia, 250 m square cells were chosen for predominantly urban environments, such as Barcelona, and 500 m cells for other types of environment, usually more extensive areas such as the other police regions. This input grid-cell definition is the starting point of the mesh: meshes finer than the input grid cell, or much coarser than it, make little sense.

Another specific aspect of INLA comes from its Bayesian orientation. Prior information must be specified, in particular for the spatial and temporal autocorrelation, in order to build the continuous approximation of the model from the solution of a stochastic partial differential equation (SPDE) (Fuglstad et al., 2019; Simpson et al., 2017). These priors, which condition the results, depend on prior knowledge of the phenomenon studied and must be consistent with the mesh. A fine-scale mesh yields lower standard deviations and makes it possible to observe autocorrelation at shorter distances, which is not possible with a coarser mesh configuration.
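One common way to encode such priors in R-INLA is the penalised-complexity (PC) prior for the Matérn range and standard deviation of Fuglstad et al. (2019), set through the `prior.range` and `prior.sigma` arguments of `inla.spde2.pcmatern`; the text does not state which priors were used here, so this is only an illustration. The sketch below, assuming a two-dimensional domain, computes the rate parameters so that P(range < r0) = α_r and P(σ > σ0) = α_σ, and checks them by sampling. The values r0 = 500 m (the input cell size outside Barcelona), α_r = 0.05, σ0 = 1 and α_σ = 0.01 are hypothetical.

```python
import math
import random


def pc_matern_rates(range0, alpha_range, sigma0, alpha_sigma, dim=2):
    """Rates of the PC prior for (range, sigma) of a Matern field.

    Chosen so that P(range < range0) = alpha_range and P(sigma > sigma0) = alpha_sigma.
    Under this prior, range**(-dim/2) ~ Exp(lam_range) and sigma ~ Exp(lam_sigma).
    """
    lam_range = -math.log(alpha_range) * range0 ** (dim / 2)
    lam_sigma = -math.log(alpha_sigma) / sigma0
    return lam_range, lam_sigma


def sample_pc_matern(lam_range, lam_sigma, n, dim=2, seed=1):
    """Draw n (range, sigma) pairs from the PC prior."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        w = rng.expovariate(lam_range)  # w = range**(-dim/2)
        if w == 0.0:
            continue  # skip the (practically impossible) w = 0 draw to avoid division by zero
        samples.append((w ** (-2 / dim), rng.expovariate(lam_sigma)))
    return samples


lam_r, lam_s = pc_matern_rates(range0=500.0, alpha_range=0.05, sigma0=1.0, alpha_sigma=0.01)
draws = sample_pc_matern(lam_r, lam_s, n=20000)
p_short = sum(r < 500.0 for r, _ in draws) / len(draws)
p_large_sd = sum(s > 1.0 for _, s in draws) / len(draws)
print(round(p_short, 3), round(p_large_sd, 3))  # close to 0.05 and 0.01
```

Tying r0 to the input cell size is one way of keeping the prior "in line with the mesh": it makes ranges shorter than a single cell, which the data cannot resolve, a priori unlikely.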

By default, INLA outputs estimates of the posterior distributions of the main parameters of the approximate model, together with their means. These parameters include the temporal autocorrelation (ρ), the spatial autocorrelation expressed as the practical range (l), and the standard deviation (σ) of the logarithm of the number of burglaries at the mesh vertices.

To compare models, R-INLA provides statistics such as the marginal likelihood, Conditional Predictive Ordinates (CPO), the Probability Integral Transform (PIT), the Deviance Information Criterion (DIC) and the Watanabe-Akaike Information Criterion (WAIC) (Krainski et al., 2019). In this case, however, the models are evaluated with standard indicators: the correlation coefficient, the coefficient of determination (R2), and the total number of burglaries predicted. The predictions of the INLA approximated model are projected onto grid cells of different sizes: the initial projection is made onto the smallest grid cell (100, 250 or 500 m), and the numbers of burglaries in these cells are summed to obtain the projection onto the larger grid cells (from 250 m to 5 km).
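The evaluation step can be sketched as follows: fine-cell counts are summed into coarser cells, and observed and predicted counts are then compared through the Pearson correlation, R² and the totals. The text does not state which definition of R² is used; this sketch assumes R² = 1 − SS_res/SS_tot, which, unlike the squared correlation, penalizes systematic over- or under-prediction and can be negative. The helper names and counts are hypothetical, and `aggregate` only handles coarse sizes that are whole multiples of the fine size (for example 250 m into 1 km, but not 100 m into 250 m).

```python
import math
from collections import defaultdict


def aggregate(cell_counts, fine_m, coarse_m):
    """Sum {(col, row): value} counts from fine square cells into coarser ones.

    Assumes coarse_m is a whole multiple of fine_m and both grids share the same origin.
    """
    if coarse_m % fine_m != 0:
        raise ValueError("coarse cell size must be a multiple of the fine cell size")
    factor = coarse_m // fine_m
    out = defaultdict(float)
    for (col, row), value in cell_counts.items():
        out[(col // factor, row // factor)] += value
    return dict(out)


def evaluate(observed, predicted):
    """Pearson correlation, R^2 = 1 - SS_res / SS_tot, and totals over the union of cells."""
    keys = sorted(set(observed) | set(predicted))
    obs = [observed.get(k, 0.0) for k in keys]
    pred = [predicted.get(k, 0.0) for k in keys]
    n = len(keys)
    mo, mp = sum(obs) / n, sum(pred) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    cc = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    r2 = 1 - ss_res / sxx if sxx > 0 else float("nan")
    return {"cc": cc, "r2": r2, "total_observed": sum(obs), "total_predicted": sum(pred)}


fine = {(0, 0): 1, (1, 0): 2, (2, 0): 3, (3, 1): 4}  # hypothetical 500 m counts
print(aggregate(fine, 500, 1000))  # {(0, 0): 3.0, (1, 0): 7.0}

observed = {(0, 0): 1, (1, 0): 2, (2, 0): 3}
predicted = {(0, 0): 2, (1, 0): 3, (2, 0): 4}
# Perfectly correlated but biased upwards: cc = 1.0, r2 = -0.5, totals 6 vs 9.
print(evaluate(observed, predicted))
```

The example shows why reporting the correlation alone can be misleading: a prediction can track the spatial pattern perfectly while overestimating every cell, which R² and the predicted total both reveal.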

Results

General stability of the data

The monthly distribution of the number of burglaries in Catalonia between 2014 and 2019 shows a seasonal monthly pattern (Fig. 3, Table 1), with an increased number of burglaries in the summer months and between the end of autumn and the beginning of winter.

Fig. 3 Graph of the annual and monthly distribution of burglaries in Catalonia. Source: Author’s own elaboration from data provided by the police

Table 1 Annual and monthly distribution of burglaries in Catalonia

There is little variation between years, with a monthly CV of less than 0.1 in all of them. Regarding interannual variation, of the six years considered, 2014 recorded the fewest burglaries (25,846), while the annual total for the last five years stabilized at around 28,000 burglaries. The CVs of the annual totals for the whole period and for the last five years are 0.05 and 0.04, respectively.

This global profile is common to the three metropolitan police regions, although with slightly higher monthly CVs (Additional file 1: Fig. S1A). In the remaining police regions, the monthly distributions show different seasonal profiles and the summer peak disappears. The graphs for the police regions with the fewest burglaries appear almost random around a relatively stable monthly average. These patterns can also be seen in the weekly distributions of burglaries, although in this case the variability is higher than in the monthly ones (Additional file 1: Fig. S2A).

For smaller territories, the weekly distributions of burglaries in the basic police areas are also represented (Additional file 1: Fig. S3A). In general, with a few exceptions, seasonality is diluted or less obvious, while the series look increasingly random, with weekly CVs close to or higher than 0.5. Despite this greater variability, the data continue to oscillate around a relatively stable weekly average across weeks and years. These characteristics are more accentuated when weekly burglaries in 5 km, 1 km, and 250 m square cells are analysed (Fig. 4).

Fig. 4 Annual variability of the mean (2014 to 2019) in cells of different sizes (top graph 5 km, middle graph 1 km and bottom graph 250 m)

The general pattern, which has been observed previously (Boqué et al., 2020), shows that forced-entry burglaries in Catalonia follow Poisson distributions with relatively stable means over the years, despite occasional variations or trends.
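One simple way to check the Poisson assumption for the weekly counts of a cell is the index of dispersion (sample variance divided by the mean), which should be close to 1 under a Poisson model. The sketch below also converts the associated chi-square statistic into an approximate z-score with the Wilson–Hilferty transformation. This is an illustrative check, not necessarily the procedure used by Boqué et al. (2020), and the sample series is hypothetical.

```python
import math
from statistics import mean, variance


def dispersion_test(counts):
    """Index of dispersion for weekly counts and a Wilson-Hilferty z-score.

    Under a Poisson model, D = (n - 1) * s^2 / mean is approximately chi-square with
    n - 1 degrees of freedom; z well above 0 suggests overdispersion (e.g. clustering
    in waves). Requires at least two counts and a positive mean.
    """
    n = len(counts)
    m = mean(counts)
    s2 = variance(counts)  # sample variance, n - 1 denominator
    index = s2 / m
    k = n - 1
    stat = k * index
    z = ((stat / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return index, stat, z


index, stat, z = dispersion_test([0, 1, 2, 3, 4])  # hypothetical weekly counts
print(index, stat)  # 1.25 5.0
```

Applied cell by cell, an index persistently far above 1 would indicate that weekly counts are more clustered than a stable-mean Poisson process allows, which is where the wave dynamics discussed below come in.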

The summarized results of the annual discrepancies observed in the burglary profile are shown in Table 2. To avoid the statistical distortion introduced by cells with one or very few burglaries, the study was carried out on a sample of 100 5 km cells with an annual mean intensity above 35 burglaries (weekly mean above 0.67).

Table 2 Annual profile stability of burglaries in each studied cell

The stability of the distribution of burglaries inside the 5 km cells was also examined by comparing the annual percentage of burglaries in each of their 25 1 km sub-cells. In this case, 425 1 km sub-cells with an average annual intensity of more than 15 burglaries were chosen. For each of them, the annual percentage and the mean, standard deviation, and CV of this percentage were calculated. The SD of these percentages is lower than 0.1 in 94.4% of the sub-cells, and the CV is lower than 0.3 in 70% of them. The size of these discrepancies is depicted graphically in Fig. 5.

Fig. 5 Annual percentage variability of burglaries in the 1 km cells with respect to the total burglaries in the corresponding 5 km cell

To check the stability of the non-random wave pattern, the runs test was applied. The results for the weekly series of burglaries in cells of different sizes are shown in Table 3.

Table 3 Percentage of non-random wave patterns in different size cells

The randomness analyses for each of the three cell sizes considered were carried out independently, starting each time from all cells, rather than restricting the analysis of smaller cells to those contained in non-random larger cells.

Predictive models with R-INLA

Predictive models have been built for the five police regions in which most home burglaries in Catalonia are concentrated, i.e., the three metropolitan regions, Girona, and Camp de Tarragona. The results obtained in all of them are similar and lead to the same conclusions.

Note that the data analysed in these predictive schemes correspond to periods in which the Catalan police did not apply any prediction-based preventive strategy, so the results are not distorted for this reason.

Table 4 shows the minimum and maximum values of the posterior means estimated by INLA in the different tests carried out in non-overlapping weekly periods during 2019.

Table 4 Statistical summary of posteriori parameters: minimum and maximum values of the means estimated by INLA in various tests in non-overlapping periods for each of the above-mentioned police regions during 2019

To illustrate this in more detail, some examples are summarized below, corresponding to the North Metropolitan, Girona and Barcelona Metropolitan police regions.

In the case of the North Metropolitan Police Region (Figs. 6, 7 and 8), a 7-week predictive scheme was used: the previous six weeks were used to fit the model parameters, and the prediction was made for the seventh week. The input data had a spatial resolution of 500 m. Figure 7 shows the prediction obtained, with a high mean temporal autocorrelation (ρ = 0.76) and a mean spatial dependence range of 1.6 km, its distribution lying between 1 and 2.5 km. In another weekly period, a mean spatial dependence range of 3.7 km was obtained (Fig. 9).

Fig. 6 Mesh for North Metropolitan PR, including burglary locations (red points)

Fig. 7 Example 1.1 (NMPR): INLA predicted burglaries (z) for next week in 500 m cells with spatiotemporal dependence parameters and variance information (mean and probability distribution). Predictive scheme of the example: six previous weeks to adjust parameters for the following week’s prediction

Fig. 8 Example 1.2 (NMPR): INLA output: 5 km cells individual adjustment per week (1 to 6) and predicted values for 7th week (cc: correlation coefficient, R.2: coefficient of determination)

Fig. 9 Mesh for Girona PR, including burglary locations (red points)

In the case of the Girona Police Region (Fig. 10), the mean spatial range reaches 4.1 km, varying between 2 and 6 km.

Fig. 10 Example 2.1 (GPR): INLA predicted burglaries (z) for next week in 500 m cells with spatiotemporal dependence parameters and variance information (mean and probability distribution). Predictive scheme of the example: five previous weeks to adjust parameters for the following week’s prediction

Figures 8 and 11 show that, with this large (5 km) spatial configuration, the model fits the data well, with high values of the coefficient of determination R2. In contrast, in both cases the predictions for the last week of the series are less accurate (Fig. 12).

Fig. 11 Example 2.2 (GPR): INLA output: 5 km cells individual adjustment per week (1 to 5) and predicted values for 6th week (cc: correlation coefficient, R.2: coefficient of determination)

Fig. 12 Mesh for Barcelona MPR, including burglary locations (red points)

In the case of the Barcelona Metropolitan Police Region, the input cells are 250 m and a five-week predictive scheme has been followed (four previous weeks to fit the parameters and the fifth week for prediction). A smaller spatial autocorrelation is obtained (Fig. 13), with a mean of 800 m, varying between 400 and 1200 m. Fig. 14 shows that the 500 m cells are too small to capture the spatial dependence effect, with a low R2 both for the fit of the previous weeks and for the predicted week. In Fig. 15, with a cell size of 1 km, closer to the practical range of burglaries in Barcelona, the fit and the prediction are better.

Fig. 13 Example 3.1 (BMPR): INLA predicted burglaries (z) for next week in 250 m cells with spatiotemporal dependence parameters and variance information (mean and probability distribution). Predictive scheme of the example: four previous weeks to adjust parameters for the following week’s prediction

Fig. 14 Example 3.2 (BMPR): INLA output: 500 m cells individual adjustment per week (1 to 4) and predicted values for 5th week (cc: correlation coefficient, R2: coefficient of determination)

Fig. 15 Example 3.2 (BMPR): INLA output: 1 km cells individual adjustment per week (1 to 4) and predicted values for 5th week (cc: correlation coefficient, R2: coefficient of determination)

INLA summary results for North Metropolitan Police Region (NMPR)

INLA summary results for Girona Police Region (GPR)

INLA summary results for Barcelona Metropolitan Police Region (BMPR)

Discussion

It has been shown that burglaries in Catalonia are stably distributed in space and time. Over the long term there are seasonal and trend variations in the average intensity, but a stable pattern based on the Poisson distribution is observed. This stability goes hand in hand with a non-random dynamic of weekly waves of burglaries, detected in large areas and less visible in smaller ones, such as the 250 m cells.

These macro waves, observed in 5 km square cells containing heterogeneous environments, can have two different explanations. First, they could be generated by the sum of independent micro-waves located in different areas inside the cell, which may even be several kilometers apart. If this were the case, an approach based on the micro-scale near-repeat pattern should correctly model the distribution of burglaries in space and time at the micro level and, therefore, also at the macro level. Second, the whole cell could be considered a near-repeat "unit", so that any burglary in the cell after those that originated the macro-wave can be considered a repeat, without regard to micro-level patterns inside the cell.

The first explanation, which could be framed within classic near-repeat victimization theory, has an obvious weakness: it seems contradictory for local micro-waves to be independent and, at the same time, synchronized enough to generate stable, non-random macro waves.

To test this possible limitation of applying a classical near-repeat scenario to burglary prediction in Catalonia, we constructed a mathematical model based on the concept of a latent field, capable of representing situations in which the data present spatial and/or temporal dependencies. This framework accommodates point-process, areal, grid and continuous data (Rue et al., 2009; Simpson et al., 2015). Along these lines, the spatiotemporal log-Gaussian Cox model used in this study (Diggle et al., 2013) can be seen as a continuous and general approach to modelling the near-repeat pattern, one that meets the hypothesis of risk spreading in geographical and temporal proximity. The model imposes very few, general restrictions, all consistent with near-repeat theory: the Matérn covariance function represents the spatial correlation between two points, its main restriction being that correlation decreases with distance, and a first-order autoregressive process, AR(1), models temporal dependence, a simple option that has proven sufficient and suitable in this case.
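The separable structure described above can be sketched as a correlation function: Matérn in space multiplied by \(\rho^{|k|}\) for a lag of k weeks, which is the correlation of a stationary AR(1) process. This is a minimal illustration assuming smoothness ν = 1 (the value implied by R-INLA's default SPDE setting, α = 2, in two dimensions) and the R-INLA convention that the practical range equals \(\sqrt{8\nu}/\kappa\), where the correlation is roughly 0.14; the authors' exact settings are not restated here, and the function names are hypothetical. It relies on SciPy's modified Bessel function `scipy.special.kv`.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(h, practical_range, nu=1.0):
    """Matérn correlation at distance h (same units as practical_range)."""
    h = np.asarray(h, dtype=float)
    kappa = np.sqrt(8.0 * nu) / practical_range
    x = kappa * h
    safe = np.where(x > 0, x, 1.0)  # K_nu diverges at 0; the correlation there is 1
    corr = (2.0 ** (1.0 - nu) / gamma(nu)) * safe ** nu * kv(nu, safe)
    return np.where(x > 0, corr, 1.0)


def spacetime_corr(h, lag_weeks, practical_range, rho, nu=1.0):
    """Separable correlation: Matérn in space times rho**|lag| (stationary AR(1) in time)."""
    return matern_corr(h, practical_range, nu) * rho ** np.abs(lag_weeks)


# Hypothetical values in the spirit of the Barcelona results: range 800 m, rho close to 1.
for h in (0, 250, 800, 1600):
    print(h, float(spacetime_corr(h, 1, practical_range=800.0, rho=0.95)))
```

In R-INLA this structure typically arises from an SPDE Matérn field replicated over weeks with an AR(1) "group" model; the mesh-based SPDE is an approximation to the continuous Matérn shown here.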

Another advantage of the model is its continuous latent field, which largely avoids the debate about the optimal data aggregation configuration (Chainey, 2013; Hipp & Kim, 2017; Malleson et al., 2019). This debate concerns the optimal cell size or bandwidth for capturing repeat patterns, whether cells should be squares or circles, what effect their boundaries can have, and so on. It matters especially when the spatial segmentation is fixed in advance for the pattern analysis, which is not the case in this model. Input cells of 100, 250 or 500 m have provided sufficient detail to model burglaries across different residential environments. From these inputs, the continuous estimate of the latent field obtained with INLA can be projected onto any desired spatial aggregation for representation and analysis, as we have done with cells ranging from 250 m to 5 km.

In the weekly predictions, the risk contagion parameters of the log-Gaussian Cox model are updated according to the burglaries of the last few weeks. Both static and dynamic risk are inferred from the most recent events, and the radius of risk contagion is not a constant parameter throughout the territory: it is described by a probability distribution that is graded differently in each area according to these recent burglaries. As a result, the radius of risk contagion tends to be smaller in urban environments and larger in rural or more dispersed areas. This modulation is reflected in the distribution of the spatial dependence parameter.
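The weekly updating can be pictured as a rolling scheme in which each target week is predicted from the weeks immediately before it (four fitting weeks in the Barcelona examples). The sketch below only generates the week indices; the model fit itself is left out, and the function name is hypothetical.

```python
def rolling_windows(n_weeks, n_fit=4):
    """Yield (fit_weeks, target_week) pairs for a rolling weekly scheme.

    Each target week is predicted from the n_fit weeks immediately before it,
    so the model parameters would be re-estimated every week.
    """
    for target in range(n_fit, n_weeks):
        yield list(range(target - n_fit, target)), target


for fit_weeks, target in rolling_windows(6):
    print(fit_weeks, "->", target)
# [0, 1, 2, 3] -> 4
# [1, 2, 3, 4] -> 5
```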

However, despite the model’s flexibility and adaptability, the INLA approximation produces excessively monotonous results: the predicted events in each cell are essentially the same in all time intervals, with weekly values rising and falling in proportion to the total number of burglaries in the territory.

Hotcell maps of the fitted values always have the same or a very similar appearance, apart from a global multiplication by a scale parameter related to the weekly trend in total burglaries in the police region. In contrast, hotcell maps of the observed values show more variability and a different dynamic. The stability of the historical data makes the prediction acceptable, both overall and according to fit measures such as the correlation coefficient, R² and the total number of burglaries predicted, although it clearly does not represent the local variability or short-term patterns of the space-time distribution of burglaries well, and therefore does not allow us to anticipate where the highest crests of the waves will be.
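To make this "monotonous" behaviour concrete, consider a deliberately naive baseline (our illustration, not the INLA model): each cell's prediction is its historical share of burglaries multiplied by the regional total for the week. Its hotcell ranking never changes from week to week; only the scale does. Comparing model predictions against such a baseline is one way to check whether they add any local dynamic information. Names and data are hypothetical.

```python
import numpy as np

def static_share_forecast(history, next_total):
    """Predict next week's count in each cell as (historical share) x (regional total).

    history: array of shape (weeks, cells) with observed weekly counts.
    next_total: assumed or forecast regional total for the next week.
    """
    history = np.asarray(history, dtype=float)
    share = history.sum(axis=0) / history.sum()
    return share * next_total


history = np.array([[1, 0, 3], [2, 1, 4]])
quiet_week = static_share_forecast(history, 10)
busy_week = static_share_forecast(history, 20)
# Only the scale changes: the hotcell ranking is identical in both weeks.
print(np.argsort(-quiet_week), np.argsort(-busy_week))
```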

This limitation in no way contradicts the existence of concentric risk diffusion around each burglary, but the results do not exactly match the near-repeat hypothesis. INLA detected spatial dependence at distances between 1 and 6 km. We interpret this as meaning that any burglary increases the risk of new break-ins both near the original incident and some distance away from it. In other words, the more burglaries there are in a given week in the police region as a whole, the greater the probability that there will be more burglaries the following week, but these subsequent burglaries can occur anywhere in the region.

Temporal dependence

The temporal dependence parameter was positive and close to 1 in all the scenarios studied (\(\rho \cong 1\)). According to this INLA result, the best prediction for the following week is a projection of what has happened in the current week (in a configuration of large cells whose sides measure about two practical ranges).
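A simple way to see why \(\rho \cong 1\) turns the forecast into a projection of the current week: under an AR(1) model, the one-step prediction is the mean plus ρ times the current deviation from it. This sketch works directly on weekly counts for simplicity, whereas in the model the AR(1) acts on the latent log-intensity field; the names and data are hypothetical.

```python
import numpy as np

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a weekly series."""
    x = np.asarray(series, dtype=float)
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)


def ar1_forecast(series, rho):
    """One-step AR(1) forecast around the series mean: mean + rho * (last - mean)."""
    x = np.asarray(series, dtype=float)
    m = x.mean()
    return m + rho * (x[-1] - m)


weeks = [12, 15, 14, 18, 21, 19, 16]  # hypothetical weekly counts in a large cell
print(lag1_autocorr(weeks))
print(ar1_forecast(weeks, rho=0.99))  # with rho near 1, close to last week's 16
print(ar1_forecast(weeks, rho=0.0))   # with rho = 0, just the mean
```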

Locally, this can be interpreted as confirming the pattern of non-random waves, because periods of activation and non-activation, for different thresholds, last for several weeks; in other words, consecutive weeks are correlated. It may also indicate that the dynamics of activation and deactivation are smooth, so that each week always resembles the previous one, whether the wave is growing or shrinking in magnitude.
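Whether activation periods last several weeks can be checked by measuring runs of consecutive active weeks for a given threshold and comparing them with shuffled versions of the same series. In this sketch, a week counts as "active" when its count reaches the threshold; that definition, like the names and data, is our assumption.

```python
import random

def activation_runs(series, threshold):
    """Lengths of consecutive runs of 'active' weeks (count >= threshold)."""
    runs, current = [], 0
    for value in series:
        if value >= threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


weeks = [0, 3, 4, 0, 5, 5, 5, 1, 0, 4]  # hypothetical weekly counts
print(activation_runs(weeks, threshold=3))  # [2, 3, 1]

# Compare with a shuffled copy: non-random waves should give longer runs than chance.
rng = random.Random(0)
shuffled = weeks[:]
rng.shuffle(shuffled)
print(activation_runs(shuffled, threshold=3))
```

Repeating the shuffle many times gives a reference distribution of run lengths under the assumption of no temporal dependence.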

However, in the real data the waves are not smooth. Sudden increases in the oscillations can be observed, although they normally occur within an already active wave (otherwise, the non-random pattern of the spates would be broken). Because they minimize the global error and do not seem to capture the local, short-term dynamics well, these models tend towards the mean.

It could be argued that an overly coarse spatial resolution in the input data masks the real spatiotemporal effect at a smaller scale. To check this, tests have been carried out in Barcelona applying 100 m grid cells to model burglaries in the Eixample (Additional file 1: Figs. S1B, S2B and S3B) and Ciutat Vella districts. The result is that temporal autocorrelation is lost (\(\rho \cong 0\)), with mean values close to 0 that can even be negative (Additional file 1: Fig. S2B, bottom), which would indicate that burglars avoid returning to the 100 m cells struck the previous week. However, in these micro-scale configurations and in densely populated environments such as the Eixample district of Barcelona, spatial dependence is detected at distances of less than 250 m. Therefore, there is spatial but not temporal clustering, and micro-scale weekly prediction does not seem to be possible.

Consequences of low detection of the near-repeat pattern in geographical proximity

The consequences of these limitations of predictive models based on risk decay in geographic proximity are something of a double-edged sword, both for predictive policing programs (Perry et al., 2018) and for the police forces that use them. These programs tend to recommend going to static hotcells, which are usually where the average likelihood of burglary is highest, and they do so especially when there have been burglaries there during the previous week. While this is not a bad strategy, since it focuses police attention on critical spots, it can almost certainly be improved. The usual hotcells call for specific prevention with a long-term planning horizon, aimed at reducing the crime opportunities of the place (Center for Problem-Oriented Policing, n.d.; Risk Terrain Modeling, n.d.). This is related to the flag hypothesis, and such hotcells can be detected without predictive programs because the data are stable.

Therefore, for designing dynamic preventive strategies, the most advisable course seems to be improving the modelling of the dynamic component on its own, independently of the static component.

To do so, abandoning these micro-scale configurations is recommended. This is also justified by what happens in the 250 m cells: even though their series are non-random and burglaries appear clustered, weeks with burglaries mostly contain just one. According to the Poisson distribution, in cells with a mean of 0.2 burglaries per week it is very unusual to observe weeks with a high number of burglaries (more than two). Burglaries that are predictable in a coarser configuration end up isolated in 250 m cells, which individually are not well suited to making predictions. This restricts predictive possibilities in some high-density urban areas, where only a small share of all burglaries in Catalonia take place and where police prevention strategies can be very complicated.
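The Poisson argument can be checked directly. Assuming independent Poisson weekly counts with a mean of 0.2 per cell, and reading "more than two" as three or more, the arithmetic below shows that such weeks occur in roughly 0.1% of cases, and that about 90% of weeks with any burglary contain exactly one.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 0.2  # mean weekly burglaries in a 250 m cell (value from the text)
p0 = poisson_pmf(0, lam)
p1 = poisson_pmf(1, lam)
p_more_than_2 = 1 - sum(poisson_pmf(k, lam) for k in range(3))
single_given_any = p1 / (1 - p0)  # share of weeks with burglaries that have exactly one

print(f"P(X = 0)          = {p0:.4f}")                # 0.8187
print(f"P(X > 2)          = {p_more_than_2:.5f}")     # 0.00115
print(f"P(X = 1 | X >= 1) = {single_given_any:.3f}")  # 0.903
```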

To improve prediction and prevention, it is necessary to zoom out to larger scales and study spatiotemporal relations that do not rule out distant correlations. It is not sufficient to look for correlations between burglaries located in the same large cell (Boqué et al., 2020); the aim is to link distant events and clusters in general, looking for correlations between different cells, whether adjacent or not. The criminological justification for this change of focus in heterogeneous environments can be found in the classical theories that explain crime patterns (Bernasco, 2009; Clarke & Felson, 2008; Gottfredson & Hirschi, 1990), in which the overriding goal is to maximize benefits and minimize cost or risk. Therefore, if this behaviour is established as a pattern, it benefits burglars, who have adapted to the optimal way of acting in these environments. Accordingly, the spatiotemporal patterns used for making predictions should be expanded.
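As an exploratory starting point for correlations between cells, adjacent or not, one can correlate each cell's weekly series with every other cell's series one week later. This is a minimal sketch with hypothetical names; in practice one would first remove the shared regional trend (for instance, by working with each cell's share of the weekly total), since common trends alone produce positive correlations, and account for the many comparisons involved.

```python
import numpy as np

def lagged_cross_corr(counts, lag=1):
    """C[i, j] = corr(cell i in week t, cell j in week t + lag), for lag >= 1.

    counts: array of shape (weeks, cells). Cells whose series are constant
    over the window produce NaN entries.
    """
    x = np.asarray(counts, dtype=float)
    a, b = x[:-lag], x[lag:]
    with np.errstate(invalid="ignore", divide="ignore"):
        za = (a - a.mean(axis=0)) / a.std(axis=0)
        zb = (b - b.mean(axis=0)) / b.std(axis=0)
    return za.T @ zb / a.shape[0]


# Hypothetical check: cell 1 repeats cell 0 one week later, so C[0, 1] is about 1.
rng = np.random.default_rng(1)
s = rng.poisson(5, size=53).astype(float)
counts = np.column_stack([s[1:], s[:-1]])
C = lagged_cross_corr(counts)
print(np.round(C, 2))
```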

This study is not the first to detect limitations of prediction models based on the near-repeat phenomenon. Methods have been proposed to measure the micro-level predictive capacity of the near-repeat pattern before deciding whether to adopt it as the reference for making predictions (Groff & Taniguchi, 2019), and differences in the intensity of this pattern depending on the environment have been detected (Chainey & Figueiredo, 2016; Chainey et al., 2018).

In fact, the need to broaden the concept of near-repeat victimization has long been mooted (Farrell et al., 2012), and in the last few years a number of researchers have proposed new approaches. One of these is the idea of a near-repeat chain (Glasner et al., 2018; Haberman & Ratcliffe, 2012); another is the spatial Markov chain and spillover effects approach (Rey et al., 2012; Zhang & Song, 2014). More recently, the shift pattern has been proposed (Wang & Zhang, 2020; Wang et al., 2019). It is considered an extension of near-repeat victimization that, because of displacement effects, links geographically distant events that are close in time and occur in similar environments. The latter study (Wang et al., 2019) concludes that adding the shift pattern to near-repeats increases predictive capacity, and finds that 5 km is the average displacement distance (in that study, burglaries are located in a large Chinese city, within a circle of approximately 10 km radius). Displacement is explained (Wang & Zhang, 2020) as a response to crime prevention initiatives that motivate offenders to look for alternative places with suitable opportunities. Offenders modify their awareness space (Bernasco & Nieuwbeerta, 2005; Brantingham & Brantingham, 1995), adding new areas to their “mental map”, in which they can strike consecutively because of similar geographic characteristics.
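A simplified way to look for the shift pattern, in the spirit of the Knox test, is to count pairs of burglaries that are close in time but fall in a distant distance band (for example around 5 km) and compare that count with the counts obtained after randomly permuting the dates. This sketch is our illustration rather than the method of the studies cited; coordinates in metres and dates as day numbers are assumptions, the names are hypothetical, and the O(n²) distance matrix limits it to moderate numbers of events.

```python
import numpy as np

def count_pairs(xy, days, max_dt, d_min, d_max):
    """Count event pairs at most max_dt days apart with distance in [d_min, d_max)."""
    xy = np.asarray(xy, dtype=float)
    days = np.asarray(days, dtype=float)
    dist = np.hypot(xy[:, None, 0] - xy[None, :, 0], xy[:, None, 1] - xy[None, :, 1])
    dt = np.abs(days[:, None] - days[None, :])
    close = (dt <= max_dt) & (dist >= d_min) & (dist < d_max)
    return int(np.triu(close, k=1).sum())  # upper triangle: each pair counted once


def permutation_pvalue(xy, days, max_dt, d_min, d_max, n_perm=999, seed=0):
    """Observed pair count and share of date permutations with at least as many pairs."""
    rng = np.random.default_rng(seed)
    observed = count_pairs(xy, days, max_dt, d_min, d_max)
    hits = sum(
        count_pairs(xy, rng.permutation(days), max_dt, d_min, d_max) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


xy = [(0, 0), (300, 0), (5000, 0)]  # hypothetical burglary locations (m)
days = [0, 2, 3]                    # hypothetical dates (day numbers)
print(count_pairs(xy, days, max_dt=7, d_min=4000, d_max=6000))  # 2
```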

The new pattern that seems to have emerged is a combination of the near-repeat and shift patterns, which could be seen as contradicting one another. One way to unite them into a single pattern would be to suppose that burglars first choose a set of zones (\(Z_1, Z_2, \ldots, Z_n\)) to strike at the same time in a single wave. Then, for two or three weeks, a low-intensity near-repeat victimization pattern would be observed in all of them, along with the shift pattern.
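The hypothesized combined pattern can be made concrete with a toy generator: in some weeks a wave strikes several zones at once, and each struck zone then shows a low-intensity follow-up for two more weeks. Every parameter below (wave probability, zones per wave, means) is an arbitrary assumption for illustration, not an estimate from this study, and the function name is hypothetical. Summing the zones produces macro waves even when the zones are far apart.

```python
import numpy as np

def simulate_multizone_waves(n_zones=6, n_weeks=52, p_wave=0.15, zones_per_wave=3,
                             initial_mean=3.0, follow_up=(1.0, 0.5), background=0.2,
                             seed=0):
    """Toy generator of waves that strike several zones in the same week.

    Returns an (n_weeks, n_zones) array of Poisson weekly counts.
    """
    rng = np.random.default_rng(seed)
    mean = np.full((n_weeks, n_zones), background)
    for week in range(n_weeks):
        if rng.random() < p_wave:
            # the wave picks several distinct zones at once (they need not be adjacent)
            zones = rng.choice(n_zones, size=zones_per_wave, replace=False)
            mean[week, zones] += initial_mean
            # low-intensity near-repeat follow-up in the same zones
            for k, extra in enumerate(follow_up, start=1):
                if week + k < n_weeks:
                    mean[week + k, zones] += extra
    return rng.poisson(mean)


counts = simulate_multizone_waves()
print(counts.sum(axis=1))  # regional (macro) weekly totals
```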

In this scenario, the need to rethink some theoretical aspects of the near-repeat victimization pattern stands out. For example, it would be necessary to detect these sets of areas victimized at the same time and to test whether the burglars’ choices are stable over time, so that on another occasion they choose the same areas. This would mean that the risk in one area could be predicted from remote burglaries in another area. Other questions arise, such as whether these areas struck at the same time are similar (environment, type of housing, etc.), following the broad sense of near-repeat victimization theory. Likewise, the idea of prior burglaries triggering the wave should be questioned. Burglaries in Zone 1 could be used to predict the risk in Zone 2 if there is a known pattern that indicates this. Furthermore, which burglaries triggered the wave in Zone 2: the first ones committed there, or the earlier ones in Zone 1?
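Detecting sets of zones victimized at the same time, and checking whether those sets recur, could start from a co-activation measure. The sketch below computes, for each pair of zones, the Jaccard similarity of their sets of active weeks; the activity threshold and function name are assumptions. Stability over time could then be assessed by comparing the matrices obtained for two separate periods.

```python
import numpy as np

def coactivation_jaccard(counts, threshold=1):
    """J[i, j] = |A_i & A_j| / |A_i | A_j|, where A_i is the set of weeks
    in which zone i has at least `threshold` burglaries.

    counts: array of shape (weeks, zones). Pairs of zones that are never
    active give NaN.
    """
    active = (np.asarray(counts) >= threshold).astype(int)
    inter = active.T @ active       # weeks in which both zones are active
    n_active = active.sum(axis=0)   # active weeks per zone
    union = n_active[:, None] + n_active[None, :] - inter
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(union > 0, inter / union, np.nan)


weekly = np.array([[1, 0, 1],
                   [1, 0, 0],
                   [0, 1, 1]])  # hypothetical: 3 weeks x 3 zones
print(np.round(coactivation_jaccard(weekly), 2))
```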

This approach is consistent with the results obtained in this study, but it needs to be explored further and in more depth, especially when trying to model crime beyond urban environments and in a diverse and heterogeneous territory like Catalonia.

It is also consistent with studies on the “journey to crime” of criminal groups that commit burglaries (Bernasco & Nieuwbeerta, 2005; Townsley & Sidebottom, 2010; Vandeviver et al., 2015), especially in Europe (Wollinger et al., 2018). There is a consensus that specialized groups tend to make longer journeys to reach zones they consider favorable for burglaries. They can strike several dispersed areas on the same day. Their decision-making process includes reconnoitering the area, usually some days beforehand, choosing areas according to the environment, and applying their knowledge or previous experience of breaking and entering.

If this is confirmed, modelling this phenomenon will require clustering and classification techniques capable of unraveling the activity of criminal groups across large territories, applied before any other predictive techniques.
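As one example of such clustering techniques, the sketch below groups burglaries by single linkage in space and time: two events are linked when they are within a distance threshold and a time threshold of each other, and clusters are the resulting connected chains. The thresholds and names are hypothetical, and this is only one simple option; attributing clusters to specific groups would also require other information, such as modus operandi, and linking distant zones would need larger thresholds or additional features.

```python
def spacetime_clusters(events, max_dist, max_days):
    """Single-linkage space-time clusters via union-find.

    events: list of (x, y, day) with x, y in metres. Returns one cluster
    label per event; O(n^2) pair checks, fine for a sketch.
    """
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi, ti) in enumerate(events):
        for j in range(i + 1, len(events)):
            xj, yj, tj = events[j]
            close_in_time = abs(ti - tj) <= max_days
            close_in_space = (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2
            if close_in_time and close_in_space:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(events))]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}  # labels by first appearance
    return [relabel[r] for r in roots]


events = [(0, 0, 0), (100, 0, 2), (200, 0, 4), (5000, 0, 1)]  # hypothetical
print(spacetime_clusters(events, max_dist=150, max_days=3))    # [0, 0, 0, 1]
```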

Limitations

This study has some limitations. Crime data come from events reported to the police, so there is an unknown number of unreported crimes, although for burglaries this number is relatively low compared with other types of crime. These missing data may correspond to attempted or less serious burglaries, and their spatiotemporal distribution may be assumed to run parallel to that of reported ones, so they should not affect the main conclusions. For reported burglaries, another limitation concerns the time at which the burglary occurred. Usually, the day and the time window (morning, afternoon or night) are known, but in some cases, especially when the burglary occurs in an unoccupied second residence, the time window can span several weeks. That said, the percentage of these cases is low and, in this study, we decided to use one-week intervals, which are a good time window for prediction purposes.
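How events with long time windows were assigned to weeks is not described here. One possible handling, sketched under our own assumptions, assigns each event to the ISO week of the midpoint of its window and sets aside events whose window exceeds a week. The function name and the 7-day cutoff are hypothetical.

```python
from datetime import date

def week_of(start: date, end: date, max_span_days: int = 7):
    """Assign a burglary reported with a time window [start, end] to an ISO (year, week).

    Uses the midpoint of the window (date arithmetic drops any half day);
    returns None when the window is longer than max_span_days so that such
    events can be handled separately.
    """
    span = (end - start).days
    if span < 0:
        raise ValueError("end precedes start")
    if span > max_span_days:
        return None
    midpoint = start + (end - start) / 2
    iso = midpoint.isocalendar()
    return iso[0], iso[1]


print(week_of(date(2021, 1, 4), date(2021, 1, 5)))   # (2021, 1)
print(week_of(date(2021, 1, 1), date(2021, 1, 30)))  # None: window longer than a week
```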

Data in this study come exclusively from Catalonia, which may limit the generalization of the results and conclusions. However, the geography, town planning, type of society, lifestyle, etc., of Catalonia may be similar to those of other areas and countries, especially in Europe and not only in the Mediterranean area. In addition, organized and specialized crime in Europe is highly mobile, and groups operating in Catalonia are known to have a modus operandi similar to that of organized crime groups in other European countries. Likewise, for home burglaries, the validity of the theoretical approach has been verified around the world, including Catalonia, and the local adjustments of prediction models observed in this case study could be valid elsewhere, although this should be checked specifically.

A final note concerns the absence of covariates in the prediction models, which can be considered a limitation. Although it is not always easy to obtain territorial information layers suitable for the spatiotemporal modelling of burglaries, the decision not to include covariates was intentional. The aim of the study was to test the risk decay hypothesis in spatiotemporal proximity, not to construct a prediction model that could consider the type of environment, house or burglary to adapt or grade that risk decay.

Conclusions

The phenomenon of residential burglaries in Catalonia follows stable spatiotemporal patterns related to the static factor and dynamic waves of burglaries that enable us to predict them in large-scale configurations. When modelling from the micro-scale, limitations appear and the dynamics of the burglaries are not well captured.

To measure to what extent the classical principles of near-repeat victimization theory can be used for prediction purposes, we have considered a log-Gaussian Cox process to estimate and predict the number of weekly burglaries, modelling this phenomenon from the micro- to the macro-scale according to risk decay in time and space.

With this approach, the INLA methodology has proven to be a very suitable tool for analyzing this kind of spatiotemporal dependency, as its output of posterior parameter distributions greatly facilitates the interpretation and description of the phenomenon. Although in Catalonia the results obtained fail to produce good weekly predictions, it is highly recommended that any police force interested in studying near-repeat victimization use INLA as a first step, to observe the limits of the waves and the chances of repeat patterns in the same zone or in nearby zones.

Analyzing the results in detail and relying on other research, the need for a new framework to explain space and time dependencies of crime in heterogeneous environments is clear. This new framework would be important for understanding serial and specialized burglar behavior in these environments, quite common in Europe, and would generate new challenges for crime prediction models and police prevention strategies.

In this regard, a new spatiotemporal pattern that seems able to explain the results of this study has been proposed. It assumes that burglars strike several areas at once in a single wave.

The pattern of near-repeat victimization, observed and widely proven around the world (Johnson et al., 2007; Kikuchi et al., 2010; Wang & Liu, 2017), meets the requirements of being interpretable from the policing perspective and useful for prevention. Even though this paper has shown that its formulation is insufficient to model and predict the general behavior of burglars in heterogeneous environments, the solidity of the criminological studies underpinning this concept means that it remains a basic spatiotemporal pattern for predicting home burglaries and that, when broadened as proposed, it may eventually explain the generation of distant, correlated events and clusters.

References


Author information



Contributions

Conceptualization, PB, MS and LS; Data curation, PB; Formal analysis, PB and LS; Methodology, PB, MS and LS; Resources, PB; Software, PB, MS and LS; Validation, PB and LS; Writing – original draft, PB; Writing – review & editing, PB, MS and LS. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Laura Serra.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Figures, graphics and tables. INLA summary results for Barcelona Metropolitan Police Region (BMPR).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Boqué, P., Saez, M. & Serra, L. Need to go further: using INLA to discover limits and chances of burglaries’ spatiotemporal prediction in heterogeneous environments. Crime Sci 11, 7 (2022). https://doi.org/10.1186/s40163-022-00169-w


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40163-022-00169-w

Keywords