An information theory approach to hypothesis testing in criminological research
Crime Science volume 7, Article number: 2 (2018)
Abstract
Background
This research demonstrates how the Akaike information criterion (AIC) can be an alternative to null hypothesis significance testing in selecting best fitting models. It presents an example to illustrate how AIC can be used in this way.
Methods
Using data from Milwaukee, Wisconsin, we test models of place-based predictor variables on street robbery and commercial robbery. We build models to balance explanatory power and parsimony. Measures include the presence of different kinds of businesses, together with selected age groups and social disadvantage.
Results
Models including place-based measures of land use emerged as the best models among the set of tested models. These were superior to models that included measures of age and socioeconomic status. The best models for commercial and street robbery include three measures of ordinary businesses, liquor stores, and spatial lag.
Conclusions
Models based on information theory offer a useful alternative to significance testing when a strong theoretical framework guides the selection of model sets. Theoretically relevant ‘ordinary businesses’ have a greater influence on robbery than socioeconomic variables and most measures of discretionary businesses.
“A well-designed model is, after all, a judiciously chosen set of lies, or perhaps more accurately put, partial truths about reality, which have been chosen so as to permit us to reason more effectively about some issue than we otherwise could.”
(Baumol 1993, p. 55).
Background
Empirical criminological research relies heavily on testing null hypotheses of no difference. Rooted in statistical theory, decisions to reject a null hypothesis are keyed to finding statistically significant differences in relationships, or between outcome variables. Adopting conventions from previous research (Bushway et al. 2006; Sullivan and Mieczkowski 2008), we refer to this as null hypothesis significance testing (NHST). Despite its widespread use, researchers have identified a number of problems associated with the NHST approach as it is used in criminological research, and in other social sciences.
First is the reification of statistical significance as the most important outcome of quantitative research (Maltz 1994). Replicating analysis of papers published in the American Economic Review (McCloskey and Ziliak 1996; Ziliak and McCloskey 2004), Bushway and colleagues show that criminologists similarly more prominently report statistical significance than effect size. Second, scholars accept findings of no significance as evidence of no relationship (Weisburd et al. 2003), not always recognizing possible problems related to sample sizes, measurement error, or other features of research design. A related problem stems from modeling strategies when a large number of predictors are present. Third is the use of language such as “highly significant,” “borderline significant,” or “most significant,” that mistakenly equates significance and effect size. Fourth, researchers with very large numbers of data points may find that all independent variables meet virtually any significance level in their relationship with dependent variables (Maltz 2006).
Setting aside these problems, NHST mandates a simplified approach to empirical research that assumes binary increments to knowledge and often produces results of limited theoretical substance. Notably, the NHST requires that a researcher produce only one interesting research hypothesis and state the null. The research hypothesis, in essence, is never tested. Burnham and Anderson ask, “if there was little or no a priori belief in the null, what has been learned by its rejection?” (Burnham et al. 2011, p. 29).
This paper describes how an information-theoretic (IT) approach can guide selection of best-fitting statistical models. A key strength of this approach is its emphasis on testing a set of theory-based models against each other to identify the best among available models. What results is a more purposive comparison strategy in place of the somewhat arbitrary criterion of statistical significance, which plays virtually no role in AIC models.
We begin with a brief background discussion of an information theoretic approach that has become widely used in biology and psychology, but rarely guides criminological research. We then demonstrate the IT approach, using crime data from Milwaukee, Wisconsin, to examine how place-based measures of land use, together with measures of social disadvantage and age, are related to street robbery and commercial robbery. We use the Akaike information criterion (AIC) (Akaike 1973) to evaluate different models and aid in selecting the best models for two types of crime.
Akaike information criterion: a theoretical background
When building a theoretical model, information theorists posit that no model is a true model (Box 1976). This is largely because some percent of variance remains unexplained by all models. As such, any model built only approximates reality, or the unknown/unconstrained model.
However, Burnham and Anderson (2002) argue that it is possible to find the ‘best approximation’ to reality, or the distance between the unknown model and the model built to explain it, with a minimum loss of information. Kullback and Leibler (1951) developed a measure that became known as the Kullback–Leibler divergence, to represent this information loss associated with fitting a constraining model to the data.
Kullback and Leibler’s (1951) paper quantified the meaning of “information”, a concept related to Fisher’s thinking about “sufficient statistics” (Burnham and Anderson 2004). Two decades later, Hirotugu Akaike’s paper “Information Theory and an Extension of the Maximum Likelihood Principle” (Akaike 1973) proposed the Akaike information criterion (AIC), a method where Kullback–Leibler (K–L) divergence can be used to determine model suitability and selection.
The AIC approach computes goodness-of-fit (accuracy) and model variability (precision) to quantitatively rank different models in order to select the most parsimonious model (Saffron et al. 2006). Put somewhat differently, the AIC seeks to find “optimal complexity” (Garamszegi 2011, p. 2) by incorporating parsimony in model selection. Among other things, this means that AIC model statistics are not defined for “full” models containing all possible variables.
Rooted in work by William of Occam (ca. 1320), the parsimony principle states that the simplest competing description is the best (Anderson 2008; Saffron et al. 2006). Parsimony is used to determine how many parameters can be estimated and included to reach optimum model accuracy (Anderson 2008). Models with too few parameters are underfitted and subject to bias due to the lack of information in the model. This is the familiar omitted-variables bias. Models with too many parameters are overfitted and lack precision (McQuarrie and Tsai 1998; Burnham and Anderson 2002). Model selection, therefore, involves a trade-off between bias and variance, reflecting the statistical principle of parsimony (Burnham and Anderson 2004).
Models are often ranked based on conventional measures of goodness-of-fit, such as their R^{2} values. Models that have increasing numbers of parameters end up with greater R^{2} values, but at the expense of greater variability in how the model represents the data (Saffron et al. 2006). This is because every additional parameter captures a ‘stochastic signal’, and this decreased amount of information available for each calculation will lead to increased variation in parameter estimates (Rannala 2002; Lemmon and Moriarty 2004).
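The claim that R^{2} cannot fall as parameters are added can be checked numerically. The following is a minimal sketch on simulated data, not the Milwaukee data; the function name `r_squared`, the sample size, and the simulated variables are illustrative assumptions.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X (X includes the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    rss = float(residuals @ residuals)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

rng = np.random.default_rng(0)
n = 200  # arbitrary sample size for the simulation
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Start with a constant and one real predictor, then append pure-noise columns.
X = np.column_stack([np.ones(n), x])
r2_values = [r_squared(X, y)]
for _ in range(5):
    X = np.column_stack([X, rng.normal(size=n)])
    r2_values.append(r_squared(X, y))

for extra, r2 in enumerate(r2_values):
    print(f"noise predictors added: {extra}  R^2 = {r2:.4f}")
```

Because each model nests the previous one, R^{2} never decreases even though the added columns carry no signal, which is why an unpenalized fit measure tends to favor overfitted models.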
It may be argued that using the adjusted R^{2} value to report the fit of the model will achieve the same goal as AIC; the adjusted R^{2} also imposes a penalty for each additional parameter added to the model. However, Burnham and Anderson (2002) suggest that while adjusted R^{2} values are useful as a measure of the proportion of explained variation in a model, these values should not be used for model selection and can be misleading. Using an example of nine a priori models of avian species-accumulation curves from the Breeding Bird Survey (Flather 1996), they show that models with identical R^{2} values of 0.99 had large differences in AIC values that yielded more precise statements about the “best” model (Burnham and Anderson 2002, p. 95). These comments also apply to measures such as pseudo-R^{2}, and others that center on proportion of variance explained.
The AIC includes a penalty for overfitting the model, not allowing for an increase in the statistical bias when more parameters are fitted (Wilson et al. 2013). Another advantage of the AIC in model selection is that AIC is independent of the order in which models are computed (Anderson et al. 2001).
The Akaike information criterion is calculated as
$$AIC = n\ln \left( {\frac{RSS}{n}} \right) + 2k \quad (1)$$
where n is the number of data points (i.e. sample size), RSS is the residual sum of squares, and k is the total number of estimated model parameters, which include both the model parameters and the constant.
Computationally, the AIC is the sum of two so-called “penalty terms” (Burnham and Anderson 2002), one for bias and one for uncertainty. This means that the candidate model with the smallest AIC value is deemed the preferred model. The addition of parameters will always increase the likelihood score, and this “penalty term” ensures that the over-parameterized model is not selected (Ripplinger and Sullivan 2008). In other words, models that have more fitted parameters will have higher AIC values, all other things being equal, and models that will be favored will be those with fewer parameters (Symonds and Moussalli 2011).
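To make the role of the parameter penalty concrete, the sketch below computes the AIC in its standard least squares form, n·ln(RSS/n) + 2k, which matches the definitions of n, RSS, and k given above. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least squares fit: n * ln(RSS / n) + 2k."""
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    return n * math.log(rss / n) + 2 * k

# Hypothetical case: two models fit the data equally well (same RSS),
# but the second one estimates three more parameters.
aic_small = aic_least_squares(rss=500.0, n=100, k=3)
aic_large = aic_least_squares(rss=500.0, n=100, k=6)

print(f"AIC with k=3: {aic_small:.2f}")
print(f"AIC with k=6: {aic_large:.2f}")
print(f"penalty difference: {aic_large - aic_small:.2f}")
```

With fit held constant, each extra parameter raises the AIC by 2, so the larger model is preferred only if it lowers the first term by more than that amount.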
One of the strengths of building AIC models is the variety of methods that can be used to deal with model selection uncertainty (Garamszegi 2011). To compare models and determine relative support for each candidate model, several statistics can be calculated, which include the delta AIC (Δi), Akaike weights (wi) and evidence ratios.
Delta AIC (Δi) measures relative differences between a particular candidate model (AICi) and the Akaike ‘best-ranked’ model, the model with the smallest AIC value (minAIC). Delta AIC is used to evaluate relative support for other candidate models and is calculated as in Eq. 2:
$$\Delta_i = AIC_i - minAIC \quad (2)$$
Burnham and Anderson (2002) suggest that models with Δi ≤ 2 provide “substantial evidence” for the model, meaning these models are essentially as good as the best model. Models that have 4 ≤ Δi ≤ 7 indicate “considerably less support” for the model, and Δi > 10 show that the model is “very unlikely” and should be rejected.^{Footnote 1}
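The delta AIC calculation and the rules of thumb above can be sketched as follows. The AIC values are hypothetical, and the function names are illustrative. The text gives no label for values between 2 and 4 or between 7 and 10, so the sketch reports those ranges as unlabeled rather than guessing.

```python
def delta_aic(aic_values):
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(aic_values)
    return [value - best for value in aic_values]

def support_label(delta):
    """Rule-of-thumb labels quoted in the text from Burnham and Anderson (2002)."""
    if delta <= 2:
        return "substantial evidence"
    if 4 <= delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "very unlikely"
    return "intermediate (no label given in the text)"

# Hypothetical AIC values for four candidate models.
aic_values = [310.2, 311.5, 315.9, 324.0]
deltas = delta_aic(aic_values)

for aic, delta in zip(aic_values, deltas):
    print(f"AIC = {aic:.1f}  delta = {delta:.1f}  {support_label(delta)}")
```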
It is important to note that AIC is a relative measure of how good a given model is among a candidate set of models, given the data. As such, even if essentially meaningless parameters or those that are poorly linked to the outcome variable are included, the AIC analysis will still produce a ‘best’ model among the candidate models examined.
Burnham et al. (2011) point out that such pitfalls can be avoided by theory-based selection of parameters. Parsimony is a criterion for evaluating models with strong theoretical support, and is consistent with the goal of finding the best model among a set of possible models.
Akaike weights (wi) are an essential next step after the AIC values for each proposed model have been calculated. These weights represent the relative likelihood of each model, computed from its delta AIC (Δi) value, normalized over the whole set of candidate models (Burnham and Anderson 2002). The calculations of Akaike weights allow for an immediate ranking of all candidate models. Weights for the ith model in a set of R candidate models are calculated as shown in Eq. 3,
$$w_i = \frac{\exp \left( - \frac{1}{2}\Delta_i \right)}{\sum\nolimits_{r = 1}^{R} \exp \left( - \frac{1}{2}\Delta_r \right)} \quad (3)$$
where the denominator is simply the sum of the relative likelihoods for all candidate models. The weight wi is interpreted as the probability that the model is the Akaike ‘best-ranked’ among the set of candidate models (Burnham and Anderson 2002). For example, an Akaike weight of 0.80 for a given model indicates that this model has an 80% chance of being the Akaike ‘best-ranked’ model among the set of candidate models.
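As a minimal sketch of the weight calculation, the relative likelihood exp(−Δ/2) of each model is divided by the sum of relative likelihoods across the set. The delta values below are hypothetical, and `akaike_weights` is an illustrative name.

```python
import math

def akaike_weights(deltas):
    """Normalize relative likelihoods exp(-delta / 2) so they sum to 1."""
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]

# Hypothetical delta AIC values; the best-ranked model always has delta = 0.
deltas = [0.0, 1.3, 5.7, 13.8]
weights = akaike_weights(deltas)

for delta, weight in zip(deltas, weights):
    print(f"delta = {delta:5.1f}  weight = {weight:.3f}")
```

Because the weights are normalized within the candidate set, adding or removing a model changes every weight, which is one reason the set should be fixed on theoretical grounds before estimation.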
Lastly, Akaike weights can be used to determine the extent to which the ‘best’ model is better than other candidate models, expressed as evidence ratios:
$$\text{Evidence ratio} = \frac{w_j}{w_i} \quad (4)$$
Equation 4 compares the weight of model j (wj) against that of model i (wi), and any calculated value is interpreted such that model j is X times more likely than model i to be the ‘best’ in the set (Burnham and Anderson 2002). For example, an evidence ratio of 4 indicates that model j is four times more likely than model i to be the best. Evidence ratios allow researchers to express how much better the ‘best’ approximating model (or any given model in the set) is compared to the next best model or other models in the set (Symonds and Moussalli 2011). Evidence ratios can also be calculated relative to models other than the ‘best’ model, providing more evidence for the relative strength of all candidate models (Anderson 2008).
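An evidence ratio is simply the ratio of two Akaike weights; because the normalizing sum cancels, it can equivalently be computed from the two delta values alone. The sketch below shows both routes with hypothetical inputs; the function names are illustrative.

```python
import math

def evidence_ratio(w_j, w_i):
    """How many times more likely model j is than model i to be the best in the set."""
    if w_i <= 0:
        raise ValueError("w_i must be positive")
    return w_j / w_i

def evidence_ratio_from_deltas(delta_j, delta_i):
    """Same ratio computed directly from delta AIC values: exp((delta_i - delta_j) / 2)."""
    return math.exp(0.5 * (delta_i - delta_j))

# Hypothetical weights for a best-ranked model and one competitor.
w_best, w_second = 0.80, 0.20
ratio = evidence_ratio(w_best, w_second)
print(f"evidence ratio from weights: {ratio:.2f}")

# The same ratio recovered from the delta values that imply it.
ratio_from_deltas = evidence_ratio_from_deltas(0.0, 2 * math.log(4))
print(f"evidence ratio from deltas:  {ratio_from_deltas:.2f}")
```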
The calculation of Akaike weights across all models allows the researcher to evaluate the relative importance of many potential predictor variables within these models (Burnham and Anderson 2002). In fact, Lukacs et al. (2007) argue that model weights and their ability to account explicitly for model uncertainty are major reasons why IT approaches should be highly favored over NHST (Richards et al. 2011).
Other model selection approaches have been developed that aim at achieving the same goal as the Akaike information criterion: to identify the most parsimonious and theoretically relevant models. These approaches rely on different model selection strategies and use different criteria to evaluate model fit relative to its complexity. This diverse list includes Mallows’ Cp method (Mallows 1973), the Bayes information criterion (Schwarz 1978), Takeuchi’s information criterion (Takeuchi 1976), and the generalized information criterion (Rao and Wu 1989), among others. The Akaike information criterion, however, has been receiving considerable attention in recent years (Garamszegi 2011). Many fields, including astronomy, cosmology, nuclear and particle science, medical physics, ecology, statistics, psychology, engineering, and computer science, have turned to Akaike information theory to model relationships.
Using AIC in criminal justice research
Scholars in other disciplines have been quicker to recognize the limits and common misinterpretation of p values in significance testing. A statement by the American Statistical Association (Wasserstein and Lazar 2016) lists and summarizes many of these objections.
Analyzing very large numbers of cases with the NHST approach produces a type of parsimony problem that is common in criminological research. When very small effect sizes are reported as statistically significant, models can include coefficients that contribute little to the substantive understanding of research questions. For example, in their analysis of state sentences applied to convicted offenders in Florida, Feldmeyer et al. (2015) analyze 501,027 cases accumulated over a 7-year period. Each of 19 independent variables predicting a prison sentence is significant. Not surprisingly, this produces odds ratios that, in many cases, are not much different from 1.0.
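The sample-size effect described above can be illustrated with a deliberately simplified calculation. The sketch below uses a one-sample z test on a standardized mean difference, which is an assumption made for illustration and not the logistic models estimated by Feldmeyer et al. (2015); the effect size of 0.01 standard deviations is hypothetical.

```python
import math

def two_sided_p_value(effect_in_sd_units, n):
    """Two-sided p value for a one-sample z test of a standardized mean difference."""
    z = effect_in_sd_units * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

tiny_effect = 0.01  # hypothetical effect: one hundredth of a standard deviation

p_small = two_sided_p_value(tiny_effect, n=500)
p_large = two_sided_p_value(tiny_effect, n=501_027)

print(f"n = 500:     p = {p_small:.3f}")
print(f"n = 501,027: p = {p_large:.2e}")
```

The effect size is identical in both cases; only the number of cases changes, yet the same negligible effect moves from clearly non-significant to significant at any conventional level.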
Examples of sensitivity to the limits of NHST are emerging in criminological research. In their analysis of about 470,000 Pennsylvania defendants over seven years, Steffensmeier et al. (2016, p. 10) acknowledge that statistical significance is virtually assured: “As such, we place more emphasis on the direction and magnitude of the coefficients than on statistical significance….”. Similarly, Bernasco et al. (2017) avoid discussing statistical significance in their analysis of the combined effects of time and types of places on robberies in 24,594 census blocks in Chicago. Instead, they examine how odds ratios bracketed by standard errors depart from 1.0 for different 2-h intervals within types of places. Using AIC-based models offers a tool for systematically assessing the relative importance of models irrespective of sample size.
A related phenomenon is that with many cases, more variables can be added, something that is sometimes done with minimal justification. Controlling for measures of social well-being, socioeconomic status, social disadvantage, known risk factors, and the like is the norm. This is partly because previous research includes such concepts, often with minimal theoretical justification. In any event, producing multiple models with staged introduction of predictor and control variables implicitly treats all as equally important or unimportant until proven otherwise.
Sullivan and Mieczkowski (2008) summarize how a Bayesian approach can be an alternative to NHST in applied criminal justice research. They describe an example that sequences research sites in a series of intensive probation experiments. Three sites are time-ordered, so that data collected from later sites draw on results for data from earlier sites in a cumulative analysis that “learns” from prior evidence. This contrasts with an NHST approach that would pool data from all three sites.
The most directly relevant example in criminology is Petrossian’s (2015) analysis of illegal, unreported, and unregulated fishing in the waters of 53 countries. Her analysis of AIC values for models combining situational variables concluded that the best model included all predictor variables, rather than selected subsets. It is noteworthy that this analysis was published in Biological Conservation, a journal in which IT-based model selection is routine.
These examples notwithstanding, we are not aware of criminological research that uses an AIC approach to evaluate alternative theory-based models among a set of candidate models.^{Footnote 2} To illustrate how the AIC can be used, we examine how features of places are related to the distribution of two types of crime in Milwaukee, Wisconsin.
Crime and place
Criminological research has increasingly examined links between crime and place. The framework is theoretically rich, drawing on opportunity, crime pattern, and routine activity theories. That crime is concentrated at places, usually a small number of places, has been consistently demonstrated in a number of different cities. Weisburd (2015) offers a recent and comprehensive analysis showing this, to support his call for a new criminology of place. As noted by Weisburd (2015), and by Haberman and Ratcliffe (2015), empirical research has widely supported theoretical expectations about crime and place. Lee et al. (2017) present a systematic review showing the consistent links between crime and place.
An important example is research on how the presence of different kinds of businesses and facilities is related to crime patterns. Block and Block (1995) examined the presence of taverns and liquor stores near crime hotspots in Chicago. Bars and liquor stores are examples of crime attractors (Brantingham and Brantingham 1995), and have been the focus of much research on links between land use and crime (Groff 2014; Pridemore and Grubesic 2013; Gruenewald et al. 2006). Other types of undesirable but legal places, such as pawnshops, check cashing facilities and nightclubs, have also been examined in several cities. Such places are often referred to as “criminogenic,” (Bernasco and Block 2011; Groff and Lockwood 2014; Haberman and Ratcliffe 2015) unpopular, or troublesome (Wilcox and Eck 2011).
Less common is research on how the presence of ordinary businesses and facilities is related to crime at places. An example is the analysis of robbery in Chicago by Bernasco and Block (2011). They describe how concentrations of businesses based mostly on small cash transactions (fast-food restaurants, grocery stores, barber and beauty shops) are associated with crime hot spots, in addition to such places as vice markets, bars, and pawnshops. In their analysis of about 24,600 census blocks in Chicago, all facility types were significantly related to robbery. Haberman and Ratcliffe (2015) focus mostly on criminogenic places, but recognize how the kinds of facilities regularly used by large numbers of people can increase crime risks by serving as crime generators. Such places include corner stores, fast-food restaurants, ATMs, and mass transit stations.
Building on Haberman and Ratcliffe (2015), Bernasco et al. (2017) combine measures of place types with time of day and day of week to assess whether robbery increases for specific combinations of places and times in Chicago. They find little temporal variation except for the presence of high schools, and that robbery is higher in census blocks with a variety of small-scale retail places not normally viewed as criminogenic, such as restaurants, grocery stores, gas stations, and laundromats. Yu and Maxfield (2014, p. 314) similarly find that businesses, such as grocery stores, beauty parlors, and business services, are associated with higher rates of commercial and residential burglary. Their analysis concludes with discussion of different mechanisms at work in associations between the presence of ordinary businesses and burglary risk.
Our analysis builds on this research, and what Yu and Maxfield term “ordinary businesses.” Unlike bars, liquor stores, pawnshops and the like, ordinary businesses are places that most people visit on a regular basis. Through such routines, “…innocuous or ordinary places play a role in exposing targets to an offender population.” (Yu and Maxfield 2014, p. 314). Like Bernasco et al. (2017), and Haberman and Ratcliffe (2015), we examine robbery. Unlike previous research, we distinguish robbery of commercial places and street robbery, expecting that the presence of different kinds of facilities and businesses will be differently related to each type of robbery. The distinction is important, because commercial robberies target fixed places, while the victims of street robberies can be more mobile. It is possible that certain types of commercial places are more attractive targets of robbery. Similarly, street robbery victims may be targets because they visit certain types of establishments, or because they are on the street, visiting ordinary businesses.
Crime and place serves as a useful example to demonstrate the AIC approach to inference for two reasons. First is the strong theoretical and empirical framework that has been built up around crime and place. Bernasco et al. (2017) cite rational choice, routine activity, crime pattern theories and the geography of crime as complementary theoretical frameworks in understanding links between place and crime. Second, the role of ordinary businesses is inherently place-based, and the effects of ordinary businesses can be systematically compared to the effects of businesses described as criminogenic. Such specific theoretical expectations are best tested by an IT approach that evaluates different combinations of variables within a set of place types.
Because theories of place are comprehensive and have accumulated empirical support, the theoretical mechanisms at work are especially well-suited for comparing alternative models of robbery. Our analysis focuses on selecting the best among sets of models for commercial and street robbery. We then compare the AIC-ranked best models to models that include all variables under study.
Methods
Study site
Milwaukee is the 31st largest city in the United States, with a 2010 population of about 594,000. About 61% of the Milwaukee population is white, followed by 27% African American, and 3% Asian, with the remaining 9% comprising other races (American Community Survey 2013). As of 2013, Milwaukee ranked as the 7th most dangerous city in America, with a violent crime rate of 587.1 per 100,000 people (FBI 2013).
Units of analysis
Considering the units of analysis that accurately capture the social process under investigation is an important first step in spatial analysis (Johnson et al. 2009). After examining the distribution and number of businesses in Milwaukee, as well as the overall distribution of the crimes under investigation, we found the census tract level (N = 224) to be most appropriate.
We initially considered census blocks, but analyses revealed that about 90% of the census blocks remained unpopulated by the types of businesses examined here. Because drug stores, grocery stores, service stations and the like are common, we suspect their absence in the vast majority of census blocks reflects patterns of settlement in smaller Midwestern US cities like Milwaukee. Most research using census blocks has been conducted in larger, denser places like Chicago (Bernasco and Block 2011) or Philadelphia (Groff and Lockwood 2014; Haberman and Ratcliffe 2015). Moreover, past research has used census tracts as units of analysis to examine densities of businesses and violent crimes (e.g., Gruenewald et al. 2006; Livingston 2008; Zhu et al. 2004).
Data sources
Outcome variables
We obtained 2009 data on all crimes reported to police from the Milwaukee Police Department. Each record included the National Incident-Based Reporting System (NIBRS) code, address, time and date of the offense, type of location, and type of weapon(s) used. We selected commercial robberies and street robberies for further analyses. The Police Department provided the data in ArcGIS shapefile format; therefore, no further manipulations (such as geocoding addresses) were necessary to display the crime locations in ArcGIS.
Predictor variables
We used two sources to extract data for the predictor variables in this study. Data on demographic predictors aggregated at the census tract level, specifically, percent below poverty, percent renter occupied, percent age 18–21, and percent age 22–29, were obtained from the US Census Bureau (US Census 2000).
In this study, we distinguish between what we call discretionary places and ordinary places. Discretionary places are those that most people can choose whether to visit or not in the course of their normal activity. These include drinking places, liquor stores, and places of amusement/recreation. In contrast, ordinary places are businesses that most people patronize on a regular basis: drug stores, grocery stores, and service (petrol) stations.
Milwaukee data for the year 2009 were obtained from Infogroup, a company that provides data on businesses in the United States disaggregated by National Industry Classification codes. Infogroup’s database contains information about all registered businesses in the United States, and includes such details as business address, size, sales volume, number of employees, type of industry under which the business is registered and the business’s exact XY coordinate based on its registered address. The company contacts over 100,000 businesses daily (nationally) to verify the quality of the data in their database, as well as to ensure that the data are as current as possible (Infogroup 2015).
Data preparation
Demographic data in the form of ArcGIS shapefiles were directly downloaded from the US Census Bureau. The shapefiles were projected to match the projected coordinate system of the shapefiles containing data on crimes in Milwaukee. Crimes were then aggregated to 224 census tracts by spatially joining them to these tracts based on their location.
We used the XY coordinate information available in the Infogroup database to geocode the addresses of Milwaukee businesses used in the current study. Geocoding yielded a 100% match. We used the ‘clip’ tool in ArcGIS to select only the businesses that fell within the city boundary. We then aggregated these businesses to the 224 census tracts by spatially joining them to the census tracts. Table 1 presents descriptive statistics on businesses, crimes per census tract, age group, and social disadvantage.
Controlling for spatial autocorrelation
Spatial autocorrelation violates one of the important assumptions of traditional statistics: independence of observations. We found that spatial autocorrelation was present for each crime type.^{Footnote 3} As a result, we created spatial lags to represent the average values for neighboring areas (Anselin 2003), where neighbors can be defined either as the tracts bordering the target census tract or as those within a fixed distance from its centroid. In this research, we computed the spatial lag using the k-nearest neighbor method for the distance weights.
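A k-nearest neighbor spatial lag can be sketched as the average outcome among each unit's k closest centroids. This is an illustrative sketch only: the value of k used in the study is not stated here, the centroids and counts below are hypothetical, and the authors' computation was presumably carried out in GIS software rather than with this function.

```python
import numpy as np

def knn_spatial_lag(coords, values, k):
    """Average of `values` over each unit's k nearest neighbors (equal weights)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and n - 1")
    # Pairwise Euclidean distances between centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a unit is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return values[neighbors].mean(axis=1)

# Hypothetical centroids for four areal units and a crime count for each.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
counts = [10, 20, 30, 40]
lag = knn_spatial_lag(centroids, counts, k=2)
print(lag)
```

Unlike contiguity-based neighbors, the k-nearest neighbor definition gives every unit the same number of neighbors, so isolated or irregularly shaped tracts still receive a lag value.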
Multiple working hypotheses
This research considered two groups of theories: those based on traditional explanations of crime (the age-crime curve and social disadvantage), and those based on environmental criminology. The proposed hypotheses representing each model used in the analyses are listed in Table 2.
We use AIC models to test the empirical evidence for each of the hypotheses listed in Table 2 relative to the others in the set. In other words, each of these theoretically built models, which are considered a priori, is tested against the other competing models to evaluate its strength relative to its competitors.
Analyses and results
Steps to evaluating the models
Different modifications of AIC include AICc (or AIC corrected), QAIC (or quasi-AIC) and QAICc (see Symonds and Moussalli 2011, for more information). To evaluate the fit of our models, we first determined which of these modifications of the AIC was most appropriate. We concluded that AICc is most appropriate given the small sample size (Anderson 2008). We then took the following steps to estimate the models for each crime type using GLM (identity link function) and to compute their associated AICc scores. These steps are shown in respective columns in Table 3.

A. Calculated the small-sample corrected AIC (AICc) by (column 1)
$$AICc = AIC + \frac{2k(k + 1)}{n - k - 1}$$
where k is the total number of predictors in the model (including the constant and error), and n is the sample size.

B. Ranked the models from lowest to highest based on the AICc values (column 1).

C. Calculated the difference between the model with the lowest AICc and others in the set (i.e. Δi) by (column 2)
$$\Delta_i = AICc_i - AICc_{min}$$

D. Calculated relative likelihood to evaluate the plausibility of each model by (column 3)
$${\mathcal{L}}\left( g_i \mid y \right) \propto \exp \left( - \frac{1}{2}\Delta_i \right)$$

E. Calculated the Akaike weights for each model to normalize the relative likelihood values by (column 4)
$$w_i = \frac{\exp \left( - \frac{1}{2}\Delta_i \right)}{\sum\nolimits_{r = 1}^{R} \exp \left( - \frac{1}{2}\Delta_r \right)}$$
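Steps A through E can be sketched end to end as a small ranking routine. This is a hedged illustration, not the authors' code: the model names, AIC values, and parameter counts are hypothetical, the function names are illustrative, and n = 224 is used only because it is the number of census tracts in the study.

```python
import math

def aicc(aic, n, k):
    """Step A: small-sample corrected AIC."""
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1")
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def rank_models(models, n):
    """Steps B to E. `models` maps a model name to a tuple (AIC, k)."""
    # Step B: rank from lowest to highest AICc.
    scored = sorted((aicc(aic, n, k), name) for name, (aic, k) in models.items())
    best = scored[0][0]
    rows = []
    for score, name in scored:
        delta = score - best                     # Step C: delta AICc
        rel_lik = math.exp(-0.5 * delta)         # Step D: relative likelihood
        rows.append({"model": name, "AICc": score, "delta": delta, "rel_lik": rel_lik})
    total = sum(row["rel_lik"] for row in rows)
    for row in rows:
        row["weight"] = row["rel_lik"] / total   # Step E: Akaike weight
    return rows

# Hypothetical candidate set: name -> (AIC, number of estimated parameters).
candidates = {
    "intercept only": (1500.0, 2),
    "ordinary + spatial lag": (1420.0, 6),
    "discretionary + ordinary + spatial lag": (1415.0, 9),
}
table = rank_models(candidates, n=224)

for row in table:
    print(f"{row['model']:<40} AICc={row['AICc']:.2f} "
          f"delta={row['delta']:.2f} weight={row['weight']:.3f}")
```

Each dictionary in `table` corresponds to one row of a results table laid out like Table 3, with the columns in the same order as the steps.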
Results for commercial robbery (all variables)
Table 3 shows the results for commercial robberies. It lists all the models that test the theories in separate sets together with models that include different theoretical combinations (e.g. the model that combines the discretionary and ordinary variables). Models that include place types, age groups, and social disadvantage are also shown. Additionally, all theoretically built models are compared against the intercept-only model to determine if the predictor variables have merit when compared against the latter.
The columns in Table 3 correspond to the steps discussed above. Column 1 ranks each model using AICc. Here, based on the AICc value, the first model containing both the discretionary and ordinary variables, together with spatial lag, has been identified as the model most justified by data, also referred to as the AIC ‘best-ranked’ model. Akaike weights (column 4) show the weight of evidence that any given model is a plausible approximation given the data and the set of candidate models.
As indicated by the Delta AICc (column 2) and the relative likelihoods, the model that includes both discretionary and ordinary variables (plus spatial lag) was identified as having a 78% likelihood (column 4) of being the Akaike ‘best-ranked’ among the set. No other models were identified as strong competing candidates. The ‘best’ model is four times better than the second-ranked and 30 times better than the third-ranked model.
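The evidence ratios reported here can be related back to differences in AICc. The following derivation is a sketch based only on the definition of the Akaike weights; the resulting numbers are approximations implied by the rounded ratios in the text, not values taken from Table 3. Because the best-ranked model has Δ = 0, its evidence ratio against model j depends only on Δj:

$$\frac{w_{best}}{w_j} = \frac{\exp \left( 0 \right)}{\exp \left( - \frac{1}{2}\Delta_j \right)} = \exp \left( \frac{1}{2}\Delta_j \right) \quad \Rightarrow \quad \Delta_j = 2\ln \left( \frac{w_{best}}{w_j} \right)$$

Under this relation, an evidence ratio of 4 corresponds to Δj = 2 ln 4 ≈ 2.8, and an evidence ratio of 30 corresponds to Δj = 2 ln 30 ≈ 6.8. Both values lie above the Δi ≤ 2 threshold for substantial support, which is consistent with the statement that no other model is a strong competitor.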
Results for commercial robbery (unpacked models)
To further examine whether we can discard models with uninformative parameters, we created so-called unpacked models. Similar to Fondell et al. (2008), we retained only the AIC ‘best-ranked’ model from the previous step. We then considered a new set of models to determine if we could eliminate the least important parameters. Unpacked models consider individual business types within the grouped discretionary and ordinary categories. In this way, the set of all models considered includes different mixes of business types, based on the AIC ‘best-ranked’ model shown in Table 3. Results for the unpacked models are shown in Table 4.
The model that includes all ordinary business types, plus liquor stores and spatial lag, was identified as the AIC ‘best-ranked’ model. The Akaike weights indicate that this new model has an 83% likelihood of being the Akaike ‘best-ranked’ model among the set, with no other models showing as possible strong candidates. The AIC ‘best-ranked’ model is six times better than the second-best model. Apart from these models, the remaining models are highly unlikely.
Results for street robbery (all variables)
Table 5 shows the results for street robberies. Similar to commercial robbery, we consider the theoretically constructed models separately, as well as in combination. The intercept-only model is included in this set as well.
As with commercial robbery, the model that includes both discretionary and ordinary variables (plus spatial lag) is the Akaike ‘best-ranked’ model, with a 47% likelihood of being the best among the set. Two other models are candidates because their Delta AICc values are < 2. However, the ‘best’ model is almost two times better than the other competing models.
Because age was included in the second-best model, we added age to unpacked models in a separate analysis (not shown). Results indicated that the unpacked models that included age were not better than those with land-use variables only. In the interest of parsimony, we do not report the results of these unpacked model sets. Apart from these two competing models, the remaining models are highly unlikely.
Results for street robbery (unpacked models)
Similar to commercial robbery, we built unpacked models for street robbery. The results for the unpacked models are shown in Table 6.
As shown in Table 6, the model that includes all ordinary variables, plus liquor stores and spatial lag, has a 67% likelihood of being the Akaike ‘best-ranked’ model among the set, with no other models showing as possible strong candidates. The AIC ‘best-ranked’ model is four and seven times better than the second- and third-best models, respectively. The remaining models have little support, producing results identical to those for commercial robbery.
Negative binomial regression results
Anderson (2008, p. 68) suggests that after the ‘best-ranked’ model has been identified, it is useful to assess it using a goodness-of-fit test, such as residual analysis, R^{2} or similar approaches. However, he cautions that these tests should be treated as descriptive statistics and run as post hoc tests only after the ‘best-ranked’ models have been identified.
Table 7 presents the negative binomial regression coefficients for variables in the models identified as the Akaike ‘best-ranked’ models. The final unpacked models identified as ‘best’ for both robbery types included all ordinary variables, liquor stores, and spatial lag. As shown in the bottom panel of Table 7, adding all variables evaluated in the AIC analysis increases the pseudo-R^{2} by only about 0.01 over that for the ‘best’ models (top panel).
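To make the pseudo-R^{2} comparison concrete, the sketch below computes McFadden's pseudo-R^{2} from log-likelihoods. This is an assumption for illustration: the text does not say which pseudo-R^{2} variant was reported, and the log-likelihood values are hypothetical, chosen only to reproduce a gain of about 0.01.

```python
def mcfadden_pseudo_r2(log_lik_model, log_lik_null):
    """McFadden's pseudo-R^2: 1 - lnL(model) / lnL(intercept-only model)."""
    if log_lik_null == 0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - log_lik_model / log_lik_null

# Hypothetical log-likelihoods; not the values behind Table 7.
ll_null = -600.0   # intercept-only model
ll_best = -510.0   # parsimonious 'best-ranked' model
ll_full = -504.0   # model with all variables added

r2_best = mcfadden_pseudo_r2(ll_best, ll_null)
r2_full = mcfadden_pseudo_r2(ll_full, ll_null)
print(round(r2_best, 3), round(r2_full, 3), round(r2_full - r2_best, 3))
```

A small gain of this kind is a descriptive check only; it does not replace the AICc ranking, which already penalizes the additional parameters.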
Discussion and conclusions
Using the AIC to guide theory-based model selection, we find that the best models include mostly ordinary businesses, and one type of what we have termed “discretionary business.”
Summary and discussion
If we had followed the traditional NHST approach, our analysis would look more like what is presented in the second panel of Table 7. That approach tacitly assumes place-based and socioeconomic variables are equally important. A traditional NHST analysis would cite theories of social disorganization or disadvantage and place-based theories as possible explanations of mechanisms related to the risk of robbery. Then measures, such as those shown in Table 7, would be included in successive models that are evaluated by assumptions about whether coefficients are statistically different from zero (Berk et al. 2010).
The information theoretic approach shown in Tables 3, 4, 5 and 6 and summarized in the top panel of Table 7 offers two insights. First, the best models for each type of robbery include ordinary and discretionary businesses and spatial lag (Tables 3, 5). Adding measures for two younger age groups and two measures of social disadvantage increases explanatory power, but not by enough to justify complicating the models when parsimony is considered. This claim is supported by the basic AIC modeling approach, in which easily computed changes in AIC from adding successive terms to a model balance added explanatory power against the number of terms in the model. In this sense, the AIC and related statistics express “criminological significance” rather than statistical significance.
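The trade-off between explanatory power and the number of terms can be illustrated with the standard definitions AIC = -2 ln L + 2K and AICc = AIC + 2K(K + 1)/(n - K - 1), where K is the number of estimated parameters and n is the number of cases. The sketch below is illustrative only: the log-likelihoods, parameter counts, and sample size are hypothetical, and how K is counted (for example, whether it includes the dispersion parameter) is an assumption left to the analyst.

```python
def aic(log_lik, k):
    """Akaike information criterion: -2 lnL + 2K."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """Small-sample corrected AIC: AIC + 2K(K + 1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the AICc correction")
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical values: four extra terms raise the log-likelihood only slightly.
n_cases = 400
small_model = aicc(log_lik=-500.0, k=5, n=n_cases)
large_model = aicc(log_lik=-498.5, k=9, n=n_cases)

# The larger model fits slightly better but ranks worse once parameters are penalized.
print(round(small_model, 2), round(large_model, 2), small_model < large_model)
```

In this hypothetical case the added terms improve fit, but not by enough to offset the penalty, which is the sense in which the criterion favors the more parsimonious model.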
Second, after unpacking models that included all types of ordinary and discretionary businesses, ordinary businesses plus liquor stores and spatial lag are the best models among those examined in Tables 4 and 6. Apart from liquor stores, the presence of discretionary businesses has no impact on commercial or personal robbery. Setting aside the models containing all “significant” variables allows us to focus more attention on the implied mechanisms at work in more parsimonious models.
Our expectations about possible differences in the effects of places by type of robbery were not supported. Both commercial and personal robberies are found in areas with a variety of businesses, most of them what we have called “ordinary”. Drug stores, grocery stores, service stations, and liquor stores could be the targets of commercial robbery. For street robbery, it is likely that people visit these common places on a regular basis, thus exposing themselves to risk.
A substantive interpretation of the consistent impact of spatial lag is that robberies happen near other areas with robberies, a type of risk heterogeneity. This is consistent with recent work by Bernasco et al. (2017), suggesting that robbers work in fairly stable places where targets are to be found. These researchers also point to the role of cash economies produced by businesses and facilities in attracting targets. Recalling place-based mechanisms, ordinary businesses both become and attract targets for robbery, and robberies tend to cluster near other places with robberies.
Concluding remarks
This paper has added to research on crime and place using an approach to modeling that we argue is preferable to traditional approaches in certain applications. Theories of place offer guidance in how land use may be related to the number of robberies. Following prior research on how robbery varies with the presence of different types of businesses, we successively modeled bundles of ordinary and discretionary businesses. Theory offered a clear guide to producing a set of models, and our analysis identified the best models among that set, considering both explanatory power and parsimony.
The complementary concepts of crime generators and crime attractors help explain the importance of ordinary business. Though they mention potential victims, much of the discussion of generators by Brantingham and Brantingham (1995) refers to offenders: “Crime generators are particular areas to which large numbers of people are attracted for reasons unrelated to any particular level of criminal motivation they might have or to any particular crime they might end up committing” (1995, p. 7). Crime attractors create opportunities that are widely recognized by potential offenders (Brantingham and Brantingham 1995, p. 8). Cited examples are illegal markets, bars, and large shopping areas. While generators and attractors influence the behavior of potential offenders, they also affect the larger number of potential victims. As Yu and Maxfield (2014) note, not everyone chooses to visit a bar, pawn shop, or nightclub. But virtually all ambulatory people routinely visit and patronize certain retail establishments. Ordinary retail businesses are scattered around mostly residential areas, not entertainment districts. Everyone goes to grocery stores, and, in the Midwestern United States, most people end up near service stations. Service stations often include or are near small grocery stores or convenience stores. These are centers of behavioral routines for virtually everyone, not locations specializing in vice or drinking that appeal to more limited numbers of people.
Apart from these substantive findings, our approach departed from traditional NHST approaches in its consideration of sets of theory-based and socioeconomic variables. Theories of social disorganization and disadvantage permeate criminological research. One result is that researchers routinely include socioeconomic variables in multivariate analysis, regardless of the theoretical relevance or social processes under investigation. Socioeconomic variables, often inaccurately labeled “demographics,” may be treated as controls, covariates, or predictor variables of interest. Analytic strategies often successively test models with and without different clusters of variables to see which combinations hold together.
While some theoretical rationale supports such strategies, the result is unduly complicated models that are often difficult to interpret and that do not address substantive significance. The consequences are most evident in analyses of large numbers of cases, which is also where the potential benefits of applying information theory are greatest. Examining many cases can produce a kind of anti-parsimony by producing models where everything is statistically significant, yet little is said about substantive significance.
We recognize that our AIC approach is a substantial departure from methods long used in empirical criminology. Our approach also comes with certain limits and disadvantages. First, the AIC can be difficult to interpret, partly because it is not well known. AIC does not assume that the true model is among those tested. All candidate models are treated as approximations, and none can be regarded as the ‘true’ model. A corollary of this is that AIC values are only indirectly related to effect size estimates for individual measures.
Second, much thought must be devoted to specifying models a priori, primarily relying on theory. In other words, the results of the analyses are only as good as the candidate set of models specified before the analyses are conducted (Mazerolle 2006). If all candidate models are poor fits, AIC will still produce a ‘best-ranked’ model. Similarly, the AIC analyses do not show whether a better model exists unless that model is included in the set. Third, comparing AIC results across different studies can be difficult.
Finally, NHST can be more appropriate when it is difficult to specify a set of theory-based candidate models (Steidl 2006). In such cases, NHST guides a statistical hypothesis rather than a substantive criminological hypothesis (Sleep et al. 2007). NHST is also preferable to AIC in the case of randomized experiments (Mazerolle 2006), where the null hypothesis of no difference is a straightforward baseline statement for framing analysis.
Future criminological research can use AIC in two ways. First, this approach can be used to build new models, not only to identify the best among sets of models but also to objectively assess competing models. Over 75 top-ranked journals in many fields, including astronomy, cosmology, nuclear and particle science, medical physics, ecology, statistics and psychology, have published papers that used the AIC approach to model relationships. Criminologists have recently begun a more limited use of AIC and other information-theory criteria, but rarely to evaluate different models (Petrossian 2015 and Groff 2014 are exceptions). The calculations of AIC are relatively easy. Many statistical software packages already produce AIC values within the goodness-of-fit tables. The subsequent calculations of delta AIC values (Δi) to assess the relative importance of all candidate models, as well as the calculations of Akaike weights (wi) to evaluate the strength of evidence for these models, can be easily made in Microsoft Excel.
Second, this approach can be used to re-evaluate the models produced in previously published articles in order to weigh the importance of variables found to be statistically significant in these models. Criminological research offers examples where complex models built with tens or hundreds of thousands of cases are used to test the significance of large numbers of variables. Results may show virtually every variable to be statistically significant. But what is the substantive importance of these variables? Just as Ziliak and McCloskey (2004) use the phrase “economic significance,” and Sleep et al. (2007) propose the use of “biological hypothesis testing” to replace “statistical hypothesis testing”, we might ask about the “criminological significance” of low-performing predictor variables. AIC analysis of published research can re-evaluate such models with the goal of producing parsimonious explanations that are more theoretically sound.
Returning to the quote that opens this paper, “A well-designed model is, after all, a judiciously chosen set of lies, or… partial truths….” That is certainly true of the models we summarize in the top panel of Table 7. But the partial truths are consistent with theoretical expectations about people, places, and crime, and the models are parsimonious. Recalling a similar quote from Box (1976), “All models are wrong, but some are useful,” we argue that empirically considering parsimony and relative theoretical support is more likely to produce useful models than is empirically establishing statistical significance. Similarly, it is easier to evaluate a judiciously chosen, parsimonious set of lies than to sort through what untruths might underlie NHST-based models built with large numbers of cases and variables.
Notes
In some instances, several models may compete for the ‘best’ model rank, as their Δj or evidence ratios are < 2. In this case, model-averaged estimates can be calculated, along with the precision of these estimates. For more information, see Burnham and Anderson (2002).
Karlis and Meligkotsidou (2007) include AIC and BIC in their comparison of different distributions of crime counts, but do not link their analysis to criminological theory.
For street robberies—Moran’s I = 0.38, z = 18.40, p < 0.001; commercial robberies—Moran’s I = 0.17, z = 8.29, p < 0.001.
References
Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
American Community Survey. (2013). State and county QuickFacts: Milwaukee County, City of Milwaukee. Washington, D.C.: US Census Bureau.
Anderson, D. R. (2008). Model based inference in the life sciences: A primer on evidence. New York: Springer.
Anderson, D. R., Burnham, K. P., & White, G. C. (2001). Kullback-Leibler information in resolving natural resource conflicts when definitive data exist. Wildlife Society Bulletin, 29, 1260–1270.
Anselin, L. (2003). GeoDa 0.9 User’s Guide. Urbana Champaign, IL: Spatial Analysis Laboratory, Department of Geography, University of Illinois, Center for Spatially Integrated Social Science.
Baumol, W. (1993). On my attitudes: Sociopolitical and methodological. In M. Szenberg (Ed.), Eminent economists: Their life philosophies. Cambridge: Cambridge University Press.
Berk, R., Brown, L., & Zhao, L. (2010). Statistical inference after model selection. Journal of Quantitative Criminology, 26, 217–236.
Bernasco, W., & Block, R. (2011). Robberies in Chicago: A block-level analysis of the influence of crime generators, crime attractors, and offender anchor points. Journal of Research in Crime and Delinquency, 48(1), 33–57.
Bernasco, W., Ruiter, S., & Block, R. (2017). Do street robbery locations vary over time of day or day of week? A test in Chicago. Journal of Research in Crime and Delinquency, 54(1), 244–275.
Block, R. L., & Block, C. R. (1995). Space, place, and crime: Hot spot areas and hot spot places of liquor-related crime. In J. E. Eck & D. Weisburd (Eds.), Crime and place. Crime prevention studies 4 (pp. 145–183). Monsey: Criminal Justice Press.
Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
Brantingham, P. L., & Brantingham, P. L. (1995). Crime generators and crime attractors. European Journal on Criminal Policy and Research, 3(3), 5–26.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York: Springer.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304.
Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65, 23–35.
Bushway, S. D., Sweeten, G., & Wilson, D. B. (2006). Size matters: Standard errors in the application of null hypothesis significance testing in criminology and criminal justice. Journal of Experimental Criminology, 2, 1–22.
Feldmeyer, B., Warren, P. Y., Siennick, S. E., & Neptune, M. (2015). Racial, ethnic, and immigrant threat: Is there a new criminal threat in state sentencing? Journal of Research in Crime and Delinquency, 52(1), 62–92.
Flather, C. (1996). Fitting species-accumulation functions and assessing regional land use impacts on avian diversity. Journal of Biogeography, 23(2), 155–168.
Fondell, T. F., Miller, D. A., Grand, J. B., & Anthony, R. M. (2008). Survival of dusky Canada goose goslings in relation to weather and annual nest success. Journal of Wildlife Management, 72(7), 1614–1621.
Garamszegi, L. Z. (2011). Information-theoretic approaches to statistical analysis in behavioral ecology: An introduction. Behavioral Ecology and Sociobiology, 65, 1–11.
Groff, E. (2014). Quantifying the exposure of street segments to drinking places nearby. Journal of Quantitative Criminology, 30, 527–548.
Groff, E., & Lockwood, B. (2014). Criminogenic facilities and crime across street segments in Philadelphia: Uncovering evidence about the spatial extent of facility influence. Journal of Research in Crime and Delinquency, 51, 277–314.
Gruenewald, P. J., et al. (2006). Ecological models of alcohol outlets and violent assaults: Crime potentials and geospatial analysis. Addiction, 101, 666–677.
Haberman, C. P., & Ratcliffe, J. H. (2015). Testing for temporally differentiated relationships among potentially criminogenic places and census block street robbery counts. Criminology, 53(3), 457–483.
Infogroup (2015). Our Company. Retrieved Apr 2, 2015, from http://www.infogroup.com/aboutinfogroup.
Johnson, S., Bowers, K., et al. (2009). Predictive mapping of crime by ProMap: Accuracy, units of analysis, and the environmental backcloth. In D. Weisburd, W. Bernasco, & G. Bruinsma (Eds.), Putting crime in its place: Units of analysis in geographic criminology (pp. 171–198). London: Springer.
Karlis, D., & Meligkotsidou, L. (2007). Finite mixtures of multivariate Poisson distributions with application. Journal of Statistical Planning and Inference, 137, 1942–1960.
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.
Lee, Y., Eck, J. E., Soohyun, O., & Martinez, N. N. (2017). How concentrated is crime at places? A systematic review from 1970 to 2015. Crime Science, 6, 6.
Lemmon, A. R. & Moriarty, E. C. (2004). The importance of proper model assumption in Bayesian phylogenetics. Systematic Biology, 53, 265–277.
Livingston, M. (2008). Alcohol outlet density and assault: a spatial analysis. Addiction, 103, 619–628.
Lukacz, P. M., Thomson, W. L., Kendall, W. L., Gould, W. R., Doherty, P. F., Burnham, K. P., & Anderson, D. R. (2007). Concerns regarding a call for pluralism of information theory and hypothesis testing. Journal of Applied Ecology, 44, 456–460.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661–675.
Maltz, M. D. (1994). Deviating from the mean: the declining significance of significance. Journal of Research in Crime and Delinquency, 31(4), 434–463.
Maltz, M. D. (2006). Some P-baked thoughts (P > 0.5) on experiments and statistical significance. Journal of Experimental Criminology, 2(2), 211–226.
Mazerolle, M. J. (2006). Improving data analysis in herpetology: Using Akaike’s information criterion (AIC) to assess the strength of biological hypotheses. Amphibia-Reptilia, 27(2), 169–180.
McCloskey, D. N., & Ziliak, S. T. (1996). The standard error of regressions. Journal of Economic Literature, 34, 97–114.
McQuarrie, A. D. R. & Tsai, C. L. (1998). Regression and time series model selection. New Jersey: World Scientific.
Petrossian, G. A. (2015). Preventing illegal, unreported and unregulated (IUU) fishing: A situational approach. Biological Conservation, 189, 39–48.
Pridemore, W. A., & Grubesic, T. H. (2013). Alcohol outlets and community levels of interpersonal violence: Spatial density, outlet type, and seriousness of assault. Journal of Research in Crime and Delinquency, 50, 132–159.
Rannala, B. (2002). Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, 51, 754–760.
Richards, S. A., Whittingham, M. J., & Stephens, P. A. (2011). Model selection and model averaging in behavioral ecology: The utility of the IT-AIC framework. Behavioral Ecology and Sociobiology, 65, 77–89.
Rao, C. R., & Wu, Y. (1989). A strongly consistent procedure for model selection in a regression problem. Biometrika, 76, 369–374.
Ripplinger, J., & Sullivan, J. (2008). Does choice in model selection affect maximum likelihood analysis? Systematic Biology, 57, 76–85.
Saffron, C. M., Park, J., Dale, B. E. & Voice, T. C. (2006). Kinetics of contaminant desorption from soil: comparison of model formulations using the Akaike information criterion. Environmental Science & Technology, 40(24), 7662–7667.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Sleep, D. J. H., Drever, M. C., & Nudds, T. D. (2007). Statistical versus biological testing: Response to Steidl. Journal of Wildlife Management, 71(1), 2120–2121.
Steffensmeier, D., Painter-Davis, N., & Ulmer, J. (2016). Intersectionality of race, ethnicity, gender, and age on criminal punishment. Sociological Perspectives. https://doi.org/10.1177/0731121416679371.
Steidl, R. J. (2006). Model selection, hypothesis testing, and risks of condemning analytical tools. Journal of Wildlife Management, 70(6), 1497–1498.
Sullivan, C. J., & Mieczkowski, T. (2008). Bayesian analysis and the accumulation of evidence in crime and justice intervention studies. Journal of Experimental Criminology, 4, 381–402.
Symonds, M. R. E., & Moussalli, A. (2011). A brief guide to model selection, multimodel inference, and model averaging in behavioral ecology using Akaike’s information criterion. Behavioral Ecology and Sociobiology, 65, 13–21.
Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting. SuriKagaku (Mathematical Sciences), 153, 12–18. (in Japanese).
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70, 129–133.
Weisburd, D. (2015). The law of crime concentration and the criminology of place. Criminology, 53(2), 133–157.
Weisburd, D., Lum, C. M., & Yang, S. M. (2003). When can we conclude that treatments or programs ‘don’t work?’. The Annals of the American Academy of Political and Social Science, 587, 31–48.
Wilcox, P., & Eck, J. E. (2011). Criminology of the unpopular: Implications for policy aimed at payday lending facilities. Criminology & Public Policy, 10(2), 473–482.
Wilson, D. K., Valente, D., Nykaza, E. T., & Pettit, C. L. (2013). Information-criterion based selection of models for community noise annoyance. The Journal of the Acoustical Society of America, 133(3), EL195–EL201.
Yu, S. V., & Maxfield, M. G. (2014). Ordinary business: Impacts on commercial and residential burglary. British Journal of Criminology, 54, 298–320.
Zhu, L., Gorman, D. M., & Horel, S. (2004). Alcohol outlet density and violence: a geospatial analysis. Alcohol and Alcoholism, 39(4), 369–375.
Ziliak, S. T., & McCloskey, D. N. (2004). Size matters: The standard error of regressions in the American Economic Review. The Journal of Socio-Economics, 33, 527–546.
Authors’ contributions
GP drafted “Akaike information criterion: a theoretical background”, “Methods”, “Analyses and results” sections and conducted analyses. MM drafted “Background”, “Using AIC in criminal justice research”, “Crime and place”, “Discussion and conclusions” sections. Authors jointly revised the manuscript for publication. Both authors read and approved the final manuscript.
Acknowledgements
Authors would like to thank Drs. Kenneth Burnham and David Anderson for their invaluable feedback on the earlier draft of this paper. Their comments were both instructive and constructive.
Competing interests
The authors declare that they have no competing interests.
Data availability
Upon request to authors.
Ethics approval and consent to participate
Not applicable.
Funding
Authors used personal funds to purchase data from Infogroup. The John Jay College Office for the Advancement of Research reimbursed authors for publication fees.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Petrossian, G.A., Maxfield, M. An information theory approach to hypothesis testing in criminological research. Crime Sci 7, 2 (2018). https://doi.org/10.1186/s4016301800775
DOI: https://doi.org/10.1186/s4016301800775