Quantifying the Geographical (Un)reliability of Police Data [1]
[1] Acknowledgements: The author wishes to thank the participants at the Young Nordic Police Research Seminar in Oslo in May 2017 for interesting discussions and feedback.
- Pages: 157-171
- DOI: 10.18261/issn.1894-8693-2018-02-05
- Published on Idunn: 2018-12-12
- Creative Commons (CC BY-NC 4.0)
Place-based policing has attracted a substantial amount of attention, not least in relation to hot spot policing. Such policing efforts depend on geographical analysis of where crime takes place. However, while it is well known that police crime data suffer from many limitations, less is known about the extent to which the geographical reliability of these data constitutes a problem. The present study attempts to quantify the extent of this problem by exploiting the fact that in Sweden there is an alternative, and more reliable, source of geographical data for incidents of arson. The study compares the locations for car arson incidents as recorded by the police and the rescue services, respectively. The resulting quantification of differences shows that the median error for the police data is 83 meters. This presents a potential pitfall for geographical analysis, both for researchers using police data and for the police themselves in their operational and strategic analysis of crime.
Keywords: crime data, police, geography, arson, motor vehicle crime, geocoding
The topic of burning cars has attracted a substantial amount of interest in Sweden in recent years, and statistical data from the Swedish Civil Contingencies Agency show that the frequency of incidents of intentional car arson has increased steadily since the 1990s (Myndigheten för Samhällsskydd och Beredskap, MSB). While more research is needed on the topic of car arson in general, the present study is interested not so much in the car arson incidents themselves, as in using them to investigate potential problems with the geographical reliability of police data. This is made possible by the fact that car arson is both a crime, and thus recorded by the police, and a fire incident, and thus recorded by the Swedish Rescue Service (which includes the fire service). This means that two separate datasets relating to the same car arson incidents can be compared to assess the relative reliability of the police data.
It is well known that police data suffer from a number of reliability issues, most importantly since many crimes are not reported to the police, and in isolation police-report data may therefore be viewed as a poor measure of crime (Gibson & Kim 2008). A more specific problem associated with police data is their geographical (un)reliability (Mazeika & Summerton 2017). While some types of crime are less easy to specify in geographical terms, such as internet fraud or tax evasion (Ratcliffe 2004), most types of crime take place at a specific location, which is recorded by the police in the form of an address. These addresses can thus be analyzed in order to understand where crimes tend to occur, and further analysis can then also be used to understand why crimes occur at those specific locations.
In many cases, however, crimes are either not committed at the precise location of an address, or the specific address is unknown (Ratcliffe 2001). This is to some extent applicable to almost all crimes, with the exception of burglary, which by definition occurs at a specific and known address (Tompson et al. 2015). A typical example of the first type of case, where the crime does not occur at an address location, might involve an offence committed in the green space of a park, which typically has no specific address, with this resulting in the crime being recorded at either a central address for the park as a whole, or at a nearby address outside the park. [2]
[2] An example of this is mentioned in the NYPD Incident Level Data Footnotes #10: “Offences occurring in open areas such as parks or beaches may be geo-coded as occurring on streets or intersections bordering the area.” http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/incident_level_data_footnotes.pdf
The second type of case, in which the specific address of the crime is unknown, has been discussed in relation to cases of theft in transit systems, where victims do not know exactly when and where the theft occurred (Newton et al. 2014). Another possible example could be where a crime such as robbery is committed on a street, but where the victim is vague or unsure of the exact address – “it happened near the end of street X.” In such cases, the crime will often be recorded by simply using the street name, or possibly a street segment, defined as the section of a street located between two intersections. In other cases, crimes may be attributed to a more general area rather than to a specific address. For example, in the context of an evaluation of CCTV in Oslo, Winge and Knutsson (2003) reported that crimes were only attributed to areas. While this may mean that the risk for errors is smaller, it cannot be ruled out that aggregated data of this kind are based on flawed locations to begin with and thus suffer from similar weaknesses (e.g. Ratcliffe 2004). As has been shown by Ratcliffe (2001), large errors in polygon data can occur if point data are incorrectly coded to a different polygon, and this can be the case even if the errors in the point data that lead to events being assigned to the incorrect area are small.
When police or researchers conduct geographical analyses of crime, the addresses need to be converted to a specific location on a map, a process which is called geocoding (Ratcliffe 2004; Mazeika & Summerton 2017). As a result of missing or incorrect data, and the two problems highlighted above, some crimes will then be geocoded to an incorrect location, which might make some places appear to have more or less crime than they actually have. An example of this can be seen in the work of Ivert and Kronkvist (2013), who conducted a detailed analysis of crime in a neighborhood of the city of Malmö. When they were calculating crime densities to identify where hot spots of crime might be located, they noted that one robbery hot spot appeared to be located on a pedestrian/bicycle path. Upon closer investigation, however, it was found that this was probably a result of all the crimes along that stretch of bicycle path having been registered to a specific location even though in reality the crimes were likely to have occurred at different places along the path. The apparent concentration of offences at a single hot spot was thus a misrepresentation; the true distribution was more likely that of a fairly extended “warm spot” for crime, with a lower crime density. To convey a more reasonable representation of the geographical distribution of the offences in question, the researchers adjusted the statistics so that the robberies were distributed evenly along the path.
While the problem of the geographical unreliability of police data is fairly well known, less is known about the extent of the problem (Mazeika & Summerton 2017). In the Nordic countries, no studies have attempted to quantify this problem, and it is therefore difficult to say how much of an impact it has on studies of the geographical distribution of crime. The present study will attempt to perform a quantification of this kind by exploiting the fact that data from the Swedish rescue services provide an alternative source of information for the location of motor vehicle arson offences (the act of intentionally setting fire to a motor vehicle). The rescue services employ GPS-devices to more accurately specify the precise location of an incident, and their data will therefore not suffer from the first of the two problems highlighted above, i.e. the errors that result from assigning an address to outdoor crimes which do not occur at the exact location of this address. The second problem highlighted above, i.e. where the exact location of a crime is unknown, is held constant in the present study, since the focus here is on car arson, and the exact location of burnt-out cars is easy to pinpoint. By comparing police data and rescue service data on incidents of car arson on the same day and during the same hours, it is thus possible to quantify how much of an impact the registration of crimes to a nearby address has on the crime locations recorded in police data. Car arson has an additional advantage in that it is a crime that is reported to the police relatively often, since it is highly visible, and thus attracts the attention of bystanders, and since a police report is required in order to make an insurance claim. Indeed, both vehicle crime and arson have been suggested as examples of crimes with among the highest levels of geocoding accuracy (Tompson et al. 2015), which means that the geocoding accuracy of car arson can arguably also be expected to be relatively good.
The topic at hand in the present paper is of particular importance for policing in the form of geographically targeted efforts at crime prevention or crime control. The next section of the paper will briefly discuss such methods, with a focus on findings from the Nordic countries, before moving on to discuss the specifics of the current study.
Geographically targeted policing
While there is a long history of analyzing the locations at which crimes occur in order to facilitate the distribution of police resources, this has become increasingly common with the advent of technologies that facilitate the analysis of crime. The development of geographical information systems (GIS) represents a particularly important advance, since these have allowed for the visualization and analysis of crime data on a large scale (Weisburd et al. 2009). Using GIS, crimes can be mapped, densities can be calculated, places with particularly high crime levels can be identified, and the correlations between such places and other environmental features, notably bars, restaurants or public transport nodes, can be calculated. An important discovery from this emerging field of place-based crime and policing analysis is that crime is highly clustered to specific places, often labeled hot spots (Brantingham & Brantingham 1999; Weisburd et al. 2012). A small proportion of such places, less than five percent, typically account for at least 50 percent of the crime in a city (Sherman 1989; Weisburd et al. 2004). This has led to the formulation of a law of crime concentration, which states that “for a defined measure of crime at a specific microgeographic unit, the concentration of crime will fall within a narrow bandwidth of percentages for a defined cumulative proportion of crime” (Weisburd 2015: 133). This proposition has since been the focus of several studies, and although a number of methodological concerns have been raised, the general proposition appears to hold true (Bernasco & Steenbeek 2017; Eck et al. 2017; Levin et al. 2017; Oliveira et al. 2017; Haberman et al. 2017; Gill et al. 2017; Hipp & Kim 2017; Hibdon et al. 2017).
The findings on hot spots of crime have had a major impact on policing, with police departments across the world adopting the principle that police resources should be focused on hot spots. Hot spot policing has been proven to be an effective means of reducing crime (Braga et al. 2012; Sherman & Weisburd 1995; Rosenfeld et al. 2014), and is in fact one of the policing methods that exhibits the strongest evidence for crime reduction through police work (Abt & Winship 2016). While several Swedish studies have shown that crime is highly concentrated (Marklund 2011; Uittenbogard & Ceccato 2017; Johansson et al. 2015; Gerell & Kronkvist 2016; Sturup et al. 2017), the evidence for hot spot policing in a Nordic context remains rather limited. The most ambitious Nordic study to date is a randomized control trial from Denmark, in which the Danish Ministry of Justice and the Danish police piloted hot spot policing in three police districts (Atterman 2017). They identified 36 hot spots which had at least 100 police-reported offences of certain selected types within a 270 meter radius and with a minimum of 200 meters between hot spots. Thirty-one of the hot spots were included in the study and were randomly assigned for intervention or to a control group. The evaluation found non-significant changes overall, but significant decreases in vandalism and motor-vehicle related crime (Atterman 2017).
A number of smaller studies have been conducted in Sweden. A study on hot spot policing conducted by private security guards in the Swedish city of Örebro found that increased patrolling by security guards was associated with non-significant decreases in crime (Frogner et al. 2013). A study of two hot spot policing interventions in the Swedish cities of Eskilstuna and Stockholm similarly noted non-significant decreases in assaults and robberies in the two cities (Marklund & Merenius 2014). Thus, while both of these studies noted decreased crime at the hot spots, neither found the decreases to be significant.
Although hot spot policing is typically considered from the viewpoint of an increased level of targeted police patrols (or, in the case of Frogner et al. 2013, patrols by private security guards), there are also studies on other aspects of policing that should be considered. Some such studies in the Nordic countries have considered the policing of hot spots, with two Swedish studies considering the effect of using actively monitored CCTV cameras to improve policing at hot spots in Malmö (Gerell 2016) and Stockholm (Marklund & Holmberg 2015) respectively. In both cases a hot spot policing effort was already in place, and CCTV was added in part to give the police better tools to identify situations that could lead to violence or other crimes and thus prevent the crimes from happening. However, both studies noted that the introduction of CCTV was associated with non-significant changes in crime (Gerell 2016; Marklund & Holmberg 2015). A similar finding of mostly non-significant changes in crime was reported by an evaluation of CCTV in Oslo, although this study noted decreases in both robbery and bicycle theft (Winge & Knutsson 2003).
In addition to hot spot policing, a number of other geographically targeted policing methods, such as broken windows policing, have been found to be effective (Braga et al. 2015). While such methods also usually depend on some form of geographical analysis, they tend to be targeted at larger geographical areas, typically neighborhoods, and are therefore less susceptible to problems associated with inaccurate geographical information. Another type of intervention that has been studied has involved collaborative efforts to reduce nightlife violence, in which the police and other actors work to reduce the over-serving of alcohol and to generally reduce the risk of violent crime at nightlife hot spots that are characterized by high levels of violence. One such intervention in Sweden noted a 29 percent reduction in violence in Stockholm (Wallin et al. 2003; Wallin et al. 2005). In a more recent effort to replicate these findings in Oslo, however, no significant effect on crime was found (Skardhamar et al. 2016).
A more recent innovation is the use of near-repeat patterns for the purposes of crime prevention, based on the principle that following the occurrence of a crime, there is an elevated risk for a similar crime to occur nearby within a short period of time. A recent study on this topic found small but significant crime preventive effects from police-initiated efforts targeted at persons victimized by burglary and their near neighbors (Johnsson et al. 2017). In relation to burglary, the geographical reliability of police data will tend to be high, but near-repeat patterns have also been established for other crimes in Sweden, such as gun violence (Sturup et al. 2017), and preventive efforts in such cases may well also need to consider the reliability of police data.
The essence of hot spot policing and similar methods, which include analysis to identify micro-places as hot spots, and then increased police presence at such hot spots, will be sensitive to the way in which the geographical analysis that identifies hot spots is conducted. This is in turn highly sensitive to the geographical reliability of the data employed. When police resources are focused on very small locations, even fairly modest errors in the geographical information used to identify such locations may have an impact. As was outlined in the introduction, there are several potential sources of error, and the existence/absence of an address, the victim’s knowledge/memory of the exact location of a crime and the geocoding of data may all have an impact. To date, however, the research has for the most part only dealt with the issue of geocoding.
Ratcliffe (2004) calculated the hit-rate required for geocoding to produce accurate data, and while noting that an 85 percent hit-rate constitutes the minimum required level, he also noted that there may be systematic biases in geocoding. For example, new housing developments may not be included in the geocoding database, and specific buildings such as cinemas, that may be used as addresses, may not have a specific location associated with them. Both these errors are similar to the type of errors associated with crimes committed in a park or other large open area, since when they are geocoded they will be specified to a proximal location which may thus increase the degree of bias. In addition, there are a number of errors that are attributable to the geocoding process itself, with these often relating to incomplete address data, duplicate names or similar issues that make automated coding less accurate. The proportion of crimes that can be geocoded typically reaches the 85 percent threshold specified by Ratcliffe (2004), but some studies report far lower rates for some of the study data. For example, Cohen (2006) noted that only 72 percent of Pittsburgh data for the years 2000–2001 could be geocoded, while the rate was 91 percent for the years 1990–1999. While this suggests that geocoding is an important issue, and one that can often be dealt with satisfactorily, it tells us much less about whether the input data for geocoding were accurate to begin with. The present paper analyzes both geocoding and the quality of input data in order to provide a broader picture of the geographical reliability of police data in Sweden.
The research design of the present study is simple and straightforward. The two datasets (see below) are matched on the basis of time and date to create data-points pertaining to the same incident in both datasets. The geographical discrepancy between the two datasets is then measured using ArcGIS, and the results are analyzed further. The analysis largely mirrors that of Ratcliffe (2001), but only uses point data, and focuses on mean and median differences between the datasets, in addition to histograms of the geographical errors.
The police data are based on a straightforward coding of the Swedish offence code 1202 “Vandalism, through arson (including motor vehicle)” for incidents of motor vehicle arson from the year 2013. The data include the coordinates as specified by the police, and the geocoding was thus performed by the police department. The police data also include the reported times of the start and conclusion of the incidents, which provide a time interval during which the crime is deemed likely to have been committed. For the purposes of the analysis in this study, the reported start time of the crime was employed. Crime reports focused on incidents that occurred in years other than 2013 were excluded. [3]
[3] In Swedish statistics, crimes are registered in the year that they are reported, even if the actual crime may have occurred earlier.
The rescue service data employed in the study include outdoor incidents that have been deemed by a fire inspector to have been “started with unlawful intent” (“brand anlagd med uppsåt”) (see Gerell 2017 for a discussion). In order to ensure relative comparability with the police data, all motor vehicle-related arson objects were included, including mopeds and trailers, but the bulk of the incidents relate to cars. The locations of the incidents in the two datasets were then transformed into a new coordinate system (RT90 2.5 gon V) in ArcGIS, and all incidents coded as having occurred outside the municipal boundaries of Malmö were excluded before comparisons were made. [4] The resulting police dataset included a total of 358 reports. [5] The rescue service dataset was coded based on the object, and all motor vehicle-related incidents were initially included, producing a total of 136 incidents once those that had occurred outside the municipality of Malmö had been excluded.
[4] Some incidents that had been coded as having occurred outside Malmö’s municipal boundaries had clearly been incorrectly geocoded, but the presence of most of these incidents was due to the fact that both the police and rescue services also operate in other municipalities.
[5] There were 359 incidents, but one was registered as having occurred in 2012 and was thus excluded.
The time of the call to the alarm center from the rescue service data was compared to the start time specified in the police data, identifying the best match for the rescue service incidents in the police data. The police data are less precise, and are often rounded to the nearest half hour. In some cases, the police data did not include a start time. All cases in which a single match could be obtained within a 30 minute time frame were included. In addition, a manual review revealed a number of incidents (n=12) where it was plausible to assume the two datasets were referring to the same incident even if the time discrepancy was greater than 30 minutes (n=7) or when no starting time had been registered in the police data (n=5). In these cases, the match was made based on the time for the conclusion of the incident having been registered later on the same day, typically late at night or in the early hours of the morning. Such incidents were included in the data, but an analysis was also conducted with these cases excluded in order to rule out the possibility that the coding of these cases had biased the study data.
In some cases (n=7) more than one incident in the police data matched the rescue service time data, and in these cases the police data incident that was geographically closest to the rescue data incident was chosen. In part this reflects the fact that a single incident for the rescue services can yield multiple police reports, and indeed some of the rescue service cases (n=5) yielded two incidents with exactly the same time and location in the police data. For the other incidents, however, the location differed, and since the best geographical match from the police data was selected, the analysis may have somewhat overestimated the reliability of the police data.
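The matching rule described above can be sketched in code. This is a minimal illustration rather than the procedure actually used in the study: the record layout `(id, time, x, y)`, the function name `match_incidents` and the sample values are all hypothetical, and coordinates are assumed to be in meters in a projected grid. Police reports without a start time are skipped here, whereas the study reviewed such cases manually.

```python
from datetime import datetime, timedelta
from math import hypot


def match_incidents(rescue, police, window=timedelta(minutes=30)):
    """Match each rescue incident to one police report.

    A police report is a candidate if its start time lies within `window`
    of the rescue alarm time. If several reports qualify, the one that is
    geographically closest is kept (the tie-break described in the text).
    Records are hypothetical tuples: (id, time, x, y).
    """
    matches = {}
    for r_id, r_time, r_x, r_y in rescue:
        candidates = [
            p for p in police
            if p[1] is not None and abs(p[1] - r_time) <= window
        ]
        if not candidates:
            continue  # no automatic match; would be left for manual review
        best = min(candidates, key=lambda p: hypot(p[2] - r_x, p[3] - r_y))
        matches[r_id] = best[0]
    return matches


# Invented example data, for illustration only.
rescue = [
    ("r1", datetime(2013, 5, 1, 23, 10), 100.0, 200.0),
    ("r2", datetime(2013, 5, 2, 3, 0), 500.0, 500.0),
]
police = [
    ("p1", datetime(2013, 5, 1, 23, 0), 400.0, 200.0),
    ("p2", datetime(2013, 5, 1, 23, 30), 150.0, 230.0),
    ("p3", None, 500.0, 500.0),  # start time missing in the report
]
print(match_incidents(rescue, police))
```

Because the closest candidate is always preferred, a rule of this kind can only make the police locations look better than they are, which is the overestimation of reliability noted above.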
Of the 136 incidents in the rescue service dataset, it was possible to match 114 with the police data; 102 of these incidents were matched on the basis of the police data start time being within 30 minutes of the rescue service alarm time. Of the 102 incidents that were matched in this way, the mean difference between the police start time and the rescue service alarm time was 6 minutes and the median difference was 3 minutes. The 12 manually coded incidents were also included in the final sample, producing a total final sample size of 114 incidents.
The distances between the rescue service and police locations for the 114 matched car arson incidents are summarized in Table 1. The first column shows results for all 114 incidents, the second column excludes one outlier for which the difference was over 6 kilometers, the third column excludes the 12 incidents that were not matched exactly on time and the fourth column excludes both these 12 cases and the outlier. The data are summarized in terms of mean differences in X (east-west) and Y (north-south), absolute mean differences in X and Y (i.e. disregarding the direction of the error along each axis, and only taking its magnitude into account), absolute median differences in X and Y, the combined mean difference (based on the Pythagorean theorem on absolute differences) and the combined median difference. The mean difference is close to zero following the exclusion of the outlier, suggesting that there is little systematic bias in the data on the north-south or east-west axes. When the manually coded incidents are also excluded, however, the data yielded a slight bias towards the east, which indicates that the manually coded incidents tended to be coded a little more to the east of the locations suggested by the rescue service data.
| Difference (meters) | Full data | Excluding outlier | Excluding manually coded incidents | Excluding manually coded incidents and outlier |
|---|---|---|---|---|
| Mean X diff | -36 | -2 | -35 | 26 |
| Mean Y diff | -8 | 1 | -9 | 1 |
| Mean X absolute diff | 156 | 101 | 168 | 108 |
| Mean Y absolute diff | 110 | 101 | 114 | 105 |
| Median X absolute diff | 45 | 44 | 45 | 44 |
| Median Y absolute diff | 50 | 49 | 50 | 49 |
| Mean combined diff | 213 | 159 | 227 | 166 |
| Median combined diff | 85 | 83 | 86 | 83 |
Absolute mean differences are a little over 100 meters on both the east-west and north-south axes, and the mean combined difference is 213–227 meters when the outlier is included, and 159–166 meters when the outlier is excluded. The fairly small differences noted between the comparisons that include the manually coded data and those that exclude these data suggest that most of these incidents appear to have been accurately matched. Since the data are highly skewed, the median will be employed as the preferred outcome, and this is very stable at 83–86 meters in all four columns of the table. The police data thus tend to be “wrong” by about 80 meters, if the rescue service data are taken at face value as representing the exact location of burning cars.
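The summary measures in Table 1 could be computed along the following lines. This is a minimal sketch and not the ArcGIS workflow used in the study; the function name `summarize_errors`, the input layout and the sign convention (police coordinate minus rescue service coordinate) are assumptions made for illustration, and the sample coordinates are invented.

```python
from math import hypot
from statistics import mean, median


def summarize_errors(pairs):
    """Summarize location differences for matched incidents.

    `pairs` is a list of ((police_x, police_y), (rescue_x, rescue_y)),
    with coordinates in meters. Differences are police minus rescue
    (an assumed convention, not stated in the source).
    """
    dx = [p[0] - r[0] for p, r in pairs]
    dy = [p[1] - r[1] for p, r in pairs]
    # Combined error: straight-line distance via the Pythagorean theorem.
    combined = [hypot(x, y) for x, y in zip(dx, dy)]
    return {
        "mean_x": mean(dx),
        "mean_y": mean(dy),
        "mean_abs_x": mean(abs(x) for x in dx),
        "mean_abs_y": mean(abs(y) for y in dy),
        "median_abs_x": median(abs(x) for x in dx),
        "median_abs_y": median(abs(y) for y in dy),
        "mean_combined": mean(combined),
        "median_combined": median(combined),
    }


# Invented coordinates: two modest errors and one exact match.
pairs = [((3, 4), (0, 0)), ((0, 0), (6, 8)), ((10, 0), (10, 0))]
print(summarize_errors(pairs))
```

Signed means can cancel out (errors to the east offset errors to the west), which is why the absolute and combined measures are reported alongside them, and why the median is the more robust summary when a single large outlier is present.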
These results are visualized with examples in Figures 1 and 2. The figures show the neighborhood of Kroksbäck, which has traditionally experienced fairly high rates of car arson. In 2013 there were seven police-recorded incidents of car arson on two streets in this neighborhood, as shown in Figure 1. The yellow dots mark the locations of car arson incidents according to police data, and the red squares mark the corresponding locations from the rescue service data. Markers of differing colors tend to be located close to each other, which seems to suggest that the two data sources are fairly comparable. Closer inspection shows this conclusion to be incorrect, however.
Figure 2 shows the matched data, with black lines indicating which dots are related to the same incidents, and with the buildings and streets removed to facilitate interpretation. It can now be seen that most of the locations differ quite substantially, and that the police data include only four locations whereas there are seven locations in the rescue service data.
The location of the yellow dot in the northeastern section of the map has been used by the police in connection with the registration of three different incidents, two of which had occurred more than two hundred meters further south on the street. This is probably the result of this street having few houses, and thus few addresses, which has led to crimes being recorded in the middle of this section of the street. The other three yellow dots showing where the police have recorded car arson incidents are off by between 10 and a little over 100 meters, which is probably due to the difficulty of specifying an exact address for these incidents and to the incidents therefore having been assigned to nearby locations.
To further elaborate the analysis, the distribution of errors was considered. On both the Y and the X axes, the errors associated with about half the incidents were of less than 50 meters. The distribution of errors largely follows a Poisson distribution with small errors being more frequent than progressively larger errors (Figures 3a and 3b). A reasonable hypothesis would be that the errors in the 0 to 50 meter range for each axis largely represent correctly coded addresses, but with the actual location of the burning car being in a parking lot that lies at a certain distance from the building that contains the address. The larger errors are more likely to be associated with locations for which it has not been possible to determine an exact address, which often results in the location being coded to the center of a street segment or street. One example of this can be seen in the easternmost cases in the Kroksbäck example, which took place on a street with few buildings, and thus few addresses, which resulted in the car arson incidents being coded to the center of the street. The largest errors, of 300 meters or more, may in some cases be attributable to similar problems, but might in some cases also represent miscoded data. One potential example of where this might happen would be if the home address of the car owner is registered as the location of the incident even when the incident actually took place at a completely different location. Unfortunately, there are no available data that would allow for the testing of such a hypothesis.
The present study has shown that there are systematic differences between police data and rescue service data on the location of car arson incidents. Since the rescue service data are arguably more accurate, this points to systematic errors in the police data. While the present study shows that the median error is 83 meters, it should be noted that car arson may be expected to be among the crime types characterized by the lowest levels of error (see e.g. Tompson et al. 2015, who discuss similar types of crime). This figure of 83 meters can also be compared to the mean distance between two different measures of addresses examined by Ratcliffe (2001), which were found to be 47 meters apart on average, largely due to one of the definitions being based on the road network and the other on buildings – which probably mirrors the differences between the data sources compared in the current study. The rescue service data are based on the location of a burning car, and will thus tend to be focused on streets or parking lots, whereas the police data are based on addresses, and will thus tend to be focused on buildings. This is a difference that is likely to be found for many other types of crime committed in public environments, and a fairly substantial proportion of the error in police data will be due to this basic discrepancy between the locations at which crimes are committed and the addresses to which they are ascribed. However, it is also worth noting that the rescue service data may also be subject to a number of weaknesses. One possible source of error would be if the fire inspector in charge of logging the location with a GPS device did so from the comfort of his car rather than from the exact location of the incident. While there are no available data to suggest that this does in fact happen, it would appear reasonable to suggest that the rescue service data may also be characterized by reliability issues. 
The bottom line, however, is that it is plausible to assume that the error will be smaller in the rescue service data than in the police data, and thus that the quantification of differences between the two gives some indication as to the geographical unreliability of police data.
For many types of crime, police data are likely to be even more inaccurate, and it is important to consider this when conducting detailed geographical analyses of crime patterns. This is particularly true in relation to detailed geographical analyses that are used as a basis for operational decisions linked to police work, most commonly in relation to hot spot policing. Hot spot policing is often directed at violent crime in public environments, which may be expected to be subject to a higher degree of geographical inaccuracy than the arson offences examined in this study, and there is a risk that policing efforts may end up being directed at the wrong locations. While most policing efforts are directed at areas that are large enough for this factor to be of only minor importance, it is nevertheless a factor that should be considered. Even if policing efforts are focused on the correct area, efforts might be specifically targeted at micro-places which are not in fact affected by large numbers of crimes, while nearby micro-places that do have high concentrations of crime are missed. In cases where point data on offence locations are not employed, such as in the study conducted by Winge and Knutsson (2003) in Oslo, Norway, where the only available crime data were aggregated to larger areas, it is possible that these problems may be reduced. At the same time, however, we cannot rule out the possibility that the problems will sometimes in fact be exacerbated. If unreliable geographical locations result in a focus not just on the wrong place within a given area, but on the erroneous identification of an entire area for interventions against crime, this will have an even greater impact on the effectiveness of police operations. Since areas such as neighborhoods are often defined by streets and other natural boundaries, it is quite plausible that geographical errors that are relatively small in absolute terms could nonetheless result in large area-level biases.
This might be the case, for example, if large numbers of crimes are erroneously coded to one side of a street that marks the boundary between two areas, in which case they may end up being recorded in the wrong area. As has been noted by Ratcliffe (2001), this problem is likely to become smaller as the areas in focus become larger, but given the preference for the use of small areas for the purposes of effective place-based policing, this trade-off may become an important consideration if the quality of the underlying data cannot be ascertained.
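The boundary effect described above can be illustrated with a toy calculation. The sketch below assumes a single street running north to south at x = 0 that separates two hypothetical areas, A to the west and B to the east, and a systematic 30 meter eastward shift in the recorded locations. All positions, the size of the shift and the function names are invented for illustration and do not come from the study.

```python
def assign_area(x, boundary_x=0.0):
    """Return 'A' for points west of the boundary street, 'B' otherwise."""
    return "A" if x < boundary_x else "B"


def area_counts(xs, boundary_x=0.0):
    """Count how many points fall in each of the two areas."""
    counts = {"A": 0, "B": 0}
    for x in xs:
        counts[assign_area(x, boundary_x)] += 1
    return counts


# Hypothetical true east-west positions (meters from the boundary street).
true_x = [-40.0, -25.0, -10.0, -5.0, 15.0, 60.0]

# Hypothetical recorded positions: every incident shifted 30 m east,
# for example because it was coded to an address across the street.
recorded_x = [x + 30.0 for x in true_x]

print(area_counts(true_x))      # {'A': 4, 'B': 2}
print(area_counts(recorded_x))  # {'A': 1, 'B': 5}
```

In this contrived case, an error of only 30 meters moves three of the six incidents into the wrong area and reverses which area appears to have the larger crime problem, even though no incident has been displaced very far in absolute terms.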
In order to achieve better geographical reliability, a reasonable way forward would be to examine alternative data sources for crime locations. This study has employed rescue service data, but data of this kind are only available for a small subset of crime types. Future research would do well to explore other alternative sources for crime data. For violence it would be possible to use accident and emergency room data to obtain insights into the locations of crime (Forgan 2014), and sources of this kind are deserving of further attention. In Cardiff, Wales, the emergency room staff systematically collect geographical data from assault victims and share these locations with the police in an effort to help prevent violence (Boyle et al. 2013). Another option would be to use ambulance call-out data to map violence, either at small-scale locations (Sutherland et al. 2017), or aggregated to larger areas (Sutherland et al. 2013). With regard to the use of ambulance call-out data on violence, it has been argued that such data could make a substantial contribution to policing efforts focused on crime prevention (Sutherland et al. 2017).
Future research would also do well to attempt to replicate the present study in other countries. Very little is known about the extent of the geographical unreliability of police data, and given its potential importance for both policing and research it is an issue that deserves to be given attention.
Overall, while geocoding issues have received some attention (Ratcliffe 2010), the related but separate issue of how well addresses capture actual crime locations may be at least as important. Crimes are often recorded to an existing address, even if the actual crime location is 10 or 100 meters away. This issue is further compounded by the fact that exact addresses are often unavailable, for instance when the victim is unsure of exactly where a crime took place. The present paper has shown that even for a crime that should be associated with a fairly high degree of geographical reliability, issues of this kind contribute to a median error of 83 meters as compared with more reliable rescue service data. We can expect errors to be even larger for crimes such as violent offences in public environments, and this could result in misdirected policing efforts if analyses fail to take such errors into account. Researchers, analysts and police departments more generally need to be aware of these pitfalls when analyzing crime and designing place-based interventions to reduce crime. This also raises the question of whether the police should consider employing GPS data to a greater extent in their efforts to become more data-driven, and as a means of obtaining more detailed insights into both crime and policing.
1. Acknowledgements: The author wishes to thank the participants at the Young Nordic Police Research Seminar in Oslo in May 2017 for interesting discussions and feedback.
2. An example of this is mentioned in the NYPD Incident Level Data Footnotes #10: “Offences occurring in open areas such as parks or beaches may be geo-coded as occurring on streets or intersections bordering the area.” http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/incident_level_data_footnotes.pdf
3. In Swedish statistics, crimes are registered in the year that they are reported, even if the actual crime may have occurred earlier.
4. Some incidents that had been coded as having occurred outside Malmö’s municipal boundaries had clearly been incorrectly geocoded, but the presence of most of these incidents was due to the fact that both the police and rescue services also operate in other municipalities.
5. There were 359 incidents, but one was registered as having occurred in 2012 and was thus excluded.