I analyzed Boston crime data from the Analyze Boston dataset Crime Incident Reports (August 2015 – To Date) (Source: New System), covering 2018 – 2022. I selected 2018 – 2022 in order to compare pre-covid years with the covid and post-covid years. I excluded 2020 from some plots because there were so many confounding variables.
Tools: Python to create a Streamlit app for visualizing the crime dataset.
Purpose of Investigation:
The media has been pushing messaging about a crime wave across the country. I have been very curious about this since data is usually presented as a comparison against the prior year. However, comparisons against the most recent years are confounded by societal changes resulting from covid. Many of the differences year over year could be attributed to the fact that people had been in lockdown during 2020. Even in 2021, people were still not as active outside the home. Many people now work remotely, and with the move towards remote/hybrid work environments, downtown areas of the city are not as active as in pre-covid times. I was curious to see if all crimes were up, or maybe some types of crimes were up and others down. Is there crime in areas that previously had little crime, and maybe less crime in areas that had previously been considered the high crime areas?
I was also hoping to glean evidence of racial bias in the data and investigate any changes in violence against women. Unfortunately, there were two hindrances to achieving my initial goals.
Exploring the Data
Missing Information
Race/ethnicity was not included in the crime data, which prevents direct analysis of racial bias in policing. Also, rape and indecent assault were included in the list of offenses, but no records of these crimes were included in the data sets. If you look online, or you watch the news, you know that there have been rapes. However, there is no way to confirm the numbers provided. Is there a slant or bias in what the news outlets include in their numbers? If we are aiming for true transparency, this data should be included. I can understand if exact latitude and longitude are not included for these more sensitive crimes. However, that part of the data could have been left out and only the district included. Also, human trafficking cases were included, which would be sensitive as well, although less frequent. Worthy of note is also that there was only human trafficking data for 2018 and not after that. I am wondering if this is really the case, or if they stopped reporting this.
Augmenting the Dataset
To better illuminate racial or ethnic bias, I added district demographic data, percent white, non-Hispanic and percent living in poverty, that was also found in the Analyze Boston data sets. The ratio of person investigations to common crimes, such as thefts, violent crimes and drug crimes, was calculated for each district. This calculation was plotted in two separate plots, sorting district data on percent white, non-Hispanic and also percent living in poverty. This was done to highlight trends that indicate disparate numbers of person investigations that don’t appear to correlate to differences in the number of crimes in a district, but instead might be more correlated to district differences in class or race.
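The ratio described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the category labels and the function name `investigation_ratio` are hypothetical and are not taken from the actual app or dataset.

```python
from collections import Counter

# Hypothetical records as (district, category) pairs. The category labels
# are assumptions for illustration, not the dataset's real offense codes.
records = [
    ("Roxbury", "person_investigation"),
    ("Roxbury", "theft"),
    ("Roxbury", "violent"),
    ("Downtown", "theft"),
    ("Downtown", "drug"),
    ("Downtown", "theft"),
    ("Downtown", "person_investigation"),
]

# Categories counted as "common crimes" in the denominator.
COMMON_CRIMES = {"theft", "violent", "drug"}


def investigation_ratio(records):
    """Return {district: person investigations / common crimes}."""
    investigations = Counter()
    crimes = Counter()
    for district, category in records:
        if category == "person_investigation":
            investigations[district] += 1
        elif category in COMMON_CRIMES:
            crimes[district] += 1
    # Only districts with at least one common crime get a ratio,
    # which avoids dividing by zero.
    return {d: investigations[d] / crimes[d] for d in crimes if crimes[d] > 0}


ratios = investigation_ratio(records)
```

Sorting the resulting dictionary by each district's percent white, non-Hispanic or percent living in poverty would then give the ordering used on the x-axis of the two plots.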
Grouping Offenses
Some individual offenses looked much higher than other individual offenses. However, different trends were found if you looked at groups of crimes or activities. Some types or groups of crimes may have more related offense codes, but fewer instances of an individual offense. I looked at the most common offense codes, individually, and also grouped into related crimes or activities (property, drug, violent, stealing/ theft related).
I also found that some crimes were categorized under multiple different codes. The address, date and time were the same. I grouped like crime categories together. For example, I combined all types of larceny together. This allowed me to remove duplicates to prevent over-counting the number of incidents. The downside is that there is a little less precision in terms of degree of the crimes.
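A minimal sketch of that grouping and de-duplication step follows. The offense names in the mapping, the field names (`offense`, `address`, `occurred`) and the sample rows are all hypothetical; the real dataset's columns and codes may differ.

```python
# Hypothetical mapping from raw offense descriptions to a broader group.
OFFENSE_GROUP = {
    "LARCENY SHOPLIFTING": "Larceny",
    "LARCENY THEFT FROM BUILDING": "Larceny",
    "LARCENY ALL OTHERS": "Larceny",
}


def dedupe_incidents(rows):
    """Keep one row per (group, address, date/time) combination."""
    seen = set()
    unique = []
    for row in rows:
        # Offenses without a mapping keep their own description as the group.
        group = OFFENSE_GROUP.get(row["offense"], row["offense"])
        key = (group, row["address"], row["occurred"])
        if key not in seen:
            seen.add(key)
            unique.append({**row, "group": group})
    return unique


# Two larceny codes logged for the same place and time collapse into one
# incident; the unrelated offense at the same place and time is kept.
rows = [
    {"offense": "LARCENY SHOPLIFTING", "address": "1 MAIN ST", "occurred": "2022-01-05 14:00"},
    {"offense": "LARCENY ALL OTHERS", "address": "1 MAIN ST", "occurred": "2022-01-05 14:00"},
    {"offense": "VANDALISM", "address": "1 MAIN ST", "occurred": "2022-01-05 14:00"},
]
unique_rows = dedupe_incidents(rows)
```

Because the first matching row wins, the specific larceny subtype of the dropped row is lost, which is the loss of precision mentioned above.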
Data Cleaning
Consolidating Offenses and Activities
There were duplicate offenses in some cases due to misspelling and also extra spaces, or hyphens in one version of an offense and not in another. Misspellings were corrected and extra spaces trimmed, and erroneous hyphens removed when necessary to combine like offenses into the same offense category. In some cases, there appeared to be two versions of an offense, one being used back in earlier years and another in more recent years. As an example, there was Assault Simple and Assault Simple – Battery. In 2018 and 2019 all of the Assault Simple cases were categorized under the offense including battery. In 2021 and 2022 all cases were listed under Assault Simple without the battery. The same situation existed with aggravated assault cases. In order to aid comparison, I categorized all simple assault cases together and all aggravated assault cases together. I made an assumption that it was unlikely that there were suddenly no assault cases with battery in 2021 and 2022 given the relatively large numbers. More likely there was a change in attribution of the offense categories. Also, this allowed me to compare the data better between earlier years and later years.
I included the term activities because there were investigative activities and assistance with sick individuals included with the offenses.
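The consolidation described above could look roughly like the sketch below. It is a simplification: it replaces every hyphen or dash with a space, whereas I only removed hyphens where needed, and the alias table and the function name `normalize_offense` are hypothetical.

```python
import re

# Hypothetical alias table: normalized variant -> canonical offense name.
ALIASES = {
    "ASSAULT SIMPLE BATTERY": "ASSAULT SIMPLE",
}


def normalize_offense(description):
    """Normalize case, dashes and spacing, then apply known aliases."""
    text = description.upper()
    # Treat hyphens and en dashes as word separators.
    text = re.sub(r"[-\u2013]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    return ALIASES.get(text, text)


older_label = normalize_offense("Assault Simple \u2013 Battery")
newer_label = normalize_offense("  ASSAULT  SIMPLE ")
```

With this approach both spellings end up under one label, so counts from the earlier and later years can be compared directly.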
Missing Values
There were records with missing column values. Blank district fields were replaced with ‘Unknown’. I wanted to know how many crimes might be unaccounted for in the district data. When looking at plots, I wanted to visualize whether the number of unknown data points was enough to change trends seen between districts or if it was small enough that we could be confident in the results.
When latitude and longitude data was missing, I added the latitude and longitude given for Boston as a whole (42.35866, -71.05674). This data is only used in the Folium map plot. I wanted to make sure to include them to aid in visualizing the full extent of a particular criminal offense.
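As a rough sketch of the two fill rules above: the field names (`district`, `lat`, `long`) and both function names are assumptions for illustration, while the fallback coordinates are the ones quoted in the text.

```python
# Coordinates given in the text for Boston as a whole.
BOSTON_CENTER = (42.35866, -71.05674)


def fill_missing(row):
    """Return a copy of the row with blank district and coordinates filled."""
    filled = dict(row)
    # Treat None, empty and whitespace-only districts as missing.
    if not (filled.get("district") or "").strip():
        filled["district"] = "Unknown"
    # Fall back to the city-wide point if either coordinate is missing.
    if filled.get("lat") is None or filled.get("long") is None:
        filled["lat"], filled["long"] = BOSTON_CENTER
    return filled


def unknown_share(rows):
    """Fraction of rows whose district is 'Unknown' after filling."""
    filled = [fill_missing(r) for r in rows]
    if not filled:
        return 0.0
    return sum(r["district"] == "Unknown" for r in filled) / len(filled)


sample = [
    {"district": "", "lat": None, "long": None},
    {"district": "B2", "lat": 42.33, "long": -71.08},
]
cleaned = [fill_missing(r) for r in sample]
share = unknown_share(sample)
```

One consequence of this choice is that every record without coordinates stacks on the same map point, so that marker shows the amount of unlocated data and not a real hotspot.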
In some cases, offense descriptions were abbreviated for display on the plot x-axis.
Results:
Murders, Manslaughter Cases – Boston at large
See plot below. Of note is that the murder and manslaughter count in 2022 is down from 2018 and 2020, although it is up from 2019 and 2021. What I find interesting about this is the almost annual variation in the murder rate. The media responds to changes in the murder numbers from one year to another as if they were a trend, but in looking at this, there is clearly variability, with the count going up one year and down the next. Maybe we should consider the murder count as a random variable from a statistical point of view over many years. We could estimate its distribution, for example a normal distribution, to determine whether the change in the number of murders in a particular year is actually statistically significant. I plan to expand the number of years included in the next phase and attempt to find a mean and the standard deviation for Boston and other cities of similar size.

Murder, Manslaughter cases by district
Districts Faring Well/ No Clear change:
Below you see the murder, manslaughter cases by district and year. What is very interesting to see is that in Roxbury and Mattapan, 2022’s murders were fewer than in 2018 and 2020. Mattapan’s 2022 murder count was also less than 2021’s. Basically, we see variability with all 5 years clearly visible. The count seems to go up and down from year to year. While Dorchester had a much higher number of murders in 2022 (9) than in 2021 (2), this appears to be natural variability. There appears to be a pattern of alternating direction on an annual basis. There were 9 murders in 2022 and 2020 and 7 in 2018, with only 2 in 2019 and 2021.
Basically, in the areas that previously had the highest number of murders, murder numbers were definitely not at a high in 2022, when looking back at the last 5 years. In fact, they are on the mid to lower end.
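The idea of year-to-year variation around a mean can be made concrete with the Dorchester counts quoted above (7, 2, 9, 2 for 2018 through 2021, and 9 for 2022). The sketch below uses Python's standard `statistics` module; the z-score approach and the variable names are my illustration, and four baseline years are far too few for a real significance test.

```python
import statistics

# Dorchester murder, manslaughter counts as quoted in the text.
baseline_counts = {2018: 7, 2019: 2, 2020: 9, 2021: 2}
count_2022 = 9

# Mean and sample standard deviation of the baseline years.
baseline_mean = statistics.mean(baseline_counts.values())
baseline_sd = statistics.stdev(baseline_counts.values())

# How many standard deviations 2022 sits above the baseline mean.
z_score = (count_2022 - baseline_mean) / baseline_sd

# Rule-of-thumb flag: only treat the year as unusual beyond 2 standard deviations.
is_unusual = abs(z_score) > 2
```

Here the 2022 count lands roughly 1.1 standard deviations above the 2018 – 2021 mean, which is consistent with ordinary variation and does not suggest a break from it. A longer history would make the mean and standard deviation far more trustworthy.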
Districts Not Faring Well:
A wealthier and whiter neighborhood, West Roxbury, had one murder/manslaughter case in 2022 after 0 such cases in 2018 through 2021. The South End, less white and less poor than Roxbury and Mattapan, had 5 murders in 2022, more than double the prior year, 2021, when there were only 2 murders. There were more murders in the South End in 2022 than in any of the other years included from 2018 to 2021.
Thoughts:
An increase or decrease was definitely not a consistent trend based on class and race. Is the media’s hand wringing over supposed upticks in violence more a concern over where the violence is increasing than over an actual increase in murders across the city? Is the difference in these areas even a trend, or just a random event that will increase and decrease around a specific mean with a certain variability? The media seems to treat every increase between years as a trend, when the change may just be normal variation in a random event.

Non-Sexual Violent Offenses
If we look at the non-sexual violent crimes below, we can see there is quite a bit of variability from year to year. The good news is that overall violent crime in the previously highest crime neighborhoods is down substantially: Roxbury and Mattapan’s violent crime is down substantially in 2021 and 2022 when compared to 2018 – 2020. None of the neighborhoods had their highest number of violent crimes in 2022.

Common Crimes and Person Investigations

The plot above moves from districts with less poverty to districts with more poverty. The purple bars show 2019 data for the ratio of person investigations to crimes. The blue line shows the same ratio for 2022. What is very clear is that person investigations increased in 2022 relative to crime, but much more so in some districts than others. In particular, the highest ratios of person investigations to crimes were in the wealthiest (least poverty) districts and then in Mattapan and Roxbury, which are the least white, non-Hispanic districts. See below.

It appears that in the wealthier neighborhoods, people are investigated heavily relative to the number of crimes, most likely due to the influence of the wealthier individuals. However, there is also intensive investigation in the more heavily minority neighborhoods. This could be seen as evidence of biased policing of black and brown individuals.

See above that the South End and Downtown both had more theft related, drug related and violent crime than Mattapan and Roxbury, but there were more investigations of individuals in Roxbury and Mattapan. While Roxbury and Mattapan did have a greater number of murders, those were small in number, 8 and 10. More individuals are likely to be affected by the less severe crimes than to be a victim of murder or manslaughter. Again, look at the plot above. The left y-axis (common crime counts) shows that there were thousands of crimes included. The right y-axis is for the person investigations indicated with the blue line. There were over 1000 investigations in Roxbury and approximately 900 in Mattapan. Overall crime was down in 2022, but investigations went up. You can see the light blue line for 2019 investigations and the brighter blue line for 2022 person investigations.
Crime Group Frequencies

Of note above, property crimes and activities are down in 2021 and 2022 from 2019. The dot for 2022 is almost completely occluded by the 2021 dot; only its tip peeks out from above. Motor vehicle activities are also down in both 2021 and 2022 from 2019. In this case, you cannot see the 2022 dot, which is completely occluded by the 2021 dot, since there was little change between 2021 and 2022. You can also see that investigations increased in 2021 and then increased more in 2022. In 2021 and 2022, stealing was also down in the city as a whole from 2019 numbers. Again, the 2022 dot is masked by the 2021 dot. Most of this is not particularly interesting, except that overall crime appears to be down and the police have increased the number of investigations, maybe now that they have more time.
Top Police Activities from Pre-Covid 2019 through Covid and Beyond

I combined all assault, larceny M/V and robbery offenses with like crimes. Also, I combined all activities helping sick individuals into one grouping. What I see is that the number of Larceny cases went down while investigations of people went up. Assaults have also gone down since 2019. They may be up slightly from 2021, but they are still down from 2018 and 2019. We see that pre-covid, Larceny was the most frequent crime grouping. Assistance to sick individuals has gone up, which may be related to covid and opioids. However, what I found interesting was the marked increase in investigations of people. Person investigations went from under 6000 in 2019 to greater than 8000 in 2022.
Missing Persons

I looked at the distribution of the missing person cases to see if individuals in poorer or racial minority communities are more vulnerable to going missing. The three communities with the highest numbers of individuals going missing are the three brownest communities. However, next come wealthier communities, such as Hyde Park and West Roxbury.
Final Analysis:
Murders, Manslaughter Up?
Based on the results, my interpretation is that there is a certain amount of variability in crime. Although murder, manslaughter counts are up in 2022 from 2021, the data appears to show that this may be normal year over year variation in both the city at large as well as in the individual neighborhoods. I plan to perform additional investigation of Boston, going back 10 years, as opposed to five years. My plan is to perform a time series analysis and also a statistical analysis of the data to determine the mean annual count, standard deviation and to compare against cities of similar size. This will allow me to determine in 2023 and moving forward if supposed crime waves are, in fact, just normal variation. Multi-year changes in the same positive/ negative direction would also be more indicative of an actual trend. It would also be interesting to compare data against similar sized cities with similar economic composition, but with different gun laws.
Overall Crime Down?
In Boston at large, simple assaults were approximately 5000 in 2018 and 2019. In 2020 the number went to approximately 2700 and has gradually increased to approximately 1000. It does appear that the decrease in simple assault may have been a trend, most likely related to covid. It has been gradually increasing again. Time will tell if the trend remains down as we get further away from covid.
2020, 2021 and 2022 aggravated assault numbers are similarly