‘Justice For Who?’

This analysis underpins a data-driven examination of inequalities throughout the justice system in Fresno, California. Data and code can be found here. Find the story here; it is part of a larger project, Unequal From Birth, produced by a team of reporters at UC Berkeley’s Graduate School of Journalism. Find more information on the project here.

Data

This story relies on data spanning the justice system, from city, county and state sources. We obtained data on every traffic stop that the Fresno Police Department made in 2016 through a request made under the California Public Records Act. We obtained jail booking and release data — that is, records of people booked into and released from the jail — from January 2017 to February 2018 through another CPRA request. Finally, we downloaded school suspension and expulsion data for the last five school years from the California Department of Education and local demographic data from the U.S. Census Bureau. We used the following R code in our analysis.

Setting up

Loading packages to read, manipulate and plot our data, and download data from the Census Bureau. For more information on Census API keys, see here.

# loading required packages
library(readr)
library(dplyr)
library(censusapi)
library(tigris)
library(rgdal)
library(leaflet)
library(mapview)
library(plotly)
library(DT)

# set Census API key (alternatively, store CENSUS_KEY in ~/.Renviron so it isn't hardcoded)
Sys.setenv(CENSUS_KEY="YOUR API KEY HERE")
readRenviron("~/.Renviron")

Looking at traffic stops — who’s pulled over and where

Loading the data

The two sets of data we load in here both came from the Fresno Police Department. The locations data gives us longitude/latitude fields for each stop made by the department’s patrol unit. The traffic unit did not keep data in the same way, so we are not able to geocode those stops.

# loading stop data
tstops <- read_csv("trafficStops_fpd_16.csv")

# formatting time
tstops <- tstops %>%
  mutate(CONTACT_TIME = as.POSIXct(CONTACT_TIME, format="%m/%d/%y %H:%M"))

# IDing stops made by the traffic unit and stops made by the patrol unit;
# traffic-unit stops have no event number, so we split on that field
tstops_traffic <- tstops %>%
  filter(is.na(EVENT_NUM)) %>%
  mutate(unit="traffic")

tstops_patrol <- anti_join(tstops,tstops_traffic)  %>%
  mutate(unit="patrol") %>%
  unique()

# loading locations of stops made by the patrol unit and fixing dupes
locations <- read_csv("trafficStops_locations_16.csv") %>%
  unique()

# joining the locations data onto tstops
tstops_locations <- inner_join(tstops_patrol,locations, by=c("EVENT_NUM"="EVENT_NO"))

Total stops by race

Right off the bat, we look for disproportionality in each race’s share of stops. Later, we compare these percentages to the city’s driving-age population. Note: ‘H’ = hispanic, ‘W’ = white, ‘B’ = black, ‘A’ = asian, ‘O’ = other.

# looking at stops by race
stops_race <- tstops %>%
  group_by(APPARENT_RACE) %>%
  summarise(total_stops=n()) %>% 
  arrange(desc(total_stops)) %>%
  mutate(pc_stops = round(total_stops/sum(total_stops)*100,2))

datatable(stops_race, rownames = FALSE, colnames = c("race", "total stops", "percent stops"))

Comparing share of traffic stops to share of population

How do those percentages match up to each race’s share of Fresno’s population? Here, we found disproportionality in FPD’s traffic stops.

# making a bar chart for tstops and population 

# joining driving age data to stop data
driving_age <- read_csv("drivingAge_fresno.csv")

# recoding values, spelling out race names (rows are ordered by total stops, descending)

stops_race[1, "APPARENT_RACE"] <- "hispanic"
stops_race[2, "APPARENT_RACE"] <- "white"
stops_race[3, "APPARENT_RACE"] <- "black"
stops_race[4, "APPARENT_RACE"] <- "other"
stops_race[5, "APPARENT_RACE"] <- "asian"

pop_stop <- left_join(driving_age, stops_race, by=c("race"="APPARENT_RACE"))
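Recoding by row position works because `stops_race` is sorted by total stops, but it would break silently if that ordering ever changed. A more defensive alternative — a sketch, assuming the one-letter codes noted above — recodes by value instead:

```r
# recode by value rather than row position (sketch; assumes the
# one-letter codes 'H', 'W', 'B', 'A', 'O' documented above)
library(dplyr)

stops_race <- stops_race %>%
  mutate(APPARENT_RACE = recode(APPARENT_RACE,
                                "H" = "hispanic",
                                "W" = "white",
                                "B" = "black",
                                "A" = "asian",
                                "O" = "other"))
```

This way the labels stay correct no matter how the table is sorted.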

# setting x axis order 
xform2 <- list(categoryorder = "array",
               categoryarray = c("white", 
                                 "hispanic", 
                                 "black",
                                 "asian"))

pop_stop_chart <-  plot_ly(pop_stop, x = ~race, y = ~pc_drivers, type = 'bar', name = '% of driving-age pop.', 
                           marker = list(color = 'rgba(189,189,189,0.5)',
                                         line = list(color = 'rgba(189,189,189,.1)',
                                                     width = .7))) %>%
  add_trace(y = ~pc_stops, name = '% of drivers stopped',
            marker = list(color = 'rgba(37,37,37,0.5)',
                          line = list(color = 'rgba(37,37,37,1)',
                                      width = .7))) %>%
  layout(yaxis = list(title = 'Percent'), 
         xaxis = list(title = 'Race'), 
         barmode = 'group') %>%
  config(displayModeBar = FALSE) %>%
  layout(xaxis = xform2)


print(pop_stop_chart)

# saving as html page
htmlwidgets::saveWidget(pop_stop_chart, "pop_stop_chart.html", selfcontained = TRUE, libdir = NULL, background = "white")
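Beyond eyeballing the bars, the gap can be summarized as a ratio of each group’s share of stops to its share of the driving-age population (a sketch, using the `pc_stops` and `pc_drivers` columns built above); a value above 1 means a group is stopped more often than its driving-age share would predict:

```r
# disparity ratio: share of stops divided by share of driving-age population
# (sketch; assumes the pop_stop data frame built above)
pop_stop <- pop_stop %>%
  mutate(stop_ratio = round(pc_stops / pc_drivers, 2))
```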

Loading and processing Census data, tract shapefiles, and police district shapefiles

In order to map where these stops are happening, and how stops compare to the city’s demographic makeup, we loaded census-tract-level demographic data and shapefiles. Then we clipped those to the latest FPD district boundaries so that we see only demographic data from tracts within FPD’s jurisdiction.

# grabbing census tract level demographic data
ct <- getCensus(name = "acs5",
                vintage = 2015,
                vars = c("GEOID",
                         "B01003_001E",
                         "B03002_003E",
                         "B03002_004E",
                         "B03002_012E",
                         "B03002_006E"),
                region = "tract:*",
                regionin = "state:06+county:019")

names(ct) <- c("GEOID", "state", "county", "tract", "total_population","white_population", "black_population", "hispanic_population", "asian_population")

ct <- ct %>%
  mutate(pc_black_population = round(black_population/total_population*100,2),
         pc_white_population = round(white_population/total_population*100,2),
         pc_hispanic_population = round(hispanic_population/total_population*100,2),
         pc_asian_population = round(asian_population/total_population*100,2))


# fixing NaNs (tracts with zero total population produce division by zero above)
is.nan.data.frame <- function(x)
  do.call(cbind, lapply(x, is.nan))
ct[is.nan(ct)] <- 0

# load TIGER/LINE shapefiles for census tracts in Fresno
tracts_map <- tracts("06", county = "019")


# joining demographic data to tracts shapefile
tracts_merged <- geo_join(tracts_map,ct, "TRACTCE", "tract")


# loading shape files for fresno police department districts 
fpdBound_shape <- readOGR("pdbound.shp")


# converting to long/lat projection 
fpdBound_ll <- spTransform(fpdBound_shape, CRS('+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0'))

# turning policing districts into lines
fpdBound_lines <- as(fpdBound_ll, "SpatialLinesDataFrame")

# clipping tracts map to police districts so that we're only looking at tracts within fpd jurisdiction
fpd_tracts <- raster::intersect(tracts_merged,fpdBound_ll)

Counting the number of stops, searches and arrests that happened in each census tract

We are interested in three types of incidents: traffic stops in general, stops that resulted in the search of a driver’s vehicle and stops that resulted in arrest. Here, we count the number of times each of those events occurred in each of the city’s census tracts. To do this, we counted points in polygons.

## calculating and mapping total number of stops, searches and arrests in each census tract

# grabbing only long/lat columns and turning each stop into a point
stop_coords <- tstops_locations %>%
  select(LONGT,LAT) %>%
  filter(!is.na(LAT))

# turning stops into spatial points
stop_sp <- SpatialPoints(stop_coords)

# applying same projection to the points as we're using with our map
proj4string(stop_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"

# now counting total number of stops in each tract 
by_tract <- over(stop_sp, tracts_map)

by_tract <- by_tract %>%
  group_by(GEOID) %>%
  summarise(total_stops=n()) %>%
  mutate(pc_stops = round(total_stops/sum(total_stops)*100,2)) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  filter(!is.na(GEOID))

# calculating the above for searches
search_locations <- tstops_locations %>%
  filter(VEHICLE_SEARCHED=="Y")

# grabbing only long/lat columns and turning each search into a point
search_coords <- search_locations %>%
  select(LONGT,LAT) %>%
  filter(!is.na(LAT))

# turning searches into spatial points
search_sp <- SpatialPoints(search_coords)

# applying same projection to the points as we're using with our map
proj4string(search_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"

# now counting total number of searches in each tract
by_tract_search <- over(search_sp, tracts_map)

by_tract_search <- by_tract_search %>%
  group_by(GEOID) %>%
  summarise(total_searches=n()) %>%
  mutate(pc_searches = round(total_searches/sum(total_searches)*100,2)) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  filter(!is.na(GEOID))

# joining with stop data
by_tract <- left_join(by_tract, by_tract_search, by=c("GEOID"="GEOID"))


# calculating the above for arrests
arrest_locations <- tstops_locations %>%
  filter(ACTION_TAKEN=="A")

# grabbing only long/lat columns and turning each arrest into a point
arrest_coords <- arrest_locations %>%
  select(LONGT,LAT) %>%
  filter(!is.na(LAT))

# turning arrests into spatial points
arrest_sp <- SpatialPoints(arrest_coords)

# applying same projection to the points as we're using with our map
proj4string(arrest_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"

# now counting total number of arrests in each tract
by_tract_arrest <- over(arrest_sp, tracts_map)

by_tract_arrest <- by_tract_arrest %>%
  group_by(GEOID) %>%
  summarise(total_arrests=n()) %>%
  mutate(pc_arrests = round(total_arrests/sum(total_arrests)*100,2)) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  filter(!is.na(GEOID))

# joining with stop data
by_tract <- left_join(by_tract, by_tract_arrest, by=c("GEOID"="GEOID"))
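The stop, search and arrest counts repeat the same points-in-polygons steps. One way to cut the repetition — a sketch, assuming the same `tstops_locations`, `tracts_map` and NAD83 projection string used throughout — is a small helper that counts any subset of stops by tract:

```r
# helper: count points in each census tract for any subset of stops
# (sketch; assumes tstops_locations, tracts_map and the NAD83 projection
# string used elsewhere in this analysis)
count_by_tract <- function(df, count_name) {
  # drop rows without coordinates, then convert to spatial points
  pts <- df %>%
    select(LONGT, LAT) %>%
    filter(!is.na(LAT)) %>%
    SpatialPoints()
  proj4string(pts) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
  # points-in-polygons: which tract does each point fall in?
  over(pts, tracts_map) %>%
    group_by(GEOID) %>%
    summarise(!!count_name := n()) %>%
    mutate(GEOID = as.character(GEOID)) %>%
    filter(!is.na(GEOID))
}

# usage, e.g. for arrests:
# by_tract_arrest <- count_by_tract(filter(tstops_locations, ACTION_TAKEN == "A"),
#                                   "total_arrests")
```

The same helper would also cover the per-race counts in the next section.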


# joining with census data
by_tract_join <- left_join(by_tract, tracts_merged@data, by=c("GEOID"="GEOID"))


# turning data into a spatial polygon df to map in leaflet
total_stops_map <- geo_join(fpd_tracts, by_tract,"GEOID","GEOID")

Counting the number of stops, searches and arrests in each tract by race

Doing the same thing we just did above, but now doing it for individual races instead of total stops, searches and arrests. We’re particularly interested in the stops, searches, and arrests of white, hispanic and black drivers.

## now, doing all the above — counting traffic stops in each tract and mapping — for stops of white, black and hispanic drivers, respectively

# grabbing only the race and long/lat columns
race_stop_coords <- tstops_locations %>%
  select(APPARENT_RACE,LONGT,LAT) %>%
  filter(!is.na(LAT))

# converting to spatial points, making sp for each race
white_stop_sp <- race_stop_coords %>%
  filter(APPARENT_RACE == "W") %>%
  select(LONGT,LAT)

white_stop_sp <- SpatialPoints(white_stop_sp)

black_stop_sp <- race_stop_coords %>%
  filter(APPARENT_RACE == "B") %>%
  select(LONGT,LAT)

black_stop_sp <- SpatialPoints(black_stop_sp)

hispanic_stop_sp <- race_stop_coords %>%
  filter(APPARENT_RACE == "H") %>%
  select(LONGT,LAT)

hispanic_stop_sp <- SpatialPoints(hispanic_stop_sp)

asian_stop_sp <- race_stop_coords %>%
  filter(APPARENT_RACE == "A") %>%
  select(LONGT,LAT)

asian_stop_sp <- SpatialPoints(asian_stop_sp)

# applying same projection to the points as we're using with our map
proj4string(white_stop_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(black_stop_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(hispanic_stop_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(asian_stop_sp) <- "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"


# calculating number of total stops of white drivers in each tract
by_tract_white <- over(white_stop_sp, tracts_map)

by_tract_white <- by_tract_white %>%
  group_by(GEOID) %>%
  summarise(total_white_stops=n()) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  filter(!is.na(GEOID))


# calculating percent of stops that were white in each tract
by_tract_join <- left_join(by_tract_join, by_tract_white, by=c("GEOID"="GEOID"))

by_tract_join <- by_tract_join %>%
  mutate(pc_white_stops = round(total_white_stops/total_stops*100,2))


# calculating points in a polygon: number of total stops of black drivers in each tract
by_tract_black <- over(black_stop_sp, tracts_map)

by_tract_black <- by_tract_black %>%
  group_by(GEOID) %>%
  summarise(total_black_stops=n()) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  filter(!is.na(GEOID))


# calculating percent of stops that were black in each tract
by_tract_join <- left_join(by_tract_join, by_tract_black, by=c("GEOID"="GEOID"))

by_tract_join <- by_tract_join %>%
  mutate(pc_black_stops = round(total_black_stops/total_stops*100,2))


# calculating points in a polygon: number of total stops of hispanic drivers in each tract
by_tract_hispanic <- over(hispanic_stop_sp, tracts_map)

by_tract_hispanic <- by_tract_hispanic %>%
  group_by(GEOID) %>%
  summarise(total_hispanic_stops=n()) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  filter(!is.na(GEOID))


# calculating percent of stops that were hispanic in each tract
by_tract_join <- left_join(by_tract_join, by_tract_hispanic, by=c("GEOID"="GEOID"))

by_tract_join <- by_tract_join %>%
  mutate(pc_hispanic_stops = round(total_hispanic_stops/total_stops*100,2))

Mapping traffic stops and demographics

Now, we map traffic stops and population by census tract. In the maps we make below — for white Fresnans — the darker the census tract, the higher the percentage of residents who were white and the higher the percentage of drivers stopped who were white.

## now making maps for % of traffic stops by race and population by race by census tracts

# turning data into a spatial polygon df to map in Leaflet
race_stop_map <- geo_join(fpd_tracts, by_tract_join,"GEOID","GEOID")

### white

# creating color pal and pop up for white pop map
popup_white <- paste0("Census tract: ", race_stop_map$id, "<br>", "% of population that is white: ", race_stop_map$pc_white_population)
pal_white <- colorNumeric(
  palette = "Blues",
  domain = c(0,100), na.color = "transparent"
)

# mapping white population 
white_pop_map <- leaflet(options = leafletOptions(zoomControl = FALSE)) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = race_stop_map, 
              fillColor = ~pal_white(pc_white_population), 
              color = "#b2aeae", 
              fillOpacity = 0.7, 
              weight = 0, 
              smoothFactor = 0.2,
              popup = popup_white) %>%
  addPolylines(data = fpdBound_lines,
               color = "#3b3b3b", weight = 1, smoothFactor = 0.5) %>%
  leaflet::addLegend("bottomright", pal = pal_white, values = c(0,100),
                     labFormat = labelFormat(suffix = "%"),
                     opacity = 1
  )

print(white_pop_map)

# saving map as html
htmlwidgets::saveWidget(white_pop_map, "white_pop_map.html", selfcontained = TRUE, libdir = NULL, background = "white")

# saving screen shot of map for web juxtapose feature
mapshot(white_pop_map, file = paste0(getwd(), "/white_pop_map.png"))

# creating color pal and pop up for stops of white drivers
popup_white_stops <- paste0("Census tract: ", race_stop_map$id, "<br>", "% of drivers stopped who were white: ", race_stop_map$pc_white_stops)
pal_white_stops <- colorNumeric(
  palette = "Blues",
  domain = c(0,100), na.color = "transparent"
)

# mapping stops of white drivers

white_stops_map <- leaflet(options = leafletOptions(zoomControl = FALSE)) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = race_stop_map, 
              fillColor = ~pal_white_stops(pc_white_stops), 
              color = "#b2aeae", 
              fillOpacity = 0.7, 
              weight = 0, 
              smoothFactor = 0.2,
              popup = popup_white_stops) %>%
  addPolylines(data = fpdBound_lines,
               color = "#3b3b3b", weight = 1, smoothFactor = 0.5) %>%
  leaflet::addLegend("bottomright", pal = pal_white_stops, values = c(0,100),
                     labFormat = labelFormat(suffix = "%"),
                     opacity = 1
  )

print(white_stops_map)

# saving map as html
htmlwidgets::saveWidget(white_stops_map, "white_stops_map.html", selfcontained = TRUE, libdir = NULL, background = "white")

# saving screen shot of map for web juxtapose feature
mapshot(white_stops_map, file = paste0(getwd(), "/white_stops_map.png"))

Where white Fresnans live