THE DATA

CLICK THE LINK BELOW TO VIEW THE SAN FRANCISCO POLICE TRAFFIC STOP DATA USED FOR BRIDGING THE GAP

DATA CRITIQUE

Our data comes from the Stanford Open Policing Project, which was funded by three partners: two Stanford labs and a nonprofit foundation.

Our dataset (linked here) was collected for the Stanford Open Policing Project, which began in 2015 when its team requested comprehensive traffic stop data from law enforcement agencies across the United States.

The dataset includes information from 21 state patrol agencies and 29 municipal police departments, covering nearly 100 million traffic stops nationwide across both state and local jurisdictions. In addition to vehicular stops, it includes information on pedestrian stops. The original, unprocessed data the project collected contains even more information; the project does not provide that raw data, but it does provide any documentation it has received. For our team’s project, we decided to focus on just the data for the San Francisco region.

The Stanford Open Policing Project lists three main partners who have helped fund this dataset and make it possible: the Stanford Policy Lab, the Stanford Journalism Lab, and the Knight Foundation. The Stanford Policy Lab is part of Stanford Law School and aims to teach students policy-making skills and knowledge in the context of real policy issues. The Stanford Journalism Lab supports the evolution of computational approaches to public affairs journalism through research, teaching, and reporting. Lastly, the Knight Foundation is an American nonprofit foundation that provides grants for journalism, communities, and the arts.

The data provides a comprehensive overview of each stop’s details, but it lacks certain demographic and stop information that could support further analysis.

Our data can help us understand the length of each recorded police stop, the gender and race of the stopped person, the exact location and district where the stop occurred, and the reason for the stop. The dataset also records whether an arrest was made, whether a search occurred, whether any contraband was found, and the overall result of the stop.
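As an illustration of how these fields could be combined, here is a minimal sketch that computes the share of stops involving a search for each racial group. The column names (subject_race, search_conducted, and so on) and the four sample rows are assumptions made up for this example, not values taken from the San Francisco file, so they would need to be checked against the actual download.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. Column names and values are assumptions for
# illustration only; they are not copied from the real San Francisco file.
SAMPLE_CSV = """subject_race,subject_sex,district,reason_for_stop,search_conducted,contraband_found,arrest_made,outcome
white,male,A,Moving violation,FALSE,NA,FALSE,citation
black,female,B,Equipment violation,TRUE,FALSE,FALSE,warning
hispanic,male,B,Moving violation,TRUE,TRUE,TRUE,arrest
white,female,C,Moving violation,FALSE,NA,FALSE,warning
"""


def search_rate_by_race(rows):
    """Return {race: share of stops in which a search was recorded}."""
    stops = Counter()
    searches = Counter()
    for row in rows:
        race = row["subject_race"]
        stops[race] += 1
        # Treat only an explicit TRUE as a recorded search.
        if row["search_conducted"].strip().upper() == "TRUE":
            searches[race] += 1
    return {race: searches[race] / stops[race] for race in stops}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = search_rate_by_race(rows)
for race, rate in sorted(rates.items()):
    print(f"{race}: {rate:.0%} of stops searched")
```

With the real file, the in-memory sample would be replaced by an opened CSV file. A rate like this describes only what officers recorded, so it inherits the reporting limits discussed in this critique.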

However, the dataset omits certain details that might be key to understanding police stops. We cannot see exactly what contraband was found, the detailed case-by-case reason each stop occurred, or any information about the officer. The dataset also gives vague descriptions of the alleged infraction instead of specific violations, which would offer a clearer picture of what violations are levied against different demographics. Further, the San Francisco dataset does not record whether a frisk was performed. While that field is not crucial for our narrative, its absence removes potentially important insight into police behavior and motivations specific to San Francisco that could have strengthened it.

Additionally, the dataset lacks other demographic variables that could support different narratives. For instance, it includes the drivers’ age, sex, and race, but not their income level, citizenship status, or English proficiency. If these variables were recorded, they could reveal police prejudice related to these factors. Because the dataset also does not include background information on each district or neighborhood within the city, readers without prior knowledge may find it difficult to contextualize the socioeconomic trends of each neighborhood.
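One way to make these gaps concrete is to compare a file’s header row against the fields we would like to have. The sketch below does that with a made-up header; both the header and the wanted field names (income_level, frisk_performed, and so on) are hypothetical labels chosen for this example, not names defined by the dataset.

```python
import csv
import io

# Hypothetical header row; the real file's column names may differ.
HEADER_CSV = (
    "date,district,subject_age,subject_race,subject_sex,"
    "reason_for_stop,search_conducted,outcome\n"
)

# Fields this critique would like to analyze. Names are illustrative only.
WANTED = [
    "subject_age",
    "subject_race",
    "subject_sex",
    "income_level",
    "citizenship_status",
    "english_proficiency",
    "frisk_performed",
]


def missing_fields(header_text, wanted):
    """Return the wanted field names that do not appear in the header row."""
    columns = set(next(csv.reader(io.StringIO(header_text))))
    return [name for name in wanted if name not in columns]


missing = missing_fields(HEADER_CSV, WANTED)
print("Not recorded:", ", ".join(missing))
```

A check like this only shows whether a column exists. It cannot tell us how completely or accurately a recorded column was filled in.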

The data’s ontology creates relationships between the officer and the specific driver’s outcome in each situation.

The dataset records whether a search took place and what it found. This ties the result of the traffic stop to the officer and, inherently, to the officer’s unique perspectives, training, experiences, and underlying biases. However, since the data comes from police officer reports, there is also a chance that aspects of the stop were left out or fabricated. The dataset also does not describe key aspects of possible police brutality, such as the use of force, whether a body camera was on, or which specific unit or officer made the stop.

Overall, the dataset presents only part of the whole picture. While it describes the violation, whether a search occurred, and demographic attributes such as race, its limited scope restricts our understanding of other aspects that would help paint a full picture.