While I was at Smart Design, I worked on a project that used data analysis to investigate bicycle safety in New York City. We aimed to answer the question:
Have bicycle routes made cycling safer?
Vision Zero is a road traffic safety project, adopted by cities across the US, that aims to reduce fatalities and serious injuries to zero. It became official policy in New York City in 2014.
‘In the last five years, DOT (NYC Department of Transportation) has expanded and enhanced the on-street bike network by more than 330 miles, including more than 82 protected lane miles, with 20 miles installed in 2018. DOT installed over 66 lane miles of bike facilities, including 55 lane miles of dedicated cycling space in 2018.’ – NYC Department of Transportation
Despite this, many people feel not enough has been done, with more than 1,000 cyclists protesting in Washington Square Park in 2019.
Collisions and Routes
Both datasets were sourced from NYC Open Data via a free API.
Signed/marked routes: road painted with 'sharrows' (arrows and bicycle icons)
Conventional routes: typical lanes with painted white lines designating space for bicycles
Protected routes: a lane physically separated from other vehicles by parked cars or bollards
To visualise the datasets, we built a dashboard using Dash by Plotly. The interface offers different views of collisions and routes over time.
These visualisations gave a good overview of the data. To understand some of the patterns in more depth, we analysed the data in Python using Jupyter Notebooks.
Matching Collisions to Routes
These two separate datasets needed to be combined to discover exactly which collisions took place on which routes. The most obvious way of doing this would be to compare every route with every collision, known as a brute-force method, but the number of comparisons grows with the product of the two dataset sizes.
A much more efficient way of pairing collisions with routes was to use an R-tree. This tree data structure subdivides the map space, reducing the size of the problem. Bounding boxes are used to decide whether or not a subtree needs to be searched, which significantly reduces the number of comparisons. To facilitate this, each route was surrounded by a bubble, so that any collision falling inside the bubble could be matched to that route.
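The idea can be sketched in plain Python. This is a minimal, illustrative bounding-box tree in the spirit of an R-tree, not the project's actual code: it assumes planar coordinates, treats each route as a single straight segment, and uses made-up names (`BUFFER`, `build`, `match`) and made-up data. A real analysis would more likely rely on an existing spatial index library.

```python
from dataclasses import dataclass, field
from math import hypot

BUFFER = 1.0  # hypothetical "bubble" radius around each route, in map units

@dataclass
class Node:
    bbox: tuple                                   # (min_x, min_y, max_x, max_y)
    children: list = field(default_factory=list)  # child nodes (internal nodes only)
    route_id: object = None                       # set on leaf nodes only

def union(boxes):
    """Smallest box enclosing all the given boxes."""
    x0, y0, x1, y1 = zip(*boxes)
    return (min(x0), min(y0), max(x1), max(y1))

def build(routes, fanout=4):
    """Bulk-load the tree: leaves are buffered route boxes, parents enclose children."""
    level = []
    for rid, ((ax, ay), (bx, by)) in routes.items():
        box = (min(ax, bx) - BUFFER, min(ay, by) - BUFFER,
               max(ax, bx) + BUFFER, max(ay, by) + BUFFER)
        level.append(Node(box, route_id=rid))
    while len(level) > 1:
        level.sort(key=lambda n: (n.bbox[0], n.bbox[1]))  # crude spatial ordering
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(union(c.bbox for c in g), g) for g in groups]
    return level[0]

def candidates(node, x, y):
    """Yield route ids whose buffered box contains the point, skipping other subtrees."""
    if not (node.bbox[0] <= x <= node.bbox[2] and node.bbox[1] <= y <= node.bbox[3]):
        return  # the point is outside this box, so nothing below it can match
    if node.route_id is not None:
        yield node.route_id
    for child in node.children:
        yield from candidates(child, x, y)

def dist_to_segment(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def match(collisions, routes):
    """Map each collision id to the routes whose bubble it falls inside."""
    root = build(routes)
    return {cid: sorted(rid for rid in candidates(root, *pt)
                        if dist_to_segment(pt, *routes[rid]) <= BUFFER)
            for cid, pt in collisions.items()}

# Made-up example data: route id -> (start, end), collision id -> (x, y)
routes = {"A": ((0, 0), (10, 0)), "B": ((0, 5), (10, 5)), "C": ((20, 20), (30, 20)),
          "D": ((5, -5), (5, 10)), "E": ((40, 0), (40, 10))}
collisions = {"c1": (5, 0.5), "c2": (25, 20.2), "c3": (15, 15), "c4": (2, 4.5)}
matches = match(collisions, routes)
```

The bounding-box test is only a cheap filter; the exact distance check runs just on the few candidate routes that survive it, which is where the saving over brute force comes from.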
Normalising the data
We anticipated that the increasing number of cyclists in NYC would have a direct influence on the number of collisions taking place. A competing argument is the 'safety in numbers' theory, by which the more cyclists there are, the fewer accidents occur. For this investigation we assumed the first approach, with a linear relationship between the increase in cyclists and collisions. Each year's collision count was multiplied by a correction factor, calculated from the change in the number of cyclists over time.
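As a rough sketch of this correction, assuming collisions scale linearly with ridership, each year's count can be scaled back to the ridership level of a chosen base year. The function name, the choice of base year and all figures below are made up for illustration; they are not the project's data.

```python
def normalise_collisions(collisions_by_year, cyclists_by_year, base_year):
    """Scale each year's collisions to the ridership level of base_year.

    Correction factor for a year = cyclists in base year / cyclists in that year.
    """
    base = cyclists_by_year[base_year]
    return {year: count * base / cyclists_by_year[year]
            for year, count in collisions_by_year.items()}

# Hypothetical ridership estimates and raw collision counts
cyclists = {2014: 100_000, 2015: 110_000, 2016: 125_000}
collisions = {2014: 400, 2015: 440, 2016: 450}

normalised = normalise_collisions(collisions, cyclists, base_year=2014)
```

In this made-up example the raw count rises every year, but once ridership growth is accounted for, 2015 is level with 2014 and 2016 is lower.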
Route type makes a difference
On average, protected routes saw a decrease in collisions after installation, whereas signed/marked routes saw an increase in collisions.
Looking at the collision count per route, we calculated what percentage of each route type saw an increase or a decrease in collisions after installation:
Conventional routes are the most common and saw slightly more increases than decreases
38% of all signed/marked routes saw an increase
30% of protected routes saw a decrease, compared with 24% seeing an increase
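The per-route comparison behind these percentages could look something like the sketch below. It assumes each route already has a collision count for comparable periods before and after installation; the function name and the sample routes are hypothetical.

```python
from collections import Counter, defaultdict

def trend_shares(routes):
    """routes: iterable of (route_type, collisions_before, collisions_after).

    Returns, per route type, the percentage of routes in each trend category.
    """
    counts = defaultdict(Counter)
    for route_type, before, after in routes:
        if after > before:
            trend = "increase"
        elif after < before:
            trend = "decrease"
        else:
            trend = "no change"
        counts[route_type][trend] += 1
    return {route_type: {trend: round(100 * n / sum(c.values())) for trend, n in c.items()}
            for route_type, c in counts.items()}

# Made-up sample: (route type, collisions before, collisions after)
sample = [("protected", 5, 3), ("protected", 2, 2), ("protected", 1, 4), ("protected", 6, 2),
          ("signed/marked", 1, 3), ("signed/marked", 2, 2)]
shares = trend_shares(sample)
```

Keeping "no change" as its own category matters: it is why the increase and decrease percentages for a route type need not add up to 100.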
Cluttered road markings may be a cause
We wanted to see why some of these patterns were occurring. Google Street View allowed us to see how streets changed over time. By looking at the best- and worst-performing examples, we found that many of the worst-performing routes had extremely cluttered road markings, especially at large intersections.
Spring Street - Broadway, November 2017
Spring Street - Broadway, June 2019
Intersections see a lot of incidents
When looking at routes on the dashboard, it was clear that many had concentrations of incidents at intersections rather than mid-route. This pattern appears to hold across the city; however, there is currently no simple way to quantify it. (Read on to see how this was quantified in other cities.)
LOOKING WIDER: OTHER CITIES
By looking at the data available from other US cities, we could compare the different methods used to protect cyclists. We chose two metropolitan cities with good data sources: San Francisco and Boston. These can also be seen on the dashboard. The same methods of preprocessing and data analysis were used.
Sharrow location matters
Both San Francisco and Boston showed a different pattern of results to NYC when it came to signed/marked routes. In NYC these routes saw an increase in the number of collisions after installation, whereas in Boston and San Francisco they appeared to have a positive impact.
Using Google Street View to take a closer look at the design of signed/marked routes (routes that use painted arrow/bicycle symbols), we found that the exact positioning of the painted markings could be the reason for these differences.
In NYC, sharrows are placed to the side, implying that they form part of a dedicated bike lane.
In Boston, the sharrows are central in the lane, removing that possible implication and acting more as a reminder that bicycles are present. This could be why Boston sees much safer signed/marked routes.
San Francisco has also placed its sharrows centrally. Again, this could be a contributing factor to the safety of San Francisco's signed/marked routes.
Crash locations change over time
Unlike the NYC dataset, both the San Francisco and Boston datasets record whether each collision occurred on the street or at an intersection, allowing us to quantify the differences. In Boston, the balance of crashes shifted over time towards mid-street locations rather than intersections. In San Francisco the opposite pattern is shown, with a growing share of collisions taking place at intersections over time.
Using Google Street View, we could see that Boston streets have many more features for cyclists at intersections. These include conflict markings, bike boxes and two-stage turn boxes, all defined by clear design guidelines. By comparison, San Francisco has fewer of these and NYC has almost none. We think this could be the cause of the different location patterns.
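Once each record carries a location flag, the trend described above reduces to a per-year share. The sketch below assumes each crash is reduced to a `(year, at_intersection)` pair; the function name and the sample records are invented for illustration.

```python
from collections import defaultdict

def intersection_share_by_year(crashes):
    """crashes: iterable of (year, at_intersection) pairs, at_intersection being a bool.

    Returns the fraction of each year's crashes that happened at an intersection.
    """
    totals = defaultdict(int)
    at_intersections = defaultdict(int)
    for year, at_intersection in crashes:
        totals[year] += 1
        at_intersections[year] += bool(at_intersection)
    return {year: at_intersections[year] / totals[year] for year in sorted(totals)}

# Made-up records: (year, crash happened at an intersection?)
sample_crashes = [(2015, True), (2015, True), (2015, False), (2015, True),
                  (2016, True), (2016, False), (2016, False), (2016, False)]
share = intersection_share_by_year(sample_crashes)
```

A falling share over the years corresponds to the Boston pattern (crashes moving mid-street), and a rising share to the San Francisco pattern.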
By analyzing the New York dataset and comparing it with Boston and San Francisco, we came to three key findings:
The positioning of ‘sharrows’ on signed/marked routes is critical, and may explain why New York’s signed/marked routes perform worse than San Francisco’s and Boston’s
Cluttered road markings may be a major contributor to poor road safety, as seen at New York’s poorly performing intersections
Specialist features at intersections appear to have increased the safety of junctions in Boston and could be applied elsewhere
Overall, these cities are only as smart as the data they collect. Perhaps NYC has poor intersection performance because it isn’t collecting this data, leaving it unaware of the problem?