# Motivation

If you’ve been in New York for 10 days or 10 years, you’ve inevitably seen, if not ridden, one of our City’s cherished CitiBikes. Although not all New Yorkers love CitiBikes as much as our own Zoe Verzani, they are used as a primary mode of transportation (and exercise) for many NYC residents.

Our project aims to examine CitiBike data from January 2019 to December 2020 to understand the impact of COVID-19 on CitiBike ridership and local travel more generally.

In an effort to limit the scope of this work, our analysis focused on CitiBike rides that began and ended in Manhattan. To supplement the CitiBike ridership data, our group examined other public transportation data in New York (i.e., MTA data) to understand how changes in CitiBike ridership differed or mirrored changes in subway or bus ridership during the COVID-19 pandemic.

# Initial Questions

Initially, we wanted to understand how CitiBike ridership changed in response to the COVID pandemic and various events (i.e., beginning work from home, city-wide stay at home orders, etc.). We sought to examine changes in MTA ridership relative to CitiBike ridership, as well as various descriptive statistics of CitiBike riders. Common routes, meaning typical starting and ending destinations, were also of interest.

While we initially wanted to examine trends in ridership specifically related to hospital workers using CitiBike to get to and from work, we were not able to do this, as available data do not indicate industry in which a rider works.

Fortunately, the available data did allow us to explore our other questions and identify new ones. Ultimately, we landed on three categories of exploration:

1. Descriptive Statistics: Who uses CitiBike? What are the age and sex distributions of users? How did ridership compare between 2019 and 2020? How does trip duration vary by age group and sex? By month?

2. Travel Patterns: What days of the week are more or less popular for users to rent bikes before and during COVID-19? Where are CitiBike riders going? What are common starting points and destinations? Did travel patterns change between 2019 and 2020?

3. Commuting Services: How did CitiBike ridership change, if at all, due to COVID-19? How has public transportation and CitiBike ridership together changed due to COVID-19?

# Data

We used several data sources for this work:

### CitiBike Data

CitiBike ridership data are readily available through a public website. These data include CitiBike rides by month with descriptive information including: start and end stations, latitude and longitude of those stations, trip duration, and sex, birth year, and membership status of rider.

The first challenge we encountered was the sheer volume of data – some months had nearly 250,000 rides (observations). We downloaded the 24 month-based ridership files, pulled the names of those files into a list, and then wrote and ran a function to read the csv files into R and conduct the following data cleaning steps:

file_names <- list.files(path = "./data")

find_manhattan_rides <- function(data_name) {

set.seed(8105)

bike_rides_df <-
janitor::clean_names() %>%
filter(gender != 0,
year(starttime) - birth_year >= 18,
year(starttime) - birth_year <= 85,
!(start_station_id == end_station_id & tripduration < 180)
) %>%
mutate(
start_station_id = as.numeric(start_station_id),
end_station_id = as.numeric(end_station_id)
) %>%
filter(
start_station_id %in% pull(manhattan_stations, id),
end_station_id %in% pull(manhattan_stations, id)
) %>%
sample_frac(.01)

return(bike_rides_df)

}

manhattan_rides_df <- map_df(file_names, find_manhattan_rides)
• include only rides that start and end at stations in Manhattan (identified using a separate csv file, which listed all Manhattan stations manhattan_stations.csv)
• keep rides that were at least 3 minutes in duration and didn’t start and end at the same station
• delete rides with missing or placeholder values for any variable (i.e., gender = 0),
• keep only rides completed by individuals who were 18-85 years old in the year of their ride, and
• take a random sample of 1% of rides from each month to ensure reasonable computing time.

After obtaining the sample data above, we created the following variables, and the resulting datafile manhattan_rides_df:

manhattan_rides_df <-
manhattan_rides_df %>%
mutate(
gender = ifelse(gender == 1, "male", "female") %>%
factor(levels = c("male", "female")),
trip_min = tripduration/60,
day_of_week = wday(starttime, label = TRUE),
start_date = format(starttime, format = "%m-%d"),
stop_date = format(stoptime, format = "%m-%d"),
year = as.factor(year(starttime)),
age = ifelse(year == 2019, 2019 - birth_year, 2020 - birth_year),
age_group = cut(age,
breaks = c(-Inf, 25, 35, 45, 55, 65, 85),
labels = c("18-25","26-35", "36-45", "46-55", "56-65", "66-85"))
)
• trip_min: Trip duration in minutes, rather than seconds
• day_of_week: Day of the week when the trip started
• start_date: Date when the trip started, excluding year
• stop_date: Date when the trip ended, excluding year
• year: Year when trip took place
• age: Age of rider during the ride by subtracting birthyear from year of ride
• age_group: Categorical variable identifying age group or rider (18-25, 26-35, 36-45, 46-55, 56-65, or 66-85)

We then wrote this to a csv.

write_csv(manhattan_rides_df, "manhattan_rides.csv")

Additionally, we pulled the CitiBike Monthly Operating Reports and combined information on the number of each Membership type into a single dataset. This was used to show changes in CitiBike annual passes, single-day passes and single-trip passes from January 2019 through November 2021 (when we last pulled these reports).

membership_df =
mutate(date = factor(date, ordered = TRUE, levels = c("01/2019", "02/2019", "03/2019", "04/2019",
"05/2019", "06/2019", "07/2019", "08/2019",
"09/2019", "10/2019", "11/2019", "12/2019",
"01/2020", "02/2020", "03/2020", "04/2020",
"05/2020", "06/2020", "07/2020", "08/2020",
"09/2020", "10/2020", "11/2020", "12/2020",
"01/2021", "02/2021", "03/2021", "04/2021",
"05/2021", "06/2021", "07/2021", "08/2021",
"09/2021", "10/2021"))) %>%
rename(ct_annual_members = annual_members) %>%
rename(ct_single_trip_passes = single_trip_passes) %>%
rename(ct_single_day_passes = single_day_passes) %>%
rename(ct_three_day_passes = three_day_passes) %>%
rename(ct_total_annual_members = total_annual_members) %>%
mutate(date = as.yearmon(date, "%m/%Y")) %>%
mutate(date = as.jul(date)) %>%
mutate(date = as.Date(date)) %>%
pivot_longer(
cols = starts_with("ct"),
names_to = "measure",
values_to = "count"
)

measures_of_interest =
membership_df %>%
filter(measure %in% c("ct_annual_members","ct_single_day_passes","ct_single_trip_passes"))

### MTA Data

Additionally, we wanted to pair the CitiBike data with data related to ridership of other forms of local transportation, namely buses and subways in New York. These data are also publicly available through the MTA website.

We used two MTA datasets:

### Turnstile data

Subway turnstiles data from 2019 and 2020 were downloaded. These datasets included information about the number of entries and exits from a given turnstile across all subway stations. We filtered to only Manhattan stations and converted latitude and longitude of stations to numeric variables, creating the turnstiles_2019_data and turnstiles_2020_data datasets. We then merged those datasets, resulting in the datafile called shiny_data. We created a new variable describing the percent change in ridership (exit and entry) per station from 2019 to 2020. This file was used to build our Subway Ridership Shiny App. This new data file contains 12 variables, some of which are listed below:

## Turnstile Data
mutate(gtfs_latitude = as.numeric(gtfs_latitude),
gtfs_longitude = as.numeric(gtfs_longitude),
borough = recode(borough, Q = "Queens", Bk = "Brooklyn", Bx = "Bronx", M = "Manhattan", SI = "Staten Island"),
month = format(as.Date(date), "%m"),
day_month = format(date, "%m/%d")) %>%
drop_na()

### calculating each stop's monthly average rates in 2019 (pre pandemic)
turnstiles_data_2019_averages =
turnstiles_data_2019 %>%
group_by(month, complex_id) %>%
summarise(avg_exits = mean(exits),
avg_entries = mean(entries))

mutate(gtfs_latitude = as.numeric(gtfs_latitude),
gtfs_longitude = as.numeric(gtfs_longitude),
month = format(as.Date(date), "%m"),
borough = recode(borough, Q = "Queens", Bk = "Brooklyn", Bx = "Bronx", M = "Manhattan", SI = "Staten Island"),
day_month = format(date, "%m/%d")) %>%
drop_na()

### merging 2020 data with the monthly average turnstile usage for 2019 - pre pandemic
total = merge(turnstiles_data_2020, turnstiles_data_2019_averages, by = c("complex_id", "month")) %>%
mutate(entries_pc = (((entries-avg_entries)/avg_entries)*100),
exits_pc = (((exits-avg_exits)/avg_exits)*100)) %>%
mutate(entries_pc = ifelse(entries_pc>100, 100, entries_pc),
exits_pc = ifelse(exits_pc>100, 100, exits_pc)) %>%
select(-c(day_month, avg_entries, avg_exits, entries, exits, month))

### creating data for Shiny App
shiny_data = total %>%
pivot_longer(
cols = entries_pc:exits_pc,
names_to = "turnstile_type",
values_to = "turnstile_count"
) %>%
drop_na()
• stop_name: Station name
• daytime_routes: Subway lines
• turnstile_type: Describing if the rider is entering or exiting the station
• turnstile_count: Number of entries/exits at a given turnstile

### Percent Change data

Additionally, we downloaded COVID MTA ridership data, which details the percent change in MTA ridership during COVID by transportation type.

ridership_covid_changes = read_csv("covid-ridership.csv") %>%
janitor::clean_names() %>%
mutate(date = as.Date(date, "%m/%d/%Y")) %>%
rename(buses_ter = buses_total_estimated_ridership) %>%
rename(lirr_ter = lirr_total_estimated_ridership) %>%
rename(metro_north_ter = metro_north_total_estimated_ridership) %>%
rename(subways_ter = subways_total_estimated_ridership) %>%
rename(subways_pc = subways_percent_change_from_pre_pandemic_equivalent_day) %>%
rename(metro_north_pc = metro_north_percent_change_from_2019_monthly_weekday_saturday_sunday_average) %>%
rename(lirr_pc = lirr_percent_change_from_2019_monthly_weekday_saturday_sunday_average) %>%
rename(buses_pc = buses_percent_change_from_pre_pandemic_equivalent_day) %>%
rename(bridges_and_tunnels_pc = bridges_and_tunnels_percent_change_from_pre_pandemic_equivalent_day) %>%
rename(access_a_ride_ter = access_a_ride_total_scheduled_trips) %>%
rename(access_a_ride_pc = access_a_ride_percent_change_from_pre_pandemic_equivalent_day)

ridership_covid_changes_2020 = ridership_covid_changes %>%
filter(date <= as.Date('2020-12-31'))

ridership_covid_pc_tidy =
ridership_covid_changes_2020 %>%

select(date, access_a_ride_pc, bridges_and_tunnels_pc, buses_pc, lirr_pc, metro_north_pc, subways_pc) %>%

pivot_longer(
c(access_a_ride_pc:subways_pc),
names_to = "transit_system",
values_to = "percent_change"
) %>%

mutate(transit_system = gsub("_pc", "", transit_system),
percent_change = gsub("%", "", percent_change),
percent_change = as.numeric(percent_change))

After importing and cleaning, the resulting datafile, ridership_covid_pc_tidy, includes the following variables: * date: Date * transit_system: Mode of transportation (Access a Ride, Bridges and Tunnels, Buses, LIRR, Metro North, or Subways) * percent_change: Percent change in rides from the year prior

# Exploratory Analysis

As outlined above, we had several categories of questions to explore and an even greater number of questions.

### Descriptive Statistics:

CitiBike Rides Over the Year: We started the exploratory analysis by confirming a few of our hypotheses: Did ridership go down during the initial months of COVID-19 lockdown? Yes!

manhattan_rides_df %>%
group_by(start_date, year) %>%
summarize(obs = n()) %>%
ggplot(aes(x = start_date, y = obs, group = year, color = year)) +
geom_line() +
geom_smooth(se = FALSE) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, color = rep(c("black", rep("transparent", each = 10)), 365)))

Average Ride Length per Month: We then looked at trip duration by month, assuming that longer trips typically occur in warmed months and shorter trips take place during the cooler months. Upon initial examination of trip duration data, we found many outliers, namely trips of over 1,000,000 seconds. We assumed that these trips included people who did not properly dock the bikes at the end station, some who stole bikes or took them home for convenient use in the future, or similar situations. We filtered out these outliers using the IQR method and then examined trip duration using boxplots. We actually found that trip duration was relatively stable in 2019, but increased in March 2020. This is likely because as people were less comfortable using the subway during COVID, they turned to longer CitiBike rides.

monthly_Q1 <- quantile(pull(manhattan_rides_df, trip_min), probs = 0.25)
monthly_Q3 <- quantile(pull(manhattan_rides_df, trip_min), probs = 0.75)
monthly_inter_quart <- IQR(pull(manhattan_rides_df, trip_min))

monthly_rides_df <-
manhattan_rides_df %>%
filter(
trip_min >= monthly_Q1 - 1.5*monthly_inter_quart,
trip_min <= monthly_Q3 + 1.5*monthly_inter_quart
)

monthly_rides_df %>%
group_by(year) %>%
mutate(
month = month(starttime, label = T)
) %>%
plot_ly(
x = ~month,
y = ~trip_min,
color = ~year,
type = "box",
colors = "viridis") %>%
layout(
boxmode = "group",
title = "Duration of Citibike Rides by Month",
xaxis = list(title = "Month"),
yaxis = list(title = "Trip Duration in Minutes")
)

Number of Rides per Month: We also found that the number of rides dropped meaningfully when the stay-at-home orders were in place in New York in March and April. Folks didn’t use CitiBikes as much when there was nowhere they could go!

manhattan_rides_df %>%
group_by(year) %>%
mutate(
month = month(starttime, label = T)
) %>%
group_by(year, month) %>%
summarise(obs = n()) %>%
plot_ly(
x = ~month,
y = ~obs,
color = ~year,
type = "scatter",
mode = "lines",
colors = "viridis") %>%
layout(
title = "Number of Citibike Rides per Month",
xaxis = list(title = "Month"),
yaxis = list(title = "Rides")
)

Trip duration by Age and Sex: Then, we examined demographic information such as age and sex by trip duration. Specifically, we generated boxplots of trip duration in minutes by gender and age group.

#this generates a plot of boxplots of trip duration (minutes) by gender and age_group
trip_dur_age_gender_df %>%
mutate(
gender = str_to_sentence(gender)
) %>%
plot_ly(x = ~age_group, y = ~trip_min, color = ~gender, type = "box", colors = "viridis") %>%
layout(
boxmode = "group",
xaxis = list(title = "Age Range"),
yaxis = list(title = "Trip Duration (min)"),
legend = list(title = list(text = "<b> Gender </b>"))
)

Average Age of CitiBike Riders by Sex: We also looked at the average age per day by gender. Generally, male riders tended to be older throughout the year.

# this dataframe groups rides by gender and month
# and provides the average age for each gender in that month
# along with the standard deviation, standard error, lower bound, and upper bound
# which are then used to create a plotly graph where we get average age per month for each gender
# with 95% confidence bands around each line
avg_age_per_month_df <-
mutate(
date = floor_date(as_date(starttime), "month")
) %>%
select(date, gender, age) %>%
group_by(date, gender) %>%
summarize(
total = n(),
avg_age = mean(age),
sd_age = sd(age)
) %>%
mutate(
sem = sd_age/sqrt(total - 1),
lower_bound = avg_age + qt(0.025, df = total - 1) * sem,
upper_bound = avg_age - qt(0.025, df = total - 1) * sem
) %>%
ungroup()

avg_age_plot <-
avg_age_per_month_df %>%
mutate(
gender = str_to_sentence(gender)
) %>%
ggplot(aes(x = date, y = avg_age, color = gender)) +
geom_line(size = 1, alpha = 0.8) +
geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound), alpha = 0.2)

ggplotly(avg_age_plot) %>%
layout(
xaxis = list(title = "Date"),
yaxis = list(title = "Age"),
legend = list(title = list(text = "<b> Gender </b>"))
)

### Travel Patterns

Next, we looked at how travel patterns changed during the pandemic.

Rides by Day of Week: In 2019, we would assume that many rides ocurred during the week - going to and from the office. Our hypothesis was that the number of rides per day was higher in 2019 during the week than in 2020. It was true! Interestingly, there were slightly more rides on the weekends in 2020, likely because of a predisposition to use biking as a mode of transportation, rather than subways, during COVID-19.

Monthly Destinations in 2019 vs. 2020: We then examined ending stations across Manhatann in 2019 vs. 2020. We noticed that CitiBike opened more stations in Northern Manhattan in 2020, leading to an increase in rides uptown. Finally, we see that 2020 started off with more rides, but then we see that ridership plummets in March before recovering slightly.

nyc =
as_tibble(
map_data("state")
) %>%
filter(subregion == "manhattan")

bikes_month_2019 = manhattan_rides_df %>%
filter(year == "2019") %>%
mutate(
month = format(starttime, format = "%m"),
month = as.numeric(month)
) %>%
group_by(month, end_station_longitude, end_station_latitude) %>%
summarise(n_rides = n())

bikes_month_2020 = manhattan_rides_df %>%
filter(year == "2020") %>%
mutate(
month = format(starttime, format = "%m"),
month = as.numeric(month)
) %>%
group_by(month, end_station_longitude, end_station_latitude) %>%
summarise(n_rides = n())

map_month_2019 =
ggplot() +
geom_polygon(data = nyc, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.5) +
geom_point(data = bikes_month_2019, aes(x = end_station_longitude, y = end_station_latitude, size = 0.25, color = n_rides), alpha = 0.5) +
scale_size(range = c(1,8), name = "Rides") +
geom_point(data = hospitals_df, aes(x = Long, y = Lat), color = "red") +
geom_point(data = hubs_df, aes(x = Long, y = Lat), color = "green") +
transition_time(month) +
labs(
title = "2019 Month: {frame_time}",
color = "Rides",
x = "Longitude",
y = "Latitude") +
enter_grow() +
exit_shrink() +
ease_aes("sine-in-out") +
coord_cartesian(ylim = c(40.68, 40.85), xlim = c(-74.02, -73.925))

map_month_2020 =
ggplot() +
geom_polygon(data = nyc, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.5) +
geom_point(data = bikes_month_2020, aes(x = end_station_longitude, y = end_station_latitude, size = 0.25, color = n_rides), alpha = 0.5) +
scale_size(range = c(1,8), name = "Rides") +
geom_point(data = hospitals_df, aes(x = Long, y = Lat), color = "red") +
geom_point(data = hubs_df, aes(x = Long, y = Lat), color = "green") +
transition_time(month) +
labs(
title = "2020 Month: {frame_time}",
color = "Rides",
x = "Longitude",
y = "Latitude") +
theme(legend.position = "none") +
enter_grow() +
exit_shrink() +
ease_aes("sine-in-out") +
coord_cartesian(ylim = c(40.68, 40.85), xlim = c(-74.02, -73.925))

#combining both animations using magick package
a_month_gif = animate(map_month_2019, duration = 12, fps = 1, width = 400, height = 400)
b_month_gif = animate(map_month_2020, duration = 12, fps = 1, width = 400, height = 400)

new_month_gif = image_append(c(a_month_mgif[1], b_month_mgif[1]))
for (i in 2:12) {
combined_month = image_append(c(a_month_mgif[i], b_month_mgif[i]))
new_month_gif = c(new_month_gif, combined_month)
}

new_month_gif

Daily Destinations in 2019 and 2020 in Lower Manhattan: We then wanted to examine destinations specifically in Lower Manhattan near hospitals. An interesting finding was that even when ridership drops in 2020, we see that common destinations continue to be near hospitals and other healthcare hubs.

bikes_2019 = manhattan_rides_df %>%
filter(year == "2019") %>%
group_by(stop_date, end_station_longitude, end_station_latitude) %>%
summarise(n_rides = n()) %>%
mutate(
date = paste0("2019-", stop_date),
date = as.Date(date, "%Y-%m-%d"))

bikes_2020 = manhattan_rides_df %>%
filter(year == "2020") %>%
group_by(stop_date, end_station_longitude, end_station_latitude) %>%
summarise(n_rides = n()) %>%
mutate(
date = paste0("2020-", stop_date),
date = as.Date(date, "%Y-%m-%d"))

timeline_2019 =
bikes_2019 %>%
filter(date >= as.Date("2019-02-16"), date <= as.Date("2019-06-30"))

timeline_2020 =
bikes_2020 %>%
filter(date >= as.Date("2020-02-16"), date <= as.Date("2020-06-30"))

daily_map_2019 =
ggplot() +
geom_polygon(data = nyc, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.5) +
geom_point(data = timeline_2019, aes(x = end_station_longitude, y = end_station_latitude, size = 0.25, color = n_rides), alpha = 0.5) +
geom_point(data = hospitals_df, aes(x = Long, y = Lat), color = "red") +
geom_point(data = hubs_df, aes(x = Long, y = Lat), color = "green") +
transition_time(date) +
labs(
title = "Date: {frame_time}",
color = "Rides",
x = "Longitude",
y = "Latitude") +
scale_fill_continuous(breaks = c(2.5, 5.0, 7.5, 10.0, 12.5)) +
enter_grow() +
exit_shrink() +
ease_aes("sine-in-out") +
coord_cartesian(ylim = c(40.725, 40.775), xlim = c(-74.0125, -73.925))

daily_map_2020 =
ggplot() +
geom_polygon(data = nyc, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.5) +
geom_point(data = timeline_2020, aes(x = end_station_longitude, y = end_station_latitude, size = 0.25, color = n_rides), alpha = 0.5) +
scale_fill_continuous(limits = c(0,12.5), breaks = c(0, 2.5, 5.0, 7.5, 10.0, 12.5)) +
geom_point(data = hospitals_df, aes(x = Long, y = Lat), color = "red") +
geom_point(data = hubs_df, aes(x = Long, y = Lat), color = "green") +
transition_time(date) +
labs(
title = "Date: {frame_time}",
color = "Rides",
x = "Longitude",
y = "Latitude"
) +
theme(legend.position = "none") +
enter_grow() +
exit_shrink() +
ease_aes("sine-in-out") +
coord_cartesian(ylim = c(40.725, 40.775), xlim = c(-74.0125, -73.925))

#combining both animations using magick package
a_daily_gif = animate(daily_map_2019, duration = 30, fps = 2, width = 400, height = 400)
b_daily_gif = animate(daily_map_2020, duration = 30, fps = 2, width = 400, height = 400)

#combining both animations using magick package
a_daily_gif = animate(daily_map_2019, duration = 30, fps = 2, width = 400, height = 400)
b_daily_gif = animate(daily_map_2020, duration = 30, fps = 2, width = 400, height = 400)

new_daily_gif = image_append(c(a_daily_mgif[1], b_daily_mgif[1]))
for (i in 2:60) {
combined_daily = image_append(c(a_daily_mgif[i], b_daily_mgif[i]))
new_daily_gif = c(new_daily_gif, combined_daily)
}

new_daily_gif

Common Rides to Manhattan Locations: We were also interested in common travel patterns around Manhattan from 2019-2020. To examine this, we selected 16 locations from around Manhattan and found the nearest CitiBike station to that location. Then, we summarized the data to find the number of trips that start at one station and then end at the stations closest to the Manhattan locations. Next, we filtered the data to show only the most common trip that end at each of the 16 CitiBike stations for our 16 locations (in the event of a tie, we select the start station that’s furthest away from the end station of interest). Then, we also determined the average age of a rider ending their trips at those 16 destinations.

We created a shiny app to visualize the journeys to various Manhattan locations, which you can view here. Please look at our Travel Patterns page to learn how to interact with our shiny app.

### Commuting Services

In this part of our analysis, we wanted to examine CitiBike usage by itself and relative to other commuting services (i.e., MTA).

CitiBike Memberships Purchased by Month in NYC and New Jersey City: This animated graph demonstrates the growth of single day passes in the summer of 2020 and 2021 relative 2019. As expected, New Yorkers (and tourists!) are eager to use CitiBike during warmer months.

measures_of_interest %>%
ggplot(aes(x = date, y = count)) +
geom_line(aes(group = measure, color = measure)) +
geom_point(aes(group = seq_along(date), color = measure)) +
transition_reveal(date) +
scale_x_date(breaks = "4 months", labels = date_format("%b-%Y")) +
scale_colour_discrete(name = "Membership Type",
breaks = c("ct_annual_members", "ct_single_day_passes",
"ct_single_trip_passes"),
labels = c("Annual Passes Renewed or Purchased",
"Single-Day Passes Purchased",
"Single-Trip Passes Purchased")) +
labs(x = "Date", y = "Count") +
ggtitle("CitiBike Memberships Purchased by Month in NYC and New Jersey City") +
theme_minimal() +
theme(legend.position = "bottom") +
guides(color = guide_legend(nrow = 2, byrow = TRUE)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Ridership Transit System Percent Change Following COVID: We then wanted to examine how ridership of all commuting services changed from 2019 compared to 2020. We found that every transit service experienced a drop in ridership in March 2020 compared to 2019 levels, coinciding with the start of lockdown. However, CitiBike usage recovered and in many cases exceeded its 2019 levels, while other forms of transit stayed below the usage levels seen in 2019.

renamed_ridership =
ridership_pc_change %>%
mutate(transit_system = factor(transit_system, levels = c("access_a_ride", "bridges_and_tunnels",
"buses", "citi_bike", "lirr", "metro_north",
"subways"),
labels = c("Access a Ride", "Bridges and Tunnels", "Buses",
"Citi Bike", "Long Island Railroad", "Metro North",
"Subways")))

# Using plotly
renamed_ridership %>%
ungroup() %>%
drop_na() %>%
filter(date >= as.Date('2020-03-01')) %>%
plot_ly(
x = ~date,
y = ~percent_change,
color = ~transit_system,
type = "scatter",
mode = "lines") %>%
layout(
title = "Ridership Transit System Percent Change Following COVID",
xaxis = list(title = "Date"),
yaxis = list(title = "Percent Change")
)

Shiny App: Subway Ridership Percent Change from 2019 to 2020: Finally, we wanted to examine how subway ridership changes across locations between 2019 and 2020. Specifically, we looked at entry and exit data from Subway turnstiles to understand where people are going to and coming from throughout the year.

We created a shiny app to visualize the percent change in subway ridership to various NYC locations across the 5 boroughs, which you can view here. Please look at our Commuting Services page to learn how to interact with our shiny app.

We did not undertake any formal statistical analyses, as our research questions were descriptive in nature.

# Discussion

The goal of this project was to examine how usage of CitiBikes and overall NYC transit changed as a result of the COVID-19 pandemic. Our work shows that there were more rides in 2019 compared to the early months of the pandemic in 2020. There was a dip in ridership in March of 2020 when cases first started arising in NYC. On average, trip durations were longer throughout 2020 than in 2019. The longest rides occurred between March 2020 and August 2020. This is not surprising given that during this time, many people were avoiding using the MTA as a mode of transportation to avoid close contact with others. There was a major spike in all forms of CitiBike memberships from April 2020 through August 2020, and this spike was most significant for single-trip purchases. Another even more significant spike occurred from May 2021 through August 2021.

We see these same patterns displayed when examining our animated maps looking at month to month ridership when comparing 2019 to 2020. Interestingly in August of 2020, CitiBike expanded the number of stops they have in Northern Manhattan. Clearly these new stations were very popular among riders in that area. We created a Shiny App to allow the user to explore common journeys through various locations in Manhattan. Destinations include Columbia University and the Met.

We also found interesting patterns of Citi Bike ridership by age and gender. Across all time points from 2019 to 2020, the average age of riders was higher in males than in females. Not surprisingly, younger riders have, on average, longer trip durations than older riders.

In addition, we examined how ridership changed starting at the beginning of the pandemic for CitiBike as well as other NYC transit systems, including the Subway, Metro North, and Buses. From this graph, we can see that every transit service experienced a drop in ridership in March 2020 when compared to 2019 levels, which coincides with the start of lockdown. However, eventually, CitiBike usage recovered and, in many cases, exceeded its 2019 levels, while other forms of transit stayed below the usage levels seen in 2019. This finding was not surprising. We also created a shiny app to allow the user to see daily percent change of Subway Ridership data comparing 2019 to 2020. We can see that in March ridership went down by nearly 100% and remained low for the rest of the year.

Overall, our project demonstrates that all forms of transportation across NYC fell in their usage when the Covid-19 pandemic first hit the city. It also shows that at times when public transportation usage was down, CitiBike usage went up. Despite the difficulties that the pandemic has brought to the city, CitiBike experienced an increase in ridership and an expansion of stations.