New York City Bike-Sharing During COVID-19 (Data Analysis)

🇷🇺 RUSSIAN-LANGUAGE VERSION

Introduction
About The Company
Research Project Roadmap
Ask
- 4.1. Objective
Prepare
- 5.1. Data Organization
- 5.2. Data Validation And Accessibility
- 5.3. Data Privacy
- 5.4. The Code Used For The «Prepare» Step
Process
- 6.1. The Code Used For The «Process» Step
Analyze
- 7.1. The Code Used For The «Analyze» Step
Share
Act
References

1. Introduction

This professional research project is a part of my portfolio as a Data Analyst — I hope you will find it useful and interesting!

2. About The Company

«Citi Bike» is a privately owned public bicycle-sharing system serving New York City (NY) as well as Jersey City (NJ) and Hoboken (NJ).

In October 2017, the system reached a total of 50 million rides, and in July 2020 it reached 100 million rides. As of July 2019, there were 169,000 annual subscribers. Monthly average ridership numbers increased above 100,000 for the first time in June 2021.

3. Research Project Roadmap

The project follows the six-step data analysis process:

Ask
Prepare
Process
Analyze
Share
Act

4. Ask

The step will consist of these tasks:

Ask SMART and effective questions.
Summarize data.
Manage a team’s and stakeholders’ expectations.

4.1. Objective

The objective is to help design a new marketing strategy to convert casual riders into annual members.

Four questions might guide a possible future marketing program:

How do annual members and casual riders use «Citi Bike» bikes differently?
Why would casual riders buy «Citi Bike» annual memberships?
How can «Citi Bike» influence casual riders to become members?
How can «Citi Bike» influence annual members to use its services even more?

5. Prepare

This step will consist of these key tasks:

Address issues of bias and credibility of data, ensure ethical data practices.
Access databases and import data.
Organize and protect data.

5.1. Data Organization

«Citi Bike» historical trip data from March 2021 to March 2022 was used to analyze and identify various trends. Data was stored in a separate directory and copies of every dataset were made.

5.2. Data Validation And Accessibility

The first-party data provided was taken from the company website. The dataset is public and available for everyone to use.
The data has been processed to remove trips taken by staff and any trips that were below 60 seconds in length.
The data was provided according to the «Citi Bike» Data Use Policy.

5.3. Data Privacy

Due to data privacy, I will not use riders’ personal identification information, and this will prevent me from determining if a single user/rider has taken several rides. All ride IDs (ride_id) are unique in this dataset.

5.4. The Code Used For The «Prepare» Step

First, I loaded the R packages needed for the project into «RStudio», an Integrated Development Environment (IDE) for «R». Any packages missing from the system were installed beforehand with this command: install.packages("package name")

# Load the project-related packages

# install.packages("tidyverse")  # run once for each package that is missing
library(tidyverse)
library(janitor)
library(ggmap)
library(geosphere)
library(lubridate)
library(gridExtra)

Then, I downloaded the related datasets from 03/2021 to 03/2022 from the «Citi Bike» website and imported them into «RStudio». The variables representing these datasets have easy-to-understand names such as tripdata202103 or tripdata202203.

# Import the datasets to RStudio

tripdata202103 <- read_csv("202103-citibike-tripdata.csv")
tripdata202104 <- read_csv("202104-citibike-tripdata.csv")
tripdata202105 <- read_csv("202105-citibike-tripdata.csv")
tripdata202106 <- read_csv("202106-citibike-tripdata.csv")
tripdata202107 <- read_csv("202107-citibike-tripdata.csv")
tripdata202108 <- read_csv("202108-citibike-tripdata.csv")
tripdata202109 <- read_csv("202109-citibike-tripdata.csv")
tripdata202110 <- read_csv("202110-citibike-tripdata.csv")
tripdata202111 <- read_csv("202111-citibike-tripdata.csv")
tripdata202112 <- read_csv("202112-citibike-tripdata.csv")
tripdata202201 <- read_csv("202201-citibike-tripdata.csv")
tripdata202202 <- read_csv("202202-citibike-tripdata.csv")
tripdata202203 <- read_csv("202203-citibike-tripdata.csv")

Since there was no need to work on the imported data sets one by one, I decided to merge all of them into the single dataset called tripdata and work on it from now on.

# Merge individual monthly datasets into a single large dataset

tripdata <- bind_rows(tripdata202103, tripdata202104, tripdata202105, tripdata202106, tripdata202107, tripdata202108, tripdata202109, tripdata202110, tripdata202111, tripdata202112, tripdata202201, tripdata202202, tripdata202203)

Unfortunately, the merge failed because the end_station_id column has different data types across the imported datasets. The error message I received provides more information:

One or more parsing issues, see `problems()` for details 
> tripdata <- bind_rows(tripdata202103, tripdata202104, tripdata202105, tripdata202106, tripdata202107, tripdata202108, tripdata202109, tripdata202110, tripdata202111, tripdata202112, tripdata202201, tripdata202202, tripdata202203)
Error in `bind_rows()`:
 ! Can't combine `end_station_id` <double> and `end_station_id` <character>.
Run `rlang::last_error()` to see where the error occurred.

Therefore, I needed to make the end_station_id column numeric in every imported dataset. I reviewed the columns of each dataset and found that only tripdata202110 had end_station_id stored as the character type.

# A review of column types of the tripdata202110 dataset

str(tripdata202110)
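
The same check can be run over all the monthly datasets at once instead of reading each str() output by hand. The following is a minimal sketch in base R, not the code I originally ran: the monthly list and its tiny data frames are hypothetical stand-ins for the imported datasets, and the ID values are made up for illustration.

```r
# Hypothetical stand-ins for three of the imported monthly datasets
monthly <- list(
  tripdata202109 = data.frame(end_station_id = c(5905.12, 6140.05)),
  tripdata202110 = data.frame(end_station_id = c("5905.12", "JC013")),
  tripdata202111 = data.frame(end_station_id = c(5905.12, 4846.01))
)

# Class of the end_station_id column in every dataset
id_types <- sapply(monthly, function(df) class(df$end_station_id))
print(id_types)

# Names of the datasets whose column is not numeric
odd_ones <- names(id_types)[id_types != "numeric"]
print(odd_ones)
```

With the real data, the list could be built from the tripdata202103 to tripdata202203 variables, and odd_ones would name the datasets that need a type conversion before merging.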

I changed the end_station_id column of this data set to the numeric type and tried to merge individual monthly data sets (tripdata202103, tripdata202104, tripdata202105, etc.) into the single large data set called tripdata again.

# Change end_station_id to the numeric type for the tripdata202110 dataset

tripdata202110 <- mutate(tripdata202110, end_station_id = as.numeric(end_station_id))

# Merge individual monthly datasets into a single large dataset, attempt no. 2

tripdata <- bind_rows(tripdata202103, tripdata202104, tripdata202105, tripdata202106, tripdata202107, tripdata202108, tripdata202109, tripdata202110, tripdata202111, tripdata202112, tripdata202201, tripdata202202, tripdata202203)
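
An alternative worth considering is to fix the column type at import time, so that every monthly file agrees before bind_rows() is called. The sketch below is only an illustration under stated assumptions: the two CSV files are tiny hypothetical stand-ins written to a temporary directory, and the ID values (including one containing letters) are made up. Reading the ID column as character avoids the NA values that as.numeric() produces for any ID that is not a plain number.

```r
library(readr)
library(dplyr)

# Two tiny stand-in files (hypothetical contents) that reproduce the type clash
demo_dir <- tempfile("tripdata_demo_")
dir.create(demo_dir)
writeLines(c("ride_id,end_station_id", "A1,5905.12", "A2,6140.05"),
           file.path(demo_dir, "202109-demo-tripdata.csv"))
writeLines(c("ride_id,end_station_id", "B1,5905.12", "B2,JC013"),
           file.path(demo_dir, "202110-demo-tripdata.csv"))

demo_files <- list.files(demo_dir, pattern = "-tripdata\\.csv$", full.names = TRUE)

# Read every file with the same explicit type for the ID column
demo_list <- lapply(
  demo_files,
  read_csv,
  col_types = cols(ride_id = col_character(), end_station_id = col_character())
)

# The merge now succeeds because the column types agree
tripdata_demo <- bind_rows(demo_list)
print(tripdata_demo)
```

The trade-off is that end_station_id becomes text, which is usually acceptable for an identifier that is never used in arithmetic.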

As the second try was successful and the tripdata data set was created, I was able to see a list of its column names, the first rows, a statistical summary, and other details. I needed that to better understand the data I was working with.

# View of the merged dataset in a table form

View(tripdata)

# Check the merged dataset
## See a list of column names

print("A list of column names:")
colnames(tripdata)

## See the first 6 rows

print("The first 6 rows:")
head(tripdata)

## See a list of columns and data types (numeric, character, etc.)

print("A list of columns and data types (numeric, character, etc.):")
str(tripdata)

## Glimpse of data

print("Glimpse:")
glimpse(tripdata)

## Statistical summary of data

print("Summary:")
summary(tripdata)

6. Process

The step will consist of these key tasks:

Connect business objectives to data analysis.
Clean small and large datasets using the «R» programming language.
Document the data-cleaning process.

6.1. The Code Used For The «Process» Step

To draw unbiased conclusions from the tripdata data set, I needed to clean it first. To do that, I removed all rows containing empty (NA) values and stored the cleaned result in the tripdata_clean variable.

# Clean the tripdata dataset to be able to work with it properly
## Drop all rows with NA (empty) values

tripdata_clean <- drop_na(tripdata)

## View the cleaned merged dataset in a table form

View(tripdata_clean)
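
Before dropping rows, it can help to see how many values are missing in each column and how many rows the cleaning removes. This is a minimal sketch on a hypothetical 4-row data frame (the station names and coordinates are made up); it is not part of the original analysis.

```r
library(tidyr)

# Hypothetical sample with one missing station name and one missing coordinate
demo <- data.frame(
  ride_id = c("A1", "A2", "A3", "A4"),
  start_station_name = c("W 21 St", NA, "Broadway", "E 2 St"),
  end_lat = c(40.74, 40.75, NA, 40.72)
)

# Number of NA values per column
na_per_column <- colSums(is.na(demo))
print(na_per_column)

# drop_na() removes every row that has an NA in any column
demo_clean <- drop_na(demo)
rows_removed <- nrow(demo) - nrow(demo_clean)
print(rows_removed)
```

Running the same two checks on tripdata would show which columns drive the row loss, which is useful context when documenting the cleaning step.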

7. Analyze

The step will consist of these key tasks:

Sort, filter, convert, and format data using the «R» programming language.
Substantiate data analysis process.
Seek feedback and support from others during data analysis.

7.1. The Code Used For The «Analyze» Step

I converted the start timestamp to a date type suitable for calculations and separated the month, day, year, and day of the week values into their own columns. I needed that for convenience in analyzing the tripdata_clean data set and for future data manipulations.

# Create new columns
## Store the date in a new column with a type appropriate for calculations

tripdata_clean$date <- as.Date(tripdata_clean$started_at)

## Separate the dates into month

tripdata_clean$month <- format(as.Date(tripdata_clean$date), "%m")

## Separate the dates into day

tripdata_clean$day <- format(as.Date(tripdata_clean$date), "%d")

## Separate the dates into year

tripdata_clean$year <- format(as.Date(tripdata_clean$date), "%Y")

## Separate the dates into day of a week

tripdata_clean$day_of_week <- format(as.Date(tripdata_clean$date), "%A")
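
The date split can be tried on a couple of timestamps to see what each new column holds. The sketch below uses two hypothetical start times; unlike the code above, it stores the day of the week as a number via lubridate's wday(), because the "%A" format returns a weekday name that depends on the system locale.

```r
library(lubridate)

# Two hypothetical start timestamps
demo <- data.frame(
  started_at = ymd_hms(c("2021-03-06 14:05:00", "2022-03-09 08:30:00"), tz = "UTC")
)

# Same split as above: date first, then its components
demo$date  <- as.Date(demo$started_at)
demo$month <- format(demo$date, "%m")
demo$day   <- format(demo$date, "%d")
demo$year  <- format(demo$date, "%Y")

# Numeric day of the week, 1 = Monday ... 7 = Sunday (locale-independent)
demo$day_of_week <- wday(demo$date, week_start = 1)
print(demo)
```

Here the first timestamp falls on a Saturday (6) and the second on a Wednesday (3).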

Additionally, I created three new columns: ride duration (in seconds), ride distance traveled (in kilometers), and ride speed (in km/h).

# Create new columns
## Duration of the ride length in seconds

tripdata_clean$ride_length <- difftime(tripdata_clean$ended_at, tripdata_clean$started_at, units = "secs")

## Ride distance traveled in kilometers (distGeo() returns meters)

tripdata_clean$ride_distance <- distGeo(matrix(c(tripdata_clean$start_lng, tripdata_clean$start_lat), ncol = 2), matrix(c(tripdata_clean$end_lng, tripdata_clean$end_lat), ncol = 2))
tripdata_clean$ride_distance <- tripdata_clean$ride_distance / 1000

## Ride speed in km/h

tripdata_clean$ride_speed <- tripdata_clean$ride_distance / as.numeric(tripdata_clean$ride_length, units = "hours")
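
To make the three derived values concrete, here is a minimal worked sketch for a single hypothetical trip (the coordinates and times are made up). It passes units = "secs" to difftime() explicitly, because without it difftime() chooses the units automatically and the result is not guaranteed to be in seconds.

```r
library(geosphere)

# Hypothetical single trip: points are given as (longitude, latitude) in degrees
start_point <- c(-73.9857, 40.7484)
end_point   <- c(-73.9855, 40.7580)
started_at  <- as.POSIXct("2021-06-05 10:00:00", tz = "UTC")
ended_at    <- as.POSIXct("2021-06-05 10:06:00", tz = "UTC")

# Duration with explicit units, so the result is always in seconds
ride_length <- difftime(ended_at, started_at, units = "secs")

# distGeo() returns meters on the WGS84 ellipsoid; convert to kilometers
ride_distance_km <- distGeo(start_point, end_point) / 1000

# Speed in km/h: kilometers divided by the duration expressed in hours
ride_speed_kmh <- ride_distance_km / as.numeric(ride_length, units = "hours")

print(ride_length)
print(ride_distance_km)
print(ride_speed_kmh)
```

Note that distGeo() measures the shortest distance between the start and end points, not the path actually ridden, so a round trip that ends where it started has a distance of 0 and the derived speed understates real riding speed.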

Because the tripdata_clean data set could contain unwanted records, such as bikes taken out of docks for quality checks by «Citi Bike» employees (i.e., not regular users) or rides with a negative ride_length, I needed to make sure they would not bias my results.

# Double-check there will be no values when bikes were taken out of docks and checked for quality by Citi Bike employees or when ride_length was negative

tripdata_clean <- tripdata_clean[!(tripdata_clean$start_station_name == "HQ QR" | tripdata_clean$ride_length < 0),]

After preparing the tripdata_clean data set, I calculated the average distance and ride length for both the casual and member type users to know who uses «Citi Bike» bicycles more actively.

# Calculate the average distance for both the casual and member type users

member_casual_mean <- tripdata_clean %>%
  group_by(member_casual) %>%
  summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))

## Build a table with the results

View(member_casual_mean)

Here is the member_casual_mean table I got:

As can be seen from the table, the mean ride length for a casual user is 1551.085 seconds (25.851 minutes) with a mean ride distance of 2.027 kilometers.
For a member user, the mean ride length is 810.361 seconds (13.506 minutes) with a mean ride distance of 1.8207 kilometers.

This means a casual user's ride lasts almost twice as long on average as a member's! Additionally, a casual user's ride distance is slightly greater as well.
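
The comparison can be checked with simple arithmetic on the reported means. The sketch below only re-uses the four numbers from the table above; the variable names are chosen for illustration.

```r
# Reported means from the member_casual_mean table
mean_time_casual     <- 1551.085  # seconds
mean_time_member     <- 810.361   # seconds
mean_distance_casual <- 2.027     # kilometers
mean_distance_member <- 1.8207    # kilometers

# Convert the mean ride lengths to minutes
mean_minutes_casual <- mean_time_casual / 60
mean_minutes_member <- mean_time_member / 60

# How many times longer and farther a casual user rides on average
time_ratio     <- mean_time_casual / mean_time_member
distance_ratio <- mean_distance_casual / mean_distance_member

print(round(c(time_ratio = time_ratio, distance_ratio = distance_ratio), 2))
```

The time ratio comes out at roughly 1.91 and the distance ratio at roughly 1.11, so the difference in duration is far larger than the difference in distance.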

To better illustrate the results above, I built 2 separate plots (mean travel time and mean ride distance) and then combined them into a third plot; all three are shown below.

# Build a plot of mean travel time by user type

member_casual_mean_time <- ggplot(member_casual_mean) + 
  geom_col(mapping = aes(x = member_casual, y = mean_time, fill = member_casual), show.legend = TRUE) +
  labs(title = "Mean travel time by user type: Member / Casual", x = "User type", y = "Mean time in sec", caption = "Data by Citi Bike. Plot by Vlad Dorokhin")

## Build a graph with the results

print(member_casual_mean_time)

Here is the first plot I got:

# Build a plot of mean travel distance by user type

member_casual_mean_distance <- ggplot(member_casual_mean) + 
  geom_col(mapping = aes(x = member_casual, y = mean_distance, fill = member_casual), show.legend = TRUE) +
  labs(title = "Mean travel distance by user type: Member / Casual", x = "User type", y = "Mean distance in km", caption = "Data by Citi Bike. Plot by Vlad Dorokhin")

## Build a graph with the results

print(member_casual_mean_distance)

Here is the second plot I got:

# Combine two recent plots (mean travel time and mean travel distance) together

grid.arrange(member_casual_mean_time, member_casual_mean_distance, ncol = 2)

Here is the combined plot (the 2 plots above side by side) I got:

After discovering mean travel time and mean ride distance for casual users and members, I checked the number of rides by user type during a week to get a better understanding of the users’ activity.

# Check the number of rides by user type during a week
## Build a tibble of the data

member_casual_rides_week <- tripdata_clean %>% 
    mutate(weekday = wday(started_at, label = TRUE)) %>% 
    group_by(member_casual, weekday) %>% 
    summarise(number_of_rides = n(), average_duration = mean(ride_length), .groups = 'drop') %>% 
    arrange(member_casual, weekday)
print(member_casual_rides_week)

Here is the 14 x 4 tibble (data frame) I got:

## Build a plot of the data

tripdata_clean %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_length), .groups = 'drop') %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Number of rides by user type during the week", x = "Days of the week", y = "Number of rides", caption = "Data by Citi Bike. Plot by Vlad Dorokhin", fill = "User type") +
  theme(legend.position = "top")

Here is the plot I got:

As can be seen from the plot, members make a lot more bike trips than casual users during the week. Additionally, these conclusions can be made:

Members make the most trips on Wednesdays (the middle of the workweek) and the fewest on Sundays. In general, members make more trips during the workweek than on weekends.
Casual users show a different pattern: they make the most trips on Saturdays and Sundays (i.e., weekends). During the workweek, casual users make far fewer trips, with Mondays being their least active day.
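
One way to put a number on the weekday versus weekend contrast is to compute the share of each user type's rides that start on a Saturday or Sunday. This is a minimal sketch on a hypothetical 7-row sample (the timestamps are made up, and weekend_share is a name introduced here); it is not a result from the real dataset.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample of rides
demo <- tibble(
  member_casual = c("casual", "casual", "casual", "member", "member", "member", "member"),
  started_at = ymd_hms(c(
    "2021-06-05 10:00:00",  # Saturday
    "2021-06-06 11:00:00",  # Sunday
    "2021-06-07 09:00:00",  # Monday
    "2021-06-07 08:00:00",  # Monday
    "2021-06-08 08:00:00",  # Tuesday
    "2021-06-09 18:00:00",  # Wednesday
    "2021-06-12 12:00:00"   # Saturday
  ), tz = "UTC")
)

# With week_start = 1, Saturday is 6 and Sunday is 7
weekend_share <- demo %>%
  mutate(is_weekend = wday(started_at, week_start = 1) >= 6) %>%
  group_by(member_casual) %>%
  summarise(number_of_rides = n(), weekend_share = mean(is_weekend), .groups = "drop")

print(weekend_share)
```

Applied to tripdata_clean, the same pipeline would give one weekend share per user type, which condenses the plot above into two comparable numbers.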

After checking the number of rides by user type during a week, I created a new data set containing only classic bikes and electric bikes in the rideable_type column. I called it tripdata_clean_classic_electric. Using this data set, I was able to check the bike type (classic bike / electric bike) usage by user type (member / casual user) and build a corresponding plot with the results.

# Create a new data set with only classic bikes and electric bikes in the rideable_type column

tripdata_clean_classic_electric <- tripdata_clean %>%
  filter(rideable_type == "classic_bike" | rideable_type == "electric_bike")

# Check the bike type usage by user type
## Build a tibble of the data

member_casual_classic_electric <- tripdata_clean_classic_electric %>%
  group_by(member_casual, rideable_type) %>%
  summarise(totals=n(), .groups="drop")
print(member_casual_classic_electric)

Here is the 4 x 3 tibble (data frame) I got:

## Build a plot of the data

tripdata_clean_classic_electric %>%
  group_by(member_casual, rideable_type) %>%
  summarise(totals=n(), .groups="drop")  %>%
  ggplot() +
  geom_col(aes(x = member_casual, y = totals, fill = rideable_type), position = "dodge") + 
  labs(title = "Bike type usage by user type: Classic Bike / Electric Bike", x = "User type", y = NULL, fill = "Bike type", caption = "Data by Citi Bike. Plot by Vlad Dorokhin") +
  scale_fill_manual(values = c("classic_bike" = "#ffa600", "electric_bike" = "#bc5090")) +
  theme_minimal() +
  theme(legend.position = "top")

Here is the plot I got:

As can be seen from the plot:

Both casual users and members use classic bikes much more frequently than electric bikes!
Members use both electric bikes and classic bikes much more than casual users.

Thereafter, I checked the bike type (classic bike / electric bike) usage by both user types (member + casual user) during a week and built a corresponding plot with the results.

# Check the bike types usage by both user types during a week
## Build a tibble of the data

member_casual_classic_electric_week <- tripdata_clean_classic_electric %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, rideable_type, weekday) %>%
  summarise(totals=n(), .groups="drop")
print(member_casual_classic_electric_week)

Here is the 28 x 4 tibble (data frame) I got:

## Build a plot of the data

tripdata_clean_classic_electric %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, rideable_type, weekday) %>%
  summarise(totals=n(), .groups="drop") %>%
  ggplot(aes(x = weekday, y = totals, fill = rideable_type)) +
  geom_col(position = "dodge") + 
  facet_wrap(~member_casual) +
  labs(title = "Bike type usage by user type during a week", x = "Days of the week", y = NULL, fill = "Bike type", caption = "Data by Citi Bike. Plot by Vlad Dorokhin") +
  scale_fill_manual(values = c("classic_bike" = "#ffa600", "electric_bike" = "#bc5090")) +
  theme_minimal() +
  theme(legend.position="top")

Here is the plot I got:

As can be seen from the plot:

Both casual users and members use classic bikes much more frequently than electric bikes every day of the week.
Members use both classic bikes and electric bikes more frequently than casual users.
Casual users use classic bikes the most on Saturdays, followed by Sundays. Their usage of electric bikes is more or less stable during the whole week.
Members use classic bikes the most on Wednesdays (the middle of the workweek), followed by Thursdays and Tuesdays. These same days (in the same order) are also the most popular for electric bike usage by members.

After checking the bike type (classic bike / electric bike) usage by both user types (member + casual user) during a week, I checked the coordinate data of the rides on the most popular routes: the ones used > 1000 times and the ones used > 500 times.

# Check the coordinates data of the rides
## Create a table only for the top 1 most popular routes (used > 1000 times)

tripdata_coordinates_1000 <- tripdata_clean %>% 
  filter(start_lng != end_lng & start_lat != end_lat) %>%
  group_by(start_lng, start_lat, end_lng, end_lat, member_casual, rideable_type) %>%
  summarise(total = n(), .groups="drop") %>%
  filter(total > 1000)
print(tripdata_coordinates_1000)

Here is the 226 x 7 tibble (data frame) I got for tripdata_coordinates_1000:

## Create 2 sub-tables for each user type for the top 1 most popular routes (used > 1000 times)

casual_1000 <- tripdata_coordinates_1000 %>% filter(member_casual == "casual")
member_1000 <- tripdata_coordinates_1000 %>% filter(member_casual == "member")
print(casual_1000)
print(member_1000)

Here is the 20 x 7 tibble (data frame) I got for casual_1000:

Here is the 206 x 7 tibble (data frame) I got for member_1000:

## Create a table only for the top 2 most popular routes (used > 500 times)

tripdata_coordinates_500 <- tripdata_clean %>% 
  filter(start_lng != end_lng & start_lat != end_lat) %>%
  group_by(start_lng, start_lat, end_lng, end_lat, member_casual, rideable_type) %>%
  summarise(total = n(), .groups="drop") %>%
  filter(total > 500)
print(tripdata_coordinates_500)

Here is the 1,596 x 7 tibble (data frame) I got for tripdata_coordinates_500:

## Create 2 sub-tables for each user type for the top 2 most popular routes (used > 500 times)

casual_500 <- tripdata_coordinates_500 %>% filter(member_casual == "casual")
member_500 <- tripdata_coordinates_500 %>% filter(member_casual == "member")
print(casual_500)
print(member_500)

Here is the 89 x 7 tibble (data frame) I got for casual_500:

Here is the 1,507 x 7 tibble (data frame) I got for member_500:
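
Since the > 1000 and > 500 tables differ only in the threshold, the counting step can be wrapped in a small helper. The sketch below is an illustration on hypothetical data: popular_routes() and min_rides are names introduced here, the coordinates are made up, and the filter mirrors the one used in the code above.

```r
library(dplyr)

# Count rides per route (start/end coordinates, user type, bike type)
# and keep only the routes used more than min_rides times
popular_routes <- function(trips, min_rides) {
  trips %>%
    # Same filter as above: drops trips whose start and end share a longitude or a latitude
    filter(start_lng != end_lng & start_lat != end_lat) %>%
    group_by(start_lng, start_lat, end_lng, end_lat, member_casual, rideable_type) %>%
    summarise(total = n(), .groups = "drop") %>%
    filter(total > min_rides)
}

# Hypothetical sample: one route ridden 3 times, one ridden once, one round trip ridden twice
demo_trips <- tibble(
  start_lng = c(-73.99, -73.99, -73.99, -73.98, -73.97, -73.97),
  start_lat = c(40.73, 40.73, 40.73, 40.75, 40.76, 40.76),
  end_lng   = c(-73.98, -73.98, -73.98, -73.99, -73.97, -73.97),
  end_lat   = c(40.75, 40.75, 40.75, 40.73, 40.76, 40.76),
  member_casual = c("member", "member", "member", "casual", "casual", "casual"),
  rideable_type = "classic_bike"
)

demo_popular <- popular_routes(demo_trips, min_rides = 2)
print(demo_popular)
```

With the real data, popular_routes(tripdata_clean, 1000) and popular_routes(tripdata_clean, 500) would correspond to the two tables built above.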

Finally, I built 4 plots of the coordinate data for the most popular routes: 2 plots for the routes used > 1000 times and 2 for the routes used > 500 times.

# Store bounding box coordinates for ggmap

nyc_bounding_box <- c(
  left = -74.15,
  bottom = 40.5774,
  right = -73.7004,
  top = 40.9176
)

# Store the stamen map of NYC

nyc_stamen_map <- get_stamenmap(
  bbox = nyc_bounding_box,
  zoom = 12,
  maptype = "toner"
)

# Plot of the data by casual users on the map (for routes used > 1000 times)

ggmap(nyc_stamen_map, darken = c(0.8, "white")) +
  geom_curve(casual_1000, mapping = aes(x = start_lng, y = start_lat, xend = end_lng, yend = end_lat, alpha = total, color = rideable_type), size = 0.5, curvature = .2, arrow = arrow(length = unit(0.2, "cm"), ends = "first", type = "closed")) +
  coord_cartesian() +
  labs(title = "Most popular routes by casual users (for routes used > 1000 times)", x = NULL, y = NULL, color = "Bike type", caption = "Data by Citi Bike. Plot by Vlad Dorokhin") +
  theme(legend.position="right")

Here is the plot I got for the most popular routes by casual users (for routes used > 1000 times):

As can be seen from the plot:

The most popular routes for docked bikes are located at Central Park, especially on the side of W 59th St.
The most popular routes for classic bikes are located at Governors Island, especially on the side of the Hugh L. Carey Tunnel.
A notable popular route for classic bikes runs along West St on the side of Hudson River Park.
Outside of Manhattan, the most popular routes are located at the Brooklyn Bridge and at the Franklin D. Roosevelt Four Freedoms Park for classic bikes, and at Prospect Park (Brooklyn) on the side of Prospect Park West for docked bikes.

# Plot of the data by casual users on the map (for routes used > 500 times)

ggmap(nyc_stamen_map, darken = c(0.8, "white")) +
  geom_curve(casual_500, mapping = aes(x = start_lng, y = start_lat, xend = end_lng, yend = end_lat, alpha = total, color = rideable_type), size = 0.5, curvature = .2, arrow = arrow(length = unit(0.2, "cm"), ends = "first", type = "closed")) +
  coord_cartesian() +
  labs(title = "Most popular routes by casual users (for routes used > 500 times)", x = NULL, y = NULL, color = "Bike type", caption = "Data by Citi Bike. Plot by Vlad Dorokhin") +
  theme(legend.position="right")

Here is the plot I got for the most popular routes by casual users (for routes used > 500 times):

Interesting fact:

As it can be seen from the plot, there are no popular routes (used > 1000 or even > 500 times) for electric bikes by casual users.

# Plot of the data by annual members on the map (for routes used > 1000 times)

ggmap(nyc_stamen_map, darken = c(0.8, "white")) +
  geom_curve(member_1000, mapping = aes(x = start_lng, y = start_lat, xend = end_lng, yend = end_lat, alpha = total, color = rideable_type), size = 0.5, curvature = .2, arrow = arrow(length = unit(0.2,"cm"), ends="first", type = "closed")) +  
  coord_cartesian() +
  labs(title = "Most popular routes by annual members (for routes used > 1000 times)", x = NULL,y = NULL, caption = "Data by Citi Bike. Plot by Vlad Dorokhin") +
  theme(legend.position="right")

Here is the plot I got for the most popular routes by annual members (for routes used > 1000 times):

As it can be seen from the plot:

The most popular routes for classic bikes are located on the Lower East Side near the Williamsburg Bridge, in the Kips Bay neighborhood, in the Chelsea neighborhood, at the Lincoln Tunnel, and in East Harlem.
The most popular routes for docked bikes are located in the Chelsea and Yorkville neighborhoods. Several fairly popular routes for docked bikes are also located around half of Central Park.
Outside of Manhattan, the most popular routes for classic bikes are located at Queensbridge, in the Williamsburg neighborhood, at Governors Island (especially on the side of the Hugh L. Carey Tunnel), and at the Franklin D. Roosevelt Four Freedoms Park. For docked bikes, they are at Prospect Park (Brooklyn) on the side of Prospect Park West.

# Plot of the data by annual members on the map (for routes used > 500 times)

ggmap(nyc_stamen_map, darken = c(0.8, "white")) +
  geom_curve(member_500, mapping = aes(x = start_lng, y = start_lat, xend = end_lng, yend = end_lat, alpha = total, color = rideable_type), size = 0.5, curvature = .2, arrow = arrow(length = unit(0.2,"cm"), ends="first", type = "closed")) +  
  coord_cartesian() +
  labs(title = "Most popular routes by annual members (for routes used > 500 times)", x = NULL,y = NULL, caption = "Data by Citi Bike. Plot by Vlad Dorokhin") +
  theme(legend.position="right")

Here is the plot I got for the most popular routes by annual members (for routes used > 500 times):

Interesting facts:

As can be seen on the plot of the data by annual members on the map (for the top 1 routes used > 1000 times), there are no routes used by electric bikes.
However, on the plot of the data by annual members on the map (for the top 2 routes used > 500 times), the most popular electric bike routes are located at the Pulaski Bridge, in the Garment District neighborhood, and in the Financial District neighborhood near One New York Plaza.

8. Share

The step will consist of these tasks:

Create visualizations and dashboards (all of them are displayed above).
Tell a data-driven story.
Present to others and answer questions about data.

8.1. The Average Distance And Ride Length

As the result of calculating the average distance and ride length (for both the casual users and annual members), I found out that:

The mean ride length for a casual user is 1551.085 seconds (25.851 minutes) with a mean ride distance of 2.027 kilometers.
The mean ride length for an annual member is 810.361 seconds (13.506 minutes) with a mean ride distance of 1.8207 kilometers.

This means a casual user's ride lasts almost twice as long on average as a member's! Additionally, a casual user's ride distance is slightly greater as well.

Here are the corresponding table and plot:

8.2. Number Of Rides By User Type During A Week

After checking the number of rides by user type (casual users / annual members) during a week, I found out that:

Members make a lot more bike trips in comparison with casual users during the week.
Members make the most trips on Wednesdays (the middle of the workweek) and the fewest on Sundays. In general, members make more trips during the workweek than on weekends.
Casual users show a different pattern: they make the most trips on Saturdays and Sundays (i.e., weekends). During the workweek, casual users make far fewer trips, with Mondays being their least active day.

Here is the corresponding plot:

8.3. Bike Type Usage By User Type

After checking the bike type (classic bike / electric bike) usage by user type (member / casual user), I found out that:

Both casual users and members use classic bikes much more frequently than electric bikes!
Members use both electric bikes and classic bikes much more than casual users.

Here is the corresponding plot:

8.4. Bike Type Usage By Both User Types During A Week

After checking the bike type (classic bike / electric bike) usage by both user types (member + casual user) during a week, I found out that:

Both casual users and members use classic bikes much more frequently than electric bikes every day of the week.
Members use both classic bikes and electric bikes more frequently in comparison with casual users.
Casual users use classic bikes the most on Saturdays, followed by Sundays. Their usage of electric bikes is more or less stable during the whole week.
Members use classic bikes the most on Wednesdays (the middle of the workweek), followed by Thursdays and Tuesdays. These same days (in the same order) are also the most popular for electric bike usage by members.

Here is the corresponding plot:

8.5. The Most Popular Routes By Casual Users (Used > 1000 Times (1) And Used > 500 Times (2))

After building a plot for the most popular routes by casual users (for routes used > 1000 times), I found out that:

The most popular routes for docked bikes are located at Central Park, especially on the side of W 59th St.
The most popular routes for classic bikes are located at Governors Island, especially on the side of the Hugh L. Carey Tunnel.
A notable popular route for classic bikes runs along West St on the side of Hudson River Park.
Outside of Manhattan, the most popular routes are located at the Brooklyn Bridge and at the Franklin D. Roosevelt Four Freedoms Park for classic bikes, and at Prospect Park (Brooklyn) on the side of Prospect Park West for docked bikes.

(1) Here is the corresponding plot:

After adding the most popular routes by casual users (used > 500 times) to the plot above, I found out that:

There are no popular routes (used > 1000 or even > 500 times) for electric bikes by casual users.

(2) Here is the corresponding updated plot:

8.6. The Most Popular Routes By Annual Members (Used > 1000 Times (1) And Used > 500 Times (2))

After building a plot for the most popular routes by annual members (for routes used > 1000 times), I found out that:

The most popular routes for classic bikes are located on the Lower East Side near the Williamsburg Bridge, in the Kips Bay neighborhood, in the Chelsea neighborhood, at the Lincoln Tunnel, and in East Harlem.
The most popular routes for docked bikes are located in the Chelsea and Yorkville neighborhoods. Several fairly popular routes for docked bikes are also located around half of Central Park.
Outside of Manhattan, the most popular routes for classic bikes are located at Queensbridge, in the Williamsburg neighborhood, at Governors Island (especially on the side of the Hugh L. Carey Tunnel), and at the Franklin D. Roosevelt Four Freedoms Park. For docked bikes, they are at Prospect Park (Brooklyn) on the side of Prospect Park West.
As can be seen on the plot, there are no routes used by electric bikes.

(1) Here is the corresponding plot:

After adding the most popular routes by annual members (used > 500 times) to the plot above, I found out that:

The most popular electric bike routes are located at the Pulaski Bridge, in the Garment District neighborhood, and in the Financial District neighborhood near One New York Plaza.

(2) Here is the corresponding updated plot:

8.7. Suggestions To The Questions Guiding The Future Marketing Program For «Citi Bike»

Using the data presented above, I was able to give my suggestions to the 4 questions guiding the possible future marketing program for «Citi Bike»:

How do casual riders and annual members use «Citi Bike» bikes differently?
Why would casual riders buy «Citi Bike» annual memberships?
How can «Citi Bike» influence casual riders to become members?
How can «Citi Bike» influence annual members to use its services even more?

8.7.1. How do casual riders and annual members use «Citi Bike» bikes differently?

1. Casual riders:

Take rides that cover a greater distance and last almost 2 times longer on average.
Make the most trips on Saturdays and Sundays (i.e., weekends).
Make far fewer trips during the workweek (Monday to Friday), with Mondays being their least active day.
Use classic bikes much more frequently than electric bikes, every day of the week.
Use classic bikes the most on Saturdays, followed by Sundays. Their usage of electric bikes is more or less stable during the whole week.
The most popular routes for docked bikes are located at Central Park, especially on the side of W 59th St.
The most popular routes for classic bikes are located at Governors Island, especially on the side of the Hugh L. Carey Tunnel.
A notable popular route for classic bikes runs along West St on the side of Hudson River Park.
Outside of Manhattan, the most popular routes are located at the Brooklyn Bridge and at the Franklin D. Roosevelt Four Freedoms Park for classic bikes, and at Prospect Park (Brooklyn) on the side of Prospect Park West for docked bikes.
There are no popular routes (used > 1000 or even > 500 times) for electric bikes.

2. Annual members:

Take rides that cover a smaller distance and last about half as long on average.
Make a lot more bike trips during the week.
Make the most trips on Wednesdays (the middle of the workweek) and the fewest on Sundays.
Make more trips during the workweek than on weekends in general.
Use classic bikes much more frequently than electric bikes, every day of the week.
Use both electric bikes and classic bikes much more than casual users.
Use classic bikes the most on Wednesday (the middle of the workweek), followed by Thursday and Tuesday. These exact days (in the same order) are the most popular for electric bike usage by members as well.
The most popular routes for classic bikes are located at the Lower East Side near the Williamsburg Bridge, in the Kips Bay neighborhood, in the Chelsea neighborhood, at the Lincoln Tunnel, and in East Harlem.
The most popular routes for docked bikes are located in the Chelsea neighborhood and in the Yorkville neighborhood. Several fairly popular routes for docked bikes are also located around half of Central Park.
Outside of Manhattan, the most popular routes for classic bikes are also located at Queensbridge, in the Williamsburg neighborhood, at Governors Island (especially on the side of the Hugh L. Carey Tunnel), and at the Franklin D. Roosevelt Four Freedoms Park. For docked bikes, they are at Prospect Park (Brooklyn) on the side of Prospect Park West.
The most popular electric bike routes (used > 500 times) are located at the Pulaski Bridge, in the Garment District neighborhood, and in the Financial District neighborhood near One New York Plaza.
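
The comparisons in the two lists above (average ride length and number of trips per weekday for each user type) could be reproduced with a sketch like the following. It is an illustration under assumptions, not the code used for this project: the column names `member_casual`, `started_at`, and `ended_at` are assumed, and the helper `weekday_summary` and the toy trips are hypothetical.

```python
import pandas as pd


def weekday_summary(trips: pd.DataFrame):
    """Return (rides per weekday by user type, mean ride minutes by user type)."""
    trips = trips.copy()
    # Ride length in minutes, derived from the start and end timestamps.
    duration = trips["ended_at"] - trips["started_at"]
    trips["ride_minutes"] = duration.dt.total_seconds() / 60
    # Weekday name of the trip start, e.g. "Saturday".
    trips["weekday"] = trips["started_at"].dt.day_name()

    rides = (
        trips.groupby(["member_casual", "weekday"])
        .size()
        .unstack(fill_value=0)  # user types as rows, weekdays as columns
    )
    mean_minutes = trips.groupby("member_casual")["ride_minutes"].mean()
    return rides, mean_minutes


# Toy data for illustration only (not real Citi Bike trips).
trips = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member"],
    "started_at": pd.to_datetime([
        "2022-03-05 10:00", "2022-03-06 11:00",   # Saturday, Sunday
        "2022-03-02 08:00", "2022-03-02 18:00",   # Wednesday (twice)
    ]),
    "ended_at": pd.to_datetime([
        "2022-03-05 10:40", "2022-03-06 11:20",
        "2022-03-02 08:10", "2022-03-02 18:20",
    ]),
})

rides, mean_minutes = weekday_summary(trips)
print(rides)
print(mean_minutes)
```

In the toy data, casual rides average twice the length of member rides, which mirrors the shape of the finding above without reproducing the real figures.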

8.7.2. Why would casual riders buy «Citi Bike» annual memberships?

Casual riders would buy annual memberships to:

Save money.
Make using «Citi Bike» bikes more convenient.
Stop thinking about paying for every single ride over and over again.
Get possible access to additional statistics (how many calories they burned, routes they take the most, etc.) with gamification elements (daily achievements, local challenges, etc.).

8.7.3. How can «Citi Bike» influence casual riders to become members?

«Citi Bike» can influence casual riders to become members by:

Offering weekly and monthly subscription plans in addition to the day pass and annual membership plan «Citi Bike» currently has, to motivate casual users to use «Citi Bike» services more during the workweek (i.e., Monday to Friday).
Offering special electric bikes-only subscription plans, which will be cheaper than «full» (classic + electric bikes) plans, as casual riders use classic bikes much more frequently than electric bikes every day of the week.
Offering a discount for subscribing to the weekly/monthly/annual membership plan, which has become a standard practice for subscription-based services. The longer the subscription purchased, the bigger the discount should be!
Offering detailed statistics to «Citi Bike» weekly/monthly/annual members: how many calories they burned, routes they take the most, etc. Casual riders take rides that cover a greater distance and last about 2 times longer on average. Therefore, more such statistics can be presented to them, and «Citi Bike» can acquire more data from these users!
Offering social and saving opportunities to «Citi Bike» weekly/monthly/annual members: group bike riding with other «Citi Bike» members, discounts in partner stores, etc. For example, since the most popular routes for docked bikes are located at Central Park, why not consider a partnership with a dating service like «Tinder», «Bumble», or «Hinge» for the summer?

8.7.4. How can «Citi Bike» influence annual members to use its services even more?

«Citi Bike» can influence annual members to use its services even more by:

Offering special gamification elements (like achievements / usage streaks) during the weekends, as annual members make the fewest rides on Saturdays and Sundays.
Offering promotional invites for cheap daily/weekly passes for an annual member’s friends and/or family members. These invites might motivate annual members to take rides with their friends/family on weekends (i.e., using «Citi Bike» more). Additionally, they also motivate these friends and/or family members to start using «Citi Bike» services on a weekly basis and consider paying for a subscription plan.
Offering special electric bikes-only gamification elements (like achievements / usage streaks) and promotions from partners, as annual members use classic bikes much more frequently than electric bikes every day of the week.
Sending occasional newsletter emails to annual members about the advantages of using electric bikes in comparison with classic ones. As annual members use classic bikes much more frequently than electric bikes every day of the week, this reduces the lifespan of the electric bikes, and «Citi Bike» spends its budget ineffectively on restoring these bikes and changing their batteries.
Increasing the number of «Citi Bike» bikes and the advertising presence at Hudson River Park, the Brooklyn Bridge (outside of Manhattan), the Franklin D. Roosevelt Four Freedoms Park, and Prospect Park (Brooklyn), since these areas are fairly popular among both types of users: casual riders and annual members.
Testing an advertising presence in Brooklyn at its biggest parks: Prospect Park (besides the side of Prospect Park West), Sunset Park, the Dyker Beach Park and Golf Course, Luna Park (Coney Island), Marine Park, Highland Park, etc.

9. Act

This step consists of these tasks, which I have already completed above (see 8. Share):

Prepare a final version of a report on data analysis to stakeholders.
Report on the data analysis to the stakeholders.

10. References

The post featured image by Vlad Dorokhin is licensed under CC BY-SA 4.0

—

A black-and-white image of people on bicycles is licensed under the public domain
