
Overview

As a group, we were tasked with selecting a big data set; we chose the 2021 Yellow Taxi Trip Data and the Taxi Zone dataset provided by NYC Open Data. The trip data contains information about every yellow taxi ride taken in NYC in 2021, and the zone dataset contains zonal information. For our data analytics use case, we built a query system that displays tipping trends and cash versus credit card usage patterns throughout New York City. Our goal is to make this information readily available to taxi drivers so they can access it anytime and plan their daily routes to maximize their income across specific zones and boroughs of New York City. This project was carried out using Spark, SQL, and Flask.


Take a look at the project below.
