View the Project on GitHub dennymarcels/NYCTaxiFarePrediction
This dataset was proposed as a challenge at Kaggle. It contains a train.csv
file, consisting of around 55M entries, and a test.csv
file having around 10k entries. The features given are: pickup_datetime
, pickup_longitude
, pickup_latitude
, dropoff_longitude
, dropoff_latitude
and passenger_count
. The objective of the challenge is to predict fare_amount
based on these features.
Although I trained a model in a sample of 250,000 rows and made predictions to submit, my main concern this time was on data exploration, data cleaning and feature engineering. This got me even to a new package that I used to plot coordinates over an actual map. Pretty sweet!