(In case you were wondering, that’s me!)
Math enthusiast with informal experience in coding. I am passionate about developing programmatic solutions that optimize and automate the sorts of processes humans are terribly designed for. The more that part of life is taken care of, the more time we have for the things humans excel at.
Here I showcase the personal projects I managed to bring to life!
Web app that tracks the positions of the index finger and thumb to simulate hand control of volume, with feedback shown as a volume bar over the video. This implementation leverages OpenCV (cv2), MediaPipe and Flask.
Created in May 2021. Last relevant update in May 2021.
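The core idea of the hand-controlled volume app can be illustrated with a minimal sketch. This is not the project's code. It assumes the fingertip positions are already available as normalized `(x, y)` pairs (MediaPipe Hands reports the thumb tip as landmark 4 and the index fingertip as landmark 8), and the calibration bounds `MIN_DIST` and `MAX_DIST` are hypothetical values chosen for illustration.

```python
import math

# Hypothetical calibration bounds in normalized image units (assumptions).
MIN_DIST = 0.03  # fingers pinched together -> volume 0
MAX_DIST = 0.30  # fingers spread apart     -> volume 100


def pinch_to_volume(thumb_tip, index_tip, min_dist=MIN_DIST, max_dist=MAX_DIST):
    """Map the thumb-to-index distance to a volume percentage in [0, 100]."""
    distance = math.hypot(index_tip[0] - thumb_tip[0], index_tip[1] - thumb_tip[1])
    # Linear interpolation between the bounds, clamped to [0, 1].
    ratio = (distance - min_dist) / (max_dist - min_dist)
    return round(100 * min(1.0, max(0.0, ratio)))
```

The same percentage could drive both the system volume and the height of an on-screen bar, so the two always stay in sync.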
Web app that translates from Portuguese to Esperanto instantly after each keystroke. This implementation leverages a pre-trained transformer-based model available via the transformers library, and Flask.
Created in May 2021. Last relevant update in May 2021.
This implementation is confidential. I am relying on multidimensional data from clinical exams to develop a model capable of delivering fast, easy and reliable diagnostics for COVID-19. General information at the PoLiVirUS website.
Created in September 2020. Last relevant update in January 2021.
This implementation is confidential. In this project, I relied on BERT Portuguese to train a model that classifies news articles into one of ten possible categories. The dataset was built from scratch using the web-scraping library Beautiful Soup (I still wonder where that name came from). The trained model was containerized using Docker and turned into a web tool using Google Kubernetes Engine. ~A prototype can be checked here.~ (offline due to incident costs). A txt file containing one news text in Portuguese must be supplied. I suggest getting a news article from El País Brasil, since the categories are expected to match.
Created in June 2020. Last relevant update in June 2020.
IPython Notebook containing my implementation of the Apriori algorithm to determine association rules based on the dataset included in a Kaggle competition that goes by the same name. The data consists of over 3 million orders in a grocery store, indexed by user. Each order contains product information and a timestamp.
Created on May 27, 2020. Last relevant update on May 27, 2020.
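For readers unfamiliar with Apriori, the following is a minimal pure-Python sketch of the technique, not the notebook's implementation. It finds frequent itemsets level by level (join, prune, count) and then derives rules from them. The toy baskets and the thresholds are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) tuples above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Toy baskets, made up for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread"},
]
frequent = apriori(baskets, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.7)
```

On a dataset with millions of orders, recomputing support by scanning every basket would likely be too slow; this sketch favors readability over speed.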
IPython Notebook containing code for my approach to the “Can you determine if two individuals are related?” Kaggle challenge. The dataset contains face photos of individuals grouped by family. The objective is to determine from the photos if two individuals are genetically related.
Created on July 31, 2019. Last relevant update on July 31, 2019.
IPython Notebook containing code for the exploration of the FIFA 2019 Complete Player Dataset from Kaggle. This dataset contains several attributes for all 18k+ FIFA soccer players. I used the technical features, and some physical features of the athletes, to group them into technical profiles. Then I investigated how these profiles were related to the players’ actual positions on the field.
Created on May 30, 2019. Last relevant update on May 30, 2019.
IPython Notebook containing code for my implementation of the NYC Taxi Fare Prediction challenge from Kaggle. This dataset contains loooots of instances of taxi rides, along with features depicting time, pickup/dropoff coordinates and number of passengers. I spent most of my energy on data exploration, data cleaning and feature engineering. It was one of the most fun datasets I have worked with so far.
Created on April 9, 2019. Last relevant update on April 9, 2019.
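As an illustration of the kind of feature engineering that pickup/dropoff coordinates invite, here is a minimal sketch of a great-circle (haversine) distance feature. The notebook's actual features are not listed here, so treating trip distance as one of them is an assumption; the function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A distance like this also helps with cleaning: rides with a zero or implausibly large distance are easy to flag for inspection.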
IPython Notebook containing code for my work on the Human Activity Recognition Using Smartphones Data Set. The dataset contains features derived from movement measured by the accelerometer and gyroscope of a smartphone while volunteers were performing six activities. There are 10,299 instances and 561 features for this classification problem. I took this opportunity to explore the Scikit-Learn documentation, first on the supervised learning methods, then on the tuning process for the best-performing model.
Created on March 29, 2019. Last relevant update on March 29, 2019.
IPython Notebook containing code for my implementation of the Black Friday Hackathon by Analytics Vidhya. The hackathon consisted of implementing a regression model to predict Purchase Amount from customer behavior and demographic information. I took this opportunity to delve a little deeper into the tuning process.
Created on March 11, 2019. Last relevant update on March 11, 2019.
IPython Notebook containing code for my implementation of the 2009 Knowledge Discovery and Data Mining challenge. The challenge comprised implementing a model to classify three behaviors of the clientele, based on a large dataset containing 50,000 instances of 230 variables, of which 190 were numerical and 40 were categorical. The output classes were imbalanced.
Created on Dec 27, 2018. Last relevant update on Dec 27, 2018.
Code to use a Deep Learning approach to predict flower species from images. The project concerns downloading a pre-trained network, defining and training a classifier on a dataset containing 102 flower species and more than 6,500 images, saving the best-performing model, and using this model to make predictions on new images. This project was originally part of the PyTorch Scholarship Challenge Nanodegree Program, developed by Udacity and Facebook and hosted at Udacity.
Created on Dec 18, 2018. Last relevant update on Dec 18, 2018.
Code to implement a Shiny app consisting of a “next word” predictor, based on a text collection consisting of blogs, news and Twitter texts. This project was originally part of the Data Science Specialization course, developed by Johns Hopkins University and hosted at Coursera.
Created on Oct 19, 2018. Last relevant update on Oct 19, 2018.
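A “next word” predictor can be sketched with simple bigram counts. The example below is a minimal Python illustration of that idea, not the app's code or model; the function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training text."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def predict_next(model, text, k=3):
    """Return up to k most frequent followers of the last word in text."""
    words = text.lower().split()
    if not words or words[-1] not in model:
        return []  # unseen context: no suggestion
    return [word for word, _ in model[words[-1]].most_common(k)]


# Tiny made-up corpus for illustration.
corpus = ["i love data science", "i love coffee", "i love data", "you love data"]
model = train_bigrams(corpus)
suggestions = predict_next(model, "They LOVE", k=2)
```

A practical predictor would typically use longer contexts (trigrams or more) and fall back to shorter ones when a context has not been seen.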
Code to implement a Shiny app that runs a summary of one or more MyFxBook accounts, enabling risk management decisions.
Created on Sep 5, 2018. Last relevant update on Dec 24, 2018.
Code to investigate taxonomic and functional properties of the microbiome of samples of mangrove sediment, by means of next-generation DNA sequencing technology.
Created on Sep 4, 2018. Last relevant update on May 20, 2019.