Matthew is available for hire

Matthew Alhonte

Verified Expert in Engineering

Data Scientist and Developer

Location

New York, NY, United States

Toptal Member Since

August 21, 2018

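Matt has officially worked as a Python-based data scientist for the past six years; however, he has been working at the intersection of statistics and programming for the past decade (before the term "data scientist" became popular). He combines strong technical skills with a rigorous background in experimental design and statistical inference. More recently, he's been focusing on machine learning, including some natural language processing and computer vision.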

Portfolio

Ophidian Scientific

Amazon Web Services (AWS), PostgreSQL, Keras, XGBoost, Random Forests, Spark...

Birch Infrastructure

Google Cloud Platform (GCP), BigQuery, Data Build Tool (dbt), Prefect, Python...

The University of Colorado — Office of Data Analytics

Amazon Web Services (AWS), XGBoost, Random Forests, Experimental Design...

Experience

Statistics - 11 years
Data Visualization - 11 years
Python - 6 years
Machine Learning - 5 years
Pandas - 5 years
SQL - 5 years
Functional Programming - 4 years
Scikit-learn - 3 years

Availability

Part-time

Preferred Environment

PyCharm, Git, Spacemacs, Visual Studio Code (VS Code), Jupyter

The most amazing...

...thing I've done is reverse-engineer an undocumented file format containing electrophysiology readings.

Work Experience

Data Science Consultant

2013 - PRESENT

Ophidian Scientific

Assisted numerous small clients with data-related work, ranging from data science and analysis to data engineering and machine learning engineering.
Designed and built ETL pipelines in Python, Dask, and Prefect.
Oversaw migrations between Google Sheets and Airtable, with the Airtable automation written in Python.
Used operations research libraries in Python to optimize teams for the sports betting site FanDuel.
Built a natural language processing (NLP) classifier for the article archive of a finance-based publication.

Technologies: Amazon Web Services (AWS), PostgreSQL, Keras, XGBoost, Random Forests, Spark, Database Design, Experimental Design, Clojure, Docker, Jupyter, Time Series, Pandas, SQL, Machine Learning, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Operations Research, Data Visualization, ETL, Scientific Data Analysis, Data Engineering, Data Science, Python

Data Scientist & Data Architect

2021 - 2021

Birch Infrastructure

Helped design the data infrastructure for a utility-scale renewable energy and data center company.
Created data pipelines with Prefect, mostly stitching together Google Cloud Functions and Cloud Run jobs.
Managed a BigQuery data warehouse with dbt, creating table schemas and transformations.
Set up data infrastructure (including Prefect and dbt).

Technologies: Google Cloud Platform (GCP), BigQuery, Data Build Tool (dbt), Prefect, Python, Serverless

Senior Data Scientist

2018 - 2019

The University of Colorado — Office of Data Analytics

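Performed statistical analysis and modeling in support of student success.
Created and presented results and visualizations to senior management using Jupyter and Zeppelin.
Developed a Monte Carlo simulation-based model to predict student retention each semester.
Built a Bayesian model of re-offense after student misconduct.
Modeled the effects of different kinds of financial aid with XGBoost.
Created a model to predict student GPAs with scikit-learn and Keras.
Helped establish practices during the reorganization of the university's Office of Data Analytics.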

Technologies: Amazon Web Services (AWS), XGBoost, Random Forests, Experimental Design, Data Visualization, Time Series, SQL, Data Science, Machine Learning, Oracle Database, Zeppelin, Jupyter, Keras, PySpark, Scikit-learn, Pandas, Python

Data Engineer

2017 - 2018

NOMI Beauty

Designed and built the data infrastructure for a startup that makes it easier to book hair and makeup appointments in hotel rooms.
Architected a big data pipeline with Spark, Kafka, and Cassandra.
Built data dashboards in Tableau for the operations team.
Designed an ETL for survey data from Typeform's API into MySQL.
Created reports in Jupyter notebooks, using Python with Altair and Seaborn for data visualization.
Designed and implemented a database schema in MySQL.
Designed and supported ETL from Couchbase to MySQL using Python.

Technologies: Amazon Web Services (AWS), Spark, Database Design, Data Visualization, SQL, Jupyter, Simulations, Cassandra, Apache Kafka, PySpark, MySQL, Pandas, Python

Data Science and Blockchain Integration Consultant

2017 - 2017

Tanktwo, Inc.

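Built a blockchain-based solution for managing IoT devices and the data they generate.
Created a demo of a potential network using Hyperledger.
Simulated a private blockchain network in action, using Python.
Helped present a demo to venture capitalists when seeking investment.
Researched the blockchain implementations best suited to the business needs.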

Technologies: Amazon Web Services (AWS), Jupyter, Data Visualization, Time Series, Hyperledger, Pandas, Python

Data Science Consultant

2014 - 2017

Hospital for Special Surgery

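Worked on data science topics in a neurology lab studying intraoperative neurophysiological monitoring (IONM): the monitoring of muscles and nerves during surgery to prevent injury.
Reverse engineered an undocumented file format containing biosignal data.
Experimented with using scikit-learn to classify nerve conduction readings as indicating either injury or a reaction to anesthesia.
Visualized biosignal data with Plotly and presented findings.
Investigated the Higuchi fractal dimension of nerve conduction readings during surgery as a means of assessing potential injury.
Analyzed biosignal data with the Python data suite (NumPy, Pandas, and SciPy).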

Technologies: Experimental Design, Data Visualization, Time Series, Data Science, Machine Learning, Scikit-learn, PyEEG, Jupyter, Plotly, SciPy, Pandas, NumPy, Python

Natural Language Processing Consultant

2015 - 2015

New York City Department of Administrative Services

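Scraped PDFs with Python to help digitize the back catalog of the publication The City Record.
Helped design a schema for entries (such as extracting addresses).
Created data-cleaning mechanisms to standardize entries reported in different formats by more than 100 city agencies.
Performed exploratory natural language processing (NLP) on a century-long corpus of the publication using Python and NLTK.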
Worked to integrate a pipeline into their MS Access.

Technologies: Jupyter, Data Visualization, Data Science, Machine Learning, Python, Natural Language Toolkit (NLTK)

Integration and Development Consultant

2013 - 2014

Broadband Technologies Group

Provided computer vision-based assistance for digitizing video archives.
Used OpenCV and Python to tag damaged video areas.
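Implemented automated repair of certain types of damaged video in Python.
Helped build an Android app that provides synchronized subtitles for live performances.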
Prepared presentations with Jupyter.

Technologies: Jupyter, Data Visualization, OpenCV, Python

Research Assistant

2008 - 2013

Hunter College

Designed and validated a novel psychometric scale.
Analyzed survey data in SPSS.
Presented findings at research conferences.
Maintained relationships with the lab after graduation, eventually moving from data analysis to Python.
Worked on the publication of older data.

Technologies: Experimental Design, Data Visualization, Data Science, SciPy, Python, SPSS

Summer Research Assistant

2009 - 2010

Yale School of Medicine

Designed and directed a small study investigating psychopathic traits and behavior in the ultimatum game.
Analyzed GSR data.
Administered computer-based tasks to study participants using Presentation and DMDX.
Analyzed data from surveys and computer-based tasks.
Built and maintained a database of participants.

Technologies: Experimental Design, Data Visualization, Data Science, DMDX, SPSS

Experience

Spring 2018 Complexity Challenge

http://github.com/mattalhonte/sfi-challenge

My entry in the Spring 2018 Complexity Challenge run by the Santa Fe Institute.

Graph Theory Notes

Some code I wrote to help me understand the graph theory portion of an online course on algorithmic information theory.

Binary Grid Search

http://hackersandslackers.com/tuning-machine-learning-hyperparameters-with-binary-search/

Here, I experiment with using binary search to tune the hyperparameters of machine learning models in scikit-learn.
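
To illustrate the general idea (this is not the code from the linked post), here is a minimal sketch of a binary search over a single integer hyperparameter. It assumes the validation score is unimodal in that hyperparameter, meaning it rises to a single peak and then falls; the function name, the `score` callable, and the toy scoring function are all hypothetical.

```python
from typing import Callable


def binary_search_hyperparameter(
    score: Callable[[int], float], low: int, high: int
) -> int:
    """Return the integer in [low, high] that maximizes `score`.

    Assumption: `score` is strictly unimodal on the range (it rises to one
    peak and then falls). With plateaus or noise the search can miss the peak.
    """
    while low < high:
        mid = (low + high) // 2
        # If the score is still rising at mid, the peak lies to the right.
        if score(mid) < score(mid + 1):
            low = mid + 1
        else:
            high = mid
    return low


def toy_score(depth: int) -> float:
    """Hypothetical stand-in for a validation score that peaks at depth 7."""
    return -((depth - 7) ** 2)


best_depth = binary_search_hyperparameter(toy_score, 1, 20)
print(best_depth)
```

In practice, `score` could wrap something like the mean of scikit-learn's `cross_val_score` for a model built with the candidate value. Each step evaluates only two candidates, so the search needs far fewer model fits than an exhaustive grid, at the cost of relying on the unimodality assumption.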

Recasting Low-cardinality Columns as Categoricals

http://hackersandslackers.com/recasting-low-cardinality-columns-as-categoricals-2

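A short tutorial on saving memory by using categorical variables in Pandas. It includes a code snippet that takes a DataFrame and recasts low-cardinality columns as categoricals.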
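
As a rough illustration of the technique (not the snippet from the tutorial itself), the sketch below converts text columns to the `category` dtype when the share of distinct values is small. The function name and the `max_unique_ratio` threshold of 0.5 are assumptions chosen for the example.

```python
import pandas as pd


def recast_low_cardinality(
    df: pd.DataFrame, max_unique_ratio: float = 0.5
) -> pd.DataFrame:
    """Return a copy of `df` with low-cardinality text columns as categoricals.

    `max_unique_ratio` is an illustrative threshold: a column is converted
    when (distinct values / rows) falls below it.
    """
    out = df.copy()
    if len(out) == 0:
        return out
    for col in out.columns:
        is_text = pd.api.types.is_object_dtype(
            out[col]
        ) or pd.api.types.is_string_dtype(out[col])
        if is_text and out[col].nunique() / len(out) < max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "blue", "red", "blue"],  # 2 distinct values
        "id": ["a", "b", "c", "d", "e", "f"],  # all distinct, left as-is
        "n": [1, 2, 3, 4, 5, 6],  # numeric, left as-is
    }
)
result = recast_low_cardinality(df)
print(result.dtypes)
```

Comparing `df.memory_usage(deep=True)` with `result.memory_usage(deep=True)` on a large frame is one way to check how much memory the conversion saves.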

Removing Duplicate Columns in Pandas

http://hackersandslackers.com/remove-duplicate-columns-in-pandas

A short tutorial on finding and removing duplicate columns in Pandas.
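
One common way to do this (a sketch, not necessarily the approach taken in the tutorial) is to transpose the DataFrame so columns become rows and then use `duplicated()` to flag repeated content. The column names below are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [4, 5, 6],
        "a_copy": [1, 2, 3],  # same content as "a" under a different name
    }
)

# After transposing, each original column is a row, so duplicated() marks
# every column whose values repeat an earlier column.
is_duplicate = df.T.duplicated()
deduped = df.loc[:, ~is_duplicate]

print(list(deduped.columns))  # ['a', 'b']
```

Transposing copies the data and can be slow on wide or mixed-type frames. When only repeated column names matter, `df.loc[:, ~df.columns.duplicated()]` is a cheaper alternative.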

Downcast Numerical Data Types with Pandas

http://hackersandslackers.com/downcast-numerical-columns-python-pandas/

A short tutorial on saving memory by downcasting Pandas columns to the smallest possible numeric representation.
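
A minimal sketch of the idea, using the `downcast` option of `pd.to_numeric` (the tutorial's own code may differ). The function name and the sample data are hypothetical.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with numeric columns in the smallest dtype that fits."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Chooses the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # Uses float32 when the values survive the conversion.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


df = pd.DataFrame(
    {
        "count": [1, 2, 3],
        "price": [0.5, 1.5, 2.5],
        "label": ["a", "b", "c"],  # non-numeric, left untouched
    }
)
small = downcast_numeric(df)
print(small.dtypes)
```

Downcasting floats to float32 reduces precision, so it is worth checking that downstream calculations tolerate it.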

Sentiment Analysis With AWS SageMaker

http://github.com/mattalhonte/sagemaker-deployment/tree/master/Project

Used SageMaker's version of XGBoost to classify movie reviews as positive or negative.

Epilepsy Classifier

http://github.com/mattalhonte/epilepsy-classifier

A capstone project for Udacity's machine learning engineer nanodegree.

Python to Rust

A short walkthrough of training a machine learning model in Python, exporting a model artifact, and serving predictions in Rust. It was accepted as official documentation for a related Rust crate called "tract."

Splitting Columns With Pandas

http://hackersandslackers.com/splitting-columns-with-pandas/

I wrote a tutorial on splitting up Pandas columns with nested data.
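
As one possible illustration (the tutorial may cover other cases and methods), the sketch below expands a column of dictionaries into separate columns. The DataFrame and its column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "address": [
            {"city": "New York", "zip": "10001"},
            {"city": "Denver", "zip": "80202"},
        ],
    }
)

# Build one column per dictionary key, keeping the original row index so the
# new columns line up with the rest of the frame.
expanded = pd.DataFrame(df["address"].tolist(), index=df.index)
result = pd.concat([df.drop(columns="address"), expanded], axis=1)

print(list(result.columns))  # ['name', 'city', 'zip']
```

For delimited strings rather than dictionaries, `Series.str.split` with `expand=True` plays a similar role.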

Education

2006 - 2012

Bachelor of Arts Degree in Psychology

Hunter College - New York City, NY, USA

Certifications

JANUARY 2020 - PRESENT

Machine Learning Engineer Nanodegree

Udacity

Skills

Libraries/APIs

Pandas, Scikit-learn, TensorFlow Deep Learning Library (TFLearn), XGBoost, NumPy, Keras, Dask, SciPy, OpenCV, Natural Language Toolkit (NLTK), PySpark, TensorFlow

Tools

DataViz, Jupyter, Spacemacs, PyCharm, SPSS, Plotly, DMDX, Git, Amazon SageMaker, BigQuery

Languages

Python 3, Python, SQL, Snowflake, Clojure, Rust

Paradigms

Data Science, Database Design, Agile, Functional Programming, ETL

Platforms

Jupyter Notebook, Amazon Web Services (AWS), Docker, Hyperledger, Oracle Database, Linux, Zeppelin, Apache Kafka, Google Cloud Platform (GCP), Visual Studio Code (VS Code)

Frameworks

Spark

Storage

Databases, NoSQL, Cassandra, PostgreSQL, MySQL

Other

Data, Statistical Data Analysis, Exploratory Data Analysis, Unstructured Data Analysis, Complex Data Analysis, Statistical Methods, Statistical Modeling, Statistical Forecasting, Statistical Analysis, Statistical Significance, Random Forests, Random Forest Regression, Experimental Design, Time Series, Machine Learning, Predictive Modeling, Data Visualization, Data Analysis, Data Analytics, Statistics, Computational Statistics, Bayesian Statistics, Statistical Programming, Amazon Machine Learning, Tf-idf, Convolutional Neural Networks (CNN), Analysis of Variance (ANOVA), Dashboards, Analytical Dashboards, Data Build Tool (dbt), Deep Learning, Natural Language Processing (NLP), Mathematical Modeling, Data Engineering, GPT, Generative Pre-trained Transformers (GPT), Operations Research, Simulations, PyEEG, Scientific Data Analysis, Prefect, Serverless
