Nathan Bennett

PENSHURST

Summary

Data Engineer specializing in efficient data pipelines, robust architectures, and scalable solutions. I evaluate a range of tools to find the best fit for each project, and I am committed to continuous learning and keeping up with industry trends. Alongside a successful track record in data engineering, I build and deploy machine learning models, adding valuable expertise to data-driven teams.

Overview

6 years of professional experience

Work History

Data Engineer

Swipejobs
02.2018 - Current
  • Trained and validated ML models with H2O and MLflow and deployed them to production: Spark Streaming processed events from Kafka, H2O computed predictions, and results were published back to Kafka, improving model accuracy and ensuring smooth integration into the production environment.
  • Developed and maintained production ETL in Spark/Scala, extracting data from sources including MongoDB collections, Logstash, and Postgres tables and storing it in S3 as Parquet files for efficient retrieval, streamlining the data pipeline and improving data quality.
  • Orchestrated all data warehouse tasks (ETL, reporting, and ML models running in production) with Apache Airflow, making the overall process more efficient.
  • Migrated Spark jobs from EC2 instances to Kubernetes, cutting daily ETL latency by 33%; wrote a custom Spark-submit operator in Airflow that creates new pods in Kubernetes and executes Spark jobs, enabling faster data processing and more accurate reporting.
  • Built and deployed a machine learning model as a Java microservice using H2O.ai and NLP techniques to parse resumes and extract work history, skills, and education, making profile creation 27% faster for applicants.
  • Integrated Apache Superset into the internal service desk to display metrics and charts; over 1,600 users now rely on these metrics to make key business decisions.
  • Implemented Trino (formerly Presto) with a Hive metastore to aggregate terabytes of Parquet files stored in S3, enabling complex queries over large datasets in a timely manner.
  • Implemented and maintained Deequ checks in ETL tasks to enforce data quality, catching and correcting issues in the pipeline for more accurate data and reporting.
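As an illustration of the constraint-checking idea behind the Deequ bullet above: Deequ itself runs on Spark/Scala (via its `VerificationSuite`), so the snippet below is only a hypothetical plain-Python analogue over in-memory records, not the production code.

```python
# Hypothetical, simplified analogue of Deequ-style data-quality checks:
# verify completeness and uniqueness of a column across a batch of records.

def check_completeness(records, column):
    """Fraction of records with a non-null value for `column`."""
    if not records:
        return 1.0
    present = sum(1 for r in records if r.get(column) is not None)
    return present / len(records)

def check_uniqueness(records, column):
    """True if every non-null value of `column` appears exactly once."""
    values = [r.get(column) for r in records if r.get(column) is not None]
    return len(values) == len(set(values))

def run_checks(records):
    """Return a pass/fail report; a real pipeline would fail the ETL task on violation."""
    return {
        "id_complete": check_completeness(records, "id") == 1.0,
        "id_unique": check_uniqueness(records, "id"),
    }

batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
    {"id": 2, "name": "c"},  # duplicate id: uniqueness violation
]
report = run_checks(batch)  # {'id_complete': True, 'id_unique': False}
```

In the production setup described above, the equivalent checks would run as Deequ constraints inside the Spark ETL job rather than in Python.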

Education

Bachelor of Science (Honours) - Mathematics

University of Technology Sydney
Sydney, NSW
07.2017

Skills

  • Programming Languages: Scala, Python, Java
  • Big Data Technologies: Apache Spark, Trino, Apache Hive, Apache Kafka
  • Workflow Management: Apache Airflow
  • Database Management: PostgreSQL, MongoDB
  • Data Visualization: Apache Superset
  • Cloud Services: AWS
  • Containers: Docker
  • Machine Learning & NLP: H2O.ai, PyTorch, Scikit-learn
