Designed and optimized complex Entity Relationship Diagrams (ERDs) and implemented scalable middleware APIs with Django REST Framework, integrating front-end applications with AWS-hosted PostgreSQL databases through ORM design
Optimized and parallelized processing of large-scale, cloud-based datasets using the Python packages Dask, Rasterio, and Xarray, cutting processing time 6x and significantly improving data accuracy across complex ETL workflows
Implemented robust software engineering practices, including a testing framework, Docker containerization, CI/CD pipelines, and version control with GitHub, to keep data science code scalable and maintainable
Collaborated with cross-functional teams and clients to implement end-to-end data pipelines, ensuring alignment with project goals and enhancing deliverable quality and code robustness
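The chunked, parallel-processing pattern described above can be sketched with the standard library alone; Dask generalizes the same divide-process-reassemble idea to larger-than-memory arrays. The chunk size, worker count, and `transform` step below are illustrative assumptions, not details from the original pipelines:

```python
# Illustrative sketch: split a dataset into chunks and process them in
# parallel, then reassemble results in order -- the pattern Dask applies
# at cloud scale. The transform here is a hypothetical stand-in for a
# real ETL step (e.g. reprojecting a raster tile with Rasterio).
from concurrent.futures import ThreadPoolExecutor

def transform(chunk):
    # Placeholder ETL step: double every value in the chunk.
    return [x * 2 for x in chunk]

def process_in_chunks(data, chunk_size=4, workers=4):
    # Partition the data into fixed-size chunks.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # map() preserves submission order, so results reassemble correctly.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(transform, chunks)
    return [x for chunk in results for x in chunk]

print(process_in_chunks(list(range(10))))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Threads suit I/O-bound cloud reads; for CPU-bound transforms, Dask's process- or cluster-based schedulers take over the same role.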
Research Data Analyst Intern
Walter and Eliza Hall Institute of Medical Research
02.2024 - 06.2024
Served on the clinical dashboard project: handled streaming patient data using Python and the REDCap database, and implemented an interactive dashboard with R Shiny
Implemented a MongoDB instance as an intermediary between the REDCap database and the dashboard, improving loading speed by 300%
Embedded geo-map, heat-map, and Kaplan-Meier survival-rate visualisations in the dashboard, facilitating data exploration for administrators and clinicians
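The intermediary-store pattern behind the MongoDB speedup above can be sketched as a read-through cache: the dashboard reads from a fast local store and only falls back to the slow upstream source on a miss. A Python dict stands in for the MongoDB collection here, and the record IDs and fetch function are hypothetical:

```python
# Illustrative sketch of a read-through cache intermediary. A dict stands
# in for the MongoDB collection; fetch_from_redcap is a hypothetical
# stand-in for a slow REDCap API call.
class ReadThroughCache:
    def __init__(self, fetch_upstream):
        self._fetch = fetch_upstream   # slow source-of-truth lookup
        self._store = {}               # stand-in for the MongoDB collection

    def get(self, record_id):
        # Serve from the local store; hit upstream only on a miss.
        if record_id not in self._store:
            self._store[record_id] = self._fetch(record_id)
        return self._store[record_id]

calls = []
def fetch_from_redcap(record_id):
    calls.append(record_id)            # track how often upstream is hit
    return {"id": record_id, "status": "enrolled"}

cache = ReadThroughCache(fetch_from_redcap)
cache.get("P001")
cache.get("P001")                      # second read served from cache
```

A real intermediary would also invalidate or refresh stale records on a schedule; this sketch shows only the read path that drives the latency gain.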
Data Science Consultant
Fyto Fire-Fighting AI
02.2023 - 12.2023
Gathered and curated data from authoritative government sources, including Daymet, USGS, CAL FIRE, and NASA, ensuring access to reliable, high-quality datasets for analysis
Applied geospatial techniques to merge, preprocess, and integrate datasets, enabling comprehensive exploratory data analysis and visualisation with Python and ArcGIS tools
Achieved 90% accuracy classifying fire occurrence within specified timeframes and locations for California residents, using ensemble boosting methods and a CNN
Collaborated with a global team of data scientists and domain experts, maintaining effective weekly communication and leveraging diverse perspectives to improve model performance
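Combining a boosted model with a CNN, as described above, often comes down to soft voting: average each model's predicted probabilities and threshold the mean. A minimal sketch, where the member probabilities and the 0.5 threshold are hypothetical examples rather than values from the original project:

```python
# Illustrative sketch: soft-voting ensemble over per-model probabilities.
# The probability lists and threshold are hypothetical examples.
def ensemble_predict(prob_lists, threshold=0.5):
    """Average each model's predicted probabilities per sample, then
    label a sample 1 (fire) if the mean meets the threshold."""
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]

# e.g. a boosted-tree model and a CNN scoring three locations:
boosted = [0.9, 0.2, 0.6]
cnn     = [0.7, 0.1, 0.4]
print(ensemble_predict([boosted, cnn]))  # → [1, 0, 1]
```

Weighted averaging or stacking a meta-model on top are the usual next steps when the members differ in accuracy.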
Data Engineer
Johnson Controls
02.2022 - 01.2023
Implemented predictive maintenance for company-produced chillers using Weibull distributions and survival analysis on Azure Databricks, achieving significant savings in maintenance resources and improved budget efficiency
Supervised the secure document transformation process for internal employees using natural language processing techniques with the spaCy, Keras, and NLTK packages in Python
Managed weekly streaming data feeds, connecting each data source directly to the Python processing scripts
Performed analysis and forecasting of time-series alarm data from physical smart sensors using univariate models and boosting methods in Python, monitoring alarm anomalies and potential activity trends
Developed an interactive dashboard with Plotly Dash and Power BI for the aforementioned alert system, presenting data-driven insights to business leaders and maintenance workers
Extracted, transformed, and loaded industrial data from various sources, including MS SQL, Oracle DB, and Hive on Apache Hadoop, and automated the ETL process with Azure Data Factory pipelines
Developed predictive models using Python and R to enhance data-driven decision-making.
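The Weibull-based survival analysis mentioned above rests on two standard functions, sketched here with the standard library; the shape and scale parameters in the example are hypothetical, not fitted values from the chiller fleet:

```python
# Illustrative sketch of the Weibull survival and hazard functions used in
# predictive maintenance. Parameters below are hypothetical examples.
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)^shape): probability a unit survives past time t."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """h(t) = (shape/scale) * (t/scale)^(shape-1): instantaneous failure rate."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# With shape > 1 the hazard rises over time (wear-out failures), which is
# what makes scheduling maintenance before a survival threshold worthwhile.
print(weibull_survival(10.0, shape=2.0, scale=10.0))  # S(t) at the scale point
```

In practice the shape and scale would be fitted to failure records (e.g. by maximum likelihood); libraries such as lifelines or scipy handle the fitting step.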
Education
Master of Data Science
The University of Melbourne
07.2024
Bachelor of Science - Data Science Major
The University of Melbourne
12.2021
Skills
Technical Skills: Data Warehousing Management, ETL Development, Data Pipelines, Cloud Data Architecture, Data Analysis and Modelling, Statistical Modelling, Machine Learning, Deep Learning (TensorFlow & PyTorch)
Tools: Python, AWS, SQL (Oracle / Hive / Microsoft SQL Server), Git (GitHub), Bash Scripting, Microsoft Power BI, Microsoft Office, Microsoft Azure