Driven by a passion for data analysis and optimization, I developed my skills in SQL, Python, and Spark at the University of New South Wales. With a strong foundation in data structures and problem-solving, I focus on transforming complex data into actionable insights. My academic projects center on making data processing more efficient and reflect my analytical approach.
SQL
Python
Spark: RDD and DataFrame
Data structures and algorithms
C
Jupyter Notebook
About Spark:
①
Project name: Denoise-and-process-data-by-rdd-and-df
URL: https://github.com/Convergent-Fzx/Denoise-and-process-data-by-rdd-and-df
Description: This project uses two different Spark APIs, RDD and DataFrame, to denoise and process messy data and to filter out the data combinations that meet the requirements.
②
Project name: Denoising-data-by-using-mapreduce
URL: https://github.com/Convergent-Fzx/Denoising-data-by-using-mapreduce
Description: This project uses MapReduce to filter, compute, and manipulate relatively clean datasets. It uses jobconf and an external combiner to reduce running time and to make the results clearer and more readable.
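To illustrate why a combiner helps, here is a minimal sketch in plain Python rather than the project's actual MapReduce code. The names `mapper`, `combiner`, `reducer`, and `run_job` are hypothetical, and the job is assumed to be a simple token count. Each input split is pre-aggregated locally, so fewer key-value pairs reach the shuffle step.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit (token, 1) for every whitespace-separated token in a line."""
    for token in line.split():
        yield token.lower(), 1


def combiner(pairs):
    """Pre-aggregate one split's mapper output before it is shuffled."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reducer(key, values):
    """Sum all partial counts that arrive for one key."""
    return key, sum(values)


def run_job(splits):
    """Simulate map -> combine -> shuffle/sort -> reduce over input splits."""
    shuffled = []
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        shuffled.extend(combiner(mapped))  # one pair per key per split
    shuffled.sort(key=itemgetter(0))       # stand-in for the shuffle phase
    return dict(
        reducer(key, (value for _, value in group))
        for key, group in groupby(shuffled, key=itemgetter(0))
    )
```

In this sketch the combiner is safe to use because addition is associative and commutative, so summing partial counts gives the same result as summing the raw pairs.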
③
Project name: Calculate-jecardsimilarity-by-using-dataframe
URL: https://github.com/Convergent-Fzx/Calculate-jecardsimilarity-by-using-dataframe
Description: This project applies the PK-SORT algorithm under the Spark framework to compute Jaccard similarity on large-scale data. It uses prefix filtering to prune candidate pairs before comparison, which significantly reduces running time.
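The idea behind prefix filtering can be sketched in a few lines of plain Python. This is a generic illustration, not the project's Spark code or the PK-SORT algorithm itself; `similar_pairs` and its arguments are hypothetical names. Tokens are sorted rarest first, and only a short prefix of each record is indexed. Two records whose Jaccard similarity reaches the threshold must share at least one prefix token, so only those pairs are verified.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def similar_pairs(records, threshold):
    """Return {(id1, id2): similarity} for pairs with Jaccard >= threshold.

    records maps a record id to an iterable of tokens; threshold is in (0, 1].
    """
    # Global token order: rarest tokens first, ties broken by token value.
    freq = Counter(tok for toks in records.values() for tok in set(toks))
    index = defaultdict(set)
    for rid, toks in records.items():
        ordered = sorted(set(toks), key=lambda t: (freq[t], t))
        n = len(ordered)
        # Prefix length n - ceil(threshold * n) + 1; the small epsilon
        # guards against floating-point overshoot in threshold * n.
        prefix_len = n - math.ceil(threshold * n - 1e-9) + 1
        for tok in ordered[:prefix_len]:
            index[tok].add(rid)

    # Only records sharing a prefix token become candidate pairs.
    candidates = set()
    for rids in index.values():
        candidates.update(combinations(sorted(rids), 2))

    # Verify each candidate with the exact similarity.
    result = {}
    for a, b in candidates:
        sim = jaccard(records[a], records[b])
        if sim >= threshold:
            result[(a, b)] = sim
    return result
```

The sketch assumes record ids are sortable, so that each pair is reported once in a fixed order.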
About SQL:
Project name: unsw-datasystem
URL: https://github.com/Convergent-Fzx/unsw-datasystem
Description: Wrote queries that filter data according to the project requirements in a large and complex database schema, and implemented search functions for the school's main database.
About Python:
Project name: labyrinth
URL: https://github.com/Convergent-Fzx/labyrinth
Description: Implemented multiple maze pathfinding algorithms in Python, solving the problem on top of different data structures. The project required strong independent modeling and analytical skills.
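As one example of maze pathfinding, the sketch below uses breadth-first search on a grid. It is a generic illustration and not code from the repository. The maze encoding (a list of strings with `#` for walls) and the function name `shortest_path` are assumptions made for the example.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Return the shortest list of (row, col) cells from start to goal, or None.

    grid is a list of equal-length strings; '#' marks a wall.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                      # goal is unreachable
```

Breadth-first search explores cells in order of distance from the start, so the first time it reaches the goal the path has the fewest possible steps.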
About C:
Project name: pagerank and making crawler
URL: https://github.com/Convergent-Fzx/pagerank
Description: This project is an implementation of graph data structures and algorithms, including the PageRank algorithm and Dijkstra's shortest path algorithm. The main part of the project is a C program for managing graphs, allowing the creation of graphs, adding vertices and edges, and computing and displaying the results of graph algorithms like PageRank and shortest paths.
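The PageRank computation can be sketched with power iteration. The project itself is written in C; the version below is a Python illustration for brevity and is not taken from the repository. The adjacency-dict input, the damping factor of 0.85, and the function name `pagerank` are assumptions.

```python
def pagerank(graph, damping=0.85, tol=1e-9, max_iter=100):
    """Approximate PageRank by power iteration.

    graph maps each vertex to a list of vertices it links to; every
    link target must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:             # stop once the ranks have stabilised
            break
    return rank
```

Each iteration passes a vertex's rank to its neighbours in equal shares, and the ranks always sum to 1 in this formulation.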
About Jupyter Notebook:
Project name: Fashion-items-classification
URL: https://github.com/Convergent-Fzx/Fashion-items-classification
Description: Using different models and architectures, the project classifies items into 13 categories. Classic models such as VGG, HCNN, and ResNet were trained directly on the dataset. The project also employed semi-supervised learning, reinforcement learning, and transfer learning to handle the imbalanced dataset, improving model accuracy and significantly reducing training difficulty.