Crystal (Tianhui) Hu

NLP Research Scientist / AI Engineer
Australian Citizen, NSW

Summary

Australian citizen raised in Australia, with permanent residency in Japan. Results-driven AI and data scientist with over six years of experience specializing in credit risk, fraud detection, and retrieval-augmented generation (RAG) for natural language processing. Global perspective built over five professional years in Sydney, working in native-level English, and two years in Tokyo, collaborating directly with Japanese stakeholders in Japanese. Expertise includes building predictive models and engineering advanced NLP solutions, highlighted by a patented RAG-based 'Evidence Tracker' built to eliminate LLM hallucinations and ensure reliable information delivery. Thrives in cross-functional environments, aligning technical solutions with business objectives and applying a user-focused mindset to shape product direction and make tools more robust through innovative NLP methods.

Overview

7 years of professional experience
6 Certificates
4 Languages

Work History

NLP Research Engineer

Money Forward
06.2025 - Current
  • Developed a patented 'Evidence Tracker' tool based on Corrective Retrieval-Augmented Generation (CRAG) to eliminate hallucinations from large language models (LLMs), supporting the customer support team in finding accurate answers. This involved building and optimizing deep learning-based embeddings for document retrieval.
  • Identified promising research topics from recent studies on reducing hallucination in critical, high-risk fields such as finance, law, and tax.
  • Evaluated the new system's performance and fine-tuned it against self-designed evaluation metrics and industry benchmarks.
  • Developed the core algorithm and built the UI in Streamlit.
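
The corrective step behind a CRAG-style tool can be sketched as follows. This is a simplified illustration only, not the patented Evidence Tracker: a toy token-overlap score stands in for the embedding-based retriever, and the names `tokens`, `score`, `corrective_retrieve`, the sample documents, and the 0.5 threshold are all assumptions made for this example.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for real embeddings (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, doc: str) -> float:
    """Fraction of query tokens found in the document (toy relevance score)."""
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0

def corrective_retrieve(query, docs, threshold=0.5):
    """Return (evidence, status), keeping only documents the evaluator trusts.

    If no document clears the threshold, the caller should abstain or fall
    back to another source instead of letting the LLM answer unsupported.
    """
    scored = sorted(((score(query, d), d) for d in docs), reverse=True)
    evidence = [d for s, d in scored if s >= threshold]
    return evidence, ("grounded" if evidence else "abstain")

# Hypothetical support documents, for illustration only.
docs = [
    "Invoices can be exported as CSV from the billing page.",
    "Password resets are sent to the registered email address.",
]
evidence, status = corrective_retrieve("export invoices as csv", docs)
```

The answer is only generated from `evidence` when the status is "grounded"; an "abstain" status signals that no retrieved document supports the question.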

Research Scientist in Credit Risk

Money Forward
06.2024 - Current
  • Conducted research on credit limit determination for factoring services, focusing on improving accuracy through statistical analysis and machine learning.
  • Investigated cutting-edge credit control methods within the B2B lending sector, supporting long-term strategy formulation for the Risk Department.
  • Scrutinized large datasets, highlighting essential variables like lead desired amount and purchased amount, and advocated for new methodologies in setting credit limits.
  • Coordinated with BFW (BizForward) to integrate research goals with business priorities, articulating and demonstrating findings to stakeholders.
  • Evaluated the limitations of current credit limit estimation approaches and devised pre-screening measures to inform users of lending limits ahead of application evaluation.
  • Facilitated strategic planning for the entire research project.

Data Scientist

Commonwealth Bank of Australia
12.2021 - 10.2023
  • Developed and deployed advanced credit risk assessment models (PD model), applying statistical techniques and advanced analytics to drive data-driven decisions.
  • Conducted comprehensive data pre-processing, ensuring data quality by handling unbalanced datasets, missing value imputation, and one-hot encoding. Built and deployed ETL pipelines in Azure/AWS for data cleaning, transformation, and model training.
  • Designed new metrics for home loan-related parameters and requirements, and built robust data science roadmaps.
  • Gathered reporting requirements for construction loans, developed documentation, and created reports using Tableau and Power BI.
  • Assisted with A/B testing and causal inference to improve product effectiveness and identify target segments for product launches.
  • Developed predictive models using machine learning algorithms to optimize product offerings.
  • Collaborated with cross-functional teams to integrate data analytics into business strategies.
  • Mentored junior data scientists in best practices for data analysis and model development.

Data Analyst and Developer

Hub 24
02.2021 - 12.2021
  • Automated manual reporting processes using Microsoft Report Builder and SSRS, streamlining data analysis and insights generation. Generated actionable business insights through data modeling techniques.
  • Designed and deployed reporting solutions using SSRS, providing insights to support decision making. Supported ad-hoc data requests through SQL queries, drawing on database management skills and hands-on experience with GCP and AWS.
  • Led database redesign initiatives, ensuring stable staging and implementing data warehousing technologies. Successfully managed data migration from legacy systems to new platforms and designed ETL pipelines.
  • Used data visualization tools such as Yellowfin, Tableau, Data Studio, and Salesforce to create clear, informative dashboards and reports.

Software (Web) Developer

Hub 24
06.2020 - 02.2021
  • Demonstrated expertise in modern and responsive web UI design, implementing and optimizing user interfaces.
  • Applied in-depth knowledge of jQuery UI, jQuery DataTables, Bootstrap, Tabulator, Cleave.js, and related technologies.
  • Conducted front-end UI performance optimization, user experience improvement testing, and security testing.
  • Collaborated with the team to design feature solutions, ensuring seamless integration and optimal functionality.
  • Developed responsive web applications using HTML, CSS, and JavaScript frameworks.
  • Led back-end data structure design, data flow design, API development, testing, and database operations.

Web Developer and IT Tutor

Navigator Union
06.2019 - 06.2020
  • Tutored IT courses, tested the performance of the company webpage, and designed and optimized the webpage UI.

Education

Master of Information Technology - Software Engineering and Data Analytics and Management

The University of Sydney
Sydney
07-2019

Graduate Certificate - Computer Science

The University of Sydney
Sydney, Australia
04.2001 - 12.2017

Skills

Research planning

Python programming

RAG

Algorithm development

Big data analytics

Modeling

Certification

Google Analytics Certificate, Google

Work Availability

monday
tuesday
wednesday
thursday
friday
saturday
sunday
morning
afternoon
evening

Languages

English: Native or Bilingual
Chinese (Mandarin): Native or Bilingual
Japanese: Full Professional

Projects

Kanji Generation with Stable Diffusion (NLP Research Project, Aug 2025 - Present)
Goal: Train a Stable Diffusion model to generate novel Japanese Kanji characters from English definitions, reproducing the viral experiment that demonstrated AI's ability to "hallucinate" new cultural symbols for modern concepts such as "YouTube", "Gundam", and "Elon Musk" that have no existing Kanji representation.

  • Data engineering and dataset creation: Built a dataset of 6,410 Kanji characters by parsing KANJIDIC2 XML files for English meanings and KanjiVG SVG files for stroke data. Converted the vector SVG drawings to 128x128 pixel images with pure black strokes (#000000) on white backgrounds, ensuring no stroke order numbers were rendered. Created a complete metadata mapping with 16,692 English meanings (average 2.6 per Kanji) and text prompts for training. Implemented a quality control pipeline to validate image consistency and remove artifacts.
  • Generation and results: Developed a concept generation system capable of creating Kanji for modern concepts such as "iPhone", "Bitcoin", "Netflix", "Tesla", "Instagram", "COVID-19", and "artificial intelligence".
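
The meaning-extraction step can be illustrated with a short sketch using only Python's standard library. It assumes the KANJIDIC2 layout in which each `<character>` element holds a `<literal>` and one or more `<meaning>` elements, with English meanings carrying no `m_lang` attribute. The inline sample XML, the function name `english_meanings`, and the comma-joined prompt format are illustrative assumptions, not the project's actual code.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the assumed KANJIDIC2 layout (not real project data).
SAMPLE = """<kanjidic2>
  <character>
    <literal>水</literal>
    <reading_meaning><rmgroup>
      <meaning>water</meaning>
      <meaning m_lang="fr">eau</meaning>
    </rmgroup></reading_meaning>
  </character>
  <character>
    <literal>火</literal>
    <reading_meaning><rmgroup>
      <meaning>fire</meaning>
      <meaning>flame</meaning>
    </rmgroup></reading_meaning>
  </character>
</kanjidic2>"""

def english_meanings(xml_text: str) -> dict[str, list[str]]:
    """Map each Kanji literal to its English meanings."""
    root = ET.fromstring(xml_text)
    result = {}
    for char in root.iter("character"):
        literal = char.findtext("literal")
        # English meanings are the ones without an m_lang attribute.
        found = [m.text for m in char.iter("meaning")
                 if "m_lang" not in m.attrib and m.text]
        if literal and found:
            result[literal] = found
    return result

meanings = english_meanings(SAMPLE)
# One text prompt per Kanji, joining its meanings (hypothetical prompt format).
prompts = {kanji: ", ".join(words) for kanji, words in meanings.items()}
```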

Loan Prediction (Commonwealth Bank, Feb. 2022 – Apr. 2022)

Goal: The goal of the loan prediction model was to accurately predict whether a loan would be successfully approved or not based on the selected metrics. By utilizing feature selection techniques and building both Random Forest and logistic regression models, we aimed to achieve high prediction accuracy and validate the performance of the models.

  • For feature selection, selected the most meaningful metrics according to business scope and end goal: LoanId, gender, marriage status, dependents, education, applicantIncome, coapplicantIncome, LoanAmount, LoanAmountTerm, credithistory.
  • Filled missing values using mode or mean value, and split the data into 70% training data and 30% testing data.
  • Built a Random Forest model and fit it to the dataset, achieving an accuracy of 77.2%; the model is fast and simple enough to predict whether a loan would be approved.
  • Built a logistic regression model and fit it to the dataset, achieving an accuracy of 82.2%, to compare prediction accuracy and validation scores against the Random Forest model.
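
The preprocessing steps above (mean or mode imputation followed by a 70/30 split) can be sketched in plain Python. This is a minimal illustration on made-up values: the helper names `impute` and `train_test_split`, the sample rows, and the fixed seed are assumptions, and only the column names `LoanAmount` and `credithistory` come from the feature list.

```python
import random
from statistics import mean, mode

def impute(rows, column, numeric):
    """Fill None values in place: mean for numeric columns, mode otherwise."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed) if numeric else mode(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill

def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]

# Made-up sample values, for illustration only.
loan_amounts = [120.0, None, 80.0, 100.0, 150.0, 90.0, None, 110.0, 130.0, 60.0]
credit_history = [1, 1, None, 0, 1, 1, 1, None, 0, 1]
rows = [{"LoanAmount": a, "credithistory": c}
        for a, c in zip(loan_amounts, credit_history)]

impute(rows, "LoanAmount", numeric=True)      # mean of observed amounts
impute(rows, "credithistory", numeric=False)  # most frequent value
train, test = train_test_split(rows)
```

The order here mirrors the bullets above. Computing the fill values on the training split only is a common alternative that avoids leaking test information into preprocessing.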

Probability of default (PD) model development (Commonwealth Bank, Apr 2023 - Sep 2023)
Goal: Develop a probability of default (PD) model to estimate the likelihood of default for credit card holders based on historical data and relevant features.
  • Developed a PD model to estimate the likelihood of default for credit card holders based on historical data and relevant features, such as customer demographics, credit history, and financial indicators.
  • Conducted exploratory data analysis to understand the underlying patterns and relationships in the dataset.
  • Preprocessed and cleaned the data through missing value handling, outlier detection, and feature engineering.
  • Selected appropriate machine learning algorithms, including logistic regression, decision trees, and gradient boosting, to build and train the PD model.
  • Evaluated the model's performance using metrics such as accuracy, precision, recall, and F1 score, and fine-tuned the model to optimize its predictive power.
  • Incorporated model interpretability techniques, such as feature importance analysis and partial dependence plots, to gain insight into the factors driving the default probability.
  • Collaborated with stakeholders, including risk management teams and business analysts, to validate and refine the PD model's performance and ensure its alignment with business objectives.
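
The evaluation metrics named above can be computed from a confusion matrix, as in the following sketch. The labels are made-up toy data (1 = default, 0 = no default) and the function name `classification_metrics` is an assumption for illustration; it does not reproduce the project's actual evaluation code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels, for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because defaults are usually rare, accuracy alone can look high even when many defaults are missed, which is why precision, recall, and F1 are reported alongside it.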
