Hongyang Yu

Data Scientist
Sydney

Summary

Data scientist with 5+ years of experience delivering state-of-the-art ML algorithms to large-scale distributed production environments.

Overview

9 years of professional experience
6 years of post-secondary education

Work History

Lead Data Scientist

Coupa
10.2018 - Current

Data scientist leading the development of multiple high-impact features across the entire organization. Highlighted projects are listed below.

  • Contract document intelligence system. Led the design and implementation of a GenAI-based contract document understanding feature, leveraging GraphRAG, few-shot learning and chain-of-thought prompting.
  • Automatic account and tax code recommendation system. Led the design and implementation of an innovative recommender system that automatically recommends account and tax codes for purchased items based on customers' past purchase history. US patent pending (US 2024/0177244 A1).
  • Automatic invoice data extraction system. Led the design, implementation and worldwide rollout of a state-of-the-art deep-learning-based invoice data extraction system. The system processes 1 million invoices monthly with more than 99% accuracy and brings in around 15M in revenue per year. Two US patents granted (US Patent 11,914,567 and US Patent 11,450,126).

Consultant

Anticancer BioScience
01.2021 - Current

Working as the consulting chief scientist of the newly founded AI drug design department of Anticancer BioScience, a leading synthetic lethal drug design company for cancer treatment. During my tenure, I have developed two world-leading structure-based drug generative models and filed two US and international patents. Publications and patents are listed below.

Publications

  • Yu, H. K., & Yu, H. C. (2022). Powerful molecule generation with simple ConvNet. Bioinformatics, 38(13), 3438-3443.
  • Yu, Hongyang, and Hongjiang Yu. "TensorVAE: a simple and efficient generative model for conditional molecular conformation generation." Transactions on Machine Learning Research.

Patents

  • Methods and models for direct molecular conformation generation. US Patent App. 18/386,435
  • Graph based machine learning for generating valid small molecule compounds. US Patent US20230197209A1

Research Fellow

Victoria University of Wellington
04.2018 - 08.2018
  • Conducted original research in Bayesian statistics
  • Taught a master's-level course on Computational Statistics

Joint Research Fellow

Queensland University of Technology & Texas A&M
02.2016 - 04.2018

Responsibilities:

  • Successfully supervised two PhD candidates through to their graduation.
  • Served as the primary investigator for an Australian Research Council linkage project (ARCLP150100545), focused on developing stochastic models for assessing maintenance cost prediction and optimizing resource allocation.
  • Designed and taught two graduate-level research courses.

Machine Learning Engineer

CSIRO
02.2016 - 05.2016

Responsibilities:

  • Provided data science support to the Tasmanian Government-initiated Sense-T Adaptive Water Resources Management research project
  • Cleaned and analysed water quality data for the South Esk and Ringarooma catchments
  • Developed a machine learning tool with a user-friendly graphical user interface for accurately predicting water temperature variation
  • Presented the model to local farmers in a workshop, where it was well received

Education

Ph.D. - Automation Science

University of Tasmania
Tasmania
02.2013 - 05.2016

Bachelor of Engineering - Mechanical Engineering

University of Tasmania
Tasmania
02.2010 - 12.2012

Skills

  • Python (pandas, numpy, scipy, flask)
  • TensorFlow, PyTorch
  • Linux and Git for source control
  • Docker containers, Kubernetes (K8s), AWS SageMaker, ML DevOps CI/CD

Patents

  • TEXT-BASED MACHINE LEARNING EXTRACTION OF TABLE DATA FROM A READ-ONLY DOCUMENT - US Patent 11,914,567
  • SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING CANONICAL DATA FROM ELECTRONIC DOCUMENTS - US Patent 11,450,126
  • SYSTEMS AND METHODS FOR AUTOMATICALLY RECOMMENDING ACCOUNT CODES - US Patent App. 18/070930
  • LOW RANK ADAPTATION IN MULTICLASS DEEP LEARNING CLASSIFIER - Filed by Baker Botts on 23 Dec 2023
  • GRAPH MACHINE LEARNING FOR GENERATING SMALL VALID COMPOUNDS - US Patent App. 63/291552
  • METHODS AND MODELS FOR DIRECT MOLECULAR CONFORMATION GENERATION - US Patent App. 18/386,435


Publications

I have published over 30 research articles, which have garnered more than 900 citations. My work has achieved an h-index of 16 and an i10-index of 20.


  • Yu, H., Khan, F., Garaniya, V., & Ahmad, A. (2014). Self-organizing map based fault diagnosis technique for non-Gaussian processes. Industrial & Engineering Chemistry Research, 53(21), 8831-8843. American Chemical Society.
  • Yu, H., Khan, F., & Garaniya, V. (2015). Modified independent component analysis and Bayesian network-based two-stage fault diagnosis of process operations. Industrial & Engineering Chemistry Research, 54(10), 2724-2742. American Chemical Society.
  • Yu, H., Khan, F., & Garaniya, V. (2015). Risk-based fault detection using Self-Organizing map. Reliability Engineering & System Safety, 139, 82-96. Elsevier.
  • Yu, H., Khan, F., & Garaniya, V. (2015). A probabilistic multivariate method for fault diagnosis of industrial processes. Chemical Engineering Research and Design, 104, 306-318. Elsevier.
  • Yu, H., Khan, F., & Garaniya, V. (2015). Nonlinear Gaussian Belief Network based fault diagnosis for industrial processes. Journal of Process Control, 35, 178-200. Elsevier.
  • Yerbury, A., Coote, A., Garaniya, V., & Yu, H. (2015). Design of a solar Stirling engine for marine and offshore applications. International Journal of Renewable Energy Technology, 7(1), 1-45. Inderscience.
  • Yu, H., Khan, F., & Garaniya, V. (2016). A sparse PCA for nonlinear fault diagnosis and robust feature discovery of industrial processes. AIChE Journal.
  • Yu, H., Khan, F., & Garaniya, V. (2015). An Alternative Formulation of PCA for Process Monitoring using Distance Correlation. Industrial & Engineering Chemistry Research. American Chemical Society.
  • Yu, H., Khan, F., & Garaniya, V. (2016). Risk-based process system monitoring using self-organizing map integrated with loss functions. The Canadian Journal of Chemical Engineering. Wiley.
  • Islam, R., Yu, H., Abbassi, R., Garaniya, V., & Khan, F. (2016). Development of a monograph for human error likelihood assessment in marine operations. Safety Science, 91, 33-39. Elsevier.
  • Khakzad, N., Yu, H., Paltrinieri, N., & Khan, F. (2016). Reactive Approaches of Probability Update Based on Bayesian Methods. In Dynamic Risk Analysis in the Chemical and Petroleum Industry (pp. 51-61). Elsevier.
  • Yu, H. (2016). Dynamic risk assessment of complex process operations based on a novel synthesis of soft-sensing and loss function. Process Safety and Environmental Protection.
  • Yu, H. (2017). A Novel Semiparametric Hidden Markov Model for Process Failure Mode Identification. IEEE Transactions on Automation Science and Engineering.
  • Yu, H., Khan, F., & Veitch, B. (2017). A Flexible Hierarchical Bayesian Modeling Technique for Risk Analysis of Major Accidents. Risk Analysis.
  • Yu, H., & Khan, F. (2017). Improved latent variable models for nonlinear and dynamic process monitoring. Chemical Engineering Science.
  • Yu, H., Borghesani, P., Cholette, M., Kent, G., Ma, L., & Burke, B. (2017). Development of condition-based maintenance for sugar mill assets. In Australian Society of Sugar Cane Technologists Conference, 39 (pp. 568-576). Australian Society of Sugar Cane Technologists.
  • Yu, H., Garaniya, V., Pennings, P., & Vogt, J. (2017). Numerical analysis of cavitation about marine propellers using a compressible multiphase VOF fractional step method. In 9th Australasian Congress on Applied Mechanics (ACAM9).
  • Yu, H. (2016). Development of advanced fault diagnosis techniques for complex industrial processes (Doctoral dissertation, University of Tasmania).
  • Yu, H., Goldsworthy, L., Brandner, P. A., Li, J., & Garaniya, V. (2018). Modelling thermal effects in cavitating high-pressure diesel sprays using an improved compressible multiphase approach. Fuel, 222, 125-145. Elsevier.
  • Islam, R., & Yu, H. Human Factors in Marine and Offshore Systems. In Methods in Chemical Process Safety, 1. Elsevier.
  • Rebello, S., Yu, H., & Ma, L. (2018). An integrated approach for system functional reliability assessment using Dynamic Bayesian Network and Hidden Markov Model. Reliability Engineering & System Safety, 180, 124-135. Elsevier.
  • Cholette, M. E., Yu, H., Borghesani, P., Ma, L., & Kent, G. (2019). Degradation modeling and condition-based maintenance of boiler heat exchangers using gamma processes. Reliability Engineering & System Safety, 183, 184-196. Elsevier.
  • Fletcher, T., Garaniya, V., Chai, S., Abbassi, R., Brown, R. J., Yu, H., Van, T. C., & Khan, F. (2018). An application of machine learning to shipping emission inventory. International Journal of Maritime Engineering, 160(A4).
  • Rebello, S., Yu, H., & Ma, L. (2019). An integrated approach for real-time hazard mitigation in complex industrial processes. Reliability Engineering & System Safety, 188, 297-309. Elsevier.
  • Cholette, M., Kent, G., Yu, H., Wang, N., Borghesani, P., & Ma, L. (2019). A systematic approach to prioritising capital replacements. In Proceedings of the 41st Australian Society of Sugar Cane Technologists Conference (pp. 29-39). Australian Society of Sugar Cane Technologists.
  • Yu, H., & Yu, H. (2022). Powerful Molecule Generation with Simple ConvNet. Bioinformatics.
  • Yu, H. (2022, September 20). Systems and Methods for Automatically Extracting Canonical Data From Electronic Documents (US Patent No. 11,450,126).
  • Yu, H., Borhanazad, H., & Mandlecha, S. (2024, February 27). Text-based machine learning extraction of table data from a read-only document (US Patent No. 11,914,567).
  • Yu, H., & Yu, H. (2024). TensorVAE: a simple and efficient generative model for conditional molecular conformation generation. Transactions on Machine Learning Research.
  • Yu, H., Borghesani, P., Cholette, M. E., Kent, G. A., Ma, L., Burke, B. J., et al. (2019). Development of condition-based maintenance for sugar mill assets. World Sugar Yearbook, 2019, 38-44. Informa UK Ltd.

Timeline

Consultant

Anticancer BioScience
01.2021 - Current

Lead Data Scientist

Coupa
10.2018 - Current

Research Fellow

Victoria University of Wellington
04.2018 - 08.2018

Joint Research Fellow

Queensland University of Technology & Texas A&M
02.2016 - 04.2018

Machine Learning Engineer

CSIRO
02.2016 - 05.2016

Ph.D. - Automation Science

University of Tasmania
02.2013 - 05.2016

Bachelor of Engineering - Mechanical Engineering

University of Tasmania
02.2010 - 12.2012