HEMA SAINI

Sydney, Australia

Summary

Senior Data Engineer with over 10 years of experience in delivering comprehensive data solutions in finance, insurance, and industrial sectors. Proficient in Azure Databricks, CI/CD automation, and PySpark, with a proven ability to modernize legacy systems and lead cloud migration projects. Demonstrated expertise in managing full project lifecycles and mentoring junior engineers while ensuring high standards of engineering excellence. Skilled in developing scalable real-time and batch data pipelines, collaborating with data scientists, and aligning solutions with business and regulatory objectives for major clients such as Prudential, DBS, and OCBC.

Overview

12 years of professional experience

Work History

Senior Data Engineer

GrainCorp Pty Ltd
Sydney, Australia
01.2023 - Current
  • Leading end-to-end delivery of multiple data engineering projects—covering both real-time (Autoloader-based) and batch processing—from requirement gathering and design to development, deployment, and ongoing maintenance.
  • Built scalable data pipelines using Azure Databricks, Autoloader, PySpark, SQL, and Delta Lake to transform and serve telemetry and operational data.
  • Configured and managed Azure IoT infrastructure including IoT Hub and Azure Functions for ingesting and routing industrial sensor data.
  • Implemented ingestion and transformation logic using Databricks Autoloader with Delta Lake merge strategies, enabling schema evolution, late-arriving data support, and high-volume processing.
  • Designed and implemented CI/CD pipelines, including YAML-based deployment workflows in Azure DevOps to automate build, test, and release processes for both data pipelines and machine learning models developed by the data science team.
  • Led and supported two junior data engineers, providing technical guidance, overseeing task delivery, and fostering their professional development through regular feedback and mentorship.

Senior Data Engineer

Avensys Consulting (Prudential – Financial Services)
Singapore
04.2021 - 10.2022
  • Led data engineering initiatives to modernize Prudential’s finance data stack by migrating ETL jobs and critical data assets—including Unix shell scripts, Informatica workflows, and T-SQL procedures—from Unix-based on-premises systems to Azure Data Factory and Databricks.
  • Designed and implemented scalable pipelines using ADF, PySpark, and Azure Databricks, incorporating Spark SQL to transform and deliver investment policy, claims, and customer transaction data.
  • Utilized DevOps practices through Azure DevOps and GitHub to manage CI/CD pipelines, automate job deployments, and ensure version control.
  • Created reusable PySpark modules for data quality validation, schema checks, and exception handling, improving operational resilience across finance pipelines.
  • Partnered with finance and actuarial teams to ensure cloud pipelines met regulatory standards and financial reporting needs (e.g., MAS compliance).
  • Mentored junior engineers and promoted engineering best practices, improving delivery speed and code maintainability.

Senior Data Engineer

Comtel Pte. Ltd. (Contractor for DBS & OCBC)
Singapore
07.2017 - 07.2019

Client Project: OCBC Bank – Customer Data Platform & Compliance Pipelines

  • Designed and developed PySpark pipelines to ingest and transform customer transaction and demographic data into Hive, aligning with Teradata FSLDM-based data models.
  • Utilized Spark SQL, HiveQL, and PostgreSQL to process large-scale JSON and structured files for AML, KYC, and regulatory reporting.
  • Validated and enriched data using Python, Pandas, NumPy, and PostgreSQL, especially for Excel-based customer onboarding data.
  • Migrated and re-engineered legacy Hive/SQL scripts into PySpark, improving scalability, modularity, and maintainability.
  • Collaborated with business users on SIT, UAT, and production rollouts, documenting transformation logic aligned with FSLDM subject areas.

Client Project: DBS Bank – Data Warehouse Migration & Regulatory Reporting

  • Migrated batch jobs from AIX to Hadoop (Linux), integrating legacy financial data into HDFS and Hive, maintaining FSLDM consistency across financial and risk domains.
  • Built PySpark and SQL-based Python modules for data cleansing and historical tracking, enabling efficient daily file ingestion and validation.
  • Optimized Teradata SQL queries for performance and restructured job scheduling using Control-M to ensure alignment with SLAs and dependency chains.
  • Worked with data architects to map Teradata entities to Hive structures, preserving reporting lineage and ensuring compliance.
  • Developed data workflows for a contextual marketing initiative, enabling targeted offers based on customer transaction and spend behavior.

Software Engineer

Infosys Ltd (Client: Horizon – Insurance Domain)
India
10.2013 - 04.2017
  • Designed, developed, and enhanced ETL processes using Informatica PowerCenter, leveraging SQL and PL/SQL to support transformation logic for structured and unstructured insurance data.
  • Developed integration workflows for Claims Processing Engine and HIPAA-compliant formats (834, 275, 277) using Informatica B2B Data Transformation.
  • Built ingestion and transformation workflows using Unix shell scripting, SQL*Loader, and SQL, handling flat files and Excel-based source systems.
  • Created and validated reports using Excel for insurance web services; managed debugging through session logs and error tracking.
  • Maintained comprehensive documentation including ETL specs, unit tests, and migration checklists to support integration for Medicaid and Medicare compliance.

Education

Master of Technology - Data Science and Engineering

BITS Pilani
01.2024

Bachelor of Technology - Information Technology

Himachal Pradesh University
01.2012

Skills

  • Python
  • Data pipeline development
  • SQL
  • T-SQL
  • PL/SQL
  • Unix Shell Scripting
  • YAML
  • Apache Spark
  • PySpark
  • Databricks
  • Hadoop
  • HDFS
  • Hive
  • Delta Lake
  • Databricks Autoloader
  • Azure Data Factory
  • Azure Databricks
  • Azure Functions
  • Azure DevOps
  • Git
  • CI/CD
  • Control-M
  • SQL Server
  • Teradata
  • PostgreSQL
  • Informatica PowerCenter
  • Informatica B2B Data Transformation
  • SQL*Loader
  • Power BI
  • Excel
  • Model deployment pipelines
  • Working with data scientists
  • Version control
  • Stakeholder communication
  • CI/CD automation

Residency

Australian Permanent Resident
