Harsha D

Summary

  • 6 years' experience as a Data Engineer, spanning the design, development, and implementation of data models for enterprise-level applications.
  • Created and executed data storage strategies using Azure services, including Azure Data Lake Storage and Azure SQL Database.
  • Experienced in creating and managing data pipelines using Azure Data Factory and Azure Databricks.
  • Created and managed data processing jobs using Azure HDInsight and Azure Stream Analytics.
  • Experienced in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse.
  • Strong experience using Azure Blob Storage and Azure Data Lake Storage, as well as loading data into Azure Synapse Analytics.
  • Strong data modelling skills using SQL and PL/SQL queries to derive results, serving downstream customers including engineering teams, data scientists, and business stakeholders.
  • Experienced with big data tools and formats, including Avro, CSV, and Parquet files, Apache Kafka, Apache Spark, Airflow, Hive, Sqoop, and Delta Lake.
  • Experienced in migrating SQL databases with Azure SQL Managed Instance, Azure Data Factory, and SSIS in Microsoft Visual Studio.
  • Experienced with PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing; proficient with Spark Core, Spark SQL, and Spark Streaming; knowledgeable about Spark's architecture and components.
  • Experienced in developing and executing ETL (Extract, Transform, Load) processes with Informatica PowerCenter, ensuring accurate and timely data transfer.
  • Expertise in Data Warehouse and OLTP/OLAP implementation, including project scoping, analysis, requirement gathering, data modelling, effort estimation, ETL/ELT design and development, system testing, implementation, and production support.
  • Solid work experience with Big Data technologies, with hands-on experience installing, configuring, and using ecosystem components like HDFS, HBase, Hive, Snowflake, Kafka, and Spark.
  • Strong experience using PySpark, Python, and SQL (including advanced SQL) to automate ETL processes ranging from simple to highly complex.
  • Familiar with Power BI and Tableau for creating interactive visualizations and reports based on processed data.
  • Experienced with data pipeline building, backend microservices development, and REST APIs using Python.
  • In-depth understanding of and familiarity with NoSQL databases such as MongoDB and Cassandra, as well as PostgreSQL.
  • Experienced in writing PL/SQL, SQL, and Bash scripts, with a strong understanding of relational databases such as MySQL, Oracle, and SQL Server.
  • Highly knowledgeable in using GitLab and Jenkins to develop CI/CD pipelines, with a background in creating large-scale Python REST APIs for applications.
  • Experienced with issue tracking systems such as JIRA and version control systems like Git and SVN.
  • Knowledge of Agile approaches such as Scrum; familiar with setting up and using Splunk applications on Linux and UNIX systems.

Overview

9 years of professional experience

Work History

Azure Data Engineer

EPAM
08.2021 - Current

Client: Bank of America

Description: The project covered modules such as Daily Transactions, Fixed Deposits, and various loan details, used to update interest rates and balances and to generate quarterly fixed deposit update details. Each module supports adding, updating, and querying the databases.

Responsibilities:

  • Designed and implemented Azure data services, including Power BI, Synapse, Azure Databricks, Azure Logging and Monitoring, Azure Blob Storage, Azure Data Lake, Azure Data Factory, and Azure Analysis Services.
  • Designed reusable Azure Data Factory-based data pipeline infrastructure that transforms provisioned data for consumption by Azure SQL Data Warehouse and Azure SQL Database.
  • Developed ETL workflows that integrate with PostgreSQL instances using Azure Data Factory or Azure Databricks.
  • Worked with business stakeholders and data analysts to understand their needs and put appropriate data models in place.
  • Worked with core aspects of data import and manipulation, data modelling, table relationships, and DAX.
  • Used the Hadoop framework, including the Hadoop Distributed File System (HDFS) and ecosystem components such as Pig, Hive, Sqoop, and PySpark.
  • Used PySpark and Azure Data Factory with Databricks to create pipelines, data flows, and intricate data transformations and manipulations.
  • Designed custom-built input adapters using Spark, Hive, and Sqoop to ingest data from Snowflake and MS SQL into HDFS for analysis.
  • Used Kafka to stream data into the Hadoop file system and move the same data into the Cassandra NoSQL database.
  • Managed project workflows using JIRA and maintained version control using Git, leading to improved project management and collaboration.
  • Obtained data in multiple compression formats from merchant and vendor sources, RDBMS (SQL Server), and REST APIs; regularly processed, stored, and handled massive volumes of data.
  • Experienced working with both modern Azure technologies for ETL and legacy SSIS and SSRS products and procedures.
  • Developed Extract, Transform, Load (ETL) pipelines using Spark and PySpark to move customer, account, and credit card data into an enterprise Hadoop data lake (see the illustrative sketch after this list).
  • Created a data pipeline that loads data into data connections using Hive and Python; analyzed and mapped data for several data sources.
  • Worked on continuous integration and continuous deployment (CI/CD) solutions, using Git, Jenkins, and Docker to set up and configure big data architecture on the AWS cloud platform.
  • Integrated tables into the data lake from Kafka using batch, real-time, and micro-batching approaches; used Spark jobs to ingest, filter, and extract data.
  • Worked with Agile methodology in multidisciplinary teams and used the Waterfall methodology to manage teams and projects.
  • Worked on incidents and the Scrum task board using JIRA as the Scrum tool.
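
Below is a minimal, illustrative PySpark sketch of the kind of ETL pipeline described above (ingest raw account data, apply simple transformations, and write it to a data lake path). The paths, column names, and schema are hypothetical placeholders, not actual project code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal illustrative sketch: paths and column names are hypothetical.
spark = SparkSession.builder.appName("account-etl-sketch").getOrCreate()

# Extract: read raw account data landed by an upstream process.
raw = spark.read.option("header", True).csv("/landing/accounts/")

# Transform: standardize types, drop invalid rows, and stamp a load date.
accounts = (
    raw.withColumn("balance", F.col("balance").cast("double"))
       .filter(F.col("account_id").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Load: append to the data lake in Parquet, partitioned by load date.
accounts.write.mode("append").partitionBy("load_date").parquet("/datalake/accounts/")
```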

Data Engineer

EY
06.2019 - 08.2021

Client: Deutsche Bank

Description: The bank's main activities are loan origination and deposit generation, and it offers the entire spectrum of personal and business banking services and products. Duties included creating, implementing, and maintaining all databases, including complex queries, triggers, and stored procedures, and assisting with the administration of several bank databases in both development and production environments.

Responsibilities:

  • Developed a data pipeline for analytics using Azure stack components such as Azure Data Factory, Azure Data Lake, Azure Databricks, Azure Synapse Analytics, and Azure Key Vault.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Utilized the Power BI reporting tool to showcase novel data visualization techniques and sophisticated reporting approaches to the team.
  • Managed the creation and development of data schemas, ETL pipelines using Python and MySQL stored procedures, and Jenkins automation.
  • Developed and enhanced Snowflake tables, views, and schemas to enable effective data retrieval and storage for reporting and analytics requirements.
  • Implemented large-scale ETL of customer data on Azure with Data Factory, resulting in a high-performance, optimized solution.
  • Used Cloudera, HDFS, MapReduce, Hive, HiveUDF, Pig, Sqoop, and Spark to analyse large and important datasets.
  • Created Spark applications using Python, PySpark, and Spark SQL to extract, transform, and aggregate data from various file formats, which was then analysed to reveal insights into client usage patterns (a brief sketch follows this list).
  • Deployed a Windows Kubernetes cluster with Azure Container Service (ACS) from the Azure CLI, and used Kubernetes and Docker as the runtime environment of the CI/CD system to build, test, and deploy.
  • Worked with NoSQL databases including HBase, processed data using Hadoop tools, imported data from MySQL, and exported data to the Cassandra NoSQL database.
  • Developed bespoke interactive reports, workbooks, and dashboards using Tableau.
  • Created Python scripts to call the Cassandra REST API, applied several modifications, and moved the data into Spark.
  • Worked on Spark/Scala and Python regular expression (regex) projects in the Hadoop/Hive environment, using Linux and Windows as big data resources.
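
As a rough illustration of the Spark SQL aggregation work described above, the sketch below reads transaction data and summarizes monthly usage per client; the paths, table, and column names are assumed for illustration only.

```python
from pyspark.sql import SparkSession

# Illustrative sketch only: paths, table, and column names are hypothetical.
spark = SparkSession.builder.appName("usage-aggregation-sketch").getOrCreate()

# Read transaction data from Parquet and expose it to Spark SQL.
transactions = spark.read.parquet("/datalake/transactions/")
transactions.createOrReplaceTempView("transactions")

# Aggregate monthly usage per client with Spark SQL.
usage = spark.sql("""
    SELECT client_id,
           date_trunc('MONTH', txn_date) AS txn_month,
           COUNT(*)                      AS txn_count,
           SUM(amount)                   AS total_amount
    FROM transactions
    GROUP BY client_id, date_trunc('MONTH', txn_date)
""")

usage.write.mode("overwrite").parquet("/datalake/reporting/client_usage/")
```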

Data Analyst

Global Logic Technologies
12.2015 - 02.2017

Client: HSBC

Description: An integrated online banking suite that gives consumers unified financial control. This secure online application provides a convenient home dashboard with easy account updates, detailed transaction insights, dynamic spending report charts, and effective budget tracking tools.

Responsibilities:

  • Designed, developed, modified, and enhanced database structures and database objects; the data warehouse and data mart designs efficiently support BI and end-user requirements.
  • Used Report Designer and Report Builder in Microsoft SQL Server 2012/2008/2005 Reporting Services (SSRS), with strong report design skills.
  • Published Power BI reports to the required organizations and made Power BI dashboards available in web clients and mobile apps.
  • Developed an Azure Data Factory pipeline to store data in the Azure PostgreSQL service.
  • Extensive experience with snowflake modeling, fact and dimension tables, physical and logical data modeling, star schema modeling, and data warehousing.
  • Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
  • Worked with data migration and ETL (Extract, Transform, Load) processes to transfer data between databases or systems.
  • Developed impactful reports using SSRS, MS Excel, pivot tables, and Tableau to meet business requirements.
  • Explored data in a variety of ways and across multiple visualizations using Power BI.
  • Applied data warehousing principles (fact tables, dimension tables, dimensional modelling) and developed complex T-SQL queries, stored procedures, triggers, and views to extract, manipulate, and transform data for reporting and analysis.
  • Used the Git version control system for repository access and coordination with CI tools.
  • Used Python libraries and SQL queries/subqueries to create several datasets producing statistics, tables, figures, charts, and graphs (see the sketch after this list).
  • Familiarity with Unix/Linux operating systems, including file systems, process management, and system utilities.
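
The following minimal Python sketch illustrates the kind of dataset summarization and charting described above, assuming a hypothetical transactions.csv with category, amount, and txn_date columns; it is an illustration, not the project's actual code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative sketch only: the file name and column names are hypothetical.
transactions = pd.read_csv("transactions.csv", parse_dates=["txn_date"])

# Summary statistics of spending per category.
summary = (
    transactions.groupby("category")["amount"]
    .agg(["count", "sum", "mean"])
    .sort_values("sum", ascending=False)
)
print(summary)

# Simple spending-report chart: total spend per category.
summary["sum"].plot(kind="bar", title="Total spend by category")
plt.tight_layout()
plt.savefig("spend_by_category.png")
```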

Education

Master of Science - Information Systems

Deakin University
Melbourne, VIC
06-2019

Skills

  • ETL Tools: Azure Data Factory
  • Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Apache Spark, Apache Sqoop, Apache Kafka, Apache Storm, Azure
  • NoSQL Databases: Cassandra, MongoDB
  • Programming & Scripting: Python, Scala, SQL, PL/SQL, Unix shell scripting
  • Databases: Azure SQL Database, Oracle, MySQL, PostgreSQL
  • Version Control: Git, CI/CD
  • Cloud Computing: Microsoft Azure
  • Operating Systems: Linux, Unix
  • Development Methodologies: SDLC, Agile, Waterfall

Timeline

Azure Data Engineer

EPAM
08.2021 - Current

Data Engineer

EY
06.2019 - 08.2021

Data Analyst

Global Logic Technologies
12.2015 - 02.2017

Master of Science - Information Systems

Deakin University