As a Senior Software Engineer at Tech Mahindra, I specialize in building and optimizing enterprise data lake solutions that ingest and analyze large volumes of data from various sources. I apply my expertise in Azure Data Factory, Azure Databricks, Delta Lake, SQL, and other cloud technologies to address governance, cost management, security, storage, DevOps, and consumption design patterns. Collaborating with a cross-functional team of data scientists, analysts, and engineers, I deliver high-quality data products and insights to our clients.
I have a robust background in data science and engineering, holding a master's degree in data science from Monash University and extensive experience in data warehouse development, ETL processes, Power BI dashboards, and machine learning models. My keen interest in applying advanced analytics and AI solutions to solve real-world problems drives me to continuously seek out data science and data engineering challenges to learn new skills and techniques. I am a diligent, collaborative, and creative professional committed to excellence and innovation.
As a Senior Software Engineer, I specialize in the design, development, and maintenance of data architecture, solutions, pipelines, and systems that support the organization's data-driven lakehouse/warehouse. My responsibilities include troubleshooting data issues, resolving batch job failures, and ensuring the smooth operation of framework scripts and pipelines. I actively perform data migration tasks, facilitating seamless data transfers and transformations between various data sources and destinations. I excel in data engineering tasks in Databricks, harnessing its capabilities for advanced data processing, transformation, and analysis, thus contributing to the organization's data management and analytics initiatives.
My daily tasks encompass system and server monitoring, service management, and stakeholder engagement to ensure the availability, accuracy, and accessibility of data for various business needs. I collaborate closely with data architects, solution designers, developers, testers, and business stakeholders to facilitate a seamless flow of data within the organization. I regularly review solution designs, engage in development and testing activities, provide environment support, and meticulously document technical aspects. My commitment lies in optimizing data processes and enhancing system performance while effectively managing costs.
My extensive skill set includes strong proficiency in Python, PySpark, the Azure Data Factory framework, and Azure SQL, along with expertise in shell scripting, change/incident/problem management, and a solid understanding of continuous integration and deployment pipelines, including branching strategies and Git repositories.
Additionally, I possess valuable knowledge in telecom domain concepts, Scala/Spark, Azure DevOps, and a variety of tools such as Power BI, Tableau, Alteryx, Apache Airflow, and Confluence. These enhance my capacity to drive data engineering excellence within the organization.
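To illustrate the kind of Databricks/PySpark data engineering described above, the snippet below is a minimal, hypothetical sketch of an incremental load into a Delta table; the paths, table, and column names (raw_events, event_id, event_date) are placeholders rather than actual project code.

# Illustrative only: a common PySpark pattern for loading raw files into a Delta table.
# All paths, table names, and columns below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-ingest-sketch").getOrCreate()

# Read newly landed JSON files, stamp the load time, and drop duplicate records.
raw_df = (
    spark.read.format("json")
    .load("/mnt/landing/raw_events/")            # hypothetical landing path
    .withColumn("load_ts", F.current_timestamp())
    .dropDuplicates(["event_id"])                # hypothetical business key
)

# Append into a date-partitioned Delta table registered in the metastore.
(
    raw_df.write.format("delta")
    .mode("append")
    .partitionBy("event_date")
    .saveAsTable("bronze.raw_events")
)

In a production pipeline, a step like this would typically use Auto Loader for incremental file discovery and merge/upsert logic for change data rather than a plain append.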
Tools used: Python, PySpark, SQL, Databricks, Azure stack
Roles and Responsibilities:
Tools used: Azure Blob Storage, Azure DB services, Python, PySpark, SQL, GitHub, JFrog, Octopus
Tools used: Python, PySpark, SQL, Databricks, Azure stack
Full Time - Contract (21-Oct-2019 to 14-May-2021)
1. Data Outputs Team
Data pipeline investigation for our migration project
Data Warehouse Migration Process
Automation of parts of Data Warehouse pipeline
National Reporting and pipeline management
2. Data Science Team
Radiology AI Project
3. Research Outputs Team
Research Data Extraction
Tools Used - Python, SSIS, SSAS, Power BI, Tableau, Crontab, Unix, SAS, SQL Server, Azure, Jupyter Notebook, TensorFlow, Keras, PyTorch, Seaborn, NLTK, pandas, NumPy
Project Management - Agile
Other skills - Presentations, hosting training sessions, and public speaking
Full Time - Permanent (12-May-2015 to 23-Mar-2017)
Project: Migration to Outlook
Client: One of the Big 4 financial firms
The purpose of this project was to help our client retire a legacy technology. To that end, our team built two software tools: one to automate the deactivation, decommissioning, and archiving process, and a second to generate metadata about the databases being archived.
My role was to help develop several features of these tools and to perform testing. Each week I analyzed global databases and recommended which ones to deactivate, decommission, or restore. I also generated and communicated reports providing project insights to teams across different regions of the world.
I was also responsible for handling escalations, maintaining SLAs, and providing our clients with reports showing how our progress tracked against those SLAs. My major tasks during this project were data wrangling (cleaning and quality improvement), data analysis, database management, software development, testing, reporting, and data archiving. Once the unwanted data was removed, we analyzed the remaining important databases that needed to be migrated to the new technology to reduce costs and improve efficiency.
Tools Used - SQL, Excel, Python, SharePoint, and Lotus Notes
Data Lake Architecture / Data Warehouse Architecture
SQL Server (SSIS, SSRS, SSAS)
Azure (Blob storage, DB Service, Azure Data Factory, Synapse)
Machine Learning Algorithms
Linux / Unix
Azure DevOps (CI/CD pipelines) / GitHub
Tableau / Power BI
Data Mining / Data Analysis / Statistical analysis
Agile framework / Scrum
Optimization Techniques / Auto Loader / Kafka Integration / Delta Live Tables / Cluster Management
Python
Introduction to Data Engineering using PySpark
Available on request
Introduction to Data Engineering using PySpark
Biomedical Image Analysis
Image Processing in Python
Python for Data Science
SQL Server DBA
Machine Learning Algorithm A-Z
Hands on Hadoop
Industry Experience
R Programming
Python Programming
.NET Certification