Data Engineer with over 15 years of experience designing, implementing, and managing complex data architectures. Proficient in cloud-based and on-premises solutions, particularly within the AWS ecosystem. Skilled at building and optimizing data pipelines that transform raw data into actionable business intelligence. Detail-oriented in designing, developing, and maintaining highly scalable, secure, and reliable data structures. Accustomed to working closely with system architects, software architects, and design analysts to translate business and industry requirements into comprehensive data models. Experienced in developing database architectural strategies across the modeling, design, and implementation stages.
Data Engineering and Pipeline Design:
Data Processing and Integration: Leveraged Apache Spark and Databricks for efficient, reliable data ingestion and processing at scale.
Data Modeling: Applied DBT and Data Vault methodologies to design and implement complex data models, structuring data in a multi-layered (Bronze, Silver, Gold) architecture.
ETL Process Development and Maintenance: Developed and maintained scalable ETL pipelines, integrating various data sources and ensuring smooth data flow.
CI/CD Deployment: Managed end-to-end CI/CD deployment processes using Jenkins, Terraform, and Bitbucket, streamlining integration and delivery cycles.
Testing and Quality Assurance: Created and implemented comprehensive test scenarios to verify data integrity and reliability across all stages.
Source and Destination Management: Integrated data from diverse systems and distributed it to various platforms and third-party services, improving data utilization to support business growth and enhance the customer experience.
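The Bronze/Silver/Gold layering referenced above can be sketched as a minimal, engine-agnostic flow; in practice these steps run on Spark/Databricks, and the table contents, field names, and validation rules here are illustrative assumptions only:

```python
# Hypothetical Bronze layer: raw records as ingested, including a malformed row.
bronze = [
    {"order_id": "1", "amount": "19.90", "country": "de"},
    {"order_id": "2", "amount": "oops", "country": "DE"},  # fails validation
    {"order_id": "3", "amount": "5.00", "country": "fr"},
]

def to_silver(rows):
    """Silver layer: validate and standardize Bronze records."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop (or quarantine) records that fail type checks
        out.append({
            "order_id": row["order_id"],
            "amount": amount,
            "country": row["country"].upper(),  # normalize country codes
        })
    return out

def to_gold(rows):
    """Gold layer: aggregate Silver data into a reporting-ready view."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals

gold = to_gold(to_silver(bronze))
print(gold)  # {'DE': 19.9, 'FR': 5.0}
```

Each layer only reads from the one below it, so reprocessing a layer never touches the raw ingested data.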
Collaboration and Workflow Automation:
Pipeline Automation: Partnered with data scientists and engineers to automate data workflows, enhancing efficiency and accuracy in data processing.
Continuous Improvement: Conducted performance tuning and optimization of data jobs, reducing latency and improving resource utilization.
Internal Training and Mentorship: Provided guidance and training to junior engineers on best practices in data engineering, focusing on the strategic use of Databricks, DBT, and other technologies.
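The testing and workflow-automation practices above can be illustrated with a minimal data-quality gate of the kind run between pipeline stages; the specific checks and thresholds are hypothetical, not a description of any one production setup:

```python
def check_row_count(source_rows, target_rows, tolerance=0.0):
    """Source and target row counts should match within a relative tolerance."""
    if not source_rows:
        return not target_rows
    return abs(len(source_rows) - len(target_rows)) / len(source_rows) <= tolerance

def check_not_null(rows, column):
    """A required column must never be null or missing."""
    return all(row.get(column) is not None for row in rows)

def run_checks(source_rows, target_rows):
    """Run all checks; a failing check would halt promotion to the next stage."""
    return {
        "row_count": check_row_count(source_rows, target_rows),
        "order_id_not_null": check_not_null(target_rows, "order_id"),
    }

source = [{"order_id": 1}, {"order_id": 2}]
target = [{"order_id": 1}, {"order_id": 2}]
print(run_checks(source, target))  # {'row_count': True, 'order_id_not_null': True}
```

Wiring such checks into the CI/CD pipeline turns data integrity into an automated gate rather than a manual review step.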
Focus: Apache Spark, Databricks, DBT, Data Vault, Jenkins, Terraform, Bitbucket, CI/CD, Data Integration, Data Modeling, ETL