Accomplished Software Engineer with a proven track record at Clairvoyant, improving data processing efficiency by 40% through expert application of Big Data technologies and cloud services. Combines strong analytical skills with a collaborative approach to deliver better project outcomes and operational excellence.
Project: Cloud-based ETL transformation and optimization in AWS for real-time data processing and analytics
Technology Stack: PySpark, AWS Glue, AWS Lambda, Amazon S3, DynamoDB, AWS Step Functions
Team Size: 9
Project Description:
· Utilized AWS Glue for ETL operations, Amazon S3 for storage, Amazon Athena for analytics, and AWS Step Functions to automate processes (a minimal Glue job sketch follows this list).
· Integrated AWS Lambda for serverless computing, optimizing resource utilization, improving cost efficiency by 20%, and reducing manual effort by 50%.
· Deployed a real-time data processing pipeline on AWS that handled streaming data from 3 different sources, resulting in a 40% improvement in data accuracy and completeness.
· Automated 6 processes using AWS Step Functions, yielding a 15% decrease in manual errors and saving 3 hours per week.
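The following is a minimal sketch of the kind of AWS Glue ETL job described above; the bucket names, paths, and formats are hypothetical placeholders rather than the actual project configuration.

    import sys
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    # Resolve the job name passed in by Glue at run time
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw records from S3 (placeholder bucket and prefix)
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-raw-bucket/orders/"]},
        format="json",
    )

    # Basic cleanup with Spark SQL functions, then hand back to Glue
    df = raw.toDF().dropDuplicates().withColumn("load_date", F.current_date())
    cleaned = DynamicFrame.fromDF(df, glue_context, "cleaned")

    # Write curated output back to S3 as Parquet so Athena can query it
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-curated-bucket/orders/"},
        format="parquet",
    )

    job.commit()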
Role:
· Directed ETL development using AWS Glue, refining data extraction, transformation, and loading operations; improved data quality and accuracy, leading to a 30% decrease in data processing errors and a 20% improvement in data reliability.
· Tested the ETL pipeline using Jira test automation and documented the results of each stage, reducing unit and manual testing effort by 50%.
· Improved throughput and efficiency by 30% through performance tuning and optimization of the data processing pipeline (see the tuning sketch after this list).
· Tested API source data and API connections using Postman, decreasing data redundancy by 35%.
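A minimal sketch of the kind of performance tuning referenced above, assuming a Spark pipeline: partition pruning on read, caching reused data, and repartitioning on the aggregation key. The paths, column names, and shuffle-partition value are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("etl-tuning-sketch")
        # Keep shuffle parallelism in line with cluster size (assumed value)
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

    # Read only the partition needed for this run (partition pruning)
    orders = (
        spark.read.parquet("s3://example-curated-bucket/orders/")
        .filter(F.col("load_date") == "2023-01-01")
    )

    # Cache a dataset that several downstream transformations reuse
    orders.cache()

    # Repartition on the grouping key before the wide aggregation to limit skew
    daily_totals = (
        orders.repartition("customer_id")
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("total_amount"))
    )

    daily_totals.write.mode("overwrite").parquet(
        "s3://example-curated-bucket/daily_totals/"
    )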
Project: Data transformation and storage
Technology Stack: Python, Apache Spark, Apache Hive, Apache Hadoop (HDFS), Airflow
Team Size: 7
Project Description:
· Built ETL pipelines to extract data from the Oracle source system, apply business-rule transformations using Spark transformation functions, clean the data, and load it into the target Apache Hive tables for processing by downstream applications.
· Orchestrated the pipeline with Apache Airflow, enabling timely processing and transformation of large datasets and ensuring that the most up-to-date information is available for downstream analytics and reporting (an illustrative DAG sketch follows this list).
· Migrated the on-premises big data architecture to Google Cloud Platform (BigQuery, Dataflow, Pub/Sub), resulting in a 40% cost reduction.
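An illustrative Airflow DAG for the orchestration described above, submitting the Oracle-to-Hive Spark job on a daily schedule; the DAG id, script path, and retry settings are assumptions, not the actual project values.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    # Daily pipeline: extract from Oracle, transform with Spark, load into Hive
    with DAG(
        dag_id="oracle_to_hive_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:

        run_spark_etl = BashOperator(
            task_id="run_spark_etl",
            # The PySpark job path is a placeholder; {{ ds }} passes the run date
            bash_command="spark-submit /opt/etl/oracle_to_hive.py {{ ds }}",
        )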
Role:
· Collaborated with teams to define data extraction requirements and transformations, ensuring alignment with project goals.
· Resolved 70% of Apache Airflow ETL job issues within 30 minutes through quick code fixes or job restarts, ensuring minimal disruption to data processing.
· Created unit test cases using TestNG and mock data in CSV files to test ETL pipelines.
· Processed data with RDD operations and Spark built-in functions instead of UDFs, decreasing processing time by 25% (see the sketch after this list).
· Performed code analysis and refactoring, improving data processing efficiency by 40%.
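A minimal sketch of the UDF-to-built-in optimization mentioned above: the same column cleanup expressed first as a Python UDF and then with Spark built-in functions, which keeps the work inside the JVM. The column names and sample rows are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-vs-builtin").getOrCreate()
    df = spark.createDataFrame(
        [(" alice ", "2023-01-05"), (" bob ", "2023-02-10")],
        ["customer_name", "order_date"],
    )

    # Slower: a Python UDF forces each row through the Python interpreter
    normalise_udf = udf(lambda s: s.strip().upper(), StringType())
    slow = df.withColumn("customer_name", normalise_udf(F.col("customer_name")))

    # Faster: the same logic with built-in functions, no Python round trip
    fast = df.withColumn("customer_name", F.upper(F.trim(F.col("customer_name"))))

    fast.show()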
Project: Asset Management System Integration
Technology Stack: Java, C++, Web Services, SQL, Shell Script
Team Size: 8
Project Description:
· Designed and maintained NC3, a TCP/IP daemon that interfaces with external systems through XML requests, enabling clients to easily reserve assets for future use (an illustrative request-handling sketch follows this list).
· Implemented strong error-handling procedures for user-related problems such as address mismatches and missing ordered assets, safeguarding data integrity and client satisfaction.
· Built the system to act as a central database hub, allowing requesting applications to store and retrieve data and maximizing system performance and usability.
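NC3 itself was written in Java/C++, but the request-handling pattern can be illustrated with a short Python sketch: a TCP server that parses an XML reservation request and replies with an XML status. The port, element names, and protocol framing are assumptions for illustration only.

    import socketserver
    import xml.etree.ElementTree as ET

    class ReservationHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # Read one XML request per connection (framing is assumed)
            payload = self.rfile.read().decode("utf-8")
            request = ET.fromstring(payload)

            asset_id = request.findtext("assetId")
            if asset_id:
                status = "RESERVED"
            else:
                # Error handling for incomplete requests
                status = "ERROR: missing assetId"

            response = ET.Element("reservationResponse")
            ET.SubElement(response, "status").text = status
            self.wfile.write(ET.tostring(response))

    if __name__ == "__main__":
        with socketserver.TCPServer(("0.0.0.0", 9000), ReservationHandler) as server:
            server.serve_forever()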
Role:
· Designed efficient MySQL database schemas, reducing data redundancy by 15%.
· Oversaw test-case writing and manual testing of the application UI, decreasing bugs found during unit testing by 15%.
· Leveraged Shell Scripting for automation tasks, improving operational efficiency by 10%.
· Coordinated with cross-functional teams to identify system requirements and implement solutions with 100% accuracy.
· Developed Java-based user interfaces, enhancing the overall user experience of the monitoring system.
Big Data Technologies: Hadoop, Apache Spark, Hive, Kafka
Cloud Services: AWS (Amazon Web Services)
Programming Languages: Java, Python
Databases: MySQL, HBase, Hive
Scripting Languages: Shell Script
Analytical Skills: Data Visualization, Predictive Analysis, Data Modeling, Risk Analysis, Recovery Planning