Data enthusiast with 4+ years of Information Technology experience and sound knowledge of Data Engineering, Data Analytics, Cloud, Data Science, Big Data, SQL, software development methodologies, data cleaning, Data Governance, and Data Management.
Assisted in the development and implementation of various ETL pipelines
• Conducted data cleaning and preparation tasks while coordinating with multiple teams to develop data pipelines that improved data quality and accessibility
• Developed an end-to-end data analytics framework using Amazon Redshift, Glue, and Lambda, enabling the business to obtain KPIs faster at reduced cost
• Transformed raw data into a structured format suitable for analysis and reporting; developed ETL (Extract, Transform, Load) processes to cleanse, enrich, and transform data.
• Worked on the data modelling package, where SQL and Python scripts were tested in Jupyter notebooks before being pushed to the release artifact repository.
• Performed data ingestion into the S3 bucket, from where the data was later curated and moved downstream.
• Tested, debugged, diagnosed, and corrected errors and faults in application code within established testing protocols, guidelines, and quality standards to ensure data models and applications performed to specification.
• Curated data whenever new data was ingested into the S3 raw bucket, standardising the raw data into Parquet (see the curation sketch after this list).
• Monitored data pipelines for issues in their activity runs once data had been ingested and curated.
• Troubleshot and patched infrastructure in production and non-production environments to keep it compliant with security requirements.
• Experienced in working with Jenkins and Airflow as part of data ingestion and data curation
• Worked on AWS Glue tables to create schemas and move data from the S3 bucket into Glue (illustrated below)
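The bullets above describe a raw-to-curated S3 flow with Glue cataloguing. The following is a minimal, illustrative sketch of that pattern rather than the production code: the bucket names, crawler name, and CSV-to-Parquet assumption are hypothetical, using pandas for the conversion and boto3 for the S3 and Glue calls.

```python
import boto3
import pandas as pd

s3 = boto3.client("s3")

RAW_BUCKET = "raw-bucket"          # hypothetical bucket names
CURATED_BUCKET = "curated-bucket"

def curate_to_parquet(key: str) -> str:
    """Read a newly ingested CSV from the raw S3 bucket, apply light
    standardisation, and write it to the curated bucket as Parquet."""
    obj = s3.get_object(Bucket=RAW_BUCKET, Key=key)
    df = pd.read_csv(obj["Body"])

    # Standardise column names and drop exact duplicates before
    # handing the data downstream.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates()

    curated_key = key.rsplit(".", 1)[0] + ".parquet"
    df.to_parquet(f"s3://{CURATED_BUCKET}/{curated_key}", index=False)
    return curated_key

def refresh_catalog(crawler_name: str = "curated-crawler") -> None:
    """Run a Glue crawler so the curated Parquet files appear as
    tables (with inferred schemas) in the Glue Data Catalog."""
    glue = boto3.client("glue")
    glue.start_crawler(Name=crawler_name)
```

In practice this logic would typically run inside a Lambda or Glue job triggered by the S3 ingestion event, matching the Redshift/Glue/Lambda framework described above.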
Used Python to scrape, clean, and analyse large datasets (see the sketch after this list).
• Worked with large and complex datasets.
• Monitored data pipelines for issues in their activity runs.
• Implemented various ETL processes and moved the data to AWS Redshift.
• Experienced with big data querying tools such as Hive and HBase.
• Used MongoDB and HBase as NoSQL databases.
• Interpreted data, analysed results using statistical techniques, and designed reports, dashboards, and visualizations.
• Wrote SQL queries to extract and analyse complex datasets.
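As a minimal illustration of the scrape-clean-analyse workflow mentioned above (the URL, table layout, and cleaning rules are hypothetical, not taken from the original projects):

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_table(url: str) -> pd.DataFrame:
    """Fetch a page and load its first HTML table into a DataFrame."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return pd.read_html(StringIO(str(table)))[0]

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: normalise column names, drop duplicates, and
    fill missing numeric values with each column's median."""
    df = df.copy()
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates()
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df

# Example usage with a hypothetical URL:
# df = clean(scrape_table("https://example.com/listings"))
# print(df.describe())
```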
• Performed ETL using Python and built machine learning models.
• Used the NLTK library for data cleaning: stop word and punctuation removal, tokenization, stemming, and lemmatization.
• Built TF-IDF features using NLTK and performed sentiment analysis using neural networks (see the sketch at the end of this section).
• Applied models such as pre-trained VGG16/ResNet networks and the XGBoost algorithm to improve model accuracy.
• Evaluated models using precision, recall, F1 score, ROC curves, and AUC.
• Performed EDA, feature extraction, and data visualization, and built a model using a random forest regressor; feature selection was done using the extra-trees regressor technique.
• Worked with image data for image recognition using machine learning and deep learning techniques.
• Frequently worked with libraries such as pandas, matplotlib, seaborn, scikit-learn, NumPy, Keras, fastai, and TensorFlow.
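To make the NLP and evaluation bullets concrete, below is a small, self-contained sketch with toy data. NLTK handles stop-word removal, tokenization, and lemmatization; scikit-learn's TfidfVectorizer and MLPClassifier stand in for the TF-IDF features and neural network (an assumption made for brevity, since the original work also used Keras), and the evaluation reports precision, recall, F1, and AUC as listed above.

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    """Lowercase, tokenize, drop stop words and punctuation, lemmatize."""
    tokens = nltk.word_tokenize(text.lower())
    tokens = [t for t in tokens if t not in stop_words and t not in string.punctuation]
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)

# Toy labelled reviews (hypothetical data; positive = 1, negative = 0).
texts = ["great product, loved it", "terrible, would not buy again",
         "works fine and arrived quickly", "awful quality and slow shipping"]
labels = [1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(preprocess(t) for t in texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, stratify=labels, random_state=42)

# A small feed-forward neural network as the sentiment classifier.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, pred))        # precision, recall, F1
print("AUC:", roc_auc_score(y_test, proba))
```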