About
- I am currently pursuing an MS in Data Science at the University of Michigan-Dearborn and am set to graduate on 27 April 2024.
- I have 3.5+ years of experience in automotive manufacturing operations as a Data Analyst.
- Implemented projects in big data analysis, real-time production monitoring, data warehousing, predictive modeling, and computer vision for root cause analysis, process improvement, quality improvement, and operations optimization.
- Led 20+ major and 10+ minor projects for process and quality improvements.
- As a Research Assistant, worked on landmark detection and object detection using state-of-the-art CNN methods such as MobileNet, ResNet, U-Net, and YOLOv2.
- Worked on projects in regression, classification, computer vision, and natural language processing using machine learning, deep learning, and statistical methods.
- Experienced in handling big data and maintaining data pipelines (ETL/ELT) that feed data analysis and machine learning.
- Hands-on experience developing cost-optimized, efficient, secure, scalable, and highly available machine learning and data analytics solutions on AWS and GCP using compute, database, object storage, data warehousing, and serverless architecture.
- Strong technical background in programming, data structures and algorithms, data modeling, data normalization, ad-hoc analysis, efficient SQL querying, and cloud tools (AWS and GCP).
- You can reach me by email at faiyaza@umich.edu.
- I am currently looking for full-time Data Scientist, Machine Learning Engineer, Data Analyst, or Data Engineer opportunities.
Education
Master of Science, Data Science
Aug'2022 - Apr'2024 (Expected)
University of Michigan-Dearborn, Michigan, USA
Relevant courses: Database Systems, Cloud Computing (GCP), Big Data, Multivariate Statistics, Deep Learning, Applied Regression Analysis, Natural Language Processing, Data Security and Privacy, and Artificial Intelligence.
Bachelor of Technology, Mechanical Engineering
2014 - 2018
Motilal Nehru National Institute of Technology, Prayagraj, India
Relevant courses: Industrial Engineering, Numerical Methods and Statistical Techniques, Quality Engineering, and Software Project Management.
Professional Experience
Research Assistant
Jan'2024 - Present
Sustainability Center-University of Michigan, Dearborn, MI, USA
- Developing a custom object detection algorithm to process thermal videos and recreate a normalized 3-D visualization.
- Building the frontend and backend of a web application on S3, EC2, Lambda, and CloudFront that serves an interactive, low-latency dashboard for 3-D printer time-series data and real-time thermal video analysis and visualization.
Data Analyst
Aug'2023 - Nov'2023
iLabs-University of Michigan, Dearborn, MI, USA
- Collected data by web scraping with a Python script. Performed data preprocessing, data modeling, exploratory data analysis, and statistical analysis to pinpoint critical factors contributing to a hockey team’s success.
- Built an interactive Tableau dashboard to track player and team performance.
Research Assistant (Volunteering)
May'2023 - Aug'2023
University of Michigan-Dearborn (CIS Dept), MI, USA
- Implemented ResNet50, MobileNetV2, and U-Net architectures with a modified dense layer for facial landmark detection on thermal images, trained with a custom loss function (wing loss). Achieved a Normalized Mean Error of 0.04.
- Reviewed research papers on computer vision techniques for facial landmark detection and replicated their models.
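For readers unfamiliar with wing loss, here is a minimal sketch of the loss in plain Python. It is not the project code: the parameter values `w=10.0` and `eps=2.0` and the flat coordinate-list inputs are illustrative assumptions. The loss is logarithmic for small errors and linear for large ones, with a constant chosen so the two pieces meet at `|x| = w`.

```python
import math

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean wing loss over paired landmark coordinates.

    w and eps are illustrative defaults, not values from the project.
    """
    # Constant that makes the log and linear pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        if x < w:
            total += w * math.log(1.0 + x / eps)  # amplifies small errors
        else:
            total += x - c                        # behaves like L1 for large errors
    return total / len(pred)

# Hypothetical predicted vs. ground-truth coordinates.
loss = wing_loss([12.0, 30.5, 44.0], [11.0, 30.0, 60.0])
```

Compared with a plain L2 loss, this puts more weight on small and medium localization errors, which is the usual motivation for using it in landmark regression.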
Data Analyst
Jul'2018 - Mar'2022
Hero MotoCorp Limited, Halol, Gujarat, India
- 10% productivity improvement in the machine shop by developing a real-time production monitoring dashboard to locate failure points and perform root cause analysis quickly. Implemented with AWS services (Kinesis, Lambda, DynamoDB) together with Grafana and Tableau to track KPIs such as equipment efficiency, productivity, and quality rate.
- 90% reporting time reduction in regular and ad-hoc reporting by implementing ETL systems with AWS Glue, S3, Lambda, and Athena, extracting data from various sources including ERP systems (SAP) and SQL databases.
- Led 20+ major and minor data analytics projects on resource optimization, process improvement, and quality improvement in the manufacturing unit, covering project planning, scheduling, resource allocation, execution, and communication and collaboration with stakeholders.
- Designed dashboards using Amazon QuickSight, Looker and Tableau to track machine OEE, productivity metrics, MTBF, and key operational parameters by extracting data from Redshift and BigQuery through complex SQL queries.
- Performed batch data processing on EMR, staged the data in S3, and loaded it into Redshift for complex querying and analysis.
- Used Athena with the Glue Catalog for ad-hoc querying, integrated with Tableau for analysis, visualization, and reporting.
- Performed statistical analysis and hypothesis testing in R on a manufacturing machine process to assess quality parameters, identify variations, and optimize product quality.
- Performed data modeling and normalization, and ran data quality checks to ensure data accuracy and integrity.
Portfolio Projects
Batch and Streaming Big Data Analysis Projects
End to End Football Leagues Analytics [Live Dashboard]
- Designed a robust football data analytics pipeline on both AWS and GCP, encompassing 950+ global leagues, ensuring automated data collection, preprocessing, and storage.
- Deployed a cloud solution emphasizing serverless architecture, automated scheduling, minimal overhead cost, real-time data processing, and seamless integration with visualization and machine learning tools.
- Orchestrated data warehousing solutions using GCP’s BigQuery and AWS’s Redshift, enabling robust SQL querying, seamless data transfer, and integration with Looker, Tableau and machine learning platforms like Vertex AI and SageMaker.
Twitter Streaming Sentiment Analysis [Report]
- Built a streaming pipeline with the Twitter API, NiFi, Kafka, and Spark Streaming on an AWS EC2 instance.
- Trained Word2Vec and decision tree models for sentiment prediction. Achieved 78% accuracy and 0.69 AUC-ROC.
- Built a dashboard to visualize sentiment on a particular topic and to run ad-hoc queries on streaming data over a 5-minute window.
SQL Queries performance evaluation [Report]
- Extracted the IMDB dataset from the server, preprocessed it, and created the ER diagram and the DDL and DML statements.
- Wrote a range of analytical queries and evaluated the performance of lookup queries with and without indexing.
- Indexing reduced lookup query time by about 85%.
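The with/without-index comparison can be sketched as follows. This is a stand-in, not the project setup: it assumes an in-memory SQLite database and a hypothetical `movies` table with synthetic titles, so the timings it prints will not match the figure reported above.

```python
import sqlite3
import time

def time_lookups(conn, n_queries=100):
    """Total seconds spent on n_queries point lookups by title."""
    start = time.perf_counter()
    for i in range(n_queries):
        conn.execute(
            "SELECT id FROM movies WHERE title = ?", (f"movie_{i * 97}",)
        ).fetchall()
    return time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (title, year) VALUES (?, ?)",
    ((f"movie_{i}", 1900 + i % 120) for i in range(20_000)),
)

before = time_lookups(conn)  # every lookup scans the whole table
conn.execute("CREATE INDEX idx_movies_title ON movies (title)")
after = time_lookups(conn)   # lookups can now use the index

# The query plan confirms whether the index is actually used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM movies WHERE title = 'movie_42'"
).fetchall()
print(f"without index: {before:.4f}s, with index: {after:.4f}s")
print(plan[0][-1])
```

Checking the query plan alongside the timings helps separate a real index effect from caching or measurement noise.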
Machine Learning Projects
Bank Telemarketing Term Deposit Subscription Analysis and Prediction [Report]
- Performed data cleaning, exploratory data analysis, ANOVA, and chi-square tests to check feature independence.
- Trained logistic regression and decision tree classifiers to predict the subscription probability. Achieved 0.60 AUC-ROC.
Adult Income Class Classification
- Performed data preprocessing, exploratory data analysis, and feature engineering, and handled the class imbalance in the data.
- Trained logistic regression and random forest classifiers to predict the annual income class. Achieved a 0.66 F1 score and a 0.88 ROC-AUC score.
Deep Learning Projects
Deep Implicit Movie Recommendation system
- Implemented a deep implicit movie recommendation system on the IMDB dataset to predict a user's rating for a particular movie.
- Optimized the model with a triplet loss that measures similarity between user and item embeddings. Achieved 92% accuracy.
Autonomous Driving Object Detection using YOLOv2
- Developed an object detection model that identifies 80 object classes with 5 anchor boxes using the YOLO algorithm.
- Performed transfer learning from a pre-trained YOLOv2 model and improved the model's accuracy by applying non-max suppression.
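As an illustration of the non-max suppression step, here is a minimal pure-Python sketch. It is not the project code: the `(x1, y1, x2, y2)` box format, the `iou_threshold=0.5` default, and the sample boxes are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical detections: the first two boxes cover the same object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In a multi-class detector this is typically run per class, so overlapping boxes of different classes do not suppress each other.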
Remaining Useful Life Prediction for the Aircraft Gas Turbine Engine [Report]
- Performed exploratory data analysis and feature engineering, and built a CNN architecture for time-series analysis.
- Trained and fine-tuned the CNN model to reduce overfitting. Achieved 21.5 MAE and 24.3 RMSE.
Semantic Image Segmentation using U-Net Architecture
- Implemented semantic image segmentation on the CARLA self-driving car dataset.
- Used encoder and decoder blocks with a skip connection at each level to improve the accuracy of the mask prediction.
- Trained the model with sparse categorical cross-entropy loss for pixel-wise prediction.
Handwritten digit recognition
- Trained convolutional neural network (CNN) and artificial neural network (ANN) classifiers to recognize handwritten digits in 28x28-pixel images.
- Achieved 98% accuracy with the CNN and 97% with the ANN. Evaluated F1 score, recall, precision, and the confusion matrix.
NLP Projects
Neural Machine Translation using Attention Model
- Developed an attention-based model for Neural Machine Translation (NMT) specifically designed to translate human-readable dates into machine-readable dates.
- The model uses a pre-attention Bi-LSTM layer and a post-attention LSTM layer to improve translation accuracy.
Named-Entity Recognition to Process Resume using Transformer Model
- Developed a Transformer-based model to process resumes and extract information such as name, skills, designation etc.
- Performed transfer learning using the DistilBERT fast tokenizer and a pre-trained transformer model for parsing resumes.
Part-of-Speech Tagging using Viterbi Algorithm
- Implemented the Viterbi algorithm to assign a part-of-speech tag to each word in a sentence, using the Penn Treebank dataset.
- Achieved 92.3% accuracy on the test set with the Viterbi algorithm, compared with 85.18% for a baseline that assigns each word its most frequent tag.
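The decoding step can be sketched as below. This is an illustrative version, not the project code: the three-tag tagset, the hand-set probabilities, and the `floor` value used in place of smoothing for unseen events are all assumptions. In the project, the probabilities would be estimated from Penn Treebank counts.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence for words under an HMM, in log space."""
    def lp(p):
        # Floor zero probabilities so log() is defined (assumed smoothing).
        return math.log(max(p, floor))

    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = [{}]
    for w in words[1:]:
        scores, ptrs = {}, {}
        for t in tags:
            # Best previous tag for reaching tag t at this position.
            prev = max(tags, key=lambda p: best[-1][p] + lp(trans_p[p].get(t, 0.0)))
            scores[t] = (best[-1][prev] + lp(trans_p[prev].get(t, 0.0))
                         + lp(emit_p[t].get(w, 0.0)))
            ptrs[t] = prev
        best.append(scores)
        back.append(ptrs)

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptrs in reversed(back[1:]):
        path.append(ptrs[path[-1]])
    return path[::-1]

# Toy model with made-up probabilities.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"barks": 0.6, "dog": 0.05},
}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# → ['DT', 'NN', 'VB']
```

Working in log space avoids numerical underflow on long sentences, where products of many small probabilities would otherwise round to zero.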
Word Sense Disambiguation
- Applied the Naïve Bayes algorithm to identify the correct sense of a particular word in a given context.
- Created a bag of words for each sense of a given word in the training set.
- Implemented 5-fold cross-validation and achieved 90.18% accuracy on the ‘bass’ dataset.
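A minimal sketch of a bag-of-words Naïve Bayes sense classifier is shown below. It is not the project code: the add-one smoothing, the sense labels `fish` and `music`, and the four training contexts are made up for illustration, and cross-validation is omitted.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (sense, context_words). Builds one bag of words per sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab

def predict(model, context):
    """Return the sense with the highest log-posterior for the context words."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def score(sense):
        n = sum(word_counts[sense].values())
        s = math.log(sense_counts[sense] / total)  # log prior
        for w in context:
            # Add-one smoothing (an assumption) handles unseen words.
            s += math.log((word_counts[sense][w] + 1) / (n + len(vocab)))
        return s

    return max(sense_counts, key=score)

# Made-up training contexts for the ambiguous word "bass".
model = train([
    ("fish", ["caught", "bass", "lake"]),
    ("fish", ["bass", "fishing", "river"]),
    ("music", ["bass", "guitar", "band"]),
    ("music", ["played", "bass", "band"]),
])
print(predict(model, ["guitar", "band"]))  # → music
print(predict(model, ["lake", "river"]))   # → fish
```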
Extractive Text Summarization using TF-IDF Vectorizer
- Developed an extractive text summarizer using TF-IDF and a centroid-based approach on the CNN News dataset.
- The centroid-based approach captured the central idea of the text better and achieved an average ROUGE-1 score of 0.42.
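The centroid idea can be sketched in pure Python as follows. This is a stand-in for illustration, not the project code: the tokenizer, the smoothed IDF formula, and the three sample sentences are assumptions. Each sentence is scored by cosine similarity to the mean TF-IDF vector, and the top `k` are returned in their original order.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def cosine(a, b):
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(sentences, k=1):
    """Pick the k sentences closest to the TF-IDF centroid."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # sentences containing each word
    # Smoothed IDF (assumed form): ln((1 + n) / (1 + df)) + 1.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in df}
    vecs = [{w: tf * idf[w] for w, tf in d.items()} for d in docs]

    centroid = Counter()
    for v in vecs:
        for w, x in v.items():
            centroid[w] += x / n

    ranked = sorted(range(n), key=lambda i: cosine(vecs[i], centroid), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

# Made-up input: two sentences share a topic, one is unrelated.
sentences = [
    "The storm flooded the city streets.",
    "Storm damage closed city schools.",
    "A local bakery won an award.",
]
summary = summarize(sentences, k=2)
```

Here the two storm sentences are selected, because the unrelated bakery sentence shares no terms with the rest of the text and sits furthest from the centroid.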
Skills
Recommendations
Here are a few recommendations from my reporting managers, colleagues, and friends.