A Data Scientist and former electrical engineer with 7+ years of hands-on experience operating large-scale power plants—where precision, systems thinking, and problem-solving were part of everyday life. After transitioning to run my family's retail business, I unexpectedly fell in love with coding—especially Python—and found my next calling in data science. I recently completed an intensive, project-driven Data Science program at Springboard, where I developed practical skills in Python, SQL, machine learning, and turning raw data into real insights. My engineering mindset, business experience, and passion for lifelong learning now fuel my drive to solve real-world problems with data.
Besides being a Data Scientist, I'm a proud father of two amazing daughters, living in Rajshahi, a riverside city known for its summer fruits and unforgettable sunsets over the Padma River. Whether debugging a neural network or hiking along the riverbank, I’m always driven by curiosity, clarity, and the pursuit of elegant solutions.
When I'm not working with data, you'll find me immersed in books, especially Bengali detective novels. I'm a huge fan of Feluda by the legendary Satyajit Ray and Kakababu by Sunil Gangopadhyay, both of which sparked my early love for mysteries. I also enjoy the works of Saratchandra Chattopadhyay, whose stories offer timeless reflections on life and society. Travel is another passion of mine: occasional, but always enriching. I've had the chance to explore stunning destinations in Kenya, Nepal, and India. The Maasai Mara Reserve in Kenya stands out as an unforgettable experience: witnessing lions roaming in prides, leopards lounging in the shade, and zebras thundering across the plains was truly breathtaking. Above all, I value quality time with my family. Whether it’s a quiet evening at home or a shared adventure, those moments mean the most to me.
A machine learning model trained to distinguish phishing URLs from legitimate ones.
Phishing URLs are deceptive links crafted by cybercriminals to steal sensitive user information. These malicious URLs are commonly spread via spam emails, fraudulent messages, and compromised websites.
Phishing attacks pose severe risks to individuals and organizations, leading to financial losses and data breaches. In the third quarter of 2023, phishing volumes reportedly surged by 173% compared to the previous quarter.
This project aims to build an efficient and scalable machine learning model to detect phishing URLs and mitigate cybersecurity threats.
This approach focuses on analyzing the inherent characteristics of phishing URLs, avoiding dependence on external data sources like robots.txt or WHOIS records. Instead, it extracts features directly from the URL, ensuring faster and more robust detection.
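As a rough illustration of what "features extracted directly from the URL" can look like, here is a minimal sketch using only the standard library. The function names (`lexical_features`, `shannon_entropy`) and the particular feature set are assumptions made for this example; they are not necessarily the features used in the project.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(url: str) -> dict:
    """Compute a few features from the URL string alone (no network lookups)."""
    # Assumption: the scheme (http://, https://) has already been stripped,
    # so everything before the first "/" is the host part.
    host, _, path = url.partition("/")
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "path_depth": path.count("/") + 1 if path else 0,
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "entropy": shannon_entropy(url),
    }
```

Because every value is computed from the string itself, a function like this can be applied row by row to a large dataset without any WHOIS or HTTP requests.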
@lru_cache() for performance
# Extracting the Fully Qualified Domain Name (FQDN)
# (containing domain, subdomain, TLD, port, etc.)
df['FQDN'] = df['url'].str.split('/').str[0]
Used a pandas DataFrame containing legitimate top-level domains extracted from Wikipedia: https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
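The caching and TLD lookup mentioned above might fit together roughly as follows. This is a sketch under stated assumptions: `KNOWN_TLDS` is a tiny hand-written stand-in for the full Wikipedia-derived list, and the helper names `tld_of` and `has_known_tld` are hypothetical. It also assumes hosts are names or IPv4 addresses (an IPv6 literal would need different port handling).

```python
from functools import lru_cache

# Illustrative subset only; the project uses the full list from Wikipedia.
KNOWN_TLDS = frozenset({"com", "org", "net", "edu", "gov", "io", "bd"})


@lru_cache(maxsize=None)
def tld_of(fqdn: str) -> str:
    """Return the last dot-separated label of the host, without any port."""
    host = fqdn.split(":")[0]              # drop a ":8080"-style port
    return host.rsplit(".", 1)[-1].lower()


@lru_cache(maxsize=None)
def has_known_tld(fqdn: str) -> bool:
    """True when the host ends in a TLD from the reference set."""
    return tld_of(fqdn) in KNOWN_TLDS
```

Datasets of URLs tend to repeat the same hosts many times, so memoizing per-host helpers with `lru_cache` avoids recomputing the same answer for each row.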
Dataset sourced from Kaggle containing labeled phishing and legitimate URLs.
Cleaned non-ASCII characters, extra prefixes (like extra https or www), and malformed entries.
Obfuscated entries (%2F instead of /, hex IPs)
We tested multiple models using scikit-learn.
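The data-cleaning steps listed earlier (non-ASCII characters, repeated `https`/`www` prefixes, `%2F` escapes, hex-encoded IPs) could be sketched as below. The helper names `clean_url` and `decode_hex_ip` are hypothetical, and whether the project decodes these patterns or only flags them is an assumption of this example.

```python
import ipaddress
import re
from urllib.parse import unquote

# One or more leading "http://", "https://" or "www." prefixes.
_PREFIX = re.compile(r"^(?:https?://|www\.)+", re.IGNORECASE)


def clean_url(raw: str) -> str:
    """Decode percent-escapes, drop non-ASCII characters and repeated prefixes."""
    url = unquote(raw.strip())                      # "%2F" becomes "/"
    url = url.encode("ascii", "ignore").decode()    # drop non-ASCII characters
    return _PREFIX.sub("", url)                     # strip scheme and www.


def decode_hex_ip(host: str) -> str:
    """Turn a hex-encoded IPv4 host such as '0xC0A80001' into dotted form."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]{1,8}", host):
        return str(ipaddress.IPv4Address(int(host, 16)))
    return host
```

Normalizing these variants before feature extraction means that two spellings of the same address end up with the same features.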
Text-based feature extraction using TF-IDF and compression via TruncatedSVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# urls: iterable of cleaned URL strings (defined elsewhere)
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(urls)      # sparse TF-IDF matrix
svd = TruncatedSVD(n_components=100)    # compress to 100 dense components
X_reduced = svd.fit_transform(X_text)
Word2Vec for smarter domain comparisons
BERT or LSA via NLTK for deeper language modeling
TF-GNN for graph-based URL analysis

Computer vision project detecting defective jar lids using image classification.
In modern manufacturing, automation plays a critical role in enhancing efficiency, consistency, and quality control. One key area where automation can add significant value is in defect detection during the production process. This project focuses on automating the inspection of jar lids by developing a Convolutional Neural Network (CNN) model to classify defective versus non-defective lids.
Manual inspection is time-consuming and error-prone, making it unsuitable for large-scale operations. By leveraging deep learning and computer vision techniques, we aim to streamline the quality control process, reduce inspection time, and increase the accuracy of defect identification.
Our CNN-based model is trained on labeled image data to learn distinguishing features between acceptable and defective lids, enabling real-time and scalable quality assessment on the production line — contributing to the broader vision of smart manufacturing and Industry 4.0.
A labeled image dataset sourced from Kaggle was used for the binary classification task. The dataset included images of both intact and damaged jar lids.
To expand and diversify the dataset, the following image transformation techniques were used:
These preprocessing techniques helped create a more robust training dataset by exposing the model to a variety of realistic scenarios.
The CNN model was built using TensorFlow/Keras for binary classification.
Conv2D(32, kernel_size=(5, 5), padding='same', kernel_initializer='he_normal') → LeakyReLU(alpha=0.1) → MaxPooling2D(pool_size=(2, 2))
Conv2D(64, kernel_size=(5, 5), padding='same') → LeakyReLU → MaxPooling2D
Conv2D(128, kernel_size=(5, 5), padding='same') → LeakyReLU → MaxPooling2D
Flatten()
Dense(256, activation='relu', kernel_initializer='he_normal')
Dropout(0.3)
Dense(2) (output logits)
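To get a feel for the size of this architecture, the sketch below traces the tensor shape and parameter count through the layers by hand. The function name `trace_cnn` is hypothetical, and the 128×128 RGB input used in the note that follows is purely an assumption, since the input resolution is not stated here.

```python
def trace_cnn(height, width, channels):
    """Return (flattened_size, total_params) for the layer stack listed above."""
    params = 0
    for filters in (32, 64, 128):
        # A 5x5 'same' convolution keeps height and width;
        # it has 5*5*in_channels weights plus one bias per filter.
        params += 5 * 5 * channels * filters + filters
        channels = filters
        # 2x2 max pooling (stride 2, no padding) halves each spatial dimension.
        height, width = height // 2, width // 2
    flat = height * width * channels      # size after Flatten()
    params += flat * 256 + 256            # Dense(256)
    params += 256 * 2 + 2                 # Dense(2) logits
    return flat, params
```

Under the assumed 128×128×3 input, `trace_cnn(128, 128, 3)` gives a flattened vector of 32,768 values and about 8.6 million parameters, almost all of them in the first dense layer. LeakyReLU, MaxPooling2D, Flatten, and Dropout add no trainable parameters.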
import tensorflow as tf

# Dense(2) emits raw logits, so the loss applies softmax internally.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy']
)
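For intuition about what a sparse categorical cross-entropy with `from_logits=True` computes for a single example, here is a small standard-library sketch. It is a re-derivation of the formula for illustration, not the Keras implementation, and the function name `sparse_ce_from_logits` is hypothetical.

```python
import math


def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)                       # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

With two equal logits the loss is log 2 for either label, and it falls toward zero as the logit of the true class pulls ahead of the other. Working from logits avoids computing a softmax first and then taking its logarithm, which can lose precision.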
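from tensorflow.keras.callbacks import EarlyStopping

# If train_data is a tf.data.Dataset or generator, batch it upstream and omit
# batch_size here; Keras only accepts batch_size for array-like inputs.
model.fit(
    train_data,
    validation_data=val_data,
    batch_size=256,
    epochs=50,
    callbacks=[EarlyStopping(patience=3)]
)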
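The effect of `EarlyStopping(patience=3)` can be illustrated with a plain-Python simulation of the stopping rule. This is a simplified sketch that assumes the monitored quantity is the validation loss with no minimum improvement threshold; the function name `early_stopping_epoch` and the loss values in the note are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored value has failed to improve on its
    best value so far for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)                # ran through every epoch
```

For a made-up sequence such as `[0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]`, the best value occurs at epoch 3 and training would stop after epoch 6, never reaching the lower loss at epoch 7. Note also that Keras keeps the weights from the final epoch unless `restore_best_weights=True` is passed to the callback.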
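| Class | Correct | Incorrect | Total | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Intact | 2017 | 149 | 2166 | 93.1 |
| Damaged | 1769 | 249 | 2018 | 87.7 |
| Overall (mean of class accuracies) | - | - | 4184 | 90.4 |

| Class | Correct | Incorrect | Total | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Intact | 2035 | 132 | 2167 | 93.9 |
| Damaged | 1759 | 259 | 2018 | 87.2 |
| Overall (mean of class accuracies) | - | - | 4185 | 90.5 |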
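The percentages in the two result tables can be reproduced from the raw counts. The sketch below computes per-class accuracy along with two possible "overall" figures: pooled accuracy (all correct predictions over all samples) and the unweighted mean of the class accuracies. The names `summarize`, `first`, and `second` are hypothetical; the counts are copied from the tables.

```python
def summarize(counts):
    """counts maps class name -> (correct, incorrect)."""
    per_class = {c: ok / (ok + bad) for c, (ok, bad) in counts.items()}
    pooled = sum(ok for ok, _ in counts.values()) / sum(
        ok + bad for ok, bad in counts.values()
    )
    balanced = sum(per_class.values()) / len(per_class)
    return per_class, pooled, balanced


first = {"Intact": (2017, 149), "Damaged": (1769, 249)}
second = {"Intact": (2035, 132), "Damaged": (1759, 259)}

for name, counts in (("first", first), ("second", second)):
    per_class, pooled, balanced = summarize(counts)
    print(name, {c: round(100 * a, 1) for c, a in per_class.items()},
          round(100 * pooled, 1), round(100 * balanced, 1))
```

The "Overall" values reported in the tables (90.4% and 90.5%) match the mean of the two class accuracies. Pooled accuracy comes out slightly higher, at roughly 90.5% and 90.7%, because the Intact class has more samples and a higher accuracy.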
These future directions aim to improve performance and generalization, making the system even more suitable for real-time deployment in modern manufacturing pipelines.
Over the course of the 9-month Springboard Data Science bootcamp, I completed 600+ hours of rigorous study, including structured modules from DataCamp, LinkedIn Learning, and original blog content, supplemented by countless hours of deep dives into YouTube tutorials and independent exploration.
I built 15+ hands-on projects using real-world datasets, applying techniques like Bayesian Optimization, Unsupervised Learning, and probabilistic modeling to solve meaningful data challenges.
In total, I have written over 9,000 lines of code, each one sharpening my skills in Python, machine learning, and data storytelling.