About Me
Snapshot
I love turning messy data into clarity.
I earned my Master's degree in Information Management from the University of Maryland, with a focus on data engineering and machine learning. I care deeply about building systems people can rely on: pipelines that are fast, resilient, and easy to trust.
Over the years, I've partnered with Fortune 500 teams, academic labs, and startups, translating complex problems into calm, usable data products that help people move with confidence.
- 20M+ daily events processed: real-time analytics at scale
- Pipeline runtime gains, 6h → 45m: optimization that unlocks agility
- 100K+ docs processed: LLM-powered search experiences
- 6+ Fortune 500 clients: enterprise-grade delivery
Technical Strengths
- Scalable Architecture: Designed systems handling 20M+ daily events.
- Performance Optimization: Reduced pipeline runtimes from 6 hours to 45 minutes.
- AI/ML Integration: Built RAG systems processing 100K+ documents.
- Cloud Expertise: Multi-cloud deployments on AWS, GCP, and Azure.
Leadership & Impact
- Cross-functional Leadership: Led teams delivering enterprise data platforms.
- Innovation Driver: Pioneered AI adoption, reducing analysis time by 60%.
- Mentorship: Guided junior engineers in modern data practices.
- Business Value: Delivered solutions serving 6 Fortune 500 clients.
Experience
Wells Fargo (via Capgemini America Inc.)
Data Engineer
Leading cloud-first modernization and data quality initiatives for enterprise-scale migration programs.
Leading solution delivery for a Google Cloud Platform proof of concept supporting an enterprise-wide Ground-to-Cloud migration strategy.
Building event-driven data quality pipelines: file drops from on-prem SQL Server into GCS trigger Airflow DAGs that run schema validation and data quality checks on Dataflow, with validated data curated into an Iceberg-based central lakehouse.
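The schema-validation step in that pipeline can be sketched in plain Python. The `EXPECTED_SCHEMA` and sample rows below are hypothetical stand-ins, not the production schema, and the real checks run on Dataflow rather than in-process; this only illustrates the valid/rejected split a quality gate produces.

```python
# Minimal sketch of a schema-validation gate: rows that match the expected
# column set and types pass through; everything else is quarantined.
# EXPECTED_SCHEMA and the sample rows are illustrative, not production data.
EXPECTED_SCHEMA = {"account_id": str, "balance": float, "updated_at": str}

def validate_rows(rows):
    """Split rows into (valid, rejected) against EXPECTED_SCHEMA."""
    valid, rejected = [], []
    for row in rows:
        ok = set(row) == set(EXPECTED_SCHEMA) and all(
            isinstance(row[col], typ) for col, typ in EXPECTED_SCHEMA.items()
        )
        (valid if ok else rejected).append(row)
    return valid, rejected

good, bad = validate_rows([
    {"account_id": "a1", "balance": 10.5, "updated_at": "2024-01-01"},
    {"account_id": "a2", "balance": "oops", "updated_at": "2024-01-01"},
])
```

In the curated-lakehouse flow, only the `good` partition would be published to Iceberg, while rejects land in a quarantine location for triage.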
University of Maryland
Data Specialist
Applying cutting-edge data engineering and ML methods to solve real-world data challenges at scale.
Designed a Pub/Sub–Dataflow pipeline to stream 20M+ ELMS events/day into BigQuery for Superset dashboards, enabling engagement analytics for 230 academic programs.
Optimized ingestion of 119 Redshift tables with Python, SQL, and Informatica; implemented CDC and validation checks, slashing pipeline runtime from 6 hours to 45 minutes.
Rebuilt a data warehouse by crafting custom fact and dimension tables, enabling hierarchical KPI analysis for Tableau dashboards across 12 departments.
Developed an LLM-powered RAG tool with Streamlit, LangChain, and Elasticsearch; processed 100K+ survey responses for semantic search, sentiment analysis, and summarization, shrinking review timelines from 5 to 2 days.
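The CDC-style incremental loading behind the Redshift runtime win can be illustrated with a watermark query: pull only rows changed since the last successful load, then advance the watermark. The sketch below uses an in-memory SQLite table with made-up names, not the actual Redshift/Informatica setup.

```python
import sqlite3

# Toy sketch of watermark-based incremental loading (one common CDC pattern).
# Table and column names are illustrative.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "2024-01-01"), (2, "2024-01-05"), (3, "2024-01-09")])

def incremental_pull(conn, watermark):
    """Fetch only rows changed since the last successful load."""
    cur = conn.execute(
        "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY id",
        (watermark,),
    )
    rows = cur.fetchall()
    # Advance the watermark to the newest change seen in this batch.
    new_watermark = max((r[1] for r in rows), default=watermark)
    return rows, new_watermark

rows, wm = incremental_pull(src, "2024-01-03")  # picks up ids 2 and 3
```

Skipping unchanged rows is where most of the 6h → 45m gain comes from in a pattern like this: each run touches only the delta, not all 119 tables in full.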
Tiger Analytics
Senior Software Engineer - Data Platform
Led the effort to build a self-serve data fabric on AWS and GCP, used by 6 Fortune 500 clients to streamline enterprise data operations and analytics.
Led a cross-functional team to deliver a self-serve AWS Data Fabric, driving data mesh adoption for six Fortune 500 clients and accelerating time-to-insight across five domains.
Engineered batch and streaming ingest pipelines for 10+ sources, integrating CDC, encryption, and AWS Macie, reducing onboarding cycle time by 90% and meeting PII compliance.
Built a Spark + Deequ data quality framework running 30+ rules for parity, schema validation, and anomaly detection; eliminated 85% invalid records before publishing.
Piloted Apache Iceberg lakehouse on S3, enabling schema evolution, time-travel queries, and optimized reads—delivering cost-effective versioned datasets for ML workloads.
Developed FastAPI microservices deployed on Kubernetes, allowing 85+ daily users to launch ELT jobs via Airflow + Spark UI without engineer dependencies.
Implemented DataHub metadata catalog, boosting daily active users by 3x and expanding cross-domain data sharing.
Optimized platform performance: reduced median API latency by 750ms, improved cold start times by 500ms.
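The Deequ-style rules mentioned above (parity, schema validation, anomaly-style checks) share a simple shape: compute a metric per column, then assert a constraint on it. The plain-Python stand-in below illustrates that shape with made-up rule and column names; the production framework ran these on Spark with Deequ, not in-process like this.

```python
# Plain-Python stand-in for the kind of rules a data quality suite runs:
# metric functions plus boolean constraints. Rule and column names are
# illustrative, not Deequ's actual API.
def completeness(rows, col):
    """Fraction of rows with a non-null value in col."""
    return sum(r.get(col) is not None for r in rows) / len(rows)

def uniqueness(rows, col):
    """Fraction of non-null values in col that are distinct."""
    vals = [r[col] for r in rows if r.get(col) is not None]
    return len(set(vals)) / len(vals)

def run_suite(rows):
    return {
        "id_complete": completeness(rows, "id") == 1.0,
        "id_unique": uniqueness(rows, "id") == 1.0,
        "amount_non_negative": all(r["amount"] >= 0 for r in rows),
    }

report = run_suite([
    {"id": 1, "amount": 9.99},
    {"id": 2, "amount": 0.0},
    {"id": 2, "amount": 5.0},   # duplicate id should fail the uniqueness rule
])
```

Gating publishes on a report like this, before data reaches consumers, is how invalid records get eliminated upstream rather than discovered in dashboards.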
Xenonstack Pvt. Limited
Intern & Software Engineer
Building the technical foundation that would shape my entire career in data engineering and MLOps.
Migrated legacy Hadoop jobs to Databricks (Spark + Kafka), streaming 10GB/day of telemetry from 45 IoT sites into a Delta Lake built on medallion architecture.
Enabled incremental loading for timely forecasts, powering 15-minute demand predictions.
Integrated MLflow for pipeline experiment tracking and model registry—improving iteration velocity by 33% for forecasting and anomaly detection.
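The experiment-tracking pattern MLflow provides, logging parameters and metrics per run and then selecting the best run, can be shown with a tiny stdlib stand-in. `RunTracker` and the metric names below are illustrative only, not MLflow's API.

```python
import uuid

# Minimal stand-in for experiment tracking: record each run's parameters
# and metrics, then query for the best run. Illustrative, not MLflow.
class RunTracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = {"run_id": uuid.uuid4().hex, "params": params, "metrics": metrics}
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric):
        """Return the run with the lowest value for an error metric."""
        return min(self.runs, key=lambda r: r["metrics"][metric])

tracker = RunTracker()
tracker.log_run({"lags": 4}, {"mape": 0.12})
tracker.log_run({"lags": 8}, {"mape": 0.09})
best = tracker.best_run("mape")  # the lags=8 run wins on MAPE
```

Having every forecasting run queryable like this, instead of scattered in notebooks, is what drives the iteration-velocity gain.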
Featured Projects
Featured Articles
Insights and tutorials on data engineering, machine learning, and cloud technologies
Tools & Technologies
Methods & Concepts
Core Engineering
Data Architecture
AI & ML
Advanced Systems
Education
Building a strong foundation through academic excellence and continuous learning in technology and data science.
Master of Information Management
University of Maryland
Key Achievements
- Specialization: Data Science and Machine Learning
- Achievement: Received a full tuition scholarship for the entire course duration
- Activities: Participated in intramural sports and data science club activities
Bachelor of Engineering in Information Technology
Panjab University Chandigarh
Key Achievements
- Research: Thesis on classifying Wireless Sensor Networks using ML algorithms
- Leadership: Member of the Panjab University Entrepreneurship Development Cell
- Sports: Participated in inter-university football championships
Let's Talk About Your Data Needs
Whether you're looking to build a data platform, optimize existing pipelines, or explore how AI/ML can enhance your data strategy, I'd love to hear from you.
Available for consulting, full-time opportunities, and collaborations
Beyond the Work
Photography as a Reset
Outside of data engineering, I slow down with a camera. It keeps me curious, grounded, and detail-oriented.
Follow @zuiko_vision