About Me

Snapshot

I love turning messy data into clarity.

I earned my Master of Information Management from the University of Maryland, with a focus on data engineering and machine learning. I care deeply about building systems people can rely on: pipelines that are fast, resilient, and easy to trust.

Over the years, I've partnered with Fortune 500 teams, academic labs, and startups, translating complex problems into calm, usable data products that help people move with confidence.

  • Daily events processed: 20M+ (real-time analytics at scale)
  • Pipeline runtime gains: 6h → 45m (optimization that unlocks agility)
  • Docs processed: 100K+ (LLM-powered search experiences)
  • Fortune 500 clients: 6+ (enterprise-grade delivery)

Technical Strengths

  • Scalable Architecture: Designed systems handling 20M+ daily events.
  • Performance Optimization: Reduced pipeline runtimes from 6 hours to 45 minutes.
  • AI/ML Integration: Built RAG systems processing 100K+ documents.
  • Cloud Expertise: Multi-cloud deployments on AWS, GCP, and Azure.

Leadership & Impact

  • Cross-functional Leadership: Led teams delivering enterprise data platforms.
  • Innovation Driver: Pioneered AI adoption reducing analysis time by 60%.
  • Mentorship: Guided junior engineers in modern data practices.
  • Business Value: Delivered solutions serving 6 Fortune 500 clients.

Professional Journey

Experience

Enterprise Migration

Wells Fargo (via Capgemini America Inc.)

Data Engineer

Oct 2025 – Present
Charlotte, NC

Leading cloud-first modernization and data quality initiatives for enterprise-scale migration programs.

Key Achievements
Leading solution delivery of a proof of concept on Google Cloud Platform for an enterprise-wide Ground-to-Cloud migration strategy.
Ground-to-Cloud POC | Architecture
GCP, Migration Strategy, Enterprise Architecture
Building event-driven data quality pipelines where file drops from on-prem SQL Server to GCS trigger Airflow DAGs running schema validation and data quality checks on Dataflow, curating validated data into an Iceberg-based central lakehouse.
Event-driven validation | Data Quality
SQL Server, GCS, Airflow, Dataflow, Iceberg
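
As an illustration only, the schema-validation step in the pipeline described above might look like the following minimal Python sketch. It is not the production code: the column names, the `EXPECTED_SCHEMA` mapping, and the `validate_rows` helper are all hypothetical, and the real checks run on Dataflow rather than in plain Python.

```python
from dataclasses import dataclass

# Hypothetical expected schema for an incoming file: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "currency": str}


@dataclass
class ValidationResult:
    valid: list   # rows that satisfy the schema
    errors: list  # human-readable messages for rejected rows


def validate_rows(rows, schema=EXPECTED_SCHEMA):
    """Split rows into schema-conforming records and error messages."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        problems = []
        # Check 1: every expected column must be present.
        missing = schema.keys() - row.keys()
        if missing:
            problems.append(f"missing columns {sorted(missing)}")
        # Check 2: present columns must carry the expected type.
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                problems.append(
                    f"{col} expected {typ.__name__}, got {type(row[col]).__name__}"
                )
        if problems:
            errors.append(f"row {i}: " + "; ".join(problems))
        else:
            valid.append(row)
    return ValidationResult(valid, errors)


rows = [
    {"order_id": 1, "amount": 9.5, "currency": "USD"},
    {"order_id": "2", "amount": 3.0, "currency": "USD"},  # wrong type
    {"order_id": 3, "amount": 1.0},                       # missing column
]
result = validate_rows(rows)
```

In this sketch only conforming rows would be curated onward, while the error messages could be surfaced to whoever owns the source file.
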
Innovation & Research

University of Maryland

Data Specialist

Sep 2023 – May 2025
College Park, MD

Applied data engineering and ML methods to solve real-world data challenges at scale.

Key Achievements
Designed a Pub/Sub–Dataflow pipeline to stream 20M+ ELMS events/day into BigQuery for Superset dashboards, enabling engagement analytics for 230 academic programs.
20M+ daily events | Real-time Analytics
Pub/Sub, Dataflow, BigQuery, Superset
Optimized ingestion of 119 Redshift tables with Python, SQL, and Informatica; implemented CDC and validation checks, slashing pipeline runtime from 6 hours to 45 minutes.
6 hours → 45 minutes | Performance
Python, SQL, Informatica, Redshift, CDC
Rebuilt a data warehouse by crafting custom fact and dimension tables, enabling hierarchical KPI analysis for Tableau dashboards across 12 departments.
12 departments | Data Architecture
Data Warehouse, Fact Tables, Dimension Tables, Tableau, KPI Analysis
Developed an LLM-powered RAG tool with Streamlit, LangChain, and Elasticsearch; processed 100K+ survey responses for semantic search, sentiment analysis, and summarization, shrinking review timelines from 5 days to 2 days.
5 days → 2 days | AI/ML
LLM, RAG, Streamlit, LangChain, Elasticsearch, NLP
Platform Engineering

Tiger Analytics

Senior Software Engineer - Data Platform

Jul 2021 – Jul 2023
Chennai, India

Led efforts to build a self-serve data fabric on AWS and GCP, used by 6 Fortune 500 clients to streamline enterprise data operations and analytics.

Key Achievements
Led a cross-functional team to deliver a self-serve AWS Data Fabric, driving data mesh adoption for six Fortune 500 clients and accelerating time-to-insight across five domains.
6 Fortune 500 clients | Leadership
AWS, Data Mesh, Cross-functional Team
Engineered batch and streaming ingest pipelines for 10+ sources, integrating CDC, encryption, and AWS Macie, reducing onboarding cycle time by 90% and meeting PII compliance.
90% faster onboarding | Automation
CDC, Encryption, AWS Macie, PII Compliance
Built a Spark + Deequ data quality framework running 30+ rules for parity, schema validation, and anomaly detection; eliminated 85% of invalid records before publishing.
85% of invalid records filtered | Data Quality
Spark, Deequ, Data Validation, Schema Validation
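
To make the idea of a rule-driven quality gate concrete, here is a small plain-Python sketch of named rules that filter records before publishing. It does not use Spark or Deequ and is not the framework itself; the rule names, the record fields, and the `apply_rules` helper are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical rules; a real framework would likely load these from configuration.
RULES: Dict[str, Rule] = {
    "id_present": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def apply_rules(
    records: List[Record], rules: Dict[str, Rule] = RULES
) -> Tuple[List[Record], Dict[str, int]]:
    """Return the records passing every rule, plus a failure count per rule."""
    passed: List[Record] = []
    failures = {name: 0 for name in rules}
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            failures[name] += 1
        if not failed:
            passed.append(record)
    return passed, failures


records = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},   # fails id_present
    {"id": 3, "amount": -2},     # fails amount_non_negative
    {"amount": "x"},             # fails both rules
]
passed, failures = apply_rules(records)
```

Keeping a per-rule failure count alongside the filtered output is one way to report which checks reject the most records.
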
Piloted Apache Iceberg lakehouse on S3, enabling schema evolution, time-travel queries, and optimized reads—delivering cost-effective versioned datasets for ML workloads.
Cost-effective ML datasets | Innovation
Apache Iceberg, S3, Lakehouse, Schema Evolution
Developed FastAPI microservices deployed on Kubernetes, allowing 85+ daily users to launch ELT jobs via Airflow + Spark UI without depending on engineers.
85+ daily users | Microservices
FastAPI, Kubernetes, Airflow, Spark
Implemented DataHub metadata catalog, boosting daily active users by 3x and expanding cross-domain data sharing.
3x daily active users | Metadata
DataHub, Metadata Catalog, Cross-domain Sharing
Optimized platform performance: reduced median API latency by 750 ms and improved cold start times by 500 ms.
750 ms latency reduction | Performance
API Optimization, Cold Start Optimization, Performance Tuning
Data Engineering

Xenonstack Pvt. Limited

Intern & Software Engineer

Jan 2019 – Nov 2019
Chandigarh, India

Built the technical foundation that would shape my entire career in data engineering and MLOps.

Key Achievements
Migrated legacy Hadoop jobs to Databricks (Spark + Kafka), streaming 10GB/day of telemetry from 45 IoT sites into a Delta Lake built on medallion architecture.
10GB/day processing | Infrastructure
Hadoop, Databricks, Spark, Kafka, Delta Lake
Enabled incremental loading for timely forecasts, powering 15-minute demand predictions.
15-minute predictions | Real-time Analytics
Incremental Loading, Demand Forecasting, Real-time Processing
Integrated MLflow for pipeline experiment tracking and model registry—improving iteration velocity by 33% for forecasting and anomaly detection.
33% faster iteration | MLOps
MLflow, Experiment Tracking, Model Registry, Pipeline Optimization

Featured Projects

Data Fusion Engineering

Google Cloud, Dataflow, BigQuery, Python

Comprehensive data pipeline solution for processing and analyzing large-scale datasets using Google Cloud Platform services.

Intelligent Record Management

Python, NLP, Elasticsearch, FastAPI

AI-powered document processing system with semantic search capabilities for efficient information retrieval.

Loan Default Prediction System

Python, Scikit-learn, Pandas, MLflow

Machine learning model to predict loan defaults using historical data and advanced feature engineering.

Data Prep for Fintech Analytics

Apache Spark, AWS, Python, Airflow

ETL pipeline for processing financial transaction data and generating actionable insights.

Monitoring EKS Cluster

Kubernetes, Prometheus, Grafana, AWS

Comprehensive monitoring solution for Kubernetes clusters with alerting and visualization.

Sports Analytics System

Python, Pandas, Tableau, Statistics

Data analysis platform for sports performance metrics and predictive modeling.

Tools & Technologies

Python
SQL
Scala
Bash
FastAPI
LangChain
dbt
Great Expectations
Pytest
Apache Spark
Apache Beam
Kafka
Airflow
Apache Iceberg
Airbyte
Terraform
Docker
Deequ
Redshift
BigQuery
Postgres
MySQL
DynamoDB
DuckDB
Neo4j
Elasticsearch
FAISS
BERT
AWS Bedrock
RAG Systems
Knowledge Graphs
Transformers
LLM APIs

Methods & Concepts

Core Engineering

Data Structures & Algorithms
Distributed Systems
MLOps
CI/CD Pipelines
GitOps
REST APIs
Backend Development
Server Side Programming

Data Architecture

Data Lakehouse Architecture
Data Mesh
Data Governance
ETL/ELT Pipelines
Real-time Data Processing
Data Warehousing
Data Cataloging
Streaming Analytics

AI & ML

RAG Systems
Vector Databases
Fine-tuning
Prompt Engineering
Model Deployment & Monitoring
Feature Engineering
Automated Machine Learning
Explainable AI

Advanced Systems

Event-Driven Architecture
Data Observability
Data Security
Federated Learning
Edge Computing
IoT Data Integration
Data Quality
Metadata Management
Academic Journey

Education

Building a strong foundation through academic excellence and continuous learning in technology and data science.

Master of Information Management

University of Maryland

2023 - 2025
College Park, MD
GPA: 3.97/4.0

Key Achievements

  • Specialization: Specialized in Data Science and Machine Learning.
  • Achievement: Received a full tuition scholarship for the entire course duration.
  • Activities: Participated in intramural sports and data science club activities.

Bachelor of Engineering in Information Technology

Panjab University Chandigarh

2015 - 2019
Chandigarh, India
GPA: 7.72/10.0

Key Achievements

  • Research: Thesis on classifying Wireless Sensor Networks using ML algorithms.
  • Leadership: Member of the Panjab University Entrepreneurship Development Cell.
  • Sports: Participated in inter-university football championships.

Let's Talk About Your Data Needs

Whether you're looking to build a data platform, optimize existing pipelines, or explore how AI/ML can enhance your data strategy, I'd love to hear from you.

Available for consulting, full-time opportunities, and collaborations

Beyond the Work

Photography as a Reset

Outside of data engineering, I slow down with a camera. It keeps me curious, grounded, and detail-oriented.

Follow @zuiko_vision
Photography showcase