Résumé
Data & AI Engineer
I build scalable data platforms and AI infrastructure for trusted enterprise workflows.
Python, SQL, Spark, Beam, Kafka, Airflow, Iceberg, BigQuery, Rust, TypeScript, RAG, vector search, and agentic AI systems.
Snapshot
Enterprise data systems, now focused on trusted AI workflows.
- Experience: 4+ years
- Focus: Data platforms · AI systems
- Current: Enterprise modernization
- Data platforms
- Streaming pipelines
- Cloud modernization
- ML systems
- RAG workflows
- Governed AI
Experience
Data Engineer
Wells Fargo · via Capgemini America Inc.
Oct 2025 – Present · Charlotte, NC
Leading GenAI-assisted modernization, lakehouse architecture, and event-driven ingestion for regulated financial data.
- 900+ Java modules migrated
- 35+ financial datasets
- 20+ yrs reporting context
- Led a GenAI-assisted modernization initiative using GitHub Copilot to migrate 900+ legacy Java modules into Spark and Apache Beam pipelines in roughly 7 months versus a 2+ year manual rewrite estimate.
- Architected an Apache Iceberg + BigQuery lakehouse for an on-prem-to-GCP migration POC, onboarding 35+ financial datasets with SAR/CSAR compliance considerations.
- Built event-driven ingestion on GCS, Airflow, and Dataflow with schema enforcement, DQ checks, curated lakehouse outputs, and a BigQuery semantic layer over 20+ years of financial data.
- GCP
- GitHub Copilot
- Apache Beam
- Spark
- Airflow
- Dataflow
- BigQuery
- Apache Iceberg
Data Specialist
University of Maryland
Sep 2023 – May 2025 · College Park, MD
Built high-throughput analytics and AI systems for academic operations and research.
- 20M/day ELMS events → BQ
- 6h→45m 119-table CDC
- 5d→2d survey RAG review
- Engineered a Pub/Sub + Dataflow streaming pipeline ingesting 20M+ daily ELMS events into BigQuery for real-time Superset dashboards tracking 15 KPIs across 230 academic programs.
- Overhauled 119 Redshift ingestion workflows in Python, SQL, and AWS with CDC and validation, cutting runtime from 6 hours to 45 minutes.
- Prepared an LLM-powered RAG POC over 100K+ open-ended survey responses using Python, LangChain, and Elasticsearch, reducing qualitative review effort by 60%.
- BigQuery
- Pub/Sub
- Dataflow
- LangChain
- Elasticsearch
- Python
Senior Software Engineer, Data Platform
Tiger Analytics
Jul 2021 – Jul 2023 · Chennai, India
Led delivery of a self-serve data fabric used by Fortune 500 clients.
- 6 enterprise clients
- 75% onboarding time cut
- 85+ analysts + ML engineers
- Led a 7-engineer team building a self-serve AWS data platform that enabled data mesh adoption across 6 enterprise clients.
- Designed an Apache Iceberg lakehouse on S3 with Glue Catalog, schema evolution, ACID transactions, and time-travel queries for reproducible ML training datasets and historical analytics.
- Developed Airbyte/Airflow batch and streaming ingestion with CDC, encryption, Macie PII detection, and a Spark + Deequ DQ framework with 30+ rules, reducing onboarding time by 75% and blocking 85% of bad data before downstream ML.
- Shipped FastAPI microservices on Kubernetes exposing ELT orchestration as self-serve REST APIs for 85+ analysts and ML engineers.
- AWS
- Spark
- Airflow
- Airbyte
- FastAPI
- Kubernetes
- Apache Iceberg
- Deequ
Intern & Software Engineer
Xenonstack Pvt. Limited
Jan 2019 – Nov 2019 · Chandigarh, India
Built the data engineering foundation for real-time analytics and MLOps workflows.
- 10GB/day 45 IoT sites
- 15 min forecast freshness
- 33% MLOps iteration ↑
- Delivered Spark + Kafka pipelines processing 10GB/day of IoT telemetry from 45 geo-distributed sites into a medallion Delta Lake.
- Enabled incremental forecasting pipelines with 15-minute prediction freshness.
- Improved ML iteration speed by 33% through MLflow integration.
- Databricks
- Spark
- Kafka
- Delta Lake
- MLflow
Education
Master of Information Management
University of Maryland, College Park
Aug 2023 – May 2025 · College Park, MD
- Graduate assistantship as Data Specialist: built analytics + AI systems across 230 academic programs.
- Coursework: distributed systems, retrieval, applied ML, data governance.
B.E. Information Technology
Panjab University, Chandigarh
2015 – 2019 · India