Résumé
Data & AI Engineer
I build scalable data platforms and AI infrastructure for trusted enterprise workflows.
Python, SQL, Spark, Beam, Kafka, Airflow, Iceberg, BigQuery, Rust, TypeScript, RAG, vector search, and agentic AI systems.
Experience
- 00
Data Engineer
Wells Fargo · via Capgemini America Inc.
Oct 2025 – Present · Charlotte, NC
Leading GenAI-assisted modernization, lakehouse architecture, and event-driven ingestion for regulated financial data.
- 900+ Java modules migrated
- 35+ financial datasets
- 20+ yrs reporting context
- Led a GenAI-assisted modernization initiative using GitHub Copilot to migrate 900+ legacy Java modules into Spark and Apache Beam pipelines in roughly 7 months versus a 2+ year manual rewrite estimate.
- Architected an Apache Iceberg + BigQuery lakehouse for an on-prem-to-GCP migration POC, onboarding 35+ financial datasets with SAR/CSAR compliance considerations.
- Built event-driven ingestion on GCS, Airflow, and Dataflow with schema enforcement, DQ checks, curated lakehouse outputs, and a BigQuery semantic layer over 20+ years of financial data.
- GCP
- GitHub Copilot
- Apache Beam
- Spark
- Airflow
- Dataflow
- BigQuery
- Apache Iceberg
- 01
Data Specialist
University of Maryland
Sep 2023 – May 2025 · College Park, MD
Built high-throughput analytics and AI systems for academic operations and research.
- 20M/day ELMS events → BQ
- 6h→45m 119-table CDC
- 5d→2d survey RAG review
- Engineered a Pub/Sub + Dataflow streaming pipeline ingesting 20M+ daily ELMS events into BigQuery for real-time Superset dashboards tracking 15 KPIs across 230 academic programs.
- Overhauled 119 Redshift ingestion workflows in Python, SQL, and AWS with CDC and validation, cutting runtime from 6 hours to 45 minutes.
- Prepared an LLM-powered RAG POC over 100K+ open-ended survey responses using Python, LangChain, and Elasticsearch, reducing qualitative review effort by 60%.
- BigQuery
- Pub/Sub
- Dataflow
- LangChain
- Elasticsearch
- Python
- 02
Senior Software Engineer, Data Platform
Tiger Analytics
Jul 2021 – Jul 2023 · Chennai, India
Led delivery of a self-serve data fabric used by Fortune 500 clients.
- 6 enterprise clients
- 75% onboarding time cut
- 85+ analysts + ML engineers
- Led a 7-engineer team building a self-serve AWS data platform that enabled data mesh adoption across 6 enterprise clients.
- Designed an Apache Iceberg lakehouse on S3 with Glue Catalog, schema evolution, ACID transactions, and time-travel queries for reproducible ML training datasets and historical analytics.
- Developed Airbyte/Airflow batch and streaming ingestion with CDC, encryption, Macie PII detection, and a Spark + Deequ DQ framework with 30+ rules, reducing onboarding time by 75% and blocking 85% of bad data before downstream ML.
- Shipped FastAPI microservices on Kubernetes exposing ELT orchestration as self-serve REST APIs for 85+ analysts and ML engineers.
- AWS
- Spark
- Airflow
- Airbyte
- FastAPI
- Kubernetes
- Apache Iceberg
- Deequ
- 03
Intern & Software Engineer
Xenonstack Pvt. Limited
Jan 2019 – Nov 2019 · Chandigarh, India
Built the data engineering foundation for real-time analytics and MLOps workflows.
- 10GB/day from 45 IoT sites
- 15 min forecast freshness
- 33% MLOps iteration ↑
- Delivered Spark + Kafka pipelines processing 10GB/day of IoT telemetry from 45 geo-distributed sites into a medallion Delta Lake.
- Enabled incremental forecasting pipelines with 15-minute prediction freshness.
- Improved ML iteration speed by 33% through MLflow integration.
- Databricks
- Spark
- Kafka
- Delta Lake
- MLflow
Education
Master of Information Management
University of Maryland, College Park
Aug 2023 – May 2025 · College Park, MD
- Graduate assistantship as Data Specialist. Built analytics + AI systems across 230 academic programs.
- Coursework: distributed systems, retrieval, applied ML, data governance.
B.E. Information Technology
Panjab University, Chandigarh
2015 – 2019 · India