Data & AI Engineer

rust · typescript · trusted ai workflows

I build the data platforms that AI actually runs on.

The hard part isn't the model — it's making the data underneath defensible. Four years on enterprise modernization, analytics platforms, and now Rust and TypeScript systems for AI workflows that hold up under audit.

LEGACY_MODERNIZATION

900+

Java modules → Spark + Beam

STREAMING_SCALE

20M/day

events into BigQuery

DATASETS_ONBOARDED

35+

financial datasets → lakehouse

INGEST_RUNTIME

6h→45m

119-table CDC + validation

01 / The work

Three projects, one thesis.

Each one is a different angle on the same question: how do you make AI useful inside a real organization without breaking the things underneath?

pulseql.project

Governed analytics

product summary · public view

PulseQL

Active

A governed data workspace for teams that want AI-assisted analysis without losing control of review, privacy, or operational trust.

  • Data Workspace
  • Governed AI
  • Desktop Product

atrium.project

Enterprise knowledge

product summary · public view

Atrium

In development

A knowledge product for teams that need answers from company documents with clear citations, honest refusal, and a permission-aware user experience.

  • Knowledge Search
  • Citations
  • Enterprise Access

relay.project

Safer AI workflows

product summary · public view

Relay

In development

A coordination layer for AI-assisted engineering teams that need shared context, stronger review signals, and safer workflows across tools.

  • Engineering Context
  • Team Workflows
  • AI Safety

Ground what it can. Refuse what it cannot. Run within bounds the team can defend.

The thread connecting them

02 / What I believe

Three principles, repeated.

That shorthand above sits on top of these three. Whether the system is a modernization, an analytics platform, or an AI workflow, the same primitives keep showing up.

  1. Build the data foundation first

    Reliable AI starts with governed data systems: ingestion, schema enforcement, data quality, lakehouse tables, semantic definitions, and lineage.

    Iceberg + BigQuery lakehouse work across 35+ financial datasets and 20+ years of reporting context.

  2. Ground AI in enterprise context

    LLM systems need retrieval evidence, metric contracts, access boundaries, audit trails, and explicit execution paths before they can be trusted.

    RAG work over 100K+ survey responses reduced qualitative review effort by 60%.

  3. Modernize with measurable exits

    Large rewrites need migration systems, not one-off scripts: repeatable transformations, validation gates, and clear ownership for every generated artifact.

    GenAI-assisted modernization moved 900+ Java modules toward Spark and Apache Beam in roughly seven months.
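
The second principle can be pictured with a minimal sketch. This is an illustration under stated assumptions, not how any of the products above are implemented: the names `Passage`, `Answer`, and `answer_with_evidence`, the group-based permission model, and the 0.75 score threshold are all hypothetical. It shows the order of operations the principle implies: apply the access boundary first, then require enough retrieval evidence, and refuse explicitly when either check fails.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    score: float                 # retrieval similarity, assumed to lie in [0, 1]
    allowed_groups: frozenset    # groups permitted to read this document


@dataclass(frozen=True)
class Answer:
    status: str                  # "grounded" or "refused"
    citations: tuple             # doc_ids backing the answer, best first
    reason: str                  # short, auditable explanation of the decision


def answer_with_evidence(passages, user_groups, min_score=0.75, min_passages=1):
    """Return a grounded answer only when permitted evidence clears the bar."""
    # Access boundary first: evidence the caller cannot read does not count.
    visible = [p for p in passages if p.allowed_groups & user_groups]

    # Evidence threshold (hypothetical value): weak matches are not citations.
    strong = [p for p in visible if p.score >= min_score]

    if len(strong) < min_passages:
        # Honest refusal: no citations, and the reason is recorded.
        return Answer("refused", (), "insufficient permitted evidence")

    strong.sort(key=lambda p: p.score, reverse=True)
    citations = tuple(p.doc_id for p in strong)
    return Answer("grounded", citations, "evidence above threshold")
```

Filtering by permission before scoring is the deliberate choice here: a refusal then never reveals that a restricted document exists, and the recorded `reason` gives reviewers something to audit.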

Data engineering

Batch, streaming, quality, and lakehouse work for enterprise data teams.

01
  • Python
  • SQL
  • Apache Spark
  • Apache Beam
  • Kafka
  • Airflow
  • Deequ
  • Iceberg

Pipelines, semantic layers, and modernization systems at production scale.

ML & AI systems

Model workflows, feature systems, deployment paths, and evaluation loops.

02
  • TensorFlow
  • PyTorch
  • Scikit-learn
  • MLflow
  • Vertex AI
  • SageMaker
  • Feature stores
  • Model deployment

Built across applied ML, MLOps, and AI-assisted review workflows.

Generative AI

Grounded LLM applications that keep context, evidence, and review visible.

03
  • LangChain
  • RAG pipelines
  • Vector databases
  • Knowledge graphs
  • LLM applications
  • Agentic AI
  • Rust
  • TypeScript

RAG systems, knowledge interfaces, and agentic AI infrastructure.

Cloud & DevOps

Cloud delivery, infrastructure automation, observability, and platform hygiene.

04
  • AWS
  • GCP
  • Terraform
  • Shell scripting
  • Docker
  • Jenkins
  • Kubernetes
  • Grafana

AWS, GCP, containers, CI/CD, and production monitoring workflows.

03 / The stack

Where those principles meet keys on a keyboard.

Four layers — data, ML, generative AI, cloud. The tools differ; the test is the same: does this workflow hold up under audit?

The work is data infrastructure before it is AI.
Why I keep coming back to this

04 / The path

Where the patterns came from.

Four roles teaching the same lesson in different ways — that the infrastructure decisions outlast the systems they live inside.

  1. 2025

    Data Engineer

    Wells Fargo · via Capgemini America Inc.

    Charlotte, NC

  2. 2023

    Data Specialist

    University of Maryland

    College Park, MD

  3. 2021

    Senior Software Engineer, Data Platform

    Tiger Analytics

    Chennai, India

  4. 2019

    Intern & Software Engineer

    Xenonstack Pvt. Limited

    Chandigarh, India

Offline

Field notes. Analog photography.

Shot on an Olympus EM-10 between trips and walks. A quieter counterweight to the systems work: composition, patience, and noticing what the frame leaves out.

05 / Notes

Where I think out loud about this.

Short technical notes published when there's something specific to say. The decisions behind the systems, in writing.
