Work

Production systems I've designed, built, or led, spanning data science, data engineering, AI engineering, and the products built around them.

Safari King Africa

Live · 2025 – present · Software developer

A full marketing site and an internal CRM for a Tanzania-based safari operator, built to replace a legacy static site. It has seventy-plus public pages and a real back office: a bookings pipeline with an enforced status machine, AI-assisted itinerary and reply drafting, a blog CMS, reviews moderation, and subscriber management. Roughly forty-four thousand lines of TypeScript on Next.js 16 with Prisma over PostgreSQL, with a thirteen-table schema, soft-delete and a unified trash, audit logging, tokenized share links, and emailed-OTP two-factor admin auth. Claude is wired into seven operator workflows with prompt caching, streaming, output sanitization, and per-admin rate limits.

Next.js · Claude API · PostgreSQL · TypeScript · Vercel
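
An enforced status machine usually comes down to a table of allowed transitions plus one function that refuses everything else. The sketch below shows that idea in Python rather than the site's TypeScript. The status names and the transition table are hypothetical, since the real pipeline's states are not listed here.

```python
from enum import Enum


class BookingStatus(Enum):
    # Hypothetical states; the real pipeline's statuses are not listed here.
    INQUIRY = "inquiry"
    QUOTED = "quoted"
    CONFIRMED = "confirmed"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Each status maps to the set of statuses it may move to next (assumed rules).
ALLOWED_TRANSITIONS = {
    BookingStatus.INQUIRY: {BookingStatus.QUOTED, BookingStatus.CANCELLED},
    BookingStatus.QUOTED: {BookingStatus.CONFIRMED, BookingStatus.CANCELLED},
    BookingStatus.CONFIRMED: {BookingStatus.COMPLETED, BookingStatus.CANCELLED},
    BookingStatus.COMPLETED: set(),
    BookingStatus.CANCELLED: set(),
}


class InvalidTransition(ValueError):
    """Raised when a booking is asked to make a move the table forbids."""


def transition(current: BookingStatus, target: BookingStatus) -> BookingStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the rules in one table means every write path can call the same function, so no route can skip a step by updating the status column directly.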

SSA Marine Procurement Agent

Case study · 2025 · AI engineer

A coordinator-based agentic architecture on Azure, deployed as a Teams app for stakeholders across SSA Marine. A top-level coordinator agent (LangGraph) routes each question to one of two sub-agents: a RAG sub-agent that grounds answers in indexed SharePoint policy documents, and an email sub-agent that takes over when the RAG layer can't find an answer, drafting an escalation email to the procurement team and holding it until the stakeholder explicitly approves the final draft. Safeguards include Pydantic-enforced tool-call validation, redacted audit logging, and Azure Logic Apps as the only path that can actually dispatch.

LangGraph · Azure OpenAI · FastAPI · RAG · React
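
The hold-until-approved behavior can be pictured as a gate in front of a single dispatch function. This is a minimal standard-library sketch, not the deployed code: `EscalationDraft`, `NotApprovedError`, `dispatch`, and the `send` callable are all hypothetical names, with `send` standing in for the one outbound path (Azure Logic Apps in the real system).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EscalationDraft:
    """A drafted escalation email waiting on the stakeholder (hypothetical shape)."""
    recipient: str
    subject: str
    body: str
    approved: bool = False

    def approve(self) -> None:
        # Called only when the stakeholder explicitly accepts the final draft.
        self.approved = True


class NotApprovedError(RuntimeError):
    """Raised when something tries to send a draft that was never approved."""


def dispatch(draft: EscalationDraft, send: Callable[[EscalationDraft], None]) -> None:
    """Send the draft through the single outbound path, but only after approval."""
    if not draft.approved:
        raise NotApprovedError("draft must be approved before it can be sent")
    send(draft)
```

Because nothing else in the sketch calls `send`, an agent that produces a draft has no way to deliver it on its own.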

Ubunifu Madness

Live · 2026 – present · AI engineer

A full-stack NCAA basketball prediction platform for March Madness, built around one calibrated model that powers Kaggle-style predictions, locked daily picks, bracket simulation, and a grounded chat agent. A nine-stage daily ETL pipeline keeps Elo, advanced stats, and records current across Division I men's and women's basketball. The model is a logistic-regression and LightGBM ensemble on forty-three engineered features, trained on more than a hundred and sixty thousand games with smoothed isotonic calibration, and validated on a held-out 2023 to 2026 split at a 0.139 Brier score and about 80 percent accuracy. The agent uses seven tools and one hard rule: every claim has to come from a live database query, and it states the model's probability without altering it.

LightGBM · Claude API · FastAPI · PostgreSQL
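
For readers unfamiliar with the metric, the Brier score is the mean squared difference between each predicted probability and the actual 0/1 outcome, so lower is better. The function below is a plain-Python illustration of that definition, not the platform's evaluation code, and `brier_score` is a name chosen for this sketch.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted win probabilities and 0/1 results."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = ((p - y) ** 2 for p, y in zip(probabilities, outcomes))
    return sum(squared_errors) / len(probabilities)
```

A model that always predicts 0.5 scores 0.25 on any set of games, which is the usual baseline for reading a figure like 0.139.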

LTIMindtree Data & AI Practice

Archived · 2021 – 2024 · Junior → Senior Data Engineer

Three years on LTIMindtree's Data & AI practice. Started as a Junior Data Engineer (Oct 2021 – Jun 2022) advising Microsoft stakeholders on cloud data architecture and diagnosing production incidents in Azure Data Factory, SSIS, and Stream Analytics under SLA commitments. Promoted to Senior Data Engineer (Jun 2022 – Aug 2024), where I led technical delivery for a thirty-person Data & AI team serving Fortune 500 clients on Microsoft Azure, modernizing analytics stacks across Azure Data Factory and Databricks, designing cloud migration strategies for legacy systems, and scaling the engineering practice from zero to thirty engineers. Reduced issue resolution time by 30% through process optimization and promoted four team members into leadership roles.

Azure · Databricks · ADF · SSIS

MUTE: My Unique Tone Experience

Personal · 2025 – 2026 · Master's capstone · Full-stack engineer

A hearing-aware audio personalization platform built as a University of Washington master's capstone, with the NOISE Lab at UT Dallas. A musician takes two in-browser hearing tests, and a background worker applies the resulting attenuation curve to their own recordings so they can hear what their earplugs sound like. FastAPI and PostgreSQL with a Redis-free job queue (FOR UPDATE SKIP LOCKED, retries, stale-job recovery), JWT auth with refresh-token rotation, role-based access control, GDPR-style consent logging and erasure, and FFT-domain DSP across seven audiometric frequencies. Ninety-nine endpoints, a fifteen-table schema, and 274 automated tests.

FastAPI · PostgreSQL · Next.js · Railway · SQLAlchemy
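
Applying an attenuation curve in the FFT domain needs two small pieces: a way to turn decibels of attenuation into a linear gain, and a way to estimate the attenuation at frequencies between the measured points. The sketch below shows one common approach, under assumptions: the seven octave-spaced frequencies are a guess (the project's actual set is not stated), and `db_to_gain` and `attenuation_at` are hypothetical helper names.

```python
import math
from bisect import bisect_left
from typing import Sequence

# Assumed octave-spaced test frequencies in Hz; the project's actual seven are not stated.
FREQUENCIES_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def db_to_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude multiplier."""
    return 10 ** (-attenuation_db / 20)


def attenuation_at(freq_hz: float, curve_db: Sequence[float]) -> float:
    """Interpolate the curve at freq_hz, linearly in dB over log-frequency."""
    if len(curve_db) != len(FREQUENCIES_HZ):
        raise ValueError("curve must have one value per test frequency")
    # Outside the measured range, hold the nearest measured value.
    if freq_hz <= FREQUENCIES_HZ[0]:
        return curve_db[0]
    if freq_hz >= FREQUENCIES_HZ[-1]:
        return curve_db[-1]
    hi = bisect_left(FREQUENCIES_HZ, freq_hz)
    lo = hi - 1
    span = math.log(FREQUENCIES_HZ[hi]) - math.log(FREQUENCIES_HZ[lo])
    t = (math.log(freq_hz) - math.log(FREQUENCIES_HZ[lo])) / span
    return curve_db[lo] + t * (curve_db[hi] - curve_db[lo])
```

In a pipeline like this one, each FFT bin would typically be multiplied by `db_to_gain(attenuation_at(bin_frequency, curve))` before the inverse transform.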

E-commerce Analytics Pipeline

Personal · 2025 · Personal project

A production-grade Medallion Architecture (Bronze, Silver, Gold) pipeline processing 25.9 million e-commerce events across five Gold analytics tables using AWS Glue and PySpark. Optimized query performance by converting raw JSONL to Snappy-compressed Parquet with pre-aggregated Gold tables, reducing query latency by 16.6x (10.9s to 0.66s) and data scanned by 412,027x (419 MB to 2 KB). Incremental processing via Glue job bookmarks reduced re-run time from 20 minutes to under 60 seconds.

AWS Glue · PySpark · Athena · CloudFormation

CIT Impact Analysis: Police Use of Force

Personal · 2025 · Research project

A statistical study of whether Crisis Intervention Team training reduces police use of force, using more than a hundred thousand Seattle crisis-contact records from 2015 to 2025. Binary and multinomial logistic regression, controlling for call severity, precinct, and officer demographics, found that CIT-certified officers were associated with thirteen percent higher odds of using force (OR 1.13, p=0.027), alongside more referrals to services, more arrests, and more emergent detentions. I call it the interventionist effect: training seems to make officers act rather than do nothing. Built as a replication and extension of a 2022 study, with interactive Plotly visualizations and explicit treatment of selection, population, and temporal bias.

Python · Statsmodels · Plotly · Pandas
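
The link between an odds ratio and the "percent higher odds" wording is simple arithmetic: a logistic regression coefficient is a log-odds change, its exponential is the odds ratio, and the odds ratio minus one is the fractional change in odds. The helpers below illustrate that conversion. Their names are chosen for this sketch and they are not part of the study's code.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Exponentiate a logistic regression coefficient to get an odds ratio."""
    return math.exp(coefficient)


def percent_change_in_odds(ratio: float) -> float:
    """Express an odds ratio as a percent change in odds relative to baseline."""
    return (ratio - 1) * 100
```

With the reported OR of 1.13, `percent_change_in_odds(1.13)` comes out at about 13, matching the thirteen percent figure. This is a change in odds, not in probability.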