Atharva Joshi · Data Scientist · ML Engineer

01 · Selected Works

Projects That Ship

~/finreg-ml · python3

HuggingFace Demo


PROJECT_01 / 05 · FLAGSHIP

finreg-ml

Regulation-aware ML pipeline for finance. GovernedModel, SHAP explainability, fairness audits, EU AI Act compliance, drift detection (KS+PSI), consolidated reports. Published on PyPI.

46 · Tests Passing

v0.2.0 · PyPI Published

10 · Modules

MIT · Open Source

Python · scikit-learn · SHAP · PyPI · GitHub Actions · FastAPI

GitHub PyPI ↗ Live Demo ↗

# finreg-ml v0.2.0
$ pip install finreg-ml
Successfully installed finreg-ml-0.2.0
$ python -c "from finreg import GovernedModel; print(GovernedModel.__doc__)"
EU AI Act compliant ML pipeline with SHAP explainability, fairness audits, drift detection (KS + PSI), and auto-reports.
Supports: scikit-learn estimators | 10 modules | 46 tests
$ pytest tests/ -ra
✓ 46 passed in 4.31s
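
For readers unfamiliar with the two drift metrics named above, here is a minimal standard-library sketch of the Population Stability Index (PSI) and the two-sample Kolmogorov-Smirnov (KS) statistic. The function names `psi` and `ks_statistic`, the equal-width binning, and the `eps` floor are assumptions made for illustration; they are not taken from finreg-ml and may differ from how the package computes drift.

```python
import math
from bisect import bisect_right


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty buckets at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def ks_statistic(expected, actual):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    e, a = sorted(expected), sorted(actual)
    return max(
        abs(bisect_right(e, v) / len(e) - bisect_right(a, v) / len(a))
        for v in e + a
    )


# Hypothetical data: a reference feature and a shifted production sample.
reference = list(range(100))
shifted = [v + 50 for v in reference]
drift_psi = psi(reference, shifted)
drift_ks = ks_statistic(reference, shifted)
```

A commonly cited rule of thumb treats a PSI above 0.25 as a significant shift; any such threshold is a convention to tune per feature, not a fixed standard.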

~/stocksense · typescript+python

Live Dashboard


PROJECT_02 / 05 · LIVE DEMO

stocksense

End-to-end demand forecasting and inventory health platform. HGBT + SARIMAX with walk-forward backtesting, Supabase Postgres backing a realtime dashboard, and PySpark feature parity. Next.js + Recharts UI deployed on Vercel.

22% · RMSE Lift

31 · Tests Passing

12 × 2 · SKU Panels

live · Realtime

Python · PySpark · Next.js · Supabase · Vercel

GitHub Live Demo ↗

# stocksense v0.1.0
$ python -m stocksense.run
→ Walk-forward CV across 24 panels · 3 folds · 14d horizon
→ HGBT MAPE 21.9% vs Seasonal Naive 27.4%
→ Selected HGBT on every panel · bias near zero
✓ 31 tests · Pandas/PySpark parity verified
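
The walk-forward setup in the run above (3 folds, 14-day horizon, seasonal naive baseline, MAPE) can be sketched roughly as follows. This is an illustration under assumptions, not stocksense's code: the helper names, the expanding training window, the weekly season length, and the toy series are all hypothetical.

```python
def walk_forward_splits(n_obs, n_folds=3, horizon=14):
    """Yield (train_indices, test_indices) with an expanding training window."""
    first_test_start = n_obs - n_folds * horizon
    if first_test_start <= 0:
        raise ValueError("series too short for the requested folds and horizon")
    for fold in range(n_folds):
        start = first_test_start + fold * horizon
        # Train on everything before the test block, so no future data leaks in.
        yield range(0, start), range(start, start + horizon)


def seasonal_naive(history, horizon, season=7):
    """Forecast each future step by repeating the last observed season."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mape(actual, forecast):
    """Mean absolute percentage error, skipping zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Toy daily series with an exact weekly pattern (hypothetical data).
series = [100 + 10 * (day % 7) for day in range(84)]
scores = []
for train_idx, test_idx in walk_forward_splits(len(series)):
    train = [series[i] for i in train_idx]
    test = [series[i] for i in test_idx]
    scores.append(mape(test, seasonal_naive(train, horizon=len(test))))
```

Because the toy series repeats exactly every seven days, the seasonal naive baseline scores a MAPE of zero on each fold; a candidate model would be scored on the same folds and compared against that baseline.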

~/clarify · typescript

AI Agent


PROJECT_03 / 05 · LIVE DEMO

clarify

Production LLM agent that turns a free-text brief into a validated BA artifact pack. Multi-step reasoning with a self-correcting clarification loop, typed IDs, RACI matrix, traceability. Llama 3.3 70B fallback chain via OpenRouter.

LLM Agent · Tool Calling

Next.js · AI SDK · OpenRouter · Vercel

GitHub Live Demo ↗

# clarify
$ # input: "vendor invoice approval tool"
→ Agent asks 6 clarifying questions before assuming
→ Restates assumptions, then ships artifact pack
→ Typed IDs (BR/SR/FR/NFR/TR/TC) · RTM · RACI register
✓ First-draft turnaround: ~6 hours → under 5 minutes

~/agenteval · python3

CLI Framework

PROJECT_04 / 05

agenteval

AI agent evaluation framework with AgentRunner, LLM-as-judge scoring (OpenAI + Anthropic), CLI, safety checks for PII & prompt injection, multi-format export.

LLM-as-judge · Safety Checks

Python · OpenAI · Anthropic · pydantic · CLI

GitHub

# agenteval v0.2.0
$ agenteval run --agent my_agent.py --suite evals/suite.json
→ Loading 11 modules · 62 tests passing
→ LLM Judge: claude-sonnet-4 (Anthropic)
→ Safety: PII scan ✓ · Injection scan ✓
→ Export: JSON · CSV · Markdown
✓ Report saved: eval_results.json
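
As a rough idea of what a PII safety check on agent output can look like, here is a small regex-based sketch. The pattern set and the `scan_for_pii` helper are hypothetical and deliberately narrow; they are not agenteval's implementation, and real PII detection needs far broader coverage.

```python
import re

# Illustrative patterns only (assumed for this sketch, not from agenteval).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_for_pii(text):
    """Return {pattern_name: [matches]} for every pattern that fires."""
    findings = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings


# An agent response passes the check when no pattern matches.
flagged = scan_for_pii("Contact jane@example.com or 555-123-4567")
clean = scan_for_pii("The invoice was approved on schedule.")
```

Returning the matches per pattern, instead of a bare pass/fail flag, keeps the result easy to include in an exported evaluation report.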

~/crypto-stat-arb · python3

Backtest Engine

PROJECT_05 / 05

crypto-stat-arb

Statistical arbitrage engine. Engle-Granger cointegration, Johansen basket trading, Kalman filter hedge ratios, walk-forward backtesting, regime detection, paper trading via Kraken API.

Kraken API · Kalman Filter

Python · statsmodels · scipy · pandas

GitHub

# crypto-stat-arb v0.1.0
$ python -m cryptoarb.backtest --pair BTC-ETH --window 90d
→ Engle-Granger cointegration test
→ Kalman filter hedge ratio (rolling)
→ Walk-forward backtest · regime detection
→ Market neutral confirmed (BTC corr ≈ 0.03)
✓ 107 tests · paper trading via Kraken API
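
The Kalman filter hedge ratio step can be illustrated with a scalar filter that treats the ratio as a slowly drifting state. This is a minimal sketch under assumptions: the random-walk state model, the noise variances `q` and `r`, the function name, and the synthetic prices are all chosen for illustration and are not taken from crypto-stat-arb.

```python
def kalman_hedge_ratio(x, y, q=1e-4, r=1.0, beta0=0.0, p0=1.0):
    """Track a time-varying hedge ratio beta_t in y_t = beta_t * x_t + noise.

    q is the process-noise variance (how fast beta may drift) and r is the
    observation-noise variance; both are tuning assumptions.
    """
    beta, p = beta0, p0
    betas = []
    for xt, yt in zip(x, y):
        p = p + q                      # predict: random-walk state, variance grows
        innovation = yt - beta * xt    # gap between observed and predicted y
        s = xt * p * xt + r            # innovation variance
        k = p * xt / s                 # Kalman gain
        beta = beta + k * innovation   # update the state estimate
        p = (1.0 - k * xt) * p         # update the state variance
        betas.append(beta)
    return betas


# Synthetic pair where y is exactly twice x (hypothetical data).
x_prices = [float(i % 10 + 1) for i in range(500)]
y_prices = [2.0 * v for v in x_prices]
betas = kalman_hedge_ratio(x_prices, y_prices)
```

On this noiseless synthetic pair the estimate settles near 2.0. With real prices, the innovation series is often used as the spread signal, and `q` controls how quickly the ratio is allowed to adapt.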

02 · Open Source DNA

Building in Public

100k+ COMBINED GITHUB STARS ACROSS CONTRIBUTIONS

Merged Pull Requests

#549 · TauricResearch/TradingAgents ⭐ 50.8k · Unicode encoding fix · +12 −3 · MERGED

#776 · Microsoft/agent-governance-toolkit ⭐ 1.2k · EU AI Act risk classifier · +247 −18 · MERGED

#786 · Microsoft/agent-governance-toolkit ⭐ 1.2k · Docs follow-up · +31 −8 · MERGED

#1410 · AI4Finance-Foundation/FinRL ⭐ 14.6k · Threading bug fix · +7 −14 · MERGED

Open Pull Requests

#345 · goldmansachs/gs-quant ⭐ 10k · Pandas 2.x compatibility · +89 −41 · OPEN

#113 · google/tf-quant-finance ⭐ 5.3k · MD5 to SHA-256 security fix · +3 −3 · OPEN

#9809 · sktime/sktime ⭐ 9.7k · NaiveForecaster bug fix · +18 −22 · OPEN

#512 · ranaroussi/quantstats ⭐ 7k · Compounded flag for calmar/rar · +14 −6 · OPEN

#364 · bukosabino/ta ⭐ 5k · Rank + Percentile indicators · +67 −0 · OPEN

04 · Where I've Worked

Professional Experience

Research · Buffalo, NY

University at Buffalo

Research Data Analyst

Built data pipelines and statistical models for university research stakeholders. Combined Python and R for hypothesis testing and regression analysis, plus SQL data integration into automated reporting.

Oct 2025 to Apr 2026 · Buffalo, NY

★ Impact

Reduced 696-feature dataset to 80 high-signal variables (88% cut) · Integrated 3 sources into 41 production reports · Reclaimed 15+ hrs/week of stakeholder time

Stack

Python, R, SQL, Statistical Analysis, Regression, Hypothesis Testing, Data Pipelines

Data Science · Buffalo, NY

Machinery Monitoring Systems LLC

Data Scientist

Shipped a Python AI service end to end, from EDA through validation and deployment. Built model evaluation infrastructure tracking accuracy, latency, drift, and exception rate across iterative retraining cycles.

Aug 2025 to Dec 2025 · Buffalo, NY

★ Impact

99.98% production accuracy · Automated manual triage costing ~4 hrs/day · 3 retraining cycles sustaining 99%+ accuracy across releases

Stack

Python, FastAPI, Docker, Model Evaluation, MLOps, Drift Monitoring

ML Engineer · India

Rucha Yantra LLP

Machine Learning Engineer

Designed and trained classification models on multi-sensor industrial telemetry. Built Python and SQL backend pipelines on AWS processing 10K+ daily records across multiple device classes.

Feb 2023 to Jul 2024 · India

★ Impact

Shipped 3 ML modules from scoping to deployment · Zero rollbacks across 17 months · ~$80K estimated annual customer cost savings

Stack

Python, SQL, AWS, Classification Models, Multi-Sensor Telemetry, Industrial ML

Internship · India

Chandra Engineering

Data Analyst Intern

Trained 3 forecasting models on 12+ months of multi-product sales data using walk-forward backtesting, replacing a manual Excel workflow. Surfaced insights through Tableau dashboards for ops planning.

Sep 2021 to Dec 2022 · India

★ Impact

+20% forecast accuracy · -15% stock-outs · -10% overstocking · $10K annual savings · +5% on-time fulfillment

Stack

Random Forest, XGBoost, TensorFlow, Python, SQL, Tableau, Walk-Forward Backtesting

05 · Education

Education

MS · STEM · GPA 3.73

State University of New York at Buffalo

MS, Data Science

STEM-designated Data Science program combining rigorous statistical modeling with production ML engineering, quantitative finance, and real-world systems work.

Aug 2024 to Dec 2025 · Buffalo, NY

★ Achievement

GPA 3.73 · 5 open source projects · 11 PRs across Microsoft, Google, Goldman Sachs ecosystems

Focus Areas

Machine Learning, Statistical Modeling, Quantitative Finance, MLOps, AI Evaluation, Data Engineering

B.E. · 🥇 Gold Medalist

Jawaharlal Nehru Engineering College, India

B.E., Electronics & Telecommunication Engineering

Graduated as university topper with the Gold Medal, the highest academic distinction. Built the foundation in algorithms, electronics, systems, and applied mathematics.

Aug 2019 to Jul 2023 · India

🥇 Gold Medalist · 8.41/10 GPA · Electronics & Telecom

Focus Areas

Algorithms, Data Structures, Signal Processing, Communication Systems, Applied Mathematics, Embedded Systems

06 · The Human

About

I'm a data scientist and ML engineer who builds production-grade ML systems. My work bridges statistical modeling and software engineering, from cointegration-based stat arb engines to quantitative finance pipelines. I'm passionate about open source, with PRs across repos at Microsoft, Google, and Goldman Sachs (100k+ combined stars).

🥇

Featured Achievement

Bachelor's Gold Medalist

Graduated top of the batch with the university gold medal in B.E. Electronics & Telecommunication Engineering. A credential earned through consistent excellence, not circumstance.

// Always Supporting