Live · Decision-Support Systems

Employee Communication Simulator

A simulation platform for testing how high-stakes leadership messages land across a diverse organization. Weighted persona modeling, Monte Carlo scenario analysis, multi-provider LLM integration, and executive-ready reporting in PDF and PowerPoint.

Python · React 19 · Multi-LLM · Monte Carlo · Executive Reporting · Simulation Engine · Decision Support

"Decision-support systems are not just about modeling the right answer. They are about reliability, recoverability, interpretability, and packaging the output in a form leaders can actually use."

01 The Problem

Leadership communications often fail not because the message is empty, but because different audiences infer different intent, fairness, and personal risk from the same text. A message that sounds reassuring to a senior leader might feel threatening to a mid-level contractor in a different geography.

I wanted to build a tool that could simulate those differences before a message was sent, then package the output into something executives could act on: framed analysis with clear scenario ranges, not just raw data.

02 What I Built

60 Weighted Personas
450 Represented Org
3 Simulation Modes
4 Output Surfaces

What started as a lightweight script for testing employee reactions evolved into a multi-component product: a weighted simulation engine, a multi-provider LLM layer, a Monte Carlo scenario model, a React dashboard, durable async job orchestration, saved-run retrieval, and automated executive reporting in PDF and PPTX formats.

The system models 60 weighted personas representing a 450-person IT organization, spanning employees and contractors, US and India locations, leaders and ICs, and principal architects. Three simulation modes (heuristic, LLM-backed, Monte Carlo) offer different tradeoffs between speed, cost, and depth.

03 Architecture & Stack

The stack was deliberately lightweight where possible and specialized only where the problem demanded it. The system relies heavily on plain Python with filesystem-based persistence, React for the interactive surface, and a provider abstraction layer across four LLM backends.

System Flow

React Dashboard: run setup & filters · persona explorer · Monte Carlo viewer · saved run browser · job status polling · refresh recovery · PDF / PPTX reports · run reload
Python API Server: HTTP endpoints · threads · async dispatch · static serving
Background Job Layer: queued · running · checkpointed · interrupted / recovered
Simulation Engine: 60 weighted personas · heuristic + stochastic scoring · Monte Carlo scenario ranges · checkpointed outputs
LLM Transport: retries · schema validation · cost accounting · provider abstraction (Groq, DeepSeek, OpenAI, Ollama)
Heuristic Mode (offline)
Saved Artifacts: JSON, MD, PDF, PPTX · run retrieval · reload

Backend: Python standard library HTTP server + threads (APIs, async jobs, static serving, run retrieval)
Frontend: React 19 + Vite 8 (run setup, polling, visual summaries, Monte Carlo exploration, persona browser)
Simulation: Weighted persona modeling (heuristic scoring, stochastic adjustments, Monte Carlo ranges, checkpointed outputs)
AI Layer: Groq, DeepSeek, OpenAI, Ollama (shared transport with retries, schema validation, usage/cost accounting)
Persistence: Filesystem-first (JSON/MD run outputs, job state and logs, selector JSONs, local provider config)
Reporting: Python → Markdown/HTML, headless PDF rendering, PptxGenJS for editable slide decks
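To make the AI layer concrete, here is a minimal sketch of a provider abstraction with shared schema validation. It is not the project's code: the `Provider` protocol, the `FakeProvider` stand-in, the `ask` helper, and the two-field schema are all hypothetical names chosen for illustration, and no real provider SDK is called.

```python
import json
from typing import Protocol


class Provider(Protocol):
    """Hypothetical interface every backend would implement."""
    name: str

    def complete(self, prompt: str) -> str: ...


# Assumed narrow schema: field name -> expected Python type.
REQUIRED_FIELDS = {"sentiment": float, "concern": str}


def validate(raw: str) -> dict:
    """Parse provider output and check it against the narrow schema."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), kind):
            raise ValueError(f"field {field!r} missing or not {kind.__name__}")
    return data


class FakeProvider:
    """Stand-in backend for the demo; a real one would call a model."""

    def __init__(self, name: str, reply: str) -> None:
        self.name = name
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def ask(provider: Provider, prompt: str) -> dict:
    """Shared path: every provider's output goes through the same validation."""
    return validate(provider.complete(prompt))


good = FakeProvider("fake-a", '{"sentiment": 0.4, "concern": "job security"}')
drifted = FakeProvider("fake-b", '{"mood": "worried"}')  # simulated schema drift

parsed = ask(good, "How does this message land?")
try:
    ask(drifted, "How does this message land?")
    drift_error = None
except ValueError as exc:
    drift_error = str(exc)
```

Because validation lives in one shared function rather than in each backend, a drifted response is rejected the same way regardless of which provider produced it.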

04 Key Design Decisions

Weighted personas over flat personas. The persona set expanded from 15 generic archetypes to 60 weighted personas representing real organizational proportions. Summary metrics now describe a simulated org rather than a set of disconnected characters, and that distinction is critical for executive trust.
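The difference between flat and weighted summaries can be shown with a small sketch. The persona names, headcounts, and the 0-to-1 score scale below are invented for illustration; only the idea of weighting by represented headcount comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str        # hypothetical label
    headcount: int   # how many real people this persona stands for
    score: float     # simulated reaction score, assumed to lie in [0, 1]


def weighted_mean(personas: list[Persona]) -> float:
    """Headcount-weighted average, so large groups count proportionally."""
    total = sum(p.headcount for p in personas)
    if total == 0:
        raise ValueError("personas must represent at least one person")
    return sum(p.headcount * p.score for p in personas) / total


# Three illustrative personas standing in for a 450-person org.
personas = [
    Persona("US employee IC", headcount=300, score=0.5),
    Persona("India contractor IC", headcount=100, score=0.25),
    Persona("Senior leader", headcount=50, score=1.0),
]

flat = sum(p.score for p in personas) / len(personas)  # every persona counts equally
weighted = weighted_mean(personas)                     # every person counts equally
```

In this toy data the flat average is pulled upward by the small senior-leader group, while the weighted figure reflects how most of the represented org is modeled to react.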

Kept heuristic mode alongside LLMs. Not every run needs hosted inference. Preserving a fast, low-cost simulation path made the tool practical for iterative message drafting, with LLM mode available when depth mattered more than speed.

Saved outputs as first-class product objects. Run listing, retrieval, and reloading were built into the experience, not treated as throwaway files. Executive reporting operates off saved runs. That is what makes a tool credible as a decision-support system rather than a demo.

Durable job lifecycle over request-response. The most consequential shift was moving from "one request, one response" to long-running jobs with persistent state, checkpoints, and explicit failure modes. Once Monte Carlo runs, LLM calls, and background tasks entered the picture, durability became a product requirement.
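A durable job record of this kind can be sketched as a small state machine persisted to the filesystem. The state names echo the system-flow labels, but the `completed` and `failed` states, the transition table, and the function names are assumptions made for this example.

```python
import json
import os
import tempfile
from pathlib import Path

# Assumed transition table; terminal states have no outgoing edges.
ALLOWED = {
    "queued": {"running"},
    "running": {"checkpointed", "interrupted", "completed", "failed"},
    "checkpointed": {"running", "interrupted", "completed", "failed"},
    "interrupted": {"running"},  # recovery path
}


def save_job(job_dir: Path, job: dict) -> None:
    """Write the record atomically: temp file first, then rename into place."""
    job_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=job_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(job, fh)
    os.replace(tmp, job_dir / f"{job['id']}.json")


def load_job(job_dir: Path, job_id: str) -> dict:
    return json.loads((job_dir / f"{job_id}.json").read_text(encoding="utf-8"))


def transition(job_dir: Path, job_id: str, new_state: str) -> dict:
    """Move a job to new_state, rejecting transitions the table does not allow."""
    job = load_job(job_dir, job_id)
    if new_state not in ALLOWED.get(job["state"], set()):
        raise ValueError(f"cannot go from {job['state']!r} to {new_state!r}")
    job["state"] = new_state
    save_job(job_dir, job)
    return job


with tempfile.TemporaryDirectory() as tmp_dir:
    jobs = Path(tmp_dir)
    save_job(jobs, {"id": "job-1", "state": "queued"})
    transition(jobs, "job-1", "running")
    transition(jobs, "job-1", "interrupted")       # e.g. the server restarted
    final = transition(jobs, "job-1", "running")   # recovered and resumed
```

Because every transition is written to disk before the function returns, a dashboard polling after a refresh or restart can read the last known state instead of losing the job.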

05 Challenges & Resolutions

The project surfaced 10 significant engineering challenges. Here are the ones that best demonstrate systems thinking and product judgment:

LLM runs exposed operational reality

At scale, the system moved from ideal prompt paths into provider-specific behavior: rate limits (Groq 429s), schema drift, inconsistent output fidelity. Resolution: backoff logic, transport retries, persona caps for validation, narrower schemas, and lighter default models. This demonstrates that AI systems succeed through operational discipline, not prompts alone.
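The backoff part of that resolution might look like the following sketch: exponential delays with jitter around a retried call. The `RateLimited` exception, the `call_with_backoff` helper, and the default delays are hypothetical; a real transport would map a provider's HTTP 429 response onto something similar.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a transport might raise on an HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Retry fn on RateLimited, doubling the wait each time up to max_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries
    raise RuntimeError("unreachable")


# Demo: a call that is rate-limited twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429")
    return "ok"


waits: list[float] = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits, do not sleep
```

Passing `sleep` as a parameter keeps the demo instant and makes the retry schedule easy to inspect.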

Dashboard lost state on refresh

The frontend initially treated the dashboard as a page, not a stateful task workspace. Resolution: persisted form state, active job IDs, and request snapshots with reconnect behavior after refresh. This is product empathy: the implementation changed to fit the user's actual workflow, not the developer's happy path.

Statistical output wasn't executive-friendly

Percentile language (p10/p50/p90) was mathematically correct but not intuitive for leaders. Resolution: reframed into "low case / typical case / high case," added explanatory copy, and turned thresholds into direct business questions. That translation is what separates useful tools from clever ones.
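A sketch of that reframing, using only the Python standard library: the 10th, 50th, and 90th percentiles are computed and relabeled, and a threshold becomes a share of simulated runs. The function names and the sample data are assumptions for illustration, and the samples are a deterministic stand-in for Monte Carlo outcomes.

```python
import statistics


def scenario_ranges(samples: list[float]) -> dict[str, float]:
    """Relabel p10 / p50 / p90 as low / typical / high case."""
    # With n=10, quantiles() returns the nine cut points p10 through p90.
    deciles = statistics.quantiles(samples, n=10, method="inclusive")
    return {
        "low case": deciles[0],
        "typical case": deciles[4],
        "high case": deciles[8],
    }


def share_below(samples: list[float], threshold: float) -> float:
    """Answer 'in what share of runs does the score fall below the threshold?'"""
    return sum(s < threshold for s in samples) / len(samples)


# Stand-in for simulated outcomes: scores 0 through 100.
samples = [float(i) for i in range(101)]

ranges = scenario_ranges(samples)
risk = share_below(samples, 25.0)
```

A report can then state the three cases in plain words and pair them with a sentence such as "in roughly a quarter of simulated runs, the score falls below 25."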

Long-running jobs needed durable infrastructure

Runs could fail after the user stepped away, return with no results, or disappear without diagnostic evidence. Resolution: persisted job records, log files, checkpointed outputs after Monte Carlo passes, explicit interrupted states, and recovery from the latest checkpoint. This is classic systems leadership: design for observability and graceful degradation.
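Recovery from the latest checkpoint can be sketched as a loop that saves after every pass and, on restart, continues from however many passes are already on disk. The file layout, the `run_passes` function, the `stop_after` switch that simulates an interruption, and the per-pass seeding are all assumptions for this example.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import Optional


def run_passes(
    checkpoint: Path, total_passes: int, stop_after: Optional[int] = None
) -> list[float]:
    """Run Monte Carlo passes, checkpointing after each; resume if a file exists."""
    results = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for i in range(len(results), total_passes):
        if stop_after is not None and i >= stop_after:
            break  # simulate the process being interrupted
        rng = random.Random(i)  # seed per pass so a resumed run matches a fresh one
        results.append(rng.random())  # placeholder for one simulation pass
        # A production version would write atomically (temp file, then rename).
        checkpoint.write_text(json.dumps(results))
    return results


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = Path(tmp_dir) / "checkpoint.json"
    partial = run_passes(ckpt, total_passes=5, stop_after=2)  # interrupted early
    resumed = run_passes(ckpt, total_passes=5)                # picks up at pass 2
    fresh = run_passes(Path(tmp_dir) / "fresh.json", total_passes=5)
```

Seeding each pass by its index is one way to make a resumed run reproduce the same numbers as an uninterrupted one, which keeps saved reports comparable.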

06 What I Learned

Reliability matters as much as intelligence

Good prompts and structured outputs are necessary but not sufficient. Retries, checkpoints, state persistence, and logs are what make AI features dependable.

Analytics need translation

Mathematically correct output is not automatically decision-ready. Leaders need framing, thresholds, and scenario narratives, not raw statistical jargon.

Adoption is a product design problem

People use tools that match their workflow. Refresh recovery, saved runs, and report packaging mattered because they supported real behavior.

"This project taught me how to move from model-centric thinking to system-centric thinking. The hard part was not generating reactions. It was building the surrounding product so those reactions could be trusted, revisited, explained, and acted on."

07 Why This Matters

This project demonstrates the ability to take an ambiguous problem, build the first version quickly, recognize when the original architecture stops being good enough, and evolve the system into something more resilient, interpretable, and useful. It was built solo and end-to-end, spanning backend, frontend, AI integration, simulation design, and executive reporting.

It shows systems thinking, product design under real constraints, operational hardening of AI features, and the ability to translate analytics into decisions that leaders can act on.