Work Graph & Investment Materialization¶
Core Contract (Non-Negotiable)¶
These rules are architectural constraints. Violations are regressions.
| Rule | Explanation |
|---|---|
| WorkUnits are evidence containers | Not categories, not classification layers |
| LLM categorizes at compute-time only | Never at query/render time |
| Theme roll-up is deterministic | From subcategories, no LLM involvement |
| UX renders persisted data only | No recomputation of categories/edges/weights |
| LLM explanations are read-only | May not alter persisted decisions |
| Sinks only for persistence | No static files, no export paths |
WorkUnit Definition¶
What a WorkUnit Is¶
- An evidence container
- A unit of aggregation
- An attribution boundary
What a WorkUnit Is NOT¶
- A category
- A theme
- A subcategory
- A visible classification layer
This distinction is subtle but critical. Treating WorkUnits as categories breaks the normalization model.
Investment Taxonomy (Fixed)¶
Themes¶
The five canonical themes (see Investment View):
- Feature Delivery
- Operational / Support
- Maintenance / Tech Debt
- Quality / Reliability
- Risk / Security
Subcategories¶
Fixed set per theme, curated for comparability across organizations.
Rules:
- Themes and subcategories are canonical and non-configurable
- Provider labels/types (Jira/GitHub/GitLab) are inputs only
- Provider-native labels are normalized away
Compute-Time LLM Categorization¶
Schema Compliance¶
LLM output MUST be strict JSON matching the schema defined in:
work_graph/investment/llm_schema.py
Output Requirements¶
| Requirement | Details |
|---|---|
| Keys | Must come from canonical subcategory registry |
| Probabilities | Must be valid (0-1) and normalized |
| Evidence quotes | Must be extractive substrings from provided inputs |
| Sum | Subcategory probs sum to theme probs; theme probs sum to ~1.0 |
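The requirements in the table above can be checked mechanically before anything is persisted. The following is a minimal sketch, not the project's validator: the registry contents, the payload keys `subcategories` and `evidence_quotes`, and the tolerance of 0.01 are all assumptions made for illustration. The authoritative schema is the one in work_graph/investment/llm_schema.py.

```python
import math

# Hypothetical registry entries; the real canonical registry lives in the codebase.
SUBCATEGORY_REGISTRY = {"new_capability", "refactor", "testing"}


def validate_output(payload: dict, source_text: str, tol: float = 1e-2) -> list[str]:
    """Return a list of violations; an empty list means the payload passes."""
    errors: list[str] = []
    probs = payload.get("subcategories", {})  # assumed key name

    for key, p in probs.items():
        # Keys must come from the canonical subcategory registry.
        if key not in SUBCATEGORY_REGISTRY:
            errors.append(f"unknown subcategory: {key}")
        # Each probability must be a number in the range 0-1.
        if not isinstance(p, (int, float)) or not 0.0 <= p <= 1.0:
            errors.append(f"probability out of range for {key}: {p!r}")

    # Subcategory probabilities roll up to themes, which sum to ~1.0,
    # so the subcategory total must also be ~1.0.
    total = sum(p for p in probs.values() if isinstance(p, (int, float)))
    if not math.isclose(total, 1.0, abs_tol=tol):
        errors.append(f"probabilities sum to {total}, expected ~1.0")

    # Evidence quotes must be extractive substrings of the provided input.
    for quote in payload.get("evidence_quotes", []):  # assumed key name
        if quote not in source_text:
            errors.append(f"evidence quote not found in input: {quote!r}")

    return errors
```

A payload that returns a non-empty list would then go through the retry policy rather than being persisted.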
Two-Stage Mapping¶
- LLM outputs subcategories only — Maps text to subcategory distribution
- Theme roll-up is deterministic — Non-LLM, computed from subcategories
This prevents category drift and maintains normalization.
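The second stage can be as simple as a lookup and a sum. The sketch below assumes a hypothetical `THEME_OF` mapping with invented subcategory names; only the five theme names come from this document.

```python
from collections import defaultdict

# Hypothetical subcategory -> theme mapping, for illustration only.
THEME_OF = {
    "new_capability": "Feature Delivery",
    "incident_response": "Operational / Support",
    "refactor": "Maintenance / Tech Debt",
    "dependency_upgrade": "Maintenance / Tech Debt",
    "testing": "Quality / Reliability",
    "vulnerability_fix": "Risk / Security",
}


def roll_up(subcategory_probs: dict[str, float]) -> dict[str, float]:
    """Sum subcategory probabilities into their parent themes (no LLM involved)."""
    themes: dict[str, float] = defaultdict(float)
    for subcategory, p in subcategory_probs.items():
        # A KeyError here means a non-canonical key slipped past validation.
        themes[THEME_OF[subcategory]] += p
    return dict(themes)


themes = roll_up({"refactor": 0.5, "dependency_upgrade": 0.25, "testing": 0.25})
# themes == {"Maintenance / Tech Debt": 0.75, "Quality / Reliability": 0.25}
```

Because the mapping is static, the same subcategory distribution always yields the same theme distribution, which is what keeps the roll-up deterministic.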
Retry Policy¶
- One repair attempt only
- On continued failure: mark invalid, apply deterministic fallback
- Persist audit fields for every categorization run
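One way to express the retry policy is a loop over exactly two prompts: the original and a single repair prompt. This is a sketch under assumptions: `call_llm`, `is_valid`, and `fallback` are hypothetical callables injected by the caller, and the repair wording and result keys are illustrative.

```python
import json
from typing import Callable


def categorize_with_repair(
    call_llm: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[dict], bool],
    fallback: Callable[[], dict],
) -> dict:
    """Try once, allow one repair attempt, then apply the deterministic fallback."""
    attempts = [prompt, prompt + "\nReturn strict JSON only."]
    raw = ""
    for text in attempts:  # initial attempt plus exactly one repair attempt
        raw = call_llm(text)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output counts as a failed attempt
        if isinstance(parsed, dict) and is_valid(parsed):
            return {"distribution": parsed, "valid": True,
                    "fallback_applied": False, "raw_response": raw}
    # Both attempts failed: mark invalid and use the fallback distribution.
    return {"distribution": fallback(), "valid": False,
            "fallback_applied": True, "raw_response": raw}
```

The returned `raw_response` and `fallback_applied` values feed directly into the audit fields, so every run leaves a record whether or not it succeeded.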
Audit Fields¶
Always persist:
- categorized_at — Timestamp of categorization
- model_version — LLM model used
- prompt_hash — Hash of prompt template
- raw_response — Original LLM output (for debugging)
- fallback_applied — Boolean if fallback was used
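As an illustration, the audit fields could be carried in a small immutable record. The field names below come from the list above; the class name, the helper `build_audit`, the use of SHA-256 for `prompt_hash`, and the ISO 8601 UTC timestamp format are assumptions.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CategorizationAudit:
    categorized_at: str      # ISO 8601 timestamp in UTC (assumed format)
    model_version: str       # LLM model used
    prompt_hash: str         # hash of the prompt template
    raw_response: str        # original LLM output, kept for debugging
    fallback_applied: bool   # True if the deterministic fallback was used


def build_audit(prompt_template: str, model_version: str,
                raw_response: str, fallback_applied: bool) -> CategorizationAudit:
    """Assemble the audit record for one categorization run."""
    return CategorizationAudit(
        categorized_at=datetime.now(timezone.utc).isoformat(),
        model_version=model_version,
        # SHA-256 is an assumption; any stable hash of the template would do.
        prompt_hash=hashlib.sha256(prompt_template.encode("utf-8")).hexdigest(),
        raw_response=raw_response,
        fallback_applied=fallback_applied,
    )
```

`asdict(audit)` gives a plain dictionary that a sink could write alongside the distribution.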
UX-Time LLM Explanation (Allowed)¶
Constraints¶
LLM may generate explanation text only from:
- Persisted distributions
- Stored evidence
LLM must not:
- Recompute categories
- Change edges
- Modify weights
- Alter distributions
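One lightweight way to guard these constraints in code is to hand the explanation layer a read-only snapshot of the persisted data. The helper name `read_only_view` is hypothetical; `types.MappingProxyType` is part of the Python standard library.

```python
from types import MappingProxyType


def read_only_view(persisted: dict) -> MappingProxyType:
    """Return a read-only view over a copy of a persisted distribution."""
    # Copy first so later changes to the caller's dict do not leak into the view.
    return MappingProxyType(dict(persisted))


view = read_only_view({"refactor": 0.7, "testing": 0.3})
try:
    view["refactor"] = 0.1  # any write attempt is rejected
except TypeError:
    pass
```

The copy is shallow, so nested structures would need their own protection. The view only blocks writes in the explanation path; it does not replace the rule that persisted decisions are never altered.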
Labeling¶
All explanation output must be labeled as AI-generated.
Canonical Explanation Prompt¶
You are explaining a precomputed investment view.
You are not allowed to:
- Recalculate scores
- Change categories
- Introduce new conclusions
- Be conversational (no "Hello", "As an AI", or interactive follow-ups)
Explain the investment view in three distinct sections:
1. **SUMMARY**: Provide a high-level narrative (max 3 sentences) using
probabilistic language (appears, leans, suggests) explaining why
the work leans toward the primary categories.
2. **REASONS**: List the specific evidence (structural, contextual,
textual) that contributed most to this interpretation.
3. **UNCERTAINTY**: Disclose where uncertainty exists based on the
evidence quality and evidence mix.
Always include evidence quality level and limits.
Language Rules¶
Allowed Language¶
- appears
- leans
- suggests
Forbidden Language¶
- is
- was
- detected
- determined
The distinction maintains appropriate uncertainty and avoids false precision.
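A simple lint over generated explanation text can flag the forbidden terms. This sketch matches whole words only, so "is" inside "This" is not flagged; the function name and the decision to match case-insensitively are assumptions.

```python
import re

FORBIDDEN = ("is", "was", "detected", "determined")
# \b anchors restrict matches to whole words.
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def forbidden_terms(text: str) -> list[str]:
    """Return the forbidden words found in text, lowercased and sorted."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text)})


forbidden_terms("This work appears to lean toward maintenance.")  # []
forbidden_terms("The change was detected as risk.")  # ['detected', 'was']
```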
Persistence Contract¶
Required¶
- All compute outputs are persisted via metrics/sinks/* only
Forbidden¶
- JSON/YAML dumps
- Output file paths
- Debug files under work_graph/ or investment modules
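In practice this contract means compute code receives a sink and writes through it, rather than opening files itself. The sketch below is purely illustrative: `InvestmentSink`, `write_rows`, the table name, and the row shape are hypothetical and do not describe the actual interfaces under metrics/sinks/.

```python
from typing import Protocol


class InvestmentSink(Protocol):
    """Hypothetical sink interface, for illustration only."""

    def write_rows(self, table: str, rows: list[dict]) -> None: ...


class InMemorySink:
    """Test double that satisfies the interface without touching disk."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {}

    def write_rows(self, table: str, rows: list[dict]) -> None:
        self.tables.setdefault(table, []).extend(rows)


def persist_distribution(sink: InvestmentSink, work_unit_id: str,
                         probs: dict[str, float]) -> None:
    """Persist one WorkUnit's subcategory distribution through the sink."""
    rows = [
        {"work_unit_id": work_unit_id, "subcategory": key, "probability": value}
        for key, value in probs.items()
    ]
    sink.write_rows("investment_distributions", rows)
```

Because the function has no file path parameter, there is no route by which it could produce a JSON/YAML dump or a debug file.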
OpenAI Response Handling¶
This section records the production fix for blank responses (content_length=0, finish_reason=length).
Implementation Checklist¶
| Item | Requirement |
|---|---|
| JSON mode | Include an explicit JSON instruction in both the system and user messages |
| Tokens | Use max_completion_tokens (minimum 512) |
| Retry | Automatic single retry on whitespace-only, truncated, or invalid JSON responses |
| Retry strategy | Double tokens and simplify/harden JSON prompt on retry |
| Observability | Log finish_reason, content_length, and token params |
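The checklist can be sketched as a thin wrapper around the model call. To avoid guessing at client code, the sketch takes a hypothetical `call_model(prompt, max_completion_tokens)` callable that returns a `(content, finish_reason)` tuple; the logger name and the hardened prompt wording are also assumptions.

```python
import json
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("investment.llm")  # hypothetical logger name
MIN_TOKENS = 512


def needs_retry(content: Optional[str], finish_reason: str) -> bool:
    """True for blank/whitespace, truncated, or invalid-JSON responses."""
    if content is None or not content.strip():
        return True
    if finish_reason == "length":  # output was cut off by the token limit
        return True
    try:
        json.loads(content)
    except json.JSONDecodeError:
        return True
    return False


def complete_json(
    call_model: Callable[[str, int], Tuple[Optional[str], str]],
    prompt: str,
    max_completion_tokens: int = MIN_TOKENS,
) -> Optional[str]:
    """Call the model; on a bad response retry once with doubled tokens."""
    tokens = max(max_completion_tokens, MIN_TOKENS)  # enforce the 512 minimum
    hardened = prompt + "\nRespond with a single valid JSON object and nothing else."
    content: Optional[str] = None
    for attempt_prompt, attempt_tokens in ((prompt, tokens), (hardened, tokens * 2)):
        content, finish_reason = call_model(attempt_prompt, attempt_tokens)
        log.info("finish_reason=%s content_length=%d max_completion_tokens=%d",
                 finish_reason, len(content or ""), attempt_tokens)
        if not needs_retry(content, finish_reason):
            return content
    return None  # caller applies the deterministic fallback
```

Returning `None` after the second failure leaves the fallback decision to the caller, in line with the single-repair retry policy.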
Migration Notes¶
Backend Tasks (if touching Investment code)¶
- Remove any logic treating WorkUnits as categorization layers
- Ensure LLM outputs subcategory distributions only
- Roll subcategories into themes deterministically
- Update schemas and APIs accordingly
Frontend Tasks (if touching Investment UI)¶
- Make theme the default and primary view
- Support drill-down: theme → subcategory → evidence
- Remove any visualization treating WorkUnits as peers to categories
- Ensure charts always roll up by theme unless explicitly drilled