Part 1 — Existing Platforms: What They Measure¶

System Flow (Connect → Sync → Calculate)¶

connect: Github, GitLab, IDEs, CI/CD, SCM, Databases
sync: code events, PRs, reviews, issues, deployments
calculate: metrics, scores, trends, risk indicators, health signals

LinearB — Flow & Team Efficiency¶

Focus: flow, predictability, PR health, resource allocation.

Metrics

Cycle Time
Coding Time
Pickup Time
Review Time
Deploy Time
Deployment Frequency
Lead Time for Changes
Work Breakdown
New work vs rework vs unplanned vs refactor
Investment Profile
Strategic work %
Maintenance %
Unplanned work %
PR Process Health
PR size
Review depth
Review response time
Time-to-approve
Bottlenecks
Review congestion
Idle WIP
Review Silos / Single-reviewer dependencies

GitPrime / Pluralsight Flow — Engineering Behavior¶

Focus: individual contribution patterns and risk.

Metrics

Impact Score (weighted contribution proxy)
Efficiency (merged vs reworked code)
Code Fundamentals
Active days
Commits/day
PRs opened/merged
Lines changed
Rework rate
Churn %
Collaboration
Review load
Review turnaround
Risk Indicators
High-churn files
Ownership hotspots
Health Signals
Sustained intensity
Weekend / off-hours work
Irregular contribution patterns

GitLab Analytics — Value Stream & DevOps¶

Focus: delivery pipeline and DORA.

Metrics

Lead Time for Changes
Value Stream Stages
Issue → Code → Review → Merge → Deploy
Merge Request Analytics
MR size
Approval count
Discussion count
Time to merge
CI/CD
Pipeline duration
Success rate
MTTR
Change failure rate
Contribution Analytics
Security / Vulnerability introduction rate

typeapp.app — Cognitive Load & Well-being¶

Focus: IDE-level behavior and burnout risk.

Metrics

Flow State Duration
Context Switching
File switches
Tab switches
Project switches
Typing Behavior
Error rate
Undo/redo density
Distraction Index
Wellness Signals
Late-night streaks
Burst–burnout cycles
Cognitive Load Index

Part 2 — Unified Metrics Framework¶

1. Delivery & Velocity¶

Coding Time
Review Time
Rework Time
Deploy Time
Deployment Frequency
Throughput (PRs, issues, story points)
Work Composition
Investment Profile (Strategic vs Maintenance)

2. Code Quality & Risk¶

Code Risk Index
Churn
Ownership concentration
Hotspots
Cyclomatic Complexity (radon; snapshots persisted and loaded via store.get_complexity_snapshots())
Rework Rate
PR Quality
Size distribution
Comment density
Rejection rate
Stability
Bugs introduced
Defect escape rate

3. Collaboration & Team Dynamics¶

Review Responsiveness
Reviewer Load
Review Network Graph
Knowledge Distribution
Bus factor
Contribution entropy
Communication Density

4. Developer Health & Cognitive Load¶

Context Switching Score
Focus Time Index
Work Pattern Health
Late-night activity
Weekend work
Editing & Typing Behavior

5. Systemic & Process Health¶

Bottleneck Index
WIP Congestion
Stale Work Detection
Predictability Index

Part 3 — Developer Health Model¶

Four Dimensions

Delivery — speed, throughput, predictability
Durability — quality, stability, long-term risk
Developer Well-being — flow, burnout risk, load
Dynamics — collaboration, knowledge spread

This framework fully covers and extends LinearB, GitPrime, GitLab, and typeapp.app with a single coherent analytics model.

Absolutely — I’ll continue without pause and expand the entire system:

Data model
Events and ingestion
Scoring & formulas
Detailed metric definitions
Architecture
Dashboards & views
Roadmap

This will give you a complete draft developer health platform capable of rivaling LinearB, GitPrime, TypeApp, and GitLab Analytics in one unified product.

🚧 PART 4 — Detailed Data Model¶

Git Events (commits, branches, PRs, reviews)
Issue/Task Events (create, update, close, transitions)
CI/CD Events (pipelines, deploys, failures, restores)
IDE Telemetry Events (typing, context switching, focus time)

🧮 PART 5 — Metric Definitions & Formulas¶

Everything the earlier platforms measure + extended metrics.

1️⃣ DELIVERY METRICS (Flow & Velocity)¶

Cycle Time¶

cycle_time = merged_at - first_commit_timestamp_in_branch

Coding Time¶

coding_time = pr_created_at - first_commit_timestamp

Review Time¶

review_time = first_review_timestamp - pr_created_at

Rework Time¶

rework_time = merged_at - first_approval_timestamp

Deploy Time¶

deploy_time = first_prod_deploy_timestamp - merged_at

Throughput¶

features_completed = count(tasks.type=="feature")
bugs_fixed = count(tasks.type=="bug")
prs_merged = count(prs where state="merged")

Work Mix¶

rework_percent = churn_loc_last_30_days / total_loc_last_30_days
new_work_percent = new_features_loc / total_loc
bugfix_percent = bugfix_loc / total_loc
refactor_percent = refactor_loc / total_loc

DORA Metrics¶

Deployment frequency = deploy_count / time_period
Lead time for changes = average(coding_time + review_time + deploy_time)
MTTR = mean(time_from_incident_to_restore)
Change failure rate = failed_deploys / total_deploys

2️⃣ CODE QUALITY & RISK METRICS¶

Churn¶

commit_churn = (loc_added + loc_deleted)
rework_rate = churn_in_30_days / total_loc_touched

Hotspot Risk¶

A weighted model:

hotspot_score = file_churn * number_of_contributors * commit_frequency

PR Size¶

pr_size = lines_added + lines_deleted

Review Quality¶

comments_per_review = total_review_comments / number_of_reviews
review_coverage = files_reviewed / files_changed

3️⃣ COLLABORATION & TEAM DYNAMICS¶

Review Responsiveness¶

time_to_first_review = min(review.submitted_at) - pr_created_at

Collaboration Network¶

Track edges: author → reviewer

Metrics:

reciprocity
centrality
isolated contributors
bottleneck reviewers

Knowledge Distribution (Bus Factor)¶

ownership_score(file) = commits_by_user / total_commits_on_file
bus_factor = number_of_files_with_ownership>0.75

4️⃣ DEVELOPER HEALTH & COGNITIVE LOAD¶

Flow Score (0–100)¶

Uses editor telemetry:

focus_blocks = sequences of 10+ min uninterrupted editing
context_switch_penalty = file_switches + tab_switches
flow_score = focus_blocks * 10 - context_switch_penalty * 2

Cognitive Load¶

load_index = (avg_time_between_edits + tab_switch_rate + undo_density)

Burnout Indicators¶

late_night_activity = activity between 12am–5am
weekend_activity = sat/sun commits
burst_cycles = # of commit storms within <2h
risk_score = weighted_sum(late_night, weekend, bursts)

5️⃣ SYSTEM HEALTH METRICS¶

WIP Congestion¶

stale_prs = prs open > X days
queued_reviews = prs waiting for review > Y hours
build_queue_length = count(pipelines waiting)

Predictability¶

estimate_accuracy = |estimate - actual| / estimate
cycle_time_variance = variance(cycle_time)

📊 PART 7 — Dashboard Drafts¶

1. Org Dashboard¶

Cycle Time
Deployment Frequency
Hotspot files
Bottleneck teams
Investment Profile
Predictability Index

2. Team Dashboard¶

PR review health
Collaboration graph
Throughput
Roadmap completion forecast
Team burnout risk

3. Developer Dashboard¶

Flow score
Deep work patterns
Impact score
Review contribution
Churn & rework (healthy or high)

4. Repo Dashboard¶

Hotspots
Bus factor
Risky files
Churn trends
Contribution activity

Developer Health Platform – Full Draft Specification¶

1. Metric formulations & weights¶

1.1 Normalization & scoring pattern¶

General pattern (so everything fits into 0–100):

Raw metric → normalize into [0,1] using either
min–max based on historical window, or
percentile within org/team.
Direction: some metrics “lower is better”.
If lower is better: score = 1 - normalized_value
If higher is better: score = normalized_value
Metric Score (0–100): metric_score = score * 100

All metrics below follow: raw formula → normalized score and then roll into dimension scores.

1.2 Delivery metrics (Flow & DORA)¶

1.2.1 Cycle time¶

Per PR:

cycle_time(pr) = merged_at - first_commit_time_on_branch(pr)

Normalized (lower = better):

ct_norm = clip( (cycle_time - CT_min) / (CT_max - CT_min), 0, 1 )
ct_score = (1 - ct_norm) * 100

1.2.2 Coding / Review / Deploy times¶

For each PR:

coding_time  = pr_created_at - first_commit_time
review_time  = time_of_first_review - pr_created_at
rework_time  = merged_at - time_of_first_approval
deploy_time  = first_prod_deploy_at - merged_at

Flow Balance score (detect one-stage dominance):

flow_balance_ratio = max(coding_time, review_time, rework_time, deploy_time) / cycle_time
# If one stage > 60% of total, that’s a bottleneck
balance_norm = clip( (flow_balance_ratio - 0.25) / (0.75 - 0.25), 0, 1 )
flow_balance_score = (1 - balance_norm) * 100

1.2.3 Deployment frequency¶

For a team in a period T (e.g., 14 days):

deploy_freq = number_of_prod_deploys / T_days

Normalize vs org history:

df_norm = clip( (deploy_freq - DF_min) / (DF_max - DF_min), 0, 1 )
deploy_freq_score = df_norm * 100

1.2.4 DORA metrics¶

Lead time for changes = average cycle_time for prod-bound PRs → same pattern as cycle time.
MTTR (Mean Time To Restore):

MTTR = avg(incident_resolved_at - incident_detected_at)
mttr_norm, mttr_score = lower-is-better normalization

Change failure rate:

change_failure_rate = failed_deploys / total_deploys
cfr_norm = clip( (change_failure_rate - CFR_min) / (CFR_max - CFR_min), 0, 1 )
cfr_score = (1 - cfr_norm) * 100

1.2.5 Delivery Dimension Score¶

For a team or org:

Cycle Time score (CT)
Deploy Frequency score (DF)
Lead Time score (LT)
MTTR score (MT)
Change Failure Rate score (CF)

Weights (example):

DeliveryScore = 0.30*CT + 0.20*DF + 0.20*LT + 0.15*MT + 0.15*CF

1.3 Code quality & risk metrics¶

1.3.1 Churn & rework¶

Per file over window W (e.g., 30 days):

loc_touched = Σ(|loc_added| + |loc_deleted|)
loc_reworked_soon = Σ(loc_modified_within_30d_of_being_added)

rework_rate = loc_reworked_soon / loc_touched

Normalize (lower is better):

rw_norm = clip( (rework_rate - RW_min) / (RW_max - RW_min), 0, 1 )
rework_score = (1 - rw_norm) * 100

1.3.2 File hotspot score¶

For each file f:

churn_f        = loc_touched_in_W
contributors_f = number_of_distinct_authors_in_W
commit_freq_f  = commits_touching_file_in_W / days_in_W

hotspot_raw = α*log(1 + churn_f) + β*contributors_f + γ*commit_freq_f
# α,β,γ ~ 0.4, 0.3, 0.3 initially

Normalize across files:

hs_norm = (hotspot_raw - HS_min) / (HS_max - HS_min)
hotspot_score_file = hs_norm * 100

Team-level Code Risk score can be e.g. 80th percentile of hotspot scores in that team’s modules, inverted:

risk_raw = P80(hotspot_score_file_for_team)
risk_score = 100 - risk_raw

Ownership concentration (for hotspot drivers) is derived from git blame data:

ownership_concentration = max(lines_by_author) / total_lines

Synthetic fixtures include an expanded file set to improve blame-driven ownership coverage. Blame-only sync is available via cli.py sync blame --provider <local|github|gitlab>.

1.3.3 PR Size¶

pr_size = loc_added + loc_deleted

Normalize with a sweet spot (e.g. 20–400 LOC):

if pr_size < target_min:
    size_score = 40 + 60*(pr_size / target_min)   # Very tiny PRs not ideal
elif pr_size <= target_max:
    size_score = 100
else:
    overflow = min(pr_size - target_max, cap)
    size_score = max(40, 100 - overflow / scale) # penalize very large ones

1.3.4 Review depth & coverage¶

review_comments_per_pr = total_comments_on_pr / 1
review_coverage = reviewed_files_count / files_changed

Combined PR Review Health score:

depth_score = sigmoid( a * (review_comments_per_pr - target_comments) )
coverage_score = review_coverage * 100

review_health_score = 0.4*size_score + 0.3*depth_score + 0.3*coverage_score

Where sigmoid(x) = 100 / (1 + exp(-x)) scaled to [0,100].

1.3.5 Quality & Durability Dimension Score¶

Metrics:

Rework score
Code risk score
Review health score
Defect introduction rate score

Weights:

DurabilityScore = 0.35*ReworkScore + 0.30*CodeRiskScore
                  + 0.20*ReviewHealthScore + 0.15*DefectRateScore

1.4 Collaboration & team dynamics metrics¶

1.4.1 Review responsiveness¶

time_to_first_review = avg( first_review_timestamp - pr_created_at )

# lower better
resp_norm = (time_to_first_review - RESP_min) / (RESP_max - RESP_min)
ReviewResponsivenessScore = (1 - resp_norm) * 100

1.4.2 Review load & reciprocity¶

For each user u in window W:

reviews_given_u  = count(reviews where reviewer_id = u)
reviews_received_u = count(prs authored_by_u that had reviews)
review_balance_u = reviews_given_u / (reviews_received_u + 1)

Team reciprocity via dispersion:

team_balance = variance(review_balance_u across team)
ReciprocityScore = 100 - normalized_variance(team_balance)

1.4.3 Knowledge distribution / Bus factor¶

Bus Factor (Truck Factor): The smallest number of developers that account for >= 50% of the total code churn in the window.
Code Ownership Gini: Gini coefficient of code contribution (churn) distribution. 0.0 = perfect equality, 1.0 = perfect inequality.

bus_factor = number_of_devs_contributing_50_percent_churn
gini = (2 * sum(i * y_i) / (n * sum(y_i))) - (n + 1) / n

1.4.4 Dynamics Dimension Score¶

Use:

ReviewResponsivenessScore
ReciprocityScore
BusFactorScore
Cross-team-review score (optional)

DynamicsScore = 0.30*ReviewResponsivenessScore
              + 0.25*ReciprocityScore
              + 0.30*BusFactorScore
              + 0.15*CrossTeamReviewScore

1.5 Developer well-being & cognitive load metrics¶

1.5.1 Flow Score¶

From editor events:

focus_block = a continuous period ≥ 10 min
              with no context_switch events (tab/file/project) and
              active edits every ≤ 2 min

num_blocks   = count(focus_block) in day
avg_block_len = avg(duration of focus_block)
context_switches = count(context_switch events per hour)
interruptions = count(non-editor-window-focus events per hour)

flow_raw = w1*num_blocks + w2*avg_block_len - w3*context_switches - w4*interruptions

Normalize:

flow_norm = (flow_raw - FLOW_min) / (FLOW_max - FLOW_min)
FlowScore = clip(flow_norm, 0, 1) * 100

Start with w1=2, w2=0.1, w3=1, w4=1 as tunables.

1.5.2 Cognitive load index¶

Signals:

avg time between edits in same file
undo/redo density
error bursts (rapid changes + reverts)

Example:

avg_edit_gap = avg(time_between_edits_same_file)
undo_density = total_undo_ops / total_edits
switch_density = tab_switches / hour

load_raw = a1*avg_edit_gap + a2*undo_density + a3*switch_density
LoadScore = (1 - normalized(load_raw)) * 100

Higher score = better (manageable load).

1.5.3 Burnout risk score¶

Signals in last 14–30 days:

late_night_ratio = commits_or_edits_between_00_05 / total_commits_or_edits
weekend_ratio    = weekend_commits / total_commits
streak_length    = longest_consecutive_days_with_activity
burst_index      = fraction_of_commits_in_top_20% busiest_hours

Combine:

burnout_raw = b1*late_night_ratio + b2*weekend_ratio + b3*streak_length_norm + b4*burst_index
BurnoutRiskScore = (1 - normalized(burnout_raw)) * 100

(High score = low risk.)

1.5.4 Well-being Dimension Score¶

Combine:

WellBeingScore = 0.40*FlowScore + 0.25*LoadScore + 0.35*BurnoutRiskScore

1.6 System & process health metrics¶

1.6.1 Bottleneck index¶

For each stage s in {coding, review, qa, deploy}:

stage_time_s = avg(time_spent_in_stage_s)
stage_fraction_s = stage_time_s / total_cycle_time
bottleneck_raw = max(stage_fraction_s)
BottleneckScore = (1 - normalized(bottleneck_raw)) * 100

1.6.2 WIP congestion¶

stale_prs_ratio = stale_prs / total_open_prs
queue_length    = queued_reviews / team_size
build_queue     = queued_builds / historical_median

congestion_raw = c1*stale_prs_ratio + c2*queue_length + c3*build_queue
CongestionScore = (1 - normalized(congestion_raw)) * 100

1.6.3 Predictability¶

Defined as Completion Rate (how well the team clears its plate).

predictability_score = items_completed / (items_completed + wip_count_end_of_day)

1.6.4 System Health Dimension Score¶

SystemHealthScore = 0.40*BottleneckScore
                  + 0.30*CongestionScore
                  + 0.30*PredictabilityScore

2. Healthy vs unhealthy thresholds (initial draft)¶

These are initial org-agnostic defaults; tune them to your context.

Delivery¶

Cycle time (PR → deploy)
Excellent: < 24h
Healthy: 24–72h
At risk: 3–7 days
Unhealthy: > 7 days
Time to first review
Excellent: < 2h
Healthy: 2–8h
At risk: 8–24h
Unhealthy: > 24h
Deployment frequency
Excellent: multiple times per day
Healthy: daily–few times/week
At risk: < 1/week
Unhealthy: < 1/month
Change failure rate
Excellent: < 10%
Healthy: 10–20%
At risk: 20–30%
Unhealthy: > 30%

Code Quality & Risk¶

Rework rate (loc re-touched within 30 days)
Excellent: < 10%
Healthy: 10–20%
At risk: 20–35%
Unhealthy: > 35%
Average PR size (median)
Healthy: 50–300 LOC
At risk: 300–600 LOC
Unhealthy: > 600 LOC regularly
Hotspot concentration (files with high risk)
Healthy: < 5% of files
At risk: 5–15%
Unhealthy: > 15%

Dynamics & Collaboration¶

Bus factor (single-owner files >75%)
Healthy: < 25% of team files
At risk: 25–50%
Unhealthy: > 50%
Review reciprocity
Healthy: most devs in [.5, 2] given/received ratio
Unhealthy: many devs only receiving or only giving

Well-being¶

Late-night ratio
Healthy: < 5%
At risk: 5–15%
Unhealthy: > 15%
Weekend ratio
Healthy: < 5–10%
At risk: 10–25%
Unhealthy: > 25%
Flow time (uninterrupted focus per day)
Excellent: ≥ 2–3 hours
Healthy: 1–2 hours
At risk: < 1 hour
Unhealthy: mostly fragmented 10–15 min blocks

3. Org-wide “Developer Health Score”¶

Use the 4D model:

DeliveryScore
DurabilityScore
WellBeingScore
DynamicsScore

Optionally add SystemHealthScore as a 5th dimension.

3.1 Dimension aggregation¶

Example weights:

DH_Delivery   = DeliveryScore
DH_Durability = DurabilityScore
DH_WellBeing  = WellBeingScore
DH_Dynamics   = DynamicsScore
DH_System     = SystemHealthScore

DeveloperHealthScore = 0.25*DH_Delivery
                      + 0.25*DH_Durability
                      + 0.20*DH_WellBeing
                      + 0.20*DH_Dynamics
                      + 0.10*DH_System

Compute per org, per team, per repo, per individual (with some metric substitutions).

7. Predictive AI models¶

7.1 Delivery risk prediction (PR-level)¶

Goal: Predict whether a PR will be “slow” or “problematic” at creation time.

Label: slow_pr = cycle_time > org_p75_cycle_time (binary)
Features: PR size, file count, hotspot involvement, author historical cycle time, team WIP, time-of-day/day-of-week, reviewer count
Model: Gradient boosting (XGBoost/LightGBM) or logistic regression
Output: P(slow_pr) + suggested mitigations (split PR, add reviewer, etc.)

7.2 Burnout risk prediction (user-level)¶

Goal: Predict burnout risk for each dev next 2–4 weeks.

Label: Proxy: future spike in late-night/weekend + drop in FlowScore (or HR flag if integrated)
Features: late-night ratio, weekend ratio, FlowScore mean/variance, streak length, review load, context switching, team stress/incident load
Model: time-series classification or GBM on rolling window features
Output: BurnoutRiskProbability 0–1 → 0–100 display

7.3 Expected cycle time / delivery date¶

Goal: Predict cycle time for a new PR or task.

Label: numeric cycle time
Features: 7.1 + repo/component + team throughput context
Model: regression (GBM, random forest, or linear w/ interactions)
Output: predicted cycle time + confidence interval

7.4 Hotspot evolution¶

Goal: Predict which files will become high-risk hotspots soon.

Label: future_hotspot = hotspot_score_file > threshold in next W
Features: churn, entropy, commit frequency, bug density, ownership volatility
Model: GBM / logistic regression
Output: “emerging risky modules” highlights

8. Dashboard mockups (textual Figma)¶

8.1 Org Dashboard (“Executive View”)¶

Top bar

Time range selector (Last 7 / 30 / 90 days)
Org dropdown
Overall Developer Health Score pill (e.g. 82/100) + sparkline

Row 1: 4 dimension cards

Delivery: score + median cycle time, deploy frequency, change failure rate
Durability: score + rework %, hotspot count, defect rate
Well-being: score + flow hrs/dev/day, burnout risk index
Dynamics: score + review responsiveness, bus factor

Row 2: Charts

Cycle Time Breakdown (stacked: coding, review, deploy) over time
Deploy Frequency vs Change Failure Rate (dual-axis)

Row 3: Tables

Teams ranked by Developer Health Score
Top 10 hotspot repos/services

8.2 Team Dashboard (“Eng Lead View”)¶

Header

Team name + date range + Team Health Score

Row 1: KPIs

Delivery: cycle time, time to first review, deploys/week
Quality: rework %, defect rate, top hotspots
Well-being: flow hrs/dev/day, late-night %, burnout risk
Dynamics: review delay, bus factor, review balance

Row 2: Charts

PR Flow Timeline (scatter: PRs by age vs cycle time, colored by size)
Team Review Network (graph: nodes=devs, edges=reviews)

Row 3: People table

Dev, Impact proxy, FlowScore, BurnoutRisk, ReviewLoad, Churn%
Flags: high review load, high late-night work, isolation

8.3 Developer Dashboard (“Personal View”)¶

Header

Dev name + “Your Health this month: 78/100”

Row 1: Summary

Delivery vs team, Quality vs team, Flow, Burnout risk

Row 2: Charts

Flow over time
Work mix (new vs bugfix vs refactor)

Row 3: Suggestions

Auto-generated coaching insights (split PRs, protect focus blocks, reduce late-night pattern, etc.)

8.4 Repo / Service Dashboard¶

Header

Repo name + risk indicator

Row 1: Risk & Quality

Risk gauge
Hotspot file count
Defects tied to repo

Row 2: Hotspots table

file path, risk, churn, contributors, linked bugs

Row 3: Ownership & Bus factor

contributor distribution
single-owner file visualization