Developer Health Metrics (v2)¶

This repository computes daily “developer health” metrics from:

synced Git facts (git_commits, git_commit_stats, git_pull_requests)
optional work tracking data (Jira issues, GitHub issues/Projects v2, GitLab issues)

Derived time series are written to ClickHouse (preferred) and/or MongoDB (optional) for Grafana.

All timestamps are treated as UTC (ClickHouse stores DateTime in UTC; Mongo stores naive datetimes as UTC by convention).

Important: Jira (and other issue trackers) is not a replacement for pull request data. Work items are used for planning/throughput/WIP metrics, while PR metrics (cycle time, merge frequency, review metrics when available) come from the Git provider sync.

Source Data¶

Git facts must already exist in the backend you point the job at:

git_commits
git_commit_stats
git_pull_requests

Work tracking facts should be synced from provider APIs via python cli.py sync work-items ...; see docs/task_trackers.md for configuration. metrics daily --provider ... still exists as a convenience, backward-compatible path. Its default, --provider auto, loads work items from the database only and skips provider API calls.

CI/CD pipeline facts are synced from GitHub/GitLab via the sync cicd job:

python cli.py sync cicd --provider github --db "<DB_CONN>" --auth "$GITHUB_TOKEN" --owner "<org>" --repo "<repo>"
python cli.py sync cicd --provider gitlab --db "<DB_CONN>" --auth "$GITLAB_TOKEN" --gitlab-url "<URL>" --project-id <ID>

Deployments and incident facts are synced via their own jobs:

python cli.py sync deployments --provider github --db "<DB_CONN>" --auth "$GITHUB_TOKEN" --owner "<org>" --repo "<repo>"
python cli.py sync incidents --provider github --db "<DB_CONN>" --auth "$GITHUB_TOKEN" --owner "<org>" --repo "<repo>"
python cli.py sync deployments --provider gitlab --db "<DB_CONN>" --auth "$GITLAB_TOKEN" --gitlab-url "<URL>" --project-id <ID>
python cli.py sync incidents --provider gitlab --db "<DB_CONN>" --auth "$GITLAB_TOKEN" --gitlab-url "<URL>" --project-id <ID>

Note: Jira Ops/Service Desk incidents are planned once project-to-repo or deployment mapping is defined.

Derived Tables / Collections¶

Git / Repo / User¶

repo_metrics_daily
user_metrics_daily
commit_metrics

Work Tracking¶

work_item_metrics_daily (daily aggregates, by provider/team/repo)
work_item_user_metrics_daily (daily aggregates, by provider/user/team)
work_item_cycle_times (per-work-item fact rows for completed items)

Team Well-being (team-level only)¶

team_metrics_daily

Metric Definitions¶

Commit size bucketing¶

total_loc = additions + deletions (summed from git_commit_stats)
bucket:
small: total_loc <= 50
medium: 51..300
large: > 300
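
The bucketing rule above can be written as a small function. This is a minimal sketch; the function name `commit_size_bucket` is illustrative and not taken from the repository.

```python
def commit_size_bucket(additions: int, deletions: int) -> str:
    """Classify a commit by total lines changed (additions + deletions)."""
    total_loc = additions + deletions
    if total_loc <= 50:
        return "small"
    if total_loc <= 300:
        return "medium"
    return "large"
```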

PR cycle time¶

For PRs with merged_at on day D:

cycle_time_hours = (merged_at - created_at) in seconds / 3600
Distribution fields (per repo/day and user/day):
median_pr_cycle_hours (p50)
pr_cycle_p75_hours
pr_cycle_p90_hours
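
As a rough illustration of how the distribution fields could be derived, the sketch below computes cycle times in hours and then p50/p75/p90. The percentile method is an assumption (linear interpolation between closest ranks); the source does not state which estimator the job uses, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def cycle_time_hours(created_at: datetime, merged_at: datetime) -> float:
    """Elapsed time from PR creation to merge, in hours."""
    return (merged_at - created_at).total_seconds() / 3600


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]; None when empty.

    Assumption: the real job may use a different estimator.
    """
    if not values:
        return None
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


# PRs merged on day D, as (created_at, merged_at) pairs in UTC.
day = datetime(2025, 2, 1, tzinfo=timezone.utc)
prs = [
    (day.replace(hour=0), day.replace(hour=2)),
    (day.replace(hour=1), day.replace(hour=11)),
    (datetime(2025, 1, 31, 0, tzinfo=timezone.utc), day.replace(hour=6)),
]
hours = [cycle_time_hours(c, m) for c, m in prs]
median_pr_cycle_hours = percentile(hours, 50)
pr_cycle_p75_hours = percentile(hours, 75)
pr_cycle_p90_hours = percentile(hours, 90)
```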

Large change thresholds¶

Large commits: total_loc > 300
Large PRs: additions + deletions >= LARGE_PR_LOC_THRESHOLD (default: 1000)
Note: PR size facts are best-effort and may be missing depending on the connector.

Daily user metrics (`user_metrics_daily`)¶

Keyed by (repo_id, author_email, day) where author_email falls back to author_name when email is missing.

Commits: counts, LOC added/deleted, distinct files changed (union across the day), large commits, avg commit size
PRs: authored (created that day), merged (merged that day), avg/p50/p75/p90 PR cycle hours (for PRs merged that day)
Collaboration (nullable when not available):
pr_pickup_time_p50_hours: PR created → first interaction (comment/review)
pr_first_review_p50_hours / p90: PR created → first review
pr_review_time_p50_hours: first review → merge
reviews_given, changes_requested_given: counts of review submissions for the day

Null behavior:

if review/comment facts are unavailable, pickup/first-review/review-time fields remain NULL
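
The per-PR intervals behind the collaboration fields, including the NULL behavior, might look like the following sketch. The function and key names are hypothetical; only the interval definitions come from the text above.

```python
from datetime import datetime, timezone
from typing import Optional


def _hours(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Hours between two timestamps, or None if either is missing."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds() / 3600


def collaboration_hours(
    created_at: datetime,
    merged_at: Optional[datetime],
    first_comment_at: Optional[datetime] = None,
    first_review_at: Optional[datetime] = None,
) -> dict:
    """Per-PR intervals; values stay None when review/comment facts are absent."""
    interactions = [t for t in (first_comment_at, first_review_at) if t is not None]
    first_interaction = min(interactions) if interactions else None
    return {
        "pickup_hours": _hours(created_at, first_interaction),
        "first_review_hours": _hours(created_at, first_review_at),
        "review_time_hours": _hours(first_review_at, merged_at),
    }


def _utc(hour: int) -> datetime:
    return datetime(2025, 2, 3, hour, tzinfo=timezone.utc)


with_facts = collaboration_hours(_utc(9), _utc(17), _utc(11), _utc(13))
without_facts = collaboration_hours(_utc(9), _utc(17))
```

The daily p50/p90 fields would then be percentiles over these per-PR values, skipping the None entries.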

Daily repo metrics (`repo_metrics_daily`)¶

Keyed by (repo_id, day).

Commits: count, LOC touched, avg size, large commit ratio
PRs: merged count, p50/p75/p90 PR cycle hours
Quality (best-effort): large_pr_ratio, pr_rework_ratio (requires PR size and review facts)
Knowledge:
bus_factor: smallest number of developers accounting for ≥ 50% of code churn
code_ownership_gini: Gini coefficient of code contribution inequality (0..1)
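
To make the two knowledge metrics concrete, here is a minimal sketch over a per-author churn mapping. The function names are illustrative, and the Gini estimator shown (the standard rank-weighted formula on sorted values) is an assumption, since the source does not say which variant is used.

```python
def bus_factor(churn_by_author: dict, threshold: float = 0.5) -> int:
    """Smallest number of authors whose combined churn reaches the threshold share."""
    total = sum(churn_by_author.values())
    if total <= 0:
        return 0
    running = 0
    ranked = sorted(churn_by_author.values(), reverse=True)
    for count, churn in enumerate(ranked, start=1):
        running += churn
        if running >= threshold * total:
            return count
    return len(ranked)


def gini(values):
    """Gini coefficient of non-negative values; None when there is nothing to measure."""
    xs = sorted(v for v in values if v >= 0)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return None
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, with churn of 60, 30, and 10 lines across three authors, one author already covers at least half of the churn, so the bus factor is 1.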

Optional per-commit metrics (`commit_metrics`)¶

Keyed by (repo_id, day, author_email, commit_hash).

Work item normalization + cycle times¶

Work items are normalized into a unified WorkItem abstraction (models/work_items.py).

Best-effort fields:

started_at: first transition into normalized in_progress
completed_at: first transition into normalized done or canceled

Time metrics:

lead_time_hours = completed_at - created_at, expressed in hours (when completed)
cycle_time_hours = completed_at - started_at, expressed in hours (when started + completed)

Null behavior:

items missing started_at are excluded from cycle-time distributions
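
A minimal sketch of the two time metrics and the exclusion rule follows. The `work_item_times` helper and its dictionary keys are hypothetical and do not reflect the actual WorkItem model.

```python
from datetime import datetime, timezone
from typing import Optional


def _hours(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Hours between two timestamps, or None if either is missing."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds() / 3600


def work_item_times(created_at, started_at, completed_at) -> dict:
    """Lead time needs completed_at; cycle time needs started_at and completed_at."""
    return {
        "lead_time_hours": _hours(created_at, completed_at),
        "cycle_time_hours": _hours(started_at, completed_at),
    }


created = datetime(2025, 2, 1, tzinfo=timezone.utc)
started = datetime(2025, 2, 2, tzinfo=timezone.utc)
completed = datetime(2025, 2, 3, 12, tzinfo=timezone.utc)

items = [
    work_item_times(created, started, completed),
    work_item_times(created, None, completed),  # no in_progress transition recorded
]
# Items without started_at drop out of the cycle-time distribution.
cycle_samples = [i["cycle_time_hours"] for i in items if i["cycle_time_hours"] is not None]
```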

Work tracking daily metrics (`work_item_metrics_daily`)¶

Keyed by (day, provider, team_id, work_scope_id).

work_scope_id is provider-native and is used to avoid assuming a repo UUID exists for work items:

Jira: Jira project key (e.g. ABC)
GitHub: repository full name (e.g. owner/repo) when sourced from issues; Projects v2 uses ghprojv2:<org>#<number>
GitLab: project full path (e.g. group/project)
Throughput: items_started, items_completed
Ownership gap: items_completed_unassigned (subset of completed items that had no assignee at completion time)
WIP: wip_count_end_of_day
Cycle/lead time distributions (nullable if no samples): p50/p90
WIP age distributions (nullable if no WIP samples): p50/p90
bug_completed_ratio: completed bugs / total completed
story_points_completed: sum of story points completed (Jira only, when configured)
predictability_score: Completion Rate = items_completed / (items_completed + wip_count_end_of_day)
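
The two ratio fields can be sketched as below. Returning None when the denominator is zero is an assumption; the source does not describe that case.

```python
from typing import Optional


def predictability_score(items_completed: int, wip_count_end_of_day: int) -> Optional[float]:
    """Completion rate: completed / (completed + still in progress)."""
    denominator = items_completed + wip_count_end_of_day
    if denominator == 0:
        return None  # assumption: no activity means no score
    return items_completed / denominator


def bug_completed_ratio(bugs_completed: int, items_completed: int) -> Optional[float]:
    """Share of completed items that were bugs."""
    if items_completed == 0:
        return None  # assumption: undefined without completed items
    return bugs_completed / items_completed
```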

Work item facts (`work_item_cycle_times`)¶

Keyed by (provider, work_item_id); stored as ClickHouse ReplacingMergeTree by computed_at and as Mongo upserts.

Work item time-in-state (`work_item_state_durations_daily`)¶

Keyed by (day, provider, work_scope_id, team_id, status) and computed from provider status transitions (Jira changelog when available).

Null behavior:

items without status transition history contribute no rows

Unassigned work¶

If a work item has no assignee, daily rollups:

count it under team_id='' (shown as “unassigned” in dashboards)
emit a work_item_user_metrics_daily row with user_identity='unassigned' so you can chart “closed unassigned” work over time

Team well-being (team-level only)¶

Computed from commits (deduplicated by commit hash) and a team mapping.

after_hours_commit_ratio: commits outside business hours on weekdays
weekend_commit_ratio: commits on weekends

Configuration:

BUSINESS_TIMEZONE (default: UTC)
BUSINESS_HOURS_START (default: 9)
BUSINESS_HOURS_END (default: 17)
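
The sketch below shows one way the two ratios could be computed from deduplicated commits. Several details are assumptions not confirmed by the source: both ratios use the total commit count as the denominator, business hours are treated as the half-open interval [start, end), and the function name is hypothetical. A `zoneinfo.ZoneInfo` instance could be passed as `tz` for a named BUSINESS_TIMEZONE.

```python
from datetime import datetime, timezone, tzinfo


def wellbeing_ratios(commits, tz: tzinfo = timezone.utc, start_hour: int = 9, end_hour: int = 17):
    """commits: iterable of (commit_hash, aware datetime).

    Returns (after_hours_commit_ratio, weekend_commit_ratio), or (None, None)
    when there are no commits.
    """
    unique = dict(commits)  # deduplicate by commit hash
    if not unique:
        return None, None
    after_hours = weekend = 0
    for committed_at in unique.values():
        local = committed_at.astimezone(tz)
        if local.weekday() >= 5:  # Saturday or Sunday
            weekend += 1
        elif not (start_hour <= local.hour < end_hour):
            after_hours += 1
    return after_hours / len(unique), weekend / len(unique)


commits = [
    ("a1", datetime(2025, 2, 3, 10, tzinfo=timezone.utc)),  # Monday, in hours
    ("b2", datetime(2025, 2, 3, 22, tzinfo=timezone.utc)),  # Monday, after hours
    ("c3", datetime(2025, 2, 1, 12, tzinfo=timezone.utc)),  # Saturday
    ("d4", datetime(2025, 2, 4, 8, tzinfo=timezone.utc)),   # Tuesday, before hours
    ("a1", datetime(2025, 2, 3, 10, tzinfo=timezone.utc)),  # duplicate hash
]
after_hours_commit_ratio, weekend_commit_ratio = wellbeing_ratios(commits)
```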

Storage Targets¶

ClickHouse (tables)¶

Tables are created automatically if missing:

repo_metrics_daily
user_metrics_daily
commit_metrics
team_metrics_daily
work_item_metrics_daily
work_item_user_metrics_daily
work_item_cycle_times

They are MergeTree tables partitioned by toYYYYMM(day) and ordered by the natural keys for Grafana queries.

ClickHouse query notes¶

These derived tables are append-only with a computed_at version. To query the latest value per key, use argMax(<metric>, computed_at) grouped by the key columns. If you need a daily total, do it in two steps (no nested aggregates):

SELECT
  day,
  sum(items_completed_unassigned_latest) AS items_completed_unassigned
FROM
(
  SELECT
    day,
    provider,
    work_scope_id,
    team_id,
    argMax(items_completed_unassigned, computed_at) AS items_completed_unassigned_latest
  FROM work_item_metrics_daily
  WHERE provider = 'jira'
  GROUP BY day, provider, work_scope_id, team_id
)
GROUP BY day
ORDER BY day;

Re-computations are append-only and distinguished by computed_at. To query the latest metrics for a key/day, use argMax(..., computed_at) in ClickHouse.
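
For readers who want to check the two-step logic outside ClickHouse, the same "latest per key, then sum per day" pattern can be sketched in plain Python. This is only an illustration of the query's semantics over in-memory rows; the row dictionaries are made-up sample data.

```python
from collections import defaultdict

KEY_COLUMNS = ("day", "provider", "work_scope_id", "team_id")


def latest_then_sum(rows, metric="items_completed_unassigned"):
    """Step 1: keep the row with the greatest computed_at per key (argMax).
    Step 2: sum the surviving metric values per day."""
    latest = {}
    for row in rows:
        key = tuple(row[c] for c in KEY_COLUMNS)
        if key not in latest or row["computed_at"] > latest[key]["computed_at"]:
            latest[key] = row
    totals = defaultdict(int)
    for key, row in latest.items():
        totals[key[0]] += row[metric]  # key[0] is the day
    return dict(sorted(totals.items()))


rows = [
    {"day": "2025-02-01", "provider": "jira", "work_scope_id": "ABC", "team_id": "t1",
     "computed_at": 1, "items_completed_unassigned": 3},
    {"day": "2025-02-01", "provider": "jira", "work_scope_id": "ABC", "team_id": "t1",
     "computed_at": 2, "items_completed_unassigned": 5},  # recomputation wins
    {"day": "2025-02-01", "provider": "jira", "work_scope_id": "DEF", "team_id": "t1",
     "computed_at": 1, "items_completed_unassigned": 2},
]
daily_totals = latest_then_sum(rows)
```

Summing the raw rows directly would count the superseded value as well, which is why the aggregation is done in two steps.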

MongoDB (collections)¶

Collections are created automatically:

repo_metrics_daily
user_metrics_daily
commit_metrics
team_metrics_daily
work_item_metrics_daily
work_item_user_metrics_daily
work_item_cycle_times

Documents use stable compound _id keys and are written via upserts, so recomputation is safe.

SQLite (tables)¶

Tables are created automatically in the same .db file:

repo_metrics_daily
user_metrics_daily
commit_metrics
team_metrics_daily
work_item_metrics_daily
work_item_user_metrics_daily
work_item_cycle_times

Running The Daily Job¶

The job reads source data from the same backend you point it at (ClickHouse or MongoDB), using the synced tables/collections:

git_commits
git_commit_stats
git_pull_requests

It also supports SQLite, reading the same tables and writing metrics tables into the same .db file.

Environment variables¶

DATABASE_URI (or DATABASE_URL): ClickHouse, MongoDB, SQLite, or PostgreSQL URI for both reading source data and writing derived metrics.
SECONDARY_DATABASE_URI: Required when running with --sink both to write to a secondary backend.

Examples¶

Compute one day (backend inferred from --db or DATABASE_URI):
python cli.py metrics daily --date 2025-02-01 --db clickhouse://localhost:8123/default
python cli.py metrics daily --date 2025-02-01 --db mongodb://localhost:27017/mergestat
python cli.py metrics daily --date 2025-02-01 --db sqlite:///./mergestat.db
Compute 7-day backfill ending at a date:
python cli.py metrics daily --date 2025-02-01 --backfill 7 --db clickhouse://localhost:8123/default
Filter to one repository:
python cli.py metrics daily --date 2025-02-01 --repo-id <uuid> --db clickhouse://localhost:8123/default
Compute git + work item metrics (requires provider credentials; see docs/task_trackers.md):
python cli.py sync work-items --provider all --date 2025-02-01 --backfill 30 --db clickhouse://localhost:8123/default
python cli.py metrics daily --date 2025-02-01 --backfill 30 --db clickhouse://localhost:8123/default

Dependencies¶

ClickHouse uses clickhouse-connect (already in requirements.txt).
MongoDB uses pymongo (available via the motor dependency in requirements.txt).
SQLite uses sqlalchemy (already in requirements.txt).

IC Metrics & Landscape (v3)¶

We compute canonical Individual Contributor (IC) metrics and "Developer Landscape" maps to visualize patterns in churn, throughput, and work-in-progress.

Note: These metrics are designed for identifying signals and patterns, not for ranking individuals.

Identity Resolution¶

Users are mapped to a canonical identity_id (preferring email) across providers.
Configuration: config/teams.yaml or config/team_mapping.yaml.

IC Metrics (`user_metrics_daily`)¶

Extends the daily user metrics with work tracking and unified throughput signals:

identity_id: Canonical user identity.
loc_touched: Sum of additions + deletions.
delivery_units: prs_merged + work_items_completed.
work_items_active: Number of items in progress/review/blocked at end of day.
cycle_p50_hours: Median cycle time (PRs).

Landscape Maps (`ic_landscape_rolling_30d`)¶

Rolling 30-day metrics normalized by team percentiles (0..1). Stored in ClickHouse table ic_landscape_rolling_30d.

Map 1: Churn vs Throughput¶

X: log(churn_loc_30d) (rolling 30d sum of LOC touched)
Y: delivery_units_30d (rolling 30d sum of delivery units)

Map 2: Cycle Time vs Throughput¶

X: log(cycle_p50_30d_hours) (median of daily median PR cycle times over 30d)
Y: delivery_units_30d

Map 3: WIP vs Throughput¶

X: wip_max_30d (max active work items over 30d)
Y: delivery_units_30d

Each coordinate (x_raw, y_raw) is normalized per team into (x_norm, y_norm) representing the percentile rank within the team.
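
One possible reading of the per-team normalization is sketched below. The percentile-rank convention is an assumption (the share of teammates whose raw value is less than or equal to one's own); the source does not specify how ties or single-member teams are handled, and the function name is hypothetical.

```python
from collections import defaultdict


def percentile_rank_by_team(raw):
    """raw: list of (team_id, identity_id, value).

    Returns {(team_id, identity_id): rank}, where rank is the fraction of
    team values <= the member's own value, so it falls in (0, 1].
    """
    values_by_team = defaultdict(list)
    for team_id, _identity_id, value in raw:
        values_by_team[team_id].append(value)
    ranks = {}
    for team_id, identity_id, value in raw:
        team_values = values_by_team[team_id]
        ranks[(team_id, identity_id)] = sum(v <= value for v in team_values) / len(team_values)
    return ranks


y_raw = [
    ("team-a", "u1", 10),
    ("team-a", "u2", 20),
    ("team-a", "u3", 30),
    ("team-a", "u4", 40),
    ("team-b", "u5", 5),
]
y_norm = percentile_rank_by_team(y_raw)
```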

Code Complexity (`repo_complexity_daily`)¶

Complexity metrics are computed by scanning local git clones at specific historical references using radon.

Cyclomatic Complexity (CC): Measures the number of linearly independent paths through a program's source code.
Metric fields:
cyclomatic_total: Sum of CC for all functions/classes in the repo.
cyclomatic_avg: Mean CC per function.
high_complexity_functions: Count of functions with CC > 15 (configurable).
very_high_complexity_functions: Count of functions with CC > 25.
Backfilling: Uses git checkout (via GitPython) to analyze historical state.
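
Given per-function CC scores (however they were obtained, for example from a radon scan), the metric fields reduce to simple aggregates. The sketch below assumes a flat list of scores and uses a hypothetical function name; it does not call radon itself.

```python
def summarize_complexity(cc_scores, high_threshold=15, very_high_threshold=25):
    """Aggregate per-function cyclomatic complexity scores into the daily fields."""
    total = sum(cc_scores)
    return {
        "cyclomatic_total": total,
        "cyclomatic_avg": total / len(cc_scores) if cc_scores else None,
        "high_complexity_functions": sum(s > high_threshold for s in cc_scores),
        "very_high_complexity_functions": sum(s > very_high_threshold for s in cc_scores),
    }


summary = summarize_complexity([1, 4, 16, 30])
```

Note that a function above the very-high threshold is also counted in high_complexity_functions, since both fields are defined as simple "greater than" counts.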

Investment Metrics (`investment_metrics_daily`)¶

Categorizes engineering effort into investment areas (e.g., "New Value", "Security", "Infrastructure") using a rule-based classifier.

Artifacts Classified:
Work Items: Based on labels, components, and title keywords.
Commits (Churn): Based on file path patterns (e.g., infra/ or tests/).
Metric fields:
investment_area: The assigned category.
project_stream: A secondary grouping (e.g., "Project Phoenix").
delivery_units: Story points or count of work items completed.
churn_loc: Sum of additions + deletions associated with the area.
Configuration: config/investment_areas.yaml defines the matching rules and priorities.
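
A rule-based classifier of this kind can be sketched as a priority-ordered list where the first matching rule wins. Everything below is hypothetical: the rule structure, the example rules, and the "Unclassified" fallback are illustrations only, and the real schema in config/investment_areas.yaml may differ.

```python
from fnmatch import fnmatch

# Hypothetical rules, highest priority first.
RULES = [
    {"area": "Security", "labels": {"security"}, "keywords": ["cve"], "paths": []},
    {"area": "Infrastructure", "labels": {"infra"}, "keywords": [], "paths": ["infra/*"]},
]
DEFAULT_AREA = "Unclassified"  # assumption: fallback when no rule matches


def classify_work_item(labels, title):
    """Match on labels or title keywords; the first matching rule wins."""
    label_set = {label.lower() for label in labels}
    lowered_title = title.lower()
    for rule in RULES:
        if label_set & rule["labels"] or any(k in lowered_title for k in rule["keywords"]):
            return rule["area"]
    return DEFAULT_AREA


def classify_path(path):
    """Match a changed file path against each rule's glob patterns."""
    for rule in RULES:
        if any(fnmatch(path, pattern) for pattern in rule["paths"]):
            return rule["area"]
    return DEFAULT_AREA
```

Ordering the rules by priority keeps the outcome deterministic when an item matches more than one area.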

Views and interpretations¶

Metrics are intended to be consumed through views that constrain interpretation and provide drill-down.

Views index: user-guide/views-index.md
Investment View: user-guide/investment-view.md
Work Graph: user-guide/work-graph.md
Computations index: computations/index.md
