GraphQL Analytics API¶

The GraphQL analytics API provides a read-only query interface for dev-health-ops analytics data. All queries compile allowlisted primitives into parameterized SQL — no arbitrary queries are permitted.

Primary server code¶

src/dev_health_ops/api/graphql/app.py
src/dev_health_ops/api/graphql/schema.py
src/dev_health_ops/api/graphql/resolvers/analytics.py
src/dev_health_ops/api/graphql/sql/compiler.py

What it provides¶

Breakdowns: grouped aggregations (for treemaps, tables)
Timeseries: bucketed metrics (for area/line charts)
Sankey: node/edge flows (for investment flows)

Key design points¶

Queries compile to SQL via src/dev_health_ops/api/graphql/sql/* and execute against the analytics store.
Cost limits and validation are enforced in src/dev_health_ops/api/graphql/cost.py. Detailed security controls, depth/alias limits, and authorization configurations are described in the GraphQL Security Posture guide.
Caching and persisted queries are supported via src/dev_health_ops/api/graphql/persisted.py and src/dev_health_ops/api/graphql/persisted_queries.json.

Endpoint¶

POST /graphql

Authentication¶

All queries require org_id as a query parameter. This scopes all data access to the specified organization.

Schema Overview¶

Query.catalog¶

Fetch available dimensions, measures, and cost limits.

query Catalog($orgId: String!, $dimension: DimensionInput) {
  catalog(orgId: $orgId, dimension: $dimension) {
    dimensions { name description }
    measures { name description }
    limits {
      maxDays
      maxBuckets
      maxTopN
      maxSankeyNodes
      maxSankeyEdges
      maxSubRequests
    }
    values { value count }  # Only when dimension is specified
  }
}

Query.analytics¶

Execute batch analytics queries including timeseries, breakdowns, and Sankey flows.

query Analytics($orgId: String!, $batch: AnalyticsRequestInput!) {
  analytics(orgId: $orgId, batch: $batch) {
    timeseries {
      dimension
      dimensionValue
      measure
      buckets { date value }
    }
    breakdowns {
      dimension
      measure
      items { key value }
    }
    sankey {
      nodes { id label dimension value }
      edges { source target value }
    }
  }
}

Query.productTelemetryDashboard¶

Fetch first-party product usage dashboard aggregates from the persisted ClickHouse product_telemetry_events table. This query is read-only, org-scoped, and uses the same half-open date range contract as the product telemetry dashboard UI: startDate is inclusive and endDate is exclusive.

query ProductTelemetryDashboard($orgId: String!, $input: ProductTelemetryDashboardInput!) {
  productTelemetryDashboard(orgId: $orgId, input: $input) {
    dailyActiveUsers { day activeAnonymousUsers }
    topRoutes { routePattern events sessions anonymousUsers }
    featureViews { feature surface views anonymousUsers }
    filterChanges { view filterKey changes avgValueCount }
    chartInteractions { chart action surface interactions sessions }
    clientErrors { routePattern boundary errorClass errors affectedAnonymousUsers }
    sessionSummary {
      p50DurationMs
      p75DurationMs
      p90DurationMs
      p95DurationMs
      avgPagesViewed
      avgInteractions
    }
  }
}

The resolver derives the product telemetry org_id_hash from the authenticated organization before querying ClickHouse. Callers still provide the raw orgId variable used by the GraphQL authorization layer; raw organization IDs are not read from product_telemetry_events.

Example Queries¶

1. Catalog Values for TEAM and REPO¶

# Fetch available teams
query TeamValues {
  catalog(orgId: "my-org", dimension: TEAM) {
    values { value count }
  }
}

# Fetch available repos
query RepoValues {
  catalog(orgId: "my-org", dimension: REPO) {
    values { value count }
  }
}

2. Analytics with Timeseries, Breakdown, and Sankey¶

query FullAnalytics {
  analytics(
    orgId: "my-org"
    batch: {
      timeseries: [{
        dimension: THEME
        measure: COUNT
        interval: WEEK
        dateRange: {
          startDate: "2025-01-01"
          endDate: "2025-01-31"
        }
      }]
      breakdowns: [{
        dimension: REPO
        measure: CHURN_LOC
        dateRange: {
          startDate: "2025-01-01"
          endDate: "2025-01-31"
        }
        topN: 10
      }]
      sankey: {
        path: [WORK_TYPE, REPO, TEAM]
        measure: COUNT
        dateRange: {
          startDate: "2025-01-01"
          endDate: "2025-01-31"
        }
        maxNodes: 50
        maxEdges: 200
      }
    }
  ) {
    timeseries {
      dimension
      dimensionValue
      buckets { date value }
    }
    breakdowns {
      dimension
      items { key value }
    }
    sankey {
      nodes { id label value }
      edges { source target value }
    }
  }
}

Curl example (Sankey coverage)¶

curl -s -X POST "http://localhost:8000/graphql?org_id=default" \
  -H "Content-Type: application/json" \
  -H "X-Org-Id: default" \
  -d '{
    "query": "query CoverageSankey($orgId: String!, $batch: AnalyticsRequestInput!) { analytics(orgId: $orgId, batch: $batch) { sankey { coverage { teamCoverage repoCoverage } nodes { id label dimension value } edges { source target value } } } }",
    "variables": {
      "orgId": "default",
      "batch": {
        "useInvestment": true,
        "timeseries": [],
        "breakdowns": [],
        "sankey": {
          "path": ["WORK_TYPE", "REPO", "TEAM"],
          "measure": "COUNT",
          "dateRange": { "startDate": "2026-01-01", "endDate": "2026-01-31" }
        }
      }
    }
  }'

Curl example (Investment breakdowns via dev-health-web)¶

curl -s -X POST "http://localhost:3000/graphql?org_id=default" \
  -H "Content-Type: application/json" \
  -H "X-Org-Id: default" \
  -d '{
    "query": "query InvestmentBreakdown($orgId: String!, $batch: AnalyticsRequestInput!) { analytics(orgId: $orgId, batch: $batch) { breakdowns { dimension measure items { key value } } } }",
    "variables": {
      "orgId": "default",
      "batch": {
        "useInvestment": true,
        "breakdowns": [
          { "dimension": "THEME", "measure": "COUNT", "dateRange": { "startDate": "2026-01-06", "endDate": "2026-01-20" }, "topN": 50 },
          { "dimension": "SUBCATEGORY", "measure": "COUNT", "dateRange": { "startDate": "2026-01-06", "endDate": "2026-01-20" }, "topN": 100 }
        ],
        "filters": { "scope": { "level": "ORG", "ids": [] } }
      }
    }
  }'

Available Dimensions¶

Dimension	Description
`TEAM`	Team identifier
`REPO`	Repository identifier
`AUTHOR`	Author/contributor identifier
`WORK_TYPE`	Type of work item (issue, PR, etc.)
`THEME`	Investment theme category
`SUBCATEGORY`	Investment subcategory

Available Measures¶

Measure	Description
`COUNT`	Count of work units
`CHURN_LOC`	Lines of code changed
`CYCLE_TIME_HOURS`	Average cycle time in hours
`THROUGHPUT`	Distinct work units completed

Bucket Intervals¶

For timeseries queries: DAY, WEEK, MONTH

Cost Limits¶

All requests are subject to cost limits to protect database performance:

Limit	Default	Description
`maxDays`	365	Maximum date range in days
`maxBuckets`	100	Maximum timeseries buckets
`maxTopN`	50	Maximum breakdown items
`maxSankeyNodes`	100	Maximum Sankey nodes
`maxSankeyEdges`	500	Maximum Sankey edges
`maxSubRequests`	10	Maximum queries in a batch

Requests exceeding limits return a COST_LIMIT_EXCEEDED error:

{
  "errors": [{
    "message": "Date range of 400 days exceeds limit of 365",
    "extensions": {
      "code": "COST_LIMIT_EXCEEDED",
      "limit_name": "max_days",
      "limit_value": 365,
      "requested_value": 400
    }
  }]
}

Error Codes¶

Code	Description
`COST_LIMIT_EXCEEDED`	Request exceeds a cost limit
`VALIDATION_ERROR`	Invalid dimension, measure, or input
`AUTHORIZATION_ERROR`	Missing or invalid org_id
`PERSISTED_QUERY_ERROR`	Unknown query ID or version mismatch
`QUERY_TIMEOUT`	Query exceeded timeout limit

Persisted Queries¶

For performance, you can use persisted query IDs instead of sending the full query text:

curl -X POST /graphql \
  -H "X-Persisted-Query-Id: catalog-dimensions" \
  -H "Content-Type: application/json" \
  -d '{"variables": {"orgId": "my-org"}}'

Available persisted queries:

catalog-dimensions -- Fetch catalog with dimensions and measures
team-values -- Fetch distinct team values
repo-values -- Fetch distinct repo values

Security Notes¶

No arbitrary SQL: All queries are compiled from allowlisted primitives
org_id enforcement: Required on all queries, enforced in all SQL WHERE clauses
Parameterized SQL: All values are parameterized, preventing SQL injection
Query timeouts: Default 30-second timeout on all database queries

Web client¶

See: Web GraphQL Client.

Resolver SQL Validation Contract (CHAOS-1752)¶

Every GraphQL resolver that emits ClickHouse SQL must register a fixture in tests/api/graphql/sql_explain_fixtures.py. The fixture exercises each SQL-emitting helper with representative arguments; tests/api/graphql/ test_resolver_sql_explain.py replays the captured queries through EXPLAIN PLAN against a real ClickHouse with production DDL applied.

EXPLAIN PLAN validates:

SQL parse
Table existence
Column existence (catches mistyped column references)
Function signature resolution
Aggregation legality (catches alias-collision bugs like CHAOS-1751 #2)
Parameter binding shape

It does not execute the data path, so the full suite runs in <30s.

Adding coverage for a new resolver¶

Implement the resolver. Keep SQL-emitting work in helper functions or compile functions that accept a sink/client — not embedded in HTTP handlers.
Add a fixture entry to ALL_RESOLVER_SQL_FIXTURES:

async def _fixture_<name>(sink: CapturingSink) -> None:
    context = FakeGraphQLContext(client=sink, org_id=SAMPLE_ORG_ID)
    await my_resolver_helper(context, ...representative args...)
    # Cover every WHERE/JOIN branch the helper can produce.

ALL_RESOLVER_SQL_FIXTURES = [
    ...,
    ("<name>", _fixture_<name>),
]

Run the test locally with CLICKHOUSE_URI pointed at a fresh ClickHouse container: pytest tests/api/graphql/test_resolver_sql_explain.py.

If EXPLAIN PLAN rejects a query, the resolver has a latent bug that would surface as a 500 in production. Fix it before merge; do not silence the test.