GraphQL Analytics API¶
The GraphQL analytics API provides a read-only query interface for dev-health-ops analytics data. All queries compile allowlisted primitives into parameterized SQL — no arbitrary queries are permitted.
Primary server code¶
src/dev_health_ops/api/graphql/app.pysrc/dev_health_ops/api/graphql/schema.pysrc/dev_health_ops/api/graphql/resolvers/analytics.pysrc/dev_health_ops/api/graphql/sql/compiler.py
What it provides¶
- Breakdowns: grouped aggregations (for treemaps, tables)
- Timeseries: bucketed metrics (for area/line charts)
- Sankey: node/edge flows (for investment flows)
Key design points¶
- Queries compile to SQL via
src/dev_health_ops/api/graphql/sql/*and execute against the analytics store. - Cost limits and validation are enforced in
src/dev_health_ops/api/graphql/cost.py. Detailed security controls, depth/alias limits, and authorization configurations are described in the GraphQL Security Posture guide. - Caching and persisted queries are supported via
src/dev_health_ops/api/graphql/persisted.pyandsrc/dev_health_ops/api/graphql/persisted_queries.json.
Endpoint¶
POST /graphql
Authentication¶
All queries require org_id as a query parameter. This scopes all data access to the specified organization.
Schema Overview¶
Query.catalog¶
Fetch available dimensions, measures, and cost limits.
query Catalog($orgId: String!, $dimension: DimensionInput) {
catalog(orgId: $orgId, dimension: $dimension) {
dimensions { name description }
measures { name description }
limits {
maxDays
maxBuckets
maxTopN
maxSankeyNodes
maxSankeyEdges
maxSubRequests
}
values { value count } # Only when dimension is specified
}
}
Query.analytics¶
Execute batch analytics queries including timeseries, breakdowns, and Sankey flows.
query Analytics($orgId: String!, $batch: AnalyticsRequestInput!) {
analytics(orgId: $orgId, batch: $batch) {
timeseries {
dimension
dimensionValue
measure
buckets { date value }
}
breakdowns {
dimension
measure
items { key value }
}
sankey {
nodes { id label dimension value }
edges { source target value }
}
}
}
Query.productTelemetryDashboard¶
Fetch first-party product usage dashboard aggregates from the persisted ClickHouse product_telemetry_events table. This query is read-only, org-scoped, and uses the same half-open date range contract as the product telemetry dashboard UI: startDate is inclusive and endDate is exclusive.
query ProductTelemetryDashboard($orgId: String!, $input: ProductTelemetryDashboardInput!) {
productTelemetryDashboard(orgId: $orgId, input: $input) {
dailyActiveUsers { day activeAnonymousUsers }
topRoutes { routePattern events sessions anonymousUsers }
featureViews { feature surface views anonymousUsers }
filterChanges { view filterKey changes avgValueCount }
chartInteractions { chart action surface interactions sessions }
clientErrors { routePattern boundary errorClass errors affectedAnonymousUsers }
sessionSummary {
p50DurationMs
p75DurationMs
p90DurationMs
p95DurationMs
avgPagesViewed
avgInteractions
}
}
}
The resolver derives the product telemetry org_id_hash from the authenticated organization before querying ClickHouse. Callers still provide the raw orgId variable used by the GraphQL authorization layer; raw organization IDs are not read from product_telemetry_events.
Example Queries¶
1. Catalog Values for TEAM and REPO¶
# Fetch available teams
query TeamValues {
catalog(orgId: "my-org", dimension: TEAM) {
values { value count }
}
}
# Fetch available repos
query RepoValues {
catalog(orgId: "my-org", dimension: REPO) {
values { value count }
}
}
2. Analytics with Timeseries, Breakdown, and Sankey¶
query FullAnalytics {
analytics(
orgId: "my-org"
batch: {
timeseries: [{
dimension: THEME
measure: COUNT
interval: WEEK
dateRange: {
startDate: "2025-01-01"
endDate: "2025-01-31"
}
}]
breakdowns: [{
dimension: REPO
measure: CHURN_LOC
dateRange: {
startDate: "2025-01-01"
endDate: "2025-01-31"
}
topN: 10
}]
sankey: {
path: [WORK_TYPE, REPO, TEAM]
measure: COUNT
dateRange: {
startDate: "2025-01-01"
endDate: "2025-01-31"
}
maxNodes: 50
maxEdges: 200
}
}
) {
timeseries {
dimension
dimensionValue
buckets { date value }
}
breakdowns {
dimension
items { key value }
}
sankey {
nodes { id label value }
edges { source target value }
}
}
}
Curl example (Sankey coverage)¶
curl -s -X POST "http://localhost:8000/graphql?org_id=default" \
-H "Content-Type: application/json" \
-H "X-Org-Id: default" \
-d '{
"query": "query CoverageSankey($orgId: String!, $batch: AnalyticsRequestInput!) { analytics(orgId: $orgId, batch: $batch) { sankey { coverage { teamCoverage repoCoverage } nodes { id label dimension value } edges { source target value } } } }",
"variables": {
"orgId": "default",
"batch": {
"useInvestment": true,
"timeseries": [],
"breakdowns": [],
"sankey": {
"path": ["WORK_TYPE", "REPO", "TEAM"],
"measure": "COUNT",
"dateRange": { "startDate": "2026-01-01", "endDate": "2026-01-31" }
}
}
}
}'
Curl example (Investment breakdowns via dev-health-web)¶
curl -s -X POST "http://localhost:3000/graphql?org_id=default" \
-H "Content-Type: application/json" \
-H "X-Org-Id: default" \
-d '{
"query": "query InvestmentBreakdown($orgId: String!, $batch: AnalyticsRequestInput!) { analytics(orgId: $orgId, batch: $batch) { breakdowns { dimension measure items { key value } } } }",
"variables": {
"orgId": "default",
"batch": {
"useInvestment": true,
"breakdowns": [
{ "dimension": "THEME", "measure": "COUNT", "dateRange": { "startDate": "2026-01-06", "endDate": "2026-01-20" }, "topN": 50 },
{ "dimension": "SUBCATEGORY", "measure": "COUNT", "dateRange": { "startDate": "2026-01-06", "endDate": "2026-01-20" }, "topN": 100 }
],
"filters": { "scope": { "level": "ORG", "ids": [] } }
}
}
}'
Available Dimensions¶
| Dimension | Description |
|---|---|
TEAM |
Team identifier |
REPO |
Repository identifier |
AUTHOR |
Author/contributor identifier |
WORK_TYPE |
Type of work item (issue, PR, etc.) |
THEME |
Investment theme category |
SUBCATEGORY |
Investment subcategory |
Available Measures¶
| Measure | Description |
|---|---|
COUNT |
Count of work units |
CHURN_LOC |
Lines of code changed |
CYCLE_TIME_HOURS |
Average cycle time in hours |
THROUGHPUT |
Distinct work units completed |
Bucket Intervals¶
For timeseries queries: DAY, WEEK, MONTH
Cost Limits¶
All requests are subject to cost limits to protect database performance:
| Limit | Default | Description |
|---|---|---|
maxDays |
365 | Maximum date range in days |
maxBuckets |
100 | Maximum timeseries buckets |
maxTopN |
50 | Maximum breakdown items |
maxSankeyNodes |
100 | Maximum Sankey nodes |
maxSankeyEdges |
500 | Maximum Sankey edges |
maxSubRequests |
10 | Maximum queries in a batch |
Requests exceeding limits return a COST_LIMIT_EXCEEDED error:
{
"errors": [{
"message": "Date range of 400 days exceeds limit of 365",
"extensions": {
"code": "COST_LIMIT_EXCEEDED",
"limit_name": "max_days",
"limit_value": 365,
"requested_value": 400
}
}]
}
Error Codes¶
| Code | Description |
|---|---|
COST_LIMIT_EXCEEDED |
Request exceeds a cost limit |
VALIDATION_ERROR |
Invalid dimension, measure, or input |
AUTHORIZATION_ERROR |
Missing or invalid org_id |
PERSISTED_QUERY_ERROR |
Unknown query ID or version mismatch |
QUERY_TIMEOUT |
Query exceeded timeout limit |
Persisted Queries¶
For performance, you can use persisted query IDs instead of sending the full query text:
curl -X POST /graphql \
-H "X-Persisted-Query-Id: catalog-dimensions" \
-H "Content-Type: application/json" \
-d '{"variables": {"orgId": "my-org"}}'
Available persisted queries:
catalog-dimensions-- Fetch catalog with dimensions and measuresteam-values-- Fetch distinct team valuesrepo-values-- Fetch distinct repo values
See also: Persisted Queries.
Security Notes¶
- No arbitrary SQL: All queries are compiled from allowlisted primitives
- org_id enforcement: Required on all queries, enforced in all SQL WHERE clauses
- Parameterized SQL: All values are parameterized, preventing SQL injection
- Query timeouts: Default 30-second timeout on all database queries
Web client¶
See: Web GraphQL Client.
Resolver SQL Validation Contract (CHAOS-1752)¶
Every GraphQL resolver that emits ClickHouse SQL must register a fixture
in tests/api/graphql/sql_explain_fixtures.py. The fixture exercises each
SQL-emitting helper with representative arguments; tests/api/graphql/
test_resolver_sql_explain.py replays the captured queries through
EXPLAIN PLAN against a real ClickHouse with production DDL applied.
EXPLAIN PLAN validates:
- SQL parse
- Table existence
- Column existence (catches mistyped column references)
- Function signature resolution
- Aggregation legality (catches alias-collision bugs like CHAOS-1751 #2)
- Parameter binding shape
It does not execute the data path, so the full suite runs in <30s.
Adding coverage for a new resolver¶
- Implement the resolver. Keep SQL-emitting work in helper functions or compile functions that accept a sink/client — not embedded in HTTP handlers.
- Add a fixture entry to
ALL_RESOLVER_SQL_FIXTURES:
async def _fixture_<name>(sink: CapturingSink) -> None:
context = FakeGraphQLContext(client=sink, org_id=SAMPLE_ORG_ID)
await my_resolver_helper(context, ...representative args...)
# Cover every WHERE/JOIN branch the helper can produce.
ALL_RESOLVER_SQL_FIXTURES = [
...,
("<name>", _fixture_<name>),
]
- Run the test locally with
CLICKHOUSE_URIpointed at a fresh ClickHouse container:pytest tests/api/graphql/test_resolver_sql_explain.py.
If EXPLAIN PLAN rejects a query, the resolver has a latent bug that
would surface as a 500 in production. Fix it before merge; do not silence
the test.