Observability Tooling Analysis
Open-source observability stack evaluation for replacing Sentry SaaS while pre-revenue.
Status: Approved for implementation
This analysis was completed 2026-03-03. Implementation is deferred until prioritized.
Quick Links
| Section | Description |
|---|---|
| Current State | What's instrumented today |
| Tool Comparison | Side-by-side matrix of evaluated tools |
| Recommendation | SigNoz + BugSink |
| Architecture | Phase 1 (Compose) and Phase 2 (K8s) diagrams |
| Migration Steps | Step-by-step implementation guide |
| Cost Analysis | SaaS vs self-hosted cost comparison |
| Risks | Known risks and mitigations |
Current State
Backend (FastAPI/Python)
| Layer | Tool | Config |
|---|---|---|
| Errors | sentry-sdk[fastapi,celery]>=2.0.0 | DSN-based, 10% traces, 0% profiles |
| Tracing | opentelemetry-*>=1.24.0 | OTLP gRPC to localhost:4317; instruments FastAPI, HTTPX, SQLAlchemy, Celery |
| Metrics | prometheus-fastapi-instrumentator | /metrics endpoint with custom counters/histograms for Celery, ClickHouse, LLM, GitHub |
| Logs | python-json-logger | JSON structured logging with correlation IDs via X-Request-ID |
| Alerting | Prometheus rules | alerts/rules.yml — API availability, ClickHouse, Celery, LLM, GitHub rate limits |
Key files:
- src/dev_health_ops/sentry.py: Sentry init with FastAPI, Starlette, Celery, Logging integrations
- src/dev_health_ops/tracing.py: OTEL setup with OTLP gRPC exporter
- src/dev_health_ops/metrics/prometheus.py: Custom Prometheus metrics
- src/dev_health_ops/logging_config.py: JSON log configuration
- src/dev_health_ops/api/middleware/correlation_id.py: Request ID propagation
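As a rough illustration of the logging row in the table above, the sketch below emits JSON log lines that carry a request ID, using only the standard library. It is not the contents of logging_config.py: the project uses python-json-logger, and the names `request_id_var`, `JsonFormatter`, and `build_logger` are hypothetical.

```python
import contextvars
import json
import logging

# Hypothetical context variable; the real middleware may store the ID differently.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that includes the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A middleware would typically set `request_id_var` from the X-Request-ID header at the start of each request, so every log line written while handling that request shares the same ID.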
Frontend (Next.js): dev-health-web
| Layer | Tool | Config |
|---|---|---|
| Errors | @sentry/nextjs@^10.40.0 | Server + client + edge configs, 10% traces (prod) |
| Session Replay | Sentry Replay | 10% session sampling, 100% on error |
| Web Vitals | Custom src/lib/webVitals.ts | LCP, INP, CLS, FCP, TTFB to optional /api/v1/rum |
| Telemetry | Custom src/lib/telemetry.ts | Events via sendBeacon() to /api/v1/telemetry |
Key Observation
The backend already has production-grade OTEL instrumentation exporting OTLP gRPC. The only vendor lock-in is Sentry for error tracking and session replay. This makes migration straightforward — any OTEL-native backend receives traces with zero code changes.
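One reason a backend swap can avoid code changes is that the OTLP target is usually configuration rather than code. The sketch below resolves the endpoint from the standard OpenTelemetry environment variables and falls back to the localhost:4317 default mentioned above. Whether tracing.py reads these variables itself is an assumption, and `resolve_otlp_endpoint` is a hypothetical helper.

```python
import os

DEFAULT_OTLP_ENDPOINT = "http://localhost:4317"


def resolve_otlp_endpoint(env=None) -> str:
    """Return the OTLP endpoint for traces, preferring the standard OTel env vars."""
    env = os.environ if env is None else env
    # The traces-specific variable takes precedence over the generic one.
    for name in ("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "OTEL_EXPORTER_OTLP_ENDPOINT"):
        value = env.get(name, "").strip()
        if value:
            return value
    return DEFAULT_OTLP_ENDPOINT
```

With this shape, pointing the exporter at a different OTLP collector means changing one environment variable in the deployment.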
Tool Comparison
Evaluated Tools
| Tool | Category | Links |
|---|---|---|
| SigNoz | Full observability platform | signoz.io |
| GlitchTip | Sentry-compatible error tracking | glitchtip.com |
| Highlight.io | Full-stack monitoring + session replay | highlight.io |
| BugSink | Minimal Sentry-compatible error tracking | bugsink.com |
| OpenReplay | Session replay specialist | github.com/openreplay |
| Self-hosted Sentry | Full Sentry, self-hosted | develop.sentry.dev |
| Groundcover | eBPF Kubernetes APM | groundcover.com |
Feature Matrix
| Tool | Sentry SDK Compat | OTEL Native | Resources | Errors | APM | Metrics | Logs | Session Replay | K8s |
|---|---|---|---|---|---|---|---|---|---|
| SigNoz | No | Yes | 8GB+ RAM | Yes | Yes | Yes (PromQL) | Yes | No | Helm |
| GlitchTip | Yes (drop-in) | No | 2 CPU / 2GB RAM | Yes | Basic | No | No | No | Possible |
| Highlight.io | No | Yes | Moderate | Yes | Yes | Yes | Yes | Yes | Possible |
| BugSink | Yes (drop-in) | No | Minimal | Yes | No | No | No | No | No |
| OpenReplay | No | No | Moderate | Basic | No | No | No | Yes | Yes |
| Self-hosted Sentry | Yes (identical) | Partial | 16-32GB RAM | Yes | Yes | No | No | Yes | Difficult |
| Groundcover | No | Yes (exports OTEL) | Low (eBPF) | Basic | Yes | Yes | Yes | No | Yes |
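The matrix can also be read as data. The sketch below encodes three of its columns and filters for the two properties the recommendation leans on: Sentry SDK compatibility and native OTEL ingestion. The `Tool` class and helper names are hypothetical; the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    sentry_sdk_compat: bool
    otel_native: str  # "yes", "partial", or "no"
    session_replay: bool


# Values transcribed from the feature matrix above.
TOOLS = [
    Tool("SigNoz", False, "yes", False),
    Tool("GlitchTip", True, "no", False),
    Tool("Highlight.io", False, "yes", True),
    Tool("BugSink", True, "no", False),
    Tool("OpenReplay", False, "no", True),
    Tool("Self-hosted Sentry", True, "partial", True),
    Tool("Groundcover", False, "yes", False),
]


def sentry_drop_ins(tools):
    """Tools that accept events from the existing Sentry SDKs."""
    return [t.name for t in tools if t.sentry_sdk_compat]


def otel_backends(tools):
    """Tools marked as fully OTEL-native in the matrix."""
    return [t.name for t in tools if t.otel_native == "yes"]
```

No single tool appears in both lists, which is why the evaluated options that keep the Sentry SDKs pair an error tracker with a separate OTEL backend.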
Tool Deep Dives
SigNoz — Platform Analytics + APM
What: Open-source Datadog/New Relic alternative built natively on OpenTelemetry with ClickHouse storage.
Strengths:
- Backend already exports OTLP — SigNoz consumes traces with zero code changes
- Replaces separate Prometheus + Grafana stack with built-in PromQL
- Exception monitoring can partially replace Sentry for backend errors
- Single pane for logs + metrics + traces with signal correlation
- Helm chart for Kubernetes; single binary for Docker
Requirements: 4 CPU, 8GB RAM minimum. ClickHouse for storage.
Limitations: No session replay. No Sentry SDK compatibility. ClickHouse resource growth with data volume.
GlitchTip — Sentry Drop-in Replacement
What: Lightweight open-source error tracker that speaks the Sentry SDK protocol.
Strengths:
- Zero code changes — same Sentry SDKs, just swap the DSN
- 2 CPU / 2GB RAM (vs self-hosted Sentry's 16-32GB)
- v6 "All-in-One" mode: single container with web + worker
- Includes basic APM (performance transactions) and uptime monitoring
Requirements: Single container + PostgreSQL + Redis, or all-in-one mode.
Limitations: No session replay. No log management. No metrics/dashboards. Basic APM only.
Highlight.io — Full-Stack with Session Replay
What: Open-source monitoring platform with error tracking, logging, tracing, and session replay.
Strengths: Only OSS tool with session replay + errors + traces + logs. Built on OTEL + ClickHouse. Next.js SDK available.
Risks: Smaller community. Self-hosted docs less mature. Requires full SDK swap (not Sentry-compatible).
BugSink — Minimal Error Tracking
What: Ultra-lightweight Sentry-compatible error tracker.
Strengths: Sentry SDK compatible drop-in. Single Docker container. Minimal resources.
Limitations: Error tracking only — no APM, traces, metrics, logs, or replay. Very young project.
OpenReplay — Session Replay Specialist
What: Open-source session replay platform (LogRocket/FullStory alternative).
Strengths: Best OSS session replay. Kubernetes-native. Integrates with Sentry/Datadog as complement. Privacy controls.
Limitations: Replay is its primary feature — not a full observability platform. Adds maintenance surface.
Self-Hosted Sentry — Full Feature Parity
What: The actual Sentry product, self-hosted via Docker Compose.
Why not now: 16GB RAM minimum (32GB recommended). 12+ containers. Complex upgrades, I/O intensive. Missing AI features (Seer), spike protection. Maintenance burden acknowledged by Sentry.
Groundcover — eBPF Kubernetes APM
What: eBPF-based observability for Kubernetes with zero SDK instrumentation.
Strengths: No code changes needed. Captures traces, metrics, logs at kernel level. Data stays in your infrastructure. Exports OTEL data to SigNoz.
Pricing: Per-node (not per-event). Free tier available. Not fully open-source (commercial with free tier).
Limitations: Requires Kubernetes. No error tracking or session replay.
Recommendation
SigNoz (traces/metrics/logs/APM) + BugSink (error tracking via existing Sentry SDKs)
Why BugSink over GlitchTip
BugSink was chosen over GlitchTip for the pre-revenue stage (<50 orgs) because it:
- runs as a single container rather than 3 services
- reuses the existing Postgres instead of requiring a dedicated instance
- handles 1.5M events/day on 4GB RAM
- applies smart retention with no manual cleanup
- supports Sentry CLI and source map uploads for the Next.js frontend

GlitchTip remains a viable option if APM or uptime monitoring becomes necessary later.
Why This Combo
- Zero backend code changes: OTLP already exports to localhost:4317, and BugSink speaks the Sentry protocol
- Minimal frontend changes: swap NEXT_PUBLIC_SENTRY_DSN; replay is already conditional
- Minimal resource footprint: BugSink is a single container sharing the existing Postgres
- Kubernetes-ready: SigNoz has a Helm chart, BugSink is a single container, and Groundcover eBPF can be added later
- No vendor lock-in: telemetry uses the OTEL standard, and errors use the Sentry SDK protocol that several backends accept
- $0 software cost with near-zero incremental infrastructure cost
What You Lose
- Session replay (defer to OpenReplay when needed)
- Sentry AI features (Seer)
- APM/performance dashboards in error tracker (SigNoz covers this)
What You Gain
- Unlimited events with full data ownership
- Unified traces + metrics + logs in SigNoz with signal correlation
- Production-ready K8s path with eBPF auto-instrumentation
- Minimal operational surface for error tracking
Decision Matrix
| Option | Score | Rationale |
|---|---|---|
| SigNoz + BugSink | 9/10 | OTEL-native platform + minimal-ops error tracking, ideal for <50 orgs |
| SigNoz + GlitchTip | 7/10 | More features but heavier (3 services, dedicated Postgres, sparse docs) |
| Highlight.io alone | 7/10 | Has replay, but requires full SDK swap |
| Self-hosted Sentry | 5/10 | Feature parity but 16-32GB RAM is wasteful pre-revenue |
| SigNoz alone | 6/10 | Good APM but error tracking less polished |
Architecture
Phase 1: Docker Compose (Pre-Revenue)
Replace Sentry SaaS with free alternatives at minimal infrastructure cost.
┌───────────────────────────────────────────────────────┐
│                    Docker Compose                     │
│                                                       │
│  ┌──────────┐  OTLP gRPC   ┌───────────────────┐      │
│  │ FastAPI  │─────────────▶│ SigNoz            │      │
│  │ + Celery │              │ (OTEL Collector + │      │
│  │          │──/metrics──▶ │  ClickHouse +     │      │
│  └──────────┘              │  Query Service)   │      │
│       │                    └───────────────────┘      │
│       │ Sentry SDK (DSN swap)        ▲                │
│       ▼                              │                │
│  ┌──────────┐                        │                │
│  │ BugSink  │              ┌─────────┘                │
│  │ (errors) │              │ OTLP export              │
│  └──────────┘              │                          │
│       ▲                    │                          │
│       │ Sentry SDK         │                          │
│  ┌──────────┐              │                          │
│  │ Next.js  │──────────────┘                          │
│  │ frontend │                                         │
│  └──────────┘                                         │
└───────────────────────────────────────────────────────┘
Error Tracking: BugSink (Sentry SDK compatible, single container)
APM/Traces: SigNoz (consumes existing OTLP traces)
Metrics: SigNoz (PromQL, replaces Prometheus+Grafana)
Logs: SigNoz (JSON log ingestion)
Session Replay: DEFERRED
Alerting: SigNoz (migrate Prometheus alert rules)
Phase 2: Kubernetes (Production)
Add eBPF-based auto-instrumentation and scale the observability stack.
┌───────────────────────────────────────────────────────┐
│                  Kubernetes Cluster                   │
│                                                       │
│  ┌───────────────────────────────────────────┐        │
│  │  Groundcover eBPF Sensor (DaemonSet)      │        │
│  │  Auto-captures: traces, metrics, logs     │        │
│  └─────────────────┬─────────────────────────┘        │
│                    │ OTLP export                      │
│                    ▼                                  │
│  ┌──────────────────────────────────────┐             │
│  │  SigNoz (Helm)                       │             │
│  │  - OTEL Collector                    │             │
│  │  - ClickHouse                        │             │
│  │  - Query Service + Frontend          │             │
│  └──────────────────────────────────────┘             │
│                                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
│  │ FastAPI  │  │ Celery   │  │ Next.js  │             │
│  │ Pod      │  │ Workers  │  │ Pod      │             │
│  └──────────┘  └──────────┘  └──────────┘             │
│       │                           │                   │
│       └──── Sentry SDK (DSN) ─────┘                   │
│                     ▼                                 │
│            ┌──────────────────┐                       │
│            │ BugSink (single) │                       │
│            └──────────────────┘                       │
└───────────────────────────────────────────────────────┘
eBPF Layer: Groundcover (auto-instrumentation, free tier)
Error Tracking: BugSink (same Sentry SDKs, single container)
APM/Traces: SigNoz + Groundcover data
Metrics: SigNoz (PromQL + eBPF metrics)
Logs: SigNoz (centralized log aggregation)
Session Replay: OpenReplay OR Highlight.io (evaluate at this stage)
Migration Steps
Step 1: Add SigNoz to Docker Compose
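The backend already exports OTLP to localhost:4317. Add SigNoz as a compose service that publishes that port and traces flow immediately. If the backend itself runs as a Compose service, localhost resolves to its own container, so the exporter endpoint would need to point at the signoz service name instead.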
# Add to compose.yml
signoz:
  image: signoz/signoz:latest
  ports:
    - "8080:8080"  # SigNoz UI
    - "4317:4317"  # OTLP gRPC (already targeted by tracing.py)
    - "4318:4318"  # OTLP HTTP
  volumes:
    - signoz-data:/var/lib/signoz
Result
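Traces are visible in the SigNoz UI at http://localhost:8080 with zero code changes. Metrics and logs also need no code changes, but they likely require collector configuration first: a scrape job for the /metrics endpoint and a log pipeline for the JSON output.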
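Before looking for traces in the UI, it can help to confirm that the OTLP port is reachable at all. The sketch below is a plain TCP connect check, not a SigNoz API call, and `otlp_port_open` is a hypothetical helper. A successful connection only shows that something is listening on the port.

```python
import socket


def otlp_port_open(host: str = "localhost", port: int = 4317, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if otlp_port_open() else "not reachable"
    print(f"OTLP gRPC port 4317 is {state}")
```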
Step 2: Add BugSink to Docker Compose
Single container, reuses the existing Postgres (with a bugsink database created via init script).
bugsink:
  image: bugsink/bugsink:latest
  ports:
    - "8800:8000"
  environment:
    DATABASE_URL: "postgresql://postgres:postgres@postgres:5432/bugsink"
    SECRET_KEY: ${BUGSINK_SECRET_KEY:?BUGSINK_SECRET_KEY must be set}
    BASE_URL: http://localhost:8800
    PORT: "8000"
    CREATE_SUPERUSER: "admin@example.com:changeme"  # first run only
  depends_on:
    postgres:
      condition: service_healthy
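The compose snippet above refuses to start unless BUGSINK_SECRET_KEY is set. One way to produce a suitable value is sketched below with the standard library. The 50-byte length is an arbitrary choice for illustration, not a documented BugSink requirement, and `generate_secret_key` is a hypothetical helper.

```python
import secrets


def generate_secret_key(num_bytes: int = 50) -> str:
    """Return a URL-safe random string built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env or export it in the shell.
    print(f"BUGSINK_SECRET_KEY={generate_secret_key()}")
```

Keeping the value in an untracked .env file or a secret manager keeps it out of compose.yml and version control.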
Step 3: Swap Sentry DSN
One environment variable change per service — no code changes.
# Backend: before
SENTRY_DSN=https://xxx@o123.ingest.sentry.io/456
# Backend: after (plain http, matching BASE_URL in Step 2)
SENTRY_DSN=http://key@localhost:8800/1
# Frontend: before
NEXT_PUBLIC_SENTRY_DSN=https://xxx@o123.ingest.sentry.io/789
# Frontend: after
NEXT_PUBLIC_SENTRY_DSN=http://key@localhost:8800/2
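The DSN swap works because a Sentry-style DSN only encodes where to send events: scheme, public key, host, and project ID. The sketch below splits a DSN into those parts and derives the envelope endpoint that Sentry SDKs post to. `Dsn` and `parse_dsn` are hypothetical helpers, the example assumes BugSink is served over plain HTTP as the BASE_URL in Step 2 suggests, and DSNs with an extra path prefix are not handled.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Dsn:
    scheme: str
    public_key: str
    host: str  # hostname plus ":port" when a port is present
    project_id: str

    @property
    def envelope_url(self) -> str:
        """Endpoint that Sentry SDKs post event envelopes to."""
        return f"{self.scheme}://{self.host}/api/{self.project_id}/envelope/"


def parse_dsn(dsn: str) -> Dsn:
    """Split a DSN of the form scheme://public_key@host[:port]/project_id."""
    parts = urlsplit(dsn)
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not (parts.scheme and parts.username and parts.hostname and project_id):
        raise ValueError(f"not a valid Sentry-style DSN: {dsn!r}")
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return Dsn(parts.scheme, parts.username, host, project_id)
```

For example, `parse_dsn("http://key@localhost:8800/1").envelope_url` evaluates to `http://localhost:8800/api/1/envelope/`, so only the host and project portion changes when moving away from the SaaS ingest domain.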
Step 4: Migrate Prometheus Alert Rules
SigNoz supports PromQL. Migrate alerts/rules.yml into SigNoz's alerting configuration.
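A reasonable first step is to inventory the existing alerts and their PromQL expressions before recreating them. The sketch below pulls alert names and single-line `expr` values out of a Prometheus rules file with regular expressions. It is a quick inventory aid rather than a YAML parser, it does not talk to SigNoz, and the sample rules and metric names are hypothetical, not the contents of alerts/rules.yml.

```python
import re

# Match "- alert: Name" and "expr: ..." lines; multi-line exprs are not supported.
ALERT_RE = re.compile(r"^[ \t]*-[ \t]*alert:[ \t]*(?P<name>\S+)[ \t]*$", re.MULTILINE)
EXPR_RE = re.compile(r"^[ \t]*expr:[ \t]*(?P<expr>.+?)[ \t]*$", re.MULTILINE)

# Hypothetical sample in Prometheus rule-file format.
SAMPLE_RULES = """\
groups:
  - name: api
    rules:
      - alert: ApiDown
        expr: up{job="api"} == 0
        for: 5m
      - alert: CeleryQueueBacklog
        expr: celery_queue_length > 1000
"""


def list_alerts(rules_text: str) -> list[tuple[str, str]]:
    """Pair each alert name with its expr, assuming they appear in the same order."""
    names = [m.group("name") for m in ALERT_RE.finditer(rules_text)]
    exprs = [m.group("expr") for m in EXPR_RE.finditer(rules_text)]
    return list(zip(names, exprs))


if __name__ == "__main__":
    for name, expr in list_alerts(SAMPLE_RULES):
        print(f"{name}: {expr}")
```

Each pair can then be checked against the metrics SigNoz actually receives before the rule is recreated there.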
Step 5: Remove Sentry-Specific Features
- Session Replay: Controlled via the NEXT_PUBLIC_SENTRY_REPLAY_ENABLED env var (defaults to false for BugSink compatibility)
- Profiles: Already disabled (SENTRY_PROFILES_RATE=0.0)
Step 6 (K8s): Deploy via Helm
# SigNoz
helm repo add signoz https://charts.signoz.io
helm install signoz signoz/signoz -n observability --create-namespace
# BugSink: single container, point DATABASE_URL at your cluster Postgres
kubectl run bugsink --image=bugsink/bugsink:latest \
  --env="DATABASE_URL=postgresql://..." \
  --env="SECRET_KEY=..." \
  --port=8000 -n observability
Step 7 (K8s): Add Groundcover eBPF Sensor
helm repo add groundcover https://helm.groundcover.com
helm install groundcover groundcover/groundcover \
  --set global.clusterName=dev-health-prod \
  -n groundcover --create-namespace
Cost Analysis
Current: Sentry SaaS
| Plan | Limits | Monthly Cost |
|---|---|---|
| Developer (free) | 5K errors, 10K transactions, 50 replays, 1 user | $0 |
| Team | 50K errors, 100K transactions | $26+ |
| Business | Scales with usage | $80+ |
Proposed: Self-Hosted
| Tool | Infrastructure | Software | Events |
|---|---|---|---|
| SigNoz | ~$20-40/mo (VPS meeting the 4 CPU / 8GB RAM minimum, or shared K8s) | $0 | Unlimited |
| BugSink | ~$0/mo (shares existing Postgres) | $0 | Unlimited |
| Groundcover (K8s) | In-cluster | $0 (free tier) | Per-node |
| Total | ~$20-40/mo | $0 | Unlimited |
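To put the two tables side by side, the sketch below annualizes the self-hosted range and compares it with the Team plan's entry price. It treats the "$26+" lower bound as exact and ignores engineering time, so it is only a rough bound on the difference. The constant and function names are hypothetical.

```python
# Figures copied from the cost tables; "+" lower bounds are treated as exact here.
SENTRY_TEAM_MONTHLY = 26
SELF_HOSTED_MONTHLY_RANGE = (20, 40)


def annual(monthly_usd: int) -> int:
    """Convert a monthly cost in USD to a yearly cost."""
    return monthly_usd * 12


def annual_delta_vs_team(self_hosted_range=SELF_HOSTED_MONTHLY_RANGE):
    """Return (low, high) yearly difference: self-hosted minus Team plan."""
    low, high = self_hosted_range
    team = annual(SENTRY_TEAM_MONTHLY)
    return annual(low) - team, annual(high) - team


if __name__ == "__main__":
    low, high = annual_delta_vs_team()
    print(f"Self-hosted vs Team plan, per year: {low:+d} to {high:+d} USD")
```

Under these assumptions the difference ranges from $72 less to $168 more per year, so the comparison turns mostly on event volume: the Team plan's listed limits are 50K errors and 100K transactions, while the self-hosted stack is uncapped.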
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| BugSink missing Sentry SDK features | Medium | Low | Core error tracking works; unsupported features (replays, APM) silently ignored |
| SigNoz ClickHouse resource growth | Medium | Medium | Set retention policies; app already runs ClickHouse |
| Maintenance burden of self-hosting | Low | Low | BugSink is single container; SigNoz has Helm chart |
| No session replay | High | Low | Acceptable pre-revenue; add OpenReplay when needed |
| Groundcover free tier limits | Low | Low | Per-node pricing, generous for small clusters |
References
- SigNoz Documentation
- SigNoz Kubernetes Install
- BugSink Documentation
- BugSink: GlitchTip vs Sentry vs BugSink
- GlitchTip (evaluated, deferred — heavier than BugSink for current scale)
- Highlight.io
- OpenReplay
- Self-Hosted Sentry
- Groundcover
- Groundcover Pricing