Distributed tracing using OpenTelemetry and ClickHouse

Overview

Uptrace is a distributed tracing system that uses OpenTelemetry to collect data and the ClickHouse database to store it. ClickHouse is the only dependency.

Features:

  • OpenTelemetry protocol via gRPC (:14317) and HTTP (:14318)
  • Span/Trace grouping
  • SQL-like query language
  • Percentiles
  • Systems dashboard
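
The SQL-like query language pipes a group-by through aggregations; for example, a query of the kind the UI issues for the groups view looks like this (the p50/p90/p99 functions compute duration percentiles):

```
group by span.group_id | span.count_per_min | span.error_pct | p50(span.duration) | p90(span.duration) | p99(span.duration)
```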

Roadmap:

  • Errors/logs support
  • More dashboards for services and hosts
  • ClickHouse cluster support
  • TLS support
  • Improved SQL support using CockroachDB SQL parser

Getting started

  • The Docker example lets you run Uptrace with a single command.
  • The installation guide provides pre-compiled binaries for Linux, macOS, and Windows.

Running Uptrace locally

To run Uptrace locally, you need Go 1.18 and ClickHouse.

Step 1. Create the uptrace ClickHouse database:

clickhouse-client -q "CREATE DATABASE uptrace"

Step 2. Reset ClickHouse database schema:

go run cmd/uptrace/main.go ch reset

Step 3. Start Uptrace:

go run cmd/uptrace/main.go serve

Step 4. Open Uptrace UI at http://localhost:14318

Uptrace monitors itself using the uptrace-go OpenTelemetry distro. To get some test data, just reload the UI a few times.

Running UI locally

You can also start the UI locally:

cd vue
pnpm install
pnpm serve

And open http://localhost:19876

Issues
  • feat: add missing Tempo APIs

    This is how I test/develop it. From the root directory:

    docker-compose up -d
    
    # make sure everything is up
    docker-compose ps
    
    # start uptrace without a container
    DEBUG=2 go run cmd/uptrace/main.go serve
    
    # open grafana at http://localhost:3000 and choose the Uptrace datasource
    

    That is a development workflow that does NOT run Uptrace in a container so we can quickly reload it. We probably also need a separate docker-compose example that does that.

    opened by vmihailenco 5
  • Feat/spans json response

    This PR adds a new endpoint like this:

    curl -v http://localhost:14318/api/traces/0b0462c3-4698-ff90-dbc6-6870ced6775b/json
    
    {
       "resourceSpans":[
          {
             "instrumentationLibrarySpans":[
                {
                   "spans":[
                      {
                         "traceId":"CwRiw0aY/5DbxmhwztZ3Ww==",
                         "spanId":"JdNs9MF0KeQ=",
                         "parentSpanId":"AAAAAAAAAAA=",
                         "name":"GET /*path",
                         "kind":"SPAN_KIND_SERVER",
                         "startTimeUnixNano":"1651733829634968350",
                         "endTimeUnixNano":"1651733829783441370",
                         "attributes":[
                            {
                               "key":"http.flavor",
                               "value":{
                                  "stringValue":"1.1"
                               }
                            },
                            {
                               "key":"net.host.name",
                               "value":{
                                  "stringValue":"localhost"
                               }
                            },
                            {
                               "key":"http.wrote_bytes",
                               "value":{
                                  "intValue":"2680172"
                               }
                            },
                            {
                               "key":"telemetry.sdk.name",
                               "value":{
                                  "stringValue":"opentelemetry"
                               }
                            },
                            {
                               "key":"http.user_agent.version",
                               "value":{
                                  "stringValue":"101.0.4951.41"
                               }
                            },
                            {
                               "key":"http.route",
                               "value":{
                                  "stringValue":"/*path"
                               }
                            },
                            {
                               "key":"service.name",
                               "value":{
                                  "stringValue":"serve"
                               }
                            },
                            {
                               "key":"http.user_agent.name",
                               "value":{
                                  "stringValue":"Chrome"
                               }
                            },
                            {
                               "key":"telemetry.sdk.language",
                               "value":{
                                  "stringValue":"go"
                               }
                            },
                            {
                               "key":"http.target",
                               "value":{
                                  "stringValue":"/js/chunk-vendors.76f0d740.js.map"
                               }
                            },
                            {
                               "key":"http.user_agent.os",
                               "value":{
                                  "stringValue":"Linux"
                               }
                            },
                            {
                               "key":"http.route.param.path",
                               "value":{
                                  "stringValue":"js/chunk-vendors.76f0d740.js.map"
                               }
                            },
                            {
                               "key":"http.user_agent",
                               "value":{
                                  "stringValue":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36"
                               }
                            },
                            {
                               "key":"host.name",
                               "value":{
                                  "stringValue":"vmihailenco"
                               }
                            },
                            {
                               "key":"net.peer.ip",
                               "value":{
                                  "stringValue":"::1"
                               }
                            },
                            {
                               "key":"net.host.port",
                               "value":{
                                  "intValue":"14318"
                               }
                            },
                            {
                               "key":"otel.library.version",
                               "value":{
                                  "stringValue":"semver:0.31.0"
                               }
                            },
                            {
                               "key":"http.method",
                               "value":{
                                  "stringValue":"GET"
                               }
                            },
                            {
                               "key":"http.host",
                               "value":{
                                  "stringValue":"localhost:14318"
                               }
                            },
                            {
                               "key":"http.scheme",
                               "value":{
                                  "stringValue":"http"
                               }
                            },
                            {
                               "key":"net.transport",
                               "value":{
                                  "stringValue":"ip_tcp"
                               }
                            },
                            {
                               "key":"net.peer.port",
                               "value":{
                                  "intValue":"46672"
                               }
                            },
                            {
                               "key":"http.user_agent.os_version",
                               "value":{
                                  "stringValue":"x86_64"
                               }
                            },
                            {
                               "key":"http.client_ip",
                               "value":{
                                  "stringValue":"::1"
                               }
                            },
                            {
                               "key":"otel.library.name",
                               "value":{
                                  "stringValue":"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
                               }
                            },
                            {
                               "key":"telemetry.sdk.version",
                               "value":{
                                  "stringValue":"1.6.3"
                               }
                            },
                            {
                               "key":"http.status_code",
                               "value":{
                                  "intValue":"200"
                               }
                            }
                         ],
                         "status":{
                            "code":"STATUS_CODE_OK"
                         }
                      }
                   ]
                }
             ]
          }
       ]
    }
    

    @lmangani I am using the official protobuf bindings for Go from go.opentelemetry.io/proto/otlp/trace/v1 and it seems to work fine, but the example you provided also contains some fields that are not part of OTLP, for example:

          "instrumentationLibrarySpans": [
            {
              "instrumentationLibrary": {},
              "spans": [
                {
                  "traceID": "d6e9329d67b6146b",
                  "spanID": "1234",
                  "name": "span from bash!",
                  "references": [],
                  "startTime": 1651401486889077,
                  "startTimeUnixNano": 1651401486889077000,
                  "endTimeUnixNano": 1651401486989077000,
    
    // OTLP DOES NOT HAVE THESE FIELDS START
    
                  "duration": 100000,
                  "tags": [
                    {
                      "key": "http.method",
                      "value": "GET",
                      "type": "string"
                    },
                    {
                      "key": "http.path",
                      "value": "/api",
                      "type": "string"
                    }
                  ],
                  "logs": [],
                  "processID": "p1",
                  "warnings": null,
                  "localEndpoint": {
                    "serviceName": "shell script"
                  },
    
    // OTLP DOES NOT HAVE THESE FIELDS END
    
                  "traceId": "AAAAAAAAAADW6TKdZ7YUaw==",
                  "spanId": "AAAAAAAAEjQ="
                }
              ]
            }
          ]
    
    opened by vmihailenco 4
  • Clickhouse request errors

    Good day!

    I've set up Uptrace via Docker and got errors in the UI:

    *ch.Error: DB::Exception: Unknown function toFloat64OrDefault. Maybe you meant: ['toFloat64OrNull','dictGetFloat64OrDefault']: While processing toFloat64OrDefault(span.duration)

    Uptrace logs:

    [bunrouter]  09:45:30.094   500     16.918ms   GET      /api/tracing/groups?time_gte=2021-12-29T08:46:00.000Z&time_lt=2021-12-29T09:46:00.000Z&query=group+by+span.group_id+%7C+span.count_per_min+%7C+span.error_pct+%7C+p50(span.duration)+%7C+p90(span.duration)+%7C+p99(span.duration)&system=http:unknown_service          *ch.Error: DB::Exception: Unknown function toFloat64OrDefault. Maybe you meant: ['toFloat64OrNull','dictGetFloat64OrDefault']: While processing toFloat64OrDefault(`span.duration`)
    
    [ch]  09:45:30.696   SELECT               68.642ms  SELECT count() / 60 AS "span.count_per_min", countIf(`span.status_code` = 'error') / count() AS "span.error_pct", quantileTDigest(0.5)(toFloat64OrDefault(s."span.duration")) AS "p50(span.duration)", quantileTDigest(0.9)(toFloat64OrDefault(s."span.duration")) AS "p90(span.duration)", quantileTDigest(0.99)(toFloat64OrDefault(s."span.duration")) AS "p99(span.duration)", s."span.group_id" AS "span.group_id", any(s."span.system") AS "span.system", any(s."span.name") AS "span.name" FROM "spans_index_buffer" AS "s" WHERE (s.`span.time` >= '2021-12-29 08:46:00') AND (s.`span.time` < '2021-12-29 09:46:00') AND (s.`span.system` = 'http:unknown_service') GROUP BY "span.group_id" LIMIT 1000       *ch.Error: DB::Exception: Unknown function toFloat64OrDefault. Maybe you meant: ['toFloat64OrNull','dictGetFloat64OrDefault']: While processing toFloat64OrDefault(`span.duration`) 
    
    [bunrouter]  09:45:30.607   500    108.234ms   GET      /api/tracing/groups?time_gte=2021-12-29T08:46:00.000Z&time_lt=2021-12-29T09:46:00.000Z&query=group+by+span.group_id+%7C+span.count_per_min+%7C+span.error_pct+%7C+p50(span.duration)+%7C+p90(span.duration)+%7C+p99(span.duration)&system=http:unknown_service          *ch.Error: DB::Exception: Unknown function toFloat64OrDefault. Maybe you meant: ['toFloat64OrNull','dictGetFloat64OrDefault']: While processing toFloat64OrDefault(`span.duration`)
    

    ClickHouse version: altinity/clickhouse-server:21.8.12.1.testingarm (because I have a MacBook with an M1 chip)

    opened by like-a-freedom 4
  • Error sorting by wrong column span.count_per_min

    http://oteldev-01.moncc.net:14318/explore/13385337/spans?time_dur=3600&query=group%20by%20span.group_id%20%7C%20span.count_per_min%20%7C%20span.error_pct%20%7C%20%7Bp50,p90,p99%7D%28span.duration%29%20%7C%20where%20span.group_id%20%3D%20%228962033991366417035%22&system=http%3Aotel-ui-dev&sort_by=span.count_per_min&sort_dir=desc

    *ch.Error: DB::Exception: Missing columns: 'span.count_per_min' while processing query: 'SELECT `span.id`, `span.trace_id` FROM spans_index_buffer AS s WHERE (project_id = 13385337) AND (`span.time` >= toDateTime('2022-04-15 20:29:00', 'UTC')) AND (`span.time` < toDateTime('2022-04-15 21:29:00', 'UTC')) AND (`span.system` = 'http:otel-ui-dev') AND (`span.group_id` = '8962033991366417035') ORDER BY `span.count_per_min` DESC LIMIT 10', required columns: 'span.id' 'span.trace_id' 'project_id' 'span.time' 'span.system' 'span.count_per_min' 'span.group_id', maybe you meant: ['span.id','span.trace_id','project_id','span.time','span.system','span.group_id']

    bug 
    opened by dixanms 3
  • [question] is the ingestion to clickhouse buffered?

    I mean, ClickHouse can only handle about 100 batch inserts per second before it returns code: 252, message: Too many parts (300). Merges are processing significantly slower than inserts. The question is: are the inserts buffered/batched?

    opened by kokizzu 2
  • In example, docker-compose up failed

    docker-uptrace-1 | *ch.Error: DB::Exception: Missing columns: 'time' while processing query: 'project_id, span.system, span.group_id, time', required columns: 'project_id' 'span.system' 'time' 'span.group_id' 'project_id' 'span.system' 'time' 'span.group_id'
    docker-uptrace-1 | [ch] 09:03:18.366 ALTER 5.539ms ALTER TABLE ch_migration_locks DROP COLUMN col1
    docker-uptrace-1 | 2022/03/23 09:03:18 DB::Exception: Missing columns: 'time' while processing query: 'project_id, span.system, span.group_id, time', required columns: 'project_id' 'span.system' 'time' 'span.group_id' 'project_id' 'span.system' 'time' 'span.group_id'

    bug 
    opened by puniey 1
  • unexpected filter filtering spans by duration using pre-defined menu

    When filtering spans by duration using the pre-defined menu, I receive an error:

    unexpected ">" in "where span.duration ><-= 500ms"

    Manually removing the = in the filter produces results:

    where span.duration > 500ms

    bug 
    opened by jkrech17 1
  • Issue in traefik routing to OTLP/HTTP port

    I am running Uptrace in Docker and use Traefik for routing the OTLP/HTTP requests. It works when accessing the UI, and since the UI and OTLP/HTTP share the same port, OTLP/HTTP should work too. But when sending trace data to OTLP/HTTP via Traefik, the POST requests never reach Uptrace.

    But if we use the direct port, we can see the POST requests in the Docker logs like below.

    
    [bunrouter]  07:04:03.145   200        188µs   POST     /v1/traces
    [bunrouter]  07:04:03.149   200        177µs   POST     /v1/traces
    

    Thanks.

    opened by rinshadka 1
  • fix panic when there is no otel name/version

    When otel.library.name and otel.library.version are not provided, Uptrace crashes. opentelemetry-rust doesn't send them over gRPC, so it crashes Uptrace.

    read the docs at            https://docs.uptrace.dev/guide/os.html#otlp
    OTLP/gRPC (listen.grpc)     http://localhost:14317
    OTLP/HTTP (listen.http)     http://localhost:14318
    UI (listen.http)            http://localhost:14318/
    
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    	panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x16bc329]
    
    goroutine 72 [running]:
    go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End.func1()
    	go.opentelemetry.io/otel/[email protected]/trace/span.go:243 +0x2a
    go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc00023a480, {0x0, 0x0, 0x120?})
    	go.opentelemetry.io/otel/[email protected]/trace/span.go:282 +0x8a2
    panic({0x179c9c0, 0x2ac8490})
    	runtime/panic.go:838 +0x207
    github.com/uptrace/uptrace/pkg/tracing.(*TraceServiceServer).process(0xc000098000, {0xc00071c1a8, 0x1, 0x17b0940?})
    	github.com/uptrace/uptrace/pkg/tracing/otlp_grpc.go:70 +0x129
    github.com/uptrace/uptrace/pkg/tracing.(*TraceServiceServer).Export(0x2?, {0x25e39a8?, 0xc0003fa870?}, 0xc000716100)
    	github.com/uptrace/uptrace/pkg/tracing/otlp_grpc.go:60 +0x6d
    go.opentelemetry.io/proto/otlp/collector/trace/v1._TraceService_Export_Handler.func1({0x25e39a8, 0xc0003fa870}, {0x17ed860?, 0xc000716100})
    	go.opentelemetry.io/proto/[email protected]/collector/trace/v1/trace_service_grpc.pb.go:85 +0x78
    go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1({0x25e39a8, 0xc0007120c0}, {0x17ed860, 0xc000716100}, 0xc0001fe6e0, 0xc000554900)
    	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:325 +0x664
    go.opentelemetry.io/proto/otlp/collector/trace/v1._TraceService_Export_Handler({0x1870a40?, 0xc000098000}, {0x25e39a8, 0xc0007120c0}, 0xc000100360, 0xc0001ff640)
    	go.opentelemetry.io/proto/[email protected]/collector/trace/v1/trace_service_grpc.pb.go:87 +0x138
    google.golang.org/grpc.(*Server).processUnaryRPC(0xc00043afc0, {0x25e62f8, 0xc000112180}, 0xc000128000, 0xc0000a4060, 0x2ac9de0, 0x0)
    	google.golang.org/[email protected]/server.go:1282 +0xccf
    google.golang.org/grpc.(*Server).handleStream(0xc00043afc0, {0x25e62f8, 0xc000112180}, 0xc000128000, 0x0)
    	google.golang.org/[email protected]/server.go:1616 +0xa1b
    google.golang.org/grpc.(*Server).serveStreams.func1.2()
    	google.golang.org/[email protected]/server.go:921 +0x98
    created by google.golang.org/grpc.(*Server).serveStreams.func1
    	google.golang.org/[email protected]/server.go:919 +0x28a
    
    opened by altanozlu 1
  • feat: read cLoki samples from the DB

    In this PR, Uptrace reads logs directly from cLoki database and displays them when users view traces. To do that, Uptrace filters log messages that contain the current trace id.

    To generate logs with trace ids, I've used the example based on Vector that exports logs to cLoki. The readme should explain how to run it.

    cLoki UI (screenshot)

    cLoki samples shown in Uptrace UI (screenshot)

    opened by vmihailenco 0
  • read project list from a dynamic source (feature request)

    Feature request: Read the project list periodically from a dynamic source such as a URL, or from the output of a script.

    Currently, reconfiguration and a service restart are required, which is not ideal for complex environments.

    enhancement 
    opened by dixanms 1
  • Feature: Save Filters / Configurable Default Landing Page

    It could be beneficial to make the main summary view in Uptrace configurable by a particular attribute. Right now there are Systems, Services, and Hosts, but say you had an attribute of app.name: it would be useful to add additional views based on attributes users have set, to group by application, etc.

    We may look at how this could be implemented and could look to contribute, but thought I would submit an issue as well.

    Thanks!

    opened by jkrech17 2
Releases: v0.2.15
Owner: Uptrace — All-in-one tool to optimize performance and monitor errors & logs