Jaeger ClickHouse storage plugin implementation

Overview

Jaeger ClickHouse

This is an implementation of Jaeger's storage plugin for ClickHouse. See also jaegertracing/jaeger/issues/1438 for the historical discussion of the ClickHouse plugin.

Note that this project is community maintained. If it is out of date or missing a feature, please open an issue or submit a pull request.

Documentation

Refer to the config.yaml for all supported configuration options.

Build & Run

Docker database example

docker run --rm -it -p9000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server:21
GOOS=linux make build run
make run-hotrod

Open localhost:16686 and localhost:8080.

Custom database

Specify the connection options in the config.yaml file, then run:

make build
SPAN_STORAGE_TYPE=grpc-plugin {Jaeger binary address} --query.ui-config=jaeger-ui.json --grpc-storage-plugin.binary=./{name of built binary} --grpc-storage-plugin.configuration-file=config.yaml --grpc-storage-plugin.log-level=debug

Credits

This project is based on https://github.com/bobrik/jaeger/tree/ivan/clickhouse/plugin/storage/clickhouse.

Issues
  • Explanation of '{cluster}'

    Our replication and sharding guide (https://github.com/pavolloffay/jaeger-clickhouse/blob/main/guide-sharding-and-replication.md#replication) uses the '{cluster}' substitution when creating a distributed table, e.g.

    CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' AS jaeger_spans_local ENGINE = Distributed('{cluster}', default, jaeger_spans_local, cityHash64(traceID));
    

    I am not sure I understand exactly what it does. Could somebody explain it? @EinKrebs @chhetripradeep

    Let's say my CH deployment defines two clusters:

    <remote_servers>
        <example_cluster1>
           ...
        </example_cluster1>
        <example_cluster2>
           ...
        </example_cluster2>
    </remote_servers>
    

    So if the CREATE command is executed, would it create tables on all clusters?

    opened by pavolloffay 13
  • TLS support, connection options & some refactoring

    • Added database name, username, and password options for connecting to the database;
    • enabled connections using TLS;
    • did some code refactoring;
    • updated the README.

    Could you please review the code and suggest some ideas for the README content?

    Resolves #18

    opened by EinKrebs 9
  • Document and add support for deleting data/TTL

    We should document how old data can be removed (alter table jaeger_spans drop partition 20201) and add support for TTL (https://clickhouse.tech/docs/en/sql-reference/statements/alter/ttl/). The user could specify the number of days in the config.

    E.g.

    CREATE TABLE IF NOT EXISTS jaeger_index_local (
         timestamp DateTime CODEC(Delta, ZSTD(1)),
         traceID String CODEC(ZSTD(1)),
         service LowCardinality(String) CODEC(ZSTD(1)),
         operation LowCardinality(String) CODEC(ZSTD(1)),
         durationUs UInt64 CODEC(ZSTD(1)),
         tags Array(String) CODEC(ZSTD(1)),
         INDEX idx_tags tags TYPE bloom_filter(0.01) GRANULARITY 64,
         INDEX idx_duration durationUs TYPE minmax GRANULARITY 1
    ) ENGINE MergeTree()
    PARTITION BY toDate(timestamp)
    ORDER BY (service, -toUnixTimestamp(timestamp))
    TTL timestamp + INTERVAL 90 DAY
    SETTINGS index_granularity=1024
    

    cc) @chhetripradeep could you please weigh in and document how you delete old data?
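
A minimal sketch in Go of how a configured number of days could be turned into the TTL clause from the example table definition. The helper name `ttlClause` and its parameters are hypothetical and not part of the plugin's configuration; only the clause format is taken from the DDL in this issue.

```go
package main

import "fmt"

// ttlClause is a hypothetical helper: it renders a TTL clause in the same
// form as the example DDL (TTL timestamp + INTERVAL 90 DAY) from a retention
// period in days. Zero or a negative value is treated as "no TTL" and
// produces an empty string, so the clause can simply be omitted.
func ttlClause(timestampColumn string, days int) string {
	if days <= 0 {
		return ""
	}
	return fmt.Sprintf("TTL %s + INTERVAL %d DAY", timestampColumn, days)
}

func main() {
	// The value 90 mirrors the example above; in practice it would come
	// from the config file.
	fmt.Println(ttlClause("timestamp", 90))
}
```

Treating a non-positive value as "no TTL" keeps the current behavior as the default if the option is never set.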

    opened by pavolloffay 8
  • Durable database writes

    Hi! Thanks for the project, I believe it's of great value to the community.

    Currently, this plugin accumulates data and writes it to the database. I think it's important to do several things to ensure more durable writes:

    1. Retry network and database failures. Use exponential backoff when the database cannot serve writes immediately.
    2. Buffer data not yet written to the DB. Ensure that the buffer does not overflow. Sacrifice data intentionally if it cannot be stored in the DB.
    3. Reload the connection string when requested: a user can add new shards to the CH installation.

    What do you think?

    opened by x4m 6
  • Make replicated deployment work without user explicitly creating tables

    The replication guide (https://github.com/pavolloffay/jaeger-clickhouse/blob/main/guide-sharding-and-replication.md#replication) requires users to run SQL scripts on one node (because we use ON CLUSTER).

    We could add a new config option replication: true that would indicate that replication is enabled. The plugin would then:

    • use ON CLUSTER
    • use replicated merge trees in local tables
    • create global tables
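
As a rough Go sketch of the last point, the plugin could generate the global (distributed) table statements itself when the option is set. The function names and the `replication` parameter are hypothetical; the statement text is the jaeger_spans example quoted from the sharding and replication guide, and the cityHash64(traceID) sharding key is assumed to suit every table passed in.

```go
package main

import "fmt"

// distributedTableDDL (hypothetical) renders the distributed-table statement
// in the form quoted from the guide. '{cluster}' is passed through unchanged
// so that ClickHouse performs the substitution.
func distributedTableDDL(database, table string) string {
	local := table + "_local"
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS %s ON CLUSTER '{cluster}' AS %s ENGINE = Distributed('{cluster}', %s, %s, cityHash64(traceID));",
		table, local, database, local,
	)
}

// globalTableStatements (hypothetical) returns the statements to run for the
// given tables, or nothing when replication is disabled.
func globalTableStatements(replication bool, database string, tables []string) []string {
	if !replication {
		return nil
	}
	stmts := make([]string, 0, len(tables))
	for _, t := range tables {
		stmts = append(stmts, distributedTableDDL(database, t))
	}
	return stmts
}

func main() {
	for _, s := range globalTableStatements(true, "default", []string{"jaeger_spans"}) {
		fmt.Println(s)
	}
}
```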

    cc) @EinKrebs is this something that interests you?

    opened by pavolloffay 6
  • Looking for maintainers

    This project does not seem to have an active maintainer. There are a couple of open PRs from @nickbp and @bocharovf. Would either of you be willing to join the project and maintain it?

    cc) @EinKrebs

    opened by pavolloffay 5
  • Expose metrics

    Closes https://github.com/pavolloffay/jaeger-clickhouse/issues/19

    Adds metrics for batch size and flush interval, along with their counts, in the Prometheus exposition format.

    Signed-off-by: Pradeep Chhetri

    opened by chhetripradeep 5
  • Running with hotrod results in Too many simultaneous queries. Maximum: 100

    2021.07.14 17:06:49.783711 [ 219 ] {11925d3b-7684-4919-827b-319af811c400} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
    2021.07.14 17:06:49.783769 [ 1010 ] {d4de6e5e-6305-4802-842e-13c660886ef2} <Error> TCPHandler: Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100, Stack trace:
    
    0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8d31b5a in /usr/bin/clickhouse
    1. DB::ProcessList::insert(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::IAST const*, std::__1::shared_ptr<DB::Context const>) @ 0xfcd6802 in /usr/bin/clickhouse
    2. ? @ 0xfe21ab3 in /usr/bin/clickhouse
    3. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, bool) @ 0xfe208e3 in /usr/bin/clickhouse
    4. DB::TCPHandler::runImpl() @ 0x1069f6c2 in /usr/bin/clickhouse
    5. DB::TCPHandler::run() @ 0x106b25d9 in /usr/bin/clickhouse
    6. Poco::Net::TCPServerConnection::start() @ 0x1338b30f in /usr/bin/clickhouse
    7. Poco::Net::TCPServerDispatcher::run() @ 0x1338cd9a in /usr/bin/clickhouse
    8. Poco::PooledThread::run() @ 0x134bfc19 in /usr/bin/clickhouse
    9. Poco::ThreadImpl::runnableEntry(void*) @ 0x134bbeaa in /usr/bin/clickhouse
    10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
    11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    

    The DB is started as docker run --rm -it -p9000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server:21

    bug 
    opened by pavolloffay 5
  • Add Operation.SpanKind support

    Requirement - what kind of business use case are you trying to solve?

    I ran the Jaeger grpc-plugin integration tests with this plugin and they failed.

    Problem - what in Jaeger blocks you from solving the requirement?

    The integration tests failed because this plugin doesn't support jaeger/spanstore.Operation.SpanKind.

    opened by EinKrebs 4
  • Fix development env for os other than linux

    The Docker container expects Linux binaries. We always look up GOOS and GOARCH, which are different on macOS, so the run fails with /data/jaeger-clickhouse-darwin-amd64: exec format error

    On OSX:

    ❯ make run
    docker run --rm --name jaeger -e JAEGER_DISABLED=true --link some-clickhouse-server -it -u 502 -p16686:16686 -p14250:14250 -p14268:14268 -p6831:6831/udp -v "/Users/pradeep/gh/jaeger-clickhouse:/data" -e SPAN_STORAGE_TYPE=grpc-plugin jaegertracing/all-in-one:1.24.0 --query.ui-config=/data/jaeger-ui.json --grpc-storage-plugin.binary=/data/jaeger-clickhouse-darwin-amd64 --grpc-storage-plugin.configuration-file=/data/config.yaml --grpc-storage-plugin.log-level=debug
    2021/07/17 04:44:26 maxprocs: Leaving GOMAXPROCS=6: CPU quota undefined
    {"level":"info","ts":1626497066.4456441,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
    {"level":"info","ts":1626497066.445714,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
    {"level":"info","ts":1626497066.4459236,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
    {"level":"info","ts":1626497066.445999,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14269"}
    {"level":"info","ts":1626497066.446192,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
    2021-07-17T04:44:26.446Z [DEBUG] starting plugin: path=/data/jaeger-clickhouse-darwin-amd64 args=["/data/jaeger-clickhouse-darwin-amd64", "--config", "/data/config.yaml"]
    {"level":"fatal","ts":1626497066.4515028,"caller":"command-line-arguments/main.go:103","msg":"Failed to init storage factory","error":"grpc-plugin builder failed to create a store: error attempting to connect to plugin rpc client: fork/exec /data/jaeger-clickhouse-darwin-amd64: exec format error","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:103\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:838\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:943\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:883\nmain.main\n\tcommand-line-arguments/main.go:216\nruntime.main\n\truntime/proc.go:225"}
    

    Signed-off-by: Pradeep Chhetri

    opened by chhetripradeep 4
  • Fix panic when max_span_count is reached, add counter metric

    Panic seen in ghcr.io/jaegertracing/jaeger-clickhouse:0.8.0 with log-level=debug:

    panic: undefined type *clickhousespanstore.WriteWorker return from workerHeap
    
    goroutine 20 [running]:
    github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*WriteWorkerPool).CleanWorkers(0xc00020c300, 0xc00008eefc)
    	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/pool.go:95 +0x199
    github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*WriteWorkerPool).Work(0xc00020c300)
    	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/pool.go:50 +0x15e
    created by github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*SpanWriter).backgroundWriter
    	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/writer.go:89 +0x226
    

    Also adds a metric counter and logging to surface when writes are hitting backpressure.

    Signed-off-by: Nick Parker

    Which problem is this PR solving?

    • Fixes a panic when the recently added max_span_count flush is exercised

    Short description of the changes

    • Updates case type so that the panic no longer occurs
    • Adds debug log and metric counter to allow monitoring of the setting, since reaching the limit may indicate backpressure/problems with writing spans to ClickHouse. (In my case I was seeing the panic when ClickHouse's disk was full)
    opened by nickbp 3
  • Model alternative for jaeger_index table

    In the jaeger_index tables, tags are encoded as a nested array of keys and values. That works well when Jaeger Query is the only consumer, but at our company we also use Jaeger for analytics. Since ClickHouse 21.3, the Map type (https://clickhouse.com/docs/en/sql-reference/data-types/map/) is available. I think it could be a good alternative to Nested.

    Have you already run any performance tests (time and storage) with Map? Would it be an acceptable contribution (behind a flag, so that it is not enabled by default)?

    opened by Etienne-Carriere 0
  • Dockerizing proposal

    I have a number of proposals that I can make to this project:

    1. Two-stage build in Docker. This gives us a build in a reproducible environment.
    2. Optimize linking as much as possible. The image is meant for production use, and at high loads even minor optimizations save resources.
    3. Build the plugin along with the Jaeger source code. This lets us influence how the Jaeger build is optimized. We can use the cache or saved Docker layers to speed up the build.
    4. Use Debian releases instead of the Alpine distribution. One of the optimizations is linking with system libraries. Alpine has limited multithreading functionality because it uses musl instead of glibc. But there is no problem supporting both distributions.
    5. Image versioning that includes the Jaeger version, the plugin version, and the label that this container contains the plugin. The same approach is used by snyk. For example, the image will have the following tags:
      • ghcr.io/jaegertracing/jaeger-collector:1.29.0-clickhouse-0.8.0-stretch
      • ghcr.io/jaegertracing/jaeger-collector:1.29.0-clickhouse-0.8.0
      • ghcr.io/jaegertracing/jaeger-collector:1.29.0-clickhouse
      • ghcr.io/jaegertracing/jaeger-collector:clickhouse-0.8.0-stretch
      • ghcr.io/jaegertracing/jaeger-collector:clickhouse-0.8.0
      • ghcr.io/jaegertracing/jaeger-collector:clickhouse
    6. Provide a complete set of our own images: all-in-one, jaeger-agent, jaeger-collector, jaeger-ingester, jaeger-query.
    7. Run E2E-tests using docker-compose. example.
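
The tagging scheme from point 5 can be sketched in Go as a small helper that derives all six tags from the three inputs. The function name `imageTags` and its parameters are hypothetical; the tag layout is copied from the list in the proposal.

```go
package main

import "fmt"

// imageTags (hypothetical) builds the tag set proposed in point 5 from the
// image repository, the Jaeger version, the plugin version, and the base
// distribution name.
func imageTags(repo, jaegerVersion, pluginVersion, distro string) []string {
	return []string{
		fmt.Sprintf("%s:%s-clickhouse-%s-%s", repo, jaegerVersion, pluginVersion, distro),
		fmt.Sprintf("%s:%s-clickhouse-%s", repo, jaegerVersion, pluginVersion),
		fmt.Sprintf("%s:%s-clickhouse", repo, jaegerVersion),
		fmt.Sprintf("%s:clickhouse-%s-%s", repo, pluginVersion, distro),
		fmt.Sprintf("%s:clickhouse-%s", repo, pluginVersion),
		fmt.Sprintf("%s:clickhouse", repo),
	}
}

func main() {
	tags := imageTags("ghcr.io/jaegertracing/jaeger-collector", "1.29.0", "0.8.0", "stretch")
	for _, t := range tags {
		fmt.Println(t)
	}
}
```

The tags are ordered from most to least specific, matching the list in the proposal.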

    Part of the above is already implemented in https://github.com/levonet/docker-jaeger. I'm ready to move this infrastructure over, and my team can support it for as long as we use Jaeger.

    opened by levonet 1
  • Got plugin error "transport: error while dialing: dial unix /tmp/plugin"

    Describe the bug Got an error in the Jaeger UI with the ClickHouse gRPC plugin when searching for traces: HTTP Error: plugin error: rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial unix /tmp/plugin2381205323: connect: connection refused

    It seems to happen:

    • either after several hours of Jaeger Query inactivity
    • or after jaeger_index_local exceeds ~70 million rows

    ClickHouse is up and running. Restarting Jaeger Query fixes the problem temporarily (until the next long search).

    To Reproduce Steps to reproduce the behavior:

    1. Ingest ~70 million rows in jaeger_index_local
    2. Search for traces

    Expected behavior Traces are found

    Version (please complete the following information):

    • OS: Linux
    • Jaeger Query version: 1.25
    • ClickHouse plugin version: 0.8
    • Clickhouse version: 21.8.10.19
    • Deployment: Kubernetes

    What troubleshooting steps did you try?

    Additional context jaeger-query-clickhouse-5ff64c9dbc-h7jr4.log

    bug 
    opened by bocharovf 6
  • Add integration test for replicated database.

    Requirement - what kind of business use case are you trying to solve?

    Test replicated database in integration tests as well.

    Problem - what in Jaeger-ClickHouse blocks you from solving the requirement?

    No such config in workflows.

    good first issue 
    opened by EinKrebs 0
  • serialized, err = proto.Marshal(span) insert error

    Describe the bug serialized, err = proto.Marshal(span) insert error

    Version (please complete the following information):

    • OS: [e.g. Linux]
    • Jaeger version: latest
    • clickhouse version :21.8.3.44

    bug 
    opened by wangpu666 0
Releases(0.11.0)
Owner
Jaeger - Distributed Tracing Platform