Grafana Tempo is a high-volume, minimal-dependency distributed tracing backend.

Overview


Grafana Tempo is an open source, easy-to-use and high-scale distributed tracing backend. Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki. Tempo can be used with any of the open source tracing protocols, including Jaeger, Zipkin, OpenCensus, Kafka, and OpenTelemetry. It supports key/value lookup only and is designed to work in concert with logs and metrics (exemplars) for discovery.

Tempo is Jaeger, Zipkin, Kafka, OpenCensus and OpenTelemetry compatible. It ingests batches in any of the mentioned formats, buffers them and then writes them to Azure, GCS, S3 or local disk. As such it is robust, cheap and easy to operate!
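
For illustration, below is a minimal single-binary configuration sketch along those lines. It is an example rather than a reference configuration: the receiver set, bucket, endpoint, and paths are placeholders, and key names can vary between Tempo versions.

    server:
      http_listen_port: 3200

    distributor:
      receivers:                 # enable only the protocols you actually ingest
        otlp:
          protocols:
            grpc:
        jaeger:
          protocols:
            thrift_http:

    storage:
      trace:
        backend: s3              # or azure, gcs, local
        s3:
          bucket: tempo-traces           # placeholder bucket
          endpoint: s3.example.com:9000  # placeholder endpoint
        wal:
          path: /var/tempo/wal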

Getting Started

Further Reading

To learn more about Tempo, consult the following documents & talks:

Getting Help

If you have any questions or feedback regarding Tempo:

OpenTelemetry

Tempo's receiver layer, wire format and storage format are all based directly on standards and code established by OpenTelemetry. We support open standards at Grafana!

Check out the Integration Guides to see examples of OpenTelemetry instrumentation with Tempo.
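
As a sketch of one such integration, the snippet below shows an OpenTelemetry Collector pipeline that receives OTLP spans and forwards them to Tempo over OTLP/gRPC. The endpoint and the disabled TLS are assumptions for a local setup, and exporter settings can differ between Collector versions.

    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      otlp:
        endpoint: tempo:4317   # assumed Tempo OTLP gRPC endpoint
        tls:
          insecure: true       # plain-text example only

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]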

Other Components

tempo-query

tempo-query is jaeger-query with a HashiCorp go-plugin that adds support for querying Tempo. Note that Tempo only looks up a trace by ID: searching for traces is not supported, and the service and operation lists will not populate.
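
In practice tempo-query is pointed at Tempo through a small configuration file passed with --grpc-storage-plugin.configuration-file (as in the Docker commands quoted later on this page). A minimal sketch, with an assumed address:

    # tempo-query.yaml
    backend: tempo:3100   # address of the Tempo query API (assumed)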

tempo-vulture

tempo-vulture is Tempo's bird-themed consistency-checking tool. It pushes traces to and queries Tempo, and emits metrics for 404s and for traces with missing spans.

tempo-cli

tempo-cli is the home for utility functionality related to Tempo. See the documentation for more information.

TempoDB

TempoDB is included in this repository but is meant to be a standalone key-value database built on top of cloud object storage (Azure/GCS/S3). It is natively multitenant, supports a WAL, and is the storage engine for Tempo.

License

Grafana Tempo is distributed under AGPL-3.0-only. For Apache-2.0 exceptions, see LICENSING.md.

Comments
  • S3 disk space usage always increasing

    Describe the bug

    Hi all, I have Tempo deployed in microservices mode in my Kubernetes cluster, and my MinIO S3 storage disk space usage keeps increasing even though the compactor is set to 1h of retention.

    Here is my config:

        compactor:
            compaction:
                block_retention: 1h
                compacted_block_retention: 15m
        distributor: {}
        http_api_prefix: ""
        ingester:
            lifecycler:
                ring:
                    replication_factor: 3
        memberlist:
            abort_if_cluster_join_fails: false
            bind_port: 7946
            join_members:
                - gossip-ring.tempo.svc.cluster.local.:7946
        metrics_generator:
            storage:
                remote_write:
                    - name: remote-write
                      send_exemplars: true
                      url: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write
        metrics_generator_enabled: true
        multitenancy_enabled: true
        overrides:
            per_tenant_override_config: /overrides/overrides.yaml
        querier:
            frontend_worker:
                grpc_client_config:
                    max_send_msg_size: 1.34217728e+08
        search_enabled: true
        server:
            grpc_server_max_recv_msg_size: 1.34217728e+08
            grpc_server_max_send_msg_size: 1.34217728e+08
            http_listen_port: 3200
        storage:
            trace:
                azure: {}
                backend: s3
                blocklist_poll: "0"
                cache: memcached
                gcs: {}
                memcached:
                    consistent_hash: true
                    host: memcached
                    service: memcached-client
                    timeout: 200ms
                pool:
                    queue_depth: 2000
                s3:
                    access_key: ${S3_ACCESS_KEY}
                    bucket: tempo
                    endpoint: minio.minio.svc.cluster.local:9000
                    insecure: true
                    secret_key: ${S3_SECRET_KEY}
                wal:
                    path: /var/tempo/wal
    

    To Reproduce

    Steps to reproduce the behavior:

    1. Deploy Tempo in microservices mode with Minio as S3 storage
    2. Set Compactor with 1h of retention
    3. Start sending traces to Tempo
    4. Observe Minio disk space usage

    Expected behavior

    Disk space usage should decrease with compaction cycles.

    Environment:

    • Infrastructure: Kubernetes
    • Deployment tool: jsonnet

    Additional Context

    opened by irizzant 28
  • Azure DNS Lookup Failures

    Describe the bug

    In Azure, all components can register DNS errors while trying to work with meta.compacted.json. This issue does not occur in any other environment. We're unsure whether this is an Azure issue or a Tempo issue. The error itself is a failed TCP connection to a DNS server, which suggests some issue with Azure infrastructure. However, the fact that the error (almost?) always occurs on meta.compacted.json suggests that something about the way we handle that file is different and is causing this issue.

    The failures look like:

    reading storage container: Head "https://tempoe**************.blob.core.windows.net/tempo/single-tenant/d8aafc48-5796-4221-ac0b-58e001d18515/meta.compacted.json?timeout=61": dial tcp: lookup tempoe**************.blob.core.windows.net on 10.0.0.10:53: dial udp 10.0.0.10:53: operation was canceled
    

    or

    error deleting blob, name: single-tenant/*******************/data: Delete "https://tempoe******.blob.core.windows.net/tempo/single-tenant/5b1ab746-fee7-409c-944d-1c1d5ba7a70e/data?timeout=61": dial tcp: lookup tempoe******.blob.core.windows.net on 10.0.0.10:53: dial udp 10.0.0.10:53: operation was canceled
    

    We have seen this issue internally. Also reported here: #1372.

    opened by joe-elliott 26
  • Backend not hit

    Describe the bug

    We're using Tempo v1.3.1 with Grafana 8.3.6 and S3 as the storage backend. It seems like when we query traces for multiple hours (e.g. the last 24h), only the ingester is queried for its data (which always covers roughly the last 1-2h). When we choose a time range between now-1h and now-24h, the 23h are returned correctly. You can also "feel" that the backend is hit because it takes much longer.

    So it seems like when you query a time range where both the ingester and the object storage should be hit, only the ingester is.

    To Reproduce

    Steps to reproduce the behavior:

    1. Try to query the last 24 hours with the default config for "query_ingesters_until" and "query_backend_after" (see the config sketch after this list). See that it only returns the last 1-2 hours.
    2. Try to query now-24h to now-1h; see that it returns the requested 23h and hits the object storage.
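
    A minimal sketch of the querier settings mentioned in step 1 (the values are examples only; defaults and exact key placement can differ between Tempo versions):

      querier:
        query_ingesters_until: 1h    # how far back the ingesters are still queried
        query_backend_after: 15m     # how old a time range must be before the backend is queried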

    Expected behavior

    When requesting the last 24h, Tempo should return the whole 24h.

    Environment:

    • Infrastructure: Kubernetes
    • Deployment tool: helm (tempo-distributed chart v0.15.2)

    Additional Context

    Rendered Tempo config

    query_frontend:
      search:
        max_duration: 0
    multitenancy_enabled: false
    search_enabled: true
    compactor:
      compaction:
        block_retention: 1440h
      ring:
        kvstore:
          store: memberlist
    distributor:
      ring:
        kvstore:
          store: memberlist
      receivers:
        jaeger:
          protocols:
            thrift_compact:
              endpoint: 0.0.0.0:6831
            thrift_binary:
              endpoint: 0.0.0.0:6832
            thrift_http:
              endpoint: 0.0.0.0:14268
            grpc:
              endpoint: 0.0.0.0:14250
        otlp:
          protocols:
            http:
              endpoint: 0.0.0.0:55681
            grpc:
              endpoint: 0.0.0.0:4317
    querier:
      frontend_worker:
        frontend_address: tempo-tempo-distributed-query-frontend-discovery:9095
    ingester:
      lifecycler:
        ring:
          replication_factor: 1
          kvstore:
            store: memberlist
        tokens_file_path: /var/tempo/tokens.json
    memberlist:
      abort_if_cluster_join_fails: false
      join_members:
        - tempo-tempo-distributed-gossip-ring
    overrides:
      max_search_bytes_per_trace: 0
      per_tenant_override_config: /conf/overrides.yaml
    server:
      http_listen_port: 3100
      log_level: info
      log_format: json
      grpc_server_max_recv_msg_size: 4.194304e+06
      grpc_server_max_send_msg_size: 4.194304e+06
    storage:
      trace:
        backend: s3
        s3:
          bucket: XXXXX
          endpoint: s3.eu-central-1.amazonaws.com
          region: eu-central-1
        blocklist_poll: 5m
        local:
          path: /var/tempo/traces
        wal:
          path: /var/tempo/wal
        cache: memcached
        memcached:
          consistent_hash: true
          host: tempo-tempo-distributed-memcached
          service: memcached-client
          timeout: 500ms
    

    Helm values

        compactor:
          config:
            compaction:
              block_retention: 1440h
        config: |
          #--- This section is manually inserted (Robin) ---
          query_frontend:
            search:
              {{- if .Values.queryFrontend.extraConfig.max_duration }}
              max_duration: {{ .Values.queryFrontend.extraConfig.max_duration }}
              {{- else }}
              max_duration: 1h1m0s
              {{- end }}
          #-------------------------------------------------
          multitenancy_enabled: false
          search_enabled: {{ .Values.search.enabled }}
          compactor:
            compaction:
              block_retention: {{ .Values.compactor.config.compaction.block_retention }}
            ring:
              kvstore:
                store: memberlist
          distributor:
            ring:
              kvstore:
                store: memberlist
            receivers:
              {{- if  or (.Values.traces.jaeger.thriftCompact) (.Values.traces.jaeger.thriftBinary) (.Values.traces.jaeger.thriftHttp) (.Values.traces.jaeger.grpc) }}
              jaeger:
                protocols:
                  {{- if .Values.traces.jaeger.thriftCompact }}
                  thrift_compact:
                    endpoint: 0.0.0.0:6831
                  {{- end }}
                  {{- if .Values.traces.jaeger.thriftBinary }}
                  thrift_binary:
                    endpoint: 0.0.0.0:6832
                  {{- end }}
                  {{- if .Values.traces.jaeger.thriftHttp }}
                  thrift_http:
                    endpoint: 0.0.0.0:14268
                  {{- end }}
                  {{- if .Values.traces.jaeger.grpc }}
                  grpc:
                    endpoint: 0.0.0.0:14250
                  {{- end }}
              {{- end }}
              {{- if .Values.traces.zipkin}}
              zipkin:
                endpoint: 0.0.0.0:9411
              {{- end }}
              {{- if or (.Values.traces.otlp.http) (.Values.traces.otlp.grpc) }}
              otlp:
                protocols:
                  {{- if .Values.traces.otlp.http }}
                  http:
                    endpoint: 0.0.0.0:55681
                  {{- end }}
                  {{- if .Values.traces.otlp.grpc }}
                  grpc:
                    endpoint: 0.0.0.0:4317
                  {{- end }}
              {{- end }}
              {{- if .Values.traces.opencensus }}
              opencensus:
                endpoint: 0.0.0.0:55678
              {{- end }}
              {{- if .Values.traces.kafka }}
              kafka:
                {{- toYaml .Values.traces.kafka | nindent 6 }}
              {{- end }}
          querier:
            frontend_worker:
              frontend_address: {{ include "tempo.queryFrontendFullname" . }}-discovery:9095
              {{- if .Values.querier.config.frontend_worker.grpc_client_config }}
              grpc_client_config:
                {{- toYaml .Values.querier.config.frontend_worker.grpc_client_config | nindent 6 }}
              {{- end }}
          ingester:
            lifecycler:
              ring:
                replication_factor: 1
                kvstore:
                  store: memberlist
              tokens_file_path: /var/tempo/tokens.json
          memberlist:
            abort_if_cluster_join_fails: false
            join_members:
              - {{ include "tempo.fullname" . }}-gossip-ring
          overrides:
            {{- toYaml .Values.global_overrides | nindent 2 }}
          server:
            http_listen_port: {{ .Values.server.httpListenPort }}
            log_level: {{ .Values.server.logLevel }}
            log_format: {{ .Values.server.logFormat }}
            grpc_server_max_recv_msg_size: {{ .Values.server.grpc_server_max_recv_msg_size }}
            grpc_server_max_send_msg_size: {{ .Values.server.grpc_server_max_send_msg_size }}
          storage:
            trace:
              backend: {{.Values.storage.trace.backend}}
              {{- if eq .Values.storage.trace.backend "gcs"}}
              gcs:
                {{- toYaml .Values.storage.trace.gcs | nindent 6}}
              {{- end}}
              {{- if eq .Values.storage.trace.backend "s3"}}
              s3:
                {{- toYaml .Values.storage.trace.s3 | nindent 6}}
              {{- end}}
              {{- if eq .Values.storage.trace.backend "azure"}}
              azure:
                {{- toYaml .Values.storage.trace.azure | nindent 6}}
              {{- end}}
              blocklist_poll: 5m
              local:
                path: /var/tempo/traces
              wal:
                path: /var/tempo/wal
              cache: memcached
              memcached:
                consistent_hash: true
                host: {{ include "tempo.fullname" . }}-memcached
                service: memcached-client
                timeout: 500ms
        distributor:
          replicas: 1
        gateway:
          enabled: true
        global_overrides:
          max_search_bytes_per_trace: 0
        ingester:
          persistence:
            enabled: true
          replicas: 1
        memcachedExporter:
          enabled: true
        querier:
          replicas: 1
        queryFrontend:
          extraConfig:
            max_duration: "0"
          replicas: 1
        search:
          enabled: true
        server:
          logFormat: json
        serviceAccount:
          annotations:
            eks.amazonaws.com/role-arn: arn:aws:iam::XXXXX
          name: tempo
        serviceMonitor:
          enabled: true
        storage:
          trace:
            backend: s3
            s3:
              bucket: XXXXX
              endpoint: s3.eu-central-1.amazonaws.com
              region: eu-central-1
        traces:
          jaeger:
            grpc: true
            thriftBinary: true
            thriftCompact: true
            thriftHttp: true
          otlp:
            grpc: true
            http: true
    
    opened by mrkwtz 22
  • No Trace details found in Tempo-query UI

    Describe the bug

    Unable to fetch trace details generated by a Java client application.

    To Reproduce

    Steps to reproduce the behavior:

    1. Started docker container using the compose file (https://github.com/grafana/tempo/blob/master/example/docker-compose/docker-compose.loki.yaml)
    2. Made some changes to the above file: 2.1. removed the port conflicts between the Tempo and Loki ports; 2.2. changed the hostname to localhost (instead of tempo)
    3. Started the spring-boot application, which generates traces & spans using Jaeger, as shown below:

       2020-11-17 16:01:21.396 INFO 15574 --- [-StreamThread-1] i.j.internal.reporters.LoggingReporter : Span reported: e5f9450a16f7dc89:e5f9450a16f7dc89:0:1 - key-selector
       2020-11-17 16:01:21.401 INFO 15574 --- [-StreamThread-1] i.j.internal.reporters.LoggingReporter : Span reported: e5f9450a16f7dc89:2b5c574b5a4f0677:e5f9450a16f7dc89:1 - extract-field-for-aggregation-followed-by-groupby
       2020-11-17 16:01:29.958 INFO 15574 --- [-StreamThread-1] i.j.internal.reporters.LoggingReporter : Span reported: e5f9450a16f7dc89:3ea54283e2df9fd7:2b5c574b5a4f0677:1 - perform-aggregator

    Expected behavior

    When I go to the Tempo-Query UI and search for trace ID e5f9450a16f7dc89, I get a 404.

    Environment:

    • Docker images and the Java client application are running on Ubuntu (4.15.0-115-generic #116-Ubuntu)
    • The Java client application (spring-boot) is a non-Docker application

    Additional Context

    The sample client application is able to produce trace details in the Jaeger-Query UI instead. The following arguments were provided while running the client application: -Dopentracing.jaeger.udp-sender.host=localhost -Dopentracing.jaeger.udp-sender.port=6831 -Dopentracing.jaeger.const-sampler.decision=true -Dopentracing.jaeger.enabled=true -Dopentracing.jaeger.log-spans=true -Dopentracing.jaeger.service-name=xxx -Dopentracing.jaeger.http-sender.url=http://localhost:14268

    opened by vvmadhu 21
  • Multitenancy does not work with non-GRPC ingestion

    Describe the bug

    "failed to extract org id" when auth_enabled: true

    To Reproduce

    Steps to reproduce the behavior:

    tempo-local.yaml

    auth_enabled: true
    
    server:
      http_listen_port: 3100
    
    distributor:
      receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
        jaeger:                            # the receives all come from the OpenTelemetry collector.  more configuration information can
          protocols:                       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
            thrift_http:                   #
            grpc:                          # for a production deployment you should only enable the receivers you need!
            thrift_binary:
            thrift_compact:
        zipkin:
        otlp:
          protocols:
            http:
            grpc:
        opencensus:
    
    ingester:
      trace_idle_period: 10s               # the length of time after a trace has not received spans to consider it complete and flush it
      #max_block_bytes: 1_000_000           # cut the head block when it hits this size or ...
      traces_per_block: 1_000_000
      max_block_duration: 5m               #   this much time passes
    
    compactor:
      compaction:
        compaction_window: 1h              # blocks in this time window will be compacted together
        max_compaction_objects: 1000000    # maximum size of compacted blocks
        block_retention: 1h
        compacted_block_retention: 10m
    
    storage:
      trace:
        backend: local                     # backend configuration to use
        wal:
      path: E:\practices\docker\tempo\tempo\wal             # where to store the wal locally
          bloom_filter_false_positive: .05 # bloom filter false positive rate.  lower values create larger filters but fewer false positives
          index_downsample: 10             # number of traces per index record
        local:
          path: E:\practices\docker\tempo\blocks
        pool:
          max_workers: 100                 # the worker pool mainly drives querying, but is also used for polling the blocklist
          queue_depth: 10000
    
    docker run -d --rm -p 6831:6831/udp -p 9411:9411 -p 3100:3100  --name tempo -v E:\practices\docker\tempo\tempo-local.yaml:/etc/tempo-local.yaml --network docker-tempo  grafana/tempo:0.5.0 --config.file=/etc/tempo-local.yaml
    
    docker run -d --rm -p 16686:16686 -v E:\practices\docker\tempo\tempo-query.yaml:/etc/tempo-query.yaml  --network docker-tempo  grafana/tempo-query:0.5.0  --grpc-storage-plugin.configuration-file=/etc/tempo-query.yaml
    
    curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -H 'X-Scope-OrgID: demo' -d '[{
     "id": "1234",
     "traceId": "0123456789abcdef",
     "timestamp": 1608239395286533,
     "duration": 100000,
     "name": "span from bash!",
     "tags": {
        "http.method": "GET",
        "http.path": "/api"
      },
      "localEndpoint": {
        "serviceName": "shell script"
      }
    }]'
    
    level=error ts=2021-01-31T07:16:33.6168738Z caller=log.go:27 msg="failed to extract org id" err="no org id"
    


    Expected behavior

    Environment:

    • Infrastructure: [ Kubernetes, laptop]
    • Deployment tool: [manual and jenkins]

    Additional Context

    bug 
    opened by mnadeem 18
  • Noisy error log in frontend processor - "transport is closing"

    Tempo backend not ready to receive traffic even after hours

    Probably due to the following:

    level=error ts=2021-01-30T15:47:14.273319231Z caller=frontend_processor.go:61 msg="error processing requests" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = transport is closing"
    

    Full Log

    level=info ts=2021-01-30T15:45:53.905814194Z caller=main.go:89 msg="Starting Tempo" version="(version=c189e23e, branch=master, revision=c189e23e)"
    level=info ts=2021-01-30T15:45:54.258823037Z caller=server.go:229 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
    level=info ts=2021-01-30T15:45:54.260118522Z caller=frontend.go:24 msg="creating tripperware in query frontend to shard queries"
    level=warn ts=2021-01-30T15:45:54.260443879Z caller=modules.go:140 msg="Worker address is empty in single binary mode.  Attempting automatic worker configuration.  If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
    level=info ts=2021-01-30T15:45:54.26054227Z caller=worker.go:112 msg="Starting querier worker connected to query-frontend" frontend=127.0.0.1:9095
    ts=2021-01-30T15:45:54Z level=info msg="OTel Shim Logger Initialized" component=tempo
    level=info ts=2021-01-30T15:45:54.261646095Z caller=module_service.go:58 msg=initialising module=memberlist-kv
    level=info ts=2021-01-30T15:45:54.261675553Z caller=module_service.go:58 msg=initialising module=overrides
    level=info ts=2021-01-30T15:45:54.261691519Z caller=module_service.go:58 msg=initialising module=store
    level=info ts=2021-01-30T15:45:54.26172941Z caller=module_service.go:58 msg=initialising module=server
    level=info ts=2021-01-30T15:45:54.263537314Z caller=module_service.go:58 msg=initialising module=ring
    level=info ts=2021-01-30T15:45:54.263580354Z caller=module_service.go:58 msg=initialising module=ingester
    level=info ts=2021-01-30T15:45:54.26361552Z caller=module_service.go:58 msg=initialising module=compactor
    level=info ts=2021-01-30T15:45:54.263636317Z caller=module_service.go:58 msg=initialising module=query-frontend
    level=info ts=2021-01-30T15:45:54.26381546Z caller=module_service.go:58 msg=initialising module=querier
    level=info ts=2021-01-30T15:45:54.263859051Z caller=module_service.go:58 msg=initialising module=distributor
    level=info ts=2021-01-30T15:45:54.263973679Z caller=worker.go:192 msg="adding connection" addr=127.0.0.1:9095
    level=info ts=2021-01-30T15:45:54.264012639Z caller=ingester.go:278 msg="beginning wal replay" numBlocks=0
    level=info ts=2021-01-30T15:45:54.263814539Z caller=compactor.go:95 msg="waiting for compaction ring to settle" waitDuration=1m0s
    level=info ts=2021-01-30T15:45:54.264449966Z caller=lifecycler.go:521 msg="not loading tokens from file, tokens file path is empty"
    level=info ts=2021-01-30T15:45:54.26545844Z caller=client.go:242 msg="value is nil" key=collectors/ring index=1
    level=info ts=2021-01-30T15:45:54.266328848Z caller=lifecycler.go:550 msg="instance not found in ring, adding with no tokens" ring=ingester
    level=info ts=2021-01-30T15:45:54.266478004Z caller=lifecycler.go:397 msg="auto-joining cluster after timeout" ring=ingester
    ts=2021-01-30T15:45:54Z level=info msg="No sampling strategies provided, using defaults" component=tempo
    level=info ts=2021-01-30T15:45:54.267568344Z caller=app.go:212 msg="Tempo started"
    level=info ts=2021-01-30T15:46:54.265244168Z caller=compactor.go:97 msg="enabling compaction"
    level=info ts=2021-01-30T15:46:54.265316073Z caller=tempodb.go:278 msg="compaction and retention enabled."
    level=error ts=2021-01-30T15:47:14.273319231Z caller=frontend_processor.go:61 msg="error processing requests" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = transport is closing"
    level=error ts=2021-01-30T15:47:14.273303551Z caller=frontend_processor.go:61 msg="error processing requests" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = transport is closing"
    level=error ts=2021-01-30T15:47:14.273303003Z caller=frontend_processor.go:61 msg="error processing requests" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = transport is closing"
    level=error ts=2021-01-30T15:47:14.27332587Z caller=frontend_processor.go:61 msg="error processing requests" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = transport is closing"
    level=error ts=2021-01-30T15:47:14.273321788Z caller=frontend_processor.go:61 msg="error processing requests" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = transport is closing"
    
    


    To Reproduce

    Steps to reproduce the behavior:

    1. Start Tempo (grafana/tempo:latest), as per the blog https://reachmnadeem.wordpress.com/2021/01/30/distributed-tracing-using-grafana-tempo-jaeger-with-amazon-s3-as-backend-in-openshift-kubernetes/

    Expected behavior

    Tempo backend should be ready to receive traffic

    Environment:

    • Infrastructure: [Kubernetes, Openshift]
    • Deployment tool: [Jenkins Pipeline]

    Additional Context

    opened by mnadeem 17
  • Parquet GC Crash

    Describe the bug

    Almost every two days Tempo crashes (mostly while I sleep :/).

    Environment:

    • Infrastructure: Ubuntu 22 in GCP
    • Tempo version is 9be0ae54dcf5393677b58aac3266b140289a533e

    Additional Context

    level=info ts=2022-07-21T02:46:34.415082428Z caller=compactor.go:150 msg="compacting block" block="&{Version:vParquet BlockID:04d6efdf-5d03-4b2e-ad4f-d6de785862f6 MinID:[0 0 0 0 0 0 0 0 0 19 114 70 138 122 196 128] MaxID:[255 253 55 147 253 121 118 45 44 46 196 196 224 202 84 47] TenantID:single-tenant StartTime:2022-07-21 02:41:10 +0000 UTC EndTime:2022-07-21 02:45:44 +0000 UTC TotalObjects:39445 Size:32469406 CompactionLevel:1 Encoding:none IndexPageSize:0 TotalRecords:4 DataEncoding: BloomShardCount:1 FooterSize:19866}"
    level=info ts=2022-07-21T02:46:34.415256519Z caller=compactor.go:150 msg="compacting block" block="&{Version:vParquet BlockID:0cbf760a-f00c-4e19-8638-0c222cc33c7d MinID:[0 0 0 0 0 0 0 0 0 0 72 181 216 239 229 74] MaxID:[255 254 125 251 18 105 239 112 254 110 161 216 181 72 240 92] TenantID:single-tenant StartTime:2022-07-21 02:16:09 +0000 UTC EndTime:2022-07-21 02:20:14 +0000 UTC TotalObjects:51495 Size:58062539 CompactionLevel:1 Encoding:none IndexPageSize:0 TotalRecords:6 DataEncoding: BloomShardCount:1 FooterSize:29007}"
    runtime: marked free object in span 0x7f64cb706368, elemsize=24 freeindex=0 (bad use of unsafe.Pointer? try -d=checkptr)
    0xc000410000 alloc unmarked
    0xc000410018 alloc unmarked
    0xc000410030 alloc unmarked
    0xc000410048 alloc unmarked
    0xc000410060 alloc unmarked
    0xc000410078 alloc marked
    0xc000410090 alloc unmarked
    0xc0004100a8 alloc unmarked
    0xc0004100c0 alloc marked
    0xc0004100d8 alloc unmarked
    0xc0004100f0 alloc marked
    0xc000410108 alloc unmarked
    0xc000410120 alloc unmarked
    0xc000410138 alloc marked
    0xc000410150 free  marked   zombie
    0x000000c000410150:  0x502d72656765614a  0x2e342d6e6f687479
    0x000000c000410160:  0x0000006e6f302e38
    0xc000410168 free  unmarked
    0xc000410180 free  unmarked
    0xc000410198 free  unmarked
    .....LONG OUTPUT....
    0xc000411f68 free  unmarked
    0xc000411f80 free  unmarked
    0xc000411f98 free  unmarked
    0xc000411fb0 free  unmarked
    0xc000411fc8 free  unmarked
    0xc000411fe0 free  unmarked
    fatal error: found pointer to free object
    
    goroutine 4067 [running]:
    runtime.throw({0x1ffd4d0?, 0xc000410168?})
            /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00cc0b728 sp=0xc00cc0b6f8 pc=0x438731
    runtime.(*mspan).reportZombies(0x7f64cb706368)
            /usr/local/go/src/runtime/mgcsweep.go:776 +0x2e5 fp=0xc00cc0b7a8 sp=0xc00cc0b728 pc=0x427305
    runtime.(*sweepLocked).sweep(0x357bca0?, 0x0)
            /usr/local/go/src/runtime/mgcsweep.go:609 +0x8b2 fp=0xc00cc0b880 sp=0xc00cc0b7a8 pc=0x426d12
    runtime.sweepone()
            /usr/local/go/src/runtime/mgcsweep.go:369 +0xf0 fp=0xc00cc0b8d0 sp=0xc00cc0b880 pc=0x426190
    runtime.GC()
            /usr/local/go/src/runtime/mgc.go:451 +0x7e fp=0xc00cc0b908 sp=0xc00cc0b8d0 pc=0x41bc3e
    github.com/grafana/tempo/tempodb/encoding/vparquet.(*Compactor).Compact(0xc0090ecb00, {0x24e8318, 0xc0000500a0}, {0x24d1360, 0xc00001b360}, {0x24f4540, 0xc0009da830}, 0xc01c74fdc0, {0xc01c74fca0, 0x2, ...})
            /root/tempo/repo/tempo/tempodb/encoding/vparquet/compactor.go:105 +0x7f6 fp=0xc00cc0bb50 sp=0xc00cc0b908 pc=0x1340036
    github.com/grafana/tempo/tempodb.(*readerWriter).compact(0xc0009c6840, {0xc01c74fca0?, 0x2, 0x2}, {0xc0003ec013, 0xd})
            /root/tempo/repo/tempo/tempodb/compactor.go:189 +0x889 fp=0xc00cc0bdc8 sp=0xc00cc0bb50 pc=0x1794ca9
    github.com/grafana/tempo/tempodb.(*readerWriter).doCompaction(0xc0009c6840)
            /root/tempo/repo/tempo/tempodb/compactor.go:113 +0x4fb fp=0xc00cc0bf88 sp=0xc00cc0bdc8 pc=0x1793b1b
    github.com/grafana/tempo/tempodb.(*readerWriter).compactionLoop(0xc0009c6840)
            /root/tempo/repo/tempo/tempodb/compactor.go:72 +0x77 fp=0xc00cc0bfc8 sp=0xc00cc0bf88 pc=0x17935d7
    github.com/grafana/tempo/tempodb.(*readerWriter).EnableCompaction.func1()
            /root/tempo/repo/tempo/tempodb/tempodb.go:385 +0x26 fp=0xc00cc0bfe0 sp=0xc00cc0bfc8 pc=0x1799b26
    runtime.goexit()
            /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00cc0bfe8 sp=0xc00cc0bfe0 pc=0x46bac1
    created by github.com/grafana/tempo/tempodb.(*readerWriter).EnableCompaction
            /root/tempo/repo/tempo/tempodb/tempodb.go:385 +0x1c5
    
    
    opened by altanozlu 15
  • Python OTel Jaeger exporter: Tempo query returns no data

    Python OpenTelemetry instrumentation install:

    pip install opentelemetry-sdk
    pip install opentelemetry-distro
    pip install opentelemetry-exporter-jaeger-proto-grpc
    

    Testing Python script:

    import time
    
    from opentelemetry import trace
    from opentelemetry.exporter.jaeger.proto import grpc
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import (BatchSpanProcessor,ConsoleSpanExporter)
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    
    trace.set_tracer_provider(TracerProvider(
        resource=Resource.create({SERVICE_NAME: "my-helloworld-service"})
        ))
    tracer = trace.get_tracer(__name__)
    
    # Create a JaegerExporter to send spans with gRPC
    # If there is no encryption or authentication set `insecure` to True
    # If server has authentication with SSL/TLS you can set the
    # parameter credentials=ChannelCredentials(...) or the environment variable
    # `EXPORTER_JAEGER_CERTIFICATE` with file containing creds.
    
    jaeger_exporter = grpc.JaegerExporter(
        collector_endpoint="localhost:14250",
        insecure=True,
    )
    
    trace.get_tracer_provider().add_span_processor(
        BatchSpanProcessor(ConsoleSpanExporter())
    )
    
    trace.get_tracer_provider().add_span_processor(
            BatchSpanProcessor(jaeger_exporter)
            )
    
    # create some spans for testing
    with tracer.start_as_current_span("foo") as foo:
        time.sleep(0.1)
        foo.set_attribute("my_atribbute", True)
        foo.add_event("event in foo", {"name": "foo1"})
        with tracer.start_as_current_span(
            "bar", links=[trace.Link(foo.get_span_context())]
        ) as bar:
            time.sleep(0.2)
            bar.set_attribute("speed", 100.0)
    
            with tracer.start_as_current_span("baz") as baz:
                time.sleep(0.3)
                baz.set_attribute("name", "mauricio")
    
            time.sleep(0.2)
    
        time.sleep(0.1)
    

    Exporting to Jaeger works normally and the trace can be queried.

    Exports to Tempo cannot be queried; there was no hint of the trace.

    Grafana Tempo query result data:

    {
        "results": {
            "B": {
                "frames": [
                    {
                        "schema": {
                            "name": "Trace",
                            "refId": "B",
                            "meta": {
                                "preferredVisualisationType": "trace"
                            },
                            "fields": [
                                {
                                    "name": "traceID",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "spanID",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "parentSpanID",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "operationName",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "serviceName",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "serviceTags",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "startTime",
                                    "type": "number",
                                    "typeInfo": {
                                        "frame": "float64"
                                    }
                                },
                                {
                                    "name": "duration",
                                    "type": "number",
                                    "typeInfo": {
                                        "frame": "float64"
                                    }
                                },
                                {
                                    "name": "logs",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "references",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                },
                                {
                                    "name": "tags",
                                    "type": "string",
                                    "typeInfo": {
                                        "frame": "string"
                                    }
                                }
                            ]
                        },
                        "data": {
                            "values": [
                                [],
                                [],
                                [],
                                [],
                                [],
                                [],
                                [],
                                [],
                                [],
                                [],
                                []
                            ]
                        }
                    }
                ]
            }
        }
    }
    
    stale 
    opened by tonycody 15
  • Unhealthy compactors do not leave the ring after rollout

    Describe the bug

    When rolling out a new deployment of the compactors, some old instances will remain in the ring as Unhealthy. The only fix seems to be to port-forward one of the compactors and use the /compactor/ring page to "Forget" all the unhealthy instances.

    To Reproduce

    Steps to reproduce the behavior:

    1. Start Tempo in kubernetes (we have tried the 1.1 release but the issue persists with af34e132a1b8)
    2. Perform a rollout of the compactors

    Expected behavior

    The compactors from the previous deployment leave the ring correctly.

    Environment:

    • Infrastructure: kubernetes
    • Deployment tool: kubectl apply

    Additional Context

    We do not see this happen all the time. On one of our similarly sized but less busy clusters, old compactors rarely stay in the ring after a rollout. On the busier cluster, we had 14 unhealthy compactors from a previous deployment still in the ring, out of 30 in the deployment.

    Our Tempo config for memberlist:

        memberlist:
          abort_if_cluster_join_fails: false
          join_members:
            - tempo-gossip-ring
          dead_node_reclaim_time: 15s
          bind_addr: ["${POD_IP}"]
    

    Sample logs from a compactor that stayed in the ring as unhealthy, from the moment where shutdown was requested:

    level=info ts=2021-10-26T15:26:54.628754652Z caller=signals.go:55 msg="=== received SIGINT/SIGTERM ===\n*** exiting"
    level=info ts=2021-10-26T15:26:54.629120545Z caller=lifecycler.go:457 msg="lifecycler loop() exited gracefully" ring=compactor
    level=info ts=2021-10-26T15:26:54.629162701Z caller=lifecycler.go:768 msg="changing instance state from" old_state=ACTIVE new_state=LEAVING ring=compactor
    level=info ts=2021-10-26T15:26:55.632049563Z caller=lifecycler.go:509 msg="instance removed from the KV store" ring=compactor
    level=info ts=2021-10-26T15:26:55.63212068Z caller=module_service.go:96 msg="module stopped" module=compactor
    level=info ts=2021-10-26T15:26:55.632206675Z caller=module_service.go:96 msg="module stopped" module=overrides
    level=info ts=2021-10-26T15:26:55.632206876Z caller=memberlist_client.go:572 msg="leaving memberlist cluster"
    level=info ts=2021-10-26T15:26:55.632249778Z caller=module_service.go:96 msg="module stopped" module=store
    level=warn ts=2021-10-26T15:27:05.735678769Z caller=memberlist_client.go:587 msg="broadcast messages left in queue" count=16 nodes=146
    level=info ts=2021-10-26T15:27:07.192768676Z caller=module_service.go:96 msg="module stopped" module=memberlist-kv
    level=info ts=2021-10-26T15:27:07.194383366Z caller=server_service.go:50 msg="server stopped"
    level=info ts=2021-10-26T15:27:07.194466883Z caller=module_service.go:96 msg="module stopped" module=server
    level=info ts=2021-10-26T15:27:07.19452497Z caller=app.go:271 msg="Tempo stopped"
    level=info ts=2021-10-26T15:27:07.194539895Z caller=main.go:135 msg="Tempo running"
    

    I was confused by that last "Tempo running" line, but looking at the code in main.go, this seems normal.

    opened by gravelg 15
  • Errors from distributor "context deadline exceeded"

    Describe the bug

    I have a lot of dropped traces. Errors from the distributors:

    level=error ts=2022-08-08T12:54:50.237305922Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context deadline exceeded"
    level=error ts=2022-08-08T12:55:01.377532608Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context canceled"
    level=error ts=2022-08-08T12:55:20.983607767Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context deadline exceeded"
    level=error ts=2022-08-08T12:55:34.83259462Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context deadline exceeded"
    level=error ts=2022-08-08T12:55:38.333709329Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context deadline exceeded"
    level=error ts=2022-08-08T12:55:41.23312395Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context canceled"
    level=error ts=2022-08-08T12:55:44.841565822Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context canceled"
    level=error ts=2022-08-08T12:56:01.016239081Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="context canceled"
    

    To Reproduce

    Steps to reproduce the behaviour:

    1. Start Tempo from helm chart tempo-distributed 0.20.3 and Grafana agent 0.25.1
    2. Perform Operations (Write)

    Tempo configuration

    multitenancy_enabled: false
    search_enabled: true
    compactor:
      compaction:
        block_retention: 700h
        iterator_buffer_size: 800
      ring:
        kvstore:
          store: memberlist
    distributor:
      ring:
        kvstore:
          store: memberlist
      receivers:
        jaeger:
          protocols:
            grpc:
              endpoint: 0.0.0.0:14250
            thrift_http:
              endpoint: 0.0.0.0:14268
    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend-discovery:9095
    ingester:
      trace_idle_period: 1m
      lifecycler:
        ring:
          replication_factor: 2
          kvstore:
            store: memberlist
        tokens_file_path: /var/tempo/tokens.json
    memberlist:
      abort_if_cluster_join_fails: false
      join_members:
        - tempo-gossip-ring
    overrides:
      max_search_bytes_per_trace: 0
      ingestion_burst_size_bytes: 60000000
      ingestion_rate_limit_bytes: 50000000
      max_bytes_per_trace: 9000000
    server:
      http_listen_port: 3100
      log_level: info
      log_format: logfmt
      grpc_server_max_recv_msg_size: 4194304
      grpc_server_max_send_msg_size: 4194304
    storage:
      trace:
        backend: s3
        s3:
          bucket: m2-tempo-prod
          region: us-central1
          access_key: ******
          secret_key: ******
        blocklist_poll: 5m
        local:
          path: /var/tempo/traces
        wal:
          path: /var/tempo/wal
        cache: memcached
        memcached:
          consistent_hash: true
          host: tempo-memcached
          service: memcached-client
          timeout: 500ms
    metrics_generator:
      ring:
        kvstore:
          store: memberlist
      storage:
        path: /var/tempo/wal
    

    Grafana Agent remote write

    traces:
          configs:
            - name: jaeger
              receivers:
                ...
              batch:
                send_batch_size: 8192
                timeout: 20s
              remote_write:
                - endpoint: tempo-distributor.observability.svc.cluster.local:14250
                  insecure: true
                  insecure_skip_verify: true
                  protocol: grpc
                  format: jaeger
                  sending_queue:
                    queue_size: 5000
                  retry_on_failure:
                    max_elapsed_time: 30s
    

    Expected behaviour

    Distributors don't drop traces.

    Environment:

    • Infrastructure: Kubernetes
    • Deployment tool: helm

    Additional Context

    Metric tempo_discarded_spans_total:

    tempo_discarded_spans_total{container="distributor", endpoint="http", instance="10.107.156.36:3100", job="tempo-distributor", namespace="observability", pod="tempo-distributor-75db5b54cb-k6tl2", reason="internal_error", service="tempo-distributor", tenant="single-tenant"}
    
    stale 
    opened by javdet 14
  • Bump github.com/thanos-io/thanos from 0.24.0 to 0.27.0

    Bumps github.com/thanos-io/thanos from 0.24.0 to 0.27.0.

    Release notes

    Sourced from github.com/thanos-io/thanos's releases.

    v0.27.0

    What's Changed

    Fixed

    • #5339 Receive: When running in routerOnly mode, an interupt (SIGINT) will now exit the process.
    • #5357 Store: Fix groupcache handling by making sure slashes in the cache's key are not getting interpreted by the router anymore.
    • #5427 Receive: Fix Ketama hashring replication consistency. With the Ketama hashring, replication is currently handled by choosing subsequent nodes in the list of endpoints. This can lead to existing nodes getting more series when the hashring is scaled. This change makes replication to choose subsequent nodes from the hashring which should not create new series in old nodes when the hashring is scaled. Ketama hashring can be used by setting --receive.hashrings-algorithm=ketama.

    Added

    • #5337 Thanos Object Store: Add the prefix option to buckets.
    • #5409 S3: Add option to force DNS style lookup.
    • #5352 Cache: Add cache metrics to groupcache: thanos_cache_groupcache_bytes, thanos_cache_groupcache_evictions_total, thanos_cache_groupcache_items and thanos_cache_groupcache_max_bytes.
    • #5391 Receive: Add relabeling support with the flag --receive.relabel-config-file or alternatively --receive.relabel-config.
    • #5408 Receive: Add support for consistent hashrings. The flag --receive.hashrings-algorithm uses default hashmod but can also be set to ketama to leverage consistent hashrings. More technical information can be found here: https://dgryski.medium.com/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8.
    • #5402 Receive: Implement api/v1/status/tsdb.

    Changed

    New Contributors

    Full Changelog: https://github.com/thanos-io/thanos/compare/v0.26.0...v0.27.0

    v0.27.0-rc.0

    What's Changed

    Fixed

    • #5339 Receive: When running in routerOnly mode, an interupt (SIGINT) will now exit the process.
    • #5357 Store: Fix groupcache handling by making sure slashes in the cache's key are not getting interpreted by the router anymore.
    • #5427 Receive: Fix Ketama hashring replication consistency. With the Ketama hashring, replication is currently handled by choosing subsequent nodes in the list of endpoints. This can lead to existing nodes getting more series when the hashring is scaled. This change makes replication to choose subsequent nodes from the hashring which should not create new series in old nodes when the hashring is scaled. Ketama hashring can be used by setting --receive.hashrings-algorithm=ketama

    Added

    • #5337 Thanos Object Store: Add the prefix option to buckets.

    ... (truncated)

    Changelog

    Sourced from github.com/thanos-io/thanos's changelog.

    v0.27.0 - 2022.07.05

    Fixed

    • #5339 Receive: Fix deadlock on interrupt in routerOnly mode.
    • #5357 Store: fix groupcache handling of slashes.
    • #5427 Receive: Fix Ketama hashring replication consistency.

    Added

    • #5337 Thanos Object Store: Add the prefix option to buckets.
    • #5409 S3: Add option to force DNS style lookup.
    • #5352 Cache: Add cache metrics to groupcache.
    • #5391 Receive: Add relabeling support.
    • #5408 Receive: Add support for consistent hashrings.
    • #5391 Receive: Implement api/v1/status/tsdb.

    Changed

    Removed

    • #5426 Compactor: Remove an unused flag --block-sync-concurrency.

    v0.26.0 - 2022.05.05

    Fixed

    • #5281 Blocks: Use correct separators for filesystem paths and object storage paths respectively.
    • #5300 Query: Ignore cache on queries with deduplication off.
    • #5324 Reloader: Force trigger reload when config rollbacked.

    Added

    • #5220 Query Frontend: Add --query-frontend.forward-header flag, forward headers to downstream querier.
    • #5250 Querier: Expose Query and QueryRange APIs through GRPC.
    • #5290 Add support for ppc64le.

    Changed

    • #4838 Tracing: Chanced client for Stackdriver which deprecated "type: STACKDRIVER" in tracing YAML configuration. Use type: GOOGLE_CLOUD instead (STACKDRIVER type remains for backward compatibility).
    • #5170 All: Upgraded the TLS version from TLS1.2 to TLS1.3.
    • #5205 Rule: Add ruler labels as external labels in stateless ruler mode.
    • #5206 Cache: Add timeout for groupcache's fetch operation.
    • #5218 Tools: Thanos tools bucket downsample is now running continously.
    • #5231 Tools: Bucket verify tool ignores blocks with deletion markers.
    • #5244 Query: Promote negative offset and @ modifier to stable features as per Prometheus #10121.
    • #5255 InfoAPI: Set store API unavailable when stores are not ready.
    • #5256 Update Prometheus deps v2.33.5.
    • #5271 DNS: Fix miekgdns resolver to work with CNAME records too.

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies go 
    opened by dependabot[bot] 14
  • Ability to remove dimension from intrinsicDimensions

    Is your feature request related to a problem? Please describe.

    We have had a lot of Prometheus OOM crashes lately. I've checked, and the reason is the status_message field; as far as I can see there is no option to disable intrinsicDimensions. Response from tsdb analyze:

    Highest cardinality labels:
    41449 status_message
    

    Describe the solution you'd like

    I'd like to be able to change intrinsicDimensions.

    opened by altanozlu 0
  • Log more information when a trace is too large to compact

    When a trace exceeds max_bytes_per_trace, the compactor will drop spans. Currently this is tracked with a metric, but it would be nice to log the trace ID and/or the spans that were dropped. We need to decide on the log level; a case could be made for warning, info, or debug.
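
    For context, max_bytes_per_trace is a per-tenant override. A minimal sketch of how it is set, mirroring the overrides blocks quoted elsewhere on this page (the value is only an example):

      overrides:
        max_bytes_per_trace: 5000000   # example limit in bytes; spans of larger traces are dropped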

    This looks straightforward to do and adding here and here would cover both v2 and parquet formats (and should be future-proof for other formats).

    enhancement good first issue 
    opened by mdisibio 0
  • Flaky wait until timeout

    Describe the bug

    I call https://tempo-gateway/api/search/tags multiple times; some calls answer in <200 ms, others take the full query timeout.

    I could fix it by completely shutting down all microservices and restarting them. The issue comes back after some time (after some restarts of components).

    To Reproduce

    No idea.

    Steps to reproduce the behavior:

    1. Tested on main-de45a61 and 1.5 with parquet activated or not.
    level=info ts=2022-12-02T10:51:45.33060104Z caller=handler.go:78 tenant=single-tenant method=GET traceID= url=/api/search/tags duration=3.163304418s response_size=0 status=500 err="rpc error: code = Code(499) desc = context canceled"
    

    Expected behavior

    Constant request time.

    Environment:

    • Infrastructure: Kubernetes
    • Deployment tool: helm: tempo-distributed

    Additional Context

    opened by farodin91 1
  • [DOC] Added TraceQL draft doc

    What this PR does:

    Adds the first draft of the TraceQL documentation.

    Which issue(s) this PR fixes: Fixes #TS-140

    Checklist

    • [ ] Tests updated
    • [X] Documentation added
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    type/docs 
    opened by knylander-grafana 0
  • Getting 500 Internal server error

    Hi Tempo team,

    We are running the Tempo RPM locally on our CentOS server, and for the past few days we have been getting 500 server errors for all our traces.


    I see you have some documentation regarding this https://grafana.com/docs/tempo/latest/troubleshooting/bad-blocks/

    But I am not sure how to find/delete the corrupted blocks. I am fairly new to Tempo, so any input on how to fix this would be great.

    Below is our config

    (screenshot of the Tempo configuration omitted)

    Thanks,

    opened by vikram0602 1
  • add ingestion slack latency

    What this PR does: Added a new metric to keep track of the ingestion slack time for the metrics generator.

    Which issue(s) this PR fixes: Fixes #

    Checklist

    • [ ] Tests updated
    • [ ] Documentation added
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by ie-pham 0
Releases (v1.5.0)
  • v1.5.0(Aug 17, 2022)

    Breaking Changes

    • (#1478) In order to build advanced visualization features into Grafana we have decided to change our spanmetric names to match OTel conventions. This way any functionality added to Grafana will work whether you use Tempo, Grafana Agent or the OTel Collector to generate metrics. Details in the span metrics documentation.
    • (#1556) Jsonnet users will need to specify ephemeral storage requests and limits for the metrics generator.
    • (#1481) Anonymous usage reporting has been added. Distributors and metrics generators will now require permissions to object storage equivalent to compactors and ingesters. This feature is enabled by default but can be disabled easily (see the sketch after this list).
    • (#1558) Deprecated metrics tempodb_(gcs|s3|azure)_request_duration_seconds have been removed in favor of tempodb_backend_request_duration_seconds.
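
    For anyone who wants to opt out, a minimal sketch of the relevant config switch, assuming the usage_report block name used by Tempo at the time (check the docs for your exact version):

      usage_report:
        reporting_enabled: false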

    Changes

    • [CHANGE] metrics-generator: Changed added metric label instance to __metrics_gen_instance to reduce collisions with custom dimensions. #1439 (@joe-elliott)
    • [CHANGE] Don't enforce max_bytes_per_tag_values_query when set to 0. #1447 (@joe-elliott)
    • [CHANGE] Add new querier service in deployment jsonnet to serve /status endpoint. #1474 (@annanay25)
    • [CHANGE] Swapped out Google Cloud Functions serverless docs and build for Google Cloud Run. #1483 (@joe-elliott)
    • [CHANGE] BREAKING CHANGE Change spanmetrics metric names and labels to match OTel conventions. #1478 (@mapno) Old metric names:
    traces_spanmetrics_duration_seconds_{sum,count,bucket}
    

    New metric names:

    traces_spanmetrics_latency_{sum,count,bucket}
    

    Additionally, default label span_status is renamed to status_code.

    • [CHANGE] Update to Go 1.18 #1504 (@annanay25)
    • [CHANGE] Change tag/value lookups to return partial results when reaching response size limit instead of failing #1517 (@mdisibio)
    • [CHANGE] Change search to be case-sensitive #1547 (@mdisibio)
    • [CHANGE] Relax Hedged request defaults for external endpoints. #1566 (@joe-elliott)
      querier:
        search:
          external_hedge_requests_at: 4s    -> 8s
          external_hedge_requests_up_to: 3  -> 2
      
    • [CHANGE] BREAKING CHANGE Include emptyDir for metrics generator wal storage in jsonnet #1556 (@zalegrala) Jsonnet users will now need to specify a storage request and limit for the generator wal.
        _config+:: {
          metrics_generator+: {
            ephemeral_storage_request_size: '10Gi',
            ephemeral_storage_limit_size: '11Gi',
          },
        }
    
    • [CHANGE] Two additional latency buckets added to the default settings for generated spanmetrics. Note that this will increase cardinality when using the defaults. #1593 (@fredr)
    • [CHANGE] Mark log_received_traces as deprecated. New flag is log_received_spans. Extend distributor spans logger with optional features to include span attributes and a filter by error status. #1465 (@faustodavid)

    Features

    • [FEATURE] Add parquet block format #1479 #1531 #1564 (@annanay25, @mdisibio)
    • [FEATURE] Add anonymous usage reporting, enabled by default. #1481 (@zalegrala) BREAKING CHANGE As part of the usage stats inclusion, the distributor will also require access to the store. This is required so the distributor can know which cluster it should be reporting membership of.
    • [FEATURE] Include messaging systems and databases in service graphs. #1576 (@kvrhdn)

    Enhancements

    • [ENHANCEMENT] Added the ability to have a per tenant max search duration. #1421 (@joe-elliott)
    • [ENHANCEMENT] metrics-generator: expose max_active_series as a metric #1471 (@kvrhdn)
    • [ENHANCEMENT] Azure Backend: Add support for authentication with Managed Identities. #1457 (@joe-elliott)
    • [ENHANCEMENT] Add metric to track feature enablement #1459 (@zalegrala)
    • [ENHANCEMENT] Added s3 config option insecure_skip_verify #1470 (@zalegrala) (see the storage sketch after this list)
    • [ENHANCEMENT] Added polling option to reduce issues in Azure blocklist_poll_jitter_ms #1518 (@joe-elliott)
    • [ENHANCEMENT] Add a config to query single ingester instance based on trace id hash for Trace By ID API. #1484 (@sagarwala, @bikashmishra100, @ashwinidulams)
    • [ENHANCEMENT] Add blocklist metrics for total backend objects and total backend bytes #1519 (@ie-pham)
    • [ENHANCEMENT] Adds tempo_querier_external_endpoint_hedged_roundtrips_total to count the total hedged requests #1558 (@joe-elliott) BREAKING CHANGE Removed deprecated metrics tempodb_(gcs|s3|azure)_request_duration_seconds in favor of tempodb_backend_request_duration_seconds. These metrics have been deprecated since v1.1.
    • [ENHANCEMENT] Add tags option for s3 backends. This allows new objects to be written with the configured tags. #1442 (@stevenbrookes)
    • [ENHANCEMENT] metrics-generator: support per-tenant processor configuration #1434 (@kvrhdn)
    • [ENHANCEMENT] Include rollout dashboard #1456 (@zalegrala)
    • [ENHANCEMENT] Add SentinelPassword configuration for Redis #1463 (@zalegrala)
    • [ENHANCEMENT] Add support for time picker in jaeger query plugin. #1631 (@rubenvp8510)
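
    A minimal sketch combining the two new S3 options above (insecure_skip_verify from #1470 and tags from #1442); the surrounding storage.trace.s3 keys and values are illustrative assumptions, not taken from these notes:

      storage:
        trace:
          backend: s3
          s3:
            bucket: tempo-traces          # illustrative bucket name
            endpoint: s3.example.com      # illustrative endpoint
            insecure_skip_verify: true    # new in #1470: skip TLS certificate verification
            tags:                         # new in #1442: tags applied to newly written objects
              team: tracing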

    Bugfixes

    • [BUGFIX] Fix nil pointer panic when the trace by id path errors. #1441 (@joe-elliott)
    • [BUGFIX] Update tempo microservices Helm values example which missed the 'enabled' key for thriftHttp. #1472 (@hajowieland)
    • [BUGFIX] Fix race condition in forwarder overrides loop. #1468 (@mapno)
    • [BUGFIX] Fix v2 backend check on span name to be substring #1538 (@mdisibio)
    • [BUGFIX] Fix wal check on span name to be substring #1548 (@mdisibio)
    • [BUGFIX] Prevent ingester panic "cannot grow buffer" #1258 (@mdisibio)
    • [BUGFIX] metrics-generator: do not remove x-scope-orgid header in single tenant mode #1554 (@kvrhdn)
    • [BUGFIX] Fixed issue where backend does not support root.name and root.service.name #1589 (@kvrhdn)
    • [BUGFIX] Fixed ingester to continue starting up after block replay error #1603 (@mdisibio)
    Source code(tar.gz)
    Source code(zip)
    SHA256SUMS(1.12 KB)
    tempo_1.5.0_darwin_amd64.tar.gz(35.44 MB)
    tempo_1.5.0_darwin_arm64.tar.gz(34.22 MB)
    tempo_1.5.0_linux_amd64.deb(35.36 MB)
    tempo_1.5.0_linux_amd64.rpm(35.35 MB)
    tempo_1.5.0_linux_amd64.tar.gz(33.88 MB)
    tempo_1.5.0_linux_arm64.deb(32.21 MB)
    tempo_1.5.0_linux_arm64.rpm(32.22 MB)
    tempo_1.5.0_linux_arm64.tar.gz(30.99 MB)
    tempo_1.5.0_linux_armv6.deb(33.49 MB)
    tempo_1.5.0_linux_armv6.rpm(33.53 MB)
    tempo_1.5.0_linux_armv6.tar.gz(32.30 MB)
    tempo_1.5.0_windows_amd64.tar.gz(34.07 MB)
  • v1.5.0-rc.2(Aug 12, 2022)

  • v1.5.0-rc.0(Aug 4, 2022)

    Breaking Changes

    • (#1478) In order to build advanced visualization features into Grafana we have decided to change our spanmetric names to match OTel conventions. This way any functionality added to Grafana will work whether you use Tempo, Grafana Agent or the OTel Collector to generate metrics. Details below.
    • (#1556) Jsonnet users will need to specify ephemeral storage requests and limits for the metrics generator.
    • (#1481) Anonymous usage reporting has been added. Distributors and metrics generators will now require permissions to object storage equivalent to compactors and ingesters. This feature is enabled by default but can be disabled easily.
    • (#1558) Deprecated metrics tempodb_(gcs|s3|azure)_request_duration_seconds have been removed in favor of tempodb_backend_request_duration_seconds.

    Changes

    • [CHANGE] metrics-generator: Changed added metric label instance to __metrics_gen_instance to reduce collisions with custom dimensions. #1439 (@joe-elliott)
    • [CHANGE] Don't enforce max_bytes_per_tag_values_query when set to 0. #1447 (@joe-elliott)
    • [CHANGE] Add new querier service in deployment jsonnet to serve /status endpoint. #1474 (@annanay25)
    • [CHANGE] Swapped out Google Cloud Functions serverless docs and build for Google Cloud Run. #1483 (@joe-elliott)
    • [CHANGE] BREAKING CHANGE Change spanmetrics metric names and labels to match OTel conventions. #1478 (@mapno) Old metric names:
    traces_spanmetrics_duration_seconds_{sum,count,bucket}
    

    New metric names:

    traces_spanmetrics_latency_{sum,count,bucket}
    

    Additionally, default label span_status is renamed to status_code.

    • [CHANGE] Update to Go 1.18 #1504 (@annanay25)
    • [CHANGE] Change tag/value lookups to return partial results when reaching response size limit instead of failing #1517 (@mdisibio)
    • [CHANGE] Change search to be case-sensitive #1547 (@mdisibio)
    • [CHANGE] Relax Hedged request defaults for external endpoints. #1566 (@joe-elliott)
      querier:
        search:
          external_hedge_requests_at: 4s    -> 8s
          external_hedge_requests_up_to: 3  -> 2
      
    • [CHANGE] BREAKING CHANGE Include emptyDir for metrics generator wal storage in jsonnet #1556 (@zalegrala) Jsonnet users will now need to specify a storage request and limit for the generator wal.
        _config+:: {
          metrics_generator+: {
            ephemeral_storage_request_size: '10Gi',
            ephemeral_storage_limit_size: '11Gi',
          },
        }
    
    • [CHANGE] Two additional latency buckets added to the default settings for generated spanmetrics. Note that this will increase cardinality when using the defaults. #1593 (@fredr)
    • [CHANGE] Mark log_received_traces as deprecated. New flag is log_received_spans. Extend distributor spans logger with optional features to include span attributes and a filter by error status. #1465 (@faustodavid)

    Features

    • [FEATURE] Add parquet block format #1479 #1531 #1564 (@annanay25, @mdisibio)
    • [FEATURE] Add anonymous usage reporting, enabled by default. #1481 (@zalegrala) BREAKING CHANGE As part of the usage stats inclusion, the distributor will also require access to the store. This is required so the distributor can know which cluster it should be reporting membership of.
    • [FEATURE] Include messaging systems and databases in service graphs. #1576 (@kvrhdn)

    Enhancements

    • [ENHANCEMENT] Added the ability to have a per tenant max search duration. #1421 (@joe-elliott)
    • [ENHANCEMENT] metrics-generator: expose max_active_series as a metric #1471 (@kvrhdn)
    • [ENHANCEMENT] Azure Backend: Add support for authentication with Managed Identities. #1457 (@joe-elliott)
    • [ENHANCEMENT] Add metric to track feature enablement #1459 (@zalegrala)
    • [ENHANCEMENT] Added s3 config option insecure_skip_verify #1470 (@zalegrala)
    • [ENHANCEMENT] Added polling option to reduce issues in Azure blocklist_poll_jitter_ms #1518 (@joe-elliott)
    • [ENHANCEMENT] Add a config to query single ingester instance based on trace id hash for Trace By ID API. #1484 (@sagarwala, @bikashmishra100, @ashwinidulams)
    • [ENHANCEMENT] Add blocklist metrics for total backend objects and total backend bytes #1519 (@ie-pham)
    • [ENHANCEMENT] Adds tempo_querier_external_endpoint_hedged_roundtrips_total to count the total hedged requests #1558 (@joe-elliott) BREAKING CHANGE Removed deprecated metrics tempodb_(gcs|s3|azure)_request_duration_seconds in favor of tempodb_backend_request_duration_seconds. These metrics have been deprecated since v1.1.
    • [ENHANCEMENT] Add tags option for s3 backends. This allows new objects to be written with the configured tags. #1442 (@stevenbrookes)
    • [ENHANCEMENT] metrics-generator: support per-tenant processor configuration #1434 (@kvrhdn)
    • [ENHANCEMENT] Include rollout dashboard #1456 (@zalegrala)
    • [ENHANCEMENT] Add SentinelPassword configuration for Redis #1463 (@zalegrala)

    Bugfixes

    • [BUGFIX] Fix nil pointer panic when the trace by id path errors. #1441 (@joe-elliott)
    • [BUGFIX] Update tempo microservices Helm values example which missed the 'enabled' key for thriftHttp. #1472 (@hajowieland)
    • [BUGFIX] Fix race condition in forwarder overrides loop. #1468 (@mapno)
    • [BUGFIX] Fix v2 backend check on span name to be substring #1538 (@mdisibio)
    • [BUGFIX] Fix wal check on span name to be substring #1548 (@mdisibio)
    • [BUGFIX] Prevent ingester panic "cannot grow buffer" #1258 (@mdisibio)
    • [BUGFIX] metrics-generator: do not remove x-scope-orgid header in single tenant mode #1554 (@kvrhdn)
    • [BUGFIX] Fixed issue where backend does not support root.name and root.service.name #1589 (@kvrhdn)
    • [BUGFIX] Fixed ingester to continue starting up after block replay error #1603 (@mdisibio)
    Source code(tar.gz)
    Source code(zip)
    SHA256SUMS(1.18 KB)
    tempo_1.5.0-rc.0_darwin_amd64.tar.gz(35.39 MB)
    tempo_1.5.0-rc.0_darwin_arm64.tar.gz(34.18 MB)
    tempo_1.5.0-rc.0_linux_amd64.deb(35.24 MB)
    tempo_1.5.0-rc.0_linux_amd64.rpm(35.30 MB)
    tempo_1.5.0-rc.0_linux_amd64.tar.gz(33.83 MB)
    tempo_1.5.0-rc.0_linux_arm64.deb(32.19 MB)
    tempo_1.5.0-rc.0_linux_arm64.rpm(32.25 MB)
    tempo_1.5.0-rc.0_linux_arm64.tar.gz(30.94 MB)
    tempo_1.5.0-rc.0_linux_armv6.deb(33.45 MB)
    tempo_1.5.0-rc.0_linux_armv6.rpm(33.45 MB)
    tempo_1.5.0-rc.0_linux_armv6.tar.gz(32.26 MB)
    tempo_1.5.0-rc.0_windows_amd64.tar.gz(34.03 MB)
  • v1.4.1(May 5, 2022)

  • v1.4.0(Apr 28, 2022)

    Breaking changes

    • After this rollout the distributors will use a new API endpoint on the ingesters to push spans. Please roll out all ingesters before rolling out the distributors to prevent downtime. Also, during this period, the ingesters will use considerably more resources and should be scaled up (or incoming traffic should be heavily throttled). Once all distributors and ingesters have rolled, performance will return to normal. Internally we have observed ~1.5x CPU load on the ingesters during the rollout. #1227 (@joe-elliott)
    • Querier options related to search have moved under a search block: #1350 (@joe-elliott)
      querier:
       search_query_timeout: 30s
       search_external_endpoints: []
       search_prefer_self: 2
      

      becomes

      querier:
        search:
          query_timeout: 30s
          prefer_self: 2
          external_endpoints: []
      
    • Dropped tempo-search-retention-duration parameter on the vulture. #1297 (@joe-elliott)

    New Features and Enhancements

    • [FEATURE] Added metrics-generator: an optional component to generate metrics from ingested traces #1282 (@mapno, @kvrhdn)
    • [ENHANCEMENT] v2 object encoding added. This encoding adds a start/end timestamp to every record to reduce proto marshalling and increase search speed. #1227 (@joe-elliott)
    • [ENHANCEMENT] Allow the compaction cycle to be configurable with a default of 30 seconds #1335 (@willdot)
    • [ENHANCEMENT] Add new config options for setting GCS metadata on new objects #1368 (@zalegrala)
    • [ENHANCEMENT] Add new scaling alerts to the tempo-mixin #1292 (@mapno)
    • [ENHANCEMENT] Improve serverless handler error messages #1305 (@joe-elliott)
    • [ENHANCEMENT] Added a configuration option search_prefer_self to allow the queriers to do some work while also leveraging serverless in search. #1307 (@joe-elliott)
    • [ENHANCEMENT] Make trace combination/compaction more efficient #1291 (@mdisibio)
    • [ENHANCEMENT] Add Content-Type headers to query-frontend paths #1306 (@wperron)
    • [ENHANCEMENT] Partially persist traces that exceed max_bytes_per_trace during compaction #1317 (@joe-elliott)
    • [ENHANCEMENT] Make search respect per tenant max_bytes_per_trace and added skippedTraces to returned search metrics. #1318 (@joe-elliott)
    • [ENHANCEMENT] Added tenant ID (instance ID) to trace too large message. #1385 (@cristiangsp)
    • [ENHANCEMENT] Add a startTime and endTime parameter to the Trace by ID Tempo Query API to improve query performance #1388 (@sagarwala, @bikashmishra100, @ashwinidulams)
    • [ENHANCEMENT] Add hedging to queries to external endpoints. #1350 (@joe-elliott) New config options and defaults:
      querier:
        search:
          external_hedge_requests_at: 5s
          external_hedge_requests_up_to: 3
      

    Bug Fixes

    • [BUGFIX] Correct issue where Azure "Blob Not Found" errors were sometimes not handled correctly #1390 (@joe-elliott)
    • [BUGFIX] Enable compaction and retention in Tanka single-binary #1352 (@irizzant)
    • [BUGFIX] Fixed issue when query-frontend doesn't log request details when request is cancelled #1136 (@adityapwr)
    • [BUGFIX] Update OTLP port in examples (docker-compose & kubernetes) from legacy ports (55680/55681) to new ports (4317/4318) #1294 (@mapno)
    • [BUGFIX] Fixes min/max time on blocks to be based on span times instead of ingestion time. #1314 (@joe-elliott)
      • Includes new configuration option to restrict the amount of slack around now to update the block start/end time. #1332 (@joe-elliott)
        storage:
          trace:
            wal:
              ingestion_time_range_slack: 2m0s
        
      • Includes a new metric to determine how often this range is exceeded: tempo_warnings_total{reason="outside_ingestion_time_slack"}
    • [BUGFIX] Prevent data race / ingester crash during searching by trace id by using xxhash instance as a local variable. #1387 (@bikashmishra100, @sagarwala, @ashwinidulams)
    • [BUGFIX] Fix spurious "failed to mark block compacted during retention" errors #1372 (@mdisibio)
    • [BUGFIX] Fix error message "Writer is closed" by resetting compression writer correctly on the error path. #1379 (@annanay25)

    Other Changes

    • [CHANGE] Vulture now exercises search at any point during the block retention to test full backend search. #1297 (@joe-elliott)
    • [CHANGE] Updated storage.trace.pool.queue_depth default from 200->10000. #1345 (@joe-elliott)
    • [CHANGE] Updated flags -storage.trace.azure.storage-account-name and -storage.trace.s3.access_key to no longer be considered secrets #1356 (@simonswine)
    Source code(tar.gz)
    Source code(zip)
    tempo_1.4.0_checksums.txt(97 bytes)
    tempo_1.4.0_linux_amd64.tar.gz(29.82 MB)
  • v1.4.0-rc.0(Apr 19, 2022)

    Breaking changes

    • After this rollout the distributors will use a new API endpoint on the ingesters to push spans. Please roll out all ingesters before rolling out the distributors to prevent downtime. Also, during this period, the ingesters will use considerably more resources and should be scaled up (or incoming traffic should be heavily throttled). Once all distributors and ingesters have rolled, performance will return to normal. Internally we have observed ~1.5x CPU load on the ingesters during the rollout. #1227 (@joe-elliott)
    • Querier options related to search have moved under a search block: #1350 (@joe-elliott)
      querier:
       search_query_timeout: 30s
       search_external_endpoints: []
       search_prefer_self: 2
      

      becomes

      querier:
        search:
          query_timeout: 30s
          prefer_self: 2
          external_endpoints: []
      
    • Dropped tempo-search-retention-duration parameter on the vulture. #1297 (@joe-elliott)

    New Features and Enhancements

    • [FEATURE] Added metrics-generator: an optional component to generate metrics from ingested traces #1282 (@mapno, @kvrhdn)
    • [ENHANCEMENT] v2 object encoding added. This encoding adds a start/end timestamp to every record to reduce proto marshalling and increase search speed. #1227 (@joe-elliott)
    • [ENHANCEMENT] Allow the compaction cycle to be configurable with a default of 30 seconds #1335 (@willdot)
    • [ENHANCEMENT] Add new config options for setting GCS metadata on new objects #1368 (@zalegrala)
    • [ENHANCEMENT] Add new scaling alerts to the tempo-mixin #1292 (@mapno)
    • [ENHANCEMENT] Improve serverless handler error messages #1305 (@joe-elliott)
    • [ENHANCEMENT] Added a configuration option search_prefer_self to allow the queriers to do some work while also leveraging serverless in search. #1307 (@joe-elliott)
    • [ENHANCEMENT] Make trace combination/compaction more efficient #1291 (@mdisibio)
    • [ENHANCEMENT] Add Content-Type headers to query-frontend paths #1306 (@wperron)
    • [ENHANCEMENT] Partially persist traces that exceed max_bytes_per_trace during compaction #1317 (@joe-elliott)
    • [ENHANCEMENT] Make search respect per tenant max_bytes_per_trace and added skippedTraces to returned search metrics. #1318 (@joe-elliott)
    • [ENHANCEMENT] Added tenant ID (instance ID) to trace too large message. #1385 (@cristiangsp)
    • [ENHANCEMENT] Add a startTime and endTime parameter to the Trace by ID Tempo Query API to improve query performance #1388 (@sagarwala, @bikashmishra100, @ashwinidulams)
    • [ENHANCEMENT] Add hedging to queries to external endpoints. #1350 (@joe-elliott) New config options and defaults:
      querier:
        search:
          external_hedge_requests_at: 5s
          external_hedge_requests_up_to: 3
      

    Bug Fixes

    • [BUGFIX] Correct issue where Azure "Blob Not Found" errors were sometimes not handled correctly #1390 (@joe-elliott)
    • [BUGFIX] Enable compaction and retention in Tanka single-binary #1352 (@irizzant)
    • [BUGFIX] Fixed issue when query-frontend doesn't log request details when request is cancelled #1136 (@adityapwr)
    • [BUGFIX] Update OTLP port in examples (docker-compose & kubernetes) from legacy ports (55680/55681) to new ports (4317/4318) #1294 (@mapno)
    • [BUGFIX] Fixes min/max time on blocks to be based on span times instead of ingestion time. #1314 (@joe-elliott)
      • Includes new configuration option to restrict the amount of slack around now to update the block start/end time. #1332 (@joe-elliott)
        storage:
          trace:
            wal:
              ingestion_time_range_slack: 2m0s
        
      • Includes a new metric to determine how often this range is exceeded: tempo_warnings_total{reason="outside_ingestion_time_slack"}
    • [BUGFIX] Prevent data race / ingester crash during searching by trace id by using xxhash instance as a local variable. #1387 (@bikashmishra100, @sagarwala, @ashwinidulams)
    • [BUGFIX] Fix spurious "failed to mark block compacted during retention" errors #1372 (@mdisibio)
    • [BUGFIX] Fix error message "Writer is closed" by resetting compression writer correctly on the error path. #1379 (@annanay25)

    Other Changes

    • [CHANGE] Vulture now exercises search at any point during the block retention to test full backend search. #1297 (@joe-elliott)
    • [CHANGE] Updated storage.trace.pool.queue_depth default from 200->10000. #1345 (@joe-elliott)
    • [CHANGE] Updated flags -storage.trace.azure.storage-account-name and -storage.trace.s3.access_key to no longer be considered secrets #1356 (@simonswine)
    Source code(tar.gz)
    Source code(zip)
    tempo_1.4.0-rc.0_checksums.txt(102 bytes)
    tempo_1.4.0-rc.0_linux_amd64.tar.gz(29.82 MB)
  • v1.3.2(Feb 23, 2022)

  • v1.3.1(Feb 2, 2022)

  • v1.3.0(Jan 24, 2022)

    Breaking changes

    This release updates OpenTelemetry libraries version to v0.40.0, and with that, it updates OTLP gRPC's default listening port from the legacy 55680 to the new 4317. There are two main routes to avoid downtime: configuring the receiver to listen in the old port 55680 and/or pushing traces to both ports simultaneously until the rollout is complete.
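
    As a minimal sketch of the first route above (keeping the receiver on the legacy port during the rollout), assuming the distributor's standard OTLP receiver block; the endpoint value is an illustration, not taken from these notes:

      distributor:
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: 0.0.0.0:55680   # keep listening on the legacy port until all clients push to 4317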

    As part of adding support for full backend search, a search config parameter has had its name changed from query_frontend.search.max_result_limit to query_frontend.search.default_result_limit.
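
    A minimal sketch of where the renamed parameter now lives (key names taken from the CHANGE entry below); the numeric values are only illustrations, and the default/cap semantics follow from the parameter names:

      query_frontend:
        search:
          default_result_limit: 20   # illustrative: limit applied when a search request sets none
          max_result_limit: 50       # illustrative: upper bound on any requested limit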

    • [CHANGE] BREAKING CHANGE The OTEL GRPC receiver's default port changed from 55680 to 4317. #1142 (@tete17)
    • [CHANGE] BREAKING CHANGE Moved querier.search_max_result_limit and querier.search_default_result_limit to query_frontend.search.max_result_limit and query_frontend.search.default_result_limit #1174.
    • [CHANGE] BREAKING CHANGE Remove deprecated ingester gRPC endpoint and data encoding. The current data encoding was introduced in v1.0. If running earlier versions, first upgrade to v1.0 through v1.2 and allow time for all blocks to be switched to the "v1" data encoding. #1215 (@mdisibio)

    New Features and Enhancements

    • [FEATURE]: Add support for inline environments. #1184 (@irizzant)
    • [FEATURE] Added support for full backend search. #1174 (@joe-elliott)
    • [ENHANCEMENT] Expose the up-to parameter on hedged requests for each backend with hedge_requests_up_to. #1085 (@joe-elliott)
    • [ENHANCEMENT] Search: drop use of TagCache, extract tags and tag values on-demand #1068 (@kvrhdn)
    • [ENHANCEMENT] Jsonnet: add $._config.namespace to filter by namespace in cortex metrics #1098 (@mapno)
    • [ENHANCEMENT] Add middleware to compress frontend HTTP responses with gzip if requested #1080 (@kvrhdn, @zalegrala)
    • [ENHANCEMENT] Allow query disablement in vulture #1117 (@zalegrala)
    • [ENHANCEMENT] Improve memory efficiency of compaction and block cutting. #1121 #1130 (@joe-elliott)
    • [ENHANCEMENT] Include metrics for configured limit overrides and defaults: tempo_limits_overrides, tempo_limits_defaults #1089 (@zalegrala)
    • [ENHANCEMENT] Add Envoy Proxy panel to Tempo / Writes dashboard #1137 (@kvrhdn)
    • [ENHANCEMENT] Reduce compactionCycle to improve performance in large multitenant environments #1145 (@joe-elliott)
    • [ENHANCEMENT] Added max_time_per_tenant to allow for independently configuring polling and compaction cycle. #1145 (@joe-elliott)
    • [ENHANCEMENT] Add tempodb_compaction_outstanding_blocks metric to measure compaction load #1143 (@mapno)
    • [ENHANCEMENT] Update mixin to use new backend metric #1151 (@zalegrala)
    • [ENHANCEMENT] Make TempoIngesterFlushesFailing alert more actionable #1157 (@dannykopping)
    • [ENHANCEMENT] Switch open-telemetry/opentelemetry-collector to grafana/opentelemetry-collector fork, update it to 0.40.0 and add missing dependencies due to the change #1142 (@tete17)
    • [ENHANCEMENT] Allow environment variables for Azure storage credentials #1147 (@zalegrala)
    • [ENHANCEMENT] jsonnet: set rollingUpdate.maxSurge to 3 for distributor, frontend and queriers #1164 (@kvrhdn)
    • [ENHANCEMENT] Reduce search data file sizes by optimizing contents #1165 (@mdisibio)
    • [ENHANCEMENT] Add tempo_ingester_live_traces metric #1170 (@mdisibio)
    • [ENHANCEMENT] Update compactor ring to automatically forget unhealthy entries #1178 (@mdisibio)
    • [ENHANCEMENT] Added the ability to pass ISO8601 date/times for start/end date to tempo-cli query api search #1208 (@joe-elliott)
    • [ENHANCEMENT] Prevent writes to large traces even after flushing to disk #1199 (@mdisibio)

    Bug Fixes

    • [BUGFIX] Add process name to vulture traces to work around display issues #1127 (@mdisibio)
    • [BUGFIX] Fixed issue where compaction sometimes dropped spans. #1130 (@joe-elliott)
    • [BUGFIX] Ensure that the admin client jsonnet has correct S3 bucket property. (@hedss)
    • [BUGFIX] Publish tenant index age correctly for tenant index writers. #1146 (@joe-elliott)
    • [BUGFIX] Ingester startup panic slice bounds out of range #1195 (@mdisibio)

    Other Changes

    • [CHANGE] Search: Add new per-tenant limit max_bytes_per_tag_values_query to limit the size of tag-values response. #1068 (@annanay25)
    • [CHANGE] Reduce MaxSearchBytesPerTrace ingester.max-search-bytes-per-trace default to 5KB #1129 (@annanay25)
    • [CHANGE] BREAKING CHANGE The OTEL GRPC receiver's default port changed from 55680 to 4317. #1142 (@tete17)
    • [CHANGE] Remove deprecated method Push from tempopb.Pusher #1173 (@kvrhdn)
    • [CHANGE] Upgrade cristalhq/hedgedhttp from v0.6.0 to v0.7.0 #1159 (@cristaloleg)
    • [CHANGE] Export trace id constant in api package #1176
    • [CHANGE] GRPC 1.33.3 => 1.38.0 broke compatibility with gogoproto.customtype. Enforce the use of gogoproto marshalling/unmarshalling for Tempo, Cortex & Jaeger structs. #1186 (@annanay25)
    Source code(tar.gz)
    Source code(zip)
    tempo_1.3.0_checksums.txt(97 bytes)
    tempo_1.3.0_linux_amd64.tar.gz(35.96 MB)
  • v1.3.0-rc.0(Jan 12, 2022)

    Breaking changes

    This release updates OpenTelemetry libraries version to v0.40.0, and with that, it updates OTLP gRPC's default listening port from the legacy 55680 to the new 4317. There are two main routes to avoid downtime: configuring the receiver to listen in the old port 55680 and/or pushing traces to both ports simultaneously until the rollout is complete.

    As part of adding support for full backend search, a search config parameter has had its name changed from query_frontend.search.max_result_limit to query_frontend.search.default_result_limit.

    • [CHANGE] BREAKING CHANGE The OTEL GRPC receiver's default port changed from 55680 to 4317. #1142 (@tete17)
    • [CHANGE] BREAKING CHANGE Moved querier.search_max_result_limit and querier.search_default_result_limit to query_frontend.search.max_result_limit and query_frontend.search.default_result_limit #1174.

    New Features and Enhancements

    • [FEATURE]: Add support for inline environments. #1184 (@irizzant)
    • [FEATURE] Added support for full backend search. #1174 (@joe-elliott)
    • [ENHANCEMENT] Expose the up-to parameter on hedged requests for each backend with hedge_requests_up_to. #1085 (@joe-elliott)
    • [ENHANCEMENT] Search: drop use of TagCache, extract tags and tag values on-demand #1068 (@kvrhdn)
    • [ENHANCEMENT] Jsonnet: add $._config.namespace to filter by namespace in cortex metrics #1098 (@mapno)
    • [ENHANCEMENT] Add middleware to compress frontend HTTP responses with gzip if requested #1080 (@kvrhdn, @zalegrala)
    • [ENHANCEMENT] Allow query disablement in vulture #1117 (@zalegrala)
    • [ENHANCEMENT] Improve memory efficiency of compaction and block cutting. #1121 #1130 (@joe-elliott)
    • [ENHANCEMENT] Include metrics for configured limit overrides and defaults: tempo_limits_overrides, tempo_limits_defaults #1089 (@zalegrala)
    • [ENHANCEMENT] Add Envoy Proxy panel to Tempo / Writes dashboard #1137 (@kvrhdn)
    • [ENHANCEMENT] Reduce compactionCycle to improve performance in large multitenant environments #1145 (@joe-elliott)
    • [ENHANCEMENT] Added max_time_per_tenant to allow for independently configuring polling and compaction cycle. #1145 (@joe-elliott)
    • [ENHANCEMENT] Add tempodb_compaction_outstanding_blocks metric to measure compaction load #1143 (@mapno)
    • [ENHANCEMENT] Update mixin to use new backend metric #1151 (@zalegrala)
    • [ENHANCEMENT] Make TempoIngesterFlushesFailing alert more actionable #1157 (@dannykopping)
    • [ENHANCEMENT] Switch open-telemetry/opentelemetry-collector to grafana/opentelemetry-collector fork, update it to 0.40.0 and add missing dependencies due to the change #1142 (@tete17)
    • [ENHANCEMENT] Allow environment variables for Azure storage credentials #1147 (@zalegrala)
    • [ENHANCEMENT] jsonnet: set rollingUpdate.maxSurge to 3 for distributor, frontend and queriers #1164 (@kvrhdn)
    • [ENHANCEMENT] Reduce search data file sizes by optimizing contents #1165 (@mdisibio)
    • [ENHANCEMENT] Add tempo_ingester_live_traces metric #1170 (@mdisibio)
    • [ENHANCEMENT] Update compactor ring to automatically forget unhealthy entries #1178 (@mdisibio)
    • [ENHANCEMENT] Added the ability to pass ISO8601 date/times for start/end date to tempo-cli query api search #1208 (@joe-elliott)
    • [ENHANCEMENT] Prevent writes to large traces even after flushing to disk #1199 (@mdisibio)

    Bug Fixes

    • [BUGFIX] Add process name to vulture traces to work around display issues #1127 (@mdisibio)
    • [BUGFIX] Fixed issue where compaction sometimes dropped spans. #1130 (@joe-elliott)
    • [BUGFIX] Ensure that the admin client jsonnet has correct S3 bucket property. (@hedss)
    • [BUGFIX] Publish tenant index age correctly for tenant index writers. #1146 (@joe-elliott)
    • [BUGFIX] Ingester startup panic slice bounds out of range #1195 (@mdisibio)

    Other Changes

    • [CHANGE] Search: Add new per-tenant limit max_bytes_per_tag_values_query to limit the size of tag-values response. #1068 (@annanay25)
    • [CHANGE] Reduce MaxSearchBytesPerTrace ingester.max-search-bytes-per-trace default to 5KB #1129 (@annanay25)
    • [CHANGE] BREAKING CHANGE The OTEL GRPC receiver's default port changed from 55680 to 4317. #1142 (@tete17)
    • [CHANGE] Remove deprecated method Push from tempopb.Pusher #1173 (@kvrhdn)
    • [CHANGE] Upgrade cristalhq/hedgedhttp from v0.6.0 to v0.7.0 #1159 (@cristaloleg)
    • [CHANGE] Export trace id constant in api package #1176
    • [CHANGE] GRPC 1.33.3 => 1.38.0 broke compatibility with gogoproto.customtype. Enforce the use of gogoproto marshalling/unmarshalling for Tempo, Cortex & Jaeger structs. #1186 (@annanay25)
    • [CHANGE] BREAKING CHANGE Remove deprecated ingester gRPC endpoint and data encoding. The current data encoding was introduced in v1.0. If running earlier versions, first upgrade to v1.0 through v1.2 and allow time for all blocks to be switched to the "v1" data encoding. #1215 (@mdisibio)
    Source code(tar.gz)
    Source code(zip)
  • v1.2.1(Nov 15, 2021)

  • v1.2.0(Nov 5, 2021)

    Breaking Changes

    This release contains a number of small breaking changes. They will likely have no impact on your deployment, but it should be noted that due to a change in the API between the query-frontend and querier there may be a temporary read outage during deployment.

    • [CHANGE] BREAKING CHANGE Drop support for v0 and v1 blocks. See 1.1 changelog for details #919 (@joe-elliott)
    • [CHANGE] BREAKING CHANGE Consolidate status information onto /status endpoint #952 (@zalegrala) The following endpoints moved: /runtime_config moved to /status/runtime_config, /config moved to /status/config, and /services moved to /status/services.
    • [CHANGE] BREAKING CHANGE Change ingester metric ingester_bytes_metric_total in favor of ingester_bytes_received_total #979 (@mapno)
    • [CHANGE] Renamed CLI flag from --storage.trace.maintenance-cycle to --storage.trace.blocklist_poll. This is a BREAKING CHANGE #897 (@mritunjaysharma394)
    • [CHANGE] BREAKING CHANGE Support partial results from failed block queries #1007 (@mapno) Querier GET /querier/api/traces/<traceid> response's body has been modified to return tempopb.TraceByIDResponse instead of simply tempopb.Trace. This will cause a disruption of the read path during rollout of the change.
    • [CHANGE] BREAKING CHANGE Change the metrics name from cortex_runtime_config_last_reload_successful to tempo_runtime_config_last_reload_successful #945 (@kavirajk)

    New Features and Enhancements

    • [FEATURE] Add ability to search ingesters for traces #806 (@mdisibio @kvrhdn @annanay25)
    • [FEATURE] Add runtime config handler #936 (@mapno)
    • [FEATURE] Add ScalableSingleBinary operational run mode #1004 (@zalegrala)
    • [ENHANCEMENT] Added "query blocks" cli option. #876 (@joe-elliott)
    • [ENHANCEMENT] Added "search blocks" cli option. #972 (@joe-elliott)
    • [ENHANCEMENT] Added traceid to trace too large message. #888 (@mritunjaysharma394)
    • [ENHANCEMENT] Add support for tempo workloads to load overrides from a single configmap in microservice mode. #896 (@kavirajk)
    • [ENHANCEMENT] Updated config defaults to better capture operational knowledge. #913 (@joe-elliott)
      ingester:
        trace_idle_period: 30s => 10s  # reduce ingester memory requirements with little impact on querying
        flush_check_period: 30s => 10s
      query_frontend:
        query_shards: 2 => 20          # will massively improve performance on large installs
      storage:
        trace:
          wal:
            encoding: none => snappy   # snappy has been tested thoroughly and ready for production use
          block:
            bloom_filter_false_positive: .05 => .01          # will increase total bloom filter size but improve query performance
            bloom_filter_shard_size_bytes: 256KiB => 100 KiB # will improve query performance
      compactor:
        compaction:
          chunk_size_bytes: 10 MiB => 5 MiB  # will reduce compactor memory needs
          compaction_window: 4h => 1h        # will allow more compactors to participate in compaction without substantially increasing blocks
      
    • [ENHANCEMENT] Make s3 backend readError logic more robust #905 (@wei840222)
    • [ENHANCEMENT] Add gen index and gen bloom commands to tempo-cli. #903 (@annanay25)
    • [ENHANCEMENT] Implement trace comparison in Vulture #904 (@zalegrala)
    • [ENHANCEMENT] Compression updates: Added s2, improved snappy performance #961 (@joe-elliott)
    • [ENHANCEMENT] Add support for vulture sending long running traces #951 (@zalegrala)
    • [ENHANCEMENT] Shard tenant index creation by tenant and add functionality to handle stale indexes. #1005 (@joe-elliott)
    • [ENHANCEMENT] Support partial results from failed block queries #1007 (@mapno)
    • [ENHANCEMENT] Add new metric tempo_distributor_push_duration_seconds #1027 (@zalegrala)
    • [ENHANCEMENT] Add query parameter to show the default config values and the difference between the current values and the defaults. #1045 (@MichelHollands)
    • [ENHANCEMENT] Adding metrics around ingester flush retries #1049 (@dannykopping)
    • [ENHANCEMENT] Performance: More efficient distributor batching #1075 (@joe-elliott)
    • [ENHANCEMENT] Include tempo-cli in the release #1086 (@zalegrala)

    Bug Fixes

    • [BUGFIX] Update port spec for GCS docker-compose example #869 (@zalegrala)
    • [BUGFIX] Fix "magic number" errors and other block mishandling when an ingester forcefully shuts down #937 (@mdisibio)
    • [BUGFIX] Fix compactor memory leak #806 (@mdisibio)
    • [BUGFIX] Set span's tag span.kind to client in query-frontend #975 (@mapno)
    • [BUGFIX] Fixes tempodb_backend_hedged_roundtrips_total to correctly count hedged roundtrips. #1079 (@joe-elliott)
    • [BUGFIX] Update go-kit logger package to remove spurious debug logs #1094 (@bboreham)

    Other Changes

    • [CHANGE] update jsonnet alerts and recording rules to use job_selectors and cluster_selectors for configurable unique identifier labels #935 (@kevinschoonover)
    • [CHANGE] Add troubleshooting language to config for server.grpc_server_max_recv_msg_size and server.grpc_server_max_send_msg_size when handling large traces #1023 (@thejosephstevens)
    Source code(tar.gz)
    Source code(zip)
    tempo_1.2.0_checksums.txt(97 bytes)
    tempo_1.2.0_linux_amd64.tar.gz(34.70 MB)
  • v1.2.0-rc.1(Nov 2, 2021)

    Breaking Changes

    This release contains a number of small breaking changes. They will likely have no impact on your deployment, but it should be noted that due to a change in the API between the query-frontend and querier there may be a temporary read outage during deployment.

    • [CHANGE] BREAKING CHANGE Drop support for v0 and v1 blocks. See 1.1 changelog for details #919 (@joe-elliott)
    • [CHANGE] BREAKING CHANGE Consolidate status information onto /status endpoint #952 (@zalegrala) The following endpoints moved: /runtime_config moved to /status/runtime_config, /config moved to /status/config, and /services moved to /status/services.
    • [CHANGE] BREAKING CHANGE Change ingester metric ingester_bytes_metric_total in favor of ingester_bytes_received_total #979 (@mapno)
    • [CHANGE] Renamed CLI flag from --storage.trace.maintenance-cycle to --storage.trace.blocklist_poll. This is a BREAKING CHANGE #897 (@mritunjaysharma394)
    • [CHANGE] BREAKING CHANGE Support partial results from failed block queries #1007 (@mapno) Querier GET /querier/api/traces/<traceid> response's body has been modified to return tempopb.TraceByIDResponse instead of simply tempopb.Trace. This will cause a disruption of the read path during rollout of the change.
    • [CHANGE] BREAKING CHANGE Change the metrics name from cortex_runtime_config_last_reload_successful to tempo_runtime_config_last_reload_successful #945 (@kavirajk)

    New Features and Enhancements

    • [FEATURE] Add ability to search ingesters for traces #806 (@mdisibio @kvrhdn @annanay25)
    • [FEATURE] Add runtime config handler #936 (@mapno)
    • [FEATURE] Add ScalableSingleBinary operational run mode #1004 (@zalegrala)
    • [ENHANCEMENT] Added "query blocks" cli option. #876 (@joe-elliott)
    • [ENHANCEMENT] Added "search blocks" cli option. #972 (@joe-elliott)
    • [ENHANCEMENT] Added traceid to trace too large message. #888 (@mritunjaysharma394)
    • [ENHANCEMENT] Add support for tempo workloads to load overrides from a single configmap in microservice mode. #896 (@kavirajk)
    • [ENHANCEMENT] Updated config defaults to better capture operational knowledge. #913 (@joe-elliott)
      ingester:
        trace_idle_period: 30s => 10s  # reduce ingester memory requirements with little impact on querying
        flush_check_period: 30s => 10s
      query_frontend:
        query_shards: 2 => 20          # will massively improve performance on large installs
      storage:
        trace:
          wal:
            encoding: none => snappy   # snappy has been tested thoroughly and ready for production use
          block:
            bloom_filter_false_positive: .05 => .01          # will increase total bloom filter size but improve query performance
            bloom_filter_shard_size_bytes: 256KiB => 100 KiB # will improve query performance
      compactor:
        compaction:
          chunk_size_bytes: 10 MiB => 5 MiB  # will reduce compactor memory needs
          compaction_window: 4h => 1h        # will allow more compactors to participate in compaction without substantially increasing blocks
      
    • [ENHANCEMENT] Make s3 backend readError logic more robust #905 (@wei840222)
    • [ENHANCEMENT] Add gen index and gen bloom commands to tempo-cli. #903 (@annanay25)
    • [ENHANCEMENT] Implement trace comparison in Vulture #904 (@zalegrala)
    • [ENHANCEMENT] Compression updates: Added s2, improved snappy performance #961 (@joe-elliott)
    • [ENHANCEMENT] Add support for vulture sending long running traces #951 (@zalegrala)
    • [ENHANCEMENT] Shard tenant index creation by tenant and add functionality to handle stale indexes. #1005 (@joe-elliott)
    • [ENHANCEMENT] Support partial results from failed block queries #1007 (@mapno)
    • [ENHANCEMENT] Add new metric tempo_distributor_push_duration_seconds #1027 (@zalegrala)
    • [ENHANCEMENT] Add query parameter to show the default config values and the difference between the current values and the defaults. #1045 (@MichelHollands)
    • [ENHANCEMENT] Adding metrics around ingester flush retries #1049 (@dannykopping)
    • [ENHANCEMENT] Performance: More efficient distributor batching #1075 (@joe-elliott)
    • [ENHANCEMENT] Include tempo-cli in the release #1086 (@zalegrala)

    Bug Fixes

    • [BUGFIX] Update port spec for GCS docker-compose example #869 (@zalegrala)
    • [BUGFIX] Fix "magic number" errors and other block mishandling when an ingester forcefully shuts down #937 (@mdisibio)
    • [BUGFIX] Fix compactor memory leak #806 (@mdisibio)
    • [BUGFIX] Set span's tag span.kind to client in query-frontend #975 (@mapno)
    • [BUGFIX] Fixes tempodb_backend_hedged_roundtrips_total to correctly count hedged roundtrips. #1079 (@joe-elliott)
    • [BUGFIX] Update go-kit logger package to remove spurious debug logs #1094 (@bboreham)

    Other Changes

    • [CHANGE] update jsonnet alerts and recording rules to use job_selectors and cluster_selectors for configurable unique identifier labels #935 (@kevinschoonover)
    • [CHANGE] Add troubleshooting language to config for server.grpc_server_max_recv_msg_size and server.grpc_server_max_send_msg_size when handling large traces #1023 (@thejosephstevens)
    Source code(tar.gz)
    Source code(zip)
    tempo_1.2.0-rc.1_checksums.txt(102 bytes)
    tempo_1.2.0-rc.1_linux_amd64.tar.gz(34.70 MB)
  • v1.2.0-rc.0(Oct 28, 2021)

    Breaking Changes

    This release contains a number of small breaking changes. They will likely have no impact on your deployment, but it should be noted that due to a change in the API between the query-frontend and querier there may be a temporary read outage during deployment.

    • [CHANGE] BREAKING CHANGE Drop support for v0 and v1 blocks. See 1.1 changelog for details #919 (@joe-elliott)
    • [CHANGE] BREAKING CHANGE Consolidate status information onto /status endpoint #952 (@zalegrala) The following endpoints moved: /runtime_config moved to /status/runtime_config, /config moved to /status/config, and /services moved to /status/services.
    • [CHANGE] BREAKING CHANGE Change ingester metric ingester_bytes_metric_total in favor of ingester_bytes_received_total #979 (@mapno)
    • [CHANGE] Renamed CLI flag from --storage.trace.maintenance-cycle to --storage.trace.blocklist_poll. This is a BREAKING CHANGE #897 (@mritunjaysharma394)
    • [CHANGE] BREAKING CHANGE Support partial results from failed block queries #1007 (@mapno) Querier GET /querier/api/traces/<traceid> response's body has been modified to return tempopb.TraceByIDResponse instead of simply tempopb.Trace. This will cause a disruption of the read path during rollout of the change.
    • [CHANGE] BREAKING CHANGE Change the metrics name from cortex_runtime_config_last_reload_successful to tempo_runtime_config_last_reload_successful #945 (@kavirajk)

    New Features and Enhancements

    • [FEATURE] Add ability to search ingesters for traces #806 (@mdisibio @kvrhdn @annanay25)
    • [FEATURE] Add runtime config handler #936 (@mapno)
    • [FEATURE] Add ScalableSingleBinary operational run mode #1004 (@zalegrala)
    • [ENHANCEMENT] Added "query blocks" cli option. #876 (@joe-elliott)
    • [ENHANCEMENT] Added "search blocks" cli option. #972 (@joe-elliott)
    • [ENHANCEMENT] Added traceid to trace too large message. #888 (@mritunjaysharma394)
    • [ENHANCEMENT] Add support for tempo workloads to load overrides from a single configmap in microservice mode. #896 (@kavirajk)
    • [ENHANCEMENT] Updated config defaults to better capture operational knowledge. #913 (@joe-elliott)
      ingester:
        trace_idle_period: 30s => 10s  # reduce ingester memory requirements with little impact on querying
        flush_check_period: 30s => 10s
      query_frontend:
        query_shards: 2 => 20          # will massively improve performance on large installs
      storage:
        trace:
          wal:
            encoding: none => snappy   # snappy has been tested thoroughly and ready for production use
          block:
            bloom_filter_false_positive: .05 => .01          # will increase total bloom filter size but improve query performance
            bloom_filter_shard_size_bytes: 256KiB => 100 KiB # will improve query performance
      compactor:
        compaction:
          chunk_size_bytes: 10 MiB => 5 MiB  # will reduce compactor memory needs
          compaction_window: 4h => 1h        # will allow more compactors to participate in compaction without substantially increasing blocks
      
    • [ENHANCEMENT] Make s3 backend readError logic more robust #905 (@wei840222)
    • [ENHANCEMENT] Add gen index and gen bloom commands to tempo-cli. #903 (@annanay25)
    • [ENHANCEMENT] Implement trace comparison in Vulture #904 (@zalegrala)
    • [ENHANCEMENT] Compression updates: Added s2, improved snappy performance #961 (@joe-elliott)
    • [ENHANCEMENT] Add support for vulture sending long running traces #951 (@zalegrala)
    • [ENHANCEMENT] Shard tenant index creation by tenant and add functionality to handle stale indexes. #1005 (@joe-elliott)
    • [ENHANCEMENT] Support partial results from failed block queries #1007 (@mapno)
    • [ENHANCEMENT] Add new metric tempo_distributor_push_duration_seconds #1027 (@zalegrala)
    • [ENHANCEMENT] Add query parameter to show the default config values and the difference between the current values and the defaults. #1045 (@MichelHollands)
    • [ENHANCEMENT] Adding metrics around ingester flush retries #1049 (@dannykopping)
    • [ENHANCEMENT] Performance: More efficient distributor batching #1075 (@joe-elliott)
    • [ENHANCEMENT] Include tempo-cli in the release #1086 (@zalegrala)

    Bug Fixes

    • [BUGFIX] Update port spec for GCS docker-compose example #869 (@zalegrala)
    • [BUGFIX] Fix "magic number" errors and other block mishandling when an ingester forcefully shuts down #937 (@mdisibio)
    • [BUGFIX] Fix compactor memory leak #806 (@mdisibio)
    • [BUGFIX] Set span's tag span.kind to client in query-frontend #975 (@mapno)
    • [BUGFIX] Fixes tempodb_backend_hedged_roundtrips_total to correctly count hedged roundtrips. #1079 (@joe-elliott)

    Other Changes

    • [CHANGE] update jsonnet alerts and recording rules to use job_selectors and cluster_selectors for configurable unique identifier labels #935 (@kevinschoonover)
    • [CHANGE] Add troubleshooting language to config for server.grpc_server_max_recv_msg_size and server.grpc_server_max_send_msg_size when handling large traces #1023 (@thejosephstevens)
    Source code(tar.gz)
    Source code(zip)
    tempo_1.2.0-rc.0_checksums.txt(102 bytes)
    tempo_1.2.0-rc.0_linux_amd64.tar.gz(34.65 MB)
  • v1.1.0(Aug 27, 2021)

    Breaking Changes

    This release deprecates some internal data formats from prerelease versions of Tempo. If upgrading from Tempo v0.6.0 or earlier, then see the special upgrade instructions below. Tempo v0.7.0 and later have no compatibility issues or special instructions.

    Tempo v0.6.0 and earlier used block formats v0 and v1, which are being deprecated, and support for these blocks will be removed in the next release. To resolve this you must first upgrade to Tempo 0.7.0+ (latest 1.1 is recommended) which introduces the supported v2 block format. Tempo will write all new blocks as v2, and it must continue running until all v0 and v1 blocks are gone (either deleted due to retention, or compacted). Block versions can be checked using the tempo-cli list blocks command.

    New Features and Enhancements

    • [FEATURE] Added the ability to hedge requests with all backends #750 (@joe-elliott)
    • [FEATURE] Added a tenant index to reduce bucket polling. #834 (@joe-elliott)
    • [ENHANCEMENT] Added hedged request metric tempodb_backend_hedged_roundtrips_total and a new storage agnostic tempodb_backend_request_duration_seconds metric that supersedes the soon-to-be deprecated storage specific metrics (tempodb_azure_request_duration_seconds, tempodb_s3_request_duration_seconds and tempodb_gcs_request_duration_seconds). #790 (@JosephWoodward)
    • [ENHANCEMENT] Performance: improve compaction speed with concurrent reads and writes #754 (@mdisibio)
    • [ENHANCEMENT] Improve readability of cpu and memory metrics on operational dashboard #764 (@bboreham)
    • [ENHANCEMENT] Add azure_request_duration_seconds metric. #767 (@JosephWoodward)
    • [ENHANCEMENT] Add s3_request_duration_seconds metric. #776 (@JosephWoodward)
    • [ENHANCEMENT] Add tempo_ingester_flush_size_bytes metric. #777 (@bboreham)
    • [ENHANCEMENT] Microservices jsonnet: resource requests and limits can be set in $._config. #793 (@kvrhdn)
    • [ENHANCEMENT] Add -config.expand-env cli flag to support environment variables expansion in config file. #796 (@Ashmita152)
    • [ENHANCEMENT] Add ability to control bloom filter caching based on age and/or compaction level. Add new cli command list cache-summary. #805 (@annanay25)
    • [ENHANCEMENT] Emit traces for ingester flush operations. #812 (@bboreham)
    • [ENHANCEMENT] Add retry middleware in query-frontend. #814 (@kvrhdn)
    • [ENHANCEMENT] Add -use-otel-tracer to use the OpenTelemetry tracer, this will also capture traces emitted by the gcs sdk. Experimental: not all features are supported (i.e. remote sampling). #842 (@kvrhdn)
    • [ENHANCEMENT] Add /services endpoint. #863 (@kvrhdn)
    • [ENHANCEMENT] Added "query blocks" cli option. #876 (@joe-elliott)
    • [ENHANCEMENT] Added traceid to trace too large message. #888 (@mritunjaysharma394)
    • [ENHANCEMENT] Reduce compactor memory usage by forcing garbage collection. #915 (@joe-elliott)

    Bug Fixes

    • [BUGFIX] Allow only valid trace ID characters when decoding #854 (@zalegrala)
    • [BUGFIX] Queriers complete one polling cycle before finishing startup. #834 (@joe-elliott)
    • [BUGFIX] Update port spec for GCS docker-compose example #869 (@zalegrala)
    • [BUGFIX] Cortex upgrade to fix an issue where unhealthy compactors can't be forgotten #878 (@joe-elliott)

    Other Changes

    • [CHANGE] Upgrade Cortex from v1.9.0 to v1.9.0-131-ga4bf10354 #841 (@aknuds1)
    • [CHANGE] Change example default tempo port from 3100 to 3200 #770 (@MurzNN)
    • [CHANGE] Jsonnet: use dedicated configmaps for distributors and ingesters #775 (@kvrhdn)
    • [CHANGE] Docker images are now prefixed by their branch name #828 (@jvrplmlmn)

    Source code(tar.gz)
    Source code(zip)
    tempo_1.1.0_checksums.txt(97 bytes)
    tempo_1.1.0_linux_amd64.tar.gz(22.33 MB)
  • v1.1.0-rc.1(Aug 18, 2021)

    Breaking Changes

    This release deprecates some internal data formats from prerelease versions of Tempo. If upgrading from Tempo v0.6.0 or earlier, then see the special upgrade instructions below. Tempo v0.7.0 and later have no compatibility issues or special instructions.

    Tempo v0.6.0 and earlier used block formats v0 and v1, which are being deprecated, and support for these blocks will be removed in the next release. To resolve this you must first upgrade to Tempo 0.7.0+ (latest 1.1 is recommended) which introduces the supported v2 block format. Tempo will write all new blocks as v2, and it must continue running until all v0 and v1 blocks are gone (either deleted due to retention, or compacted). Block versions can be checked using the tempo-cli list blocks command.

    New Features and Enhancements

    • [FEATURE] Added the ability to hedge requests with all backends #750 (@joe-elliott)
    • [FEATURE] Added a tenant index to reduce bucket polling. #834 (@joe-elliott)
    • [ENHANCEMENT] Added hedged request metric tempodb_backend_hedged_roundtrips_total and a new storage agnostic tempodb_backend_request_duration_seconds metric that supersedes the soon-to-be deprecated storage specific metrics (tempodb_azure_request_duration_seconds, tempodb_s3_request_duration_seconds and tempodb_gcs_request_duration_seconds). #790 (@JosephWoodward)
    • [ENHANCEMENT] Performance: improve compaction speed with concurrent reads and writes #754 (@mdisibio)
    • [ENHANCEMENT] Improve readability of cpu and memory metrics on operational dashboard #764 (@bboreham)
    • [ENHANCEMENT] Add azure_request_duration_seconds metric. #767 (@JosephWoodward)
    • [ENHANCEMENT] Add s3_request_duration_seconds metric. #776 (@JosephWoodward)
    • [ENHANCEMENT] Add tempo_ingester_flush_size_bytes metric. #777 (@bboreham)
    • [ENHANCEMENT] Microservices jsonnet: resource requests and limits can be set in $._config. #793 (@kvrhdn)
    • [ENHANCEMENT] Add -config.expand-env cli flag to support environment variables expansion in config file. #796 (@Ashmita152)
    • [ENHANCEMENT] Add ability to control bloom filter caching based on age and/or compaction level. Add new cli command list cache-summary. #805 (@annanay25)
    • [ENHANCEMENT] Emit traces for ingester flush operations. #812 (@bboreham)
    • [ENHANCEMENT] Add retry middleware in query-frontend. #814 (@kvrhdn)
    • [ENHANCEMENT] Add -use-otel-tracer to use the OpenTelemetry tracer, this will also capture traces emitted by the gcs sdk. Experimental: not all features are supported (i.e. remote sampling). #842 (@kvrhdn)
    • [ENHANCEMENT] Add /services endpoint. #863 (@kvrhdn)
    • [ENHANCEMENT] Added "query blocks" cli option. #876 (@joe-elliott)
    • [ENHANCEMENT] Added traceid to trace too large message. #888 (@mritunjaysharma394)

    Bug Fixes

    • [BUGFIX] Allow only valid trace ID characters when decoding #854 (@zalegrala)
    • [BUGFIX] Queriers complete one polling cycle before finishing startup. #834 (@joe-elliott)
    • [BUGFIX] Update port spec for GCS docker-compose example #869 (@zalegrala)
    • [BUGFIX] Cortex upgrade to fix an issue where unhealthy compactors can't be forgotten #878 (@joe-elliott)

    Other Changes

    • [CHANGE] Upgrade Cortex from v1.9.0 to v1.9.0-131-ga4bf10354 #841 (@aknuds1)
    • [CHANGE] Change example default tempo port from 3100 to 3200 #770 (@MurzNN)
    • [CHANGE] Jsonnet: use dedicated configmaps for distributors and ingesters #775 (@kvrhdn)
    • [CHANGE] Docker images are now prefixed by their branch name #828 (@jvrplmlmn)

    Source code(tar.gz)
    Source code(zip)
    tempo_1.1.0-rc.1_checksums.txt(102 bytes)
    tempo_1.1.0-rc.1_linux_amd64.tar.gz(22.33 MB)
  • v1.1.0-rc.0(Aug 11, 2021)

    Breaking Changes

    • This release deprecates some internal data formats from prerelease versions of Tempo. If upgrading from Tempo v0.6.0 or earlier, then see the special upgrade instructions below. Tempo v0.7.0 and later have no compatibility issues or special instructions.

    Tempo v0.6.0 and earlier used block formats v0 and v1, which are being deprecated, and support for these blocks will be removed in the next release. To resolve this you must first upgrade to Tempo 0.7.0+ (latest 1.1 is recommended) which introduces the supported v2 block format. Tempo will write all new blocks as v2, and it must continue running until all v0 and v1 blocks are gone (either deleted due to retention, or compacted). Block versions can be checked using the tempo-cli list blocks command.

    New Features and Enhancements

    • [FEATURE] Added the ability to hedge requests with all backends #750 (@joe-elliott)
    • [FEATURE] Added a tenant index to reduce bucket polling. #834 (@joe-elliott)
    • [ENHANCEMENT] Added hedged request metric tempodb_backend_hedged_roundtrips_total and a new storage agnostic tempodb_backend_request_duration_seconds metric that supersedes the soon-to-be deprecated storage specific metrics (tempodb_azure_request_duration_seconds, tempodb_s3_request_duration_seconds and tempodb_gcs_request_duration_seconds). #790 (@JosephWoodward)
    • [ENHANCEMENT] Performance: improve compaction speed with concurrent reads and writes #754 (@mdisibio)
    • [ENHANCEMENT] Improve readability of cpu and memory metrics on operational dashboard #764 (@bboreham)
    • [ENHANCEMENT] Add azure_request_duration_seconds metric. #767 (@JosephWoodward)
    • [ENHANCEMENT] Add s3_request_duration_seconds metric. #776 (@JosephWoodward)
    • [ENHANCEMENT] Add tempo_ingester_flush_size_bytes metric. #777 (@bboreham)
    • [ENHANCEMENT] Microservices jsonnet: resource requests and limits can be set in $._config. #793 (@kvrhdn)
    • [ENHANCEMENT] Add -config.expand-env cli flag to support environment variables expansion in config file. #796 (@Ashmita152)
    • [ENHANCEMENT] Add ability to control bloom filter caching based on age and/or compaction level. Add new cli command list cache-summary. #805 (@annanay25)
    • [ENHANCEMENT] Emit traces for ingester flush operations. #812 (@bboreham)
    • [ENHANCEMENT] Add retry middleware in query-frontend. #814 (@kvrhdn)
    • [ENHANCEMENT] Add -use-otel-tracer to use the OpenTelemetry tracer, this will also capture traces emitted by the gcs sdk. Experimental: not all features are supported (i.e. remote sampling). #842 (@kvrhdn)
    • [ENHANCEMENT] Add /services endpoint. #863 (@kvrhdn)

    Bug Fixes

    • [BUGFIX] Allow only valid trace ID characters when decoding #854 (@zalegrala)
    • [BUGFIX] Queriers complete one polling cycle before finishing startup. #834 (@joe-elliott)

    Other Changes

    • [CHANGE] Upgrade Cortex from v1.9.0 to v1.9.0-131-ga4bf10354 #841 (@aknuds1)
    • [CHANGE] Change example default tempo port from 3100 to 3200 #770 (@MurzNN)
    • [CHANGE] Jsonnet: use dedicated configmaps for distributors and ingesters #775 (@kvrhdn)
    • [CHANGE] Docker images are now prefixed by their branch name #828 (@jvrplmlmn)

    Source code(tar.gz)
    Source code(zip)
    tempo_1.1.0-rc.0_checksums.txt(102 bytes)
    tempo_1.1.0-rc.0_linux_amd64.tar.gz(22.30 MB)
  • v1.0.1(Jun 14, 2021)

  • v1.0.0(Jun 8, 2021)

    Breaking changes

    • This release contains a change to communication between distributors and ingesters which requires a specific rollout process to prevent dropped spans. First, roll out everything except distributors. After all ingesters have updated, roll out the distributors to the latest version.
    • -auth.enabled is marked deprecated. New flag is -multitenancy.enabled and is set to false by default. This is a breaking change if you were relying on auth/multitenancy being enabled by default. #646 @dgzlopes
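
    A hedged sketch of enabling multitenancy after this change; the command line flag comes from the note above, while the equivalent YAML key is an assumption to confirm against the configuration reference.

        # Command line
        tempo -config.file=/etc/tempo/tempo.yaml -multitenancy.enabled=true

        # or in tempo.yaml (assumed key name)
        multitenancy_enabled: true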

    Enhancements

    This release contains significant improvements for performance and stability:

    • [ENHANCEMENT] Performance: Improve Ingester Record Insertion. #681 @joe-elliott
    • [ENHANCEMENT] Improve WAL Replay by not rebuilding the WAL. #668 @joe-elliott
    • [ENHANCEMENT] Preallocate byte slices on ingester request unmarshal. #679 @annanay25
    • [ENHANCEMENT] Reduce marshalling in the ingesters to improve performance. #694 @joe-elliott
    • [ENHANCEMENT] Add config option to disable write extension to the ingesters. #677 @joe-elliott
    • [ENHANCEMENT] Allow setting the bloom filter shard size, with support for a dynamic shard count. #644 @annanay25
    • [ENHANCEMENT] GCS SDK update v1.12.0 => v1.15.0, ReadAllWithEstimate used in GCS/S3 backends. #693 @annanay25
    • [ENHANCEMENT] Add a new endpoint /api/echo to test the query frontend is reachable. #714 @kvrhdn
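
    The /api/echo endpoint in the last item above can be used as a quick reachability check. A minimal sketch; host and port are deployment specific and only examples here.

        # A 200 response indicates the query frontend is reachable.
        curl http://localhost:3100/api/echo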

    Bugfixes

    • [BUGFIX] Fix Query Frontend grpc settings to avoid noisy error log. #690 @annanay25
    • [BUGFIX] Zipkin Support - CombineTraces. #688 @joe-elliott
    • [BUGFIX] Zipkin support - Dedupe span IDs based on span.Kind (client/server) in Query Frontend. #687 @annanay25
    • [BUGFIX] Azure Backend - Fix an issue with the append method on the Azure backend. #736 @pedrosaraiva
    Source code(tar.gz)
    Source code(zip)
    tempo_1.0.0_checksums.txt(97 bytes)
    tempo_1.0.0_linux_amd64.tar.gz(22.99 MB)
  • v1.0.0-rc.0(Jun 2, 2021)

    Breaking changes

    • This release contains a change to communication between distributors and ingesters which requires a specific rollout process to prevent dropped spans. First, roll out everything except distributors. After all ingesters have updated, roll out the distributors to the latest version.
    • -auth.enabled is marked deprecated. New flag is -multitenancy.enabled and is set to false by default. This is a breaking change if you were relying on auth/multitenancy being enabled by default. #646 @dgzlopes

    Enhancements

    This release contains significant improvements for performance and stability:

    • [ENHANCEMENT] Performance: Improve Ingester Record Insertion. #681 @joe-elliott
    • [ENHANCEMENT] Improve WAL Replay by not rebuilding the WAL. #668 @joe-elliott
    • [ENHANCEMENT] Preallocate byte slices on ingester request unmarshal. #679 @annanay25
    • [ENHANCEMENT] Reduce marshalling in the ingesters to improve performance. #694 @joe-elliott
    • [ENHANCEMENT] Add config option to disable write extension to the ingesters. #677 @joe-elliott
    • [ENHANCEMENT] Allow setting the bloom filter shard size, with support for a dynamic shard count. #644 @annanay25
    • [ENHANCEMENT] GCS SDK update v1.12.0 => v1.15.0, ReadAllWithEstimate used in GCS/S3 backends. #693 @annanay25
    • [ENHANCEMENT] Add a new endpoint /api/echo to test the query frontend is reachable. #714 @kvrhdn

    Bugfixes

    • [BUGFIX] Fix Query Frontend grpc settings to avoid noisy error log. #690 @annanay25
    • [BUGFIX] Zipkin Support - CombineTraces. #688 @joe-elliott
    • [BUGFIX] Zipkin support - Dedupe span IDs based on span.Kind (client/server) in Query Frontend. #687 @annanay25
    Source code(tar.gz)
    Source code(zip)
    tempo_1.0.0-rc.0_checksums.txt(102 bytes)
    tempo_1.0.0-rc.0_linux_amd64.tar.gz(22.99 MB)
  • v0.7.0(Apr 22, 2021)

    License Change

    • v0.7.0 and future versions are licensed under AGPLv3 #660

    Breaking changes

    • In an effort to move to byte based limits, some limit options were removed and replaced with byte based equivalents (see the overrides sketch after this list): max_spans_per_trace => max_bytes_per_trace, ingestion_rate_limit => ingestion_rate_limit_bytes, ingestion_burst_size => ingestion_burst_size_bytes
    • The Query/QueryFrontend call signature has changed so there will be a query interruption during rollout.
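
    A hedged sketch of the renamed byte based limits in a per-tenant overrides file; the option names are taken from the note above, while the file layout and the values are assumptions to check against the overrides documentation.

        overrides:
          "example-tenant":                   # hypothetical tenant ID
            max_bytes_per_trace: 5000000
            ingestion_rate_limit_bytes: 15000000
            ingestion_burst_size_bytes: 20000000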

    All Changes

    • [CHANGE] Update to Go 1.16, latest OpenTelemetry proto definition and collector #546 @mdisibio
    • [CHANGE] max_spans_per_trace limit override has been removed in favour of max_bytes_per_trace. This is a breaking change to the overrides config section. #612 @annanay25
    • [CHANGE] Add new flag -ingester.lifecycler.ID to manually override the ingester ID with which to register in the ring. #625 @annanay25
    • [CHANGE] ingestion_rate_limit limit override has been removed in favour of ingestion_rate_limit_bytes. ingestion_burst_size limit override has been removed in favour of ingestion_burst_size_bytes. This is a breaking change to the overrides config section. #630 @annanay25

    Features

    • [FEATURE] Add page based access to the index file. #557 @joe-elliott
    • [FEATURE] (Experimental) WAL Compression/checksums. #638 @joe-elliott

    Enhancements

    • [ENHANCEMENT] Add a Shutdown handler to flush data to backend, at "/shutdown". #526 @annanay25
    • [ENHANCEMENT] Queriers now query all (healthy) ingesters for a trace to mitigate 404s on ingester rollouts/scaleups. This is a breaking change and will likely result in query errors on rollout, as the query signature between QueryFrontend & Querier has changed. #557 @annanay25
    • [ENHANCEMENT] Add list compaction-summary command to tempo-cli #588 @mdisibio
    • [ENHANCEMENT] Add list and view index commands to tempo-cli #611 @mdisibio
    • [ENHANCEMENT] Add a configurable prefix for HTTP endpoints (see the sketch after this list). #631 @joe-elliott
    • [ENHANCEMENT] Add kafka receiver. #613 @mapno
    • [ENHANCEMENT] Upgrade OTel collector to v0.21.0. #613 @mapno
    • [ENHANCEMENT] Add support for Cortex Background Cache. #640 @dgzlopes
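
    For the configurable HTTP prefix above, a minimal sketch; the top-level http_api_prefix key is an assumption to verify against the server configuration docs.

        # tempo.yaml (fragment)
        http_api_prefix: "/tempo"    # all Tempo HTTP API routes are served under this prefix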

    Bugfixes

    • [BUGFIX] Fixes permissions errors on startup in GCS. #554 @joe-elliott
    • [BUGFIX] Fixes error where Dell ECS cannot list objects. #561 @kradalby
    • [BUGFIX] Fixes listing blocks in S3 when the list is truncated. #567 @jojand
    • [BUGFIX] Fixes where ingester may leave file open #570 @mdisibio
    • [BUGFIX] Fixes a bug where some blocks were not searched due to query sharding and randomness in blocklist poll. #583 @annanay25
    • [BUGFIX] Fixes issue where wal was deleted before successful flush and adds exponential backoff for flush errors #593 @mdisibio
    • [BUGFIX] Fixes issue where Tempo would not parse odd length trace ids #605 @joe-elliott
    • [BUGFIX] Sort traces on flush to reduce unexpected recombination work by compactors #606 @mdisibio
    • [BUGFIX] Ingester fully persists blocks locally to reduce amount of work done after restart #628 @mdisibio
    Source code(tar.gz)
    Source code(zip)
    tempo_0.7.0_checksums.txt(97 bytes)
    tempo_0.7.0_linux_amd64.tar.gz(23.21 MB)
  • v0.6.0(Feb 18, 2021)

    The 0.6.0 release of Tempo comes with lots of features, enhancements and bugfixes. 🎁 🚀

    The highlights of this release include support for compression and exhaustive search, along with CPU performance gains from changing the compression algorithm between distributors and ingesters.

    Breaking changes

    • [CHANGE] Ingesters cut blocks based on size instead of trace count. Replaces the ingester traces_per_block setting with max_block_bytes (see the sketch after this list). This is a breaking change. #474 (By default, Tempo will cut blocks of size 1GB) @mdisibio
    • [FEATURE] Added block compression. This is a breaking change because some configuration fields moved. #504 (By default, Tempo will use zstd for compression.) @joe-elliott
    • [CHANGE] Refactor cache section in tempodb. This is a breaking change because the cache config section has changed. #485 @dgzlopes
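
    A hedged sketch of the new size based setting in the first item above; the placement of max_block_bytes under the ingester block is an assumption to verify against the configuration docs for this version.

        ingester:
          # Cut a block once it reaches roughly this many bytes
          # (the release notes state a 1GB default).
          max_block_bytes: 1073741824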

    Enhancements

    • [CHANGE/BUGFIX] Rename tempodb_compaction_objects_written and tempodb_compaction_bytes_written metrics to tempodb_compaction_objects_written_total and tempodb_compaction_bytes_written_total. #524 @gouthamve
    • [ENHANCEMENT] Change default ingester_client compression from gzip to snappy. #522 @mdisibio
    • [ENHANCEMENT] Add exhaustive search to combine traces from all blocks in the backend. #489 @annanay25
    • [ENHANCEMENT] Add per-tenant block retention #77 @mdisibio
    • [ENHANCEMENT] Change index-downsample to index-downsample-bytes. This is a breaking change #519 @joe-elliott

    Bugfixes

    • [BUGFIX] Upgrade cortex dependency to v1.7.0-rc.0+ to address issue with forgetting ring membership #442 #512 @mdisibio
    • [BUGFIX] No longer raise the tempodb_blocklist_poll_errors_total metric if a block doesn't have meta or compacted meta. #481 @joe-elliott
    • [BUGFIX] Replay wal completely before ingesting new spans. #525 @joe-elliott

    For the full list of changes see CHANGELOG

    Source code(tar.gz)
    Source code(zip)
    tempo_0.6.0_checksums.txt(97 bytes)
    tempo_0.6.0_linux_amd64.tar.gz(22.00 MB)
  • v0.5.0(Jan 15, 2021)

    0.5.0 Release of Tempo with important new features and improvements. 🚀 Note: This release contains some breaking changes.

    New Features

    • 36991f0 Add support for Azure Blob Storage backend (#340) @pedrosaraiva
    • bc11b55 Add Query Frontend module to allow scaling the query path (#400) @annanay25

    Breaking Changes

    • 9e0e05a The gRPC signature from distributors to ingesters has changed. This is a breaking change when running in microservices mode with separate distributors and ingesters. To prevent errors first upgrade all ingesters, which adds the new gRPC endpoint, then upgrade the distributors. (#430) @mdisibio
    • dd7a18e Removed disk-based caching. Please migrate to redis or memcached. (#441) @joe-elliott
    • a796195 The ingestion_max_batch_size setting has been renamed to ingestion_burst_size. (#445) @mdisibio

    Other changes and fixes

    • 65a0643 Prevent race conditions between querier polling and ingesters clearing complete blocks (#421) @joe-elliott
    • 4c0ea69 Exclude blocks in last active window from compaction (#411) @mdisibio
    • e804efd Mixin: Ignore metrics and query-frontend route when checking for TempoRequestLatency alert. (#440) @pstibrany
    • 40abd5d Compactor without GCS permissions fail silently (#379) @mdisibio
    • 40abd5d Add docker-compose example for GCS along with new backend options (#397) @mdisibio @achatterjee-grafana
    • 00d9d3a Added tempo_distributor_bytes_received_total metric (#453) @joe-elliott
    • 9e690dd Redo tempo-cli with basic command structure and improvements (#385) @mdisibio @achatterjee-grafana

    For the full list of changes see CHANGELOG

    Source code(tar.gz)
    Source code(zip)
    tempo_0.5.0_checksums.txt(97 bytes)
    tempo_0.5.0_linux_amd64.tar.gz(20.80 MB)
  • v0.4.0(Dec 3, 2020)

    0.4.0 Release of Tempo with lots of bugfixes and performance improvements! 🎉

    This release saw three new contributors, welcome to the project @achatterjee-grafana, @AlexisSellier and @simonswine!

    Changelog

    • cd8e6cbd Add level label to tempodb_compaction_objects_combined_total metric. Update operational dashboard to match (#376) @mdisibio
    • 82bf5fc9 Add metrics for bytes and objects written during compaction (#360) @mdisibio
    • 077b806d Add new compactor_objects_combined metric and test (#339) @mdisibio
    • c20839c1 Add support for Redis caching (#354) @dgzlopes
    • 2e7a5004 Change to TOC structure and other reorg to individual topics (#365) @achatterjee-grafana
    • 143bd10e Compact more than 2 blocks at a time (#348) @mdisibio
    • 62af44e3 Fix Ingesters Occasionally Double Flushing (#364) @joe-elliott
    • c0cb2fab Fix value type (#383) @AlexisSellier
    • 9b6edac3 Query Path Observability Improvements (#361) @joe-elliott
    • 2b34ea30 [ops] Fix resources dashboard, disable tempo-query tracing by default (#353) @annanay25
    • aa26d613 feat: add option to S3 backend for V2 signatures (#352) @simonswine
    • fca94847 Remove panic in ReportFatalError (#343) @joe-elliott
    • 78f3554c Address frequent errors logged by compactor regarding meta not found (#327) @mdisibio

    Source code(tar.gz)
    Source code(zip)
    tempo_0.4.0_checksums.txt(97 bytes)
    tempo_0.4.0_linux_amd64.tar.gz(20.38 MB)
  • v0.3.0(Nov 10, 2020)

    Changelog

    • eb987a85 Increase Prometheus not found metric on tempo-vulture (#301) @dgzlopes
    • 09af806a #306 - Compactor flush to backend based on buffer size (#325) @mdisibio
    • 493406d9 Add per tenant bytes counter (#331) @dgzlopes
    • 077fd13e Add warnings for suspect configurations (#294) @dgzlopes
    • d91c415e Bloom sharding (#192) @annanay25
    • 814282d5 Build: add support for multi arch build (#311) @morlay
    • 5d8d2f67 Fix Tempo build-info (#295) @dgzlopes
    • 2f0917d7 Prune in-memory blocks from missing tenants (#314) @dgzlopes
    • 7156aa34 Rename maintenance cycle to blocklist poll (#315) @dgzlopes
    • fd114a27 Return 404 on unknown tenant (#321) @joe-elliott
    • 4edd1fe2 Support multiple authentication methods for S3 (#320) @chancez

    Breaking Changes

    • Bloom filter sharding (#192) prevents previously written blocks from being read. Just wipe existing data and restart.
    • #315 renames maintenance_cycle to blocklist_poll in configuration for clarity.
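
    A hedged sketch of the renamed setting; the placement of blocklist_poll under storage.trace is an assumption to double-check against the docs for this release.

        storage:
          trace:
            # formerly maintenance_cycle
            blocklist_poll: 5m
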
    Source code(tar.gz)
    Source code(zip)
    tempo_0.3.0_checksums.txt(97 bytes)
    tempo_0.3.0_linux_amd64.tar.gz(19.75 MB)
  • v0.2.0(Oct 23, 2020)
