An observability database aims to ingest, analyze and store Metrics, Tracing and Logging data.

Overview

BanyanDB

BanyanDB, as an observability database, aims to ingest, analyze and store Metrics, Tracing and Logging data. It is designed to handle observability data generated by observability platforms and APM systems such as Apache SkyWalking.

Resource

Contributing

For developers who want to contribute to this project, see Contribution Guide

License

Apache 2.0 License.

Issues
  • Add elementUI, sass and sass-loader@7.3.1

    • Add elementUI, sass and sass-loader@7.3.1
    • Initialize page structure
      • Add Database.vue and Structure.vue
      • delete Laws.vue
    • Add Header Component
      • Add NavMenu from ElementUI
    ui 
    opened by WuChuSheng1 19
  • Add groupBy to the measure query request

    Add groupBy and aggregation function to the query request:

    • the query request doesn't support sub or nested aggregation
    • the response's timestamp field is null when the aggregated result is returned
    • if the request groups without specifying an aggregation function, the result is the same as that of an order by
    api 
    opened by hanahmily 12
  • Add docs

    Fixes https://github.com/apache/skywalking/issues/8989

    I leave empty CRUD examples for the future CLI tools.

    Signed-off-by: Gao Hongtao [email protected]

    documentation 
    opened by hanahmily 6
  • Benchmark flatbuffers and protobuf

    Benchmark env

    • CPU: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
    • Memory: 8 GB 2133 MHz LPDDR3
    • Java: JDK8u292b10
    • protoc: 3.17.3
    • protobuf-java: 3.17.2
    • flatc: 2.0.0
    • flatbuffers-java: 2.0.2

    Performance: Serialization+Java

    /**
     * # JMH version: 1.32
     * # VM version: JDK 1.8.0_292, OpenJDK 64-Bit Server VM, 25.292-b10
     * # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
     * # VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=55698:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
     * # Blackhole mode: full + dont-inline hint
     * # Warmup: 5 iterations, 10 s each
     * # Measurement: 5 iterations, 10 s each
     * # Timeout: 10 min per iteration
     * # Threads: 1 thread, will synchronize iterations
     * # Benchmark mode: Average time, time/op
     * <p>
     * Benchmark                                                                  Mode  Cnt     Score    Error   Units
     * WriteEntitySerializationTest.flatbuffers                                   avgt   25  3044.054 ± 34.837   ns/op
     * WriteEntitySerializationTest.flatbuffers:·gc.alloc.rate                    avgt   25   911.872 ± 10.301  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.alloc.rate.norm               avgt   25  3056.000 ±  0.001    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Eden_Space           avgt   25   912.394 ± 10.261  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Eden_Space.norm      avgt   25  3057.783 ± 10.009    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Survivor_Space       avgt   25     0.190 ±  0.018  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Survivor_Space.norm  avgt   25     0.637 ±  0.059    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.count                         avgt   25  3878.000           counts
     * WriteEntitySerializationTest.flatbuffers:·gc.time                          avgt   25  2168.000               ms
     * WriteEntitySerializationTest.protobuf                                      avgt   25   514.010 ± 12.638   ns/op
     * WriteEntitySerializationTest.protobuf:·gc.alloc.rate                       avgt   25  3833.648 ± 90.162  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.alloc.rate.norm                  avgt   25  2168.000 ±  0.001    B/op
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Eden_Space              avgt   25  3835.530 ± 94.020  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Eden_Space.norm         avgt   25  2168.989 ±  6.134    B/op
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Survivor_Space          avgt   25     0.195 ±  0.020  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Survivor_Space.norm     avgt   25     0.110 ±  0.011    B/op
     * WriteEntitySerializationTest.protobuf:·gc.count                            avgt   25  3629.000           counts
     * WriteEntitySerializationTest.protobuf:·gc.time                             avgt   25  2227.000               ms
     */
    

    Performance: Deserialization+Go

    goos: darwin
    goarch: amd64
    pkg: github.com/apache/skywalking-banyandb/benchmark/go-bench
    cpu: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
    Benchmark_Deser_Flatbuffers-8   	100000000	       730.5 ns/op	      64 B/op	       2 allocs/op
    Benchmark_Deser_Protobuf-8      	14826262	      5044 ns/op	    1944 B/op	      49 allocs/op
    PASS
    ok  	github.com/apache/skywalking-banyandb/benchmark/go-bench	153.927s
    

    Size

    For the same entity, as illustrated in WriteEntitySerializationTest.EntityModel, the serialized sizes (in bytes) are:

    • Flatbuffers: 512
    • Protobuf: 169

    which means the Protobuf payload is much more compact, about one third of the Flatbuffers size.

    Conclusion

    From the perspective of bandwidth and write performance, protobuf is the better choice: its payload is about 3x smaller and its Java serialization is about 6x faster (514 ns/op versus 3044 ns/op).

    However, flatbuffers has better deserialization performance (730.5 ns/op versus 5044 ns/op in the Go benchmark), in particular for partial reads. The deserialization call (i.e. GetRootAs***) does almost no work. The real deserialization happens when users read fields from the byte buffer.

    References

    https://www.ida.liu.se/~nikca89/papers/networking20c.pdf

    Our conclusion is similar to what has been described in the above paper. See Fig.3,4,5.

    opened by lujiajing1126 6
  • Feat: query module

    Design

    This is a very preliminary PR for the query module. Many things still have to be considered.

    Since this will be a principal module, I want to discuss the current design and implementation as early as possible, to avoid an improper design and to find better ideas for how to proceed.

    Logical Plan

    A Logical Plan is a DAG (Directed Acyclic Graph) of Params. A Param defines the parameters necessary for query execution. The parameters are fully prepared while the logical plan is composed, in order to reduce extra cost when the physical plan executes.

    Plot

    The logical plan can be plotted as a Dot graph. For example,

    digraph  {
    
    	n2[label="ChunkIDsFetch{metadata={group=skywalking,name=trace},projection=[TraceID startTime]}"];
    	n1[label="ChunkIDsMerge{}"];
    	n7[label="IndexScan{begin=1623203253604099000,end=1623214053604099000,KeyName=duration,conditions=[<=1000],metadata={group=skywalking,name=trace}}"];
    	n4[label="Pagination{Offset=0,Limit=0}"];
    	n5[label="Root{}"];
    	n3[label="SortMerge{fieldName=startTime,sort=DESC}"];
    	n6[label="TraceIDFetch{TraceID=aaaaaaaa,metadata={group=skywalking,name=trace},projection=[TraceID startTime]}"];
    	n2->n3;
    	n1->n2;
    	n7->n1;
    	n5->n6;
    	n5->n7;
    	n3->n4;
    	n6->n3;
    }
    

    We can leverage the online toolkit to visualize the logical plan,

    graphviz

    Physical Plan

    A Physical Plan contains the logical plan and the Transform(s) corresponding to each logical.Op.

    When the plan is triggered to run, a reverse topologically sorted slice (with Futures as items) is generated from the logical plan.

    Tasks

    • [x] client utils for building EntityCriteria
    • [x] logical plan: Ops such as Sort, OffsetAndLimit, ChunkIDMerge, TableScan and IndexScan
    • [x] physical plan: topology sort, Transform
    • [ ] complete API to connect with Liaison (Add handlers) (Maybe next PR)

    To be discussed

    Index selection and optimization stage

    For now, I only use single-value indexes. But in the current implementation, we may be able to improve index selection during the process of generating Logical Plan.

    Any better ideas? Traditional databases normally have an optimization stage (usually after generating the hierarchical logical plan?) for index selection. How can we fit such an optimization stage into our implementation?

    Sort and field orderliness

    I believe we have to impose stronger preconditions on the sort field, since it is not possible to sort on a sparse field.

    Sort also requires the fields to be arranged in a strict order, i.e. we have to use a fieldIndex number to quickly access the field to be sorted. Otherwise, finding the specific field every time may cost a lot of resources.

    opened by lujiajing1126 6
  • Add measure query

    This PR introduces basic measure query feature with local index scan.

    The implementation is based on,

    1. No global index for measure
    2. No limit and offset for measure

    As we've discussed, GroupBy and Aggregation will come after this PR.

    enhancement 
    opened by lujiajing1126 5
  • Reload stream when metadata changes

    This PR supersedes the previous PR #65 to allow metadata reload. It puts the logic mostly in the stream module instead of pursuing strongly consistent metadata as the previous PR did.

    As a result, the stream module starts a serially-running background job that continuously reconciles the opened streams (with the underlying storage).

    Please have a review with the new design @hanahmily

    More test cases will be added later. I suppose Eventually method is necessary for these kinds of tests.

    opened by lujiajing1126 5
  • Introduce bytebuffer pool

    In this PR, I've introduced a very simple bytebuffer to optimize byte manipulation.

    A benchmark of the query path has been added. The result shows approximately 10% less allocation.
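For readers unfamiliar with the technique, a buffer pool can be sketched with the standard library's `sync.Pool`. This is a generic illustration, not the PR's implementation. `getBuffer`, `putBuffer` and `encodeKey` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"sync"
)

// bufPool hands out reusable buffers so hot paths avoid allocating a new one per call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// getBuffer takes a buffer from the pool (or allocates one if the pool is empty).
func getBuffer() *bytes.Buffer {
	return bufPool.Get().(*bytes.Buffer)
}

// putBuffer resets the buffer and returns it to the pool.
func putBuffer(b *bytes.Buffer) {
	b.Reset()
	bufPool.Put(b)
}

// encodeKey builds "prefix:id" using a pooled buffer.
func encodeKey(prefix string, id uint64) string {
	buf := getBuffer()
	defer putBuffer(buf)
	buf.WriteString(prefix)
	buf.WriteByte(':')
	buf.WriteString(strconv.FormatUint(id, 10))
	return buf.String() // String copies, so the buffer is safe to reuse afterwards
}

func main() {
	fmt.Println(encodeKey("series", 42)) // series:42
}
```

Note that callers must not retain the slice returned by `buf.Bytes()` after the buffer goes back to the pool, which is why the sketch returns a copied string.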

    opened by lujiajing1126 5
  • Update Go to 1.18

    Update Go to 1.18. Close https://github.com/apache/skywalking/issues/9169

    Notable dependencies upgrade,

    • buf: from 1.0.0 to 1.5.0
    • protobuf and gen tools: from 1.17.1 to 1.18.0
    • klauspost/compress: from 1.13.1 to 1.15.6
    • grpc and gen tools: from 1.39.0 to 1.47.0
    enhancement 
    opened by lujiajing1126 4
  • Fix docker CI

    Signed-off-by: Megrez Lu [email protected]

    1. Run docker build for both PR and main branch
    2. Skip docker push for PR
    3. Use go install instead of go get to install goimports due to the recent change in Go 1.16
    opened by lujiajing1126 4
  • Add the stress test framework

    This PR introduces a stress test framework that generates fixed traffic to verify several behaviors of the banyand server.

    Verify the flushing mechanism of tsdb

    Most caches of tsdb are based on badger, which limits memory usage to a fixed amount. The inverted index's memtable, however, is flushed by an event from the main storage.

    The screenshot below shows the flushing of the main storage.

    main storage flushing

    The diagram below shows the inverted indices flushing under the control of the main storage.

    inverted index flushing

    Check encoding algorithm

    This diagram shows the compression ratio of a stream in the case where each instance keeps sending the element to the database server. That's why the ratio is so high (about 98%). We will leverage the e2e test to get a more practical result in the next round. This result only indicates that the algorithm works as expected.

    image

    This is the result of the gorilla encoding algorithm, which is close to the theoretical result, around 30%.

    image

    Signed-off-by: Gao Hongtao [email protected]

    testing 
    opened by hanahmily 3
  • Add streaming API and topN aggregator

    This PR introduces a simple stream processing API and an implementation of TopN aggregation.

    Design

    Flow is an abstraction for the streaming processes, with the following operators:

    • Source: provides the data stream for the Flow. As we've discussed before, it should be a listener that continuously consumes Measure write requests. Later, we could use a global binlog/WAL.
    • Mapper: func(T) R, which transforms an element from T to R
    • Filter: func(T) bool, a predicate that decides whether an element should be passed downstream
    • Windows: currently only SlidingEventTimeWindows is implemented
    • Sink: the place to write the final result. We have to write the final result, e.g. TopN, into a separate Measure storage.

    Filter

    s := flow.New(tt.input).
        Filter(func(i int) bool {
            return i%2 == 0
        }).
        To(snk)
    

    The Filter operator allows us to filter by criteria set in TopNAggregation.

    Mapper

    s := flow.New(tt.input).
        Map(func(i int) int {
            return i * 2
        }).
        To(snk)
    

    The Mapper operator allows us to extract fields from the record and transform them for the groupBy operation.

    For simplicity, we currently do not have a separate keyBy operation (as in Apache Flink, for example) to do groupBy.

    Windows

    Generally, windows are split by "time", which can be either of the following concepts:

    • Event Time
    • Processing Time
    image

    The above graph from the Flink community distinguishes these concepts. In our case, the only "time" we care about is EventTime, which represents the exact moment the record is produced, since we need EventTime to drive the timely flush of the aggregation results, e.g. TopN ranks.

    image

    Sliding windows can fulfill our requirement in the sense that we need to flush the data more frequently.

    It means the flush interval should be much smaller than the interval of the real data points. For example in OAP, the downsampling rate can be MINUTE while the flush timer is set to 25 seconds by default.

    Technically, the SlidingEventWindow is built on:

    • A PriorityQueue that maintains the records which have not yet been emitted,
    • A PriorityQueueSet that maintains all registered (deduplicated) timers which will be triggered later

    TopN

    With the above semantics, we can implement TopN as a window aggregation function,

    flow.New(tt.input).
        Filter(...). // where
        Map(...). // select and groupBy
        Window(NewSlidingTimeWindows(time.Minute*1, time.Second*15)).
        TopN(10, WithCacheSize(1000), ...). // TopN with parameters
        To(snk)
    

    TopN is implemented with the help of a TreeMap which maps sortedKey to the collection of records.

    api 
    opened by lujiajing1126 4
Releases(v0.1.0)
  • v0.1.0(Jun 5, 2022)

    Downloads

    http://skywalking.apache.org/downloads/

    Features

    • BanyanD is the server of BanyanDB
      • TSDB module. It provides the primary time series database with a key-value data module.
      • Stream module. It implements the stream data model's writing.
      • Measure module. It implements the measure data model's writing.
      • Metadata module. It implements resource registering and property CRUD.
      • Query module. It handles the querying requests of stream and measure.
      • Liaison module. It's the gateway to other modules and provides access endpoints to clients.
    • gRPC based APIs
    • Document
      • API reference
      • Installation instrument
      • Basic concepts
    • Testing
      • UT
      • E2E with Java Client and OAP

    Full Changelog: https://github.com/apache/skywalking-banyandb/commits/v0.1.0

    Source code(tar.gz)
    Source code(zip)
Owner
The Apache Software Foundation