SpiceDB is a Zanzibar-inspired database that stores, computes, and validates application permissions.

Developers create a schema that models their permissions requirements and use a client library to apply the schema to the database, insert data into the database, and query the data to efficiently check permissions in their applications.

Features that distinguish SpiceDB from other systems include:

See CONTRIBUTING.md for instructions on how to contribute and perform common tasks like building the project and running tests.

Why SpiceDB?

Verifiable Correctness

The data used to calculate permissions have the most critical correctness requirements in the entirety of a software system. Despite that, developers continue to build their own ad-hoc solutions coupled to the internal code of each new project. By developing a SpiceDB schema, you can iterate far more quickly and exhaustively test designs before altering any application code. This becomes especially important as you introduce backwards-compatible changes to the schema and want to ensure that the system remains secure.

Optimal Flexibility

The SpiceDB schema language is built on top of the concept of a graph of relationships between objects. This ReBAC design is capable of efficiently supporting all popular access control models (such as RBAC and ABAC) and custom models that contain hybrid behavior.

Modern solutions to developing permission systems all have a similar goal: to decouple policy from the application. Using a dedicated database like SpiceDB not only accomplishes this, but takes this idea a step further by also decoupling the data that policies operate on. SpiceDB is designed to share a single unified view of permissions across as many applications as your organization has. This strategy has become an industry best practice and is being used to great success at companies large (Google, GitHub, Airbnb) and small (Carta, Authzed).

Getting Started

Installing SpiceDB

SpiceDB is currently packaged by Homebrew for both macOS and Linux. Individual releases and other formats are also available on the releases page.

brew install authzed/tap/spicedb

SpiceDB is also available as a container image:

docker pull quay.io/authzed/spicedb:latest

For production usage, we highly recommend using a tag that corresponds to the latest release, rather than latest.

Running SpiceDB locally

spicedb serve --grpc-preshared-key "somerandomkeyhere" --grpc-no-tls

Visit http://localhost:8080 to see next steps, including loading the schema.

Developing your own schema

Integrating with your application

  • Fix revive lint warnings


    This is related to issue https://github.com/authzed/spicedb/issues/36

    All issues involve renaming functions to drop a prefix corresponding to the package name. The fix has been done automatically with a refactoring tool.

    This creates a change in the public API as namespace.NamespaceWithComment is renamed to namespace.WithComment.

  • service-discovery: Added ZooKeeper based service discovery


    I have implemented an alternative service discovery that can be used without Kubernetes. It uses Apache ZooKeeper. It also contains the code necessary to work inside AWS ECS containers (it can get the IP from the task and instance metadata endpoint), but it falls back to the IP of the first public network interface. The address defined in dispatch-cluster-addr takes precedence in any case.

    I will use this in our deployment on ECS. The SRV record method was not reliable so I made a custom resolver that uses ZooKeeper to discover the peers, since we were already using ZooKeeper for some of our existing services.

    This is the first time I'm coding in Go, so I hope I didn't mess up anything.

  • introduce validate command


    Closes https://github.com/authzed/spicedb/issues/290


    The purpose of this command is to take a playground file and run the assertions and validations defined.

    The rationale is that schema development happens in the playground, but once the YAML is downloaded, there is nothing developers can do other than loading it with the testserve command or uploading it back to the playground. This attempts to reuse and run the assertions and validations as a test suite outside of the playground, programmatically rather than only interactively. Rather than duplicating the same tests in the client application, the playground tests become the canonical representation for the business rules defined in the schema.


    1. developers introduce changes in schema via the playground
    2. YAML file is downloaded and persisted in git repository
    3. changes are pushed, PR is opened, CI runs spicedb validate, demonstrating changes are sound.


    • Introducing a new CLI command is cool, exposing new API in the go code requires more consideration
    • Version 2 of the Playground file is not really API, so instead of updating the public structures, I parsed the file in two phases: one time with the public stuff, and one with the v2 fields
    • I'm not sure I got right the versioning strategy y'all have with the API. It sounds like v0 is like "it's public, but may be broken anytime". I assumed it's OK to expose methods reusing v0 types, but would definitely appreciate some guidance here


    • accepts multiple playground files as input
    • process returns 0 if valid, non-zero if invalid
    • errors by line and message are logged (e.g. can be surfaced in the GitHub PR)


    • Planning to add tests if this design seems sound
  • Dashboard example zed usage references HEAD formula & `login` command


    Brew installation of zed fails with the Errno:ENOENT error:

    $ brew install --HEAD authzed/tap/zed
    ==> Tapping authzed/tap
    Cloning into '/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/authzed/homebrew-tap'...
    remote: Enumerating objects: 34, done.
    remote: Counting objects: 100% (34/34), done.
    remote: Compressing objects: 100% (25/25), done.
    remote: Total 34 (delta 15), reused 10 (delta 3), pack-reused 0
    Receiving objects: 100% (34/34), 8.73 KiB | 1.75 MiB/s, done.
    Resolving deltas: 100% (15/15), done.
    Tapped 2 formulae (16 files, 92.0KB).
    ==> Downloading https://ghcr.io/v2/linuxbrew/core/go/manifests/1.17.1
    ######################################################################## 100.0%
    ==> Downloading https://ghcr.io/v2/linuxbrew/core/go/blobs/sha256:65e57b46322ebb9957754293cc66012579d93a7795b286bd2f267758f8006d7b
    ==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:65e57b46322ebb9957754293cc66012579d93a7795b286bd2f267758f8006d7b?se=2021-09-30T17%3A50%3A00Z&sig=hB1Y%2FHG%2FMPADkzMm6M92
    ######################################################################## 100.0%
    ==> Cloning https://github.com/authzed/zed.git
    Cloning into '/home/ibazulic/.cache/Homebrew/zed--git'...
    ==> Checking out branch main
    Already on 'main'
    Your branch is up to date with 'origin/main'.
    ==> Installing zed from authzed/tap
    ==> Installing dependencies for authzed/tap/zed: go
    ==> Installing authzed/tap/zed dependency: go
    ==> Pouring go--1.17.1.x86_64_linux.bottle.tar.gz
     /home/linuxbrew/.linuxbrew/Cellar/go/1.17.1: 10,810 files, 537.4MB
    ==> Installing authzed/tap/zed --HEAD
    Error: An exception occurred within a child process:
      Errno::ENOENT: No such file or directory - zed

    Pulling zed normally via brew install authzed/tap/zed works but this binary does not have the login option needed to log into spicedb according to instructions.

  • Support OpenTelemetry collectors


    Everything is instrumented using OpenTelemetry, but Jaeger is the only format exposed by command-line flags. If it can be made generic enough, this could be upstreamed into cobrautil.

  • Add quickstart examples


    Closes https://github.com/authzed/spicedb/issues/469

    This creates a collection of quickstart Docker Compose files to get newcomers running quickly with the datastore of their choosing. ~I also moved k8s/example.yaml under the examples/ directory, since it seemed to fit well there. Though, I'm not sure if this breaks documentation links.~ I reverted this change, things broke when that file moved.

    Most datastores were straightforward, but Cockroach and Spanner (especially Spanner) required some extra plumbing to get them operational.

  • A confusing place in module lexer


    In the lexer module, the lastNonWhitespaceToken field of struct Lexer is documented as "The last token returned that is non-whitespace".

    The only place in the code that assigns lastNonWhitespaceToken is shown below, and it assigns when the token is a TokenTypeWhitespace:

    if t == TokenTypeWhitespace {
    	l.lastNonWhitespaceToken = currentToken
    }

    Should it be like this instead?

    if t != TokenTypeWhitespace {
    	l.lastNonWhitespaceToken = currentToken
    }
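
    To make the expected behavior concrete, here is a stripped-down sketch of the proposed condition. It is not the real lexer: the Lexeme type, the emit method, and TokenTypeIdentifier are assumptions made for this illustration, and only the field discussed in the issue is modeled.

```go
package main

import "fmt"

// TokenType and its values are simplified stand-ins for the lexer's own types.
type TokenType int

const (
	TokenTypeWhitespace TokenType = iota
	TokenTypeIdentifier
)

// Lexeme is a hypothetical token representation for this sketch.
type Lexeme struct {
	Kind  TokenType
	Value string
}

// Lexer models only the field discussed in the issue.
type Lexer struct {
	lastNonWhitespaceToken Lexeme
}

// emit records a token and updates lastNonWhitespaceToken only when the token
// is not whitespace, which is the corrected condition proposed in the issue.
func (l *Lexer) emit(t TokenType, value string) {
	currentToken := Lexeme{Kind: t, Value: value}
	if t != TokenTypeWhitespace {
		l.lastNonWhitespaceToken = currentToken
	}
}

func main() {
	l := &Lexer{}
	l.emit(TokenTypeIdentifier, "user")
	l.emit(TokenTypeWhitespace, " ")
	// With the corrected condition the field still holds the identifier.
	fmt.Println(l.lastNonWhitespaceToken.Value)
}
```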
  • fix: skip comments when loading test relationships


    Fixes https://github.com/authzed/spicedb/issues/329

    From a brief test in the playground, comments in the Test Relationships are only of the format // my comment and not /** my comment */. This may need to be fact-checked though 😁
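
    A minimal sketch of this kind of filtering, assuming only "//" line comments need to be skipped as described above. The function name relationshipLines and the sample relationships are made up for illustration and are not taken from the PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// relationshipLines returns the non-empty, non-comment lines of a test
// relationships block. Only "//" line comments are recognized here.
func relationshipLines(block string) []string {
	var out []string
	scanner := bufio.NewScanner(strings.NewReader(block))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "//") {
			continue
		}
		out = append(out, line)
	}
	return out
}

func main() {
	block := "// owners\ndocument:readme#owner@user:alice\n\n// viewers\ndocument:readme#viewer@user:bob\n"
	for _, rel := range relationshipLines(block) {
		fmt.Println(rel)
	}
}
```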

  • Bump golang from 1.17.1-alpine3.13 to 1.17.2-alpine3.13


    Bumps golang from 1.17.1-alpine3.13 to 1.17.2-alpine3.13.

  • Proposal: SpiceDB telemetry


    Telemetry Proposal

    As a small team developing very high performance software, we're constantly prioritizing between improving features, stability, performance, and user experience. While we obviously have metrics from our hosted SpiceDB instances on Authzed.com, our last product taught us that open-source and enterprise users often use the software in surprisingly different ways. In order to develop a tight feedback loop with our users, we would like to add some opt-out telemetry information to SpiceDB. As big fans of open source, and as heavy users ourselves, we understand that users can be sensitive to data collection and exfiltration efforts by the software they run. That's why it is our goal to be as open and transparent about this process as possible.

    Philosophical Goals

    • [ ] Put a file called TELEMETRY.md in the root of the repository that includes the final form of this proposal and easy instructions for disabling telemetry
    • [ ] Users will be able to see the exact data that is collected and shipped at all times
    • [ ] It will be simple to disable telemetry, although it will remain opt-out to reduce response bias in the results.
    • [ ] Users will be notified at the INFO log level every time an instance of SpiceDB starts that telemetry is enabled
    • [ ] A notification will be written at the INFO log level every time telemetry data is sent
    • [ ] Metrics collection should not impact the performance of latency sensitive operations
    • [ ] Metrics will be anonymous and aggregate; we will not be able to track them back to a specific user.

    Proposed Data Collection

    Each of the following metrics includes the justification and the specific way in which we will use the data to measure and improve the software.

    Running SpiceDB instances per installation

    Knowing average cluster size will help us to direct resources to service discovery, clustering, and remote re-dispatch.

    Distributed cache hit ratio

    The Zanzibar model lives and dies by how effectively it utilizes the distributed cache. While we have our own tests and metrics, one or more user clusters underperforming would be an indicator that something is awry with our consistent hashing or data access assumptions.

    Number of object definitions

    In the Zanzibar paper, Google gave metrics about average schema size. A histogram would have been better! Schema complexity directly correlates to resolution complexity, and knowing that open-source users are using more or less complex schemas than anticipated would help us direct resources toward nested query complexity.

    Number of relationships

    Similarly to schema complexity, the amount of data also controls the re-dispatch fan-out and resolution complexity. If schemas rely heavily on the arrow -> operator on very large datasets, this would lead us to invest in improvements in resolution order and heuristics.

    Number of redispatches/subproblems per operation

    A metric that is the unification of data and schema, this is a direct, hardware-independent measurement of resolution complexity, and would direct investments similarly to schema and data complexity.

    Number of calls (but not latencies) to specific APIs

    The Zanzibar paper gives the call frequencies for certain operations, but does not tell the complete story. In the Zanzibar paper, Read is used more than Check, and Zanzibar does not support Lookup at all. In order to make sure we're investing in improvements to each method appropriately, it is important to understand the call-frequency usage patterns.

    Considered and Rejected

    It is often as important to know what was considered and rejected as it is to know what was included in the final proposal.

    Rejected: Collecting API latency metrics

    This is extremely infrastructure-dependent, and no useful information could be gleaned from it in aggregate. Hardware-independent complexity measures are preferred as a result.

    Rejected: User driven redaction of specific metrics

    While this sounds interesting at the outset, having an incomplete picture of the metrics from each SpiceDB installation could be statistically misleading. For example, knowing the cache hit ratio but not the schema complexity would make it hard to know if there is a data issue or schema issue.

    Rejected: Opt-in metrics

    While this is obviously very user-friendly, we're all aware of the problems of response bias in statistics. We may end up with an entirely different class of user choosing to report metrics than the average. This may skew efforts in the wrong direction. For example, if only enterprises opt-in to the data collection, we may completely overlook problems with the software that arise during the small-scale development phase.

    Open Questions

    • What data pipeline should we use to collect metrics? We use Prometheus for everything else, but this almost necessarily needs to be push-centric, which Prometheus cautions against.
    • Are users comfortable with us enlisting the help of a sub-processor, such as Mixpanel, Amplitude, Google Analytics, etc. for tracking and reporting the data that we collect?
  • Add GC to the namespace_config table


    Currently, when a docker image is started with a bootstrap file and the flag --datastore-bootstrap-overwrite=true, the namespace_config table expands with every restart of the docker image. The deleted rows are not cleaned up by the GC, as they are for the relation_tuple and relation_tuple_transaction tables. The GC should also clean the namespace_config table.

    hint/good first issue priority/2 medium area/perf area/datastore 
    opened by rolevinks 0
  • Support IAM database authentication for Postgres datastore


    As of now, there is only support for username:password in the connection string for Postgres. For those who use AWS-hosted Postgres, it's preferable to use IAM database authentication.

    priority/3 low area/datastore state/needs discussion 
    opened by jhalleeupgrade 3
  • PoC: Export model as JSON


    As discussed on Discord, here is a quick proof of concept that generates schema information in a machine-readable form for code generation. This information would make it possible to at least generate the string constants that otherwise clutter client libraries.

    In a next step one could add type-information and permission predicates.

    Example Output:

    [
      {
        "name": "user",
        "namespace": "default",
        "relations": [],
        "permissions": []
      },
      {
        "name": "platform",
        "namespace": "default",
        "relations": [
          { "name": "administrator" }
        ],
        "permissions": [
          { "name": "super_admin" },
          { "name": "create_tenant" }
        ]
      },
      {
        "name": "tenant",
        "namespace": "default",
        "relations": [
          { "name": "platform" },
          { "name": "parent" },
          { "name": "administrator" },
          { "name": "agent" },
          { "name": "tenant_administrator" },
          { "name": "admin_administrator" }
        ],
        "permissions": [
          { "name": "administer_user" },
          { "name": "create_admin" }
        ]
      },
      {
        "name": "administrator",
        "namespace": "default",
        "relations": [
          { "name": "self" },
          { "name": "tenant" }
        ],
        "permissions": [
          { "name": "write" },
          { "name": "read" }
        ]
      }
    ]
  • Add revision fuzzing for picking optimized revisions


    Right now when a new optimized revision becomes the de-facto choice, the entire existing cache is (practically) simultaneously invalidated. In order to decrease the effect of this cutover, we should probably phase between the outgoing optimized revision and the incoming optimized revision over some period of time. Other revision picking logic, such as AtLeastAsFresh consistency happens after the optimized revision picking and should therefore be unaffected.

  • Better Caching Cost & Density


    Improve Cache Density and Cost Estimate

    Hi Authzed folks - apologies in advance for this wall of text. 🙂

    I noticed a few weeks ago that the cache cost functions are not accurate if the cost represents bytes (which I believe it does). For example, the cost of a checkResultEntry is set to just 8 bytes, the cost of that struct when empty. But that cost doesn't include the memory pointed to by checkResultEntry.response, which could be much more.

    As I worked to improve the cache cost functions, I found a way to fit 2x more cache items into the same amount of memory: instead of caching the Go structs, cache the protobuf-marshaled bytes.

    The improved cache cost functions help keep the physical memory used by the cache much closer to the configured max cost.

    I'd be happy to open some PRs for these changes, but wanted to post my findings here and see which of the changes you'd like (if any).

    Cache Density

    I experimented with storing the marshaled bytes of protobuf messages rather than the Go objects directly.

    There are two main advantages to this:

    • Calculating the cost of a []byte is quite simple. Most importantly, the cost function does not need to change as the protobuf message changes: protobuf takes care of those details.
    • Second, the cache can store more items per MB of space used. In one test (below), the cache fit about 2.1x as many items per MB (5,529 vs. 2,605 keys per profiled MB)! However, later tests with more accurate cost functions improved cache density by a more modest 50-70%. All tests were on a single local instance of spicedb, so a load test at scale is warranted.

    Below are the results for two tests run on a single spicedb instance serving check requests. Total profiled space is for the whole application, while cache profiled space includes just the stacks related to caching. In this test, the cost function was still poor, but it does show that using marshaled bytes significantly improves cache density.

    | test | total profiled space | cache profiled space | cache calculated cost | key count | keys / cache profiled MB |
    | --- | --- | --- | --- | --- | --- |
    | protobuf structs | 69.16 MB | 54.85 MB | 32 MB | 142,857 | 2,605 |
    | marshaled []byte | 77.02 MB | 61.0 MB | 30.1 MB | 337,311 | 5,529 |

    Of course, marshaling isn't free. However, existing code already calls proto.Clone() on every cache write, and as that is replaced with the call to proto.Marshal(), the relative cost may not be significant. Still, a test to check impact on CPU during a load test is warranted.

    Cache Cost Function

    Now, the long story.


    As stated above, the cache was using more memory than the 'max cost' setting because the cost of each cached item was being set to the size of a pointer (8 bytes) rather than the size of the memory referenced by a pointer.

    The first attempt at improving the cost function made the situation better, but there was still a substantial difference between the configured cache size and the total memory used. Below are flamegraphs for in-use space for a local spicedb instance, taken after running a 15 minute load test of check requests. Between 0 and 32 MB cache, the memory increased 59MB, 184% of the increase in cache size. Between 32 and 64 MB cache, the memory increased 70MB, 219% of the increase in cache size.

    1 byte Cache (single instance, local) image

    32 MB Cache (single instance, local) image

    64 MB Cache (single instance, local) image

    Aside on Profiling

    In the flamegraphs above, the in-use bytes within ristretto.(*Cache).processItems are very close to the allocated cache size. Also, the bytes allocated within caching.(*Dispatcher).DispatchCheck grow proportionally with the cache size.

    Initially I thought this meant the DispatchCheck() function was responsible for leaking memory. However, I no longer think that is the case.

    Heap profiles work by sampling allocations. When a sample is taken, the stack responsible for the allocation is added to the profile. So, seeing DispatchCheck() in the flamegraph doesn't mean that DispatchCheck() is responsible for keeping bytes from GC, only that it was responsible for originally allocating those bytes.

    Reviewing the SpiceDB code, this makes sense - DispatchCheck() creates the object that is stored in the cache (via proto.Clone()), but then it is the cache that keeps that object from GC. When ristretto stores an item, it allocates a wrapper struct, which explains why it is also in the profile.

    Given this, the best way to measure memory used by the cache is to sum ristretto.(*Cache).processItems and proto.Clone. Doing so for the examples above gives 113MB for the 64MB cache (176% of the configured size) and 59MB for the 32MB cache (184% of the configured size).

    Size Classes

    One of the main breakthroughs I had was learning about size classes in Go. Size classes are predefined object sizes (8, 16, 24, 32, 48, etc.). When allocating a 'small' object, Go takes the number of required bytes and rounds it up to the smallest size class that fits. This is done to make GC tracking more efficient for small objects. See 'One more thing' section.

    So, a cost function that returns only the bytes required for an object will systematically under-report the actual cost in memory!

    This article indicates that append() is aware of size classes and can be used to find them at run time. This code demonstrates: https://go.dev/play/p/lRaSqzunZ73

    After accounting for size classes, I was able to write a cost function that exactly matched the allocated bytes, as reported by memstats.TotalAlloc.
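
    A size-class-aware cost can be sketched as below. This is not the cost function from my experiments: allocatedSize is a hypothetical helper, and the hard-coded table is an assumption covering only the first few size classes. The authoritative table lives in the Go runtime and may differ between releases.

```go
package main

import "fmt"

// smallSizeClasses lists the first few allocation size classes in bytes.
// Assumption: these values match the Go runtime in use; verify before relying
// on them.
var smallSizeClasses = []int{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// allocatedSize rounds n up to the smallest listed size class that fits it.
// Sizes beyond the table are returned unchanged, which under-reports them.
func allocatedSize(n int) int {
	if n <= 0 {
		return 0
	}
	for _, class := range smallSizeClasses {
		if n <= class {
			return class
		}
	}
	return n
}

func main() {
	for _, n := range []int{1, 9, 33, 100} {
		fmt.Printf("requested %d bytes, charged %d\n", n, allocatedSize(n))
	}
}
```

    A cost function would apply this rounding to each separately allocated piece of a cache entry rather than to the sum of their sizes.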

    Keys Count Too

    Still, even accounting for size classes, the cost function was not controlling memory like I wanted. How could my tests show a perfect match to the reported allocated memory, but still allow the cache to grow beyond max cost? The answer is fairly simple: cache keys are stored too, and take up memory. After including keys in the cost function, I got the following results (caching []byte):

    | test | total profiled space | cache profiled space | cache computed space | key count | keys / cache profiled MB |
    | --- | --- | --- | --- | --- | --- |
    | 8MB cache | 33.1 MB | 16.2 MB | 8 MB | 42,094 | 2,598 |
    | 16MB cache | 40.4 MB | 24.3 MB | 16 MB | 84,097 | 3,460 |
    | 32MB cache | 63.8 MB | 44.4 MB | 32 MB | 168,152 | 3,787 |

    The difference in cache size between 8MB and 16MB max cost was 8.1MB! Between 16MB and 32MB, 20.1 MB, which is off by about 26%.

    Final Cost Function (protobuf structs, not bytes)

    This test was run with a cost function that accounted for keys and size classes. No changes were made to the objects stored in the cache for this test.

    | test | total profiled space | cache profiled space | cache computed space | key count | keys / cache profiled MB |
    | --- | --- | --- | --- | --- | --- |
    | no cache (1 byte) | 15.6 MB | 0 MB | 0 MB | 0 | 0 |
    | 16MB cache | 34.8 MB | 21.5 MB | 16 MB | 46,916 | 2,182 |
    | 32MB cache | 55.2 MB | 37.8 MB | 32 MB | 93,825 | 2,482 |

    This shows there is still some overhead for the cache, since going from a cache with only 1 byte max cost (effectively, no cache) to 16 MB cost added 21.5 MB to memory used by the cache. But, going from 16MB to 32MB added 16.3MB, off by ~2%.

    Compared to the test which used a similar cost function, but stored bytes instead, this also shows that storing bytes is still more efficient, although less so than in the original test. This makes sense, because now that the key is included in the cost function, the space saved on the items themselves is a smaller proportion of the total cost per entry.

    Misc Learnings

    • Are there memory leaks?
      • I don't think so. Once the cache reaches capacity and begins to evict items, memory use is stable.
    • Are protocol buffers increasing the memory footprint?
      • The items stored in the cache are protobuf generated types and have some fields specific to protobuf (protoimpl.MessageState, protoimpl.SizeCache, protoimpl.UnknownFields). It is possible these fields are getting populated after the cost function runs and increasing memory footprint beyond what the cost function calculates. Running spicedb locally, I did see that this was the case - sending a message from the cache caused its size to increase significantly. However, subsequent sends shared the memory added by the first send. To further test if protobuf fields were increasing cost, I ran tests where the cached object was never returned to callers, only deep copies. Memory use was similar enough that I don't think the protobuf fields have a significant impact.
      • 32 MB Cache (main) image
      • 32 MB Cache (clone on return) image
    area/perf area/observability area/dispatch 
    opened by benCoomes 9
  • add more CLI options to mysql datastore.


    Sorry, I borked the rebase..

    This is a Follow-up PR from the MySQL Datastore implementation.

    It updates the SplitQueryCount because that is what is being fetched from cobra.

    The following CLI options are now supported by the MySQL Datastore:

    • SplitAtUsersetCount
    • GCMaxOperationTime

    Co-authored-by: Bryan Huhta
    Co-authored-by: Craig Steinberger

    opened by christroger 1
