Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2.

Overview

md5-simd

This is a SIMD accelerated MD5 package, allowing up to either 8 (AVX2) or 16 (AVX512) independent MD5 sums to be calculated on a single CPU core.

It was originally based on the md5vec repository by Igneous Systems, but has been made more flexible, among other things by supporting different message sizes per lane and by adding AVX512 support.

md5-simd integrates a mechanism similar to the one described in minio/sha256-simd, making it easy for clients to take advantage of the parallel nature of the MD5 calculation. This results in reduced overall CPU load.

It is important to understand that md5-simd does not speed up a single-threaded MD5 hash sum. Rather, it allows multiple independent MD5 sums to be computed in parallel on the same CPU core, thereby making more efficient use of the computing resources.

Usage

In order to use md5-simd, you must first create a Server, which can be used to instantiate one or more objects for MD5 hashing.

These objects conform to the regular hash.Hash interface and as such the normal Write/Reset/Sum functionality works as expected.

As an example:

    // Create server
    server := md5simd.NewServer()
    defer server.Close()

    // Create hashing object (conforming to hash.Hash)
    md5Hash := server.NewHash()
    defer md5Hash.Close()

    // Write one (or more) blocks
    md5Hash.Write(block)
    
    // Return digest
    digest := md5Hash.Sum([]byte{})

To maintain performance, both the Server and each individual Hasher should be closed using the Close() function when no longer needed.

A Hasher can be re-used efficiently by calling Reset().
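As a rough sketch of the re-use pattern, the helper below hashes several inputs with a single hash.Hash and calls Reset() between them. The names sumAll and newHasher are hypothetical, and crypto/md5 stands in for an md5-simd hasher here; the assumption is that a hasher obtained from server.NewHash() can be passed in the same way because it conforms to hash.Hash.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHasher is a stand-in factory (assumption): it returns a crypto/md5
// hasher where real code would use a hasher created by an md5-simd server.
func newHasher() hash.Hash { return md5.New() }

// sumAll hashes each input with the same hasher, calling Reset between
// inputs instead of allocating a new hasher for every input.
func sumAll(h hash.Hash, inputs [][]byte) []string {
	digests := make([]string, 0, len(inputs))
	for _, in := range inputs {
		h.Reset()
		h.Write(in) // Write on a hash.Hash never returns an error
		digests = append(digests, hex.EncodeToString(h.Sum(nil)))
	}
	return digests
}

func main() {
	h := newHasher()
	for _, d := range sumAll(h, [][]byte{[]byte("hello"), []byte("")}) {
		fmt.Println(d)
	}
}
```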

If your system does not support the required instructions, md5-simd falls back to using crypto/md5 for hashing.

Limitations

As explained above, md5-simd does not speed up an individual MD5 hash sum computation. The only way around this would be some hierarchical tree construct, but that would produce different hash values. Running a single hash on a server results in approximately half the throughput.

Instead, it allows running multiple MD5 calculations in parallel on a single CPU core. This can be beneficial in, for example, multi-threaded server applications where many goroutines are handling many requests and multiple MD5 calculations can be packed and scheduled for parallel execution on a single core.

This will result in a lower overall CPU usage as compared to using the standard crypto/md5 functionality where each MD5 hash computation will consume a single thread (core).
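The kind of workload described above can be sketched as follows: many goroutines each hash an independent payload. The function hashConcurrently and the factory stdMD5 are hypothetical names; the factory parameter is only an illustration of how either crypto/md5 or hashers from an md5-simd server could be plugged in. With md5-simd, each hasher should additionally be closed when done, which the plain hash.Hash interface used here does not express.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
	"sync"
)

// stdMD5 is a stand-in factory (assumption) that returns crypto/md5 hashers.
func stdMD5() hash.Hash { return md5.New() }

// hashConcurrently hashes every payload in its own goroutine and returns the
// hex digests in the same order as the payloads.
func hashConcurrently(newHash func() hash.Hash, payloads [][]byte) []string {
	digests := make([]string, len(payloads))
	var wg sync.WaitGroup
	for i, p := range payloads {
		wg.Add(1)
		go func(i int, p []byte) {
			defer wg.Done()
			h := newHash()
			h.Write(p)
			// Each goroutine writes to its own index, so no lock is needed.
			digests[i] = hex.EncodeToString(h.Sum(nil))
		}(i, p)
	}
	wg.Wait()
	return digests
}

func main() {
	payloads := [][]byte{[]byte("hello"), []byte("")}
	for i, d := range hashConcurrently(stdMD5, payloads) {
		fmt.Printf("%s  %q\n", d, payloads[i])
	}
}
```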

It is best to test and measure the overall CPU usage in a representative usage scenario in your application to get an overall understanding of the benefits of md5-simd as compared to crypto/md5, ideally under heavy CPU load.

Also note that md5-simd works best with large objects, so if your application only hashes small objects of a few kilobytes you may be better off using crypto/md5.

Performance

For the best performance writes should be a multiple of 64 bytes, ideally a multiple of 32KB. To help with that a buffered := bufio.NewWriterSize(hasher, 32<<10) can be inserted if you are unsure of the sizes of the writes. Remember to flush buffered before reading the hash.
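A minimal sketch of the buffering advice, assuming the hasher is any hash.Hash: small writes are collected by a 32KB bufio.Writer and the buffer is flushed before the digest is read. The names sumBuffered and newHasher are hypothetical, and crypto/md5 is used as a stand-in for a hasher created by an md5-simd server.

```go
package main

import (
	"bufio"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
)

// bufSize matches the 32KB write size recommended in the text.
const bufSize = 32 << 10

// newHasher is a stand-in factory (assumption) returning a crypto/md5 hasher.
func newHasher() hash.Hash { return md5.New() }

// sumBuffered writes all chunks through a 32KB buffer so that the hasher
// receives large writes, then flushes before reading the digest.
func sumBuffered(h hash.Hash, chunks [][]byte) (string, error) {
	buffered := bufio.NewWriterSize(h, bufSize)
	for _, c := range chunks {
		if _, err := buffered.Write(c); err != nil {
			return "", err
		}
	}
	// Without this Flush, bytes still held in the buffer would be missing
	// from the digest.
	if err := buffered.Flush(); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	digest, err := sumBuffered(newHasher(), [][]byte{[]byte("hel"), []byte("lo")})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(digest)
}
```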

A single 'server' can process 16 streams concurrently with 1 core (AVX-512) or 2 cores (AVX2). In situations where it is likely that more than 16 streams are fully loaded it may be beneficial to use multiple servers.

The following chart compares the multi-core performance between crypto/md5 vs the AVX2 vs the AVX512 code:

[Figure: md5-performance-overview]

Compared to crypto/md5, the AVX2 version is up to 4x faster:

$ benchcmp crypto-md5.txt avx2.txt 
benchmark                     old MB/s     new MB/s     speedup
BenchmarkParallel/32KB-4      2229.22      7370.50      3.31x
BenchmarkParallel/64KB-4      2233.61      8248.46      3.69x
BenchmarkParallel/128KB-4     2235.43      8660.74      3.87x
BenchmarkParallel/256KB-4     2236.39      8863.87      3.96x
BenchmarkParallel/512KB-4     2238.05      8985.39      4.01x
BenchmarkParallel/1MB-4       2233.56      9042.62      4.05x
BenchmarkParallel/2MB-4       2224.11      9014.46      4.05x
BenchmarkParallel/4MB-4       2199.78      8993.61      4.09x
BenchmarkParallel/8MB-4       2182.48      8748.22      4.01x

Compared to crypto/md5, the AVX512 version is up to 8x faster (for larger block sizes):

$ benchcmp crypto-md5.txt avx512.txt
benchmark                     old MB/s     new MB/s     speedup
BenchmarkParallel/32KB-4      2229.22      11605.78     5.21x
BenchmarkParallel/64KB-4      2233.61      14329.65     6.42x
BenchmarkParallel/128KB-4     2235.43      16166.39     7.23x
BenchmarkParallel/256KB-4     2236.39      15570.09     6.96x
BenchmarkParallel/512KB-4     2238.05      16705.83     7.46x
BenchmarkParallel/1MB-4       2233.56      16941.95     7.59x
BenchmarkParallel/2MB-4       2224.11      17136.01     7.70x
BenchmarkParallel/4MB-4       2199.78      17218.61     7.83x
BenchmarkParallel/8MB-4       2182.48      17252.88     7.91x

These measurements were performed on an AWS EC2 instance of type c5.xlarge equipped with a Xeon Platinum 8124M CPU at 3.0 GHz.

If only one or two inputs are available, the scalar calculation method is used, since it gives the best speed in these cases.

Operation

To make operation as easy as possible there is a “Server” coordinating everything. The server keeps track of individual hash states and updates them as new data comes in. This can be visualized as follows:

[Figure: server-architecture]

The data is sent to the server from each hash input in blocks of up to 32KB per round. In our testing we found this to be the block size that yielded the best results.

Whenever there is data available, the server will collect data for up to 16 hashes and process all 16 lanes in parallel. This means that if 16 hashes have data available, all the lanes will be filled. However, since that may not be the case, the server will fill fewer lanes and do a round anyway. Lanes can also be partially filled if less than 32KB of data is written.

[Figure: server-lanes-example]

In this example 4 lanes are fully filled and 2 lanes are partially filled. In this case the black areas will simply be masked out from the results and ignored. This is also why calculating a single hash on a server will not result in any speedup and hash writes should be a multiple of 32KB for the best performance.
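To make the cost of unfilled lanes concrete, here is a simplified model (an assumption, not the server's actual accounting): a round is charged for 16 lanes times the longest lane, and anything not covered by real data is masked-out work. The function name utilisation and the constants are hypothetical.

```go
package main

import "fmt"

const (
	maxLanes  = 16       // lanes processed per round
	roundSize = 32 << 10 // up to 32KB per lane per round
)

// utilisation returns the fraction of one round that hashes real data.
// laneBytes holds the number of bytes available in each lane; lanes beyond
// maxLanes are ignored and lanes longer than roundSize are capped.
func utilisation(laneBytes []int) float64 {
	longest, useful := 0, 0
	for i, n := range laneBytes {
		if i >= maxLanes {
			break
		}
		if n > roundSize {
			n = roundSize
		}
		if n > longest {
			longest = n
		}
		useful += n
	}
	if longest == 0 {
		return 0
	}
	return float64(useful) / float64(maxLanes*longest)
}

func main() {
	full, half := roundSize, roundSize/2
	// 4 full lanes and 2 half-filled lanes, as in the example above.
	fmt.Println(utilisation([]int{full, full, full, full, half, half})) // 0.3125
	// A single hash on a server uses one lane out of 16.
	fmt.Println(utilisation([]int{full})) // 0.0625
}
```

Under this model a single stream uses only one sixteenth of each round, which is consistent with the statement that a single hash on a server sees no speedup.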

With AVX512, all 16 calculations are done on a single core. With AVX2, they are spread over 2 cores if there is data for more than 8 lanes. So for optimal usage there should be data available for all 16 hashes. It may be perfectly reasonable to use more than 16 concurrent hashes.

Design & Tech

md5-simd has both an AVX2 (8-lane parallel) and an AVX512 (16-lane parallel) algorithm to accelerate the computation, with the following function definitions:

//go:noescape
func block8(state *uint32, base uintptr, bufs *int32, cache *byte, n int)

//go:noescape
func block16(state *uint32, ptrs *int64, mask uint64, n int)

The AVX2 version is based on the md5vec repository and is essentially unchanged except for minor (cosmetic) changes.

The AVX512 version is derived from the AVX2 version but adds some further optimizations and simplifications.

Caching in upper ZMM registers

The AVX2 version passes in a cache8 block of memory (about 0.5 KB) for temporary storage of intermediate results during ROUND1 which are subsequently used during ROUND2 through to ROUND4.

Since AVX512 has double the amount of registers (32 ZMM registers as compared to 16 YMM registers), it is possible to use the upper 16 ZMM registers for keeping the intermediate states on the CPU. As such, there is no need to pass in a corresponding cache16 into the AVX512 block function.

Direct loading using 64-bit pointers

The AVX2 version uses the VPGATHERDD instruction (for YMM) to do a parallel load of 8 lanes using (8 independent) 32-bit offsets. Since there is no control over how the 8 slices that are passed into the (Golang) blockMd5 function are laid out in memory, it is not possible to derive a "base" address and corresponding offsets (all within 32 bits) for all 8 slices.

As such, the AVX2 version uses an interim buffer to collect the byte slices to be hashed from all 8 input slices and passes this buffer along with (fixed) 32-bit offsets into the assembly code.
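The interim-buffer idea can be illustrated with a small sketch. This is not the package's actual layout; packLanes, the fixed per-lane chunk size, and the zero padding are assumptions made for illustration. Each of the 8 inputs is copied into its own fixed-size region of one contiguous buffer, so every lane can be addressed as base plus a 32-bit offset.

```go
package main

import "fmt"

// lanes is the number of parallel lanes in the AVX2 version.
const lanes = 8

// packLanes copies up to chunk bytes from each input into one contiguous
// buffer and returns the buffer together with a fixed offset per lane.
// Regions of lanes with less than chunk bytes stay zero-padded.
func packLanes(inputs [lanes][]byte, chunk int) ([]byte, [lanes]int32) {
	buf := make([]byte, lanes*chunk)
	var offsets [lanes]int32
	for i, in := range inputs {
		offsets[i] = int32(i * chunk)
		// copy stops at the shorter of the two slices.
		copy(buf[i*chunk:(i+1)*chunk], in)
	}
	return buf, offsets
}

func main() {
	var inputs [lanes][]byte
	inputs[0] = []byte("lane0")
	inputs[1] = []byte("ab")
	buf, offsets := packLanes(inputs, 8)
	fmt.Println(len(buf), offsets)
}
```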

For the AVX512 version this interim buffer is not needed since the AVX512 code uses a pair of VPGATHERQD instructions to directly dereference 64-bit pointers (from a base register address that is initialized to zero).

Note that two load (gather) instructions are needed because the AVX512 version processes 16 lanes in parallel, requiring 16 times 64 bits = 1024 bits in total to be loaded. A simple VALIGND and VPORD are subsequently used to merge the lower and upper halves together into a single ZMM register (that contains 16 lanes of 32-bit DWORDS).

Masking support

Because pointers are passed directly from the Golang slices, we need to protect against NULL pointers. For this, a 16-bit mask is passed into the AVX512 assembly code, which is used during the VPGATHERQD instructions to mask out lanes that could otherwise result in segmentation violations.
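A sketch of how such a mask could be derived on the Go side, under the assumption that bit i of the mask corresponds to lane i and that a lane without data is represented by a nil or empty slice. The function name laneMask is hypothetical.

```go
package main

import "fmt"

// laneMask sets bit i when lane i carries data, so that nil or empty lanes
// can be excluded from a masked gather.
func laneMask(lanes [16][]byte) uint16 {
	var mask uint16
	for i, l := range lanes {
		if len(l) > 0 {
			mask |= 1 << uint(i)
		}
	}
	return mask
}

func main() {
	var lanes [16][]byte
	lanes[0] = []byte("first")
	lanes[3] = []byte("fourth")
	fmt.Printf("%016b\n", laneMask(lanes)) // 0000000000001001
}
```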

Minor optimizations

The roll macro (three instructions on AVX2) is no longer needed for AVX512 and is replaced by a single VPROLD instruction.
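For reference, the equivalence behind this replacement can be shown in plain Go. The three-step version below assumes the roll macro is the usual shift-left, shift-right, OR sequence; that is an assumption about the macro and not taken from the assembly source. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rollThreeOps rotates x left by n bits using two shifts and an OR, the
// scalar analogue of a three-instruction rotate.
func rollThreeOps(x uint32, n uint) uint32 {
	left := x << n
	right := x >> (32 - n)
	return left | right
}

// rollSingleOp does the same rotation with a single rotate operation, the
// scalar analogue of VPROLD.
func rollSingleOp(x uint32, n int) uint32 {
	return bits.RotateLeft32(x, n)
}

func main() {
	x := uint32(0x80000001)
	// 7, 12, 17 and 22 are the rotate amounts of the first MD5 round.
	for _, n := range []int{7, 12, 17, 22} {
		fmt.Printf("%#08x %#08x\n", rollThreeOps(x, uint(n)), rollSingleOp(x, n))
	}
}
```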

Also, several logical operations from the various ROUNDS of the AVX2 version could be combined into a single instruction using ternary logic (with the VPTERNLOGD instruction), resulting in a further simplification and speed-up.
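The 8-bit immediate of a ternary-logic instruction is the truth table of a three-input boolean function. The sketch below computes that immediate, assuming the three input bits form the table index as a<<2 | b<<1 | c; which register plays the role of a, b, or c in a given assembler syntax is not covered here. The function name ternlogImm is hypothetical.

```go
package main

import "fmt"

// ternlogImm builds the 8-bit truth table of f: bit i of the result is
// f(a, b, c), where a, b and c are bits 2, 1 and 0 of i.
func ternlogImm(f func(a, b, c bool) bool) uint8 {
	var imm uint8
	for i := 0; i < 8; i++ {
		a, b, c := i&4 != 0, i&2 != 0, i&1 != 0
		if f(a, b, c) {
			imm |= 1 << uint(i)
		}
	}
	return imm
}

func main() {
	// XOR of all three inputs.
	xor3 := func(a, b, c bool) bool { return (a != b) != c }
	// Bitwise select: take b where a is set, otherwise c.
	sel := func(a, b, c bool) bool { return (a && b) || (!a && c) }

	fmt.Println(ternlogImm(xor3)) // 150
	fmt.Println(ternlogImm(sel))  // 202
}
```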

Low level block function performance

The benchmark below shows the (single thread) maximum performance of the block() function for AVX2 (having 8 lanes) and AVX512 (having 16 lanes). Also the baseline single-core performance from the standard crypto/md5 package is shown for comparison.

BenchmarkCryptoMd5-4                     687.66 MB/s           0 B/op          0 allocs/op
BenchmarkBlock8-4                       4144.80 MB/s           0 B/op          0 allocs/op
BenchmarkBlock16-4                      8228.88 MB/s           0 B/op          0 allocs/op

License

md5-simd is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

Contributing

Contributions are welcome. Please send PRs for any enhancements.

Issues
  • AVX512 Performance suggestions

    Whilst doing research for my article on MD5 optimisation, I came across your blog post.

    I had a quick skim of the assembly code here, and thought I'd offer up some suggestions if you're interested.

    • avoid unnecessarily mixing floating-point and integer instructions to avoid potential bypass delays (e.g. use VPXORD instead of VXORPS)
    • all bitwise logic should be handled by VPTERNLOG (i.e. don't do this, where you've got a XOR operation that the ternary-logic instruction can handle, use a move instruction to preserve the original value (note that modern processors support move-elimination, so moves will be more efficient than logic))
    • avoid using gathers - do regular loads and unpack/permute everything into place (Intel CPUs only have one shuffle port, so I can see some appeal with avoiding shuffle-ops, but 32-bit gathers place so much load on the LSU that I doubt it's ever worth it)
    • consider interleaving two instruction streams to make better use of ILP (i.e. compute 32 hashes at a time, instead of 16, for AVX512)
    • consider using EVEX embedded broadcast for loading constants, rather than duplicating it in memory. If you're interleaving two streams, it may be better to use VPBROADCASTD for loading.

    Note: for a good reference, check out Intel's multi-buffer MD5 implementation, which incorporates the suggestions above.

    Hope you found that useful!

    opened by animetosho 10
  • license in block-generic.go

    I'm packaging md5-simd in Debian. I noticed that the complete LICENSE file of block-generic.go is not included, and that there is also the gen.go source file missing. Could you please add them to the repo?

    // Copyright 2013 The Go Authors. All rights reserved.
    // Use of this source code is governed by a BSD-style
    // license that can be found in the LICENSE file.
    
    // Code generated by go run gen.go -output md5block.go; DO NOT EDIT.
    opened by legrostdg 5
  • md5Server does not properly clean up its clients

    In the 'client disconnected' case on this line:

    https://github.com/minio/md5-simd/blob/30ad8af83f6868c2a30c615f3edf1a9366bf3f89/md5-server_amd64.go#L136

    the delete is carried out using block.uid. However in this case this will always be zero, causing improper cleanup of the clients, and performance degradation over time. The correct variable to use here is uid.

    opened by koen-struyve-q 1
  • Use scalar functions when less traffic

    Switch to scalar assembly when less than 3 lanes are filled.

    This brings us very close to crypto/md5 in cases where only a single lane is populated.

    When there are 2 lanes filled we use 2 goroutines with the scalar code and above that we switch to SIMD.

    Before, with a single writer:

    BenchmarkAvx2SingleWriter/32KB-32              14686       80893 ns/op   405.08 MB/s       976 B/op        8 allocs/op
    BenchmarkAvx2SingleWriter/64KB-32               7498      162843 ns/op   402.45 MB/s      1840 B/op       15 allocs/op
    BenchmarkAvx2SingleWriter/128KB-32              3636      327558 ns/op   400.15 MB/s      3568 B/op       29 allocs/op
    BenchmarkAvx2SingleWriter/256KB-32              1845      650406 ns/op   403.05 MB/s      7024 B/op       57 allocs/op
    BenchmarkAvx2SingleWriter/512KB-32               922     1295010 ns/op   404.85 MB/s     13937 B/op      113 allocs/op
    BenchmarkAvx2SingleWriter/1MB-32                 463     2598272 ns/op   403.57 MB/s     27765 B/op      225 allocs/op
    BenchmarkAvx2SingleWriter/2MB-32                 231     5164500 ns/op   406.07 MB/s     55411 B/op      449 allocs/op
    BenchmarkAvx2SingleWriter/4MB-32                 100    10170000 ns/op   412.42 MB/s    110709 B/op      897 allocs/op
    BenchmarkAvx2SingleWriter/8MB-32                  56    20357161 ns/op   412.07 MB/s    221305 B/op     1793 allocs/op
    

    After:

    BenchmarkAvx2SingleWriter/32KB-32              26785       44353 ns/op   738.80 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/64KB-32              13682       87853 ns/op   745.98 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/128KB-32              7058      175829 ns/op   745.45 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/256KB-32              3428      346558 ns/op   756.42 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/512KB-32              1713      686515 ns/op   763.69 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/1MB-32                 874     1366132 ns/op   767.55 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/2MB-32                 439     2740318 ns/op   765.30 MB/s       112 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/4MB-32                 220     5431817 ns/op   772.17 MB/s       113 B/op        1 allocs/op
    BenchmarkAvx2SingleWriter/8MB-32                 100    10840002 ns/op   773.86 MB/s       116 B/op        1 allocs/op
    

    After optimizing the assembly:

    BenchmarkAvx2SingleWriter/32KB-32         	   28707	     41906 ns/op	 781.94 MB/s	       0 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/64KB-32         	   14722	     81307 ns/op	 806.03 MB/s	       0 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/128KB-32        	    7058	    163502 ns/op	 801.65 MB/s	       0 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/256KB-32        	    3636	    324257 ns/op	 808.44 MB/s	       1 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/512KB-32        	    1845	    653116 ns/op	 802.75 MB/s	       2 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/1MB-32          	     930	   1304300 ns/op	 803.94 MB/s	       5 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/2MB-32          	     456	   2620615 ns/op	 800.25 MB/s	      11 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/4MB-32          	     231	   5190471 ns/op	 808.08 MB/s	      22 B/op	       0 allocs/op
    BenchmarkAvx2SingleWriter/8MB-32          	     100	  10359995 ns/op	 809.71 MB/s	      51 B/op	       0 allocs/op
    

    Compare to pure crypto/md5:

    BenchmarkCryptoMd5/32KB-32             30612       39004 ns/op   840.11 MB/s         0 B/op        0 allocs/op
    BenchmarkCryptoMd5/64KB-32             15285       77985 ns/op   840.37 MB/s         0 B/op        0 allocs/op
    BenchmarkCryptoMd5/128KB-32             7498      156175 ns/op   839.26 MB/s         0 B/op        0 allocs/op
    BenchmarkCryptoMd5/256KB-32             3870      310336 ns/op   844.71 MB/s         0 B/op        0 allocs/op
    BenchmarkCryptoMd5/512KB-32             1874      623266 ns/op   841.19 MB/s         0 B/op        0 allocs/op
    BenchmarkCryptoMd5/1MB-32                960     1243750 ns/op   843.08 MB/s         0 B/op        0 allocs/op
    BenchmarkCryptoMd5/2MB-32                480     2489588 ns/op   842.37 MB/s         0 B/op        0 allocs/op
    
    opened by klauspost 1
  • cpuid.CPU.AVX512F undefined (type cpuid.CPUInfo has no field or method AVX512F)

    I'm trying to run minio-go on a virtual machine (and in a Docker environment).

    When I try to build or go get, I get this error message:

    ../../../go/src/github.com/minio/md5-simd/block_amd64.go:86:23: cpuid.CPU.AVX512F undefined (type cpuid.CPUInfo has no field or method AVX512F)
    ../../../go/src/github.com/minio/md5-simd/md5-server_amd64.go:63:15: cpuid.CPU.AVX2 undefined (type cpuid.CPUInfo has no field or method AVX2)
    

    I attached quick example project. minio-example.zip

    Run make run or manually docker build -t minio-example . and you should see the same error

    opened by Racle 1
  • Compile error

    Compile error "shift count type int, must be unsigned integer" on Go 1.12

    Hi, this package fails to build under go1.12 owing to these errors:

    block_amd64.go:157:14: invalid operation: 1 << j (shift count type int, must be unsigned integer)
    block_amd64.go:194:14: invalid operation: 1 << j (shift count type int, must be unsigned integer)
    
    opened by mappu 1
  • Add CodeQL security scanning

    Hi, I'm a PM on the GitHub security team. This repository is eligible to try the new GitHub Advanced Security code scanning beta.

    Code scanning runs a static analysis tool called CodeQL which scans your code at build time to find any potential security issues. We've tuned the set of queries to be only the most severe, most precise issues. We'll show alerts in the security tab, and we'll show alerts for any net new vulnerabilities on pull requests as well. We've tried to make this super developer friendly, but we'd love your feedback as we work through the beta.

    If you're interested in trying it out, you can merge this pull request to set up the Actions workflow.

    opened by jhutchings1 1
  • Fix shared write buffers 2

    Fixes #12

    Fix shared buffers:

    benchmark                          old MB/s     new MB/s     speedup
    BenchmarkAvx2/32KB-32              2232.44      2039.22      0.91x
    BenchmarkAvx2/64KB-32              2935.47      2707.90      0.92x
    BenchmarkAvx2/128KB-32             3428.63      2839.23      0.83x
    BenchmarkAvx2/256KB-32             3628.36      3145.90      0.87x
    BenchmarkAvx2/512KB-32             3576.96      3370.22      0.94x
    BenchmarkAvx2/1MB-32               3534.08      3417.84      0.97x
    BenchmarkAvx2/2MB-32               3459.18      3363.09      0.97x
    BenchmarkAvx2/4MB-32               3484.55      3348.91      0.96x
    BenchmarkAvx2/8MB-32               3497.50      3400.22      0.97x
    BenchmarkAvx2Parallel/32KB-32      30512.99     20568.38     0.67x
    BenchmarkAvx2Parallel/64KB-32      37090.64     21099.39     0.57x
    BenchmarkAvx2Parallel/128KB-32     41318.22     20926.21     0.51x
    BenchmarkAvx2Parallel/256KB-32     43143.56     24411.63     0.57x
    BenchmarkAvx2Parallel/512KB-32     43985.58     29105.24     0.66x
    BenchmarkAvx2Parallel/1MB-32       44011.91     29499.57     0.67x
    BenchmarkAvx2Parallel/2MB-32       44756.98     29765.74     0.67x
    BenchmarkAvx2Parallel/4MB-32       44581.99     27552.38     0.62x
    BenchmarkAvx2Parallel/8MB-32       44145.26     25791.88     0.58x
    

    And adds 3x16x32KB alloc when creating a server.

    opened by klauspost 1
  • Fix shared write buffers

    Well, we need something else otherwise the project is dead:

    BenchmarkAvx2/32KB-32                   2199.15      1574.04      0.72x
    BenchmarkAvx2/64KB-32                   2936.65      2089.57      0.71x
    BenchmarkAvx2/128KB-32                  3338.90      2558.27      0.77x
    BenchmarkAvx2/256KB-32                  3558.00      2757.57      0.78x
    BenchmarkAvx2/512KB-32                  3513.83      2723.95      0.78x
    BenchmarkAvx2/1MB-32                    3433.49      2754.05      0.80x
    BenchmarkAvx2/2MB-32                    3416.81      2786.19      0.82x
    BenchmarkAvx2/4MB-32                    3425.56      2797.95      0.82x
    BenchmarkAvx2/8MB-32                    3415.68      2802.58      0.82x
    BenchmarkAvx2Parallel/32KB-32           31816.47     4743.60      0.15x
    BenchmarkAvx2Parallel/64KB-32           38000.07     5505.23      0.14x
    BenchmarkAvx2Parallel/128KB-32          41164.65     6209.73      0.15x
    BenchmarkAvx2Parallel/256KB-32          43592.91     6551.00      0.15x
    BenchmarkAvx2Parallel/512KB-32          44030.27     6525.96      0.15x
    BenchmarkAvx2Parallel/1MB-32            44192.77     6797.56      0.15x
    BenchmarkAvx2Parallel/2MB-32            44830.55     7216.95      0.16x
    BenchmarkAvx2Parallel/4MB-32            44405.23     7038.59      0.16x
    BenchmarkAvx2Parallel/8MB-32            43470.73     6836.64      0.16x
    

    Does not fix #12

    opened by klauspost 1
  • avx512: use VPTERNLOGQ for ternary operations

    http://www.0x80.pl/articles/avx512-ternary-functions.html

    Made this calculator: https://play.golang.org/p/JkeaBPpu2b-

    ROUND1:

    	VXORPS  c, tmp, tmp            \
    [...]
    	VANDPS  b, tmp, tmp            \
    	VXORPS  d, tmp, tmp            \
    
    

    This looks to be tmp = (((c ^ tmp) & b) ^ d). This should be able to be reduced by 1 instruction.

    Replacing the last two gives tmp = ((tmp & b) ^ d). With c=tmp, b=b, a=d, this means c = ((c & b) ^ a)

    -> VPTERNLOG $120, d, b, tmp - https://play.golang.org/p/4A9Ex1-q9ft

    ROUND3:

    	VXORPS  d, tmp, tmp            \
    	VXORPS  b, tmp, tmp            \
    

    Easy, VPTERNLOG $150, b, d, tmp - https://play.golang.org/p/hNUMvRjSwQN

    ROUND4:

    	VORPS  b, tmp, tmp            \
    	VXORPS c, tmp, tmp            \
    

    This looks like tmp = (b | tmp) ^c. With params, c = (b | c) ^a, so

    -> VPTERNLOG $30, c, b, tmp - https://play.golang.org/p/2NqJElhLfSH

    opened by klauspost 1
  • Optimize scalar and avx2 implementations

    Use https://github.com/animetosho/md5-optimisation#dependency-shortcut-in-g-function and https://github.com/animetosho/md5-optimisation#h-function-re-use for shorter dependency chain.

    Cleanup in AVX2 removing superfluous loads/moves.

    benchmark                              old ns/op     new ns/op     delta
    BenchmarkAvx2/32KB-32                  201378        194863        -3.24%
    BenchmarkAvx2/64KB-32                  321507        303803        -5.51%
    BenchmarkAvx2/128KB-32                 594137        577175        -2.85%
    BenchmarkAvx2/256KB-32                 1089630       1014160       -6.93%
    BenchmarkAvx2/512KB-32                 2077582       1959077       -5.70%
    BenchmarkAvx2/1MB-32                   4191188       3949610       -5.76%
    BenchmarkAvx2/2MB-32                   8439181       8042106       -4.71%
    BenchmarkAvx2/4MB-32                   16655067      15739187      -5.50%
    BenchmarkAvx2/8MB-32                   33017781      31620932      -4.23%
    BenchmarkAvx2SingleWriter/32KB-32      41765         39763         -4.79%
    BenchmarkAvx2SingleWriter/64KB-32      81884         76866         -6.13%
    BenchmarkAvx2SingleWriter/128KB-32     166802        155819        -6.58%
    BenchmarkAvx2SingleWriter/256KB-32     329145        306292        -6.94%
    BenchmarkAvx2SingleWriter/512KB-32     653422        616564        -5.64%
    BenchmarkAvx2SingleWriter/1MB-32       1303555       1237368       -5.08%
    BenchmarkAvx2SingleWriter/2MB-32       2596346       2441836       -5.95%
    BenchmarkAvx2SingleWriter/4MB-32       5151380       4885766       -5.16%
    BenchmarkAvx2SingleWriter/8MB-32       10324461      9765875       -5.41%
    
    opened by klauspost 0