I am in the process of introducing a cache layer to one of my libraries. The goal is to cache compiled regular expressions. At a high level:
r, _ := regexp.Compile(someregex)
if r.MatchString(somehaystack) {
	// ..
}
Because the regexps are user-defined, I want to cache them so I don't have to compile them on every request. First, I started out with a very simple approach of storing them in a map (no locks, nothing, a very naive approach):
var cache = map[string]*regexp.Regexp{}
// ...
r, _ := regexp.Compile(pattern)
cache[pattern] = r
// ..
if r, ok := cache[pattern]; ok {
	r.MatchString(something)
}
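Since the cache is shared across requests, even this naive approach eventually needs locking. A minimal concurrency-safe sketch of the idea might look like this (the function name `cachedMatch` and the double-lookup locking scheme are my own illustration, not the library's actual code):

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Illustrative names: a RWMutex-guarded map of compiled patterns.
var (
	mu    sync.RWMutex
	cache = map[string]*regexp.Regexp{}
)

// cachedMatch compiles pattern on first use, caches the result,
// and reuses the compiled regexp on subsequent calls.
func cachedMatch(pattern, haystack string) (bool, error) {
	mu.RLock()
	r, ok := cache[pattern]
	mu.RUnlock()
	if !ok {
		var err error
		r, err = regexp.Compile(pattern)
		if err != nil {
			return false, err
		}
		mu.Lock()
		cache[pattern] = r
		mu.Unlock()
	}
	return r.MatchString(haystack), nil
}

func main() {
	m, _ := cachedMatch("^(foo|bar)$", "foo")
	fmt.Println(m) // true
}
```

Note that concurrent first-time compiles of the same pattern may race and compile twice here; the last write wins, which is harmless for identical patterns.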
This reduced runtime considerably:
# without cache
BenchmarkLadon/policies=10-8 2000 617938 ns/op
BenchmarkLadon/policies=100-8 300 5677979 ns/op
BenchmarkLadon/policies=1000-8 20 59775053 ns/op
# with primitive cache
BenchmarkLadon/policies=10-8 200000 6022 ns/op
BenchmarkLadon/policies=100-8 30000 43606 ns/op
BenchmarkLadon/policies=1000-8 3000 511438 ns/op
Now, I thought it might be a good idea to use hashicorp's golang-lru for caching, so memory won't fill up indefinitely. Unfortunately, this didn't go as expected:
# without cache
BenchmarkLadon/policies=10-8 2000 617938 ns/op
BenchmarkLadon/policies=100-8 300 5677979 ns/op
BenchmarkLadon/policies=1000-8 20 59775053 ns/op
# with cache (map)
BenchmarkLadon/policies=10-8 200000 6022 ns/op
BenchmarkLadon/policies=100-8 30000 43606 ns/op
BenchmarkLadon/policies=1000-8 3000 511438 ns/op
# with cache (replaced map with LRU)
BenchmarkLadon/policies=10-8 2000 637065 ns/op
BenchmarkLadon/policies=100-8 200 6383768 ns/op
BenchmarkLadon/policies=1000-8 20 68409968 ns/op
This really baffled me, because we're now spending even more time than without any cache. First I checked whether I had made a mistake in my code, without luck. Then I started writing some benchmarks to check LRU itself:
package lru

import (
	"regexp"
	"sync"
	"testing"

	"github.com/hashicorp/golang-lru"
)

func BenchmarkLadonLRU(b *testing.B) {
	b.StopTimer()
	p, _ := regexp.Compile("^(foo|bar)$")
	c, _ := lru.New(4096)
	b.StartTimer()
	for z := 0; z < b.N; z++ {
		c.Add(z, p)
		o, _ := c.Get(z)
		if r, ok := o.(*regexp.Regexp); ok {
			r.MatchString("foo")
		}
	}
}

func BenchmarkLadonSimple(b *testing.B) {
	var lock sync.RWMutex
	p, _ := regexp.Compile("^(foo|bar)$")
	cache := map[int]interface{}{}
	b.ResetTimer()
	for z := 0; z < b.N; z++ {
		lock.Lock()
		cache[z] = p
		lock.Unlock()
		lock.RLock()
		o := cache[z]
		lock.RUnlock()
		if r, ok := o.(*regexp.Regexp); ok {
			r.MatchString("foo")
		}
	}
}

func BenchmarkLadonNoTypeCast(b *testing.B) {
	var lock sync.RWMutex
	p, _ := regexp.Compile("^(foo|bar)$")
	cache := map[int]*regexp.Regexp{}
	b.ResetTimer()
	for z := 0; z < b.N; z++ {
		lock.Lock()
		cache[z] = p
		lock.Unlock()
		lock.RLock()
		r := cache[z]
		lock.RUnlock()
		r.MatchString("foo")
	}
}

func BenchmarkLadonNoTypeCastNoLockJustYolo(b *testing.B) {
	p, _ := regexp.Compile("^(foo|bar)$")
	cache := map[int]*regexp.Regexp{}
	b.ResetTimer()
	for z := 0; z < b.N; z++ {
		cache[z] = p
		r := cache[z]
		r.MatchString("foo")
	}
}
Which resulted in:
BenchmarkLadonLRU-8 2000000 890 ns/op
BenchmarkLadonSimple-8 3000000 531 ns/op
BenchmarkLadonNoTypeCast-8 3000000 497 ns/op
BenchmarkLadonNoTypeCastNoLockJustYolo-8 3000000 446 ns/op
While locking and type assertions introduce some overhead, LRU takes about twice as long per operation as the most primitive cache. I also noticed that increasing the cache's size increases the time spent per operation.
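Part of that gap is plausibly inherent to what an LRU must do: unlike a plain map, every Get has to record recency, typically by moving the entry to the front of a doubly linked list, so even pure reads mutate state (and, if the library serializes access with a single mutex, readers can't proceed in parallel either). To make that concrete, here is a stripped-down, non-thread-safe LRU sketch of my own (the names `miniLRU`, `entry`, etc. are illustrative, not hashicorp's actual code):

```go
package main

import (
	"container/list"
	"fmt"
)

// entry pairs a key with its value so eviction can delete the map slot.
type entry struct {
	key, val interface{}
}

// miniLRU: a map for O(1) lookup plus a list ordered by recency.
type miniLRU struct {
	size  int
	ll    *list.List
	items map[interface{}]*list.Element
}

func newMiniLRU(size int) *miniLRU {
	return &miniLRU{size: size, ll: list.New(), items: map[interface{}]*list.Element{}}
}

func (c *miniLRU) Add(key, val interface{}) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*entry).val = val
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key, val})
	if c.ll.Len() > c.size {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get must mutate the list even on a pure read -- bookkeeping that a
// plain map lookup never pays for.
func (c *miniLRU) Get(key interface{}) (interface{}, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.ll.MoveToFront(el)
	return el.Value.(*entry).val, true
}

func main() {
	c := newMiniLRU(2)
	c.Add("a", 1)
	c.Add("b", 2)
	c.Get("a")    // "a" becomes most recently used
	c.Add("c", 3) // evicts "b", the least recently used
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```

Each Get here does a map lookup, an interface type assertion, and a list splice, which lines up with the extra few hundred nanoseconds per op in the benchmark above.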
I still believe that something else is also going on in my code where I replaced the map with LRU, although it might be due to some LRU or Go internal when dealing with the interface value (maybe some private fields are eliminated and require re-processing when calling Match?).
I don't know if there is an easy way to improve the runtime of LRU, but maybe you find this helpful. I'll update you if I find something new.