High-Performance server for NATS, the cloud native messaging system.

Overview

NATS is a simple, secure and performant communications system for digital systems, services and devices. NATS is part of the Cloud Native Computing Foundation (CNCF). NATS has over 40 client language implementations, and its server can run on-premises, in the cloud, at the edge, and even on a Raspberry Pi. NATS can secure and simplify the design and operation of modern distributed systems.


Documentation

Contact

  • Twitter: Follow us on Twitter!
  • Google Groups: Where you can ask questions
  • Slack: Click here to join. You can ask questions of our maintainers and of the rich and active community.

Contributing

If you are interested in contributing to NATS, read about our...

Security

Security Audit

A third-party security audit was performed by Cure53; you can see the full report here.

Reporting Security Vulnerabilities

If you've found a vulnerability or a potential vulnerability in the NATS server, please let us know at nats-security.

License

Unless otherwise noted, the NATS source files are distributed under the Apache Version 2.0 license found in the LICENSE file.

Issues
  • subscription count in subsz is wrong


    Since updating one of my brokers to 2.0.0, I noticed a slow increase in subscription counts. I also did a bunch of other updates at the same time (such as moving to the newly renamed libraries), so it took some digging to find the cause; I eventually concluded that the server is simply counting things incorrectly.

    [Graph: subscription count increasing steadily over time]

    Ignoring the annoying popup, you can see a steady increase in subscriptions.

    The data below is from the following dependency, embedded in another Go process:

    github.com/nats-io/nats-server/v2 v2.0.1-0.20190701212751-a171864ae7df
    
    $ curl -s http://localhost:6165/varz|jq .subscriptions
    29256
    

    I then tried to verify this number. Assuming I have no bugs in the script below, I think the varz counter is off by a lot; comparing snapshots of connz over time, I see no growth reflected there, neither in connection counts nor in subscriptions:

    $ curl "http://localhost:6165/connz?limit=200000&subs=1"|./countsubs.rb
    Connections: 3659
    Subscriptions: 25477
    

    I also captured connz output over time, at 15:17, at 15:56, and at 10:07 the next day:

    $ cat connz-1562685506.json|./countsubs.rb
    Connections: 3657
    Subscriptions: 25463
    $ cat connz-1562687791.json|./countsubs.rb
    Connections: 3658
    Subscriptions: 25463
    $ cat connz-1562687791.json|./countsubs.rb
    Connections: 3658
    Subscriptions: 25463
    

    Using the script here:

    require "json"
    
    data = JSON.parse(STDIN.read)
    puts "Connections: %d" % data["connections"].length
    
    count = 0
    
    data["connections"].each do |conn|
      count += conn["subscriptions_list"].length if conn["subscriptions_list"]
    end
    
    puts "Subscriptions: %d" % count
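    As a cross-check, the same counting logic can be sketched in Go using only the standard library. The `connections` / `subscriptions_list` field names follow the JSON this script consumes; the sample document in `main` is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connz mirrors just the fields of /connz output that the count needs.
type connz struct {
	Connections []struct {
		SubscriptionsList []string `json:"subscriptions_list"`
	} `json:"connections"`
}

// countSubs returns the connection count and the total number of
// subscriptions across all connections in a connz JSON document.
func countSubs(data []byte) (conns, subs int, err error) {
	var c connz
	if err = json.Unmarshal(data, &c); err != nil {
		return 0, 0, err
	}
	for _, conn := range c.Connections {
		subs += len(conn.SubscriptionsList)
	}
	return len(c.Connections), subs, nil
}

func main() {
	// A tiny hand-made connz-style document for demonstration.
	sample := []byte(`{"connections":[
		{"subscriptions_list":["foo","bar"]},
		{"subscriptions_list":["baz"]},
		{}
	]}`)
	conns, subs, err := countSubs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Connections: %d\nSubscriptions: %d\n", conns, subs)
	// prints Connections: 3 and Subscriptions: 3
}
```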
    
    opened by ripienaar 87
  • Performance issues with locks and sublist cache


    • [ ] Defect
    • [x] Feature Request or Change Proposal

    Feature Requests

    Use Case:

    We are using gnatsd 1.4.1 (compiled with Go 1.11.5). During benchmarking, we observed non-trivial latency (500 ms+, often seconds) at the gnatsd cluster.

    As there are no slow consumers (with the default 2-second threshold), yet the OS receive buffer filled up and the TCP window went to 0, it seems that the gnatsd server is somehow slow in its read loop. We are trying to slow down the sender for one connection, but we believe gnatsd can also be improved. If you need more proof of the slow read loop, we can provide some tcpdump snippets and gnatsd tracing logs.

    We also occasionally observe parser errors when gnatsd is under high read load. The client uses cnats; however, we are not sure which party (cnats, the OS, or gnatsd) is at fault. Once we find out, we may open another issue to address that problem.

    [8354] 2019/04/01 12:17:11.695815 [ERR] 10.228.255.129:44588 - cid:1253 - Client parser ERROR, state=0, i=302: proto='"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"...'
    

    By the way, since gnatsd can detect slow consumers, is it possible for gnatsd to know when it has itself become a slow consumer (slow reader)? The only idea I have come up with is to shrink the OS buffers and let the upstream feel the backpressure. If you have any suggestions, please let me know.

    Proposed Change:

    1. Improve locks. https://github.com/nats-io/gnatsd/compare/branch_1_4_0...azrle:enhance/processMsg_lock [Images: comparison of read loops under high vs. low load; sync blocking graph]

    2. Ability to adjust the sublist cache size or disable it. https://github.com/nats-io/gnatsd/compare/branch_1_4_0...azrle:feature/opts-sublist_cache_size Our application subscribes and unsubscribes very frequently, and most subjects are used only once. The cache hit rate is under 0.5%, yet maintaining the sublist cache still costs gnatsd. Besides the locks for the cache, reduceCacheCount is noticeable: while other functions account for fewer than 50 goroutines each, the number of goroutines in server.(*Sublist).reduceCacheCount can climb to nearly 18,000.
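    For what it's worth, later server releases grew a switch to turn the sublist cache off entirely. Assuming your build supports it, the configuration would be a one-line fragment like the sketch below (the option name is taken from current nats-server configuration files; verify it exists in your version):

```
# nats-server configuration fragment (option availability depends on version)
disable_sublist_cache: true
```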

    Who Benefits From The Change(s)?

    Clients that send messages heavily to gnatsd, and whose subscriptions change frequently. Under our test cases (with enough servers), the 99.9th percentile of latency drops from 1500 ms to 500 ms (though that is still slow).

    I noticed that gnatsd v2 is coming and that its implementation changes a lot, but I am afraid we may not have time to wait for it to become production-ready.

    I sincerely hope the performance can be improved for v1.4.

    Thank you in advance!

    opened by azrle 59
  • Consumer stopped working after errPartialCache (nats-server oom-killed)


    Defect

    Make sure that these boxes are checked before submitting your issue -- thank you!

    • [x] Included nats-server -DV output
    • [ ] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

    Versions of nats-server and affected client libraries used:

    # nats-server -DV
    [92] 2021/12/06 15:16:05.235349 [INF] Starting nats-server
    [92] 2021/12/06 15:16:05.235397 [INF]   Version:  2.6.6
    [92] 2021/12/06 15:16:05.235401 [INF]   Git:      [878afad]
    [92] 2021/12/06 15:16:05.235406 [DBG]   Go build: go1.16.10
    [92] 2021/12/06 15:16:05.235416 [INF]   Name:     NASX72BQAFBIH4QBLZ36RADTPKSO6LCKRDEAS37XRJ7SYZ53RYYOFHHS
    [92] 2021/12/06 15:16:05.235436 [INF]   ID:       NASX72BQAFBIH4QBLZ36RADTPKSO6LCKRDEAS37XRJ7SYZ53RYYOFHHS
    [92] 2021/12/06 15:16:05.235457 [DBG] Created system account: "$SYS"
    
    Image:         nats:2.6.6-alpine
        Limits:
          cpu:     200m
          memory:  256Mi
        Requests:
          cpu:      200m
          memory:   256Mi
    

    Go library:

    github.com/nats-io/nats.go v1.13.1-0.20211018182449-f2416a8b1483
    

    OS/Container environment:

    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
    
    CONTAINER-RUNTIME
    cri-o://1.21.4
    

    Steps or code to reproduce the issue:

    1. Start nats cluster (3 replicas) with Jetstream enabled. JS Config:
    jetstream {
      max_mem: 64Mi
      store_dir: /data
    
      max_file: 10Gi
    }
    
    
    2. Start pushing messages into the stream. Stream config:
    Configuration:
    
                 Subjects: widget-request-collector
         Acknowledgements: true
                Retention: File - WorkQueue
                 Replicas: 3
           Discard Policy: Old
         Duplicate Window: 2m0s
        Allows Msg Delete: true
             Allows Purge: true
           Allows Rollups: false
         Maximum Messages: unlimited
            Maximum Bytes: 1.9 GiB
              Maximum Age: 1d0h0m0s
     Maximum Message Size: unlimited
        Maximum Consumers: unlimited
    
    
    3. Shut down one of the nats nodes for a while, and rate-limit the consumer (or shut it down) so that messages collect in file storage.
    4. Wait until the storage reaches its maximum capacity (1.9G).
    5. Bring the nats server back up. (Do not bring up the consumer.)

    Expected result:

    Outdated node should become current.

    Actual result:

    The outdated node tries to become current and fetches messages from the stream leader, but it hits the memory limit and is killed by the OOM killer. It restarts, and is OOM-killed again, and again.

    Cluster Information:
    
                     Name: nats
                   Leader: promo-widget-collector-event-nats-2
                  Replica: promo-widget-collector-event-nats-1, outdated, OFFLINE, seen 2m8s ago, 13,634 operations behind
                  Replica: promo-widget-collector-event-nats-0, current, seen 0.00s ago
    
    State:
    
                 Messages: 2,695,412
                    Bytes: 1.9 GiB
                 FirstSeq: 3,957,219 @ 2021-12-06T14:04:00 UTC
                  LastSeq: 6,652,630 @ 2021-12-06T15:09:36 UTC
         Active Consumers: 1
    

    Crashed pod info:

        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       OOMKilled
          Exit Code:    137
          Started:      Mon, 06 Dec 2021 14:30:26 +0000
          Finished:     Mon, 06 Dec 2021 14:31:08 +0000
        Ready:          False
        Restart Count:  3
    

    Is it possible to configure memory limits for nats-server to prevent it from consuming too much memory?
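    The jetstream max_mem/max_file limits above bound store usage, not total process memory. One possible mitigation sketch, for server builds compiled with Go 1.19 or newer (the build in this report uses go1.16, so it would not apply as-is), is to cap the Go heap via the runtime's GOMEMLIMIT environment variable in the pod spec; the value below is illustrative:

```
# Kubernetes container spec fragment (illustrative value; requires a Go 1.19+ build)
env:
  - name: GOMEMLIMIT
    value: "200MiB"   # soft heap cap below the 256Mi pod limit
```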

    🐞 bug 
    opened by rino-pupkin 51
  • jetstream could not pull message after nats-server restart


    I was testing JetStream on nats-server v2.3.2, with one sender and one receiver program running for quite a long time.

    This is what my stream looks like:

    	_, err = js.AddStream(&nats.StreamConfig{
    		Name:      streamName,
    		Subjects:  []string{streamSubjects},
    		Storage:   nats.FileStorage,
    		Replicas:  3,
    		Retention: nats.WorkQueuePolicy,
    		Discard:   nats.DiscardNew,
    		MaxMsgs:   -1,
    		MaxAge:    time.Hour * 24 * 365,
    	})
    

    This is how I create the consumer:

    	if _, err := js.AddConsumer(streamName, &nats.ConsumerConfig{
    		Durable:       durableName,
    		DeliverPolicy: nats.DeliverAllPolicy,
    		AckPolicy:     nats.AckExplicitPolicy,
    		ReplayPolicy:  nats.ReplayInstantPolicy,
    		FilterSubject: subjectName,
    		AckWait:       time.Second * 30,
    		MaxDeliver:    -1,
    		MaxAckPending: 1000,
    	}); err != nil && !strings.Contains(err.Error(), "already in use") {
    		log.Println("AddConsumer fail")
    		return
    	}
    

    This is what the subscriber looks like:

    	sub, err := js.PullSubscribe("ORDERS.created", durableName, nats.Bind("ORDERS", durableName))
    	if err != nil {
    		fmt.Println(" PullSubscribe:", err)
    		return
    	}
           msgs, err := sub.Fetch(1000, nats.MaxWait(10*time.Second))
    

    When I restarted my nats-server cluster nodes (upgrading to nats-server 2.3.3), the consumer could no longer pull messages, even after I restarted my consumer program. The Fetch call just returns "nats: timeout", but I am sure there are lots of messages in the work queue. Only if I delete the consumer by calling js.DeleteConsumer(streamName, durableName) and recreate it can my program resume fetching messages. In fact, every time I restart the nats-server nodes, my consumer program encounters the same problem.

    There is another issue: after I restart the nats-server nodes and restart my program, it sometimes reports: "PullSubscribe: nats: JetStream system temporarily unavailable"

    I expect restarting nats-server nodes not to impact JetStream clients fetching messages.

    🐞 bug 
    opened by carr123 50
  • Client Auth API


    NATS seems perfect for our needs; however, having auth hard-coded at service start isn't very practical when we are adding and removing users while it's running.

    Implementing some Go code to handle this is one option; another is to use an external service for authorization, whether via HTTP basic auth or something else. Being able to set an authentication endpoint would be very handy, especially since we only allow a user to be logged in with one session.

    If this is possible now please let me know, but I couldn't find it in the docs anywhere.

    Thanks!

    customer requested security 
    opened by qrpike 47
  • memory increase in clustered mode


    This is a follow on from https://github.com/nats-io/nats-server/issues/1065

    While looking into the above issue I noticed memory growth. We wanted to focus on one issue at a time, so with 1065 done I looked at the memory situation. The usage patterns and so forth are identical to 1065.

    [Graph: memory usage over 12 hours]

    The above covers 12 hours. As you know, I embed your broker into one of my apps and run a bunch of things in there, so in order to isolate the problem I did a few things:

    1. Same version of everything with the same usage pattern on a single unclustered broker does not show memory growth
    2. With all the related features turned off in the code where I embed nats-server, I still see the growth when clustered
    3. I made my code respond to SIGQUIT to write memory profiles on demand so I can interrogate a running nats server

    The nats-server is github.com/nats-io/nats-server/v2 v2.0.3-0.20190723153225-9cf534bc5e97

    Comparing memory dumps taken 6 hours apart, I see:

    8am:

    (pprof) top10
    Showing nodes accounting for 161.44MB, 90.17% of 179.04MB total
    Dropped 66 nodes (cum <= 0.90MB)
    Showing top 10 nodes out of 51
          flat  flat%   sum%        cum   cum%
       73.82MB 41.23% 41.23%    73.82MB 41.23%  github.com/nats-io/nats-server/v2/server.(*client).queueOutbound
       29.18MB 16.30% 57.53%    29.68MB 16.58%  github.com/nats-io/nats-server/v2/server.(*Server).createClient
       19.60MB 10.95% 68.48%    19.60MB 10.95%  math/rand.NewSource
       15.08MB  8.42% 76.90%   140.30MB 78.37%  github.com/nats-io/nats-server/v2/server.(*client).readLoop
        6.50MB  3.63% 80.53%       12MB  6.70%  github.com/nats-io/nats-server/v2/server.(*client).processSub
        5.25MB  2.93% 83.46%    11.25MB  6.28%  github.com/nats-io/nats-server/v2/server.(*Sublist).Insert
        4.01MB  2.24% 85.70%    65.85MB 36.78%  github.com/nats-io/nats-server/v2/server.(*client).processInboundClientMsg
        3.50MB  1.95% 87.65%     3.50MB  1.95%  github.com/nats-io/nats-server/v2/server.newLevel
        2.50MB  1.40% 89.05%     2.50MB  1.40%  github.com/nats-io/nats-server/v2/server.newNode
           2MB  1.12% 90.17%        2MB  1.12%  github.com/nats-io/nats-server/v2/server.(*client).addSubToRouteTargets
    

    1pm

    (pprof) top10
    Showing nodes accounting for 185.64MB, 90.87% of 204.29MB total
    Dropped 69 nodes (cum <= 1.02MB)
    Showing top 10 nodes out of 46
          flat  flat%   sum%        cum   cum%
       86.33MB 42.26% 42.26%    86.33MB 42.26%  github.com/nats-io/nats-server/v2/server.(*client).queueOutbound
       30.19MB 14.78% 57.04%    30.69MB 15.02%  github.com/nats-io/nats-server/v2/server.(*Server).createClient
       25.75MB 12.60% 69.64%   165.05MB 80.79%  github.com/nats-io/nats-server/v2/server.(*client).readLoop
       19.60MB  9.59% 79.24%    19.60MB  9.59%  math/rand.NewSource
        6.50MB  3.18% 82.42%    12.55MB  6.14%  github.com/nats-io/nats-server/v2/server.(*client).processSub
        5.25MB  2.57% 84.99%    11.25MB  5.51%  github.com/nats-io/nats-server/v2/server.(*Sublist).Insert
        4.02MB  1.97% 86.95%    73.70MB 36.08%  github.com/nats-io/nats-server/v2/server.(*client).processInboundClientMsg
        3.50MB  1.71% 88.67%     3.50MB  1.71%  github.com/nats-io/nats-server/v2/server.newLevel
        2.50MB  1.22% 89.89%     2.50MB  1.22%  github.com/nats-io/nats-server/v2/server.newNode
           2MB  0.98% 90.87%        2MB  0.98%  github.com/nats-io/nats-server/v2/server.(*client).addSubToRouteTargets
    
    opened by ripienaar 44
  • Suggest repair actions for JetStream cluster consumer NO quorum issue


    Environment

    • NATS version: 2.2.6 with jetstream enabled
    • Number of nodes in the cluster: 3
    • Deployed on OKD 3.11 via the nats Helm chart 0.8.0

    Event description

    • Getting JetStream stream info succeeds, but getting JetStream consumer info via natscli fails
    • [Pub] OK, [Sub] failed: NATS sub clients can't connect to the NATS cluster after 7/7 00:18
    • The cluster had been running for more than a month, and there were no errors until 7/7. We confirmed that there were no network or hardware problems.
    • Logs and attempted actions are attached below; please suggest other repair actions. Thanks.

    NATS server logs

    nats instance 0

    [1] 2021/07/07 00:18:44.650787 [WRN] JetStream cluster stream '$G > MY-STREAM2' has NO quorum, stalled.
    [1] 2021/07/07 00:18:44.651098 [WRN] JetStream cluster consumer '$G > MY-STREAM2 > consumer5' has NO quorum, stalled.
    [1] 2021/07/07 00:18:47.433327 [INF] JetStream cluster new metadata leader
    [1] 2021/07/07 00:18:47.930284 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM2 > consumer5'
    [1] 2021/07/07 00:18:51.306199 [WRN] JetStream cluster stream '$G > MY-STREAM' has NO quorum, stalled.
    [1] 2021/07/07 00:18:51.652389 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/07 00:18:56.555042 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer2'
    [1] 2021/07/07 00:19:00.462077 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer3'
    [1] 2021/07/07 00:19:00.870001 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:01.024537 [WRN] Resetting stream '$G > MY-STREAM'
    [1] 2021/07/07 00:19:01.292724 [INF] JetStream cluster new stream leader for '$G > MY-STREAM'
    

    nats instance 1

    [1] 2021/07/07 00:18:48.190309 [INF] JetStream cluster new stream leader for '$G > MY-STREAM2'
    [1] 2021/07/07 00:18:53.343597 [INF] JetStream cluster new metadata leader
    [1] 2021/07/07 00:18:56.820943 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM2 > consumer5'
    [1] 2021/07/07 00:18:57.098682 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer1'
    [1] 2021/07/07 00:18:57.572857 [INF] JetStream cluster new stream leader for '$G > MY-STREAM2'
    [1] 2021/07/07 00:18:57.679975 [INF] JetStream cluster new stream leader for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.710121 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.909870 [WRN] Resetting stream '$G > MY-STREAM'
    [1] 2021/07/08 03:30:19.175389 [WRN] Did not receive all stream info results for "$G"
    

    nats instance 2

    [1] 2021/07/07 00:18:57.508614 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer4'
    [1] 2021/07/07 00:19:00.710399 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.907675 [WRN] Resetting stream '$G > MY-STREAM'
    

    Tried Actions

    1. Try to execute "nats consumer cluster step-down" [Failed]
    nats consumer list MY-STREAM
    # Consumers for Stream MY-STREAM:
    
    #         consumer1
    #         consumer2
    #         consumer3
    #         consumer4
    
    nats consumer cluster step-down --trace 
    # 13:11:04 >>> $JS.API.STREAM.NAMES
    # {"offset":0}
    
    # 13:11:05 <<< $JS.API.STREAM.NAMES
    # {"type":"io.nats.jetstream.api.v1.stream_names_response","total":2,"offset":0,"limit":1024,"streams":["MY-STREAM","MY-STREAM2"]}
    
    # ? Select a Stream MY-STREAM
    # 13:11:13 >>> $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"offset":0}
    
    # 13:11:13 <<< $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"type":"io.nats.jetstream.api.v1.consumer_names_response","total":4,"offset":0,"limit":1024,"consumers":["consumer1","consumer2","consumer3","consumer4"]}
    
    # ? Select a Consumer consumer2
    # 13:11:16 >>> $JS.API.CONSUMER.INFO.MY-STREAM.consumer2
    
    
    # 13:11:21 <<< $JS.API.CONSUMER.INFO.MY-STREAM.consumer2: context deadline exceeded
    
    # nats.exe: error: context deadline exceeded, try --help
    
    2. Try to request the CONSUMER STEPDOWN API directly [Failed]
    nats req '$JS.API.CONSUMER.LEADER.STEPDOWN.MY-STREAM.consumer3' "" --trace
    
    # 05:20:43 Sending request on "$JS.API.CONSUMER.LEADER.STEPDOWN.MY-STREAM.consumer3"
    # nats: error: nats: timeout, try --help
    
    
    3. Try to restart the NATS server [Still failed to get consumer]
    kubectl rollout restart statefulset nats -n mynamespace
    
    nats con info --trace
    # 05:43:02 >>> $JS.API.STREAM.NAMES
    # {"offset":0}
    
    # 05:43:02 <<< $JS.API.STREAM.NAMES
    # {"type":"io.nats.jetstream.api.v1.stream_names_response","total":2,"offset":0,"limit":1024,"streams":["MY-STREAM","MY-STREAM2"]}
    
    # ? Select a Stream MY-STREAM
    # 05:43:03 >>> $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"offset":0}
    
    # 05:43:03 <<< $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"type":"io.nats.jetstream.api.v1.consumer_names_response","total":4,"offset":0,"limit":1024,"consumers":["consumer1","consumer2","consumer3","consumer4"]}
    
    # ? Select a Consumer consumer1
    # 05:43:05 >>> $JS.API.CONSUMER.INFO.MY-STREAM.consumer1
    
    
    # 05:43:05 <<< $JS.API.CONSUMER.INFO.MY-STREAM.consumer1
    # {"type":"io.nats.jetstream.api.v1.consumer_info_response","error":{"code":503,"description":"JetStream system temporarily unavailable"}}
    
    # nats: error: could not load Consumer MY-STREAM > consumer1: JetStream system temporarily unavailable
    

    The nats-0 server has a lot of JetStream WRN logs

    [1] 2021/07/08 05:40:33.345825 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/08 05:40:34.027116 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer2' has NO quorum, stalled.
    [1] 2021/07/08 05:40:34.542920 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:40:35.494354 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:40:55.586260 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:40:57.300211 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:40:58.005908 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/08 05:40:58.324828 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer2' has NO quorum, stalled.
    [1] 2021/07/08 05:41:16.664240 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:41:17.659280 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:41:20.245055 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    

    The NATS stream report shows a failed status for nats-0 on MY-STREAM

    nats stream report
    
    Obtaining Stream stats
    
    +--------------------------------------------------------------------------------------------------------------------+
    |                                                   Stream Report                                                    |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    | Stream                      | Storage | Consumers | Messages | Bytes   | Lost | Deleted | Replicas                 |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    | MY-STREAM2                  | File    | 1         | 0        | 0 B     | 0    | 0       | nats-0, nats-1, nats-2*  |
    | MY-STREAM                   | File    | 0         | 500      | 3.9 MiB | 0    | 0       | nats-0!, nats-1, nats-2* |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    
    4. Try to remove the nats-0 peer for MY-STREAM [Failed]
    nats stream cluster peer-remove
    # ? Select a Stream MY-STREAM
    # ? Select a Peer nats-0
    # 06:16:31 Removing peer "nats-0"
    # nats: error: peer remap failed, try --help
    
    opened by phho 42
  • Service crossing accounts and leaf nodes can't send message back to requester.


    • [X] Defect
    • [ ] Feature Request or Change Proposal

    Defects

    Make sure that these boxes are checked before submitting your issue -- thank you!

    • [X] Included nats-server -DV output
    c1          | [1372] 2020/01/10 15:17:46.476336 [INF] Starting nats-server version 2.1.2
    c1          | [1372] 2020/01/10 15:17:46.476336 [DBG] Go build version go1.12.13
    c1          | [1372] 2020/01/10 15:17:46.476336 [INF] Git commit [679beda]
    c1          | [1372] 2020/01/10 15:17:46.476336 [WRN] Plaintext passwords detected, use nkeys or bcrypt.
    c1          | [1372] 2020/01/10 15:17:46.478337 [INF] Starting http monitor on 0.0.0.0:8222
    c1          | [1372] 2020/01/10 15:17:46.478337 [INF] Listening for leafnode connections on 0.0.0.0:7422
    c1          | [1372] 2020/01/10 15:17:46.478337 [DBG] Get non local IPs for "0.0.0.0"
    c1          | [1372] 2020/01/10 15:17:46.485338 [DBG]  ip=172.18.206.186
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Listening for client connections on 0.0.0.0:4244
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Server id is ND2MSDWDWTMJEX2V7TDS2O53Q5ZEY3W3ORS6T53HOM3PR5BBP6ZSYCA6
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Server is ready
    c1          | [1372] 2020/01/10 15:17:46.488338 [DBG] Get non local IPs for "0.0.0.0"
    c1          | [1372] 2020/01/10 15:17:46.492338 [DBG]  ip=172.18.206.186
    c2          | [1372] 2020/01/10 15:17:48.537218 [INF] Starting nats-server version 2.1.2
    c2          | [1372] 2020/01/10 15:17:48.537218 [DBG] Go build version go1.12.13
    c2          | [1372] 2020/01/10 15:17:48.537218 [INF] Git commit [679beda]
    c2          | [1372] 2020/01/10 15:17:48.537218 [WRN] Plaintext passwords detected, use nkeys or bcrypt.
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Starting http monitor on 0.0.0.0:8222
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Listening for client connections on 0.0.0.0:4244
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Server id is NCIHCZWAIQUH3OK624BMEV62WEEX6IEBKUFXAPRFRCE3GVEWRRNC5WBX
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Server is ready
    c2          | [1372] 2020/01/10 15:17:48.539218 [DBG] Get non local IPs for "0.0.0.0"
    c2          | [1372] 2020/01/10 15:17:48.545215 [DBG]  ip=172.18.194.70
    c2          | [1372] 2020/01/10 15:17:48.556228 [DBG] Trying to connect as leafnode to remote server on "c1:7422" (172.18.206.186:7422)
    c1          | [1372] 2020/01/10 15:17:48.560110 [DBG] 172.18.194.70:49157 - lid:1 - Leafnode connection created
    c2          | [1372] 2020/01/10 15:17:48.560661 [DBG] 172.18.206.186:7422 - lid:1 - Remote leafnode connect msg sent
    c2          | [1372] 2020/01/10 15:17:48.560661 [DBG] 172.18.206.186:7422 - lid:1 - Leafnode connection created
    c2          | [1372] 2020/01/10 15:17:48.560661 [INF] Connected leafnode to "c1"
    c1          | [1372] 2020/01/10 15:17:48.561188 [TRC] 172.18.194.70:49157 - lid:1 - <<- [CONNECT {"tls_required":false,"name":"NCIHCZWAIQUH3OK624BMEV62WEEX6IEBKUFXAPRFRCE3GVEWRRNC5WBX"}]
    c1          | [1372] 2020/01/10 15:17:48.562131 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ test.service.1]
    c1          | [1372] 2020/01/10 15:17:48.562131 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ lds.qtioyTeG9dZPgE8uYM7rsy]
    c2          | [1372] 2020/01/10 15:17:48.561759 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ test.service.1]
    c2          | [1372] 2020/01/10 15:17:48.562839 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ lds.qtioyTeG9dZPgE8uYM7rsy]
    c1          | [1372] 2020/01/10 15:17:49.489505 [DBG] 10.35.68.24:62849 - cid:2 - Client connection created
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"a","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Responder","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - <<- [SUB test.service.1 NATS-RPLY-22 1]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:17:49.636028 [DBG] 172.18.206.186:7422 - lid:1 - LeafNode Ping Timer
    c2          | [1372] 2020/01/10 15:17:49.636282 [TRC] 172.18.206.186:7422 - lid:1 - ->> [PING]
    c1          | [1372] 2020/01/10 15:17:49.636909 [TRC] 172.18.194.70:49157 - lid:1 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.636909 [TRC] 172.18.194.70:49157 - lid:1 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:17:49.637613 [TRC] 172.18.206.186:7422 - lid:1 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:17:49.732680 [DBG] 172.18.194.70:49157 - lid:1 - LeafNode Ping Timer
    c1          | [1372] 2020/01/10 15:17:49.732680 [TRC] 172.18.194.70:49157 - lid:1 - ->> [PING]
    c2          | [1372] 2020/01/10 15:17:49.717524 [TRC] 172.18.206.186:7422 - lid:1 - <<- [PING]
    c2          | [1372] 2020/01/10 15:17:49.717524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:17:49.732680 [TRC] 172.18.194.70:49157 - lid:1 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:17:51.714580 [DBG] 10.35.68.24:62849 - cid:2 - Client Ping Timer
    c1          | [1372] 2020/01/10 15:17:51.714580 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PING]
    c1          | [1372] 2020/01/10 15:17:51.714580 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:18:00.301474 [DBG] 10.35.68.24:62850 - cid:3 - Client connection created
    c1          | [1372] 2020/01/10 15:18:00.302611 [TRC] 10.35.68.24:62850 - cid:3 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"a","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [PING]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [SUB _INBOX.W7P0kJjrbQVrbmzAqqk6V1.*  1]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [PUB test.service.1 _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 3]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PING]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 3]
    c1          | [1372] 2020/01/10 15:18:00.303903 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 13]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62850 - cid:3 - ->> [MSG _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 1 13]
    c1          | [1372] 2020/01/10 15:18:00.305527 [DBG] 10.35.68.24:62850 - cid:3 - Client connection closed
    c1          | [1372] 2020/01/10 15:18:00.307546 [TRC] 10.35.68.24:62850 - cid:3 - <-> [DELSUB 1]
    c1          | [1372] 2020/01/10 15:18:03.175280 [DBG] 10.35.68.24:62865 - cid:4 - Client connection created
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"b","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [PING]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [SUB _INBOX.4ynIPqChOQMSroNEZqndLx.*  1]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62865 - cid:4 - <<- [PUB test.service.1 _INBOX.4ynIPqChOQMSroNEZqndLx.HhaycK1D 3]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62865 - cid:4 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.5bq99K 3]
    c2          | [1372] 2020/01/10 15:18:03.177521 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c1          | [1372] 2020/01/10 15:18:03.178465 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.5bq99K 13]
    c1          | [1372] 2020/01/10 15:18:03.178465 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:03.178530 [TRC] 10.35.68.24:62865 - cid:4 - ->> [MSG _INBOX.4ynIPqChOQMSroNEZqndLx.HhaycK1D 1 13]
    c1          | [1372] 2020/01/10 15:18:03.179615 [DBG] 10.35.68.24:62865 - cid:4 - Client connection closed
    c1          | [1372] 2020/01/10 15:18:03.180602 [TRC] 10.35.68.24:62865 - cid:4 - <-> [DELSUB 1]
    c1          | [1372] 2020/01/10 15:18:03.180602 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS- _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:03.179385 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS- _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:03.181119 [TRC] 172.18.206.186:7422 - lid:1 - <-> [DELSUB _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:05.761225 [DBG] 10.35.68.24:62866 - cid:2 - Client connection created
    c2          | [1372] 2020/01/10 15:18:05.762291 [TRC] 10.35.68.24:62866 - cid:2 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"c","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [PING]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [SUB _INBOX.TfzSpQyvrMigTw0TP7cMHt.*  1]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LS+ _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [PUB test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- MSG_PAYLOAD: ["foo"]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LMSG test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c1          | [1372] 2020/01/10 15:18:05.763695 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LS+ _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LMSG test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.x9JGBo 3]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.x9JGBo 13]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LMSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 13]
    c2          | [1372] 2020/01/10 15:18:05.764649 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LMSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 13]
    c2          | [1372] 2020/01/10 15:18:05.764649 [TRC] 172.18.206.186:7422 - lid:1 - <<- MSG_PAYLOAD: ["response text"]
    c2          | [1372] 2020/01/10 15:18:05.765070 [TRC] 10.35.68.24:62866 - cid:2 - ->> [MSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 1 13]
    c2          | [1372] 2020/01/10 15:18:05.766173 [DBG] 10.35.68.24:62866 - cid:2 - Client connection closed
    c2          | [1372] 2020/01/10 15:18:05.766411 [TRC] 10.35.68.24:62866 - cid:2 - <-> [DELSUB 1]
    c2          | [1372] 2020/01/10 15:18:05.766411 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LS- _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.766060 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LS- _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.766060 [TRC] 172.18.194.70:49157 - lid:1 - <-> [DELSUB _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c2          | [1372] 2020/01/10 15:18:07.378670 [DBG] 10.35.68.24:62867 - cid:3 - Client connection created
    c2          | [1372] 2020/01/10 15:18:07.378670 [TRC] 10.35.68.24:62867 - cid:3 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"d","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c2          | [1372] 2020/01/10 15:18:07.378670 [TRC] 10.35.68.24:62867 - cid:3 - <<- [PING]
    c2          | [1372] 2020/01/10 15:18:07.379670 [TRC] 10.35.68.24:62867 - cid:3 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:18:07.379746 [TRC] 10.35.68.24:62867 - cid:3 - <<- [SUB _INBOX.89dvNgB1mAb4aZo4PLaWJz.*  1]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 10.35.68.24:62867 - cid:3 - <<- [PUB test.service.1 _INBOX.89dvNgB1mAb4aZo4PLaWJz.WyXS3UnR 3]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 10.35.68.24:62867 - cid:3 - <<- MSG_PAYLOAD: ["foo"]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LMSG test.service.1 _R_.BBa91r.hQOCWj 3]
    c1          | [1372] 2020/01/10 15:18:07.380535 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LMSG test.service.1 _R_.BBa91r.hQOCWj 3]
    c1          | [1372] 2020/01/10 15:18:07.380535 [TRC] 172.18.194.70:49157 - lid:1 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.tVfoKl 3]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.tVfoKl 13]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c2          | [1372] 2020/01/10 15:18:09.386622 [DBG] 10.35.68.24:62867 - cid:3 - Client connection closed
    c2          | [1372] 2020/01/10 15:18:09.386891 [TRC] 10.35.68.24:62867 - cid:3 - <-> [DELSUB 1]
    Gracefully stopping... (press Ctrl+C again to force)
    Stopping c2   ... done
    Stopping c1   ... done
    
    • [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

    Versions of nats-server and affected client libraries used:

    See logs. The Go examples are as of commit f66f9c02346dc33296576bf0ef4bd48520bf88c9.

    OS/Container environment:

    Windows nanoserver

    Steps or code to reproduce the issue:

    docker-compose.yml

    version: "3.2"

    services:
      cluster1:
        image: nats:2.1.2-nanoserver
        container_name: c1
        command: -c C:\\mount\\c1 -DV
        ports:
          - 80:8222
          - 4244:4244
        expose:
          - "7422"
        volumes:
          - .\:C:\mount\
        networks:
          - cluster1
        restart: always
      cluster2:
        depends_on:
          - cluster1
        image: nats:2.1.2-nanoserver
        container_name: c2
        command: -c C:\\mount\\c2 -DV
        ports:
          - 81:8222
          - 4245:4244
        expose:
          - "7422"
        volumes:
          - .\:C:\mount\
        networks:
          - cluster1
        restart: always

    networks:
      cluster1:

    cluster 1 config:

    port: 4244
    monitor_port: 8222
    accounts: {
      A: {
        users: [{
          user: a
          password: a
        }]
        exports: [
          {service: test.service.>}
        ]
      },
      B: {
        users: [{
          user: b
          password: b
        }]
        imports: [
          {service: {account: A, subject: test.service.1}}
        ]
      }
    }

    leafnodes {
      port: 7422
      authorization {
        account: B
      }
    }
    

    cluster 2 config:

    port: 4244
    monitor_port: 8222
    accounts: {
      C: {
        users: [{
          user: c
          password: c
        }]
        exports: [
          {service: test.service.>}
        ]
      },
      D: {
        users: [{
          user: d
          password: d
        }]
        imports: [
          {service: {account: C, subject: test.service.1}}
        ]
      }
    }
    leafnodes {
      remotes: [
        {
          urls: [
            nats-leaf://c1:7422
          ]
          account: C
        }
      ]
    }
    

    Starting a nats-rply: start "cluster1 Account A service" nats-rply -s nats://a:[email protected]:4244 test.service.1 "response text"

    Sending request to account D: nats-req -s nats://d:[email protected]:4245 test.service.1 foo

    Expected result:

    Request is sent from account D on cluster 2 to the service listening at test.service.1 in account A on cluster 1, and the requester gets "response text" back.

    Actual result:

    The service listening at test.service.1 gets a request of 'foo', but no message is returned to the requester; instead: "nats: timeout for request"

    opened by PlatanoBailando 41
  • Support WebSocket Connectivity

    Support WebSocket Connectivity

    Hi,

    At @gretaio, we need our signaling server to talk with web browsers, and to do this we set up a small proxy gatewaying WebSocket to TCP so it can talk to NATS.

    I saw on the todo list that you plan on adding a WebSocket strategy, and that's something we would greatly appreciate, as it would basically halve the number of connections we need to have open :+1:.

    So, would you be open to a PR regarding this?

    idea customer requested 
    opened by pldubouilh 40
  • logging system, syslog and abstraction improvements

    logging system, syslog and abstraction improvements

    This is a WIP; here is a little roadmap and some questions I have.

    • [x] Create server.Logger interface and add server.SetLogger method
    • [x] Modify all the actual calls to the new format
    • [x] Network syslog
    • [x] gnatsd options and link Network syslog
    • [x] fix this race condition on server.SetLogger()
    • [x] Tests
    • [x] Transform error messages into real Errors

    Questions:

    Related to: https://github.com/apcera/gnatsd/issues/7

    opened by mcuadros 39
  • Consumers stops receiving messages

    Consumers stops receiving messages

    Defect

    Versions of nats-server and affected client libraries used:

    Nats server version

    [83] 2021/09/04 18:51:12.239432 [INF] Starting nats-server
    [83] 2021/09/04 18:51:12.239488 [INF]   Version:  2.4.0
    [83] 2021/09/04 18:51:12.239494 [INF]   Git:      [219a7c98]
    [83] 2021/09/04 18:51:12.239496 [DBG]   Go build: go1.16.7
    [83] 2021/09/04 18:51:12.239517 [INF]   Name:     NBVE7O7DMRAZ63STC7Z644KHF5HJ6QQUGLZVGDIKEG32CFL2J6O2456M
    [83] 2021/09/04 18:51:12.239533 [INF]   ID:       NBVE7O7DMRAZ63STC7Z644KHF5HJ6QQUGLZVGDIKEG32CFL2J6O2456M
    [83] 2021/09/04 18:51:12.239605 [DBG] Created system account: "$SYS"
    

    Go client version: v1.12.0

    OS/Container environment:

    GKE Kubernetes, running a NATS JetStream HA cluster, deployed via the nats Helm chart.

    Steps or code to reproduce the issue:

    Stream configuration:

    apiVersion: jetstream.nats.io/v1beta1
    kind: Stream
    metadata:
      name: agent
    spec:
      name: agent
      subjects: ["data.*"]
      storage: file
      maxAge: 1h
      replicas: 3
      retention: interest
    

    There are two consumers on this stream. Each runs as a queue subscriber in two services with 2 pod replicas each. Note that I don't care if a message is not processed, which is why ack none is set.

    
    // 2 pods for service A.
    js.QueueSubscribe(
    	"data.received",
    	"service1_queue",
    	func(msg *nats.Msg) {},
    	nats.DeliverNew(),
    	nats.AckNone(),
    )
    
    // 2 pods for service B.
    s.js.QueueSubscribe(
    	"data.received",
    	"service2_queue",
    	func(msg *nats.Msg) {},
    	nats.DeliverNew(),
    	nats.AckNone(),
    )
    

    Expected result:

    Consumer receives messages.

    Actual result:

    Stream stats after a few days:

    agent                  │ File    │ 3         │ 28,258   │ 18 MiB  │ 0    │ 84      │ nats-js-0, nats-js-1*, nats-js-2
    

    Consumers stats:

    service1_queue │ Push │ None       │ 0.00s    │ 0           │ 0           │ 0           │ 60,756    │ nats-js-0, nats-js-1*, nats-js-2
    service2_queue │ Push │ None       │ 0.00s    │ 0           │ 0           │ 8,193 / 28% │ 60,843    │ nats-js-0, nats-js-1*, nats-js-2
    
    1. None of the nats-server pods logs any errors indicating a problem.
    2. The unprocessed message count for the second consumer stays the same and doesn't decrease.
    3. The only fix that helped was changing the second consumer's raft leader with nats consumer cluster step-down, but after some time the problem comes back.
    4. There are active connections to the server, checked with nats server report connections.
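    The stuck unprocessed count above can also be watched programmatically by polling consumer info as JSON (e.g. via `nats consumer info --json`) and extracting the pending count. A minimal sketch; the `num_pending` field name is assumed from the JetStream consumer-info shape and should be checked against your server version:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConsumerInfo captures just the fields needed from JetStream
// consumer info JSON (field names assumed, verify against your version).
type ConsumerInfo struct {
	Name       string `json:"name"`
	NumPending uint64 `json:"num_pending"` // messages not yet delivered
}

// unprocessed extracts the pending count from a raw consumer-info payload.
func unprocessed(raw []byte) (uint64, error) {
	var ci ConsumerInfo
	if err := json.Unmarshal(raw, &ci); err != nil {
		return 0, err
	}
	return ci.NumPending, nil
}

func main() {
	// Sample payload standing in for the real monitoring output.
	sample := []byte(`{"name":"service2_queue","num_pending":8193}`)
	n, err := unprocessed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 8193, the stuck count from the report above
}
```

    Alerting on this value staying flat while the stream keeps growing would catch the condition without waiting days.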

    /cc @kozlovic @derekcollison

    🐞 bug 
    opened by anjmao 38
  • Dave break

    Dave break

    • [ ] Link to issue, e.g. Resolves #NNN
    • [ ] Documentation added (if applicable)
    • [ ] Tests added
    • [ ] Branch rebased on top of current main (git pull --rebase origin main)
    • [ ] Changes squashed to a single commit (described here)
    • [ ] Build is green in Travis CI
    • [ ] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

    Resolves #

    Changes proposed in this pull request:

    /cc @nats-io/core

    opened by matthiashanel 0
  • Evaluate filtering subject interest propagation to leaf nodes with restricted users

    Evaluate filtering subject interest propagation to leaf nodes with restricted users

    I had a conversation with a person on Slack who tested the behavior of an LN connection to NGS with a user having --deny-pubsub=">" and enabled trace logs to discover the [LS+] ... lines, which show the subjects in the interest graph. Although a client obviously can't pub or sub to those, he posed the question/concern that it's an easy way to snoop on subjects even though the user shouldn't see any of them.

    The question is whether interest propagation should be filtered down to only the subset of subjects the connecting LN user is permitted to see. I know there can be multiple connections across accounts, so the subset would need to be the union of those.

    This may overlap with some of the interest prop optimization work that has been discussed recently?
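    A rough sketch of the proposed filtering, not the server's actual implementation: propagate an LS+ only when its subject overlaps one of the connecting user's allowed patterns. The subjectMatch helper below is a simplified, hypothetical version of NATS subject matching (`*` matches one token, `>` matches the remainder) written for illustration only:

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatch reports whether subject is matched by pattern, using
// NATS-style tokens: "*" matches exactly one token, ">" matches the
// rest. Simplified: treats the subject as literal tokens.
func subjectMatch(subject, pattern string) bool {
	st := strings.Split(subject, ".")
	pt := strings.Split(pattern, ".")
	for i, p := range pt {
		if p == ">" {
			return true
		}
		if i >= len(st) {
			return false
		}
		if p != "*" && p != st[i] {
			return false
		}
	}
	return len(st) == len(pt)
}

// shouldPropagate returns true if the LS+ subject overlaps any of the
// user's allowed subscribe patterns (hypothetical policy, not server code).
func shouldPropagate(subject string, allow []string) bool {
	for _, a := range allow {
		if subjectMatch(subject, a) {
			return true
		}
	}
	return false
}

func main() {
	allow := []string{"test.service.*"}
	fmt.Println(shouldPropagate("test.service.1", allow)) // true
	fmt.Println(shouldPropagate("secret.topic.x", allow)) // false
}
```

    A user with --deny-pubsub=">" would have an empty allow set, so under this policy no LS+ lines would be sent at all.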

    opened by bruth 1
  • Fix chaos tests build tags for Travis

    Fix chaos tests build tags for Travis

    • [ ] Link to issue, e.g. Resolves #NNN
    • [ ] Documentation added (if applicable)
    • [ ] Tests added
    • [x] Branch rebased on top of current main (git pull --rebase origin main)
    • [ ] Changes squashed to a single commit (described here)
    • [ ] Build is green in Travis CI
    • [x] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

    Resolves #

    Changes proposed in this pull request:

    [FIXED] Wrong flag in Travis to exclude chaos tests. The js_tests build target was using the wrong tag to exclude chaos tests; as a result, chaos tests would run as part of the default testing.

    Exclude chaos test helpers from the default build

    /cc @nats-io/core

    opened by mprimi 3
  • Document downgrade process

    Document downgrade process

    Hi.

    After reading the documentation it is not clear to me how the downgrade process should be performed. Could you please describe somewhere in the documentation:

    • How to downgrade NATS to an older version (hopefully with no downtime)
    • Maybe some compatibility policies and notes between versions

    Thanks in advance!

    opened by zamazan4ik 8
  • Chaos tests for Consumers

    Chaos tests for Consumers

    • [ ] Link to issue, e.g. Resolves #NNN
    • [x] Documentation added (if applicable)
    • [x] Tests added
    • [x] Branch rebased on top of current main (git pull --rebase origin main)
    • [ ] Changes squashed to a single commit
    • [ ] Build is green in Travis CI
    • [x] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

    Changes proposed in this pull request:

    Add 4 "chaos" tests for

    • Durable consumer
    • Ordered consumer
    • Pull consumer
    • Default (async) consumer

    These tests are trivial to pass under normal circumstances. In this case, a chaos monkey keeps restarting the cluster.

    This surfaces some issues that would otherwise go unnoticed (discussed separately)

    /cc @nats-io/core

    opened by mprimi 7
  • Added param options to /healthz endpoint

    Added param options to /healthz endpoint

    • [x] Link to issue, e.g. Resolves #NNN
    • [x] Documentation added (if applicable)
    • [x] Tests added
    • [x] Branch rebased on top of current main (git pull --rebase origin main)
    • [x] Changes squashed to a single commit (described here)
    • [x] Build is green in Travis CI
    • [x] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

    Related to https://github.com/nats-io/k8s/issues/541. Resolves the JetStream portion of https://github.com/nats-io/nats-server/issues/2687

    Changes proposed in this pull request:

    • Added /healthz?js-enabled=true to return an error if JetStream is disabled.
    • Added /healthz?js-server-only=true to skip checking JetStream accounts, streams, and consumers.
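    Probes combining these options can build the monitoring URL with standard query encoding; a small sketch where the monitoring address localhost:8222 is a placeholder:

```go
package main

import (
	"fmt"
	"net/url"
)

// healthzURL builds a /healthz monitoring URL with the given query
// options; base is a placeholder monitoring address.
func healthzURL(base string, opts map[string]string) string {
	u, _ := url.Parse(base)
	u.Path = "/healthz"
	q := url.Values{}
	for k, v := range opts {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	// Fail the probe if JetStream is disabled.
	fmt.Println(healthzURL("http://localhost:8222", map[string]string{"js-enabled": "true"}))
	// Check only the server itself, skipping JetStream assets.
	fmt.Println(healthzURL("http://localhost:8222", map[string]string{"js-server-only": "true"}))
}
```

    The second form is the one suited to Kubernetes liveness probes, where checking every stream and consumer on each probe would be too heavy.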

    /cc @nats-io/core

    opened by mfaizanse 0
Releases(v2.8.4)