High-Performance server for NATS, the cloud native messaging system.

Overview

NATS is a simple, secure and performant communications system for digital systems, services and devices. NATS is part of the Cloud Native Computing Foundation (CNCF). NATS has over 40 client language implementations, and its server can run on-premise, in the cloud, at the edge, and even on a Raspberry Pi. NATS can secure and simplify design and operation of modern distributed systems.


Documentation

Contact

  • Twitter: Follow us on Twitter!
  • Google Groups: Where you can ask questions
  • Slack: Click here to join. You can ask questions of our maintainers and of our rich and active community.

Contributing

If you are interested in contributing to NATS, read about our...

Security

Security Audit

A third-party security audit was performed by Cure53; you can see the full report here.

Reporting Security Vulnerabilities

If you've found a vulnerability or a potential vulnerability in the NATS server, please let us know at nats-security.

License

Unless otherwise noted, the NATS source files are distributed under the Apache Version 2.0 license found in the LICENSE file.

Comments
  • subscription count in subsz is wrong

    subscription count in subsz is wrong

    Since updating one of my brokers to 2.0.0 I noticed a slow increase in subscription counts. I also did a bunch of other updates at the same time, like moving to the newly renamed libraries, so in order to find the cause I eventually concluded the server is just counting things wrongly.

    [graph: subscription count over time]

    Ignoring the annoying popup, you can see a steady increase in subscriptions.

    The data below is from the following dependency embedded in another Go process:

    github.com/nats-io/nats-server/v2 v2.0.1-0.20190701212751-a171864ae7df
    
    $ curl -s http://localhost:6165/varz|jq .subscriptions
    29256
    

    I then tried to verify this number. Assuming I have no bugs in the script below, I think the varz counter is off by a lot: comparing snapshots of connz over time, I see no growth reflected there, neither in connection counts nor in subscriptions:

    $ curl "http://localhost:6165/connz?limit=200000&subs=1"|./countsubs.rb
    Connections: 3659
    Subscriptions: 25477
    

    I also captured connz output over time, at 15:17, at 15:56, and at 10:07 the next day:

    $ cat connz-1562685506.json|./countsubs.rb
    Connections: 3657
    Subscriptions: 25463
    $ cat connz-1562687791.json|./countsubs.rb
    Connections: 3658
    Subscriptions: 25463
    $ cat connz-1562687791.json|./countsubs.rb
    Connections: 3658
    Subscriptions: 25463
    

    Using the script here:

    require "json"
    
    data = JSON.parse(STDIN.read)
    puts "Connections: %d" % data["connections"].length
    
    count = 0
    
    data["connections"].each do |conn|
      count += conn["subscriptions_list"].length if conn["subscriptions_list"]
    end
    
    puts "Subscriptions: %d" % count
    
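    For reference, the cross-check the Ruby script performs can be sketched in Go as well. The `connections` and `subscriptions_list` field names are the ones shown in the connz output above; the sample payload in `main` is invented for illustration, not real connz output:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connz mirrors only the /connz fields the Ruby script reads;
// all other fields are ignored during decoding.
type connz struct {
	Connections []struct {
		SubscriptionsList []string `json:"subscriptions_list"`
	} `json:"connections"`
}

// countSubs returns (connection count, total subscriptions) from raw connz JSON.
func countSubs(raw []byte) (int, int, error) {
	var cz connz
	if err := json.Unmarshal(raw, &cz); err != nil {
		return 0, 0, err
	}
	total := 0
	for _, c := range cz.Connections {
		total += len(c.SubscriptionsList)
	}
	return len(cz.Connections), total, nil
}

func main() {
	// Invented sample; a real run would read the body of
	// http://localhost:6165/connz?limit=200000&subs=1 instead.
	sample := []byte(`{"connections":[
		{"subscriptions_list":["foo","bar"]},
		{"subscriptions_list":["baz"]},
		{}
	]}`)
	conns, subs, err := countSubs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Connections: %d\nSubscriptions: %d\n", conns, subs) // Connections: 3, Subscriptions: 3
}
```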
    opened by ripienaar 87
  • Performance issues with locks and sublist cache

    Performance issues with locks and sublist cache

    • [ ] Defect
    • [x] Feature Request or Change Proposal

    Feature Requests

    Use Case:

    We are using gnatsd 1.4.1 (compiled with Go 1.11.5). During benchmarking, we observed non-trivial latency (500 ms+, usually seconds) in the gnatsd cluster.

    As there are no slow consumers (with the default 2-second threshold), yet the OS receive buffer filled up and the TCP window went to 0, it seems that the gnatsd server is somehow slow in its read loop. We are trying to slow down the sender for one connection, but we believe gnatsd can also be improved. If you need more proof that the read loop is slow, we may be able to provide some tcpdump snippets and gnatsd tracing logs.

    We also observed parser errors happening rarely when gnatsd is under heavy read load. The client uses cnats; however, we are not sure which party (cnats, the OS, or gnatsd) was at fault. Once we find out, we may open another issue to address that problem.

    [8354] 2019/04/01 12:17:11.695815 [ERR] 10.228.255.129:44588 - cid:1253 - Client parser ERROR, state=0, i=302: proto='"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"...'
    

    By the way, since gnatsd can detect slow consumers, would it be possible for gnatsd to know when it itself becomes a slow consumer (a slow reader)? The only idea I have come up with is to adjust the OS buffers and let the upstream feel the pressure. If you have any suggestions, please let me know.
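    The "let the upstream feel the pressure" idea can be sketched generically: instead of blocking when the reader falls behind, a bounded queue reports pressure to the sender. This is a generic Go illustration of the principle, not gnatsd code:

```go
package main

import "fmt"

// trySend reports backpressure instead of blocking: a full queue is the
// signal that the reader (consumer side) is falling behind, and the
// caller can then slow the sender down.
func trySend(q chan []byte, msg []byte) bool {
	select {
	case q <- msg:
		return true
	default:
		return false // queue full: reader is slow
	}
}

func main() {
	q := make(chan []byte, 2)
	ok1 := trySend(q, []byte("a"))
	ok2 := trySend(q, []byte("b"))
	ok3 := trySend(q, []byte("c")) // queue already at capacity
	fmt.Println(ok1, ok2, ok3)     // true true false
}
```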

    Proposed Change:

    1. Improve locks. https://github.com/nats-io/gnatsd/compare/branch_1_4_0...azrle:enhance/processMsg_lock (A comparison of read loops under high and low load, and a sync blocking graph, were attached as images.)

    2. Ability to adjust the sublist cache size or disable it. https://github.com/nats-io/gnatsd/compare/branch_1_4_0...azrle:feature/opts-sublist_cache_size Our application does sub/unsub very frequently and most subjects are used only once, so the cache hit rate is under 0.5%, yet maintaining the sublist cache still costs gnatsd. Besides the locks for the cache, reduceCacheCount is noticeable: while other functions' goroutine counts stay below 50, the number of goroutines in server.(*Sublist).reduceCacheCount can climb to nearly 18,000.
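    The proposal in point 2 amounts to a size-capped (or fully disabled) cache in front of the sublist. A minimal sketch of such a cache, as an illustration only and unrelated to the server's actual implementation:

```go
package main

import "fmt"

// boundedCache caches subject -> matching-subscription lookups with a
// configurable maximum size; max of 0 disables caching entirely.
type boundedCache struct {
	max   int
	items map[string][]string
}

func newBoundedCache(max int) *boundedCache {
	return &boundedCache{max: max, items: make(map[string][]string)}
}

func (c *boundedCache) get(subject string) ([]string, bool) {
	v, ok := c.items[subject]
	return v, ok
}

func (c *boundedCache) put(subject string, matches []string) {
	if c.max == 0 {
		return // caching disabled
	}
	if len(c.items) >= c.max {
		// Evict one arbitrary entry; Go's random map iteration order
		// gives a cheap approximation of random eviction.
		for k := range c.items {
			delete(c.items, k)
			break
		}
	}
	c.items[subject] = matches
}

func main() {
	c := newBoundedCache(2)
	c.put("orders.1", []string{"subA"})
	c.put("orders.2", []string{"subB"})
	c.put("orders.3", []string{"subC"}) // evicts one earlier entry
	fmt.Println(len(c.items))           // prints 2
}
```

    A workload of single-use subjects would set max to 0 and skip the cache bookkeeping entirely.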

    Who Benefits From The Change(s)?

    Clients that send messages heavily to gnatsd, with frequently changing subscriptions. Under our test cases (with enough servers), the 99.9th-percentile latency drops from 1500 ms to 500 ms (it's still slow, though).

    I noticed that gnatsd v2 is coming, and the implementation changes a lot. But I am afraid we may not have time to wait for it to become production-ready.

    I sincerely hope the performance can be improved for v1.4.

    Thank you in advance!

    opened by azrle 59
  • Consumer stopped working after errPartialCache (nats-server oom-killed)

    Consumer stopped working after errPartialCache (nats-server oom-killed)

    Defect

    Make sure that these boxes are checked before submitting your issue -- thank you!

    • [x] Included nats-server -DV output
    • [ ] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

    Versions of nats-server and affected client libraries used:

    # nats-server -DV
    [92] 2021/12/06 15:16:05.235349 [INF] Starting nats-server
    [92] 2021/12/06 15:16:05.235397 [INF]   Version:  2.6.6
    [92] 2021/12/06 15:16:05.235401 [INF]   Git:      [878afad]
    [92] 2021/12/06 15:16:05.235406 [DBG]   Go build: go1.16.10
    [92] 2021/12/06 15:16:05.235416 [INF]   Name:     NASX72BQAFBIH4QBLZ36RADTPKSO6LCKRDEAS37XRJ7SYZ53RYYOFHHS
    [92] 2021/12/06 15:16:05.235436 [INF]   ID:       NASX72BQAFBIH4QBLZ36RADTPKSO6LCKRDEAS37XRJ7SYZ53RYYOFHHS
    [92] 2021/12/06 15:16:05.235457 [DBG] Created system account: "$SYS"
    
    Image:         nats:2.6.6-alpine
        Limits:
          cpu:     200m
          memory:  256Mi
        Requests:
          cpu:      200m
          memory:   256Mi
    

    go library:

    github.com/nats-io/nats.go v1.13.1-0.20211018182449-f2416a8b1483
    

    OS/Container environment:

    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
    
    CONTAINER-RUNTIME
    cri-o://1.21.4
    

    Steps or code to reproduce the issue:

    1. Start nats cluster (3 replicas) with Jetstream enabled. JS Config:
    jetstream {
      max_mem: 64Mi
      store_dir: /data
    
      max_file: 10Gi
    }
    
    
    2. Start pushing messages into the stream. Stream config:
    Configuration:
    
                 Subjects: widget-request-collector
         Acknowledgements: true
                Retention: File - WorkQueue
                 Replicas: 3
           Discard Policy: Old
         Duplicate Window: 2m0s
        Allows Msg Delete: true
             Allows Purge: true
           Allows Rollups: false
         Maximum Messages: unlimited
            Maximum Bytes: 1.9 GiB
              Maximum Age: 1d0h0m0s
     Maximum Message Size: unlimited
        Maximum Consumers: unlimited
    
    
    3. Shut down one of the nats nodes for a while, and rate-limit (or shut down) the consumer so that messages collect in file storage.
    4. Wait until the storage reaches its maximum capacity (1.9G).
    5. Bring the nats server back up. (Do not bring up the consumer.)

    Expected result:

    The outdated node should become current.

    Actual result:

    The outdated node tries to become current and gets messages from the stream leader, but it hits the memory limit and is killed by the OOM killer. It restarts, and is killed by OOM again, and again.

    Cluster Information:
    
                     Name: nats
                   Leader: promo-widget-collector-event-nats-2
                  Replica: promo-widget-collector-event-nats-1, outdated, OFFLINE, seen 2m8s ago, 13,634 operations behind
                  Replica: promo-widget-collector-event-nats-0, current, seen 0.00s ago
    
    State:
    
                 Messages: 2,695,412
                    Bytes: 1.9 GiB
                 FirstSeq: 3,957,219 @ 2021-12-06T14:04:00 UTC
                  LastSeq: 6,652,630 @ 2021-12-06T15:09:36 UTC
         Active Consumers: 1
    

    Crashed pod info:

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 06 Dec 2021 14:30:26 +0000
      Finished:     Mon, 06 Dec 2021 14:31:08 +0000
    Ready:          False
    Restart Count:  3
    

    Is it possible to configure a memory limit for nats-server to prevent it from consuming too much memory?

    ๐Ÿž bug 
    opened by rino-pupkin 51
  • jetstream could not pull message after nats-server restart

    jetstream could not pull message after nats-server restart

    I was testing JetStream on nats-server v2.3.2. One sender and one receiver program had been running for quite a long time.

    This is what my stream looks like:

    	_, err = js.AddStream(&nats.StreamConfig{
    		Name:      streamName,
    		Subjects:  []string{streamSubjects},
    		Storage:   nats.FileStorage,
    		Replicas:  3,
    		Retention: nats.WorkQueuePolicy,
    		Discard:   nats.DiscardNew,
    		MaxMsgs:   -1,
    		MaxAge:    time.Hour * 24 * 365,
    	})
    

    This is how I create the consumer:

    	if _, err := js.AddConsumer(streamName, &nats.ConsumerConfig{
    		Durable:       durableName,
    		DeliverPolicy: nats.DeliverAllPolicy,
    		AckPolicy:     nats.AckExplicitPolicy,
    		ReplayPolicy:  nats.ReplayInstantPolicy,
    		FilterSubject: subjectName,
    		AckWait:       time.Second * 30,
    		MaxDeliver:    -1,
    		MaxAckPending: 1000,
    	}); err != nil && !strings.Contains(err.Error(), "already in use") {
    		log.Println("AddConsumer fail")
    		return
    	}
    

    This is what the subscriber looks like:

    	sub, err := js.PullSubscribe("ORDERS.created", durableName, nats.Bind("ORDERS", durableName))
    	if err != nil {
    		fmt.Println(" PullSubscribe:", err)
    		return
    	}
           msgs, err := sub.Fetch(1000, nats.MaxWait(10*time.Second))
    

    When I restart my nats-server cluster nodes (upgrading to nats-server 2.3.3), the consumer can no longer pull messages, even if I restart my consumer program. The Fetch call just returns "nats: timeout", but I'm sure there are lots of messages in the work queue. Only after I delete the consumer by calling js.DeleteConsumer(streamName, durableName) and recreate it can my program resume fetching messages. In fact, every time I restart the nats-server nodes, my consumer program encounters the same problem.

    There is another issue: after I restart the nats-server nodes and restart my program, it sometimes reports: "PullSubscribe: nats: JetStream system temporarily unavailable"

    I expect restarting nats-server nodes not to impact JetStream clients' ability to fetch messages.

    ๐Ÿž bug 
    opened by carr123 50
  • Client Auth API

    Client Auth API

    NATS seems perfect for our needs; however, having auth hard-coded at service start isn't very practical when we are adding and removing users while it's running.

    Implementing some Go code to handle this is one option; another is to use an external service for authorization, whether via HTTP basic auth, etc. Being able to set an authentication endpoint would be very handy, especially since we only allow a user to be logged in with one session.

    If this is possible now please let me know, but I couldn't find it in the docs anywhere.

    Thanks!

    customer requested security 
    opened by qrpike 47
  • memory increase in clustered mode

    memory increase in clustered mode

    This is a follow on from https://github.com/nats-io/nats-server/issues/1065

    While looking into the above issue I noticed memory growth. We wanted to focus on one issue at a time, so with 1065 done I looked at the memory situation. The usage patterns and so forth are identical to 1065.

    [graph: memory usage over 12 hours]

    Above is 12 hours of data. As you know, I embed your broker into one of my apps and run a bunch of things in there, so in order to isolate the problem I did a few things:

    1. Same version of everything with the same usage pattern on a single unclustered broker does not show memory growth
    2. With all the related features turned off in my code where I embed nats-server, I still see the growth when clustered
    3. I made my code respond to SIGQUIT to write memory profiles on demand so I can interrogate a running nats server

    The nats-server is github.com/nats-io/nats-server/v2 v2.0.3-0.20190723153225-9cf534bc5e97

    Comparing the 8am and 1pm memory dumps above, I see:

    8am:

    (pprof) top10
    Showing nodes accounting for 161.44MB, 90.17% of 179.04MB total
    Dropped 66 nodes (cum <= 0.90MB)
    Showing top 10 nodes out of 51
          flat  flat%   sum%        cum   cum%
       73.82MB 41.23% 41.23%    73.82MB 41.23%  github.com/nats-io/nats-server/v2/server.(*client).queueOutbound
       29.18MB 16.30% 57.53%    29.68MB 16.58%  github.com/nats-io/nats-server/v2/server.(*Server).createClient
       19.60MB 10.95% 68.48%    19.60MB 10.95%  math/rand.NewSource
       15.08MB  8.42% 76.90%   140.30MB 78.37%  github.com/nats-io/nats-server/v2/server.(*client).readLoop
        6.50MB  3.63% 80.53%       12MB  6.70%  github.com/nats-io/nats-server/v2/server.(*client).processSub
        5.25MB  2.93% 83.46%    11.25MB  6.28%  github.com/nats-io/nats-server/v2/server.(*Sublist).Insert
        4.01MB  2.24% 85.70%    65.85MB 36.78%  github.com/nats-io/nats-server/v2/server.(*client).processInboundClientMsg
        3.50MB  1.95% 87.65%     3.50MB  1.95%  github.com/nats-io/nats-server/v2/server.newLevel
        2.50MB  1.40% 89.05%     2.50MB  1.40%  github.com/nats-io/nats-server/v2/server.newNode
           2MB  1.12% 90.17%        2MB  1.12%  github.com/nats-io/nats-server/v2/server.(*client).addSubToRouteTargets
    

    1pm:

    (pprof) top10
    Showing nodes accounting for 185.64MB, 90.87% of 204.29MB total
    Dropped 69 nodes (cum <= 1.02MB)
    Showing top 10 nodes out of 46
          flat  flat%   sum%        cum   cum%
       86.33MB 42.26% 42.26%    86.33MB 42.26%  github.com/nats-io/nats-server/v2/server.(*client).queueOutbound
       30.19MB 14.78% 57.04%    30.69MB 15.02%  github.com/nats-io/nats-server/v2/server.(*Server).createClient
       25.75MB 12.60% 69.64%   165.05MB 80.79%  github.com/nats-io/nats-server/v2/server.(*client).readLoop
       19.60MB  9.59% 79.24%    19.60MB  9.59%  math/rand.NewSource
        6.50MB  3.18% 82.42%    12.55MB  6.14%  github.com/nats-io/nats-server/v2/server.(*client).processSub
        5.25MB  2.57% 84.99%    11.25MB  5.51%  github.com/nats-io/nats-server/v2/server.(*Sublist).Insert
        4.02MB  1.97% 86.95%    73.70MB 36.08%  github.com/nats-io/nats-server/v2/server.(*client).processInboundClientMsg
        3.50MB  1.71% 88.67%     3.50MB  1.71%  github.com/nats-io/nats-server/v2/server.newLevel
        2.50MB  1.22% 89.89%     2.50MB  1.22%  github.com/nats-io/nats-server/v2/server.newNode
           2MB  0.98% 90.87%        2MB  0.98%  github.com/nats-io/nats-server/v2/server.(*client).addSubToRouteTargets
    
    opened by ripienaar 44
  • Suggest repair actions for JetStream cluster consumer NO quorum issue

    Suggest repair actions for JetStream cluster consumer NO quorum issue

    Environment

    • NATS version: 2.2.6 with JetStream enabled
    • Number of nodes in the cluster: 3
    • Deployed on OKD 3.11 via the nats Helm chart 0.8.0

    Event description

    • Getting JetStream stream info succeeds, but getting JetStream consumer info via natscli fails
    • [Pub] OK, [Sub] failed. The NATS sub client can't connect to the NATS cluster after 7/7 00:18
    • The cluster had been running for more than a month, and there were no errors until 7/7. We confirmed there were no network or hardware problems.
    • Logs and tried actions are attached; please suggest other repair actions. Thanks.

    NATS server logs

    nats instance 0

    [1] 2021/07/07 00:18:44.650787 [WRN] JetStream cluster stream '$G > MY-STREAM2' has NO quorum, stalled.
    [1] 2021/07/07 00:18:44.651098 [WRN] JetStream cluster consumer '$G > MY-STREAM2 > consumer5' has NO quorum, stalled.
    [1] 2021/07/07 00:18:47.433327 [INF] JetStream cluster new metadata leader
    [1] 2021/07/07 00:18:47.930284 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM2 > consumer5'
    [1] 2021/07/07 00:18:51.306199 [WRN] JetStream cluster stream '$G > MY-STREAM' has NO quorum, stalled.
    [1] 2021/07/07 00:18:51.652389 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/07 00:18:56.555042 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer2'
    [1] 2021/07/07 00:19:00.462077 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer3'
    [1] 2021/07/07 00:19:00.870001 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:01.024537 [WRN] Resetting stream '$G > MY-STREAM'
    [1] 2021/07/07 00:19:01.292724 [INF] JetStream cluster new stream leader for '$G > MY-STREAM'
    

    nats instance 1

    [1] 2021/07/07 00:18:48.190309 [INF] JetStream cluster new stream leader for '$G > MY-STREAM2'
    [1] 2021/07/07 00:18:53.343597 [INF] JetStream cluster new metadata leader
    [1] 2021/07/07 00:18:56.820943 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM2 > consumer5'
    [1] 2021/07/07 00:18:57.098682 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer1'
    [1] 2021/07/07 00:18:57.572857 [INF] JetStream cluster new stream leader for '$G > MY-STREAM2'
    [1] 2021/07/07 00:18:57.679975 [INF] JetStream cluster new stream leader for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.710121 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.909870 [WRN] Resetting stream '$G > MY-STREAM'
    [1] 2021/07/08 03:30:19.175389 [WRN] Did not receive all stream info results for "$G"
    

    nats instance 2

    [1] 2021/07/07 00:18:57.508614 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer4'
    [1] 2021/07/07 00:19:00.710399 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.907675 [WRN] Resetting stream '$G > MY-STREAM'
    

    Tried Actions

    1. Try to execute "nats consumer cluster step-down" [Failed]
    nats consumer list MY-STREAM
    # Consumers for Stream MY-STREAM:
    
    #         consumer1
    #         consumer2
    #         consumer3
    #         consumer4
    
    nats consumer cluster step-down --trace 
    # 13:11:04 >>> $JS.API.STREAM.NAMES
    # {"offset":0}
    
    # 13:11:05 <<< $JS.API.STREAM.NAMES
    # {"type":"io.nats.jetstream.api.v1.stream_names_response","total":2,"offset":0,"limit":1024,"streams":["MY-STREAM","MY-STREAM2"]}
    
    # ? Select a Stream MY-STREAM
    # 13:11:13 >>> $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"offset":0}
    
    # 13:11:13 <<< $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"type":"io.nats.jetstream.api.v1.consumer_names_response","total":4,"offset":0,"limit":1024,"consumers":["consumer1","consumer2","consumer3","consumer4"]}
    
    # ? Select a Consumer consumer2
    # 13:11:16 >>> $JS.API.CONSUMER.INFO.MY-STREAM.consumer2
    
    
    # 13:11:21 <<< $JS.API.CONSUMER.INFO.MY-STREAM.consumer2: context deadline exceeded
    
    # nats.exe: error: context deadline exceeded, try --help
    
    2. Try to request the CONSUMER STEPDOWN API directly [Failed]
    nats req '$JS.API.CONSUMER.LEADER.STEPDOWN.MY-STREAM.consumer3' "" --trace
    
    # 05:20:43 Sending request on "$JS.API.CONSUMER.LEADER.STEPDOWN.MY-STREAM.consumer3"
    # nats: error: nats: timeout, try --help
    
    
    3. Try to restart the NATS server [Still failed to get consumer]
    kubectl rollout restart statefulset nats -n mynamespace
    
    nats con info --trace
    # 05:43:02 >>> $JS.API.STREAM.NAMES
    # {"offset":0}
    
    # 05:43:02 <<< $JS.API.STREAM.NAMES
    # {"type":"io.nats.jetstream.api.v1.stream_names_response","total":2,"offset":0,"limit":1024,"streams":["MY-STREAM","MY-STREAM2"]}
    
    # ? Select a Stream MY-STREAM
    # 05:43:03 >>> $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"offset":0}
    
    # 05:43:03 <<< $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"type":"io.nats.jetstream.api.v1.consumer_names_response","total":4,"offset":0,"limit":1024,"consumers":["consumer1","consumer2","consumer3","consumer4"]}
    
    # ? Select a Consumer consumer1
    # 05:43:05 >>> $JS.API.CONSUMER.INFO.MY-STREAM.consumer1
    
    
    # 05:43:05 <<< $JS.API.CONSUMER.INFO.MY-STREAM.consumer1
    # {"type":"io.nats.jetstream.api.v1.consumer_info_response","error":{"code":503,"description":"JetStream system temporarily unavailable"}}
    
    # nats: error: could not load Consumer MY-STREAM > consumer1: JetStream system temporarily unavailable
    

    The nats-0 server has a lot of JetStream WRN logs

    [1] 2021/07/08 05:40:33.345825 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/08 05:40:34.027116 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer2' has NO quorum, stalled.
    [1] 2021/07/08 05:40:34.542920 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:40:35.494354 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:40:55.586260 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:40:57.300211 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:40:58.005908 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/08 05:40:58.324828 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer2' has NO quorum, stalled.
    [1] 2021/07/08 05:41:16.664240 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:41:17.659280 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:41:20.245055 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    

    The NATS stream report shows nats-0 in failed status for MY-STREAM

    nats stream report
    
    Obtaining Stream stats
    
    +--------------------------------------------------------------------------------------------------------------------+
    |                                                   Stream Report                                                    |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    | Stream                      | Storage | Consumers | Messages | Bytes   | Lost | Deleted | Replicas                 |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    | MY-STREAM2                  | File    | 1         | 0        | 0 B     | 0    | 0       | nats-0, nats-1, nats-2*  |
    | MY-STREAM                   | File    | 0         | 500      | 3.9 MiB | 0    | 0       | nats-0!, nats-1, nats-2* |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    
    4. Try to remove the nats-0 peer for MY-STREAM [Failed]
    nats stream cluster peer-remove
    # ? Select a Stream MY-STREAM
    # ? Select a Peer nats-0
    # 06:16:31 Removing peer "nats-0"
    # nats: error: peer remap failed, try --help
    
    opened by phho 42
  • Service crossing accounts and leaf nodes can't send messages back to the requester.

    Service crossing accounts and leaf nodes can't send messages back to the requester.

    • [X] Defect
    • [ ] Feature Request or Change Proposal

    Defects

    Make sure that these boxes are checked before submitting your issue -- thank you!

    • [X] Included nats-server -DV output
    c1          | [1372] 2020/01/10 15:17:46.476336 [INF] Starting nats-server version 2.1.2
    c1          | [1372] 2020/01/10 15:17:46.476336 [DBG] Go build version go1.12.13
    c1          | [1372] 2020/01/10 15:17:46.476336 [INF] Git commit [679beda]
    c1          | [1372] 2020/01/10 15:17:46.476336 [WRN] Plaintext passwords detected, use nkeys or bcrypt.
    c1          | [1372] 2020/01/10 15:17:46.478337 [INF] Starting http monitor on 0.0.0.0:8222
    c1          | [1372] 2020/01/10 15:17:46.478337 [INF] Listening for leafnode connections on 0.0.0.0:7422
    c1          | [1372] 2020/01/10 15:17:46.478337 [DBG] Get non local IPs for "0.0.0.0"
    c1          | [1372] 2020/01/10 15:17:46.485338 [DBG]  ip=172.18.206.186
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Listening for client connections on 0.0.0.0:4244
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Server id is ND2MSDWDWTMJEX2V7TDS2O53Q5ZEY3W3ORS6T53HOM3PR5BBP6ZSYCA6
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Server is ready
    c1          | [1372] 2020/01/10 15:17:46.488338 [DBG] Get non local IPs for "0.0.0.0"
    c1          | [1372] 2020/01/10 15:17:46.492338 [DBG]  ip=172.18.206.186
    c2          | [1372] 2020/01/10 15:17:48.537218 [INF] Starting nats-server version 2.1.2
    c2          | [1372] 2020/01/10 15:17:48.537218 [DBG] Go build version go1.12.13
    c2          | [1372] 2020/01/10 15:17:48.537218 [INF] Git commit [679beda]
    c2          | [1372] 2020/01/10 15:17:48.537218 [WRN] Plaintext passwords detected, use nkeys or bcrypt.
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Starting http monitor on 0.0.0.0:8222
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Listening for client connections on 0.0.0.0:4244
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Server id is NCIHCZWAIQUH3OK624BMEV62WEEX6IEBKUFXAPRFRCE3GVEWRRNC5WBX
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Server is ready
    c2          | [1372] 2020/01/10 15:17:48.539218 [DBG] Get non local IPs for "0.0.0.0"
    c2          | [1372] 2020/01/10 15:17:48.545215 [DBG]  ip=172.18.194.70
    c2          | [1372] 2020/01/10 15:17:48.556228 [DBG] Trying to connect as leafnode to remote server on "c1:7422" (172.18.206.186:7422)
    c1          | [1372] 2020/01/10 15:17:48.560110 [DBG] 172.18.194.70:49157 - lid:1 - Leafnode connection created
    c2          | [1372] 2020/01/10 15:17:48.560661 [DBG] 172.18.206.186:7422 - lid:1 - Remote leafnode connect msg sent
    c2          | [1372] 2020/01/10 15:17:48.560661 [DBG] 172.18.206.186:7422 - lid:1 - Leafnode connection created
    c2          | [1372] 2020/01/10 15:17:48.560661 [INF] Connected leafnode to "c1"
    c1          | [1372] 2020/01/10 15:17:48.561188 [TRC] 172.18.194.70:49157 - lid:1 - <<- [CONNECT {"tls_required":false,"name":"NCIHCZWAIQUH3OK624BMEV62WEEX6IEBKUFXAPRFRCE3GVEWRRNC5WBX"}]
    c1          | [1372] 2020/01/10 15:17:48.562131 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ test.service.1]
    c1          | [1372] 2020/01/10 15:17:48.562131 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ lds.qtioyTeG9dZPgE8uYM7rsy]
    c2          | [1372] 2020/01/10 15:17:48.561759 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ test.service.1]
    c2          | [1372] 2020/01/10 15:17:48.562839 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ lds.qtioyTeG9dZPgE8uYM7rsy]
    c1          | [1372] 2020/01/10 15:17:49.489505 [DBG] 10.35.68.24:62849 - cid:2 - Client connection created
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"a","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Responder","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - <<- [SUB test.service.1 NATS-RPLY-22 1]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:17:49.636028 [DBG] 172.18.206.186:7422 - lid:1 - LeafNode Ping Timer
    c2          | [1372] 2020/01/10 15:17:49.636282 [TRC] 172.18.206.186:7422 - lid:1 - ->> [PING]
    c1          | [1372] 2020/01/10 15:17:49.636909 [TRC] 172.18.194.70:49157 - lid:1 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.636909 [TRC] 172.18.194.70:49157 - lid:1 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:17:49.637613 [TRC] 172.18.206.186:7422 - lid:1 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:17:49.732680 [DBG] 172.18.194.70:49157 - lid:1 - LeafNode Ping Timer
    c1          | [1372] 2020/01/10 15:17:49.732680 [TRC] 172.18.194.70:49157 - lid:1 - ->> [PING]
    c2          | [1372] 2020/01/10 15:17:49.717524 [TRC] 172.18.206.186:7422 - lid:1 - <<- [PING]
    c2          | [1372] 2020/01/10 15:17:49.717524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:17:49.732680 [TRC] 172.18.194.70:49157 - lid:1 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:17:51.714580 [DBG] 10.35.68.24:62849 - cid:2 - Client Ping Timer
    c1          | [1372] 2020/01/10 15:17:51.714580 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PING]
    c1          | [1372] 2020/01/10 15:17:51.714580 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:18:00.301474 [DBG] 10.35.68.24:62850 - cid:3 - Client connection created
    c1          | [1372] 2020/01/10 15:18:00.302611 [TRC] 10.35.68.24:62850 - cid:3 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"a","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [PING]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [SUB _INBOX.W7P0kJjrbQVrbmzAqqk6V1.*  1]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [PUB test.service.1 _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 3]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PING]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 3]
    c1          | [1372] 2020/01/10 15:18:00.303903 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 13]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62850 - cid:3 - ->> [MSG _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 1 13]
    c1          | [1372] 2020/01/10 15:18:00.305527 [DBG] 10.35.68.24:62850 - cid:3 - Client connection closed
    c1          | [1372] 2020/01/10 15:18:00.307546 [TRC] 10.35.68.24:62850 - cid:3 - <-> [DELSUB 1]
    c1          | [1372] 2020/01/10 15:18:03.175280 [DBG] 10.35.68.24:62865 - cid:4 - Client connection created
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"b","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [PING]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [SUB _INBOX.4ynIPqChOQMSroNEZqndLx.*  1]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62865 - cid:4 - <<- [PUB test.service.1 _INBOX.4ynIPqChOQMSroNEZqndLx.HhaycK1D 3]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62865 - cid:4 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.5bq99K 3]
    c2          | [1372] 2020/01/10 15:18:03.177521 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c1          | [1372] 2020/01/10 15:18:03.178465 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.5bq99K 13]
    c1          | [1372] 2020/01/10 15:18:03.178465 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:03.178530 [TRC] 10.35.68.24:62865 - cid:4 - ->> [MSG _INBOX.4ynIPqChOQMSroNEZqndLx.HhaycK1D 1 13]
    c1          | [1372] 2020/01/10 15:18:03.179615 [DBG] 10.35.68.24:62865 - cid:4 - Client connection closed
    c1          | [1372] 2020/01/10 15:18:03.180602 [TRC] 10.35.68.24:62865 - cid:4 - <-> [DELSUB 1]
    c1          | [1372] 2020/01/10 15:18:03.180602 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS- _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:03.179385 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS- _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:03.181119 [TRC] 172.18.206.186:7422 - lid:1 - <-> [DELSUB _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:05.761225 [DBG] 10.35.68.24:62866 - cid:2 - Client connection created
    c2          | [1372] 2020/01/10 15:18:05.762291 [TRC] 10.35.68.24:62866 - cid:2 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"c","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [PING]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [SUB _INBOX.TfzSpQyvrMigTw0TP7cMHt.*  1]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LS+ _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [PUB test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- MSG_PAYLOAD: ["foo"]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LMSG test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c1          | [1372] 2020/01/10 15:18:05.763695 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LS+ _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LMSG test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.x9JGBo 3]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.x9JGBo 13]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LMSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 13]
    c2          | [1372] 2020/01/10 15:18:05.764649 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LMSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 13]
    c2          | [1372] 2020/01/10 15:18:05.764649 [TRC] 172.18.206.186:7422 - lid:1 - <<- MSG_PAYLOAD: ["response text"]
    c2          | [1372] 2020/01/10 15:18:05.765070 [TRC] 10.35.68.24:62866 - cid:2 - ->> [MSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 1 13]
    c2          | [1372] 2020/01/10 15:18:05.766173 [DBG] 10.35.68.24:62866 - cid:2 - Client connection closed
    c2          | [1372] 2020/01/10 15:18:05.766411 [TRC] 10.35.68.24:62866 - cid:2 - <-> [DELSUB 1]
    c2          | [1372] 2020/01/10 15:18:05.766411 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LS- _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.766060 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LS- _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.766060 [TRC] 172.18.194.70:49157 - lid:1 - <-> [DELSUB _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c2          | [1372] 2020/01/10 15:18:07.378670 [DBG] 10.35.68.24:62867 - cid:3 - Client connection created
    c2          | [1372] 2020/01/10 15:18:07.378670 [TRC] 10.35.68.24:62867 - cid:3 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"d","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c2          | [1372] 2020/01/10 15:18:07.378670 [TRC] 10.35.68.24:62867 - cid:3 - <<- [PING]
    c2          | [1372] 2020/01/10 15:18:07.379670 [TRC] 10.35.68.24:62867 - cid:3 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:18:07.379746 [TRC] 10.35.68.24:62867 - cid:3 - <<- [SUB _INBOX.89dvNgB1mAb4aZo4PLaWJz.*  1]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 10.35.68.24:62867 - cid:3 - <<- [PUB test.service.1 _INBOX.89dvNgB1mAb4aZo4PLaWJz.WyXS3UnR 3]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 10.35.68.24:62867 - cid:3 - <<- MSG_PAYLOAD: ["foo"]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LMSG test.service.1 _R_.BBa91r.hQOCWj 3]
    c1          | [1372] 2020/01/10 15:18:07.380535 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LMSG test.service.1 _R_.BBa91r.hQOCWj 3]
    c1          | [1372] 2020/01/10 15:18:07.380535 [TRC] 172.18.194.70:49157 - lid:1 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.tVfoKl 3]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.tVfoKl 13]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c2          | [1372] 2020/01/10 15:18:09.386622 [DBG] 10.35.68.24:62867 - cid:3 - Client connection closed
    c2          | [1372] 2020/01/10 15:18:09.386891 [TRC] 10.35.68.24:62867 - cid:3 - <-> [DELSUB 1]
    Gracefully stopping... (press Ctrl+C again to force)
    Stopping c2   ... done
    Stopping c1   ... done
    
    • [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

    Versions of nats-server and affected client libraries used:

    See logs. The go examples are as of commit f66f9c02346dc33296576bf0ef4bd48520bf88c9.

    OS/Container environment:

    Windows nanoserver

    Steps or code to reproduce the issue:

    docker-compose.yml

    version: "3.2"
    
    services:
     cluster1: 
       image: nats:2.1.2-nanoserver
       container_name: c1
       command: -c C:\\mount\\c1 -DV
       ports: 
         - 80:8222
         - 4244:4244
       expose:
         - "7422"
       volumes:
         - .\:C:\mount\
       networks:
         - cluster1
       restart: always
     cluster2: 
       depends_on: 
         - cluster1
       image: nats:2.1.2-nanoserver
       container_name: c2
       command: -c C:\\mount\\c2 -DV
       ports: 
         - 81:8222
         - 4245:4244
       expose:
         - "7422"
       volumes:
         - .\:C:\mount\
       networks:
         - cluster1
       restart: always
     
    networks:
     cluster1:
    

    cluster 1 config:

    port: 4244
    monitor_port: 8222
    accounts: {
      A: {
        users:[{
          user: a
          password: a
        }]
        exports: [
          {service: test.service.>}
        ]
      },
      B: {
        users:[{
           user: b
            password: b
        }]
        imports: [
          {service: {account: A, subject: test.service.1}}
        ]
      }
    }
    
    leafnodes {
      port: 7422
      authorization {
        account: B
      }
    }
    

    cluster 2 config:

    port: 4244
    monitor_port: 8222
    accounts: {
      C: {
        users:[{
          user: c
          password: c
        }]
        exports: [
          {service: test.service.>}
        ]
      },
      D: {
        users:[{
           user: d
           password: d
        }]
        imports: [
          {service: {account: C, subject: test.service.1}}
        ]
      }
    }
    leafnodes {
      remotes: [
        {
          urls: [
            nats-leaf://c1:7422
          ]
          account: C
        }
      ]
    }
    

    Starting a nats-rply: start "cluster1 Account A service" nats-rply -s nats://a:a@localhost:4244 test.service.1 "response text"

    Sending request to account D: nats-req -s nats://d:d@localhost:4245 test.service.1 foo

    Expected result:

    Request is sent from account D on cluster 2 to service listening at test.service.1 on Account A on cluster 1, and the requester gets "response text" back.

    Actual result:

    The service listening at test.service.1 gets the request 'foo', but no message is returned to the requester. Instead: "nats: timeout for request"

    opened by cjmang 41
  • Support WebSocket Connectivity


    Hi,

    At @gretaio, we need our signaling server to talk to web browsers, so we set up a small proxy that gateways WebSocket to TCP so it can talk to NATS.

    I saw on the todo list that you plan to add a WebSocket strategy, and that's something we would greatly appreciate, as it would basically halve the number of connections we need to keep open :+1:.

    So, would you be open to a PR regarding this?

    idea customer requested 
    opened by pldubouilh 40
  • logging system, syslog and abstraction improvements


    This is a WIP; here is a little roadmap and some questions I have.

    • [x] Create server.Logger interface and add server.SetLogger method
    • [x] Modify all the actual call to the new format
    • [x] Network syslog
    • [x] gnatsd options and link Network syslog
    • [x] fix this race condition on server.SetLogger()
    • [x] Tests
    • [x] Transform error message in real Errors

    Questions:

    Related to: https://github.com/apcera/gnatsd/issues/7

    opened by mcuadros 39
  • Consumers stops receiving messages


    Defect

    Versions of nats-server and affected client libraries used:

    Nats server version

    [83] 2021/09/04 18:51:12.239432 [INF] Starting nats-server
    [83] 2021/09/04 18:51:12.239488 [INF]   Version:  2.4.0
    [83] 2021/09/04 18:51:12.239494 [INF]   Git:      [219a7c98]
    [83] 2021/09/04 18:51:12.239496 [DBG]   Go build: go1.16.7
    [83] 2021/09/04 18:51:12.239517 [INF]   Name:     NBVE7O7DMRAZ63STC7Z644KHF5HJ6QQUGLZVGDIKEG32CFL2J6O2456M
    [83] 2021/09/04 18:51:12.239533 [INF]   ID:       NBVE7O7DMRAZ63STC7Z644KHF5HJ6QQUGLZVGDIKEG32CFL2J6O2456M
    [83] 2021/09/04 18:51:12.239605 [DBG] Created system account: "$SYS"
    

    Go client version: v1.12.0

    OS/Container environment:

    GKE Kubernetes, running a NATS JetStream HA cluster, deployed via the nats Helm chart.

    Steps or code to reproduce the issue:

    Stream configuration:

    apiVersion: jetstream.nats.io/v1beta1
    kind: Stream
    metadata:
      name: agent
    spec:
      name: agent
      subjects: ["data.*"]
      storage: file
      maxAge: 1h
      replicas: 3
      retention: interest
    

    There are two consumers on this stream. Each runs as a queue subscriber in two services with 2 pod replicas. Note that I don't care if a message is not processed, which is why ack none is set.

    
    // 2 pods for service A.
    js.QueueSubscribe(
    	"data.received",
    	"service1_queue",
    	func(msg *nats.Msg) {},
    	nats.DeliverNew(),
    	nats.AckNone(),
    )
    
    // 2 pods for service B.
    js.QueueSubscribe(
    	"data.received",
    	"service2_queue",
    	func(msg *nats.Msg) {},
    	nats.DeliverNew(),
    	nats.AckNone(),
    )
    

    Expected result:

    Consumer receives messages.

    Actual result:

    Stream stats after a few days:

    agent │ File │ 3 │ 28,258 │ 18 MiB │ 0 │ 84 │ nats-js-0, nats-js-1*, nats-js-2
    

    Consumers stats:

    service1_queue │ Push │ None │ 0.00s │ 0 │ 0 │ 0           │ 60,756 │ nats-js-0, nats-js-1*, nats-js-2
    service2_queue │ Push │ None │ 0.00s │ 0 │ 0 │ 8,193 / 28% │ 60,843 │ nats-js-0, nats-js-1*, nats-js-2
    
    1. None of the nats-server pods log errors indicating any problem.
    2. The unprocessed message count for the second consumer stays the same and does not decrease.
    3. The only fix that helped was changing the second consumer's raft leader with nats consumer cluster step-down, but after some time the problem comes back.
    4. There are active connections to the server (checked with nats server report connections).

    /cc @kozlovic @derekcollison

    ๐Ÿž bug 
    opened by anjmao 38
  • NATS for RPC


    Hello! I use NATS as a backbone for RPC in my microservice backend and it works great (low latency, good stability, loosely coupled services, unlike grpc for example). However, I was reinventing the wheel to pair NATS and RPC concepts (you can find my solution in my busrpc-spec repo, sorry for the ad). Great work has been done on JetStream; however, NATS was originally beloved for its low latency and effectiveness, and was chosen by many for fast message distribution. Do you have any plans to add RPC features to the project?

    ๐ŸŽ‰ enhancement 
    opened by pananton 3
  • NATS storage problem


    Hi, I am using the interest retention policy of NATS in Kubernetes with 3 replicas, and I have the following problem: the size the stream reports is not what is actually stored on disk; the disk uses much more capacity. I do not know if it is because deleted messages are still being stored, and the problem I have now is that I am filling the storage very quickly without knowing why. (screenshots of the stream report)

    This is the storage on one of my disks; all 3 replicas are the same: (screenshot)

    NATS is deployed with Helm, using chart version 0.18.1; nats-server version 2.9.3. Each replica has its own PV with 10 GB of storage. To allocate each volume we use Longhorn, so the directory where data is stored is /var/lib/longhorn.

    opened by bautistamad 12
  • Allow stream meta to be synched on creation to disk


    Feature Request

    As we gather more embedded use cases, we are seeing device resets leave metadata files and block files with 0 bytes.

    Use Case:

    NATS server embedded on devices.

    Proposed Change:

    Allow the server to be configured to sync stream metadata to disk on stream creation.

    Who Benefits From The Change(s)?

    Embedded use cases.

    ๐ŸŽ‰ enhancement 
    opened by derekcollison 0
  • Allow to set max startup time on windows service


    Feature Request

    Use Case:

    The windows service reports a failure to start if it's not ready to accept connections within 10 seconds.

    This value has been fixed and hardcoded since before JetStream. On some Windows systems this leads to service startup failures, because walking and sorting the store dir may be slowed down by heavy load or increased access times, typically under the influence of security software.

    see the relevant line of code

    Proposed Change:

    Check for an environment variable that allows this delay to be set to any value, depending on the expectations of the current use case.

    Who Benefits From The Change(s)?

    Windows users, especially when JetStream and security software are competing for processing resources.

    Alternative Approaches

    Currently, configuring the NATS service for delayed start limits occurrences of failures to start. Once the streams and consumers are big enough to cause startup failures, several manual starts usually restore a startup time under 10 seconds.

    Another way to work around the service start failure is to whitelist the server storage dir in the security software. IT services are usually not happy with this.

    ๐ŸŽ‰ enhancement 
    opened by Alberic-Hardis 0
  • Setup TLS, manually specify ServerName


    Feature Request

    Being able to set up a TLS connection where the nats-route is an IP
    instead of the domain name that was used to sign the certificate.

    Use Case:

    Being able to make a TLS connection on a network where we do not have control over the DNS, but do have the IP.

    Proposed Change:

    Add a configuration parameter to the tls configuration to specify the tlsConfig.ServerName instead of deriving it from the route address.

    Who Benefits From The Change(s)?

    Besides making it possible to reach the server by IP, this could also help people debugging their TLS connections.

    ๐ŸŽ‰ enhancement 
    opened by michieldwitte 3
Releases (v2.9.11)
Owner
NATS - The Cloud Native Messaging System
NATS is a simple, secure and performant communications system for digital systems, services and devices.