Graceful process restarts in Go

Overview

It is sometimes useful to update the running code and/or configuration of a network service without disrupting existing connections. Usually, this is achieved by starting a new process, transferring clients to it, and then exiting the old process.

There are many ways to implement graceful upgrades. They vary wildly in the trade-offs they make, and how much control they afford the user. This library has the following goals:

  • No old code keeps running after a successful upgrade
  • The new process has a grace period for performing initialisation
  • Crashing during initialisation is OK
  • Only a single upgrade is ever run in parallel

tableflip works on Linux and macOS.

Using the library

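upg, err := tableflip.New(tableflip.Options{})
if err != nil {
	panic(err)
}
defer upg.Stop()

// Trigger an upgrade whenever the process receives SIGHUP.
go func() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGHUP)
	for range sig {
		if err := upg.Upgrade(); err != nil {
			log.Println("upgrade failed:", err)
		}
	}
}()

// Listen must be called before Ready
ln, err := upg.Listen("tcp", "localhost:8080")
if err != nil {
	panic(err)
}
defer ln.Close()

go http.Serve(ln, nil)

// Tell the parent that initialisation is complete.
if err := upg.Ready(); err != nil {
	panic(err)
}

// Block until a newer process has taken over.
<-upg.Exit()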

Please see the more elaborate graceful shutdown with net/http example.

Integration with systemd

[Unit]
Description=Service using tableflip

[Service]
ExecStart=/path/to/binary -some-flag /path/to/pid-file
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/path/to/pid-file

See the documentation as well.

The logs of a process using tableflip may go missing due to a bug in journald. You can work around this by logging directly to journald, for example by using go-systemd/journal and looking for the $JOURNAL_STREAM environment variable.
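
One way to look for $JOURNAL_STREAM is sketched below. It assumes the format documented by systemd, where the variable holds the device and inode numbers of the stream as decimal numbers separated by a colon, and compares them with those of stderr. The function names are hypothetical, and the sketch targets Linux; a process that detects the journal this way could then switch to logging through go-systemd/journal.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// parseJournalStream splits a $JOURNAL_STREAM value of the assumed form
// "device:inode" (both decimal) into its two numbers.
func parseJournalStream(v string) (dev, ino uint64, err error) {
	parts := strings.SplitN(v, ":", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("malformed JOURNAL_STREAM %q", v)
	}
	if dev, err = strconv.ParseUint(parts[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if ino, err = strconv.ParseUint(parts[1], 10, 64); err != nil {
		return 0, 0, err
	}
	return dev, ino, nil
}

// stderrIsJournal reports whether stderr is the stream that
// $JOURNAL_STREAM describes.
func stderrIsJournal() bool {
	dev, ino, err := parseJournalStream(os.Getenv("JOURNAL_STREAM"))
	if err != nil {
		return false
	}
	fi, err := os.Stderr.Stat()
	if err != nil {
		return false
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return false
	}
	return uint64(st.Dev) == dev && uint64(st.Ino) == ino
}

func main() {
	fmt.Println("stderr connected to journald:", stderrIsJournal())
}
```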

Issues
  • Way to share all parent connections

    Hello everyone! Is there a way to share all active connections between processes? The only way I see is the Fds.Conn() method, but that requires knowing the address in the parent process. Might it be worth adding methods that return all parent connections and listeners? It would be amazing.

    opened by bm0 12
  • Upgrade shutting down early for http2 connections but not http1.1

    I'm running a web process on FreeBSD. When I connect with curl over HTTP/2 and send SIGHUP to the process, it shuts down and drops the connections without waiting for them to complete.

    If I do the same but use the curl flag '--http1.1', the running connections complete before shutdown is called.

    Any idea why HTTP/2 would not wait for the connections to complete, while HTTP/1.1 connections would?

    Thank you,

    question 
    opened by chrispassas 10
  • Don't panic on parent connectivity problems

    After receiving the file name mapping from the parent process, the connectivity is read until EOF in a separate routine. Errors there should not lead to a panic as they might even occur after a successful handover, at least in theory.

    I'd recommend a proper handshake with forward compatibility in mind.

    opened by pascaldekloe 10
  • Don't crash on windows

    As discussed in #44, #40, and #21, it is currently impossible to build code on Windows that references tableflip, even if that code never actually runs tableflip.

    This PR addresses that issue in two ways:

    1. Allow the tableflip code to compile without errors on Windows. Code in fds.go that uses syscall is separated into conditionally-compiled files dup_fd.go and dup_fd_windows.go. This follows the existing pattern in dup_file.go and dup_file_legacy.go. Similarly, code in env.go which uses syscall is separated into conditionally-compiled files env_syscalls.go and env_windows.go. It is important to note that the code will not work even though it compiles successfully. That is why the next part is needed.
    2. Allow Windows users to test/run code that was built using tableflip with minimal changes. Upgrader is modified to return a tableflip.ErrNotSupported error when running on Windows. Additionally, a new package github.com/cloudflare/tableflip/testing is provided that contains a simple stub implementation of the Upgrader and Fds types. As shown in the example provided therein, the user can define an interface with the required methods, then check the result of tableflip.New with errors.Is(err, tableflip.ErrNotSupported). If that is the case, the user can create an instance of the stub and use that instead.
    opened by kohenkatz 8
  • Keep foreground control with signal propagation after successful upgrade

    First off: I'm loving tableflip, I use it to livereload a dev server for my homegrown static site generator server https://github.com/jschaf/b2/blob/master/cmd/server/server.go#L121.

    This is more of a question than an issue. If this isn't a good spot for it, I'm good with closing it.

    After a successful tableflip.Upgrade, I'd like the new process to be the foreground process in a terminal or shell. The reason I want this is so I can forward SIGINT via ctrl-c in the terminal. Using systemd is a bit heavyweight for my simple local development use-case.

    What happens now

    1. Start server in terminal with go run ./cmd/server.
    2. Server runs in foreground and output goes to stdout and stderr.
    3. Upgrade server which starts a new server process with a different PID in the background.
    4. Terminal shows prompt since foreground process exited.
    5. New server still writes output to terminal.
    6. Old server exits, leaving the new process parentless, so new process reparents to the PID 1 (systemd --user in my case).

    What I'd like to happen

    In step 3, the new server should keep running in the foreground and continue writing stdout and stderr of the new process.

    I'm not quite sure how to go about this. Would something like the following work?

    • After successful tableflip.Upgrade
    • Keep old process alive and forward signals to all child processes in the process group (PGID).

    Alternately, maybe get the new PID from the PID file and do some exec magic?

    enhancement 
    opened by jschaf 5
  • Pidfile: Use work directory if no path given

    Hello

    When specifying just a filename, e.g. app.pid, as the PIDFile option (expecting it to appear in the current working directory), the previous code created a temporary file in /tmp and then tried to os.Rename it into the current working directory. This results in "invalid cross-device link" errors from os.Rename if /tmp is on another mount point.

    This is a proposal to fix the issue, by using the current work directory for the tmp file if no path was given :)

    opened by fasmide 5
  • loop reload failed

    Repeated reloads always lose the first argument.

    1. Start: procname -c xxx.conf

    2. Reload: kill -HUP <parent pid>

    3. Inspect the process: ps aux | grep ${procname} or lsof -p <parent pid>

    The child's arguments are shown as -c xxx.conf, which is wrong.

    4. Reload again: kill -HUP <child pid>

    The reload fails.

    opened by flyaways 4
  • sometimes got error like "listen tcp xxx: bind: address already in use" when Upgrade

    I think the key to using tableflip is to replace net.Listen with upg.Fds.Listen, so that on Upgrade the listening socket is inherited by the child.

    However, I've seen errors like the one below on Upgrade after the application has been running for a long time in a production environment.

    {"level":"error","msg":"ListenAndServe err can't create new listener: listen tcp 0.0.0.0:8902: bind: address already in use","time":"2019-05-31T23:05:14+08:00"}

    It seems that the parent has opened a listener on port 8902 but the child doesn't inherit that listener. Any possible reason?

    opened by zhiqiangxu 4
  • add function to return all inherited files

    This is helpful in the use case of integrating with systemd socket activation.

    When there is no parent, the app needs to scan or use activation.Files() to find all the fds and call upgrader.AddFile() to track them. But when there is a parent, with the help of this new function, we can find the fds from the upgrader instance rather than by scanning fds or calling activation.Files(), both of which create a set of os.File objects representing the underlying fds.

    Without this change, there would be two sets of os.File objects that represent the same fd set. One set is held by the upgrader, while the other set, returned from fd scanning (or activation.Files()), is used elsewhere. This raises the risk of closing the same fd twice at different times, which is really bad when the fd number gets reused by another file between the two close calls.

    opened by hunts 3
  • Add ListenConfig as Option to Upgrader

    This change allows passing in a custom listen config to control the listener. In some situations, fine-grained control may be necessary, for instance to set SO_REUSEPORT.

    opened by john-cai 3
  • Allow checking if Upgrader has a parent

    There are situations in which it is useful to detect the first invocation, e.g. you may want to clean up dangling Unix sockets, but not during an upgrade.

    Closes #23

    opened by nolith 3
  • upg.Exit() has no effect in a goroutine?

    upg.Exit() has no effect in a goroutine?

    	go func(upg *tableflip.Upgrader) {
    		for {
    			select {
    			case <-upg.Exit():
    				fmt.Println("Exit111111111111111111111111111")
    				break
    			}
    		}
    	}(upg)
    

    In this case, is upg.Exit() never triggered?

    Complete example:

    package main
    
    import (
    	"fmt"
    	"log"
    	"net/http"
    	"os"
    	"os/signal"
    	"syscall"
    	"time"
    
    	"github.com/cloudflare/tableflip"
    )
    
    // Version of the current program
    const version = "v0.0.1"
    
    func main() {
    	upg, err := tableflip.New(tableflip.Options{})
    	if err != nil {
    		panic(err)
    	}
    	defer upg.Stop()
    
    	// For demonstration purposes, force a 1s delay at startup and add the process PID to the log prefix
    	time.Sleep(time.Second)
    	log.SetPrefix(fmt.Sprintf("[PID: %d] ", os.Getpid()))
    
    	// Listen for SIGHUP and use it to trigger a process restart
    	go func() {
    		sig := make(chan os.Signal, 1)
    		signal.Notify(sig, syscall.SIGHUP)
    		for range sig {
    			// The core Upgrade call
    			err := upg.Upgrade()
    			if err != nil {
    				log.Println("Upgrade failed:", err)
    			}
    		}
    	}()
    
    	// Note: the port must be listened on through upg.Listen
    	ln, err := upg.Listen("tcp", ":8080")
    	if err != nil {
    		log.Fatalln("Can't listen:", err)
    	}
    
    	// Create a simple HTTP server; /version returns the current program version
    	mux := http.NewServeMux()
    	mux.HandleFunc("/version", func(rw http.ResponseWriter, r *http.Request) {
    		log.Println(version)
    		rw.Write([]byte(version + "\n"))
    	})
    	server := http.Server{
    		Handler: mux,
    	}
    
    	// Start the HTTP server as usual
    	go func() {
    		err := server.Serve(ln)
    		if err != http.ErrServerClosed {
    			log.Println("HTTP server:", err)
    		}
    	}()
    
    	if err := upg.Ready(); err != nil {
    		panic(err)
    	}
    
    	go func(upg *tableflip.Upgrader) {
    		for {
    			select {
    			case <-upg.Exit():
    				fmt.Println("Exit111111111111111111111111111")
    				break
    			}
    		}
    	}(upg)
    
    	time.Sleep(10 * time.Hour)
    
    	//<-upg.Exit()
    
    }
    
    
    opened by xiaobinqt 0
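
Independent of whether the channel fires in the reporter's setup, one detail of the goroutine above is worth illustrating: in Go, a plain break inside select leaves only the select statement, not the enclosing for loop, so once the channel is closed the loop would keep spinning and printing. A labeled break is one way to leave the loop. The sketch below uses a plain channel as a stand-in for upg.Exit(); the function name waitExit and the ticker case are illustrative additions.

```go
package main

import (
	"fmt"
	"time"
)

// waitExit loops until exit is closed and returns how many ticks were
// seen in the meantime.
func waitExit(exit <-chan struct{}, tick <-chan time.Time) int {
	ticks := 0
loop:
	for {
		select {
		case <-exit:
			// A plain "break" here would only leave the select.
			break loop
		case <-tick:
			ticks++
		}
	}
	return ticks
}

func main() {
	// Stand-in for upg.Exit(): closed after a short delay.
	exit := make(chan struct{})
	go func() {
		time.Sleep(50 * time.Millisecond)
		close(exit)
	}()

	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	fmt.Println("ticks before exit:", waitExit(exit, ticker.C))
}
```

When the loop has no other case to handle, receiving from the channel directly, as in the commented-out <-upg.Exit() at the end of the example, is simpler.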
  • Request for updating README.MD

    Before anything, I'd like to thank you for this amazing repo.

    It would be good to mention in README.md that reloading a systemd service does not update the service's environment variables.

    Having such a systemd unit file:

    [Unit]
    Description=Service using tableflip
    
    [Service]
    EnvironmentFile=/path/to/config-file
    ExecStart=/path/to/binary -some-flag /path/to/pid-file
    ExecReload=/bin/kill -HUP $MAINPID
    PIDFile=/path/to/pid-file
    

    After updating the contents of config-file and running systemctl reload service, the reloaded service does not receive the new or updated environment variables. The service has to read its configuration and environment itself.

    opened by Palvaneh 0
Releases(v1.2.3)
  • v1.2.3(Mar 30, 2022)

    What's Changed

    • Fix flaky TestFilesAreNonblocking on CI by @lmb in https://github.com/cloudflare/tableflip/pull/62
    • Add function to return all inherited files by @hunts in https://github.com/cloudflare/tableflip/pull/68
    • Fix gofmt issues by @jdesgats in https://github.com/cloudflare/tableflip/pull/69
    • Update notes about the journald bug by @hunts in https://github.com/cloudflare/tableflip/pull/70

    Full Changelog: https://github.com/cloudflare/tableflip/compare/v1.2.2...v1.2.3

  • v1.2.2(Jan 25, 2021)

    The previous release introduced a bug when starting a child process: the argument vector (aka argv) is populated incorrectly. The result is that processes that use argv will likely fail and processes that don't use argv will fail to upgrade.

  • v1.2.1(Jan 25, 2021)

    The Go runtime has annoying behaviour around setting and clearing O_NONBLOCK: exec.Cmd.Start() ends up calling os.File.Fd() for any file in exec.Cmd.ExtraFiles. os.File.Fd() disables both the use of the runtime poller for the file and clears O_NONBLOCK from the underlying open file descriptor.

    This can lead to goroutines hanging in a parent process after at least one failed upgrade. The bug manifests as goroutines that rely on either a deadline or interruption via Close() to be unblocked being stuck in read- or accept-like syscalls. As far as I can tell we've not experienced this problem in production, so it's most likely quite rare.

  • v1.2.0(May 1, 2020)

    Add built-in support for passing net.PacketConn around. This was possible using Fds.AddConn, but the new PacketConn support will also do correct unlinking of Unix domain sockets if necessary.

  • v1.1.0(Apr 28, 2020)

    So far, developers working on a code base that uses tableflip on a Windows machine had a poor experience. Editing and building the program didn't work. Thanks to @kohenkatz the library compiles on Windows without errors, and tableflip.New returns an error on Windows. Developers can use this to shim out the upgrader using the new testing subpackage.

  • v1.0.0(Jul 16, 2019)

Owner
Cloudflare
Zero-downtime restarts in Go

goagain Zero-downtime restarts in Go The goagain package provides primitives for bringing zero-downtime restarts to Go applications that accept connec

Richard Crowley 2k Aug 5, 2022
A TCP Server Framework with graceful shutdown, custom protocol.

xtcp A TCP Server Framework with graceful shutdown,custom protocol. Usage Define your protocol format: Before create server and client, you need defin

xfx 134 Jul 1, 2022
Graceful exit for golang project.

graceful-exit Graceful exit by capturing program exit signals.Suitable for k8s pod logout、docker container stop、program exit and etc. Installation Run

Afeyer 1 Dec 1, 2021
The graceful package is a simple library to shutdown application gracefully.

بِسْمِ اللّٰهِ الرَّحْمٰنِ الرَّحِيْمِ السَّلاَمُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُهُ ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ ٱلْحَمْدُ لِلَّهِ رَ

null 0 Dec 27, 2021
High-performance PHP application server, load-balancer and process manager written in Golang

RoadRunner is an open-source (MIT licensed) high-performance PHP application server, load balancer, and process manager. It supports running as a serv

Spiral Scout 6.6k Aug 2, 2022
Furui - A process-based communication control system for containers

furui Communication control of the container runtime environment(now only docker

masibw 17 Mar 26, 2022
Builds and restarts a Go project when it crashes or some watched file changes

gaper Used to build and restart a Go project when it crashes or some watched file changes Aimed to be used in development only. Changelog See Releases

Max Claus Nunes 55 Jun 21, 2022
Zero downtime restarts for go servers (Drop in replacement for http.ListenAndServe)

endless Zero downtime restarts for golang HTTP and HTTPS servers. (for golang 1.3+) Inspiration & Credits Well... it's what you want right - no need t

Florian von Bock 3.6k Aug 9, 2022
A demo project that automatically restarts with a trio of docker, redis and go and transmits page visits.

A demo project that automatically restarts with a trio of docker, redis and go and transmits page visits.

Sami Salih İbrahimbaş 0 Feb 6, 2022
Opinionated Go starter with gin for REST API, logrus for logging, viper for config with added graceful shutdown

go-gin-starter An opinionated starter for Go Backend projects using: gin-gonic/gin as the REST framework logrus for logging viper for configs Docker f

Udaya Prakash 65 Jun 17, 2022
Pod Graceful Drain

You don't need lifecycle: { preStop: { exec: { command: ["sleep", "30"] } } }

SeongChan Lee 161 Jul 26, 2022
graceful is a resource termination library to smoothly clean up resources on term signals

graceful graceful is a resource termination library to smoothly clean up resources on term signals. example package main

Sharon L 4 Aug 26, 2021
Graceful - shutdown package when a service is turned off by software function

graceful Graceful shutdown package when a service is turned off by software func

Bo-Yi Wu 35 Jun 4, 2022
Graceful shutdown with repeating "cron" jobs (running at a regular interval) in Go

Graceful shutdown with repeating "cron" jobs (running at a regular interval) in Go Illustrates how to implement the following in Go: run functions ("j

Valentin Padurean (Ogg) 1 May 30, 2022
Gowl is a process management and process monitoring tool at once. An infinite worker pool gives you the ability to control the pool and processes and monitor their status.

Gowl is a process management and process monitoring tool at once. An infinite worker pool gives you the ability to control the pool and processes and monitor their status.

Hamed Yousefi 27 Jul 24, 2022
Search running process for a given dll/function. Exposes a bufio.Scanner-like interface for walking a process' PEB

Search running process for a given dll/function. Exposes a bufio.Scanner-like interface for walking a process' PEB

Alex Flores 2 Apr 21, 2022
Process audio files with pipelined DSP framework

phono is a command for audio processing. It's build on top of pipelined DSP framework. Installation Prerequisites: lame to enable mp3 encoding To link

pipelined 50 Jul 28, 2022
A flexible process data collection, metrics, monitoring, instrumentation, and tracing client library for Go

Package monkit is a flexible code instrumenting and data collection library. See documentation at https://godoc.org/gopkg.in/spacemonkeygo/monkit.v3 S

Space Monkey Go 465 Aug 1, 2022
High-performance PHP application server, load-balancer and process manager written in Golang

[RR2-BETA] RoadRunner is an open-source (MIT licensed) high-performance PHP application server, load balancer, and process manager. It supports runnin

Spiral Scout 6.6k Aug 3, 2022
Process manager for Procfile-based applications

Hivemind Hivemind is a process manager for Procfile-based applications. At the moment, it supports Linux, FreeBSD, and macOS. Procfile is a simple for

Sergey Alexandrovich 792 Jul 28, 2022
Process manager for Procfile-based applications and tmux

Overmind Overmind is a process manager for Procfile-based applications and tmux. With Overmind, you can easily run several processes from your Procfil

Sergey Alexandrovich 1.9k Aug 5, 2022
Demo of process injection, using Nt, direct syscall, etc.

Frog For Automatic Scan, Doge For Defense Evasion&Offensive Security. Doge-Process-Injection Demo of process injection, using Nt, direct syscal

TimWhite 24 Apr 11, 2022
Reload Go code in a running process at function/method level granularity

got reload? Function/method-level stateful hot reloading for Go! Status Very much work in progress.

null 35 Apr 12, 2022