A reimplementation of AlphaGo in Go (specifically AlphaZero)

Overview

agogo

About

The algorithm is composed of:

  • a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
  • a Dual Neural Network (DNN) implemented in the dualnet package.
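The examples below set a PUCT field in mcts.Config, which suggests that the mcts package picks the move to explore with a PUCT rule, as described in the AlphaZero paper. Here is a minimal sketch that assumes the published form of that rule, Q(s,a) + c·P(s,a)·√(Σ_b N(s,b)) / (1 + N(s,a)). The names child and selectPUCT are hypothetical, and agogo's own implementation may differ in its details (for example, in how it treats unvisited nodes).

```go
package main

import (
	"fmt"
	"math"
)

// child holds hypothetical per-edge statistics for an AlphaZero-style search.
type child struct {
	prior  float64 // P(s, a): prior probability from the policy head
	visits int     // N(s, a): number of visits through this edge
	total  float64 // W(s, a): sum of the values backed up through this edge
}

// q returns the mean value Q(s, a) = W/N, or 0 for an unvisited edge.
func (c child) q() float64 {
	if c.visits == 0 {
		return 0
	}
	return c.total / float64(c.visits)
}

// selectPUCT returns the index of the child that maximises
// Q(s, a) + cPUCT * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)).
// It returns -1 when there are no children; ties go to the lowest index.
func selectPUCT(children []child, cPUCT float64) int {
	sum := 0
	for _, c := range children {
		sum += c.visits
	}
	sqrtSum := math.Sqrt(float64(sum))
	best, bestScore := -1, math.Inf(-1)
	for i, c := range children {
		score := c.q() + cPUCT*c.prior*sqrtSum/float64(1+c.visits)
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}

func main() {
	children := []child{
		{prior: 0.6, visits: 10, total: 5},  // Q = 0.5, heavily explored
		{prior: 0.3, visits: 1, total: 0.3}, // Q = 0.3, barely explored
		{prior: 0.1},                        // never visited
	}
	fmt.Println(selectPUCT(children, 1.0)) // 1
	fmt.Println(selectPUCT(children, 0))   // 0
}
```

With c = 1, the exploration bonus lets the barely visited second move overtake the first one, even though the first has the higher mean value. With c = 0, the rule reduces to picking the highest Q.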

The algorithm is wrapped in a top-level structure (AZ, for AlphaZero). It applies to any game that fulfills a specified contract.

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language

  • In the agogo package, each player of the game is an Agent, and in a game, two Agents play in an Arena.

  • The game package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a State of the game. A State is an interface that represents the current game state as well as the allowed interactions. An interaction is performed by a Player making a PlayerMove. The implementer is responsible for coding the game's rules by creating an object that fulfills the State contract and implements the allowed moves.
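The loose coupling described above can be illustrated with a toy. The sketch below is hypothetical throughout. State, Move, Player, Check, Ended, Agent and playArena are simplified stand-ins chosen for this example (only the names ToMove and Apply echo the code later on this page). They are not agogo's actual game.State contract, which declares more methods, or its Arena type. The point is only that the search code talks to a game purely through behavior, while the implementer encodes the rules.

```go
package main

import "fmt"

// Player identifies a side (0 or 1). Hypothetical; not agogo's game.Player.
type Player int

// Move is a hypothetical, simplified stand-in for game.PlayerMove.
type Move struct {
	Player Player
	Single int // action index; in this toy game, the number of tokens taken
}

// State is a hypothetical, reduced contract. It exposes behaviour (whose turn
// it is, which moves are legal, how a move changes the game, whether the game
// is over) rather than a concrete board representation.
type State interface {
	ToMove() Player
	Check(m Move) bool
	Apply(m Move) State
	Ended() (ended bool, winner Player)
}

// takeAway is a toy game: players alternately take 1 or 2 tokens from a pile,
// and whoever takes the last token wins.
type takeAway struct {
	tokens int
	next   Player // player to move
	last   Player // player who made the previous move
}

func (g takeAway) ToMove() Player { return g.next }

func (g takeAway) Check(m Move) bool {
	return m.Player == g.next && (m.Single == 1 || m.Single == 2) && m.Single <= g.tokens
}

// Apply returns the next state; an illegal move leaves the state unchanged.
func (g takeAway) Apply(m Move) State {
	if !g.Check(m) {
		return g
	}
	return takeAway{tokens: g.tokens - m.Single, next: 1 - g.next, last: m.Player}
}

func (g takeAway) Ended() (bool, Player) {
	if g.tokens == 0 {
		return true, g.last
	}
	return false, -1
}

// Agent picks a move for a state; in this toy it is just a function.
type Agent func(State) Move

// greedy takes two tokens when that is legal, otherwise one.
func greedy(s State) Move {
	m := Move{Player: s.ToMove(), Single: 2}
	if !s.Check(m) {
		m.Single = 1
	}
	return m
}

// playArena lets two agents alternate until the game ends and returns the
// winner. It assumes both agents always return legal moves.
func playArena(s State, agents [2]Agent) Player {
	for {
		if ended, winner := s.Ended(); ended {
			return winner
		}
		s = s.Apply(agents[s.ToMove()](s))
	}
}

func main() {
	fmt.Println(playArena(takeAway{tokens: 4}, [2]Agent{greedy, greedy})) // 1
	fmt.Println(playArena(takeAway{tokens: 5}, [2]Agent{greedy, greedy})) // 0
}
```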

Training process

Applying the algorithm to a game

This package is designed to be extensible: you can train AlphaZero on any board game that respects the contract of the game package. The trained model can then be saved and used as a player.

The steps to train the algorithm are:

  • Creating a structure that fulfills the State interface (aka a game).
  • Creating a configuration for the AZ's internal MCTS and NN.
  • Creating an AZ structure based on the game and the configuration.
  • Executing the learning process (by calling the Learn method).
  • Saving the trained model (by calling the Save method).

The steps to play against the algorithm are:

  • Creating an AZ object.
  • Loading the trained model (by calling the Load method).
  • Switching the agents to inference mode via the SwitchToInference method.
  • Getting the AI's move by calling the Search method, then applying the move to the game manually.

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3 (a 3×3 board, three in a row to win).
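In an m,n,k game, the first player to place k marks in a line on an m×n board wins. Below is a minimal sketch of that win condition. It assumes a row-major board where index = row·n + col, which is consistent with square 4 being the center in the inference example below. The function hasKInARow and the cell values are hypothetical and are not taken from the mnk package.

```go
package main

import "fmt"

// hasKInARow reports whether player owns k consecutive cells in a row, column
// or diagonal of an m×n board stored row-major (index = row*n + col).
// Cell values are hypothetical: 0 = empty, 1 = X, 2 = O. k is assumed >= 1.
func hasKInARow(board []int, m, n, k, player int) bool {
	// Scanning right, down, down-right and down-left from every cell covers
	// every possible line.
	dirs := [4][2]int{{0, 1}, {1, 0}, {1, 1}, {1, -1}}
	for r := 0; r < m; r++ {
		for c := 0; c < n; c++ {
			for _, d := range dirs {
				count := 0
				rr, cc := r, c
				for rr >= 0 && rr < m && cc >= 0 && cc < n && board[rr*n+cc] == player {
					count++
					rr, cc = rr+d[0], cc+d[1]
				}
				if count >= k {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	// Tic-tac-toe (m = n = k = 3): X holds the main diagonal, squares 0, 4 and 8.
	board := []int{
		1, 2, 0,
		0, 1, 2,
		0, 0, 1,
	}
	fmt.Println(hasKInARow(board, 3, 3, 3, 1)) // true
	fmt.Println(hasKInARow(board, 3, 3, 3, 2)) // false
}
```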

Training

Here is sample code that trains AlphaZero to play the game. The resulting model is saved in the file example.model.

package main

import (
    "log"
    "time"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// It returns the encoded board followed by a plane that marks the player to move.
func encodeBoard(a game.State) []float32 {
    board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
    // Nudge exact zeros to a small non-zero value
    for i := range board {
        if board[i] == 0 {
            board[i] = 0.001
        }
    }
    // Second plane: +1 everywhere if Black moves next, -1 if White does
    playerLayer := make([]float32, len(a.Board()))
    next := a.ToMove()
    if next == game.Player(game.Black) {
        for i := range playerLayer {
            playerLayer[i] = 1
        }
    } else if next == game.Player(game.White) {
        // vecf32.Scale(board, -1)
        for i := range playerLayer {
            playerLayer[i] = -1
        }
    }
    return append(board, playerLayer...)
}

func main() {
    // Create the configuration of the neural network and of the MCTS
    conf := agogo.Config{
        Name:            "Tic Tac Toe",
        NNConf:          dual.DefaultConf(3, 3, 10),
        MCTSConf:        mcts.DefaultConfig(3),
        UpdateThreshold: 0.52,
    }
    conf.NNConf.BatchSize = 100
    conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
    conf.NNConf.K = 3
    conf.NNConf.SharedLayers = 3
    conf.MCTSConf = mcts.Config{
        PUCT:           1.0,
        M:              3,
        N:              3,
        Timeout:        100 * time.Millisecond,
        PassPreference: mcts.DontPreferPass,
        Budget:         1000,
        DumbPass:       true,
        RandomCount:    0,
    }

    conf.Encoder = encodeBoard

    // Create a new game
    g := mnk.TicTacToe()
    // Create the AlphaZero structure
    a := agogo.New(g, conf)
    // Launch the learning process: 5 epochs, 30 episodes, 200 NN iterations, 30 games
    if err := a.Learn(5, 30, 200, 30); err != nil {
        log.Fatal(err)
    }
    // Save the model
    if err := a.Save("example.model"); err != nil {
        log.Fatal(err)
    }
}
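Setting Features to 2 presumably tells the dual network to expect two 3×3 input planes. That matches what encodeBoard returns: the encoded board followed by a plane marking the side to move, 18 values in total. The standalone sketch below reproduces that layout without agogo so that the vector can be inspected. It assumes, without having checked the library, that EncodeTwoPlayerBoard maps Black stones to 1, White stones to -1 and empty squares to 0, and that X plays as Black. The cell constants and encodeTwoPlanes are hypothetical.

```go
package main

import "fmt"

// Hypothetical cell values standing in for agogo's game.Colour.
const (
	empty = iota // 0
	black        // 1
	white        // 2
)

// encodeTwoPlanes mirrors the layout produced by encodeBoard:
// plane 0 is the board (black = 1, white = -1, empty = 0.001) and
// plane 1 is filled with +1 if black moves next, -1 if white does, 0 otherwise.
func encodeTwoPlanes(board []int, next int) []float32 {
	out := make([]float32, 2*len(board))
	for i, cell := range board {
		switch cell {
		case black:
			out[i] = 1
		case white:
			out[i] = -1
		default:
			out[i] = 0.001 // same nudge away from zero as encodeBoard
		}
	}
	var side float32
	switch next {
	case black:
		side = 1
	case white:
		side = -1
	}
	for i := range board {
		out[len(board)+i] = side
	}
	return out
}

func main() {
	board := make([]int, 9)
	board[4] = black // X in the center, assuming X plays as black
	v := encodeTwoPlanes(board, white)
	fmt.Println(len(v))      // 18: two planes of 3×3
	fmt.Println(v[4], v[0])  // 1 0.001
	fmt.Println(v[9], v[17]) // -1 -1
}
```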

Inference

package main

import (
    "fmt"
    "log"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

// encodeBoard should be the same encoder that was used during training.
func encodeBoard(a game.State) []float32 {
    board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
    for i := range board {
        if board[i] == 0 {
            board[i] = 0.001
        }
    }
    playerLayer := make([]float32, len(a.Board()))
    next := a.ToMove()
    if next == game.Player(game.Black) {
        for i := range playerLayer {
            playerLayer[i] = 1
        }
    } else if next == game.Player(game.White) {
        // vecf32.Scale(board, -1)
        for i := range playerLayer {
            playerLayer[i] = -1
        }
    }
    return append(board, playerLayer...)
}

func main() {
    conf := agogo.Config{
        Name:     "Tic Tac Toe",
        NNConf:   dual.DefaultConf(3, 3, 10),
        MCTSConf: mcts.DefaultConfig(3),
    }
    conf.Encoder = encodeBoard

    g := mnk.TicTacToe()
    a := agogo.New(g, conf)
    // Load the model saved by the training program
    if err := a.Load("example.model"); err != nil {
        log.Fatal(err)
    }
    a.A.Player = mnk.Cross
    a.B.Player = mnk.Nought
    a.B.SwitchToInference(g)
    a.A.SwitchToInference(g)
    // Put X in the center (square 4)
    stateAfterFirstPlay := g.Apply(game.PlayerMove{
        Player: mnk.Cross,
        Single: 4,
    })
    fmt.Println(stateAfterFirstPlay)
    // ⎢ · · · ⎥
    // ⎢ · X · ⎥
    // ⎢ · · · ⎥

    // Ask agent B (Nought) what to play next
    move := a.B.Search(stateAfterFirstPlay)
    fmt.Println(move)
    // 1
    stateAfterSecondPlay := stateAfterFirstPlay.Apply(game.PlayerMove{
        Player: mnk.Nought,
        Single: move,
    })
    fmt.Println(stateAfterSecondPlay)
    // ⎢ · O · ⎥
    // ⎢ · X · ⎥
    // ⎢ · · · ⎥
}

Comments
  • Cannot train tic-tac-toe with more than 14 episodes

    This is a strange bug. I am using this code:

    func encodeBoard(a game.State) []float32 {
    	board := EncodeTwoPlayerBoard(a.Board(), nil)
    	for i := range board {
    		if board[i] == 0 {
    			board[i] = 0.001
    		}
    	}
    	playerLayer := make([]float32, len(a.Board()))
    	next := a.ToMove()
    	if next == game.Player(game.Black) {
    		for i := range playerLayer {
    			playerLayer[i] = 1
    		}
    	} else if next == game.Player(game.White) {
    		// vecf32.Scale(board, -1)
    		for i := range playerLayer {
    			playerLayer[i] = -1
    		}
    	}
    	retVal := append(board, playerLayer...)
    	return retVal
    }
    
    func TestAZ(t *testing.T) {
    	conf := Config{
    		Name:            "Tic Tac Toe",
    		NNConf:          dual.DefaultConf(3, 3, 10),
    		MCTSConf:        mcts.DefaultConfig(3),
    		UpdateThreshold: 0.52,
    	}
    	conf.NNConf.BatchSize = 100
    	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
    	conf.NNConf.K = 3
    	conf.NNConf.SharedLayers = 3
    	conf.MCTSConf = mcts.Config{
    		PUCT:           1.0,
    		M:              3,
    		N:              3,
    		Timeout:        100 * time.Millisecond,
    		PassPreference: mcts.DontPreferPass,
    		Budget:         1000,
    		DumbPass:       true,
    		RandomCount:    0,
    	}
    
    	conf.Encoder = encodeBoard
    
    	g := mnk.TicTacToe()
    	a := New(g, conf)
    
    	//err := a.Learn(1, 20, 100, 100)
    	err := a.Learn(1, 14, 100, 100)
    	if err != nil {
    		t.Fatal(err)
    	}
    }
    

    With err := a.Learn(1, 14, 100, 100), the test passes, but with err := a.Learn(1, 15, 100, 100), it fails with this error:

    2021/01/18 09:24:40 Self Play for epoch 0. Player A 0xc000368850, Player B 0xc0003688c0
    2021/01/18 09:24:40 Using Dummy
    2021/01/18 09:24:40 Set up selfplay: Switch To inference for A. A.NN 0xc0003409c0 (*dual.Dual)
    2021/01/18 09:24:40 Set up selfplay: Switch To inference for B. B.NN 0xc000340a90 (*dual.Dual)
    2021/01/18 09:24:40     Episode 0
    2021/01/18 09:24:40     Episode 1
    2021/01/18 09:24:41     Episode 2
    2021/01/18 09:24:41     Episode 3
    2021/01/18 09:24:42     Episode 4
    2021/01/18 09:24:43     Episode 5
    2021/01/18 09:24:44     Episode 6
    2021/01/18 09:24:45     Episode 7
    2021/01/18 09:24:45     Episode 8
    2021/01/18 09:24:46     Episode 9
    2021/01/18 09:24:47     Episode 10
    2021/01/18 09:24:48     Episode 11
    2021/01/18 09:24:48     Episode 12
    2021/01/18 09:24:49     Episode 13
    2021/01/18 09:24:50     Episode 14
        agogo_test.go:69: Train fail: PC: 246: PC 246. Failed to execute instruction Aᵀ{0, 2, 3, 1} [CPU144]        CPU144  false   true    false: Failed to carry op.Do(): Dimension mismatch. Expected 2, got 4
    
    bug 
    opened by owulveryck 4
  • Train fail: shuffle batch failed - matX: Not yet implemented: native matrix for colmajor or unpacked matrices

    I am trying to run the simple example of tic-tac-toe as is:

    package agogo
    
    import (
    	"log"
    	"time"
    
    	dual "github.com/gorgonia/agogo/dualnet"
    	"github.com/gorgonia/agogo/encoding/mjpeg"
    	"github.com/gorgonia/agogo/game"
    	"github.com/gorgonia/agogo/game/mnk"
    	"github.com/gorgonia/agogo/mcts"
    
    	_ "net/http/pprof"
    )
    
    func encodeBoard(a game.State) []float32 {
    	board := EncodeTwoPlayerBoard(a.Board(), nil)
    	for i := range board {
    		if board[i] == 0 {
    			board[i] = 0.001
    		}
    	}
    	playerLayer := make([]float32, len(a.Board()))
    	next := a.ToMove()
    	if next == game.Player(game.Black) {
    		for i := range playerLayer {
    			playerLayer[i] = 1
    		}
    	} else if next == game.Player(game.White) {
    		// vecf32.Scale(board, -1)
    		for i := range playerLayer {
    			playerLayer[i] = -1
    		}
    	}
    	retVal := append(board, playerLayer...)
    	return retVal
    }
    
    func ExampleAZ() {
    	conf := Config{
    		Name:            "Tic Tac Toe",
    		NNConf:          dual.DefaultConf(3, 3, 10),
    		MCTSConf:        mcts.DefaultConfig(3),
    		UpdateThreshold: 0.52,
    	}
    	conf.NNConf.BatchSize = 100
    	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
    	conf.NNConf.K = 3
    	conf.NNConf.SharedLayers = 3
    	conf.MCTSConf = mcts.Config{
    		PUCT:           1.0,
    		M:              3,
    		N:              3,
    		Timeout:        100 * time.Millisecond,
    		PassPreference: mcts.DontPreferPass,
    		Budget:         1000,
    		DumbPass:       true,
    		RandomCount:    0,
    	}
    
    	conf.Encoder = encodeBoard
    	outEnc := mjpeg.NewEncoder(300, 300)
    	conf.OutputEncoder = outEnc
    
    	g := mnk.TicTacToe()
    	a := New(g, conf)
    
    	err := a.Learn(1, 1, 10, 1) // 1 epoch, 1 episode, 10 NN iterations, 1 game
    	if err != nil {
    		log.Fatal(err)
    	}
    	// output:
    }
    

    Running the test fails with this error.

    ❯ go test -run=^Example
    2021/01/16 17:22:18 Self Play for epoch 0. Player A 0xc00043e070, Player B 0xc00043e2a0
    2021/01/16 17:22:18 Using Dummy
    2021/01/16 17:22:18 Set up selfplay: Switch To inference for A. A.NN 0xc0000c9380 (*dual.Dual)
    2021/01/16 17:22:18 Set up selfplay: Switch To inference for B. B.NN 0xc0000c9450 (*dual.Dual)
    2021/01/16 17:22:18     Episode 0
    2021/01/16 17:22:19 Train fail: shuffle batch failed - matX: Not yet implemented: native matrix for colmajor or unpacked matrices
    exit status 1
    FAIL    github.com/gorgonia/agogo       1.229s
    

    This error is triggered from:

    https://github.com/gorgonia/agogo/blob/05cf5f11bbd24fd158d06b11c0b887aaa7e07f7c/dualnet/meta.go#L71-L73

    It looks like the tensor library is faulty here.

    I will investigate. In the meantime, any hints are welcome.

    Disabling the shuffleBatch method in the dualnet works around the problem for now.

    bug 
    opened by owulveryck 4
  • fix(workaround): Creating a new machine per iteration

    This PR fixes #4 by introducing a workaround.

    The problem is related to the Reset method, which may have a bug: two consecutive operations are not idempotent.

    This fixes the problem with a workaround: creating a Machine for each interaction instead of relying on a single shared one. This obviously affects performance, because it puts more pressure on the GC.

    This should be changed back once the bug is located and fixed.

    opened by owulveryck 1
  • Documentation/tictactoe

    If you have a couple of minutes @chewxy, I'd like a quick review of the README to check if all the writing makes sense.

    It is not yet clear why both players need to call SwitchToInference, but it seems to work.

    Besides that, this PR:

    • adds comments for the godoc
    • moves the gtp and online packages into internal
    opened by owulveryck 1
  • fix: return an error if batches is null

    The prepareExamples method of AZ returns a number of batches. If this number is zero, the Train function returns a self-explanatory error instead of triggering the training of the NN.

    This should fix #3

    opened by owulveryck 0
  • Dualnet/tapemachine

    This PR fixes #1 and #2.

    The problem is related to the Reset method, which may have a bug: two consecutive operations are not idempotent.

    This fixes the problem with a workaround: creating a Machine for each interaction instead of relying on a single shared one. This obviously affects performance, because it puts more pressure on the GC.

    This should be changed back once the bug is located and fixed.

    opened by owulveryck 0
  • Can't run tic-tac-toe

    When I try to run cmd/tictactoe/main.go, I get a panic:

    go: downloading github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0
    go: downloading golang.org/x/image v0.0.0-20201208152932-35266b937fa6
    go: downloading gorgonia.org/gorgonia v0.9.17-0.20210124090702-531c6df2c434
    go: downloading gorgonia.org/tensor v0.9.18
    go: downloading github.com/chewxy/math32 v1.0.6
    go: downloading gorgonia.org/vecf32 v0.9.0
    go: downloading github.com/awalterschulze/gographviz v2.0.3+incompatible
    go: downloading github.com/apache/arrow/go/arrow v0.0.0-20210105145422-88aaea5262db
    go: downloading github.com/chewxy/hm v1.0.0
    go: downloading go4.org/unsafe/assume-no-moving-gc v0.0.0-20201222180813-1025295fd063
    go: downloading github.com/google/flatbuffers v1.12.0
    go: downloading gonum.org/v1/gonum v0.8.2
    go: downloading gorgonia.org/vecf64 v0.9.0
    go: downloading github.com/leesper/go_rng v0.0.0-20190531154944-a612b043e353
    go: downloading github.com/xtgo/set v1.0.0
    go: downloading gorgonia.org/dawson v1.2.0
    go: downloading github.com/gogo/protobuf v1.3.1
    go: downloading github.com/golang/protobuf v1.4.3
    go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
    go: downloading google.golang.org/protobuf v1.25.0
    panic: Something in this program imports go4.org/unsafe/assume-no-moving-gc to declare that it assumes a non-moving garbage collector, but your version of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that it's safe against the go1.18 runtime. If you want to risk it, run with environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.18 set. Notably, if go1.18 adds a moving garbage collector, this program is unsafe to use.

    goroutine 1 [running]:
    go4.org/unsafe/assume-no-moving-gc.init.0()
    	/home/haze/go/pkg/mod/go4.org/unsafe/assume-no-moving-gc@v0.0.0-20201222180813-1025295fd063/untested.go:24 +0x1f4
    exit status 2

    opened by StepHaze 0
  • Wrong model architecture in residual network?

    I wonder whether there is a misconfiguration in the model architecture, specifically in this function: https://github.com/gorgonia/agogo/blob/master/dualnet/ermahagerdmonards.go#L67. My understanding is based on the paper (link), which says on page 8/18:

    Each residual block applies the following modules sequentially to its input:
    (1) A convolution of 256 filters of kernel size 3 × 3 with stride 1
    (2) Batch normalization
    (3) A rectifier nonlinearity
    (4) A convolution of 256 filters of kernel size 3 × 3 with stride 1
    (5) Batch normalization
    (6) A skip connection that adds the input to the block
    (7) A rectifier nonlinearity
    

    Point 6 means that the skip connection should add the block's own input, and that the modules should be applied in sequence. I wonder whether this is a correct implementation:

    func (m *maebe) share(input *G.Node, filterCount, layer int) (*G.Node, batchNormOp, batchNormOp) {
    	layer1, l1Op := m.res(input, filterCount, fmt.Sprintf("Layer1 of Shared Layer %d", layer))
    	layer2, l2Op := m.res(layer1, filterCount, fmt.Sprintf("Layer2 of Shared Layer %d", layer))
    	added := m.do(func() (*G.Node, error) { return G.Add(input, layer2) })
    	retVal := m.rectify(added)
    	return retVal, l1Op, l2Op
    }
    
    opened by Elvenson 5
  • What would the configuration for training agogo on Go look like?

    I can find the configuration for tic-tac-toe in the repository, but not for the game of Go. Is there an example of how to train it?

    Also, is there any documentation for the different configuration options?

    Thanks for the library!

    opened by sharpner 0
Releases (v0.1.1)
  • v0.1.1 (Jan 18, 2021)

    This release adds some documentation (readme and godoc).

    On top of that, two packages have been moved to internal:

    • gtp (go text protocol)
    • online
Owner
Gorgonia