A Go idiomatic binding to the C++ core of PyTorch

Overview

GoTorch


GoTorch reimplements PyTorch high-level APIs, including modules and functionals, in idiomatic Go. This enables deep learning programming in Go and Go+. The project is at a very early stage.

Easy Switch

Writing deep learning systems in Go is as efficient as in Python. The DCGAN training programs in GoTorch and PyTorch call similar APIs, have a similar program structure, and have a similar number of lines. Go+ has a syntax similar to Python's, and the Go+ compiler translates Go+ programs into Go source programs. It is a joy to write Go+ programs that call Go packages like GoTorch.

We also plan a translator that migrates existing PyTorch models written in Python into GoTorch.

Benefits

  1. Higher runtime efficiency. Go programs run as efficiently as C++.

  2. Training and prediction in the same language. No longer training in Python and online prediction in C++. All in Go/Go+. No TensorFlow graphs or PyTorch tracing.

  3. Same data processing code for training and prediction. No need to wrap OpenCV functions into TensorFlow operators in C++ for prediction and in Python for training.

  4. Supports many machine learning paradigms, including adversarial, reinforcement, and imitation learning -- those that cannot be split into training and prediction.

  5. Same program for edge and cloud. GoTorch programs compile and run on phones and self-driving cars as they do on servers and desktops.

The Tech Stack

GoTorch works with the following open-source communities to form Go+Torch.

  • the Go+ community,
  • the PyTorch community, and
  • the TensorFlow XLA ecosystem.

The following figure reveals the stack of technologies.

Go+ applications   # users write DL applications in Go+,
     │             # whose syntax is as concise as Python
 [Go+ compiler]
     ↓
Go source code ──→ GoTorch ──→ libtorch ──→ pytorch/xla ──→ XLA ops
     │
 [Go compiler]
     ↓
executable binary  # x86_64, ARM, CUDA, TPU
                   # Linux, macOS, Android, iOS

Documentation

Issues
  • Memory leak while training Resnet50 model

    Memory leak while training Resnet50 model

    After 120 iterations, memory usage had increased to 78Gi.

    bug 
    opened by Yancey1989 7
  • ExampleTrainMNIST takes forever on Raspbian 10 32bit

    ExampleTrainMNIST takes forever on Raspbian 10 32bit

    After https://github.com/wangkuiyi/gotorch/pull/98, GoTorch builds and runs on Raspbian 10, but the test ExampleTrainMNIST takes forever.

    opened by wangkuiyi 7
  • Why The Monad Pattern Looks Promising in Go+Torch Design

    Why The Monad Pattern Looks Promising in Go+Torch Design

    Monad is a programming pattern that records the output of each function call in a data structure so that we can free all the outputs at once afterward. The pattern applies to many programming languages. Let us see why it is important to Go+Torch.

    Go uses the pattern extensively; see https://www.innoq.com/en/blog/golang-errors-monads/ for an example.
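    For reference, here is a minimal Go sketch of that error-recording idiom (the errWriter type and names are illustrative, not part of GoTorch):

    package main

    import (
    	"fmt"
    	"io"
    	"strings"
    )

    // errWriter records the first error from a chain of writes so that the
    // caller checks once at the end instead of after every call.
    type errWriter struct {
    	w   io.Writer
    	err error
    }

    func (ew *errWriter) write(p []byte) {
    	if ew.err != nil {
    		return // a previous write already failed; do nothing
    	}
    	_, ew.err = ew.w.Write(p)
    }

    func main() {
    	var sb strings.Builder
    	ew := &errWriter{w: &sb}
    	ew.write([]byte("hello "))
    	ew.write([]byte("world"))
    	if ew.err != nil { // a single check at the end
    		fmt.Println("write failed:", ew.err)
    		return
    	}
    	fmt.Println(sb.String())
    }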

    Case Study 1: Free Tensors

    We currently allocate Tensor objects using new to keep the reference count in the shared_ptr field of the C++ Tensor class: https://github.com/wangkuiyi/gotorch/blob/4ade9aa9ce84ae2532df260dd910c2b7bcf1e47a/cgotorch/cgotorch.cc#L16 The newed Tensor objects would leak memory if we don't recycle them.

    Assume that Go has a frontend API similar to C++'s. Then, following the C++ MNIST example, let's think about the following problems.

    Problem 1: Destruct Tensors Created In The Train Loop To Avoid Memory Leak

    1. Tensors Created In the C++ train loop: The train loop in mnist.cpp is like:

      for (auto& batch : data_loader) {
          auto data = batch.data.to(device), targets = batch.target.to(device);  // `data` and `targets` are `Tensor`s
          optimizer.zero_grad();
          auto output = model.forward(data);  // `output` is a `Tensor`
          auto loss = torch::nll_loss(output, targets);  // `loss` is a `Tensor`
          AT_ASSERT(!std::isnan(loss.template item<float>()));
          loss.backward();
          optimizer.step();
          //...
      }
      

      We can see that these Tensors have to be created:

      1. data and targets as the features and labels of the dataset
      2. output as the predictions of the data
      3. loss

      We can use defer to destruct Tensors in the train loop

      Because data, targets, output, and loss are all stack variables, they are created and destroyed in each iteration of the C++ train loop. This implies that the libtorch framework takes ownership of the Tensors if necessary. As a result, a naive gotorch API can use defer to recycle the reference-counted Tensors. That is, the following imaginary code would work:

      // We need this nested function to make `defer` work as expected.
      func step(batch *Batch) {
          // `data`, `targets`, `output`, and `loss` are `Tensor`s.
          data := batch.Data.To(device)
          defer data.Close()
          targets := batch.Target.To(device)
          defer targets.Close()
          optimizer.ZeroGrad()
          output := model.Forward(data)
          defer output.Close()
          loss := torch.NllLoss(output, targets)
          defer loss.Close()
          loss.Backward()
          optimizer.Step()
          // ...
      }
      for batch := range data_loader {
          step(batch)
      }
      

      The defers are a bit tedious; maybe we can improve the syntax of Go+ to save typing.

    2. Tensors Created In the C++ forward Method: The forward method is called by the train loop above. In the C++ MNIST example, it looks like:

      torch::Tensor forward(torch::Tensor x) {
          x = torch::relu(torch::max_pool2d(conv1->forward(x), 2));
          x = torch::relu(
              torch::max_pool2d(conv2_drop->forward(conv2->forward(x)), 2));
          x = x.view({-1, 320});
          x = torch::relu(fc1->forward(x));
          x = torch::dropout(x, /*p=*/0.5, /*training=*/is_training());
          x = fc2->forward(x);
          return torch::log_softmax(x, /*dim=*/1);
        }
      

      We can use defer to destruct Tensors in the Forward function (in a tricky way)

      Similar to the train loop above, x is a Tensor on the stack and is destroyed at the end of the function scope. The difference is that x is reassigned multiple times, so we cannot simply use defer x.Close() here. A workaround is to require users to use a different idiom; here is a naive example:

      // The argument x is recycled by the caller in the train loop.
      func (net *Net) Forward(x torch.Tensor) torch.Tensor {
          var tensors []torch.Tensor
          defer func() {
              for _, t := range tensors {
                  t.Close()
              }
          }()
          x = torch.Relu(torch.MaxPool2d(net.conv1.Forward(x), 2))
          tensors = append(tensors, x)
          x = torch.Relu(
              torch.MaxPool2d(net.conv2_drop.Forward(net.conv2.Forward(x)), 2))
          tensors = append(tensors, x)
          x = x.View([]int{-1, 320})
          tensors = append(tensors, x)
          x = torch.Relu(net.fc1.Forward(x))
          tensors = append(tensors, x)
          x = torch.Dropout(x, /*p=*/ 0.5, /*training=*/ net.IsTraining())
          tensors = append(tensors, x)
          x = net.fc2.Forward(x)
          tensors = append(tensors, x)
          return torch.LogSoftmax(x, /*dim=*/ 1) // The return value is recycled in the train loop.
      }
      

      Obviously, this is not very elegant.

      Should we bookkeep the Tensors in C++?

      A better way may be to keep the tensors array in C++ rather than in Go. For example, we can use a std::vector to record each C++ Tensor created through the Go API and provide a torch.CleanTensors function for users to call at the end of the train loop. However, this solution is harder to design properly; for example, we have to take goroutines into consideration to avoid corrupting the std::vector.
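      A rough Go-side sketch of the bookkeeping idea (the package tensorpool, Record, and CleanTensors are hypothetical names; the issue proposes keeping the vector in C++, so this only illustrates the API shape and the locking concern):

      package tensorpool // hypothetical package, not part of GoTorch

      import "sync"

      // Tensor stands in for the real GoTorch tensor handle.
      type Tensor interface{ Close() }

      var (
      	mu   sync.Mutex // guards pool against concurrent goroutines
      	pool []Tensor
      )

      // Record remembers every tensor created through the Go API.
      func Record(t Tensor) Tensor {
      	mu.Lock()
      	defer mu.Unlock()
      	pool = append(pool, t)
      	return t
      }

      // CleanTensors frees everything recorded since the last call; the user
      // would call it once at the end of each train-loop iteration.
      func CleanTensors() {
      	mu.Lock()
      	defer mu.Unlock()
      	for _, t := range pool {
      		t.Close()
      	}
      	pool = pool[:0]
      }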

    Case Study 2: Record Errors

    Few functions in libtorch are marked noexcept, which implies that most of the C++ functions may throw exceptions. We have to expose an error return value in the Go wrappers of these functions. Recall the step function above:

    func step(batch *Batch) {
        // `data`, `targets`, `output`, `loss` are `Tensor`s.
        data := batch.Data.To(device)
        defer data.Close()
        // ...
    }
    

    It may become the following in production code:

    func step(batch *Batch) error {
        // `data`, `targets`, `output`, `loss` are `Tensor`s.
        data, err := batch.Data.To(device)
        if err != nil {
            return ...
        }
        defer data.Close()
        // ...
    }
    

    That is, the user has to check for an error after every call, which is also tedious. Go+ has a neat syntax to unwrap errors, but I cannot think of an elegant way to solve the problem for the time being. See also the previous discussions: https://github.com/goplus/gop/issues/307#issuecomment-663396846, https://github.com/goplus/gop/issues/307#issuecomment-663942929

    opened by shendiaomo 6
  • torch.nn.Module in Go

    torch.nn.Module in Go

    The PyTorch API has a key concept -- torch.nn.Module. Many built-in and user-defined models are classes derived from torch.nn.Module. The only method to override is forward(x).

    Usually, a torch.nn.Module-derived class has data members representing the model parameters. For example, nn.Linear, the PyTorch implementation of the fully-connected layer, has W and B -- the weights and the bias, respectively.

    In Go/Go+, the concept that corresponds to a Python base class is an interface. So, we provide a type Module interface to mimic torch.nn.Module.

    Then, we need a solution to free up tensors when a model's life is over.
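    A minimal sketch of what such an interface might look like (illustrative only; the actual GoTorch definitions may differ):

    package sketch // not the actual GoTorch source

    // Tensor stands in for the real GoTorch tensor type.
    type Tensor struct{}

    // Module mimics torch.nn.Module: anything that implements Forward is a model.
    type Module interface {
    	Forward(x Tensor) Tensor
    }

    // Linear is a fully-connected layer; its fields hold the parameters,
    // like W and B in nn.Linear.
    type Linear struct {
    	Weight Tensor
    	Bias   Tensor
    }

    // Forward would compute x*W^T + B; the math is elided in this sketch.
    func (l *Linear) Forward(x Tensor) Tensor { return x }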

    opened by wangkuiyi 6
  • Revert

    Revert "Fix memory leak caused by Cgo thread spawning"

    Reverts wangkuiyi/gotorch#319

    opened by QiJune 5
  • Add test_ctr_test.go

    Add test_ctr_test.go

    opened by wangkuiyi 4
  • Compare different frontend language training on MNIST dataset

    Compare different frontend language training on MNIST dataset

    Just like writing a program that prints "Hello World" is our first exercise in coding, training a handwriting-recognition model on the MNIST database is usually the first exercise in deep learning.

    This issue compares how to train that model in various front-end languages: C++, Go, Python, and Go+Torch.

    C++
    #include <torch/torch.h>
    
    #include <cstddef>
    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>
    
    struct Net: torch::nn::Module {
      Net()
          : conv1(torch::nn::Conv2dOptions(1, 10, /*kernel_size=*/5)),
            conv2(torch::nn::Conv2dOptions(10, 20, /*kernel_size=*/5)),
            dropout1(0.25),
            dropout2(0.5),
            fc1(320, 50),
            fc2(50, 10) {
        register_module("conv1", conv1);
        register_module("conv2", conv2);
        register_module("dropout1", dropout1);
        register_module("dropout2", dropout2);
        register_module("fc1", fc1);
        register_module("fc2", fc2);
      }
    
      torch::Tensor forward(torch::Tensor x) {
        x = conv1->forward(x);
        x = torch::relu(x);
        x = conv2->forward(x);
        x = torch::relu(x);
        x = torch::max_pool2d(x, 2);
        x = dropout1(x);
        x = torch::flatten(x, 1);
        x = fc1(x);
        x = torch::relu(x);
        x = dropout2(x);
        x = fc2(x);
        return torch::log_softmax(x, 1);
      }
    
      torch::nn::Conv2d conv1;
      torch::nn::Conv2d conv2;
      torch::nn::Dropout dropout1;
      torch::nn::Dropout dropout2;
      torch::nn::Linear fc1;
      torch::nn::Linear fc2;
    };
    
    auto main() -> int {
      Net model;
      model.train();
      auto sgd = torch::optim::SGD(
          model.parameters(), torch::optim::SGDOptions(0.01).momentum(0.5));
      sgd.zero_grad();
      auto data = torch::rand({2, 3, 224, 224});
      auto target = torch::randint(1, 10, {2, });
      auto output = model.forward(data);
      auto loss = torch::nll_loss(output, target);
      loss.backward();
      sgd.step();
      std::printf("Loss: %.6f", loss.template item<float>());
    }
    
    Go

    package main

    import (
    	"fmt"

    	torch "github.com/wangkuiyi/gotorch"
    )

    type Net struct {
    	torch.Module
    	conv1    torch.Conv2d
    	conv2    torch.Conv2d
    	dropout1 torch.Dropout
    	dropout2 torch.Dropout
    	fc1      torch.Linear
    	fc2      torch.Linear
    }

    func NewNet() *Net {
    	n := &Net{
    		conv1:    torch.Conv2d(1, 10, 5),
    		conv2:    torch.Conv2d(10, 20, 5),
    		dropout1: torch.Dropout(0.25),
    		dropout2: torch.Dropout(0.5),
    		fc1:      torch.Linear(9216, 128),
    		fc2:      torch.Linear(128, 10),
    	}
    	n.registerModule()
    	return n
    }

    func (n *Net) registerModule() {
    	n.RegisterModule("conv1", n.conv1)
    	n.RegisterModule("conv2", n.conv2)
    	n.RegisterModule("dropout1", n.dropout1)
    	n.RegisterModule("dropout2", n.dropout2)
    	n.RegisterModule("fc1", n.fc1)
    	n.RegisterModule("fc2", n.fc2)
    }

    func (n *Net) Forward(x torch.Tensor) torch.Tensor {
    	x = n.conv1.Forward(x)
    	x = torch.Relu(x)
    	x = n.conv2.Forward(x)
    	x = torch.Relu(x)
    	x = torch.MaxPool2d(x, 2)
    	x = n.dropout1.Forward(x)
    	x = torch.Flatten(x, 1)
    	x = n.fc1.Forward(x)
    	x = torch.Relu(x)
    	x = n.dropout2.Forward(x)
    	x = n.fc2.Forward(x)
    	output := torch.LogSoftMax(x, 1)
    	return output
    }

    func main() {
    	model := NewNet()
    	model.Train()
    	sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
    	sgd.ZeroGrad()
    	data := torch.Rand([]int{2, 1, 28, 28})
    	target := torch.RandInt(1, 10, []int{2})
    	output := model.Forward(data)
    	loss := torch.NllLoss(output, target)
    	loss.Backward()
    	sgd.Step()
    	fmt.Println("Loss:", loss.Item())
    }
    
    Python
    from __future__ import print_function
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout2d(0.25)
            self.dropout2 = nn.Dropout2d(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)
    
        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output
    
    
    model = Net()
    model.train()
    optimizer = optim.Adadelta(model.parameters(), lr=0.1)
    data = torch.rand((2, 1, 28, 28))
    target = torch.randint(1, 10, (2,))
    output = model(data)
    loss = F.nll_loss(output, target)
    loss.backward()
    optimizer.step()
    print("Loss: {:.6f}".format(loss.item()))
    
    Go+Torch

    package main

    import (
    	torch "github.com/wangkuiyi/gotorch"
    )

    type Net struct {
    	torch.Module
    	conv1    torch.Conv2d
    	conv2    torch.Conv2d
    	dropout1 torch.Dropout
    	dropout2 torch.Dropout
    	fc1      torch.Linear
    	fc2      torch.Linear
    }

    func NewNet() *Net {
    	n := &Net{
    		conv1:    torch.Conv2d(1, 10, 5),
    		conv2:    torch.Conv2d(10, 20, 5),
    		dropout1: torch.Dropout(0.25),
    		dropout2: torch.Dropout(0.5),
    		fc1:      torch.Linear(9216, 128),
    		fc2:      torch.Linear(128, 10),
    	}
    	return n
    }

    func (n *Net) Forward(x torch.Tensor) torch.Tensor {
    	x = n.conv1.Forward(x)
    	x = torch.Relu(x)
    	x = n.conv2.Forward(x)
    	x = torch.Relu(x)
    	x = torch.MaxPool2d(x, 2)
    	x = n.dropout1.Forward(x)
    	x = torch.Flatten(x, 1)
    	x = n.fc1.Forward(x)
    	x = torch.Relu(x)
    	x = n.dropout2.Forward(x)
    	x = n.fc2.Forward(x)
    	output := torch.LogSoftMax(x, 1)
    	return output
    }

    model := NewNet()
    model.Train()
    sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
    sgd.ZeroGrad()
    data := torch.Rand([]int{2, 1, 28, 28})
    target := torch.RandInt(1, 10, []int{2})
    output := model.Forward(data)
    loss := torch.NllLoss(output, target)
    loss.Backward()
    sgd.Step()
    println("Loss:", loss.Item())
    
    opened by Yancey1989 4
  • Test Go GC on Tensors

    Test Go GC on Tensors

    This example program calls runtime.SetFinalizer with a torch.Tensor to set a finalizer that calls Tensor.Close() and prints the message "Closed Tensor".
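    The core of such a test might look like the following sketch, with a stand-in tensor type instead of the real torch.Tensor:

    package main

    import (
    	"fmt"
    	"runtime"
    	"time"
    )

    // tensor stands in for torch.Tensor, which wraps a C++ object.
    type tensor struct{}

    func (t *tensor) Close() { fmt.Println("Closed Tensor") }

    func newTensor() *tensor {
    	t := &tensor{}
    	// When the GC finds t unreachable, it runs the finalizer,
    	// which releases the underlying C++ memory.
    	runtime.SetFinalizer(t, func(t *tensor) { t.Close() })
    	return t
    }

    func main() {
    	_ = newTensor()              // becomes garbage immediately
    	runtime.GC()                 // force a collection so the finalizer is queued
    	time.Sleep(time.Millisecond) // finalizers run on a separate goroutine
    }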

    opened by wangkuiyi 4
  • Decoding jpg diff between Go image library and Python PIL library

    Decoding jpg diff between Go image library and Python PIL library

    I use the ToTensor transform to read the same image in GoTorch and PyTorch:

    The last three Tensor values in GoTorch:

    0.0788  0.0936  0.0936
    

    In PyTorch:

    0.0784, 0.0941, 0.0941
    

    There is a small difference.

    opened by QiJune 3
  • CircleCI runs mandatory Linux test, Travis CI runs optional macOS

    CircleCI runs mandatory Linux test, Travis CI runs optional macOS

    • CircleCI runs Linux tests, pre-commit checks, and codecov reporting. Required to pass before merging.
    • Travis CI runs macOS tests. No pre-commit checks, no codecov reporting. It takes forever to install clang-format in Travis CI macOS VM image as it upgrades too many Homebrew packages. It is NOT required to pass Travis CI before merging.
    opened by wangkuiyi 3
  • gotorch can load pytorch models?

    gotorch can load pytorch models?

    Hi, this is an awesome project! But I have a question: can gotorch load a PyTorch model? I want to use gotorch only for prediction.

    opened by shwanliu 2
  • add image-recordio-gen cmd and RecordIO reader

    add image-recordio-gen cmd and RecordIO reader

    The image-recordio-gen command converts an image folder with a label text file into the RecordIO file format.

    Let's take the MNIST dataset as an example. We can download it from https://github.com/myleott/mnist_png.git.

    The dataset contains two directories: training and testing. We need to make a label file, which maps a class string to an int index. The following is the label file for the MNIST dataset.

    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    

    Then, we could run the image-recordio-gen command:

    $GOPATH/bin/image-recordio-gen -label=$MNIST/label.txt -dataset=$MNIST/training -output=$MNIST/train_record -recordsPerShard=1500
    

    We can then find the RecordIO shard files in the train_record directory:

    data-00000
    data-00001
    ...
    ...
    
    opened by QiJune 1
  • add launch utility

    add launch utility

    This PR depends on #375

    First, install gotorch

    go install ./...
    

    Then, use the launch tool to run 2 processes on a single node:

    $GOPATH/bin/launch -nprocPerNode=2 -masterAddr=127.0.0.1 -masterPort=11111 -trainingCmd="$GOPATH/bin/allreduce"
    

    It will run the allreduce example. Then, 0.log and 1.log will be created.

    Check the 0.log:

    cat 0.log
    2020/11/02 17:54:47  2  4
     6  8
    [ CPUFloatType{2,2} ]
    

    You will find that the values are allreduced correctly.

    opened by QiJune 0
  • Add AllReduce distributed strategy design

    Add AllReduce distributed strategy design

    Here it is for better review.

    opened by QiJune 1
  • Support data parallelism with a GPU cluster

    Support data parallelism with a GPU cluster

    Data Parallelism

    Data parallelism replicates the model on every device to generate gradients independently and then communicates those gradients at each iteration to keep the model replicas consistent.

    The following is a survey of options for supporting data parallelism in GoTorch.

    Solutions

    NCCL and Gloo

    NCCL provides Broadcast and AllReduce C APIs; we could wrap them in Go and use them directly in GoTorch.

    Gloo is another collective communications library, which supports both CPU and GPU.

    The GPU performance of NCCL is better than that of Gloo.

    PyTorch Distributed Package

    It does more optimizations, including bucketing small gradients into a big tensor and overlapping communication with computation.

    The idea of gradient bucketing is motivated by the observation that collective communications are more efficient on large tensors.

    DDP registers one autograd hook for each gradient accumulator. The hook fires after its corresponding accumulator updates the gradient, and it inspects the bucket the gradient belongs to. When the hooks of all gradients in the same bucket have fired, the last hook triggers an asynchronous AllReduce on that bucket.

    Please refer to this paper for more details.
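    In Go-flavored pseudocode, the bucketing logic boils down to something like the sketch below (an illustration of the idea, not PyTorch's actual implementation):

    package ddp // illustrative sketch

    import (
    	"fmt"
    	"sync"
    )

    // Tensor stands in for a gradient tensor.
    type Tensor struct{ name string }

    // allReduce stands in for the real collective call (NCCL/Gloo).
    func allReduce(grads []Tensor) { fmt.Println("AllReduce on", len(grads), "gradients") }

    // bucket groups several small gradients so that one AllReduce runs on a
    // large buffer instead of many tiny ones.
    type bucket struct {
    	mu      sync.Mutex
    	grads   []Tensor
    	pending int // hooks that have not fired yet
    }

    // onGradReady is the hook fired after a gradient accumulator updates its
    // gradient; the last hook in the bucket launches the AllReduce.
    func (b *bucket) onGradReady() {
    	b.mu.Lock()
    	defer b.mu.Unlock()
    	b.pending--
    	if b.pending == 0 {
    		go allReduce(b.grads) // asynchronous, overlaps with the rest of backward
    	}
    }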

    Horovod

    Horovod is a distributed deep learning training framework for TensorFlow, Keras, and PyTorch. Horovod calls NCCL or Gloo underneath.

    Horovod also does many optimizations for communication. It uses the hook mechanism of PyTorch to overlap communication and computation.

    Horovod also supports elastic training.

    The biggest difference when moving from normal distributed training to elastic training is the need to track and synchronize the workers as they are added to or removed from the job.

    Elastic training depends on the Gloo library, so GPU performance may suffer a little.

    An interesting observation: People who want to run TensorFlow with AllReduce distributed strategy will choose Horovod, whereas people who want to run PyTorch with AllReduce distributed strategy will choose torch.DistributedDataParallel directly.

    Summary

    So, let's make a summary:

    | Solution  | Performance | Effort |
    | --------- | ----------- | ------ |
    | NCCL/Gloo | +           | expose Broadcast/AllReduce C APIs to Go |
    | PyTorch   | ++          | reimplement PyTorch distributed Python package in Go, and expose the C++ part to Go |
    | Horovod   | ++          | reimplement Horovod Python package in Go, and expose the C++ part to Go |

    Note 1

    Key points to improve the performance:

    • bucketing small gradients
    • using the hook mechanism to launch the AllReduce kernel asynchronously

    Note 2

    Both Horovod and PyTorch support the Gloo backend, so we could add elastic training later with either solution.

    opened by QiJune 3
  • WIP: Support basic data parallel

    WIP: Support basic data parallel

    opened by shendiaomo 1
  • ImageLoader shuffles samples at the beginning of every epoch

    ImageLoader shuffles samples at the beginning of every epoch

    We randomly shuffle data at the beginning of every epoch.

    c.f. https://github.com/KaimingHe/deep-residual-networks#disclaimer-and-known-issues
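    A minimal sketch of the intended behaviour (illustrative; the real ImageLoader would shuffle its sample index list rather than file names):

    package main

    import (
    	"fmt"
    	"math/rand"
    )

    func main() {
    	samples := []string{"0.png", "1.png", "2.png", "3.png"}
    	for epoch := 0; epoch < 2; epoch++ {
    		// reshuffle the sample order at the beginning of every epoch
    		rand.Shuffle(len(samples), func(i, j int) {
    			samples[i], samples[j] = samples[j], samples[i]
    		})
    		fmt.Println("epoch", epoch, samples)
    	}
    }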

    enhancement 
    opened by Yancey1989 0
  • WIP Add build many_threads.cc into Makefile

    WIP Add build many_threads.cc into Makefile

    I made this change, which contains the sample program from https://github.com/wangkuiyi/gotorch/issues/331, so it is easier to reproduce the too-many-threads problem.

    Please

    1. check out this branch,
    2. run cgotorch/build.sh on a Linux box or a Docker container to generate a C++ binary cgotorch/many_threads, and
    3. run cd cgotorch; ./many_threads to reproduce the problem of creating many threads.
    opened by wangkuiyi 1
  • Use the Homebrew version of libtorch on macOS

    Use the Homebrew version of libtorch on macOS

    The current official version of libtorch works on only a single thread, no matter the settings. As the process monitor shows, the resnet process has 23 threads, with only one of them active. The reason seems to be that the official version of libtorch doesn't link with libmkl.dylib:

    $ otool -L macos/libtorch/lib/libtorch_cpu.dylib
    macos/libtorch/lib/libtorch_cpu.dylib:
    	@rpath/libtorch_cpu.dylib (compatibility version 0.0.0, current version 0.0.0)
    	@rpath/libtensorpipe.dylib (compatibility version 0.0.0, current version 0.0.0)
    	@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
    	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
    	@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
    	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0)
    

    As a result, resnet with the official libtorch on macOS has a throughput of only about 1.55 samples/sec.

    In contrast, the Homebrew version of libtorch works well with multiple threads (it links against the Apple Accelerate framework rather than libmkl.dylib):

    otool -L /usr/local/Cellar/libtorch/1.6.0_1/lib/libtorch_cpu.dylib
    /usr/local/Cellar/libtorch/1.6.0_1/lib/libtorch_cpu.dylib:
    	/usr/local/opt/libtorch/lib/libtorch_cpu.dylib (compatibility version 0.0.0, current version 0.0.0)
    	/usr/local/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
    	/usr/local/opt/protobuf/lib/libprotobuf.24.dylib (compatibility version 25.0.0, current version 25.0.0)
    	/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
    	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
    	@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
    	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.1.0)
    

    The Homebrew libtorch starts about 50 threads and has 6-10 of them running.

    As a result, resnet on macOS with the Homebrew libtorch has a throughput of about 2.21~3.01 samples/sec, about 40%~100% faster than with the official version.

    opened by shendiaomo 0
Owner
Yi Wang