A Go package for engineering organisms.




Poly is a Go package for engineering organisms.

  • Fast: Poly is fast and scalable.

  • Modern: Poly tackles issues that other libraries and utilities just don't, from general codon optimization and primer design to circular sequence hashing, all written in a language that was designed to be fast, scalable, and easy to develop in and maintain. Did we say it was fast?

  • Reproducible: Poly is well tested and designed to be used in industrial, academic, and hobbyist settings. No more copying and pasting strings into random websites to process the data you need.

  • Ambitious: Poly's goal is to be the most complete, open, and well-used collection of computational synthetic biology tools ever assembled. If you like our dream and want to support us, please star this repo, request a feature, open a pull request, or sponsor the project.



  • Discord: Chat about Poly and join us for game nights on our Discord server!


  • Code of conduct: Please read the full text so you can understand what we're all about and remember to be excellent to each other!

  • Contributor's guide: Please read through it before you start hacking away and pushing contributions to this fine codebase.


  • Sponsor: 🤘 Thanks for your support 🤘


  • MIT

  • Copyright (c) 2021 Timothy Stiles

  • All iupac variants

    This is for #92, a port of all_iupac_variants from easy_dna.

    IUPAC codes are stored as a map of rune slices. The Cartesian products are then generated using a retooled version of the function found in github.com/schwarmco/go-cartesian-product, which uses recursion and goroutines, so it should scale quite nicely to large sequences.
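To make the idea concrete, here is a minimal sketch of expanding IUPAC codes into every concrete sequence. It is not the code from the pull request: it builds the Cartesian product iteratively rather than with recursion and goroutines, and the names `iupac` and `allVariants` are made up for illustration.

```go
package main

import "fmt"

// iupac maps each IUPAC nucleotide code to the bases it can stand for.
// The map name and layout are assumptions for this sketch.
var iupac = map[rune][]rune{
	'A': {'A'}, 'C': {'C'}, 'G': {'G'}, 'T': {'T'},
	'R': {'A', 'G'}, 'Y': {'C', 'T'}, 'S': {'G', 'C'}, 'W': {'A', 'T'},
	'K': {'G', 'T'}, 'M': {'A', 'C'},
	'B': {'C', 'G', 'T'}, 'D': {'A', 'G', 'T'},
	'H': {'A', 'C', 'T'}, 'V': {'A', 'C', 'G'},
	'N': {'A', 'C', 'G', 'T'},
}

// allVariants expands a sequence containing IUPAC codes into every
// concrete sequence it can represent (hypothetical helper).
func allVariants(seq string) ([]string, error) {
	variants := []string{""}
	for _, code := range seq {
		bases, ok := iupac[code]
		if !ok {
			return nil, fmt.Errorf("unknown IUPAC code %q", code)
		}
		// Extend every prefix built so far with every base allowed here.
		next := make([]string, 0, len(variants)*len(bases))
		for _, prefix := range variants {
			for _, base := range bases {
				next = append(next, prefix+string(base))
			}
		}
		variants = next
	}
	return variants, nil
}

func main() {
	variants, err := allVariants("ATN")
	if err != nil {
		panic(err)
	}
	fmt.Println(variants) // [ATA ATC ATG ATT]
}
```

The number of variants is the product of the number of options at each position, so heavily degenerate sequences grow quickly; that is the case where a concurrent or streaming generator becomes attractive.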

    opened by tijeco 11
  • Brew installation does not work for m1 Macbook pro

    Describe the bug

    Installation on macOS via Homebrew does not work. This might be due to M1 vs. Intel CPUs, but I do not own both so cannot confirm.

    To Reproduce

    ➜  Downloads brew install timothystiles/poly/poly
    ==> Tapping timothystiles/poly
    Cloning into '/opt/homebrew/Library/Taps/timothystiles/homebrew-poly'...
    remote: Enumerating objects: 80, done.
    remote: Counting objects: 100% (80/80), done.
    remote: Compressing objects: 100% (40/40), done.
    remote: Total 80 (delta 20), reused 0 (delta 0), pack-reused 0
    Receiving objects: 100% (80/80), 9.30 KiB | 2.33 MiB/s, done.
    Resolving deltas: 100% (20/20), done.
    Error: Invalid formula: /opt/homebrew/Library/Taps/timothystiles/homebrew-poly/Formula/poly.rb
    formulae require at least a URL
    Error: Cannot tap timothystiles/poly: invalid syntax in tap!

    Expected behavior

    Successful installation.



    Desktop (please complete the following information):

    • MacOS 11.4
    • M1 Macbook Pro

    Additional Context

    Homebrew list does not seem to contain this project: https://formulae.brew.sh/formula/

    opened by nii236 10
  • Enables Dev Container #117

    Added VS Code dev container settings. These should install the Go package and a couple of other VS Code plugins. Currently, the dev container image uses an Ubuntu Focal container and installs Go using the latest apt package, golang.

    This dev container also installs the following plugins:

    1. golang.go
    2. ms-vsliveshare.vsliveshare-pack
    3. redhat.vscode-yaml
    4. yzhang.markdown-all-in-one
    5. premparihar.gotestexplorer
    6. github.vscode-pull-request-github
    7. streetsidesoftware.code-spell-checker
    8. eamodio.gitlens
    opened by rkrishnasanka 10
  • Add cloning function

    The following branch "goldengatev1" has a functioning GoldenGate assembly script for BbsI. It works, but we should try to break it before integrating it, as well as debate the merits of my approach. It only operates on raw sequences as well - we should likely have an algorithm that works with Genbank files.

    If we decide to do something else, we should close this pull request. This function, however, does work.

    opened by Koeng101 10
  • Primer flow

    Primers are a common need for labs, and will be required for more complex protocols, like Gibson assemblies.

    The most basic application of a primer design will be a simple amplification.


    primers.txt would be a tab separated value file which looks like (similar to what Snapgene provides):


    While pUC19_amplified.gb would simply be a genbank file with only the primers + amplified sequence.

    There are a few different flags that would be useful for poly amplify -

    1. --primer_for "pUC19_for" --primer_rev "pUC19_rev" for naming the output primers in primers.txt
    2. --amplicon "ATGC" amplifies the particular string
    3. --range 0:0 amplifies a particular range of sequence
    4. --no_amplify pGLO.gb prevents primers from amplifying a naughty sequence from a different file
    5. --validate 10:20 --size 100:150 --coverage 100 --overlap 10:40 amplifies a particular range of sequence with a size within the limits of the --size flag, the coverage limits of the --coverage flag, and the overlap limits of the --overlap flag.

    These 5 flags generally fulfill all needs of a biologist. The first few cover the use cases of your average cloner: they simply want to clone out an amplicon and don't really care about more advanced features. The 5th, --validate, covers the use case of people who build primers to validate things, like a colony PCR or a validation PCR for clinical samples. I think these 2 use cases cover ~90% of the different kinds of uses of poly amplify.

    Use cases:

    poly amplify


    poly amplify range

    poly amplify pUC19.gb --primer_for "pUC19_for" --primer_rev "pUC19_rev" --range 146:469 > primers.txt amplified.gb

    poly no_amplify

    poly amplify pUC19.gb --range 146:469 --no_amplify pGLO.gb > primers.txt amplified.gb

    poly validate

    poly amplify SARs-CoV-2.gb --validate 30:29886 --size 375:425 --coverage 100 --overlap 30:50 > primers.txt many_amplified.gb

    (Note: this is pretty much the exact use case of https://artic.network/ncov-2019, which is why this kind of thing is important. It also doesn't fit well into other parts of the program.)

    Another example:

    poly amplify pUC19.gb --validate 146:469 --size 0:500 --coverage 100 --overlap 0:0 > primers.txt amplified.gb

    In this example, we really just want to amplify a fragment that we know contains this sequence, so we can do some Sanger sequencing or the like on it.
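As a rough illustration of what the --range case could do internally, the sketch below parses a start:end value and picks fixed-length primers from the ends of that region. Everything here is an assumption: the function names are hypothetical, the range is treated as 0-based and end-exclusive, and real primer design would also account for melting temperature, which this ignores.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange parses a "start:end" value such as the one given to --range.
// The 0-based, end-exclusive convention is an assumption of this sketch.
func parseRange(value string) (start, end int, err error) {
	parts := strings.Split(value, ":")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("range %q must look like start:end", value)
	}
	if start, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if end, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	if start < 0 || end <= start {
		return 0, 0, fmt.Errorf("range %q is empty or negative", value)
	}
	return start, end, nil
}

// reverseComplement handles upper-case A, C, G, T; anything else becomes N.
func reverseComplement(seq string) string {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		var c byte
		switch seq[i] {
		case 'A':
			c = 'T'
		case 'T':
			c = 'A'
		case 'G':
			c = 'C'
		case 'C':
			c = 'G'
		default:
			c = 'N'
		}
		out[len(seq)-1-i] = c
	}
	return string(out)
}

// naivePrimers returns the first `length` bases of the region as the forward
// primer and the reverse complement of the last `length` bases as the reverse
// primer. It does no melting-temperature or specificity checks.
func naivePrimers(template string, start, end, length int) (forward, reverse string, err error) {
	if start < 0 || end > len(template) {
		return "", "", fmt.Errorf("range %d:%d is outside the template (length %d)", start, end, len(template))
	}
	if length <= 0 || end-start < length {
		return "", "", fmt.Errorf("primer length %d does not fit a region of %d bases", length, end-start)
	}
	forward = template[start : start+length]
	reverse = reverseComplement(template[end-length : end])
	return forward, reverse, nil
}

func main() {
	start, end, err := parseRange("6:18")
	if err != nil {
		panic(err)
	}
	forward, reverse, err := naivePrimers("GGATCCAAATTTCCCGGGAAGCTT", start, end, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(forward, reverse) // AAAT CCCG
}
```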

    @jecalles thoughts on different use cases I might be forgetting?

    enhancement ucb-students 
    opened by Koeng101 9
  • fasta rework

    So this is NOT intended as a performance boost change. This is above all an API rework for ease of use and clarity.

    I have implemented a fasta.Parser type. This type is intended to be the base to parse all fasta data and replace all other implementations.

    Benefits of Parser.ParseNext() when compared to XConcurrent functions:

    • Provides the bare minimum API signature to parse a single fasta genome
    • It is a simple and easy to understand API.
    • No ambiguities on API use.
    • Takes an io.Reader, so users are free to use it with any API or stream.
    • User has more control over how many fasta genomes they want to read
    • (EDIT) Also errors! ParseNext can return a very useful error with the line number on which parsing stopped
    goos: linux
    goarch: amd64
    pkg: github.com/TimothyStiles/poly/io/fasta
    cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
    BenchmarkFastaLegacy-8   	      55	  19968425 ns/op	17326897 B/op	  158406 allocs/op
    BenchmarkParser-8        	     118	   9643441 ns/op	16795447 B/op	   58150 allocs/op
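For readers unfamiliar with the pattern, here is a minimal sketch of a streaming parser with a ParseNext-style method over an io.Reader. It is not the fasta.Parser from this pull request: the `Record` type, field names, and error wording are assumptions, and only the general shape (one record per call, io.EOF at the end, line numbers in errors) follows the description above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Record is a single FASTA entry (hypothetical type for this sketch).
type Record struct {
	Name     string
	Sequence string
}

// Parser reads FASTA records one at a time from any io.Reader.
type Parser struct {
	scanner    *bufio.Scanner
	line       int    // lines read so far, used in error messages
	nextHeader string // header already consumed while finishing the previous record
	hasHeader  bool
}

// NewParser wraps r. bufio.Scanner's default token limit applies, so very
// long lines would need a larger buffer via Scanner.Buffer.
func NewParser(r io.Reader) *Parser {
	return &Parser{scanner: bufio.NewScanner(r)}
}

// ParseNext returns the next record, or io.EOF when none remain.
func (p *Parser) ParseNext() (Record, error) {
	var rec Record
	var seq strings.Builder
	inRecord := false
	if p.hasHeader {
		rec.Name, inRecord, p.hasHeader = p.nextHeader, true, false
	}
	for p.scanner.Scan() {
		p.line++
		text := strings.TrimSpace(p.scanner.Text())
		switch {
		case text == "":
			continue
		case strings.HasPrefix(text, ">"):
			if inRecord {
				// This header belongs to the following record: stash it and return.
				p.nextHeader, p.hasHeader = text[1:], true
				rec.Sequence = seq.String()
				return rec, nil
			}
			rec.Name, inRecord = text[1:], true
		case !inRecord:
			return Record{}, fmt.Errorf("line %d: sequence data before any header", p.line)
		default:
			seq.WriteString(text)
		}
	}
	if err := p.scanner.Err(); err != nil {
		return Record{}, fmt.Errorf("line %d: %w", p.line, err)
	}
	if !inRecord {
		return Record{}, io.EOF
	}
	rec.Sequence = seq.String()
	return rec, nil
}

// parseAll drains a parser; callers who want fewer records just stop early.
func parseAll(input string) ([]Record, error) {
	parser := NewParser(strings.NewReader(input))
	var records []Record
	for {
		rec, err := parser.ParseNext()
		if err == io.EOF {
			return records, nil
		}
		if err != nil {
			return records, err
		}
		records = append(records, rec)
	}
}

func main() {
	records, err := parseAll(">seq1\nATGC\nGGCC\n>seq2\nTTAA\n")
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		fmt.Println(rec.Name, rec.Sequence)
	}
}
```

Because the caller drives the loop, reading only the first record of a large file costs only the lines up to the second header.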
    opened by soypat 8
  • Genbank error checking

    This is a genbank error checking file. As with the other branches, multi genbank parsing was removed. Parse, write, and read now default to outputting a list, though we don't have multi-genbank working quite yet.

    I deleted TestLocusParseRegression because it doesn't work. These two lines are different, which the old parser did not pick up, but the newer parser picks up. https://github.com/TimothyStiles/poly/blob/359da2c15a5b20c12b68d5eb25f8254dd67ee62a/data/puc19static.gbk#L43


    opened by Koeng101 8
  • Updated synthesisFixer to not use SQL.

    This push updates the synthesis fixer to not use SQL, but rather just use raw Golang. Should make troubleshooting easier.

    Fixed 3 bugs in the example files:

    • GGGCCC should fix to GGGCCA, not GGACCC, given its codon table
    • repeat finder function should fix more of the repeat
    • example_basic wasn't computing with a real file, so optimization table was set to zero. Now uses a real table

    All other tests are passing.

    opened by Koeng101 8
  • easy_dna functionality port.

    I had a great call the other month with the developer of easy_dna, and he suggested that even though it is far from his most popular library, it is his most useful, and that poly should have similar functionality. Below is a list of functions that I will accept PRs for if ported. See above link for documentation and links to original source code. Please make sure they are well tested!

    • all_iupac_variants
    • ~~anonymized_record~~
    • ~~copy_and_paste_segment~~
    • ~~cut_and_paste_segment~~
    • dna_pattern_to_regexpr
    • list_common_enzymes
    • random_dna_sequence
    • random_protein_sequence
    • ~~replace_segment~~
    • ~~reverse_segment~~
    • ~~swap_segments~~

    Thanks, Tim

    P.S. Some clarifying info that Davian asked for.

    Most functions will likely fit into the scope of either sequence.go or transformations.go and their associated test files. If you feel like a function doesn't fit within the scope of either, you can make a utils.go and utils_test.go in the project's main directory to include with your pull request.

    Most functions can be written as standalone string functions and then be wrapped by a method to use with poly's main sequence struct. That's as complex as integration with the main library will have to be in most cases.
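A small sketch of that pattern, using dna_pattern_to_regexpr from the list as the example: a standalone string function, then a thin method that calls it. The `Sequence` struct here is a hypothetical stand-in, not poly's actual type, and the regular-expression output format is an assumption rather than a faithful port of easy_dna.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// iupacClasses maps IUPAC codes to regular-expression character classes.
var iupacClasses = map[rune]string{
	'A': "A", 'C': "C", 'G': "G", 'T': "T",
	'R': "[AG]", 'Y': "[CT]", 'S': "[CG]", 'W': "[AT]", 'K': "[GT]", 'M': "[AC]",
	'B': "[CGT]", 'D': "[AGT]", 'H': "[ACT]", 'V': "[ACG]", 'N': "[ACGT]",
}

// PatternToRegexp is the standalone string function: it turns an IUPAC
// pattern such as "ATN" into a regular expression such as "AT[ACGT]".
func PatternToRegexp(pattern string) (string, error) {
	var b strings.Builder
	for _, code := range strings.ToUpper(pattern) {
		class, ok := iupacClasses[code]
		if !ok {
			return "", fmt.Errorf("unknown IUPAC code %q", code)
		}
		b.WriteString(class)
	}
	return b.String(), nil
}

// Sequence is a hypothetical stand-in for the library's main sequence struct.
type Sequence struct {
	Name  string
	Bases string
}

// FindPattern is the thin wrapper method: it delegates to PatternToRegexp
// and returns the start index of each non-overlapping match.
func (s Sequence) FindPattern(pattern string) ([]int, error) {
	expr, err := PatternToRegexp(pattern)
	if err != nil {
		return nil, err
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	var starts []int
	for _, loc := range re.FindAllStringIndex(strings.ToUpper(s.Bases), -1) {
		starts = append(starts, loc[0])
	}
	return starts, nil
}

func main() {
	seq := Sequence{Name: "example", Bases: "ATGGTCTCAGGATCC"}
	starts, err := seq.FindPattern("GGTCTCN")
	if err != nil {
		panic(err)
	}
	fmt.Println(starts) // [2]
}
```

Keeping the logic in the plain function means it can be tested on strings alone, and the method adds nothing but the plumbing.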

    opened by TimothyStiles 8
  • GbkFlatGz example fails to read feature properly.

    func ExampleReadGbkFlatGz() {
            sequences := ReadGbkFlatGz("data/flatGbk_test.seq.gz")
            //sequences := ReadGbkFlatGz("data/gbbct358.seq.gz")
            var locus []string
            for _, sequence := range sequences {
                    locus = append(locus, sequence.Meta.Locus.Name)
            }
            fmt.Println(strings.Join(locus, ", "))
            // Output: AB000100, AB000106
    }

    Parsing this genbank flat file in this example does not get translations of features out.

    opened by Koeng101 7
  • The Poly RBS calculator

    Salis lab has previously made a ribosomal binding site calculator, which can predict translation initiation rates from proteins.

    However, it is slow (requiring a queue on a website) and closed source. In order to incorporate RBS calculation data in more complex applications, we need better performance and velocity of development. The best advancements in technology should be incorporated in an open-source manner.

    Basic idea of RBS calculator

    The basic idea behind the RBS calculator (this is a simplification) is that you take the binding energy of the ribosomal 16S RNA to the mRNA's RBS site and subtract that from the binding energy of the mRNA to itself. There are a few other variables, but these are the basic ones (please check https://pubs.acs.org/doi/suppl/10.1021/acssynbio.0c00394/suppl_file/sb0c00394_si_001.pdf table 2 for equations).

    The mRNA varies widely between designs, so its folding must be calculated each time the simulation is run. The 16S RNA, on the other hand, is not very variable. There is approximately a power-law distribution of what organisms people use, so we can cache most of the 16S-RNA to RBS (which I will now call 16S-RBS) data in a lookup table.

    Software and numbers we need

    It is important to keep in mind we want this software to be fast. In order to get performance, there are 2 primary optimizations: 1 - using a faster algorithm for calculating RNA secondary structure (we use LinearFold, which folds RNA in linear time) and 2 - using a lookup table for slow RNA-to-RNA binding calculations.

    In order to calculate mRNA folding, @vivekr has ported LinearFold to Golang. This package needs to be incorporated into Poly before we build the RBS calculator.

    In order to calculate the 16S-RBS lookup table, we will likely need to operate outside of Golang (probably in python). LinearFold does not support (at this time) multiple separate RNAs binding to each other, so we'll have to do this work in a different algorithm. It will be a challenge to relate the two numbers from different software packages. Since the 16S RNA binding sequence is only 9 base pairs long, we theoretically only have to calculate its binding efficiency to 262,144 other RNAs.
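The 262,144 figure is 4^9, the number of distinct 9-base RNA sequences. A minimal sketch of enumerating them, as the key set for such a lookup table, might look like this; the function name is hypothetical and the binding energies themselves would still come from whatever external tool is chosen.

```go
package main

import "fmt"

// allKmers returns every RNA sequence of length k over A, C, G, U by
// treating each index as a base-4 number with one digit per position.
func allKmers(k int) []string {
	const alphabet = "ACGU"
	total := 1 << (2 * k) // 4^k
	kmers := make([]string, 0, total)
	buf := make([]byte, k)
	for i := 0; i < total; i++ {
		n := i
		for pos := k - 1; pos >= 0; pos-- {
			buf[pos] = alphabet[n&3] // lowest two bits select the base
			n >>= 2
		}
		kmers = append(kmers, string(buf))
	}
	return kmers
}

func main() {
	kmers := allKmers(9)
	fmt.Println(len(kmers), kmers[0], kmers[len(kmers)-1])
	// 262144 AAAAAAAAA UUUUUUUUU
}
```

Since each 9-mer corresponds to an integer below 262,144, the finished table could be stored as a flat slice of energies indexed by that integer instead of a map keyed by string.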

    There are other parameters that assist in doing RBS calculations (such as ΔGstandb, from https://pubs.acs.org/doi/suppl/10.1021/acssynbio.0c00394/suppl_file/sb0c00394_si_001.pdf). We'll likely need to build those into the calculator at some point, but perhaps not in version 1.


    After we get a functioning prototype of the RBS calculator, we can tune our model. One dataset from Salis Lab has 9862 sequences, and we can directly compare our calculator's outputs with the ones published by Salis Lab. We can also use empirical measurements from ~300,000 RBSs from "Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping". Using a couple of these data sets, we should be able to massage our RBS calculator to get to "good enough".

    While it probably won't be as accurate as machine learning models in organisms with large datasets, we can supply those models with our calculator's output as a parameter and hopefully improve their predictions.

    The goal is to make something that is useful to scientists and engineers. Our calculator can still be mildly wrong, so long as it is fundamentally useful to practitioners.

    In-vivo testing

    After we build the Poly RBS calculator, Sporenet Labs (aka Keoni Gandall, aka me) plans to test its efficiency in a real laboratory environment. As the group who builds the thing, we'll all decide together what experiments we should run. Ideally, we'll be using Bxb1-GFP in E. coli with an oligo pool or a degenerate primer library + some Nanopore sequencing.

    opened by Koeng101 7
  • Fixed the issue `rebase can contain multiple references`

    Fixes issue #278.


    • Rewrote the commercial availability section to use a RegEx based parser that makes it more resilient to malformed / mis-indented files
    • Updated the basic structure to capture the availability and the dates as pair entries in the main enzyme data structure
    • Cleaned up all deprecated ioutil references
    • Added a parser test with a minimal test case

    Reviewer Questions

    1. Do I need to add more tests for this PR?
    2. Can I get support to verify that the JSON dump I'm using is correct?
    3. Do we need to add a stress test with the large rebase test case? How do we validate it?
    4. Is there a way to generate malformed rebase files to test for panics?

    Recommendations for the future:

    • We need to move science-related data and commercial data to separate data structures that are not clubbed together
    • We need to standardize parser writing to use something like ANTLR. Logic writing and debugging will be hard to maintain in the future.
    opened by rkrishnasanka 0
  • added regression test and separated out nested logic.

    I'm not sure where the issue from #279 is coming from yet but I think it may have something to do with minimal primer length.


    opened by TimothyStiles 1
  • PCR simulations are wrong.

    package main
    import (
    func main() {
    	gene := "aataattacaccgagataacacatcatggataaaccgatactcaaagattctatgaagctatttgaggcacttggtacgatcaagtcgcgctcaatgtttggtggcttcggacttttcgctgatgaaacgatgtttgcactggttgtgaatgatcaacttcacatacgagcagaccagcaaacttcatctaacttcgagaagcaagggctaaaaccgtacgtttataaaaagcgtggttttccagtcgttactaagtactacgcgatttccgacgacttgtgggaatccagtgaacgcttgatagaagtagcgaagaagtcgttagaacaagccaatttggaaaaaaagcaacaggcaagtagtaagcccgacaggttgaaagacctgcctaacttacgactagcgactgaacgaatgcttaagaaagctggtataaaatcagttgaacaacttgaagagaaaggtgcattgaatgcttacaaagcgatacgtgactctcactccgcaaaagtaagtattgagctactctgggctttagaaggagcgataaacggcacgcactggagcgtcgttcctcaatctcgcagagaagagctggaaaatgcgctttcttaa"
    	// Now let's make sure it only amplified our target.



    Check here - https://pkg.go.dev/github.com/TimothyStiles/[email protected]/primers/pcr

    TTATAGGTCTCATACTAATAATTACACCGAGATAACACATCATGG anneals to the beginning of the sequence, NOT TATATGGTCTCTTCATTTAAGAAAGCGCATTTTCCAGC. There is no reason why the beginning of the output sequence should be TATA

    @TimothyStiles This is a fairly critical bug we should figure out.

    opened by Koeng101 1
  • Data race causes flaky test in "clone"

    Describe the bug

    When running the clone tests repeatedly, TestSignalKilledGoldenGate randomly fails. I believe this might be caused by data races in the clone package, as reported by go test -race ./clone:

    % go test -race ./clone
    Write at 0x00c0000ba0f0 by goroutine 45:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x3c0
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Previous write at 0x00c0000ba0f0 by goroutine 40:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x3c0
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Goroutine 45 (running) created at:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Goroutine 40 (finished) created at:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Read at 0x00c000268218 by goroutine 68:
          /usr/local/go/src/runtime/slice.go:178 +0x0
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x39c
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Previous write at 0x00c000268218 by goroutine 34:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x3c0
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Goroutine 68 (running) created at:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    Goroutine 34 (finished) created at:
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
          /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
    --- FAIL: TestSignalKilledGoldenGate (0.02s)
        clone_test.go:169: Should be only 4 looping sequences. Got: 7
        testing.go:1319: race detected during execution of test
    FAIL	github.com/TimothyStiles/poly/clone	1.317s

    To Reproduce

    Steps to reproduce the behavior:

    1. Run the tests repeatedly: go test -count 10 -v ./clone/
    2. Then run the race detector from the main directory: go test -race ./clone

    Expected behavior

    Data race should be fixed.
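The trace above passes through runtime/slice.go, which is consistent with several goroutines appending to one shared slice, though that is an inference and not confirmed against clone.go. Under that assumption, one common remedy is to guard the append with a mutex, sketched below with made-up names and a placeholder workload.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect runs one goroutine per input and gathers the results in a shared
// slice. The mutex makes each append atomic with respect to the others.
func collect(inputs []string) []string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []string
	)
	for _, input := range inputs {
		wg.Add(1)
		go func(fragment string) {
			defer wg.Done()
			processed := fragment + fragment // placeholder for the real work
			mu.Lock()
			results = append(results, processed)
			mu.Unlock()
		}(input)
	}
	wg.Wait()
	// Goroutines finish in no fixed order, so sort for a stable result.
	sort.Strings(results)
	return results
}

func main() {
	fmt.Println(collect([]string{"b", "a", "c", "d"})) // [aa bb cc dd]
}
```

Sending results over a channel to a single collecting goroutine is an equally valid choice; either way, go test -race should report nothing once no two goroutines touch the slice without synchronization.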

    opened by matiasinsaurralde 4
  • Default codon tables

    We should add default codon tables of the following organisms:

    1. Escherichia coli MG1655
    2. Bacillus subtilis 168
    3. Vibrio natriegens (whatever type strain)
    4. Saccharomyces cerevisiae BY4741/BY4742 (used for knockdown collections)
    5. Yarrowia lipolytica (apparently used quite a bit in production, use type strain)
    6. Pichia pastoris NRRL Y-7556
    7. Kluyveromyces marxianus NRRL Y-6860 (it zoom speed yeast)
    8. Homo sapiens (kinda yucky)
    opened by Koeng101 4
A strange creature that creates software that creates strange creatures.
A Go package that reports processor topology

Description: cpu package reports (some) processor topology information. Note that the term package refers to a physical processor

Joseph Poirier 21 Nov 5, 2022
A simple debugging Go package to perform Dump and Die

dump A simple Go package to perform Dump and Die.

null 5 May 16, 2021
Go package for dealing with Mantis Bug Tracking tool

BlueMantis is a Go package in development that aim to make the process of sending issues and bugs in Go applications to the Open Source Bug Tracking software MantisBT.

Gustavo H. M. Silva 6 Aug 3, 2021
The package manager for macOS you didn’t know you missed. Simple, functional, and fast.

Stew The package manager for macOS you didn’t know you missed. Built with simplicity, functionality, and most importantly, speed in mind. Installation

Stew 20 Mar 30, 2022
sentry integrated logrus package for our internal projects

sentry integrated logrus package for our internal projects

seo.do 7 Oct 15, 2021
Package fsm allows you to add finite-state machines to your Go code.

fsm Package fsm allows you to add finite-state machines to your Go code. States and Events are defined as int consts: const ( StateFoo fsm.State =

Cocoon Space 35 Dec 9, 2022
Go package providing tools for working with Library of Congress data.

go-libraryofcongress Go package providing tools for working with Library of Congress data. Documentation Tools $> make cli go build -mod vendor -o bin

San Francisco International Airport Museum 9 Jan 3, 2023
Go package for working with Library of Congress data in an SFO Museum context.

go-sfomuseum-libraryofcongress Go package for working with Library of Congress data in an SFO Museum context. Documentation Documentation is incomplet

San Francisco International Airport Museum 0 Oct 19, 2021
A tool to generate Pulumi Package schemas from Go type definitions

MkSchema A tool to generate Pulumi Package schemas from Go type definitions. This tool translates annotated Go files into Pulumi component schema meta

Joe Duffy 3 Sep 1, 2022
Provides the radix package that implements a radix tree.

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

null 0 Oct 26, 2021
The kprobe package allows construction of dynamic struct based on kprobe event format descriptions.

The kprobe package allows construction of dynamic struct based on kprobe event format descriptions.

Dan Kortschak 4 Oct 27, 2021
Terbilang is a package for converting rupiah amounts in numerals into rupiah amounts in text form

Welcome TERBILANG Terbilang is a package for converting rupiah amounts in numerals into text form. How to install go get github.com/ekokurniadi/te

Eko Kurniadi 1 Aug 26, 2022
Go 1.18 Generics based slice package

The missing slice package A Go-generics (Go 1.18) based functional library with no side-effects that adds the following functions to a slice package:

Steven Soroka 35 Jan 8, 2023
A modification (and a bit of simplification) of the tracerr package.

Decrr A modification (and a bit of simplification) of the tracerr package. This essentially does pretty much the same, but instead of returning anothe

Reinaldy Rafli 5 Nov 24, 2021
Package transition implements smooth transition.

transition Package transition implements smooth transition. Get started Install go get github.com/hslam/transition Import import "github.com/hslam/tr

Meng Huang 0 Dec 16, 2021
Package tail implements file tailing with fsnotify.

tail Package tail implements file tailing with fsnotify. Fork of nxadm/tail, simplified, reworked and optimized. Currently, supports only Linux and Da

go faster 15 Nov 30, 2022
A simple package to daemonize Go applications.

A simple package to daemonize Go applications.

Henrique Dias 0 Nov 13, 2021
Package buildinfo provides basic building blocks and instructions to easily add build and release information to your app.

Package buildinfo provides basic building blocks and instructions to easily add build and release information to your app. This is done by replacing variables in main during build with ldflags.

null 1 Nov 14, 2021
Go package that adds marshal and unmarshal features to nullable sql types.

#Nullable Very simple Go module to handle nullable fields. Basically, it adds to sql package types the JSON marshal and unmarshal features. It has 100

Diego Hordi 1 Jan 20, 2022