A Go package for engineering organisms.

Overview

(Poly)merase


Poly is a Go package for engineering organisms.

  • Fast: Poly is fast and scalable.

  • Modern: Poly tackles issues that other libraries and utilities just don't, from general codon optimization and primer design to circular sequence hashing. It is all written in a language that was designed to be fast, scalable, and easy to develop in and maintain. Did we say it was fast?

  • Reproducible: Poly is well tested and designed to be used in industrial, academic, and hobbyist settings. No more copying and pasting strings into random websites to process the data you need.

  • Ambitious: Poly's goal is to be the most complete, open, and well-used collection of computational synthetic biology tools ever assembled. If you like our dream and want to support us, please star this repo, request a feature, open a pull request, or sponsor the project.

Documentation

Community

  • Discord: Chat about Poly and join us for game nights on our Discord server!

Contributing

  • Code of conduct: Please read the full text so you can understand what we're all about and remember to be excellent to each other!

  • Contributor's guide: Please read through it before you start hacking away and pushing contributions to this fine codebase.

Sponsor

  • Sponsor: 🤘 Thanks for your support 🤘

License

  • MIT

  • Copyright (c) 2021 Timothy Stiles

Issues
  • All iupac variants

    This is for #92, a port of all_iupac_variants from easy_dna.

    IUPAC codes are stored as a map of rune slices. The Cartesian products are then generated using a retooled version of the function found in github.com/schwarmco/go-cartesian-product, which uses recursion and goroutines, so it should scale quite nicely to large sequences.
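
    As a rough illustration of the expansion described above, here is a minimal sketch that builds every variant iteratively. It is not the code from this pull request (which uses recursion and goroutines), and the names iupac and allVariants are made up for the example.

```go
package main

import "fmt"

// iupac maps each IUPAC nucleotide code to the bases it can stand for.
var iupac = map[rune][]rune{
	'A': {'A'}, 'C': {'C'}, 'G': {'G'}, 'T': {'T'},
	'R': {'A', 'G'}, 'Y': {'C', 'T'}, 'S': {'C', 'G'}, 'W': {'A', 'T'},
	'K': {'G', 'T'}, 'M': {'A', 'C'},
	'B': {'C', 'G', 'T'}, 'D': {'A', 'G', 'T'},
	'H': {'A', 'C', 'T'}, 'V': {'A', 'C', 'G'},
	'N': {'A', 'C', 'G', 'T'},
}

// allVariants expands an IUPAC sequence into every concrete sequence it
// encodes by taking the Cartesian product of the per-position base sets.
func allVariants(seq string) ([]string, error) {
	variants := []string{""}
	for _, code := range seq {
		bases, ok := iupac[code]
		if !ok {
			return nil, fmt.Errorf("unknown IUPAC code %q", code)
		}
		// Extend every variant built so far with every base allowed here.
		next := make([]string, 0, len(variants)*len(bases))
		for _, prefix := range variants {
			for _, base := range bases {
				next = append(next, prefix+string(base))
			}
		}
		variants = next
	}
	return variants, nil
}

func main() {
	variants, err := allVariants("ATN")
	if err != nil {
		panic(err)
	}
	fmt.Println(variants) // [ATA ATC ATG ATT]
}
```

    The number of variants is the product of the set sizes at each position, so a sequence with many N codes grows very quickly. That growth is the reason the scaling behaviour mentioned above matters.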

    opened by tijeco 11
  • Brew installation does not work for m1 Macbook pro

    Describe the bug

    Installation on macOS via Homebrew does not work. This might be due to M1 vs Intel CPUs, but I do not own both so cannot confirm.

    To Reproduce

    ➜  Downloads brew install timothystiles/poly/poly
    
    ==> Tapping timothystiles/poly
    Cloning into '/opt/homebrew/Library/Taps/timothystiles/homebrew-poly'...
    remote: Enumerating objects: 80, done.
    remote: Counting objects: 100% (80/80), done.
    remote: Compressing objects: 100% (40/40), done.
    remote: Total 80 (delta 20), reused 0 (delta 0), pack-reused 0
    Receiving objects: 100% (80/80), 9.30 KiB | 2.33 MiB/s, done.
    Resolving deltas: 100% (20/20), done.
    Error: Invalid formula: /opt/homebrew/Library/Taps/timothystiles/homebrew-poly/Formula/poly.rb
    formulae require at least a URL
    Error: Cannot tap timothystiles/poly: invalid syntax in tap!
    

    Expected behavior

    Successful installation.

    Screenshots

    N/A

    Desktop (please complete the following information):

    • MacOS 11.4
    • M1 Macbook Pro

    Additional Context

    Homebrew list does not seem to contain this project: https://formulae.brew.sh/formula/

    opened by nii236 10
  • Enables Dev Container #117

    Added VS Code dev container settings. It should install the Go package and a couple of other VS Code plugins. Currently, the dev container image uses an Ubuntu Focal container and installs golang using the latest apt package golang.

    This dev container also installs the following plugins:

    1. golang.go
    2. ms-vsliveshare.vsliveshare-pack
    3. redhat.vscode-yaml
    4. yzhang.markdown-all-in-one
    5. premparihar.gotestexplorer
    6. github.vscode-pull-request-github
    7. streetsidesoftware.code-spell-checker
    8. eamodio.gitlens
    opened by rkrishnasanka 10
  • Add cloning function

    The following branch "goldengatev1" has a functioning GoldenGate assembly script for BbsI. It works, but we should try to break it before integrating it, as well as debate the merits of my approach. It only operates on raw sequences as well - we should likely have an algorithm that works with Genbank files.

    If we decide to do something else, we should close this pull request. This function, however, does work.

    opened by Koeng101 10
  • Primer flow

    Primers are a common need for labs, and will be required for more complex protocols, like Gibson assemblies.

    The most basic application of a primer design will be a simple amplification.

    poly amplify pUC19.gb --amplicon "ATGACCATGATTACGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAG" > primers.txt pUC19_amplified.gb

    primers.txt would be a tab-separated values file that looks like this (similar to what SnapGene provides):

    Primer 1	ATGACCATGATTACGCCAAG
    Primer 2	CTATGCGGCATCAGAGCA  
    

    pUC19_amplified.gb would simply be a GenBank file containing only the primers and the amplified sequence.
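
    To make the expected output concrete, here is a minimal sketch of the simplest possible primer choice: the first bases of the amplicon become the forward primer, and the reverse complement of the last bases becomes the reverse primer. The function names and the fixed primer lengths (20 and 18, chosen to match the example output above) are assumptions. A real implementation would presumably choose lengths from melting temperature and other constraints.

```go
package main

import "fmt"

// reverseComplement returns the reverse complement of an uppercase DNA
// string. Characters other than A, C, G and T become N.
func reverseComplement(seq string) string {
	complement := map[byte]byte{'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		base, ok := complement[seq[i]]
		if !ok {
			base = 'N'
		}
		out[len(seq)-1-i] = base
	}
	return string(out)
}

// simplePrimers (hypothetical helper) returns a forward primer taken from
// the start of the amplicon and a reverse primer taken from its end.
func simplePrimers(amplicon string, forwardLen, reverseLen int) (string, string, error) {
	if forwardLen <= 0 || reverseLen <= 0 || forwardLen > len(amplicon) || reverseLen > len(amplicon) {
		return "", "", fmt.Errorf("primer lengths must be between 1 and %d", len(amplicon))
	}
	forward := amplicon[:forwardLen]
	reverse := reverseComplement(amplicon[len(amplicon)-reverseLen:])
	return forward, reverse, nil
}

func main() {
	// Only the two ends of the pUC19 amplicon from the example above;
	// the middle of the sequence is left out to keep the sketch short.
	amplicon := "ATGACCATGATTACGCCAAG" + "TGCTCTGATGCCGCATAG"
	forward, reverse, err := simplePrimers(amplicon, 20, 18)
	if err != nil {
		panic(err)
	}
	// Same tab-separated layout as primers.txt.
	fmt.Printf("Primer 1\t%s\nPrimer 2\t%s\n", forward, reverse)
}
```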

    There are a few different flags that would be useful for poly amplify -

    1. --primer_for "pUC19_for" --primer_rev "pUC19_rev" for naming the output primers in primers.txt
    2. --amplicon "ATGC" amplifies the particular string
    3. --range 0:0 amplifies a particular range of sequence
    4. --no_amplify pGLO.gb prevents primers from amplifying a naughty sequence from a different file
    5. --validate 10:20 --size 100:150 --coverage 100 --overlap 10:40 amplifies a particular range of sequence with a size within the size limitations of the --size flag, the coverage limitations of the --coverage flag, and the overlap limitations of the --overlap flag.

    These 5 flags generally fulfill all the needs of a biologist. The first four cover the use cases of your average cloner: they simply want to clone out an amplicon, and don't really care about more advanced features. The 5th, --validate, covers the use case of people who build primers to validate things, like a colony PCR, or a validation PCR for clinical samples. I think these 2 use cases cover ~90% of the different kinds of uses of poly amplify.

    Use cases:

    poly amplify

    poly amplify pUC19.gb --primer_for "pUC19_for" --primer_rev "pUC19_rev" --amplicon "ATGACCATGATTACGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAG" > primers.txt amplified.gb

    poly amplify range

    poly amplify pUC19.gb --primer_for "pUC19_for" --primer_rev "pUC19_rev" --range 146:469 > primers.txt amplified.gb

    poly no_amplify

    poly amplify pUC19.gb --range 146:469 --no_amplify pGLO.gb > primers.txt amplified.gb

    poly validate

    poly amplify SARs-CoV-2.gb --validate 30:29886 --size 375:425 --coverage 100 --overlap 30:50 > primers.txt many_amplified.gb

    (Note: this is pretty much the exact use case of https://artic.network/ncov-2019, which is why this kind of thing is important. It also doesn't fit in well to other parts of the program.)

    Another example:

    poly amplify pUC19.gb --validate 146:469 --size 0:500 --coverage 100 --overlap 0:0 > primers.txt amplified.gb

    In this example, we really just want to amplify a fragment that we know this sequence is inside of so we can do some Sanger sequencing or the like on it.

    @jecalles thoughts on different use cases I might be forgetting?

    enhancement ucb-students 
    opened by Koeng101 9
  • Genbank error checking

    This is a GenBank error checking file. As with the other branches, multi-GenBank parsing was removed. Parse, write, and read now default to outputting a list, though we don't have multi-GenBank working quite yet.

    I deleted TestLocusParseRegression because it doesn't work. These two lines are different, which the old parser did not pick up, but the newer parser does:

    https://github.com/TimothyStiles/poly/blob/359da2c15a5b20c12b68d5eb25f8254dd67ee62a/data/puc19static.gbk#L43

    https://github.com/TimothyStiles/poly/blob/359da2c15a5b20c12b68d5eb25f8254dd67ee62a/data/puc19.gbk#L43

    opened by Koeng101 8
  • Updated synthesisFixer to not use SQL.

    This push updates the synthesis fixer to not use SQL, but rather just use raw Golang. Should make troubleshooting easier.

    Fixed 3 bugs in the example files:

    • GGGCCC should fix to GGGCCA, not GGACCC, given its codon table
    • repeat finder function should fix more of the repeat
    • example_basic wasn't computing with a real file, so optimization table was set to zero. Now uses a real table

    All other tests are passing.
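
    The first fix in the list above can be pictured as a synonymous codon swap. The sketch below is not the synthesis fixer itself. It tries every single-codon synonymous replacement and keeps the highest-weighted one that removes the unwanted site. The codonWeights table is invented for illustration and only covers the two amino acids in the example, and removeSite is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// codonWeights is a made-up usage table (amino acid -> codon -> weight).
// The weights are NOT taken from any real organism.
var codonWeights = map[string]map[string]int{
	"G": {"GGG": 20, "GGA": 10},
	"P": {"CCC": 15, "CCA": 30},
}

// aminoAcidOf looks up which amino acid a codon encodes in codonWeights.
func aminoAcidOf(codon string) (string, bool) {
	for aminoAcid, codons := range codonWeights {
		if _, ok := codons[codon]; ok {
			return aminoAcid, true
		}
	}
	return "", false
}

// removeSite tries every single synonymous codon swap in an in-frame coding
// sequence and returns the highest-weighted result that no longer contains
// site. It only scans the given strand. Ties between equal weights are
// broken arbitrarily because Go map iteration order is not fixed.
func removeSite(cds, site string) (string, bool) {
	best, bestWeight := "", -1
	for i := 0; i+3 <= len(cds); i += 3 {
		aminoAcid, ok := aminoAcidOf(cds[i : i+3])
		if !ok {
			continue // codon not covered by the toy table
		}
		for codon, weight := range codonWeights[aminoAcid] {
			candidate := cds[:i] + codon + cds[i+3:]
			if weight > bestWeight && !strings.Contains(candidate, site) {
				best, bestWeight = candidate, weight
			}
		}
	}
	return best, bestWeight >= 0
}

func main() {
	fixed, ok := removeSite("GGGCCC", "GGGCCC")
	fmt.Println(fixed, ok) // GGGCCA true
}
```

    With these invented weights, swapping the proline codon (CCC to CCA) beats swapping the glycine codon (GGG to GGA), which mirrors the GGGCCA-over-GGACCC choice described in the first bullet.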

    opened by Koeng101 8
  • easy_dna functionality port.

    I had a great call the other month with the developer of easy_dna. He suggested that, even though it is far from his most popular library, it is his most useful, and that poly should have similar functionality. Below is a list of functions that I will accept PRs for if ported. See above link for documentation and links to original source code. Please make sure they are well tested!

    • all_iupac_variants
    • ~~anonymized_record~~
    • ~~copy_and_paste_segment~~
    • ~~cut_and_paste_segment~~
    • dna_pattern_to_regexpr
    • list_common_enzymes
    • random_dna_sequence
    • random_protein_sequence
    • ~~replace_segment~~
    • ~~reverse_segment~~
    • ~~swap_segments~~

    Thanks, Tim

    P.S. Some clarifying info that Davian asked for:

    Most functions will likely fit into the scope of either sequence.go or transformations.go and their associated test files. If you feel like a function doesn't fit within the scope of either, you can make a utils.go and utils_test.go in the project's main directory to include with your pull request.

    Most functions can be written as standalone string functions and then be wrapped by a method to use with poly's main sequence struct. That's as complex as integration with the main library will have to be in most cases.
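
    A minimal sketch of that pattern, using random_dna_sequence from the list as the example. The Sequence struct here is a hypothetical stand-in rather than poly's actual type, and the function and method names are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Sequence is a hypothetical stand-in for poly's main sequence struct.
type Sequence struct {
	Sequence string
}

// RandomDNASequence is the standalone string function: it returns a
// pseudo-random DNA string of the given length. The same seed always
// produces the same string, which keeps tests reproducible.
func RandomDNASequence(length int, seed int64) string {
	if length <= 0 {
		return ""
	}
	const bases = "ACGT"
	rng := rand.New(rand.NewSource(seed))
	out := make([]byte, length)
	for i := range out {
		out[i] = bases[rng.Intn(len(bases))]
	}
	return string(out)
}

// Randomize is the thin method wrapper that ties the standalone function
// to the struct.
func (s *Sequence) Randomize(length int, seed int64) {
	s.Sequence = RandomDNASequence(length, seed)
}

func main() {
	var seq Sequence
	seq.Randomize(12, 42)
	fmt.Println(seq.Sequence)
}
```

    Keeping the logic in the plain string function means it can be tested without constructing a struct, and the wrapper stays a one-liner.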

    ucb-students 
    opened by TimothyStiles 8
  • GbkFlatGz example fails to read feature properly.

    func ExampleReadGbkFlatGz() {
            sequences := ReadGbkFlatGz("data/flatGbk_test.seq.gz")
            //sequences := ReadGbkFlatGz("data/gbbct358.seq.gz")
            var locus []string
            for _, sequence := range sequences {
                    locus = append(locus, sequence.Meta.Locus.Name)
            }
            fmt.Println(strings.Join(locus, ", "))
            // Output: AB000100, AB000106
    }
    

    Parsing this genbank flat file in this example does not get translations of features out.

    opened by Koeng101 7
  • The Poly RBS calculator

    Salis lab has previously made a ribosomal binding site calculator, which can predict translation initiation rates from proteins.

    However, it is slow (requiring a queue on a website) and closed source. In order to incorporate RBS calculation data in more complex applications, we need better performance and velocity of development. The best advancements in technology should be incorporated in an open-source manner.

    Basic idea of RBS calculator

    The basic idea behind the RBS calculator (this is a simplification) is that you take the binding energy of the ribosomal 16S RNA to the mRNA's RBS site and subtract that from the binding energy of the mRNA to itself. There are a few other variables, but these are the basic ones (please check https://pubs.acs.org/doi/suppl/10.1021/acssynbio.0c00394/suppl_file/sb0c00394_si_001.pdf table 2 for equations).

    The mRNA is highly variable, so it must be calculated each time the simulation is run. The 16S RNA, on the other hand, is not very variable. There is approximately a power-law distribution of what organisms people use, so we can cache most of the 16S-RNA to RBS (which I will now call 16S-RBS) data in a lookup table.

    Software and numbers we need

    It is important to keep in mind we want this software to be fast. In order to get performance, there are 2 primary optimizations: 1 - using a faster algorithm for calculating RNA secondary structure (we use LinearFold, which folds RNA in linear time) and 2 - using a lookup table for slow RNA-to-RNA binding calculations.

    In order to calculate mRNA folding, @vivekr has ported LinearFold to Golang. This package needs to be incorporated into Poly before we build the RBS calculator.

    In order to calculate the 16S-RBS lookup table, we will likely need to operate outside of Golang (probably in python). LinearFold does not support (at this time) multiple separate RNAs binding to each other, so we'll have to do this work in a different algorithm. It will be a challenge to relate the two numbers from different software packages. Since the 16S RNA binding sequence is only 9 base pairs long, we theoretically only have to calculate its binding efficiency to 262,144 other RNAs.
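
    The 262,144 figure is 4^9, one entry for each possible 9-base RNA sequence. As a sketch of how such a table could be indexed (an assumption about layout, not a settled design), each 9-mer can be read as a base-4 number so that the whole table is a flat slice. The function names below are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

const bases = "ACGU"

// kmerFromIndex decodes index as a base-4 number into an RNA k-mer.
func kmerFromIndex(index, k int) string {
	out := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		out[i] = bases[index%4]
		index /= 4
	}
	return string(out)
}

// indexFromKmer is the inverse of kmerFromIndex. It returns -1 if the
// k-mer contains a character outside ACGU.
func indexFromKmer(kmer string) int {
	index := 0
	for i := 0; i < len(kmer); i++ {
		digit := strings.IndexByte(bases, kmer[i])
		if digit < 0 {
			return -1
		}
		index = index*4 + digit
	}
	return index
}

func main() {
	const k = 9
	total := 1 << (2 * k) // 4^9

	// One slot per 9-mer. The values would come from whatever tool
	// computes the 16S-RBS binding energies; here they stay zero.
	table := make([]float64, total)

	fmt.Println(len(table))                                     // 262144
	fmt.Println(kmerFromIndex(0, k), kmerFromIndex(total-1, k)) // AAAAAAAAA UUUUUUUUU
	fmt.Println(indexFromKmer("AAAAAAAAC"))                     // 1
}
```

    A flat slice of 262,144 float64 values is about 2 MB, so a table of this size can be held entirely in memory.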

    There are other parameters that assist in doing RBS calculations (such as ΔGstandb, from https://pubs.acs.org/doi/suppl/10.1021/acssynbio.0c00394/suppl_file/sb0c00394_si_001.pdf). We'll likely need to build those into the calculator at some point, but perhaps not in version 1.

    Testing

    After we get a functioning prototype of the RBS calculator, we can tune our model. One dataset from Salis Lab has 9862 sequences, and we can directly compare our calculator's outputs with the ones published by Salis Lab. We can also use empirical calculations from ~300,000 RBSs from "Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping". Using a couple of these data sets, we should be able to massage our RBS calculator to get to "good enough".

    While it probably won't do as well as machine learning models in organisms that have large datasets, we can give those models our calculator's output as a parameter, and hopefully improve their abilities by giving them that data.

    The goal is to make something that is useful to scientists and engineers. Our calculator can still be mildly wrong, so long as it is fundamentally useful to practitioners.

    In-vivo testing

    After we build the Poly RBS calculator, Sporenet Labs (aka Keoni Gandall, aka me) plans to test its efficiency in a real laboratory environment. As the group who builds the thing, we'll all decide together what experiments we should run. Ideally, we'll be using Bxb1-GFP in E.coli with an oligo pool or a degenerate primer library + some Nanopore sequencing.

    opened by Koeng101 7
  • Synthesis fixer

    This is the synthesis fixer. What I need to do:

    • Add good tests
    • Make overlap checker into a recursive function
    • Add good description, delete old comments of how it works

    Check it out @TimothyStiles

    opened by Koeng101 7
  • overhaul dev docs and workflow

    Is your feature request related to a problem? Please describe.

    Poly has grown beyond merging PRs into prime (main branch, renaming). It needs a new release and PR workflow.

    Describe the solution you'd like

    • [ ] Update the contributor's guide to describe the new workflow where PRs are first merged into dev before main
    • [ ] Update issue and PR templates to redirect support to discussions and tell devs not to branch and merge to main
    • [X] Create notification hooks for social media
    • [ ] Make go docs not give readers option to run IO/network related code
    • [ ] Make github action to release main and dev versions on push with auto-semver
    • [ ] auto-push new docs to pkg.go.dev on release
    opened by TimothyStiles 0
  • Default codon tables

    We should add default codon tables of the following organisms:

    1. Escherichia coli MG1655
    2. Bacillus subtilis 168
    3. Vibrio natriegens (whatever type strain)
    4. Saccharomyces cerevisiae BY4741/BY4742 (used for knockdown collections)
    5. Yarrowia lipolytica (apparently used quite a bit in production, use type strain)
    6. Pichia pastoris NRRL Y-7556
    7. Kluyveromyces marxianus NRRL Y-6860 (it zoom speed yeast)
    8. Homo sapiens (kinda yucky)
    opened by Koeng101 0
  • Fixing eisen bug in Golden Gate Cloning

    Run the tests enough and this happens:

    (Screenshot not preserved: Screen Shot 2022-02-24 at 7 32 14 AM)

    Looks like the cyclic and concurrent nature of our current GoldenGate implementation makes it hard to limit the number of cycles it loops through. @Koeng101 has some ideas to fix this.

    opened by TimothyStiles 0
  • Example Primer Development Workflow

    Getting the ball rolling on implementing an example primer workflow that satisfies the following in order to generate a large number of DNA primers:

    • [x] consistent GC content
    • [x] roughly the same melting temperature
    • [ ] don't dimerize with themselves or other primers in the set
    • [x] no shared subsequences >4bp with any other primer in the set
    • [ ] does not bind to any DNA sequences from a reference (e.g. FreeGenes library and the genomes of E.coli, B.subtilis, S.cerevisiae, and P.pastoris)
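
    As one possible way to check the shared-subsequence item in the list above, here is a minimal sketch that compares primers pairwise by their 5-mers (">4bp" is read here as 5 bp or longer). The function name and the example primers are made up, and the check only looks at the given strand, not reverse complements.

```go
package main

import "fmt"

// sharesSubsequence reports whether a and b have any common substring of
// length k. Any shared stretch longer than k also contains a shared
// k-mer, so checking exactly k is enough.
func sharesSubsequence(a, b string, k int) bool {
	if k <= 0 || len(a) < k || len(b) < k {
		return false
	}
	// Index every k-mer of a, then probe with every k-mer of b.
	seen := make(map[string]struct{}, len(a)-k+1)
	for i := 0; i+k <= len(a); i++ {
		seen[a[i:i+k]] = struct{}{}
	}
	for i := 0; i+k <= len(b); i++ {
		if _, ok := seen[b[i:i+k]]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative primers, not taken from any real primer set.
	primers := []string{
		"ATGACCATGATTACG",
		"GGCATGATTCCTAGA",
		"TTTTCCCCGGGGAAAA",
	}
	const k = 5
	for i := 0; i < len(primers); i++ {
		for j := i + 1; j < len(primers); j++ {
			if sharesSubsequence(primers[i], primers[j], k) {
				fmt.Printf("primers %d and %d share a subsequence of %d bp or more\n", i, j, k)
			}
		}
	}
}
```

    The pairwise loop is quadratic in the number of primers. For a large set, a single map from k-mer to primer index would probably be a better fit.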
    opened by codercahol 4
  • CRISPR gRNA Design

    I'd like to be able to design CRISPR guide RNAs with Poly. Below is a draft spec for a new CRISPR package:

    Spec

    Our guide RNA design package should be lightweight and easily extendable and at its core will likely need

    • [ ] A struct to hold information about the target and the CAS protein being used.
    • [ ] A substruct to hold information about the CAS protein (PAM, references, functionality (cut, tag))
    • [ ] Maybe a small embedded DB of common CAS variants and their properties.
    • [ ] A basic heuristic for determining the viability of a potential gRNA candidate. This should include:
      • [ ] Checking for off-target hits in the target organism
      • [ ] Checking that the gRNA itself can be synthesized
      • [ ] Checking that gRNA GC content is within the 40-80% range.
      • [ ] sgRNA should be between 17-24 nucleotides
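
    The last two checks in the list above are simple enough to sketch directly. This is only an illustration of the 17-24 nt and 40-80% GC rules from the draft spec. The function name checkGuide and the example guide sequences are made up, and the off-target and synthesis checks are not covered.

```go
package main

import "fmt"

// checkGuide returns a list of problems found with a candidate guide
// sequence. An empty list means it passed both checks.
func checkGuide(guide string) []string {
	var problems []string

	n := len(guide)
	if n < 17 || n > 24 {
		problems = append(problems, fmt.Sprintf("length %d nt is outside 17-24 nt", n))
	}

	gc := 0
	for i := 0; i < n; i++ {
		switch guide[i] {
		case 'G', 'C', 'g', 'c':
			gc++
		}
	}
	// Compare counts with integers to avoid floating-point edge cases
	// at exactly 40% or 80%.
	if gc*100 < 40*n || gc*100 > 80*n {
		problems = append(problems, fmt.Sprintf("GC count %d of %d nt is outside 40-80%%", gc, n))
	}
	return problems
}

func main() {
	// Illustrative sequences, not real guides.
	for _, guide := range []string{"GACGTTAGCCATGGCTAACG", "ATATATATAT"} {
		problems := checkGuide(guide)
		if len(problems) == 0 {
			fmt.Println(guide, "passes")
			continue
		}
		fmt.Println(guide, "fails:", problems)
	}
}
```

    Returning a list of problems instead of a single boolean fits the spec's goal of a set of utilities: callers can decide which failures matter for their use case.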

    What this is not

    • This likely won't ship as a single-function, end-to-end solution, but instead a core set of utilities, examples, and tutorials needed to design guide RNAs.

    Help required

    • [ ] A thorough review of common checks and constraints that will be needed in +95% of use cases
    • [ ] A thorough description of biological context for documentation. What's CRISPR? What's a guide RNA? What's the difference between single-guide RNAs and two-part guide RNAs? And so on.

    I'd really love to talk to some people who regularly make guide inserts for gRNA plasmids so if you or someone you know can hang out and talk please reach out!

    -Tim

    opened by TimothyStiles 9
Releases(v0.20.0)
Owner
Tim
A strange creature that creates software that creates strange creatures.