biogo is a bioinformatics library for Go

bíogo

Installation

    $ go get github.com/biogo/biogo/...

Overview

bíogo is a bioinformatics library for the Go language.

Getting help

Please direct requests for help and similar questions to the biogo-user Google Group.

https://groups.google.com/forum/#!forum/biogo-user

Contributing

If you find any bugs, feel free to file an issue on the GitHub issue tracker. Pull requests are welcome, though if they involve changes to the API or the addition of features, please first open a discussion at the biogo-dev Google Group.

https://groups.google.com/forum/#!forum/biogo-dev

Citing

If you use bíogo, please cite Kortschak, Snyder, Maragkakis and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.21105/joss.00167, and Kortschak and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.1101/005033.

The Purpose of bíogo

bíogo stems from the need to address the size and structure of modern genomic and metagenomic data sets. These properties impose requirements on the libraries and languages used for analysis:

  • speed: driven by the size of data sets
  • concurrency: problems are often embarrassingly parallel

In addition to the computational burden of massive data sets, modern genomics has an increasing need for complex pipelines to resolve questions in a tightening problem space. There is also a developing need for new algorithms that allow novel approaches to interesting questions. These issues call for simplicity in syntax to facilitate:

  • ease of coding
  • checking for correctness in development and particularly in peer review

Related to the second issue is the reluctance of some researchers to release code because of quality concerns.

The issue of code release is the first of the principles formalised in the Science Code Manifesto.

Code  All source code written specifically to process data for a published
      paper must be available to the reviewers and readers of the paper.

A language with a simple, yet expressive, syntax should facilitate development of higher quality code and thus help reduce this barrier to research code release.

Articles

bíogo: a simple high-performance bioinformatics toolkit for the Go language

Analysis of Illumina sequencing data using bíogo

Using and extending types in bíogo

Yet Another Bioinformatics Library

It seems that nearly every language has its own bioinformatics library, some of which are very mature, for example BioPerl and BioPython. Why add another one?

The different libraries excel in different areas:

  • acting as scripting glue for applications in a pipeline (much of [1, 2, 3])
  • interacting with external hosts [1, 2, 4, 5]
  • wrapping lower-level, high-performance languages with more user-friendly syntax [1, 2, 3, 4]
  • providing bioinformatics functions for high-performance languages [5, 6]

The intended niche for bíogo lies somewhere between the scripting libraries and the high-performance language libraries. It aims to be easy to use for both small and large projects, while having reasonable performance on computationally intensive tasks.

The intent is to reduce the level of investment required to develop new research software for computationally intensive tasks.

  1. BioPerl http://genome.cshlp.org/content/12/10/1611.full http://www.springerlink.com/content/pp72033m171568p2

  2. BioPython http://bioinformatics.oxfordjournals.org/content/25/11/1422

  3. BioRuby http://bioinformatics.oxfordjournals.org/content/26/20/2617

  4. PyCogent http://genomebiology.com/2007/8/8/R171

  5. BioJava http://bioinformatics.oxfordjournals.org/content/24/18/2096

  6. SeqAn http://www.biomedcentral.com/1471-2105/9/11

Library Structure and Coding Style

The bíogo library structure is influenced by the Go core library.

The coding style should be aligned with normal Go idioms as represented in the Go core libraries.

Quality Scores

Quality scores are supported for all sequence types, including protein. Both Phred and Solexa scores can be read from files. However, quality scores are represented internally as Phred scores, so converting Solexa scores loses some precision. A Solexa quality score type is provided for cases where this is a problem.

Copyright and License

Copyright ©2011-2013 The bíogo Authors except where otherwise noted. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

The bíogo logo is derived from Bitstream Charter, Copyright ©1989-1992 Bitstream Inc., Cambridge, MA.

BITSTREAM CHARTER is a registered trademark of Bitstream Inc.

Issues
  • feat: add functions to extract base coordinates of Feature

    Functions for feat package to extract the base coordinates for a feat.Feature.

    opened by mnsmar 35
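    The following is a minimal sketch of what such a function could do. It uses a hypothetical Feature struct, not the feat.Feature interface, whose Start and End are offsets within its location. It also assumes those offsets are never flipped by orientation. Under these assumptions, base coordinates are found by adding each ancestor's start while walking along the location chain to the root feature.

    ```go
    package main

    import "fmt"

    // Feature is a hypothetical, minimal stand-in for a located feature. Start
    // and End are offsets within Loc; Loc is nil for a base feature such as a
    // chromosome. It is not bíogo's feat.Feature interface.
    type Feature struct {
    	Name       string
    	Start, End int
    	Loc        *Feature
    }

    // BaseRange returns f's start and end in the coordinates of the base feature
    // at the end of its location chain, together with that base feature. It
    // assumes offsets are never flipped by orientation.
    func BaseRange(f *Feature) (start, end int, base *Feature) {
    	start, end = f.Start, f.End
    	if f.Loc == nil {
    		return start, end, f // f is itself a base feature
    	}
    	loc := f.Loc
    	for loc.Loc != nil {
    		// Shift from loc's coordinates into its parent's coordinates.
    		start += loc.Start
    		end += loc.Start
    		loc = loc.Loc
    	}
    	return start, end, loc
    }

    func main() {
    	chr := &Feature{Name: "chr1", Start: 0, End: 1000}
    	gene := &Feature{Name: "gene", Start: 100, End: 600, Loc: chr}
    	tx := &Feature{Name: "tx", Start: 50, End: 300, Loc: gene}
    	exon := &Feature{Name: "exon", Start: 10, End: 40, Loc: tx}

    	start, end, base := BaseRange(exon)
    	fmt.Printf("%s on %s: %d-%d\n", exon.Name, base.Name, start, end)
    }
    ```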
  • feat/gene: fix UTR extraction functions

    Related discussion in #40.

    opened by mnsmar 15
  • The orientation of a feature should not be defined relative to its location.

    Background: I tried to get the 5' UTR from a transcript. The UTR5start() and UTR5end() functions check the transcript orientation to decide where the 5' end is. This is wrong, as the example below shows. The transcript orientation is Forward because it is defined relative to a gene, and this results in the wrong identification of the 5' end (the 3' end is returned instead).

    Transcript:          3'<------------5'       Forward
    Gene:              <---------------------    Reverse
    Chrom:       -----------------------------------
    

    Suggestion: I thought about this and realized that the problem is not the function implementation. The problem is that the transcript does not carry the required information, and the only option is to look down the feature chain.

    If we think of an oriented feature as a vector, the notation for its coordinates would be [x1, x2], where the first number is the start of the vector and the second number is its end. For example:

      A: [10, 20]
      B: [20, 10]

    In this notation we can easily see that B is the reverse of A. In other words, the orientation is intrinsic to the notation. If the coordinate system changes, the actual numbers might change, but the first number would always be the start and the second number the end of the vector.

    Instead, in biogo we use a different notation:

      A: [10, 20], Forward
      B: [10, 20], Reverse

    Therefore, I think it is not correct to define the orientation as "// Orientation returns the orientation of the feature relative to its location." Instead, we should use "// Orientation returns the orientation of the feature." We should also make clear in the documentation what this actually means and that it does not depend on the feature location. This will probably require some code changes, but I think it is necessary.

    Any thoughts?

    opened by mnsmar 15
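    Under the current definition, a feature's orientation relative to the base (for example, the chromosome) can still be recovered by looking down the feature chain, as suggested above. The relative orientations multiply, so a Forward transcript on a Reverse gene is Reverse on the chromosome. Below is a minimal sketch that uses hypothetical types rather than the feat package's own.

    ```go
    package main

    import "fmt"

    // Orientation is a hypothetical stand-in for a strand value.
    type Orientation int8

    const (
    	Reverse     Orientation = -1
    	NotOriented Orientation = 0
    	Forward     Orientation = 1
    )

    // Feature is a hypothetical located feature; Orient is relative to Loc,
    // and Loc is nil for a base feature such as a chromosome.
    type Feature struct {
    	Name   string
    	Orient Orientation
    	Loc    *Feature
    }

    // BaseOrientation returns f's orientation relative to the base feature at
    // the end of its location chain, by multiplying the relative orientations
    // of every located feature on the way.
    func BaseOrientation(f *Feature) Orientation {
    	o := f.Orient
    	for loc := f.Loc; loc != nil && loc.Loc != nil; loc = loc.Loc {
    		o *= loc.Orient
    	}
    	return o
    }

    func main() {
    	chrom := &Feature{Name: "chrom"}
    	gene := &Feature{Name: "gene", Orient: Reverse, Loc: chrom}
    	tx := &Feature{Name: "transcript", Orient: Forward, Loc: gene}

    	// The transcript is Forward relative to the gene, but the gene is Reverse
    	// on the chromosome, so the transcript's 5' end lies at its higher
    	// chromosomal coordinate.
    	fmt.Println(tx.Orient, BaseOrientation(tx)) // prints "1 -1"
    }
    ```

    In this sketch, a NotOriented (0) link anywhere in the chain makes the result NotOriented, which seems the safest reading when the strand is unknown.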
  • Issues about filter BAM record

    Hi, first of all, thanks for all this amazing work. Recently I have been trying to filter some records out of a valid BAM file, but I can't handle the headers properly. The code I used is below. BTW, I also tried to create a new sam.Header, but I really can't figure out how to do it.

    f, err := os.Open(input_bam)
    if err != nil {
    	log.Fatalf("could not open file %q: %v", input_bam, err)
    }
    defer f.Close()
    ok, err := bgzf.HasEOF(f)
    if err != nil {
    	log.Fatalf("could not check EOF block of %q: %v", input_bam, err)
    }
    if !ok {
    	log.Printf("file %q has no bgzf magic block: may be truncated", input_bam)
    }

    b, err := bam.NewReader(f, threads)
    if err != nil {
    	log.Fatalf("could not read bam: %v", err)
    }
    defer b.Close()

    // Create or truncate the output file with ordinary file permissions.
    fo, err := os.OpenFile(output_bam, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
    if err != nil {
    	log.Fatalf("could not open file %q: %v", output_bam, err)
    }
    defer fo.Close()

    // I need a header that exactly matches the BAM records, so I try to filter out header Refs here.
    header := b.Header().Clone()

    removedRef := make([]*sam.Reference, 0)
    for _, ref := range header.Refs() {
    	if _, ok := rna[ref.Name()]; !ok {
    		removedRef = append(removedRef, ref)
    	}
    }
    for _, ref := range removedRef {
    	if err := header.RemoveReference(ref); err != nil {
    		log.Fatalf("remove header reference failed: %v", err)
    	}
    }

    w, err := bam.NewWriter(fo, header, threads)
    if err != nil {
    	log.Fatalf("could not create bam writer for %q: %v", output_bam, err)
    }
    defer w.Close()

    // And iterate over all BAM records.
    for {
    	rec, err := b.Read()
    	if err == io.EOF {
    		break
    	}
    	if err != nil {
    		log.Fatalf("error reading bam: %v", err)
    	}
    	// Match on the reference name, as in the header filter above.
    	if rec.Ref == nil {
    		continue // unmapped record with no reference
    	}
    	if _, ok := rna[rec.Ref.Name()]; ok {
    		if err := w.Write(rec); err != nil {
    			log.Fatal(err)
    		}
    	}
    }
    
    opened by ygidtu 11
  • code.google.com imports causing problems in go1.8

    not sure if this is specific to 1.8, but I see:

    ~/go/src/github.com/biogo/biogo/io/seqio$ go test
    ../../seq/annotation.go:8:2: cannot find package "code.google.com/p/biogo/alphabet" in any of:
    	/home/brentp/go/go1.8beta1/go/src/code.google.com/p/biogo/alphabet (from $GOROOT)
    	/home/brentp/go/src/code.google.com/p/biogo/alphabet (from $GOPATH)
    ../../seq/annotation.go:9:2: cannot find package "code.google.com/p/biogo/feat" in any of:
    	/home/brentp/go/go1.8beta1/go/src/code.google.com/p/biogo/feat (from $GOROOT)
    	/home/brentp/go/src/code.google.com/p/biogo/feat (from $GOPATH)
    ~/go/src/github.com/biogo/biogo/io/seqio$ go version
    go version go1.8beta1 linux/amd64
    

    you can also see this with:

    go get github.com/biogo/biogo/...
    
    opened by brentp 10
  • Add method to determine whether a file is bgzf or not

    I use many compressors (zip, gzip, pgzip, bgzf) and need to know which
    format an underlying file uses.
    For example, if I download a bgzf file I need to take a code path that
    can seek inside the file; for gzip/pgzip I need to do other things
    (like enabling more CPUs or not).
    Would it be possible to add such a method?
    

    Original issue reported on code.google.com by [email protected] on 15 Feb 2015 at 12:13

    Priority-Medium auto-migrated Type-Enhancement 
    opened by GoogleCodeExporter 9
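    One way to answer this without decompressing is to inspect the first block header. In the SAM/BAM format specification, a BGZF block is a gzip member that has the FEXTRA flag set and an extra subfield with identifiers 'B', 'C' and a 2-byte payload. The following standalone sketch checks for exactly that. It does not use bíogo's bgzf package, and IsBGZF is a hypothetical name. Because it consumes bytes from r, callers that want to reuse the stream should seek back to the start afterwards.

    ```go
    package main

    import (
    	"bytes"
    	"encoding/binary"
    	"fmt"
    	"io"
    )

    // IsBGZF reports whether r starts with a BGZF block header: a gzip member
    // header (magic 31, 139; method 8) with the FEXTRA flag set and an extra
    // subfield identified as 'B', 'C' with a 2-byte payload.
    func IsBGZF(r io.Reader) (bool, error) {
    	var hdr [12]byte // ID1 ID2 CM FLG MTIME(4) XFL OS XLEN(2)
    	if _, err := io.ReadFull(r, hdr[:]); err != nil {
    		if err == io.EOF || err == io.ErrUnexpectedEOF {
    			return false, nil // too short to be BGZF
    		}
    		return false, err
    	}
    	if hdr[0] != 31 || hdr[1] != 139 || hdr[2] != 8 || hdr[3]&4 == 0 {
    		return false, nil // not gzip, or no extra field
    	}
    	extra := make([]byte, binary.LittleEndian.Uint16(hdr[10:12]))
    	if _, err := io.ReadFull(r, extra); err != nil {
    		if err == io.EOF || err == io.ErrUnexpectedEOF {
    			return false, nil
    		}
    		return false, err
    	}
    	// Each subfield is SI1, SI2, SLEN (little-endian uint16), then SLEN bytes.
    	for len(extra) >= 4 {
    		slen := int(binary.LittleEndian.Uint16(extra[2:4]))
    		if 4+slen > len(extra) {
    			break // malformed subfield
    		}
    		if extra[0] == 'B' && extra[1] == 'C' && slen == 2 {
    			return true, nil
    		}
    		extra = extra[4+slen:]
    	}
    	return false, nil
    }

    func main() {
    	// The 28-byte empty BGZF block that terminates BGZF files.
    	eof := []byte{
    		0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    		0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    		0x00, 0x00, 0x00, 0x00,
    	}
    	ok, err := IsBGZF(bytes.NewReader(eof))
    	fmt.Println(ok, err) // prints "true <nil>"
    }
    ```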
  • biogo.bam fails to iterate over valid BAM file

    Please check that you are using the latest version of bíogo: execute `git
    describe --always' in your bíogo repository and check that it matches the
    latest master at http://code.google.com/p/biogo/source/browse/
    
    What steps will reproduce the problem? (If possible please include a
    program that is a minimal self-contained reproducing case).
    1. Tried to run play.go over play.bam, iterating through each Record and 
    printing its QNAME
    
    
    What is the expected output?
    
    A listing of all QNAME in the BAM file in stdout
    
    What do you see instead?
    
    After printing three QNAME, got an error saying "truncated sequence"
    
    
    What version of the product are you using (`git describe --always')? Which
    version of Go is being used (`go version')? On what operating system?
    
    Using biogo.bam @ 53b55fc, Go version 1.0.3
    
    Please provide any additional information below.
    
    BAM file seems to be valid; samtools view can display it without any problems.
    

    Original issue reported on code.google.com by [email protected] on 8 May 2013 at 8:38

    Attachments:

    Type-Defect Priority-Medium auto-migrated 
    opened by GoogleCodeExporter 7
  • Community

    Please take a look.

    opened by kortschak 6
  • align: improve documentation for accessing alignment scores

    In the structure holding the pairwise alignment results in biogo/align/align.go, the alignment score is not exported, and the featPair type is private to the package:

        type featPair struct {
        	a, b  feature
        	score int
        }

    As the alignment methods return []feat.Pair, there is no way to access the alignment score as far as I can see. Is that right? If so, it would be nice to have a way to do that (other than parsing the string representation of the featPair objects).

    opened by bsipos 6
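    A common Go idiom for this situation is to keep the concrete type unexported, give it an exported Score method, and document an interface that callers can type-assert against. The sketch below mirrors the issue's featPair. However, Pair (standing in for feat.Pair), Scorer and TotalScore are hypothetical names, not the align package's API.

    ```go
    package main

    import "fmt"

    // Pair is a hypothetical stand-in for feat.Pair: callers only see this interface.
    type Pair interface {
    	Features() [2]string
    }

    // Scorer is a hypothetical interface a package could document so that
    // callers can recover a score from values whose concrete type is unexported.
    type Scorer interface {
    	Score() int
    }

    // featPair is unexported, as in the issue, but its Score method is
    // exported, so values of this type satisfy Scorer outside the package too.
    type featPair struct {
    	a, b  string
    	score int
    }

    func (fp featPair) Features() [2]string { return [2]string{fp.a, fp.b} }
    func (fp featPair) Score() int          { return fp.score }

    // TotalScore sums the scores of the pairs that expose one.
    func TotalScore(pairs []Pair) int {
    	total := 0
    	for _, p := range pairs {
    		if s, ok := p.(Scorer); ok {
    			total += s.Score()
    		}
    	}
    	return total
    }

    func main() {
    	aln := []Pair{featPair{"a[0:10]", "b[0:10]", 8}, featPair{"a[10:12]", "-", -2}}
    	fmt.Println(TotalScore(aln)) // prints "6"
    }
    ```

    Because Go interfaces are satisfied implicitly, this approach leaves the []feat.Pair return type of the alignment methods unchanged.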
  • example of getting the consensus of multiple sequences?

    I have many similar sequences of different lengths and need to get their consensus. From the docs, the seq package should work for this, but is there an example?

    opened by dongweigogo 6
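    Sequences of different lengths first need to be aligned into a multiple sequence alignment, with '-' for gaps. A consensus can then be taken column by column. The standalone sketch below uses simple majority rule and does not use the seq package's API. Two choices are assumptions of this sketch: columns where a gap is the majority are dropped, and ties go to the lowest byte value, so a tie between '-' and a base goes to '-'.

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    )

    // Consensus returns a majority-rule consensus of aligned sequences. All rows
    // must be the same length; '-' marks a gap. Columns whose most common symbol
    // is a gap are dropped, and ties go to the symbol with the lowest byte value.
    func Consensus(rows []string) (string, error) {
    	if len(rows) == 0 {
    		return "", nil
    	}
    	n := len(rows[0])
    	for _, r := range rows {
    		if len(r) != n {
    			return "", fmt.Errorf("rows differ in length: %d != %d", len(r), n)
    		}
    	}
    	var b strings.Builder
    	for col := 0; col < n; col++ {
    		var counts [256]int
    		for _, r := range rows {
    			counts[r[col]]++
    		}
    		best := byte(0)
    		for c := 0; c < 256; c++ {
    			if counts[c] > counts[best] {
    				best = byte(c)
    			}
    		}
    		if best != '-' {
    			b.WriteByte(best)
    		}
    	}
    	return b.String(), nil
    }

    func main() {
    	aligned := []string{
    		"ACGT-A",
    		"ACGTTA",
    		"AGGT-A",
    	}
    	cons, err := Consensus(aligned)
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println(cons) // prints "ACGTA"
    }
    ```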
  • Do you plan to support GFF3?

    Given that GFF2 is deprecated, I was wondering if there is a plan to support GFF3?

    opened by srynobio 1
  • Using custom-defined errors

    Currently, errors are returned as a plain error. For example, fasta.Reader can return an I/O error, a badly formed line, or a badly formed header. The error string is sufficient for a human to recognize where the error occurred. However, handling errors programmatically requires matching on the error string, since all errors have the same type.

    The Go blog post "Error handling and Go" gives some examples of custom-defined errors: a struct that satisfies the error interface is defined, and additional details about the error can be included in the struct. Those details can also be included in the error string through a custom T.Error() method. When handling the error, a type assertion can be used to find out what went wrong.

    Is it possible to introduce custom-defined errors in a future release of biogo?

    opened by mys721tx 1
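    Below is a minimal sketch of the pattern described above. It uses a hypothetical ParseError type and a toy header checker rather than any bíogo reader. Besides a plain type assertion, Go 1.13 and later provide errors.As, which also finds the typed error inside wrapped errors. An Unwrap method additionally lets errors.Is match the underlying cause.

    ```go
    package main

    import (
    	"bufio"
    	"errors"
    	"fmt"
    	"io"
    	"strings"
    )

    // ParseError is a hypothetical error type a FASTA reader could return for
    // malformed input; it is not part of bíogo's API.
    type ParseError struct {
    	Line int    // 1-based line number of the offending input
    	Kind string // what was malformed, e.g. "header"
    	Err  error  // underlying cause
    }

    func (e *ParseError) Error() string {
    	return fmt.Sprintf("fasta: line %d: bad %s: %v", e.Line, e.Kind, e.Err)
    }

    // Unwrap lets errors.Is and errors.As see the underlying cause.
    func (e *ParseError) Unwrap() error { return e.Err }

    var errEmptyHeader = errors.New("empty header")

    // CheckHeaders is a toy validator: it reports the first '>' line with no name.
    func CheckHeaders(r io.Reader) error {
    	sc := bufio.NewScanner(r)
    	for line := 1; sc.Scan(); line++ {
    		if t := sc.Text(); strings.HasPrefix(t, ">") && strings.TrimSpace(t[1:]) == "" {
    			return &ParseError{Line: line, Kind: "header", Err: errEmptyHeader}
    		}
    	}
    	return sc.Err() // I/O errors pass through unchanged
    }

    func main() {
    	err := CheckHeaders(strings.NewReader(">seq1\nACGT\n>\nTTGA\n"))
    	var pe *ParseError
    	if errors.As(err, &pe) {
    		fmt.Printf("malformed %s at line %d\n", pe.Kind, pe.Line) // prints "malformed header at line 3"
    	}
    }
    ```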
  • Add Genbank Parsing

    GenBank parsing is available in BioPython but it doesn't appear to be present in biogo: the only mention of GenBank I can find is in gff.go.

    opened by thomas-bio 3
  • align: a better way to do formatting exists

    The align.feature type contains loc field (as it should). It would have been sensible (was probably my intention) to point the location of each alignment segment at the input sequence value. This can still be done, though not in a nice way because of the types I went with when the API was 'designed'.

    unfortunate 
    opened by kortschak 0
  • Nice, but not friendly

    Hi guys,

    I've looked at a few files and it looks really nice. The problem is that your README does not say what the project is about. Non-academic bioinformaticians won't click on your paper links, and people bounce because they do not see any nice example or anything that would spark their interest. Please add at least some basic examples that show people what it is about (and maybe a feature list?!).

    Thank you guys. Keep up the good work; we need better languages in bioinformatics than C++ and Perl.

    opened by mariokostelac 7
  • README

    Add back links to biogo-user and biogo-dev.

    opened by kortschak 0
Owner
bíogo
bíogo is a bioinformatics library collection for Go
ghw - Golang HardWare discovery/inspection library

ghw - Golang HardWare discovery/inspection library ghw is a small Golang library providing hardware inspection and discovery for Linux and Windows.

Jay Pipes 1.1k Oct 22, 2021
Go implementation of the XDG Base Directory Specification and XDG user directories

xdg Provides an implementation of the XDG Base Directory Specification. The specification defines a set of standard paths for storing application file

Adrian-George Bostan 136 Oct 12, 2021
Go bindings for unarr (decompression library for RAR, TAR, ZIP and 7z archives)

go-unarr Golang bindings for the unarr library from sumatrapdf. unarr is a decompression library and CLI for RAR, TAR, ZIP and 7z archives. GoDoc See

Milan Nikolic 150 Oct 14, 2021
Easily create & extract archives, and compress & decompress files of various formats

archiver Introducing Archiver 3.1 - a cross-platform, multi-format archive utility and Go library. A powerful and flexible library meets an elegant CL

Matt Holt 3.3k Oct 15, 2021
Prometheus instrumentation library for Go applications

Prometheus Go client library This is the Go client library for Prometheus. It has two separate parts, one for instrumenting application code, and one

Prometheus 3.4k Oct 18, 2021
Elegant generics for Go

genny - Generics for Go Install: go get github.com/cheekybits/genny ===== (pron. Jenny) by Mat Ryer (@matryer) and Tyler Bunnell (@TylerJBunnell). Un

null 1.6k Oct 17, 2021
Go package providing tools for working with Library of Congress data.

go-libraryofcongress Go package providing tools for working with Library of Congress data. Documentation Tools $> make cli go build -mod vendor -o bin

San Francisco International Airport Museum 1 Oct 14, 2021
Emojis for Go 😄🐢🚀

turtle Emojis for Go Reference Follow this link to view the reference documentation: GoDoc Reference Installation Library To install the t

Raphael Pierzina 124 Oct 4, 2021
A library for parallel programming in Go

pargo A library for parallel programming in Go Package pargo provides functions and data structures for expressing parallel algorithms. While Go is pr

null 169 Sep 25, 2021
Library to work with MimeHeaders and another mime types. Library support wildcards and parameters.

Mime header Motivation This library created to help people to parse media type data, like headers, and store and match it. The main features of the li

Anton Ohorodnyk 25 Aug 24, 2021
Super short, fully unique, non-sequential and URL friendly Ids

Generator of unique non-sequential short Ids The package shortid enables the generation of short, fully unique, non-sequential and by default URL frien

teris.io 701 Oct 10, 2021
safe and easy casting from one type to another in Go

cast Easy and safe casting from one type to another in Go Don’t Panic! ... Cast What is Cast? Cast is a library to convert between different go types

Steve Francia 1.8k Oct 22, 2021
Flow-based and dataflow programming library for Go (golang)

GoFlow - Dataflow and Flow-based programming library for Go (golang) Status of this branch (WIP) Warning: you are currently on v1 branch of GoFlow. v1

Vladimir Sibirov 1.3k Oct 16, 2021
wkhtmltopdf Go bindings and high level interface for HTML to PDF conversion

wkhtmltopdf Go bindings and high level interface for HTML to PDF conversion. Implements wkhtmltopdf Go bindings. It can be used to convert HTML docume

Adrian-George Bostan 72 Oct 21, 2021
Yubigo is a Yubikey client API library that provides an easy way to integrate the Yubico Yubikey into your existing Go-based user authentication infrastructure.

yubigo Yubigo is a Yubikey client API library that provides an easy way to integrate the Yubikey into any Go application. Installation Installation is

Geert-Johan Riemer 117 Oct 4, 2021
Idiomatic Event Sourcing in Go

Event Sourcing for Go Idiomatic library to help you build Event Sourced application in Go. Please note The library is currently under development and

eventually 70 Oct 15, 2021
💥 Fusion is a tiny stream processing library written in Go.

Fusion Fusion is a tiny stream processing library written in Go. See reactor for a stream processing tool built using fusion. Features Simple & lig

Shivaprasad Bhat 17 Jun 30, 2021
Go library for structured parallelism

Go library for structured concurrency Structured concurrency helps reasoning about the behaviour of parallel programs. parallel implements structured

Ridge 8 Oct 14, 2021
Our library to use the idealo interfaces in go.

Here you can find our library for idealo. We develop the API endpoints according to our demand and need. You are welcome to help us to further develop this library.

J&J Ideenschmiede GmbH 1 Oct 23, 2021