biogo is a bioinformatics library for Go

bíogo

Installation

    $ go get github.com/biogo/biogo/...

Overview

bíogo is a bioinformatics library for the Go language.

Getting help

Requests for help and similar questions should preferably be posted to the biogo-user Google Group.

https://groups.google.com/forum/#!forum/biogo-user

Contributing

If you find any bugs, feel free to file an issue on the GitHub issue tracker. Pull requests are welcome, though if they involve changes to the API or the addition of features, please first open a discussion on the biogo-dev Google Group.

https://groups.google.com/forum/#!forum/biogo-dev

Citing

If you use bíogo, please cite Kortschak, Snyder, Maragkakis and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.21105/joss.00167, and Kortschak and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.1101/005033.

The Purpose of bíogo

bíogo stems from the need to address the size and structure of modern genomic and metagenomic data sets. These properties impose requirements on the libraries and languages used for analysis:

  • speed - size of data sets
  • concurrency - problems often embarrassingly parallelisable

In addition to the computational burden of the massive data sets in modern genomics, there is an increasing need for complex pipelines to resolve questions in a tightening problem space, and a growing need to develop new algorithms that allow novel approaches to interesting questions. These issues call for simplicity of syntax to facilitate:

  • ease of coding
  • checking for correctness in development and particularly in peer review

Related to the second issue is the reluctance of some researchers to release code because of quality concerns.

The issue of code release is the first of the principles formalised in the Science Code Manifesto.

Code  All source code written specifically to process data for a published
      paper must be available to the reviewers and readers of the paper.

A language with a simple, yet expressive, syntax should facilitate development of higher quality code and thus help reduce this barrier to research code release.

Articles

bíogo: a simple high-performance bioinformatics toolkit for the Go language

Analysis of Illumina sequencing data using bíogo

Using and extending types in bíogo

Yet Another Bioinformatics Library

It seems that nearly every language has its own bioinformatics library, some of which are very mature, for example BioPerl and BioPython. Why add another one?

The different libraries excel in different fields: acting as scripting glue for applications in a pipeline (much of [1, 2, 3]), interacting with external hosts [1, 2, 4, 5], wrapping lower-level high-performance languages with more user-friendly syntax [1, 2, 3, 4], or providing bioinformatics functions for high-performance languages [5, 6].

The intended niche for bíogo lies somewhere between the scripting libraries and the high-performance language libraries: it aims to be easy to use for both small and large projects while having reasonable performance on computationally intensive tasks.

The intent is to reduce the level of investment required to develop new research software for computationally intensive tasks.

  1. BioPerl http://genome.cshlp.org/content/12/10/1611.full http://www.springerlink.com/content/pp72033m171568p2

  2. BioPython http://bioinformatics.oxfordjournals.org/content/25/11/1422

  3. BioRuby http://bioinformatics.oxfordjournals.org/content/26/20/2617

  4. PyCogent http://genomebiology.com/2007/8/8/R171

  5. BioJava http://bioinformatics.oxfordjournals.org/content/24/18/2096

  6. SeqAn http://www.biomedcentral.com/1471-2105/9/11

Library Structure and Coding Style

The bíogo library structure is influenced by the Go core library.

The coding style should be aligned with normal Go idioms as represented in the Go core libraries.

Quality Scores

Quality scores are supported for all sequence types, including protein. Phred and Solexa scores can be read from files; however, quality scores are represented internally as Phred scores, so converting from Solexa loses precision. A Solexa quality score type is provided for cases where this is a problem.

Copyright and License

Copyright ©2011-2013 The bíogo Authors except where otherwise noted. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

The bíogo logo is derived from Bitstream Charter, Copyright ©1989-1992 Bitstream Inc., Cambridge, MA.

BITSTREAM CHARTER is a registered trademark of Bitstream Inc.

Issues
  • The orientation of a feature should not be defined relative to its location.

    Background: I tried to get the 5'UTR from a transcript. The UTR5start() and UTR5end() functions check the transcript orientation to decide where the 5' end is. This is wrong, as the example below shows. The transcript orientation is Forward because it is defined on a gene, and this results in the wrong identification of the 5' end (the 3' end is returned instead).

    Transcript:          3'<------------5'       Forward
    Gene:              <---------------------    Reverse
    Chrom:       -----------------------------------
    

    Suggestion: I thought about this and realized that the problem is not the function implementation. The problem is that the transcript does not carry the required information, and the only option is to look down the feature chain.

    If we think of the oriented feature as a vector, the notation for its coordinates would be [x1, x2], where the first number is the start of the vector and the second number is the end. Examples: A: [10, 20] and B: [20, 10]. In this notation we can easily see that B is the reverse of A; in other words, the orientation is carried by the notation itself. If the coordinate system changes, the actual numbers might change, but the first number would always be the start and the second number the end of the vector.

    Instead, in biogo we use a different notation: A: [10, 20], Forward and B: [10, 20], Reverse.

    Therefore, I think it is not correct to define the orientation as "// Orientation returns the orientation of the feature relative to its location.". Instead we should use "// Orientation returns the orientation of the feature." and make clear in the documentation what this actually means and that it does not depend on the feature location. This will probably result in some code changes, but I think it is required.

    Any thoughts?

    opened by mnsmar 15
  • Issues about filter BAM record

    Hi, first of all, thanks for all this amazing work. Recently I have been trying to filter some records out of a valid BAM file, but I can't handle the headers properly. The code I used is as follows. BTW, I also tried to create a new sam.Header, but I really can't figure out how to do it.

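    package main
    
    import (
    	"io"
    	"log"
    	"os"
    
    	"github.com/biogo/hts/bam"
    	"github.com/biogo/hts/bgzf"
    	"github.com/biogo/hts/sam"
    )
    
    // filterBAM copies the records of inputBAM that are aligned to a reference
    // named in rna into outputBAM, dropping unused references from the header.
    func filterBAM(inputBAM, outputBAM string, rna map[string]bool, threads int) {
    	f, err := os.Open(inputBAM)
    	if err != nil {
    		log.Fatalf("could not open file %q: %v", inputBAM, err)
    	}
    	defer f.Close()
    	ok, err := bgzf.HasEOF(f)
    	if err != nil {
    		log.Fatalf("could not check bgzf EOF block of %q: %v", inputBAM, err)
    	}
    	if !ok {
    		log.Printf("file %q has no bgzf magic block: may be truncated", inputBAM)
    	}
    
    	b, err := bam.NewReader(f, threads)
    	if err != nil {
    		log.Fatalf("could not read bam: %v", err)
    	}
    	defer b.Close()
    
    	// os.Create truncates an existing output file instead of appending to it.
    	fo, err := os.Create(outputBAM)
    	if err != nil {
    		log.Fatalf("could not create file %q: %v", outputBAM, err)
    	}
    	defer fo.Close()
    
    	// The header must match the BAM records, so clone it and drop the
    	// references that will have no records.
    	header := b.Header().Clone()
    	var removed []*sam.Reference
    	for _, ref := range header.Refs() {
    		if _, ok := rna[ref.Name()]; !ok {
    			removed = append(removed, ref)
    		}
    	}
    	for _, ref := range removed {
    		if err := header.RemoveReference(ref); err != nil {
    			log.Fatalf("remove header reference failed: %v", err)
    		}
    	}
    
    	// Records read from b still point at b's header. Assumption: they must
    	// point at the references owned by the new header so that the written
    	// reference IDs match the filtered header.
    	kept := make(map[string]*sam.Reference)
    	for _, ref := range header.Refs() {
    		kept[ref.Name()] = ref
    	}
    
    	w, err := bam.NewWriter(fo, header, threads)
    	if err != nil {
    		log.Fatalf("could not create bam writer for %q: %v", outputBAM, err)
    	}
    
    	// Iterate over all BAM records.
    	for {
    		rec, err := b.Read()
    		if err == io.EOF {
    			break
    		}
    		if err != nil {
    			log.Fatalf("error reading bam: %v", err)
    		}
    		if rec.Ref == nil {
    			continue // unmapped record without a reference
    		}
    		// Compare reference names via Name(); Reference.String() formats
    		// an @SQ header line rather than returning the bare name.
    		ref, ok := kept[rec.Ref.Name()]
    		if !ok {
    			continue
    		}
    		rec.Ref = ref
    		if rec.MateRef != nil {
    			// nil if the mate's reference was removed from the header.
    			rec.MateRef = kept[rec.MateRef.Name()]
    		}
    		if err := w.Write(rec); err != nil {
    			log.Fatalf("error writing bam: %v", err)
    		}
    	}
    	// Close the writer explicitly so buffered blocks are flushed and any
    	// error is reported.
    	if err := w.Close(); err != nil {
    		log.Fatalf("error closing bam writer: %v", err)
    	}
    }
    
    func main() {
    	if len(os.Args) < 4 {
    		log.Fatal("usage: filterbam in.bam out.bam refname...")
    	}
    	rna := make(map[string]bool)
    	for _, name := range os.Args[3:] {
    		rna[name] = true
    	}
    	filterBAM(os.Args[1], os.Args[2], rna, 1)
    }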
    
    opened by ygidtu 11
  • code.google.com imports causing problems in go1.8

    not sure if this is specific to 1.8, but I see:

    [email protected]:~/go/src/github.com/biogo/biogo/io/seqio$ go test
    ../../seq/annotation.go:8:2: cannot find package "code.google.com/p/biogo/alphabet" in any of:
    	/home/brentp/go/go1.8beta1/go/src/code.google.com/p/biogo/alphabet (from $GOROOT)
    	/home/brentp/go/src/code.google.com/p/biogo/alphabet (from $GOPATH)
    ../../seq/annotation.go:9:2: cannot find package "code.google.com/p/biogo/feat" in any of:
    	/home/brentp/go/go1.8beta1/go/src/code.google.com/p/biogo/feat (from $GOROOT)
    	/home/brentp/go/src/code.google.com/p/biogo/feat (from $GOPATH)
    [email protected]:~/go/src/github.com/biogo/biogo/io/seqio$ go version
    go version go1.8beta1 linux/amd64
    

    you can also see this with:

    go get github.com/biogo/biogo/...
    
    opened by brentp 10
  • Add a method to determine whether a file is BGZF

    I use many compressors (zip, gzip, pgzip, bgzf) and need to determine which
    format the underlying file uses.
    For example, if I download a BGZF file I need to enter a code path that is able
    to seek inside the file; in the case of gzip/pgzip I need to switch to other
    things (like enabling more CPUs or not).
    Would it be possible to add such a method?
    

    Original issue reported on code.google.com by [email protected] on 15 Feb 2015 at 12:13

    Priority-Medium auto-migrated Type-Enhancement 
    opened by GoogleCodeExporter 9
  • biogo.bam fails to iterate over valid BAM file

    Please check that you are using the latest version of bíogo: execute `git
    describe --always' in your bíogo repository and check that it matches the
    latest master at http://code.google.com/p/biogo/source/browse/
    
    What steps will reproduce the problem? (If possible please include a
    program that is a minimal self-contained reproducing case).
    1. Tried to run play.go over play.bam, iterating through each Record and 
    printing its QNAME
    
    
    What is the expected output?
    
    A listing of all QNAME in the BAM file in stdout
    
    What do you see instead?
    
    After printing three QNAME, got an error saying "truncated sequence"
    
    
    What version of the product are you using (`git describe --always')? Which
    version of Go is being used (`go version')? On what operating system?
    
    Using biogo.bam @ 53b55fc, Go version 1.0.3
    
    Please provide any additional information below.
    
    BAM file seems to be valid; samtools view can display it without any problems.
    

    Original issue reported on code.google.com by [email protected] on 8 May 2013 at 8:38

    Type-Defect Priority-Medium auto-migrated 
    opened by GoogleCodeExporter 7
  • align: improve documentation for accessing alignment scores

    In the structure holding the pairwise alignment results in biogo/align/align.go, the alignment score is not exported and the featPair type is private to the package:

    type featPair struct {
    	a, b  feature
    	score int
    }

    As the alignment methods return []feat.Pair, there is no way to access the alignment score as far as I can see. Is that right? If so, it would be nice to have a way to do that (other than parsing the string representation of the featPair objects).

    opened by bsipos 6
  • Testsuite is failing

    Hi @kortschak

    Thanks a lot for your work on biogo. I maintain this as a package in Debian, and during a recent re-build, tests have started to fail for biogo, in particular these:

    === RUN   Test
    
    ----------------------------------------------------------------------
    FAIL: errors_test.go:44: S.TestCaller
    
    errors_test.go:49:
        c.Check(ln, check.Equals, 45)
    ... obtained int = 46
    ... expected int = 45
    
    === RUN   Test
    
    ----------------------------------------------------------------------
    FAIL: fai_test.go:27: S.TestReadFrom
    
    fai_test.go:178:
        c.Assert(err, check.DeepEquals, t.err)
    ... obtained *csv.ParseError = &csv.ParseError{StartLine:8, Line:8, Column:1, Err:(*errors.errorString)(0xc0000563f0)} ("record on line 8: wrong number of fields")
    ... expected *csv.ParseError = &csv.ParseError{StartLine:8, Line:8, Column:0, Err:(*errors.errorString)(0xc0000563f0)} ("record on line 8: wrong number of fields")
    ... Difference:
    ...     Column: 1 != 0
    
    
    OOPS: 0 passed, 1 FAILED
    --- FAIL: Test (0.00s)
    FAIL
    FAIL	github.com/biogo/biogo/io/seqio/fai	0.023s
    

    The full log can be found here. There is some difference in the column values that seems to trigger this, but I cannot do much beyond this point. Please consider fixing this, and thanks again!

    opened by nileshpatra 5
  • End coordinate in biogo.bam is off by one

    In record.go (biogo.bam), the End() function is supposed to return the end
    coordinate, but it's off by one.
    
    The first nucleotide is counted twice: by r.Pos and the match.
    
    Solution:
    end := r.Pos
    should be replaced by
    end := r.Pos - 1
    
    

    Original issue reported on code.google.com by [email protected] on 13 Aug 2014 at 11:31

    Type-Defect Priority-Medium auto-migrated 
    opened by GoogleCodeExporter 5
  • New Overlap method for Record

    Record is missing a method to compute the overlap between the record and user 
    range.
    
    Could you please add it to biogo.bam?
    
    If you could write tests that would be nice. Thanks
    
    Also min and max might already be somewhere in biogo.
    
    func min(a, b int) int {
        if a > b {
            return b
        }
        return a
    }
    
    func max(a, b int) int {
        if a < b {
            return b
        }
        return a
    }
    
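    // Overlap returns the length of the overlap between the alignment of the read
    // and the interval specified by start and end on the reference sequence.
    // consume is assumed to be biogo.bam's table of which CIGAR operations
    // consume query and reference bases.
    func (r *Record) Overlap(start int, end int) int {
        var overlap int
        pos := r.Pos
        for _, co := range r.Cigar {
            t := co.Type()
            l := co.Len()
            // Only aligned bases (consuming both query and reference) count.
            if consume[t].query && consume[t].ref {
                if o := min(pos+l, end) - max(pos, start); o > 0 {
                    overlap += o
                }
            }
            // Only reference-consuming operations advance along the reference;
            // insertions and soft clips consume query bases only.
            if consume[t].ref {
                pos += l
            }
        }
        return overlap
    }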
    

    Original issue reported on code.google.com by [email protected] on 14 Aug 2014 at 1:45

    Priority-Medium auto-migrated Type-Enhancement 
    opened by GoogleCodeExporter 4
  • How to create a new QSeq using an old one?

    Hello, I am new to Go so apologies for the newbie question. I'm benchmarking Go vs Python to do a simple string manipulation in the ID section of a FASTQ, to turn 2:N:0:0|SEQORIENT=F|PRIMER=RT_IgM_long_12N|BARCODE=TCGGAAAT,ACGGCAGA into 2:N:0:0_SEQORIENT=F_PRIMER=RT_IgM_long_12N_BARCODE:TCGGAAAT (for read 1, the other side of the comma for read 2). Here is my full program, including a test dataset. It can be run with:

    ./format_barcodes BX-R1_primers-pass_pair-pass_first3.fastq 1 test_output.fastq
    

    If seq, err = reader.Read(), then I thought this would work:

    seq_replaced := linear.NewQSeq(new_id, seq.Seq, seq.Alphabet(), seq.Encode)
    

    I'm getting the error that there is no field or function Seq or Encode:

    format_barcodes/main.go:77:44: seq.Seq undefined (type seq.Sequence has no field or method Seq)
    format_barcodes/main.go:77:69: seq.Encode undefined (type seq.Sequence has no field or method Encode)
    

    But I see those fields in the debugger of GoLand:

    [Screenshot of the GoLand debugger]

    How can I access these fields? I'm getting very confused between what is a seq.Sequence vs a linear.QSeq

    Thank you! Warmest, Olga

    opened by olgabot 3
  • align: add semi-global alignment

    I recently wrote a NW implementation with penalty-free end gaps for semi-global alignment, before I discovered biogo. Any interest in accepting such a thing, if contributed? (I just want to check in before putting in much effort.)

    opened by josharian 3
  • example of getting the consensus of multiple sequences?

    I have many similar sequences of different lengths and need to get their consensus. From the docs, the seq package should work for this, but is there an example?

    opened by dongweigogo 6
  • Using custom-defined errors

    Currently, errors are returned as plain error values. For example, fasta.Reader can return an I/O error, a badly formed line, or a badly formed header. The error string is sufficient for a human to recognize where the error occurred. However, handling the error programmatically requires matching on the error string, since all errors have the same type.

    Error handling and Go gives some examples of custom-defined errors: a struct that satisfies the error interface is defined, and additional details about the error can be included in the struct. Those details can also be included in the error string through a custom Error() method. When handling the error, a type assertion can be used to figure out what went wrong.

    Is it possible to introduce custom-defined error in a future release of biogo?

    opened by mys721tx 1
  • align: a better way to do formatting exists

    The align.feature type contains a loc field (as it should). It would have been sensible (and was probably my intention) to point the location of each alignment segment at the input sequence value. This can still be done, though not nicely, because of the types I went with when the API was 'designed'.

    unfortunate 
    opened by kortschak 0
  • Nice, but not friendly

    Hi guys,

    I've looked at a few files and it looks really nice. The problem is that your README does not say what the library is about. Non-academic bioinformaticians won't click on the links to your papers, and people bounce because they do not see a nice example or anything that would spark their interest. Please add at least some basic examples that show people what it is about (and maybe a feature list?).

    Thank you guys. Keep up the good work, we need better languages in bioinformatics than C++ and Perl.

    opened by mariokostelac 7
Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution)

eaopt is an evolutionary optimization library Table of Contents Changelog Example Background Features Usage General advice Genetic algorithms Overview

Max Halford 808 Aug 8, 2022
cross-platform, normalized battery information library

battery Cross-platform, normalized battery information library. Gives access to a system independent, typed battery state, capacity, charge and voltag

null 208 Jul 27, 2022
GoLang Library for Browser Capabilities Project

Browser Capabilities GoLang Project PHP has get_browser() function which tells what the user's browser is capable of. You can check original documenta

Maksim N. 41 Jul 22, 2022
Go bindings for unarr (decompression library for RAR, TAR, ZIP and 7z archives)

go-unarr Golang bindings for the unarr library from sumatrapdf. unarr is a decompression library and CLI for RAR, TAR, ZIP and 7z archives. GoDoc See

Milan Nikolic 196 Aug 12, 2022
Type-safe Prometheus metrics builder library for golang

gotoprom A Prometheus metrics builder gotoprom offers an easy to use declarative API with type-safe labels for building and using Prometheus metrics.

Cabify 93 Jun 9, 2022
An easy to use, extensible health check library for Go applications.

Try browsing the code on Sourcegraph! Go Health Check An easy to use, extensible health check library for Go applications. Table of Contents Example M

Claudemiro 432 Aug 18, 2022
A simple, easily extensible and concurrent health-check library for Go services

Healthcheck A simple and extensible RESTful Healthcheck API implementation for Go services. Health provides an http.Handlefunc for use as a healthchec

Ether Labs 225 Aug 5, 2022
Simple licensing library for golang.

license-key A simple licensing library in Golang, that generates license files containing arbitrary data. Note that this implementation is quite basic

Hyperboloide 252 Aug 3, 2022
Library for interacting with LLVM IR in pure Go.

llvm Library for interacting with LLVM IR in pure Go. Introduction Introductory blog post "LLVM IR and Go" Our Document Installation go get -u github.

null 931 Aug 9, 2022
atomic measures + Prometheus exposition library

About Atomic measures with Prometheus exposition for the Go programming language. This is free and unencumbered software released into the public doma

Pascal S. de Kloe 22 Jun 10, 2022
Morse Code Library in Go

morse Morse Code Library in Go Download and Use go get -u -v github.com/alwindoss/morse or dep ensure -add github.com/alwindoss/morse Sample Usage pac

Alwin Doss 75 Jul 21, 2022
A Golang library to manipulate strings according to the word parsing rules of the UNIX Bourne shell.

shellwords A Golang library to manipulate strings according to the word parsing rules of the UNIX Bourne shell. Installation go get github.com/Wing924

Wei He 17 Mar 15, 2022
Notification library for gophers and their furry friends.

Shoutrrr Notification library for gophers and their furry friends. Heavily inspired by caronc/apprise. Quick Start As a package Using shoutrrr is easy

containrrr 408 Aug 16, 2022
Go library for creating state machines

Stateless Create state machines and lightweight state machine-based workflows directly in Go code: phoneCall := stateless.NewStateMachine(stateOffHook

Quim Muntal 469 Aug 8, 2022
a cron library for go

cron Cron V3 has been released! To download the specific tagged release, run: go get github.com/robfig/cron/[email protected] Import it in your program as: im

Rob Figueiredo 10.1k Aug 9, 2022
Functional programming library for Go including a lazy list implementation and some of the most usual functions.

functional A functional programming library including a lazy list implementation and some of the most usual functions. import FP "github.com/tcard/fun

Toni Cárdenas 31 May 21, 2022
FreeSWITCH Event Socket library for the Go programming language.

eventsocket FreeSWITCH Event Socket library for the Go programming language. It supports both inbound and outbound event socket connections, acting ei

Alexandre Fiori 107 Aug 2, 2022
Flow-based and dataflow programming library for Go (golang)

GoFlow - Dataflow and Flow-based programming library for Go (golang) Status of this branch (WIP) Warning: you are currently on v1 branch of GoFlow. v1

Vladimir Sibirov 1.4k Aug 10, 2022
Go port of Coda Hale's Metrics library

go-metrics Go port of Coda Hale's Metrics library: https://github.com/dropwizard/metrics. Documentation: http://godoc.org/github.com/rcrowley/go-metri

Richard Crowley 3.3k Aug 17, 2022