SciPipe

Robust, flexible and resource-efficient pipelines using Go and the commandline

Project links: Documentation & Main Website | Issue Tracker | Chat

Why SciPipe?

  • Intuitive: SciPipe works by flowing data through a network of channels and processes
  • Flexible: Wrapped command-line programs can be combined with processes in Go
  • Convenient: Full control over how your files are named
  • Efficient: Workflows are compiled to binary code that runs fast
  • Parallel: Pipeline parallelism between processes as well as task parallelism for multiple inputs, making efficient use of multiple CPU cores
  • Supports streaming: Stream data between programs to avoid wasting disk space
  • Easy to debug: Use available Go debugging tools or just println()
  • Portable: Distribute workflows as Go code or as self-contained executable files

Introduction

SciPipe is a library for writing Scientific Workflows, sometimes also called "pipelines", in the Go programming language.

When you need to run many commandline programs that depend on each other in complex ways, SciPipe helps by making the process of running these programs flexible, robust and reproducible. SciPipe also lets you restart an interrupted run without over-writing already produced output and produces an audit report of what was run, among many other things.

SciPipe is built on the proven principles of Flow-Based Programming (FBP) to achieve maximum flexibility, productivity and agility when designing workflows. Compared to plain dataflow, FBP provides the benefits that processes are fully self-contained, so that a library of re-usable components can be created, and plugged into new workflows ad-hoc.

Similar to other FBP systems, SciPipe workflows can be likened to a network of assembly lines in a factory, where items (files) flow through a network of conveyor belts, stopping at different independently running stations (processes) for processing.

SciPipe was initially created for problems in bioinformatics and cheminformatics, but works equally well for any problem involving pipelines of commandline applications.

Project status: SciPipe is pretty stable now, and only very minor API changes might still occur. We have successfully used SciPipe in a handful of both real and experimental projects, and it has seen occasional use outside the research group as well.

Hello World example

Let's look at an example workflow to get a feel for what writing workflows in SciPipe looks like:

package main

import (
    // Import SciPipe, aliased to sp
    sp "github.com/scipipe/scipipe"
)

func main() {
    // Init workflow and max concurrent tasks
    wf := sp.NewWorkflow("hello_world", 4)

    // Initialize processes, and file extensions
    hello := wf.NewProc("hello", "echo 'Hello ' > {o:out|.txt}")
    world := wf.NewProc("world", "echo $(cat {i:in}) World > {o:out|.txt}")

    // Define data flow
    world.In("in").From(hello.Out("out"))

    // Run workflow
    wf.Run()
}

Running the example

Let's put the code in a file named hello_world.go and run it:

$ go run hello_world.go
AUDIT   2018/07/17 21:42:26 | workflow:hello_world             | Starting workflow (Writing log to log/scipipe-20180717-214226-hello_world.log)
AUDIT   2018/07/17 21:42:26 | hello                            | Executing: echo 'Hello ' > hello.out.txt
AUDIT   2018/07/17 21:42:26 | hello                            | Finished: echo 'Hello ' > hello.out.txt
AUDIT   2018/07/17 21:42:26 | world                            | Executing: echo $(cat ../hello.out.txt) World > hello.out.txt.world.out.txt
AUDIT   2018/07/17 21:42:26 | world                            | Finished: echo $(cat ../hello.out.txt) World > hello.out.txt.world.out.txt
AUDIT   2018/07/17 21:42:26 | workflow:hello_world             | Finished workflow (Log written to log/scipipe-20180717-214226-hello_world.log)

Let's check what files SciPipe has generated:

$ ls -1 hello*
hello.out.txt
hello.out.txt.audit.json
hello.out.txt.world.out.txt
hello.out.txt.world.out.txt.audit.json

As you can see, it has created the files hello.out.txt and hello.out.txt.world.out.txt, along with an accompanying .audit.json file for each of them.

Now, let's check the output of the final resulting file:

$ cat hello.out.txt.world.out.txt
Hello World

Now we can rejoice that it contains the text "Hello World", exactly as a proper Hello World example should :)

Now, those were rather long and cumbersome file names, weren't they? SciPipe gives you very good control over how your files are named, if you don't want to rely on the automatic file naming. For example, we could give the first file a static name, and then use that name as the basis for the file name of the second process, like so:

package main

import (
    // Import the SciPipe package, aliased to 'sp'
    sp "github.com/scipipe/scipipe"
)

func main() {
    // Init workflow with a name, and max concurrent tasks
    wf := sp.NewWorkflow("hello_world", 4)

    // Initialize processes and set output file paths
    hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
    hello.SetOut("out", "hello.txt")

    world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
    // The modifier 's/.txt//' will replace '.txt' in the input path with ''
    world.SetOut("out", "{i:in|s/.txt//}_world.txt")

    // Connect network
    world.In("in").From(hello.Out("out"))

    // Run workflow
    wf.Run()
}

Now, if we run this, the file names get a little cleaner:

$ ls -1 hello*
hello.txt
hello.txt.audit.json
hello.txt.world.go
hello.txt.world.txt
hello.txt.world.txt.audit.json

The audit logs

Finally, let's have a look at one of the audit files that were created:

$ cat hello.txt.world.txt.audit.json
{
    "ID": "99i5vxhtd41pmaewc8pr",
    "ProcessName": "world",
    "Command": "echo $(cat hello.txt) World \u003e\u003e hello.txt.world.txt.tmp/hello.txt.world.txt",
    "Params": {},
    "Tags": {},
    "StartTime": "2018-06-15T19:10:37.955602979+02:00",
    "FinishTime": "2018-06-15T19:10:37.959410102+02:00",
    "ExecTimeNS": 3000000,
    "Upstream": {
        "hello.txt": {
            "ID": "w4oeiii9h5j7sckq7aqq",
            "ProcessName": "hello",
            "Command": "echo 'Hello ' \u003e hello.txt.tmp/hello.txt",
            "Params": {},
            "Tags": {},
            "StartTime": "2018-06-15T19:10:37.950032676+02:00",
            "FinishTime": "2018-06-15T19:10:37.95468214+02:00",
            "ExecTimeNS": 4000000,
            "Upstream": {}
        }
    }
}
Each such audit file contains a hierarchic JSON representation of the full workflow path that was executed in order to produce this file. On the first level is the command that directly produced the corresponding file, and then, indexed by their filenames, under "Upstream", there is a similar chunk describing how each of its input files was generated. This repeats recursively for large workflows, so that for each file generated by the workflow, there is always a full, hierarchic history of all the commands run - with their associated metadata - to produce that file.
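
Since the audit file is plain JSON, it is also straightforward to consume programmatically. Below is a minimal, hypothetical Go sketch (the auditInfo struct mirrors just the fields shown above, and the file name is the one from this example) that walks the Upstream hierarchy and prints the chain of commands behind a file:

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// auditInfo mirrors a subset of the fields in SciPipe's
// .audit.json files, as seen in the example above.
type auditInfo struct {
    ID          string
    ProcessName string
    Command     string
    Upstream    map[string]*auditInfo
}

// printChain prints the command that produced each file,
// recursing depth-first through the Upstream map.
func printChain(audit *auditInfo, indent string) {
    fmt.Printf("%s%s: %s\n", indent, audit.ProcessName, audit.Command)
    for _, upstream := range audit.Upstream {
        printChain(upstream, indent+"    ")
    }
}

func main() {
    data, err := os.ReadFile("hello.txt.world.txt.audit.json")
    if err != nil {
        panic(err)
    }
    audit := &auditInfo{}
    if err := json.Unmarshal(data, audit); err != nil {
        panic(err)
    }
    printChain(audit, "")
}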

You can find many more examples in the examples folder in the GitHub repo.

For more information about how to write workflows using SciPipe, and much more, see the SciPipe website (scipipe.org)!

Citing SciPipe

If you use SciPipe in academic or scholarly work, please cite the following paper as source:

Lampa S, Dahlö M, Alvarsson J, Spjuth O. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. GigaScience. 8, 5 (2019). DOI: 10.1093/gigascience/giz044

Related tools

Find below a few tools that are more or less similar to SciPipe, and that are worth checking out before deciding on which tool fits you best (in approximate order of similarity to SciPipe):

Comments
  • Implement audit logging

    Implement some kind of structured data keeper for task info (for provenance etc):

    • Parameters
    • The command run
    • Previous tasks / parameters used to generate input files?
    • Execution time
    • SLURM execution time
    • ...
    opened by samuell 8
  • Can not use a variable in an absolute path

    Hello, I am rather new to both Go and SciPipe, but I ran into an issue I am not sure how to solve. I am trying to create a pipeline that imports all the files in a folder and uses those files in other procedures.

    My thought was to first create a list of the filenames in the specific folder, then assign each of those to a variable, and use that variable both for "targeting" the correct file (using an absolute path, for example) and for giving each output file a different name depending on the initial filename. My code goes like this:

    package main
    
    import (
    	"io/ioutil"
    	"log"
    
    	sp "github.com/scipipe/scipipe"
    )
    
    func main() {
    	files, err := ioutil.ReadDir(".")
    	if err != nil {
    		log.Fatal(err)
    	}
    
    	var td []string
    
    	for _, f := range files {
    		td = append(td, f.Name()) // Creating a list with the filenames
    	}
    	td = append(td[:0], td[1], td[2], td[3]) // I removed the .DS_Store file I was getting in the list
    
    	wf := sp.NewWorkflow("DB", 1)
    
    	for _, target := range td {
    		train_proc := wf.NewProc(target+"_train", `echo "$(cat ~/Desktop/Project/'$target')" > {o:out}`)
    		train_proc.SetOut("out", target+"_file.txt")
    	}
    
    	wf.Run()
    }
    

    I actually want to take the content of a file and copy it to an output file, but I can't find a way to point to the file with the specific name while using a variable in the absolute path of that file. In the SetOut stage, the target variable is replaced by the different values.

    Thank you in advance
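
    One possible way around this, sketched below with the same variables as the code above, is to splice the Go variable directly into the command string, since the shell never sees Go's target variable (so '$target' inside the command is never expanded to anything):

    for _, target := range td {
    	// Build the command string in Go, so the file path is
    	// already concrete before the shell runs it:
    	train_proc := wf.NewProc(target+"_train",
    		`cat ~/Desktop/Project/`+target+` > {o:out}`)
    	train_proc.SetOut("out", target+"_file.txt")
    }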

    help wanted 
    opened by PaschalisAthan 5
  • Contributing to scipipe - working with github+golang

    I have tried out scipipe and hope to contribute to it.

    I am relatively new to golang and have a hard time testing out local changes I made in my test main program, as it keeps pulling in the zip-archived, versioned copy of scipipe. Hence it is not picking up the changes I made to my locally cloned copy of scipipe.

    I am familiar with the traditional way of working with C++ and Python, but Golang is quite challenging from a contribution perspective.

    Any advice/workflow/pointer/best-practices ?

    Cheers

    opened by nyue 4
  • Filename "" does not match expression [A-Za-z\/\.-_]+

    Hello, I am trying to use the streamToSubstream functionality, but I am getting the error below:

    ERROR 2019/06/12 18:09:20 Filename "" does not match expression [A-Za-z\/\.-_]+

    I also get this error when I try to run your example workflows https://github.com/pharmbio/scipipe-demo/tree/fdb98884edb98a693c2892930c088cd723070691/dnacanceranalysis

    I believe the issue is in

    func (p *StreamToSubStream) Run() {
    	defer p.CloseAllOutPorts()
    
    	scipipe.Debug.Println("Creating new information packet for the substream...")
    	subStreamIP := scipipe.NewFileIP("")
    	scipipe.Debug.Printf("Setting in-port of process %s to IP substream field\n", p.Name())
    	subStreamIP.SubStream = p.In()
    
    	scipipe.Debug.Printf("Sending sub-stream IP in process %s...\n", p.Name())
    	p.OutSubStream().Send(subStreamIP)
    	scipipe.Debug.Printf("Done sending sub-stream IP in process %s.\n", p.Name())
    }

    The issue is where `subStreamIP := scipipe.NewFileIP("")` is called. This triggers the error due to the `checkFilename` check in `NewFileIP` in ip.go.

    Is there anything I am doing wrong? Thanks for any help you can provide.
    bug 
    opened by JakeHagen 4
  • Use temp folders instead of temp filename extension, for running jobs?

    This is sometimes needed when you cannot control the file name that is created but still need to check whether it exists, such as when unpacking a tarball with a known folder name in it.

    EDIT: Old title: Add option to turn off .tmp path usage

    enhancement 
    opened by samuell 4
  • Make number of simultaneous tasks per process configurable

    Currently, a process will spawn as many tasks as there are incoming sets of data packets on in-ports. If running stuff locally, this might overbook the CPU.

    Probably the best option is to have a global pool of "run leases", that are handed out to processes as they ask for them, and then handed back when they are finished.
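
    As an illustration, such a pool of run leases could be modeled as a buffered channel used as a counting semaphore. A minimal sketch (the LeasePool type and its methods are made up for illustration, not SciPipe API):

    package main

    import (
    	"fmt"
    	"sync"
    )

    // LeasePool hands out a fixed number of "run leases": a task
    // must acquire a lease before running and hand it back when done.
    type LeasePool chan struct{}

    func NewLeasePool(size int) LeasePool { return make(LeasePool, size) }

    func (p LeasePool) Acquire() { p <- struct{}{} }
    func (p LeasePool) Release() { <-p }

    func main() {
    	pool := NewLeasePool(4) // at most 4 tasks run at any time
    	var wg sync.WaitGroup
    	for i := 0; i < 10; i++ {
    		wg.Add(1)
    		go func(task int) {
    			defer wg.Done()
    			pool.Acquire()
    			defer pool.Release()
    			fmt.Println("running task", task)
    		}(i)
    	}
    	wg.Wait()
    }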

    opened by samuell 4
  • Cloud execution

    Hi. I've looked through the docs and there doesn't seem to be any particular reference as to whether this is possible. What I'm envisaging is having each process run via AWS Batch or the Google Cloud Life Sciences API, which are common targets for bioinformatics pipelines.

    enhancement 
    opened by multimeric 3
  • Better way of connecting components, to allow sanity checks and more

    If we create special InPort and OutPort structs, with some convenience functionality, we can move from:

    task2.InPorts["bar"] = task1.OutPorts["foo"]
    

    ... to something like:

    task2.InPorts["bar"].connectFrom(task1.OutPorts["foo"])
    

    ... and one could allow going the other direction too:

    task1.OutPorts["foo"].connectTo(task2.InPorts["bar"])
    

    This should also work well with static port fields, such as:

    task2.InBar.connectFrom(task1.OutFoo)
    task1.OutFoo.connectTo(task2.InBar)
    

    This would allow us to make sure that there are no unconnected ports and other sanity checks, as well as to enable traversing the workflow dependency graph to produce a textual or graphical representation of the workflow.

    An alternative approach would be to create a Channel component, that the "port" maps are initialized with, so that the assignment syntax still works (just that it is a Channel struct (with a real channel inside) that is assigned rather than a plain channel), but that wouldn't allow us the benefits stated above.
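
    As an illustration of the proposed direction, here is a minimal, self-contained sketch (types, buffering and method names are illustrative, not SciPipe's actual implementation):

    package main

    import "fmt"

    // InPort and OutPort wrap a shared channel and remember whether
    // they have been connected, enabling sanity checks before a run.
    type InPort struct {
    	Chan      chan string
    	connected bool
    }

    type OutPort struct {
    	Chan      chan string
    	connected bool
    }

    func (ip *InPort) ConnectFrom(op *OutPort) { op.ConnectTo(ip) }

    func (op *OutPort) ConnectTo(ip *InPort) {
    	ch := make(chan string, 16)
    	op.Chan, ip.Chan = ch, ch
    	op.connected, ip.connected = true, true
    }

    func (ip *InPort) Connected() bool  { return ip.connected }
    func (op *OutPort) Connected() bool { return op.connected }

    func main() {
    	in, out := &InPort{}, &OutPort{}
    	in.ConnectFrom(out)
    	// A workflow could now verify all ports are connected before running:
    	fmt.Println(in.Connected(), out.Connected()) // true true
    }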

    opened by samuell 3
  • Add streaming support

    It should probably be configurable for each output whether it should stream its output or not!

    ... either in the commandline pattern, or as a struct map field.

    opened by samuell 3
  • Serialized workflow description in JSON?

    Hi,

    I am looking into the possibility of using SciPipe for use outside of bioinformatics, namely film/visual-effects and possibly AEC.

    Most of my work experience lately is in the film/vfx industry so I am looking to hook up SciPipe to studio facility running what we call a farm with software like Tractor (from Pixar) and maybe Deadline (Thinkbox).

    I have also recently spent time at CSL (Pharmaceutical) setting up an HPC cluster running SLURM and integrating with CWL for the bioinformatics R&D.

    I would like to know if SciPipe has a serialised description of the FBP network in a form like JSON, with which I could write a translator to generate industry-specific job management files.

    In my short time running some of the demo code, I see *.audit.json files which are generated after a workflow has completed. I would like to generate the workflow description without running the workflow.

    I am looking for something like the Dot output but in a JSON format, with enough detail for me to recreate everything necessary to submit jobs to SLURM, Tractor or Deadline, or to generate CWL, WDL or other workflow files for submission via their respective runtimes, like Cromwell.

    Cheers

    question 
    opened by nyue 2
  • SciPipe doesn't fail on missing output files

    It appeared through some of @jonalv's workflows that scipipe does not properly fail when some declared outputs of a process are not created. Downstream processes will AFAIK fail when trying to read from the non-existing files, but it would be much more helpful when debugging to get the error where it happens.

    bug 
    opened by samuell 2
  • Idea for a flexible component to generate parameters dynamically from shell code

    A current weak spot in SciPipe is when one needs to generate many sets of parameter values or file names to feed a downstream pipeline. This can be done to some extent using e.g. a globber component, but that is limited to a very specific use case.

    Below is a sketch of an idea for how to implement a type of component that can generate these based on shell scripts.

    The idea is that you are supposed to write a shell script that produces a set of JSON-objects, one per line, with the parameter and output filename fields populated.

    The API could look like this:

    looper := wf.NewLooper("looper", "for f in data/*.csv; do echo \"{ 'outfile': '{o:outfile:$f}' }\"; done;")
    
    otherProc := wf.NewProc("other-proc", "some-command -in {i:infile} ... Etc etc")
    
    otherProc.In("infile").From(looper.Out("outfile"))
    
    // ... etc etc ...
    

    The example above is basically just a globber, but the same method could be used for populating parameters as well. I will update the example shortly to illustrate the combined generation of filenames and parameters.

    enhancement 
    opened by samuell 0
  • Requests for a tool that monitors the execution of scipipe workflows.

    panoptes is such a tool for snakemake workflows. nf-tower is such a tool for nextflow workflows. Does scipipe provide such a tool, or some sort of API that I can use to monitor the execution of scipipe workflows? If not, could you please give me some hints about how to do that?

    Regards, Zhen

    opened by zhangzhen 2
  • Merge tags and params?

    Right now, params and tags associated with IPs via AuditInfo serve quite similar roles. The main difference is that params are meant to be sent to whatever program is being executed, while tags might be other metadata that is not sent to a program, but might be extracted from the filename or the file itself and used for further filtering, grouping etc.

    Thus, it seems worth considering whether these could be stored in the same map of "tags".

    opened by samuell 0
  • Order globs on file size

    When batch processing a large number of files of different sizes, with multiple gather points, it becomes important to start the long-running things early. Long running time often correlates well with file size. Hence it would be nice to be able to sort a file glob by file size, in order to get the long-running things fired off early.

    opened by jonalv 0
  • Enable renaming paths to final paths on different partition

    Currently, if writing to paths that are on a different partition - for example /tmp/foo, when / is on a different hard drive partition than the /home/ folder where you execute the workflow - the os.Rename() call in FinalizePaths() will fail with "invalid cross-device link".

    To get around this, we could check for that specific error and, if it occurs, do a proper copy and remove instead.
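
    A sketch of what such a fallback could look like (the renameOrCopy helper and the example paths are made up; checking for syscall.EXDEV is the standard way to detect this error on Unix):

    package main

    import (
    	"errors"
    	"io"
    	"os"
    	"syscall"
    )

    // renameOrCopy tries os.Rename first, and falls back to a
    // copy-and-remove when the rename fails with EXDEV
    // ("invalid cross-device link"), i.e. when src and dst live
    // on different partitions.
    func renameOrCopy(src, dst string) error {
    	err := os.Rename(src, dst)
    	if err == nil {
    		return nil
    	}
    	var linkErr *os.LinkError
    	if !errors.As(err, &linkErr) || linkErr.Err != syscall.EXDEV {
    		return err // some other failure - don't mask it
    	}
    	in, err := os.Open(src)
    	if err != nil {
    		return err
    	}
    	defer in.Close()
    	out, err := os.Create(dst)
    	if err != nil {
    		return err
    	}
    	if _, err := io.Copy(out, in); err != nil {
    		out.Close()
    		return err
    	}
    	if err := out.Close(); err != nil {
    		return err
    	}
    	return os.Remove(src)
    }

    func main() {
    	if err := renameOrCopy("/tmp/foo.txt.tmp", "/home/user/foo.txt"); err != nil {
    		panic(err)
    	}
    }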

    enhancement 
    opened by samuell 0
  • Ability to depend on task completion

    For some components which return multiple outputs, such as the globber component, it would be useful to be able to depend on the process' full completion in downstream tasks.

    Reporter: @jonalv

    enhancement 
    opened by samuell 0
Releases (v0.12.0)
  • v0.12.0(Oct 14, 2021)

    This release contains an important bugfix, making sure that we properly handle all errors (#146).

    It also contains a minor API change that could affect custom components making use of the AtomizeIPs() method, which is now renamed to FinalizePaths().

  • v0.11.2-rc4(Oct 14, 2021)

  • v0.11.2-rc3(Oct 11, 2021)

  • v0.11.2-rc2(Oct 7, 2021)

  • v0.11.2-rc1(Sep 29, 2021)

  • v0.11.1(Sep 1, 2021)

  • v0.11.0(Aug 29, 2021)

    This is a rather small release, which fixes issue #134 and improves logging by making sure that more context information, such as process, task, workflow or file-IP name, is included wherever applicable.

    It bumps more than the patch part of the version since it contains some smaller breaking changes, in particular that the NewFileIP() constructor now returns errors instead of failing the program locally (this is, as you might have guessed, part of the move towards logging errors in a place with more context available).

  • v0.10.2(Jun 28, 2021)

  • v0.10.1(May 25, 2021)

  • v0.10.0(May 24, 2021)

    This release fixes bug #130. Read more in the bug description for info about this. We are bumping to 0.10.0 as this is potentially a breaking change for workflows already using the basename modifier.

  • v0.9.14(Apr 28, 2021)

    This release contains a bugfix for #125, which was recently introduced as part of the fix for #66. It fixes #125 by allowing multiple occurrences of the same port-type/port-name combo, but not the same port-name for different types.

  • v0.9.13(Apr 10, 2021)

  • v0.9.12(Apr 10, 2021)

    This release contains a number of somewhat important bugfixes, for stability and robustness of workflows:

    • #54: Properly fail when some outputs are missing, instead of silently passing on references to those.
    • #66: Show an intelligible warning when the same port name is used for multiple ports (in/out/params) in the same process.
    • #117: Automatically create any parent directories when plotting the workflow graph.
    • #119: Add a component for selecting / filtering IPs, based on a custom Go function, which can access all the data of the IP.
    • #120: Properly handle references to parent directories in out-paths (e.g. ../../somedir/somefile.txt).
  • v0.9.11(Mar 8, 2021)

  • v0.9.10(Oct 12, 2020)

  • v0.9.9(Sep 23, 2020)

    This is a small release containing one new enhancement and a bug fix, both in the Concatenator component:

    • It implements #109, adding a new "GroupByTag" feature, to enable separating output based on a tag in IPs.
    • It fixes #108 so that concatenator does not swallow newlines.
  • v0.9.8(Sep 4, 2020)

    This release mainly fixes issue #71, where a workflow could not run if it contained just a single process.

    Many thanks to @kerkomen for contributing the fix!

  • v0.9.7(Aug 20, 2020)

    This is a bug fix release, fixing a number of bugs: #93, #104, #105, #106

    The biggest change is that path modifiers, such as basename and %.ext (for trimming file name extensions), are now available for all placeholders in the main command as well as in the SetOut() function.

    The available modifiers are now documented here in the docs: Available path modifiers

    It also now has support for go modules (a simple change, as scipipe has zero dependencies :)).

    Thanks to @dwmunster for some contributions, and to @jonalv for providing input that led to discovering most of the fixes in this release.

  • v0.9.6(Sep 7, 2019)

    This release:

    • Fixes #78 as reported by @JakeHagen via PR #91 submitted by @rbisewski
    • Fixes a typo in the logging, via PR #90 submitted by @JakeHagen
    • Hashes substream IP paths for the temp directory, via PR #89 submitted by @dwmunster

    With these fixes, the DNA Cancer Analysis demo from the SciPipe paper, which was broken, now works again!

    Many thanks for the contributions, guys - it's so much appreciated! :raised_hands:

  • v0.9.5(Jun 19, 2019)

  • v0.9.4(Jun 14, 2019)

    This release contains important improvements and bugfixes kindly contributed by @dwmunster.

    On a smaller note, the CircleCI configuration is now updated to their new 2.0 syntax.

    Upgrade is highly recommended, as usual with:

    go get github.com/scipipe/scipipe/...
    
  • v0.9.3(Jun 13, 2019)

    This release fixes issue #77, which had caused intermittent deadlocks (and, seemingly, occasional test timeouts).

    As usual, update scipipe with:

    go get -u github.com/scipipe/scipipe/...
    

    (and don't forget the ...)

  • v0.9.2(May 22, 2019)

    This is a small but important fix for a bug in the FileCombinator component, introduced in 0.9.1, that caused a deadlock when trying to send more than 16 files through the FileCombinator.

  • v0.9.1(May 21, 2019)

  • v0.9.0(Apr 24, 2019)

    Often in workflows, we need to generate a list of parameters to drive workflows - for example, if we have a list of target proteins for which we want to create predictive models for drug molecules, out of a large combined dataset. Optimally this should be doable with any shell command, so that the list can be generated based on some existing data. We recently realized that there was not really an easy way to do this in SciPipe. Until now.

    Now there is the new CommandToParams component.

    We will update the docs shortly, but in the meantime the test contains a small (quite dumb) mini example of how to use it.

    Note: We have plans to create a more integrated solution for reading data into parameter streams, so that this can be done in the normal shell commands created with workflow.NewProc(), but this dedicated component was created to serve the basic need for this functionality while we get that in place.

    Small breaking change: components.{FileReader -> FileToParamsReader}

    This release also renames the FileReader component to FileToParamsReader, as we realized that it was not really functioning properly in its previous role, where the inclusion of line breaks was causing troubles. It is now properly unit tested to function for reading individual rows in files into a stream of parameter values to send on the parameter ports of other processes.

    This small breaking change means we're bumping the (still pre-1.0) version from 0.8.x to 0.9.x.

  • v0.8.3(Apr 17, 2019)

    Now one can use basename in path formatters, to remove everything from a path up to the actual filename.

    So, say that you have an input path that is: /some/folder/file.txt

    ... and that you want to use it, via the port name infile, for setting the path of outfile using SetOut() like this:

    aProcess.SetOut("outfile", "{i:infile}.some.extension")
    

    which would result in the filename:

    /some/folder/file.txt.some.extension
    

    Then, to remove /some/folder/ from the input string, you can do:

    aProcess.SetOut("outfile", "{i:infile|basename}.some.extension")
    

    With this, the output file will instead be named just:

    file.txt.some.extension
    
  • v0.8.2(Apr 7, 2019)

  • v0.8.1(Aug 14, 2018)

    This is an important bug fix, and anybody using the previous version, 0.8, is strongly recommended to upgrade.

    The release contains primarily a fix for a bug that was apparently introduced in 0.8, with the move to folder-based temporary paths, where existing temporary paths were not properly detected, and unfinished files could potentially be mixed with finished ones.

  • v0.8.0(Jul 26, 2018)

    This release contains a very large number of improvements - too many to list individually here - but a few selected ones are covered further below. This release also brings in another contributor, @jonalv, who did fantastic work on the TeX template for the audit report conversion feature.

    Notable new features

    A simplified API

    Each task is now executed in its own isolated temporary folder, so that extra files generated by commands are properly captured and handled in an atomic way (to avoid mixing up unfinished and finished files).

    Among the improvements is that setting paths is now not even required. If you still want to set the file extension for outputs, you can do that with the following syntax in an out-port placeholder in commands: {o:portname|.csv}, for the .csv extension.

    Furthermore, the many different Process.SetPath... methods are now unified to only two: Process.SetOut(portName string, pattern string) and Process.SetOutFunc(portName string, pathFunc func(Task) string).

    SetOut() takes a pattern with placeholders similar to those used to define the command pattern, such as {i:portname} for input files and {p:param1} for parameters. It also allows certain modifiers after the port name, separated by | characters, such as for trimming the end of a string, which is done like so: {i:bamfile|%.bam}, given that we have an in-port named "bamfile" whose filename we want to re-use, but without the .bam file extension.
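
    For instance, a hypothetical process with an in-port named "bamfile" could reuse the input's filename minus its .bam extension like this (the process and the samtools command are made-up examples):

    sortProc := wf.NewProc("sort", "samtools sort {i:bamfile} > {o:sorted}")
    sortProc.SetOut("sorted", "{i:bamfile|%.bam}.sorted.bam")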

    As always, for more information about this, see the documentation.

    Graph plotting

    SciPipe can now plot the graph of a workflow to a .dot file, which can be converted to PDF with the GraphViz dot command (See the documentation for this feature).

    This can be done by adding this line in the workflow Go file:

    myWorkflow.PlotGraph("myworkflow.dot")
    

    One can also let SciPipe execute the dot command itself, to convert to PDF in one go (requires having GraphViz installed):

    myWorkflow.PlotGraphPDF("myworkflow.dot")
    


    Convert Audit report to TeX / PDF

    This is an experimental feature. (See the documentation for this feature).

    Usage:

    scipipe audit2tex somefile.audit.json
    pdflatex somefile.audit.tex
    open somefile.audit.pdf
    


    Convert Audit report to HTML

    This is an experimental feature. (See the documentation for this feature).

    Usage:

    scipipe audit2html somefile.audit.json
    


    Convert Audit report to Bash

    This is an experimental feature. (See the documentation for this feature).

    Usage:

    scipipe audit2bash somefile.audit.json
    


  • v0.8(Jul 28, 2018)
