Command pigeon generates parsers in Go from a PEG grammar.

Overview

pigeon - a PEG parser generator for Go

The pigeon command generates parsers based on a parsing expression grammar (PEG). Its grammar and syntax are inspired by the PEG.js project, while the implementation is loosely based on the article "parsing expression grammar for C# 3.0". It parses Unicode text encoded in UTF-8.

See the godoc page for detailed usage. Also have a look at the Pigeon Wiki for additional information about Pigeon and PEG in general.

Releases

  • v1.0.0 is the tagged release of the original implementation.
  • Work has started on v2.0.0 with some planned breaking changes.

GitHub user @mna created the package in April 2015, and @breml is the package's maintainer as of May 2017.

Breaking Changes since v1.0.0

  • Removed support for Go < v1.11 to support go modules for dependency tracking.

  • Removed support for Go < v1.9 because of the dependency on golang.org/x/tools/imports, which was updated to reflect changes in recent versions of Go. This follows the Go Release Policy and the corresponding Go release maintenance policy, under which each major release is supported until there are two newer major releases.

Installation

Provided you have Go correctly installed with the $GOPATH and $GOBIN environment variables set, run:

$ go get -u github.com/mna/pigeon

This will install or update the package, and the pigeon command will be installed in your $GOBIN directory. Neither this package nor the parsers generated by this command require any third-party dependency, unless such a dependency is used in the code blocks of the grammar.

Basic usage

$ pigeon [options] [PEG_GRAMMAR_FILE]

By default, the input grammar is read from stdin and the generated code is printed to stdout. You may save it in a file using the -o flag.

Example

Given the following grammar:

{
// part of the initializer code block omitted for brevity

var ops = map[string]func(int, int) int {
    "+": func(l, r int) int {
        return l + r
    },
    "-": func(l, r int) int {
        return l - r
    },
    "*": func(l, r int) int {
        return l * r
    },
    "/": func(l, r int) int {
        return l / r
    },
}

func toIfaceSlice(v interface{}) []interface{} {
    if v == nil {
        return nil
    }
    return v.([]interface{})
}

func eval(first, rest interface{}) int {
    l := first.(int)
    restSl := toIfaceSlice(rest)
    for _, v := range restSl {
        restExpr := toIfaceSlice(v)
        r := restExpr[3].(int)
        op := restExpr[1].(string)
        l = ops[op](l, r)
    }
    return l
}
}


Input <- expr:Expr EOF {
    return expr, nil
}

Expr <- _ first:Term rest:( _ AddOp _ Term )* _ {
    return eval(first, rest), nil
}

Term <- first:Factor rest:( _ MulOp _ Factor )* {
    return eval(first, rest), nil
}

Factor <- '(' expr:Expr ')' {
    return expr, nil
} / integer:Integer {
    return integer, nil
}

AddOp <- ( '+' / '-' ) {
    return string(c.text), nil
}

MulOp <- ( '*' / '/' ) {
    return string(c.text), nil
}

Integer <- '-'? [0-9]+ {
    return strconv.Atoi(string(c.text))
}

_ "whitespace" <- [ \n\t\r]*

EOF <- !.

The generated parser can parse simple arithmetic operations, e.g.:

18 + 3 - 27 * (-18 / -3)

=> -141
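
To make the evaluation step concrete, here is a minimal sketch in plain Go that reuses the ops, toIfaceSlice and eval helpers from the grammar above and feeds them hand-built values instead of real parser output. The step helper and the nil placeholders standing in for the whitespace matches are assumptions made for illustration only; a generated parser builds these slices itself.

```go
package main

import "fmt"

// ops, toIfaceSlice and eval mirror the initializer block of the grammar.
var ops = map[string]func(int, int) int{
	"+": func(l, r int) int { return l + r },
	"-": func(l, r int) int { return l - r },
	"*": func(l, r int) int { return l * r },
	"/": func(l, r int) int { return l / r },
}

func toIfaceSlice(v interface{}) []interface{} {
	if v == nil {
		return nil
	}
	return v.([]interface{})
}

// eval folds the (operator, operand) pairs in rest onto first, left to right.
func eval(first, rest interface{}) int {
	l := first.(int)
	for _, v := range toIfaceSlice(rest) {
		restExpr := toIfaceSlice(v)
		r := restExpr[3].(int)     // index 3: the operand
		op := restExpr[1].(string) // index 1: the operator
		l = ops[op](l, r)
	}
	return l
}

// step is a hypothetical helper: it builds one element shaped like the
// sequence ( _ Op _ Operand ), with nil standing in for the whitespace matches.
func step(op string, operand int) interface{} {
	return []interface{}{nil, op, nil, operand}
}

func main() {
	inner := eval(-18, []interface{}{step("/", -3)})  // (-18 / -3)
	term := eval(27, []interface{}{step("*", inner)}) // 27 * (...)
	total := eval(18, []interface{}{step("+", 3), step("-", term)})
	fmt.Println(total) // prints -141
}
```

Because eval applies the operators in the order they appear, each precedence level is left-associative; precedence itself comes from the split between the Expr and Term rules.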

More examples can be found in the examples/ subdirectory.

See the godoc page for detailed usage.

Contributing

See the CONTRIBUTING.md file.

License

The BSD 3-Clause license. See the LICENSE file.

Issues
  • errors during compilation of grammar.go are reported as grammar.go rather than grammar.peg

    This is a nuisance when pigeon is used with, e.g., Emacs compilation mode.

    lex, yacc, byacc, bison, et al. emit line-number preprocessor directives in the generated output so that errors refer back to the original source file, even when it is the C compiler that reports the error. I don't know how to do that in Go, but I presume that it must be possible. If not, it should be.

    opened by kpixley 5
  • Unable to generate optimized grammar

    Hello,

    First, thanks a lot for maintaining this project, it's a great library! I'm currently using it in https://github.com/bytesparadise/libasciidoc and it's working really well 🙌

    However, since https://github.com/mna/pigeon/commit/9fec3898cef80afe60fbe5df398fceca513566b8 was merged, I've been getting the following error when running the command below in the project's root:

    $ pigeon -optimize-grammar -alternate-entrypoints PreparsedDocument,InlineElementsWithoutSubtitution,VerbatimBlock -o ./pkg/parser/asciidoc_parser.go  ./pkg/parser/asciidoc-grammar.peg
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
            panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x1284d49]
    
    goroutine 1 [running]:
    main.main.func1(0x13d3500, 0xc000072020, 0xc0005fde50)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/main.go:87 +0x13d
    panic(0x12f2440, 0x15f2fd0)
            /usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
    github.com/mna/pigeon/ast.(*grammarOptimizer).optimizeRule(0xc0004bfec0, 0x13d0d80, 0xc0003e2300, 0xc00045d080, 0xc0003e4d20)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:243 +0x309
    github.com/mna/pigeon/ast.(*grammarOptimizer).optimize(0xc0004bfec0, 0x13d0d60, 0xc0003e4f00, 0x13d0e40, 0xc0004bfec0)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:182 +0x27ec
    github.com/mna/pigeon/ast.(*grammarOptimizer).Visit(0xc0004bfec0, 0x13d0d60, 0xc0003e4f00, 0xc0003e4d20, 0xc0004bfec0)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:39 +0x3e
    github.com/mna/pigeon/ast.Walk(0x13d0e40, 0xc0004bfec0, 0x13d0d60, 0xc0003e4f00)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_walk.go:20 +0x55
    github.com/mna/pigeon/ast.Walk(0x13d0e40, 0xc0004bfec0, 0x13d0c80, 0xc0004dba90)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_walk.go:41 +0x43b
    github.com/mna/pigeon/ast.Optimize(0xc0004dba90, 0xc0001044e0, 0x3, 0x3)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:464 +0x14e
    main.main()
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/main.go:120 +0x106f
    

    The grammar in my project is already quite big: https://github.com/bytesparadise/libasciidoc/blob/master/pkg/parser/asciidoc-grammar.peg and unfortunately the stack trace does not give much information about the rule(s) that cause the error, so I can't really narrow down the grammar to a simpler form :/

    Note: building and running pigeon with the previous commit works like a charm.

    opened by xcoulon 4
  • Should not allow left recursion grammar to pass conversion

    For example:

    {
    //------ start
    package main
    
    func main() {
        if len(os.Args) != 2 {
            log.Fatal("Usage: calculator 'EXPR'")
        }
        got, err := ParseReader("", strings.NewReader(os.Args[1]))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%#v\n", got)
    }
    
    // ------ end
    }
    
    Input <- expr:Expr EOF {
        return expr, nil
    }
    
    Expr <- _ Expr _ LogicOp _ Expr _/ _ Value _
    
    LogicOp <- ("and" / "or") {
        return string(c.text), nil
    }
    
    Value <- [0-9]+ {
        return string(c.text),nil
    }
    
    _ "whitespace" <- [ \n\t\r]*
    
    EOF <- !.
    
    

    go run main.go "1 and 1"

    This causes an infinite loop.

    opened by cch123 2
  • [Feature Request] Operator Precedence Climbing

    {
    //------ start
    package main
    
    type CompExpr struct {
        left string
        op string
        right string
    }
    
    type LogicExpr struct {
        left interface{}
        op string
        right interface{}
    }
    
    func main() {
        if len(os.Args) != 2 {
            log.Fatal("Usage: calculator 'EXPR'")
        }
        got, err := ParseReader("", strings.NewReader(os.Args[1]))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%#v\n", got)
    }
    
    // ------ end
    }
    
    Input <- expr:Expr EOF {
        return expr, nil
    }
    
    Expr <- LogicExpr / Atom
    
    LogicExpr <- _ atom:Atom _ op: LogicOp _ expr: Expr _ {
        return  LogicExpr {left : atom, op : op.(string), right: expr}, nil
    }
    
    Atom <- '(' expr:Expr ')' {
        return expr, nil
    } / _ field: Ident _ op:BinOp _ value:Value _{
        return CompExpr{left : field.(string), op: op.(string), right : value.(string)}, nil
    }
    
    LogicOp <- ("and" / "or"){
        return string(c.text), nil
    }
    
    BinOp <- ("!=" / ">=" / "<=" / "=" / "<>" / ">" / "<") {
        return string(c.text),nil
    }
    
    Ident <- [a-zA-Z][a-zA-Z0-9]* {
        return string(c.text),nil
    }
    
    Value <- [0-9]+ {
        return string(c.text),nil
    }
    
    _ "whitespace" <- [ \n\t\r]*
    
    EOF <- !.
    
    

    A right-recursive grammar like this one automatically produces right-associative results.

    It would be very convenient if pigeon could implement precedence climbing or a similar technique.

    opened by cch123 1
  • Certain inputs take an extremely long time to parse

    Hello!

    First of all, thank you very much for maintaining this project!

    I'm hoping that someone can provide a bit of guidance. I apologize in advance for not having a minimal test case to reproduce this issue.

    The issue

    I've been doing some fuzz testing on OPA and I ran into one case where certain inputs would cause the program to hang and then crash. Here's a snippet of the crash:

    program hanged (timeout 10 seconds)
    
    SIGABRT: abort
    PC=0x45766f m=0 sigcode=0
    
    goroutine 1 [running]:
    runtime.aeshashbody()
            /tmp/go-fuzz-build473666132/goroot/src/runtime/asm_amd64.s:917 +0x5f fp=0xc0041a78f8 sp=0xc0041a78f0 pc=0x45766f
    runtime.mapassign_faststr(0x764fc0, 0xc0041a7a20, 0xc0005d1fb0, 0x3, 0xc0118f8d68)
            /tmp/go-fuzz-build473666132/goroot/src/runtime/map_faststr.go:202 +0x62 fp=0xc0041a7960 sp=0xc0041a78f8 pc=0x4135d2
    github.com/open-policy-agent/opa/ast.(*parser).parse(0xc00049a180, 0xa53e00, 0x0, 0x0, 0x0, 0x0)
            /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser.go:4362 +0x272 fp=0xc0041a7b50 sp=0xc0041a7960 pc=0x6dc782
    github.com/open-policy-agent/opa/ast.Parse(0x0, 0x0, 0xc0001c95e8, 0x8, 0x8, 0xc000527cb8, 0x2, 0x2, 0xc0002f4460, 0x0, ...)
            /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser.go:3784 +0x98 fp=0xc0041a7ba8 sp=0xc0041a7b50 pc=0x6da558
    github.com/open-policy-agent/opa/ast.ParseStatements(0x0, 0x0, 0xc0001c95e0, 0x8, 0xc0001c95e0, 0x8, 0x200000003, 0xc000000300, 0xc000022000, 0xc000527df8, ...)
            /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser_ext.go:468 +0x173 fp=0xc0041a7d50 sp=0xc0041a7ba8 pc=0x6e62b3
    github.com/open-policy-agent/fuzz-opa.Fuzz(0x7f734c798000, 0x8, 0x200000, 0x3)
    

    The crash above occurs here: https://github.com/open-policy-agent/opa/blob/master/ast/parser.go#L4362

    I modified the code to print the size of the maxFailExpected slice and found that it grew to very large sizes in pathological cases. For example the input {{{{{{{{ takes 3.5s to parse (error) and the slice holds around 3,000,000 elements.

    Expected behaviour

    It's not clear whether much can be done about this. In the case of OPA, we don't display the expected values (because we found them too noisy to be helpful) so disabling the code that generates them is an option, however, I'm not sure that would resolve the problem because valid inputs with a similar structure also take a very long time to parse (e.g., {{{{{{{{}}}}}}}} takes ~1.5s before succeeding.)

    The PEG file is here: https://github.com/open-policy-agent/opa/blob/master/ast/rego.peg

    The vendored version is bb0192cfc2ae6ff30b9726618594b42ef2562da5.

    Any suggestions would be appreciated.

    opened by tsandall 1
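
For the left-recursion report above, the following is a rough sketch of how such a grammar could be flagged before any code is generated. It is not pigeon's implementation. The grammar type, the leftRecursive, reaches and issueGrammar names, and the hand-supplied set of nullable rules are all assumptions made for this illustration. The model is deliberately simplified: a grammar is a map from rule names to alternatives, and every terminal is assumed to consume input.

```go
package main

import (
	"fmt"
	"sort"
)

// grammar maps a rule name to its alternatives. Each alternative is a
// sequence of symbols; a symbol that is not a key of the map is a terminal.
type grammar map[string][][]string

// leftRecursive returns the rules that can reach themselves in leftmost
// position. nullable names the rules assumed to match the empty string.
func leftRecursive(g grammar, nullable map[string]bool) []string {
	// edges[r] lists the rules that may be tried before r consumes any input.
	edges := map[string][]string{}
	for name, alts := range g {
		for _, alt := range alts {
			for _, sym := range alt {
				if _, isRule := g[sym]; !isRule {
					break // simplification: a terminal always consumes input
				}
				edges[name] = append(edges[name], sym)
				if !nullable[sym] {
					break // later symbols are only reached after input is consumed
				}
			}
		}
	}
	var out []string
	for name := range g {
		if reaches(edges, name, name, map[string]bool{}) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// reaches reports whether target can be reached from from by following edges.
func reaches(edges map[string][]string, from, target string, seen map[string]bool) bool {
	for _, next := range edges[from] {
		if next == target {
			return true
		}
		if !seen[next] {
			seen[next] = true
			if reaches(edges, next, target, seen) {
				return true
			}
		}
	}
	return false
}

// issueGrammar is a hand-written model of the grammar from the report.
func issueGrammar() grammar {
	return grammar{
		"Input":   {{"Expr", "EOF"}},
		"Expr":    {{"_", "Expr", "_", "LogicOp", "_", "Expr", "_"}, {"_", "Value", "_"}},
		"LogicOp": {{"and"}, {"or"}},
		"Value":   {{"digits"}},
		"_":       {{"whitespace"}},
		"EOF":     {{"end"}},
	}
}

func main() {
	// The whitespace rule "_" matches zero or more characters, so it is nullable.
	fmt.Println(leftRecursive(issueGrammar(), map[string]bool{"_": true})) // prints [Expr]
}
```

The nullable set matters here: Expr starts with the whitespace rule, which can match nothing, so the parser re-enters Expr at the same input position and never makes progress.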
Releases (v1.1.0)
Owner: Martin Angers