A dead simple parser package for Go

V2

This is version 2 of Participle. See the Change Log for details.

Note: semantic versioning API guarantees do not apply to the experimental packages - the API may break between minor point releases.

It can be installed with:

$ go get github.com/alecthomas/participle/v2@latest

The latest version from v0 can be installed via:

$ go get github.com/alecthomas/participle@latest

Introduction

The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go programmer who has used the encoding/json package: struct field tags define what and how input is mapped to those same fields. This is not unusual for Go encoders, but is unusual for a parser.

Tutorial

A tutorial is available, walking through the creation of an .ini parser.

Tag syntax

Participle supports two forms of struct tag grammar syntax.

The easiest to read is when the grammar uses the entire struct tag content, eg.

Field string `@Ident @("," Ident)*`

However, this does not coexist well with other tags such as JSON, and may cause issues with linters. If this is a problem, you can use the parser:"" tag format, in which case single quotes can be used to quote literals, making the tags somewhat easier to write, eg.

Field string `parser:"@Ident (',' Ident)*" json:"field"`

Overview

A grammar is an annotated Go structure used both to define the parser grammar and to be the AST output by the parser. As an example, the following is the final INI parser from the tutorial.

type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Number *float64 `| @Float`
}

Note: Participle also supports named struct tags (eg. Hello string `parser:"@Ident"`).

A parser is constructed from a grammar and a lexer:

parser, err := participle.Build(&INI{})

Once constructed, the parser is applied to input to produce an AST:

ast := &INI{}
err := parser.ParseString("", "size = 10", ast)
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Number: &10}},
//   },
// }

Grammar syntax

Participle grammars are defined as tagged Go structures. Participle will first look for tags in the form parser:"...". It will then fall back to using the entire tag body.

The grammar format is:

  • @<expr> Capture expression into the field.
  • @@ Recursively capture using the field's own type.
  • <identifier> Match named lexer token.
  • ( ... ) Group.
  • "..." or '...' Match the literal (note that the lexer must emit tokens matching this literal exactly).
  • "...":<identifier> Match the literal, specifying the exact lexer token type to match.
  • <expr> <expr> ... Match expressions.
  • <expr> | <expr> | ... Match one of the alternatives. Each alternative is tried in order, with backtracking.
  • !<expr> Match any token that is not the start of the expression (eg: @!";" matches anything but the ; character into the field).

The following modifiers can be used after any expression:

  • * Expression can match zero or more times.
  • + Expression must match one or more times.
  • ? Expression can match zero or once.
  • ! Require a non-empty match (this is useful with a sequence of optional matches eg. ("a"? "b"? "c"?)!).
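
As an illustration only, here is a hypothetical grammar (not from this README's tutorial) combining captures, literals, groups, alternatives and the modifiers above:

type Declaration struct {
	// "var" followed by one or more comma-separated names, an optional
	// type annotation, and an optional initialiser.
	Names []string `"var" @Ident ("," @Ident)*`
	Type  string   `(":" @Ident)?`
	Value *Literal `("=" @@)?`
}

type Literal struct {
	Str *string  `  @String`
	Num *float64 `| @Float`
}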

Notes:

  • Each struct is a single production, with each field applied in sequence.
  • @ is the mechanism for capturing matches into the field.
  • If a struct field is not keyed with "parser", the entire struct tag will be used as the grammar fragment. This allows the grammar syntax to remain clear and simple to maintain.

Capturing

Prefixing any expression in the grammar with @ will capture matching values for that expression into the corresponding field.

For example:

// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}

For slice and string fields, each instance of @ will accumulate into the field (including repeated patterns). Accumulation into other types is not supported.
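
For example, in this minimal sketch each @Ident accumulates into the slice:

type Path struct {
	Parts []string `@Ident ("/" @Ident)*`
}

// Parsing "a/b/c" results in Parts == []string{"a", "b", "c"}.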

For integer and floating point types, a successful capture will be parsed with strconv.ParseInt() and strconv.ParseFloat() respectively.

A successful capture match into a bool field will set the field to true.

Tokens can also be captured directly into fields of type lexer.Token and []lexer.Token.

Custom control of how values are captured into fields can be achieved by a field type implementing the Capture interface (Capture(values []string) error).

Additionally, any field implementing the encoding.TextUnmarshaler interface will be capturable too. One caveat is that UnmarshalText() will be called once for each captured token, so eg. @(Ident Ident Ident) will be called three times.
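
For example, a hedged sketch of capturing via encoding.TextUnmarshaler (the Upper type here is hypothetical):

import "strings"

type Upper string

// UnmarshalText is called once for each captured token.
func (u *Upper) UnmarshalText(text []byte) error {
	*u = Upper(strings.ToUpper(string(text)))
	return nil
}

type Greeting struct {
	Name Upper `"hello" @Ident`
}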

Capturing boolean value

By default a boolean field is used to indicate that a match occurred, which turns out to be much more useful and common in Participle than parsing true or false literals. For example, parsing a variable declaration with a trailing optional syntax:

type Var struct {
  Name     string `"var" @Ident`
  Type     string `":" @Ident`
  Optional bool   `@"?"?`
}

In practice this gives more useful ASTs. If bool were parsed literally then you'd need some alternate type for Optional, such as string or a custom type.

To capture literal boolean values such as true or false, implement the Capture interface like so:

type Boolean bool

func (b *Boolean) Capture(values []string) error {
	*b = values[0] == "true"
	return nil
}

type Value struct {
	Float  *float64 `  @Float`
	Int    *int     `| @Int`
	String *string  `| @String`
	Bool   *Boolean `| @("true" | "false")`
}

Streaming

Participle supports streaming parsing. Simply pass a channel of your grammar into Parse*(). The grammar will be repeatedly parsed and sent to the channel. Note that the Parse*() call will not return until parsing completes, so it should generally be started in a goroutine.

type token struct {
  Str string `  @Ident`
  Num int    `| @Int`
}

parser, err := participle.Build(&token{})

tokens := make(chan *token, 128)
err := parser.ParseString("", `hello 10 11 12 world`, tokens)
for token := range tokens {
  fmt.Printf("%#v\n", token)
}

Lexing

Participle relies on distinct lexing and parsing phases. The lexer takes raw bytes and produces tokens which the parser consumes. The parser transforms these tokens into Go values.

The default lexer, if one is not explicitly configured, is based on the Go text/scanner package and thus produces tokens for C/Go-like source code. This is surprisingly useful, but if you do require more control over lexing the builtin participle/lexer/stateful lexer should cover most other cases. If that in turn is not flexible enough, you can implement your own lexer.

Configure your parser with a lexer using the participle.Lexer() option.

To use your own Lexer you will need to implement two interfaces: Definition (and optionally StringsDefinition and BytesDefinition) and Lexer.
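
As a rough sketch (assuming the v2 lexer.Definition and lexer.Lexer interfaces look as below; check the lexer package for the authoritative definitions), a minimal lexer that emits one Word token per whitespace-separated word might be:

import (
	"io"
	"io/ioutil"
	"strings"

	"github.com/alecthomas/participle/v2/lexer"
)

const wordToken rune = -2 // lexer.EOF is conventionally -1

// wordDefinition implements lexer.Definition.
type wordDefinition struct{}

// Symbols maps token names (usable in grammars) to their rune types.
func (wordDefinition) Symbols() map[string]rune {
	return map[string]rune{"EOF": lexer.EOF, "Word": wordToken}
}

// Lex consumes the input and returns a Lexer over its tokens.
func (wordDefinition) Lex(filename string, r io.Reader) (lexer.Lexer, error) {
	data, err := ioutil.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return &wordLexer{words: strings.Fields(string(data))}, nil
}

// wordLexer implements lexer.Lexer.
type wordLexer struct{ words []string }

// Next returns the next token, or an EOF token when input is exhausted.
func (l *wordLexer) Next() (lexer.Token, error) {
	if len(l.words) == 0 {
		return lexer.Token{Type: lexer.EOF}, nil
	}
	tok := lexer.Token{Type: wordToken, Value: l.words[0]}
	l.words = l.words[1:]
	return tok, nil
}

Such a definition would then be passed to the parser via participle.Lexer(wordDefinition{}).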

Stateful lexer

Participle's included stateful/modal lexer provides powerful yet convenient construction of most lexers (notably, indentation based lexers cannot be expressed).

It is sometimes the case that a simple lexer cannot fully express the tokens required by a parser. The canonical example of this is interpolated strings within a larger language. eg.

let a = "hello ${name + ", ${last + "!"}"}"

This is impossible to tokenise with a normal lexer due to the arbitrarily deep nesting of expressions.

To support this case Participle's lexer is now stateful by default.

The lexer is a state machine defined by a map of rules keyed by the state name. Each rule within the state includes the name of the produced token, the regex to match, and an optional operation to apply when the rule matches.

As a convenience, any Rule starting with a lowercase letter will be elided from output.

Lexing starts in the Root group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed. The name of each non-root rule is prefixed with the name of its group to yield the token identifier used during matching.

A state change can be introduced with the Action Push(state). Pop() will return to the previous state.

To reuse rules from another state, use Include(state).

A special named rule Return() can also be used as the final rule in a state to always return to the previous state.

As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs. See the tests for an example of this, among others.
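
A hedged sketch of what heredoc rules might look like, assuming \1 refers to the group captured by the parent rule (the tests contain the canonical version):

var heredocLexer = stateful.Must(Rules{
	"Root": {
		{"Heredoc", `<<(\w+)`, Push("Heredoc")},
	},
	"Heredoc": {
		{"End", `\b\1\b`, Pop()}, // matches the marker captured by the parent rule
		{"EOL", `\n`, nil},
		{"Body", `[^\n]+`, nil},
	},
})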

Example stateful lexer

Here's a cut down example of the string interpolation described above. Refer to the stateful example for the corresponding parser.

var lexer = stateful.Must(Rules{
	"Root": {
		{`String`, `"`, Push("String")},
	},
	"String": {
		{"Escaped", `\\.`, nil},
		{"StringEnd", `"`, Pop()},
		{"Expr", `\${`, Push("Expr")},
		{"Char", `[^$"\\]+`, nil},
	},
	"Expr": {
		Include("Root"),
		{`whitespace`, `\s+`, nil},
		{`Oper`, `[-+/*%]`, nil},
		{"Ident", `\w+`, nil},
		{"ExprEnd", `}`, Pop()},
	},
})

Example simple/non-stateful lexer

The stateful lexer is now the only custom lexer bundled with Participle, but most parsers won't need that level of flexibility. To support this common case, which replaces the old Regex and EBNF lexers, you can use stateful.MustSimple() and stateful.NewSimple().

eg. The lexer for a form of BASIC:

var basicLexer = stateful.MustSimple([]stateful.Rule{
    {"Comment", `(?i)rem[^\n]*`, nil},
    {"String", `"(\\"|[^"])*"`, nil},
    {"Number", `[-+]?(\d*\.)?\d+`, nil},
    {"Ident", `[a-zA-Z_]\w*`, nil},
    {"Punct", `[-[[email protected]#$%^&*()+_={}\|:;"'<,>.?/]|]`, nil},
    {"EOL", `[\n\r]+`, nil},
    {"whitespace", `[ \t]+`, nil},
})

Experimental - code generation

Participle v2 now has experimental support for generating code to perform lexing. Use participle/experimental/codegen.GenerateLexer() to compile a stateful lexer to Go code.

This will generally provide around a 10x improvement in lexing performance while producing O(1) garbage.
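
A sketch of how this might be invoked; the import path and the GenerateLexer signature below are assumptions based on the package name above, so check the experimental package for the real API:

package main

import (
	"os"

	"github.com/alecthomas/participle/v2/experimental/codegen"
	"github.com/alecthomas/participle/v2/lexer/stateful"
)

func main() {
	def := stateful.MustSimple([]stateful.Rule{
		{"Ident", `[a-zA-Z_]\w*`, nil},
		{"whitespace", `\s+`, nil},
	})
	// Assumed signature: GenerateLexer(w io.Writer, pkg string, def *stateful.Definition) error
	if err := codegen.GenerateLexer(os.Stdout, "mylexer", def); err != nil {
		panic(err)
	}
}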

Options

The Parser's behaviour can be configured via Options.
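
All of the options used in this README are of this form, eg. (iniLexer here stands in for a lexer defined elsewhere):

parser := participle.MustBuild(&INI{},
	participle.Lexer(iniLexer),    // use a custom lexer
	participle.Unquote("String"),  // unquote captured String tokens
	participle.UseLookahead(2),    // increase the fixed lookahead
)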

Examples

There are several examples included:

Example   Description
BASIC     A lexer, parser and interpreter for a rudimentary dialect of BASIC.
EBNF      Parser for the form of EBNF used by Go.
Expr      A basic mathematical expression parser and evaluator.
GraphQL   Lexer+parser for GraphQL schemas.
HCL       A parser for the HashiCorp Configuration Language.
INI       An INI file parser.
Protobuf  A full Protobuf version 2 and 3 parser.
SQL       A very rudimentary SQL SELECT parser.
Stateful  A basic example of a stateful lexer and corresponding parser.
Thrift    A full Thrift parser.
TOML      A TOML parser.

Included below is a full GraphQL lexer and parser:

package main

import (
	"fmt"
	"os"

	"github.com/alecthomas/kong"
	"github.com/alecthomas/repr"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
	"github.com/alecthomas/participle/v2/lexer/stateful"
)

type File struct {
	Entries []*Entry `@@*`
}

type Entry struct {
	Type   *Type   `  @@`
	Schema *Schema `| @@`
	Enum   *Enum   `| @@`
	Scalar string  `| "scalar" @Ident`
}

type Enum struct {
	Name  string   `"enum" @Ident`
	Cases []string `"{" @Ident* "}"`
}

type Schema struct {
	Fields []*Field `"schema" "{" @@* "}"`
}

type Type struct {
	Name       string   `"type" @Ident`
	Implements string   `( "implements" @Ident )?`
	Fields     []*Field `"{" @@* "}"`
}

type Field struct {
	Name       string      `@Ident`
	Arguments  []*Argument `( "(" ( @@ ( "," @@ )* )? ")" )?`
	Type       *TypeRef    `":" @@`
	Annotation string      `( "@" @Ident )?`
}

type Argument struct {
	Name    string   `@Ident`
	Type    *TypeRef `":" @@`
	Default *Value   `( "=" @@ )?`
}

type TypeRef struct {
	Array       *TypeRef `(   "[" @@ "]"`
	Type        string   `  | @Ident )`
	NonNullable bool     `( @"!" )?`
}

type Value struct {
	Symbol string `@Ident`
}

var (
	graphQLLexer = stateful.MustSimple([]stateful.Rule{
		{"Comment", `(?:#|//)[^\n]*\n?`, nil},
		{"Ident", `[a-zA-Z]\w*`, nil},
		{"Number", `(?:\d*\.)?\d+`, nil},
		{"Punct", `[-[[email protected]#$%^&*()+_={}\|:;"'<,>.?/]|]`, nil},
		{"Whitespace", `[ \t\n\r]+`, nil},
	})
	parser = participle.MustBuild(&File{},
		participle.Lexer(graphQLLexer),
		participle.Elide("Comment", "Whitespace"),
		participle.UseLookahead(2),
	)
)

var cli struct {
	EBNF  bool     `help:"Dump EBNF."`
	Files []string `arg:"" optional:"" type:"existingfile" help:"GraphQL schema files to parse."`
}

func main() {
	ctx := kong.Parse(&cli)
	if cli.EBNF {
		fmt.Println(parser.String())
		ctx.Exit(0)
	}
	for _, file := range cli.Files {
		ast := &File{}
		r, err := os.Open(file)
		ctx.FatalIfErrorf(err)
		err = parser.Parse(file, r, ast)
		r.Close()
		repr.Println(ast)
		ctx.FatalIfErrorf(err)
	}
}

Performance

One of the included examples is a complete Thrift parser (shell-style comments are not supported). This gives a convenient baseline for comparing to the PEG based pigeon, which is the parser used by go-thrift. Additionally, the pigeon parser is utilising a generated parser, while the participle parser is built at run time.

You can run the benchmarks yourself, but here's the output on my machine:

BenchmarkParticipleThrift-12    	   5941	   201242 ns/op	 178088 B/op	   2390 allocs/op
BenchmarkGoThriftParser-12      	   3196	   379226 ns/op	 157560 B/op	   2644 allocs/op

On a real life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.

Concurrency

A compiled Parser instance can be used concurrently. A LexerDefinition can be used concurrently. A Lexer instance cannot be used concurrently.
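
For example, a single compiled parser shared by several goroutines, each parsing into its own AST value (a minimal sketch):

var parser = participle.MustBuild(&INI{})

func worker(inputs <-chan string, results chan<- *INI) {
	for input := range inputs {
		ast := &INI{}
		if err := parser.ParseString("", input, ast); err == nil {
			results <- ast
		}
	}
}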

Error reporting

There are a few areas where Participle can provide useful feedback to users of your parser.

  1. Errors returned by Parser.Parse*() will be of type Error. This will contain positional information where available.
  2. Participle will make a best effort to return as much of the AST up to the error location as possible.
  3. Any node in the AST containing a field Pos lexer.Position will be automatically populated from the nearest matching token.
  4. Any node in the AST containing a field EndPos lexer.Position will be automatically populated from the token at the end of the node.
  5. Any node in the AST containing a field Tokens []lexer.Token will be automatically populated with all tokens captured by the node, including elided tokens.

These related pieces of information can be combined to provide fairly comprehensive error reporting.
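
A hedged sketch combining these pieces, assuming participle.Error exposes Position() and Message():

import (
	"errors"
	"fmt"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

type Node struct {
	Pos    lexer.Position // populated from the nearest matching token
	EndPos lexer.Position // populated from the token at the end of the node

	Name string `@Ident`
}

func report(input string) {
	ast := &Node{}
	if err := parser.ParseString("", input, ast); err != nil {
		var perr participle.Error
		if errors.As(err, &perr) {
			fmt.Printf("%s: %s\n", perr.Position(), perr.Message())
		}
		// A partial AST up to the error location is still available in ast.
	}
}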

Limitations

Internally, Participle is a recursive descent parser with backtracking (see UseLookahead(K)).

Among other things, this means that it does not support left recursion. Left recursion must be eliminated by restructuring your grammar.
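
For example, a left-recursive rule such as Expr = Expr "+" Term | Term can be restructured as a repetition:

// Left-recursive (unsupported): Expr = Expr "+" Term | Term
// Restructured equivalent:      Expr = Term ("+" Term)*
type Expr struct {
	Left  *Term   `@@`
	Right []*Term `("+" @@)*`
}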

EBNF

Participle supports outputting an EBNF grammar from a Participle parser. Once the parser is constructed simply call String().

Participle also includes a parser for this form of EBNF (naturally).

eg. The GraphQL example produces the following EBNF:

File = Entry* .
Entry = Type | Schema | Enum | "scalar" ident .
Type = "type" ident ("implements" ident)? "{" Field* "}" .
Field = ident ("(" (Argument ("," Argument)*)? ")")? ":" TypeRef ("@" ident)? .
Argument = ident ":" TypeRef ("=" Value)? .
TypeRef = "[" TypeRef "]" | ident "!"? .
Value = ident .
Schema = "schema" "{" Field* "}" .
Enum = "enum" ident "{" ident* "}" .

Syntax/Railroad Diagrams

Participle includes a command-line utility to take an EBNF representation of a Participle grammar (as returned by Parser.String()) and produce a Railroad Diagram using tabatkins/railroad-diagrams.

Here's what the GraphQL grammar looks like:

EBNF Railroad Diagram

Issues
  • Calculate follow-set automatically from multiple production alternatives

    First of all, thanks a lot for sharing participle with the world! I felt very happy to discover what feels like a very novel approach to parser generation, parser libraries, etc.

    I took a stab at trying to rewrite a grammar for LLVM IR to use participle, but reached the following stumbling block and thought I'd reach out and ask if it is intended by design (to keep the parser simple), or if I've done something wrong, or otherwise, if we could seek to resolve it so that the follow set of a token is calculated from each of the present production alternatives. In this case, the follow set of "target" should be {"datalayout", "triple"}.

    I wish to write the grammar as example 2, but have only gotten example 1 to work so far. Any ideas?

    Cheers :) /u

    Input

    LLVM IR input source:

    source_filename = "foo.c"
    target datalayout = "bar"
    target triple = "baz"
    

    Example 1

    Grammar:

    type Module struct {
    	Decls []*Decl `{ @@ }`
    }
    
    type Decl struct {
    	SourceFilename string      `  "source_filename" "=" @String`
    	TargetSpec     *TargetSpec `| "target" @@`
    }
    
    type TargetSpec struct {
    	DataLayout   string `  "datalayout" "=" @String`
    	TargetTriple string `| "triple" "=" @String`
    }
    

    Example run:

    $ low a.ll
    &main.Module{
        Decls: {
            &main.Decl{
                SourceFilename: "foo.c",
                TargetSpec:     (*main.TargetSpec)(nil),
            },
            &main.Decl{
                SourceFilename: "",
                TargetSpec:     &main.TargetSpec{DataLayout:"bar", TargetTriple:""},
            },
            &main.Decl{
                SourceFilename: "",
                TargetSpec:     &main.TargetSpec{DataLayout:"", TargetTriple:"baz"},
            },
        },
    }
    

    Example 2

    Grammar:

    type Module struct {
    	Decls []*Decl `{ @@ }`
    }
    
    type Decl struct {
    	SourceFilename string `  "source_filename" "=" @String`
    	DataLayout     string `| "target" "datalayout" "=" @String`
    	TargetTriple   string `| "target" "triple" "=" @String`
    }
    

    Example run:

    $ low a.ll
    2017/08/27 21:15:38 a.ll:3:7: expected ( "datalayout" ) not "triple"
    
    opened by mewmew 31
  • Proposal: interface type productions

    In the past, when I've written recursive descent parsers by hand, I have used interfaces to represent "classes" of productions, such as Stmt, Expr, etc - this simplifies the consuming code considerably.

    I would like to be able to do something similar using participle - for example, I would like to define an Expr interface, and then parse that manually while allowing the rest of the grammar to be managed by participle.

    Perhaps it could look something like this:

    // Expr is the interface implemented by expression productions
    type Expr interface { expr() }
    
    // These types all implement Expr, and are parsed manually
    type Atom struct { /* ... */ }
    type Unary struct { /* ... */ }
    type Binary struct { /* ... */ }
    
    // This is the parser definition, with a new `UseInterface` option to define a custom parsing func for the Expr interface
    var theParser = participle.MustBuild(&Grammar{}, participle.UseInterface(
        func (lex *lexer.PeekingLexer) (Expr, error) {
            /* Provide an implementation of the expression parser here */
        },
    ))
    
    // And then I can use `Expr` like this:
    type Stmt struct {
        Expr   Expr    `@@` // <- participle has registered `Expr` as an interface that can be parsed
        Assign *Assign `@@`
    }
    

    As an additional nice-to-have, it would be cool if there was a way to give a *lexer.PeekingLexer and a production to the parser, and have it fill in the data for me. Then I could compose participle-managed productions within the manual parsing code for Expr. This would allow me to break out of participle in a limited way, just for the things that I need (such as managing precedence climbing)

    opened by mccolljr 21
  • How to write all optional but at least one necessary

    I'm implementing the CSS Selector grammar (https://www.w3.org/TR/selectors-4/#grammar) with participle, but I'm stuck on how to implement compound-selector.

    <compound-selector> = [ <type-selector>? <subclass-selector>*
                            [ <pseudo-element-selector> <pseudo-class-selector>* ]* ]!
    

    compound-selector requires at least one of its optional values to be present, via !. If this is not implemented, the parser will go into an infinite loop. Is it possible to implement this with participle?

    opened by tamayika 20
  • Support Matching EOF

    The parser actually supports matching EOF but then panics: panic: m:2:41: branch <eof> was accepted but did not progress the lexer at m:2:41 ("") [recovered]

    I think it would be beneficial to special-case this check to allow matching the end of a file.

    opened by tooolbox 14
  • Starting work on negation, wip

    Here is a naive interpretation of how the negation could function.

    If you want to do it entirely yourself, go ahead, but I'm willing to see this through as I really would like to see it come to fruition and this helps me hone my go funk.

    I'm looking for some feedback on that preliminary implementation.

    Am I missing something glaring? Is the clone stuff potentially a performance killer? Should Parse() be a little more involved?

    opened by ceymard 13
  • Support for sub-lexers

    To support more complex languages, it should be possible to elegantly define stateful lexers.

    Ideally this would support:

    • "here docs", eg. cat << EOF\nEOF (where EOF is a user-defined marker).
    • Runtime selected sub-lexers, ala markdown's ```<language> blocks - this would defer to some external function in order to set the lexer based on <language>.
    • Recursive lexers for eg. string interpolation, "${"${var}"}" - in this situation a new lexer is pushed onto the state when ${ is encountered, and popped when } is encountered.

    My hunch is that some kind of "stateful EBNF" could work, but it would need to be extensible programmatically, and it's not clear exactly how this would be expressed.

    proposal 
    opened by alecthomas 13
  • Optional comments capturing while normally eliding them

    For https://github.com/tawasprache/kompilierer, I'd like to be able to obtain the text of a comment before an AST node, for documentation generation purposes, while otherwise ignoring them in the parser.

    I envision the API looking similar to the Pos/EndPos API:

    type Item struct {
        Name `"function" @Ident`
        
        PrecedingComment lexer.Comment
    }
    

    can parse

    // hello!
    function hi
    

    with the comment and

    function hi
    

    without one

    opened by pontaoski 12
  • antlr2participle

    This adds the following:

    • A parser for .g4 ANTLR files.
    • A generator that uses an ANTLR AST to create a Participle lexer & parser.

    This is an initial draft. Notes:

    • Documentation is on the way.
    • Lexer modes are not yet implemented.
    • Recursive lexing is not yet implemented.
    • The skip lexer command is supported. The channel lexer command acts like skip. No other lexer commands are supported yet.
    • Actions and predicates are not supported.
    • Rule element labels are partially supported.
    • Alternative labels are parsed but not supported in the generator.
    • Rule arguments are not supported.

    Feedback is appreciated.

    opened by tooolbox 11
  • Matching Tokens

    In line with the « Anything But » that I suggested in #104, I am looking for a way to get the tokens around a match. The idea would be that we could match a token without necessarily consuming it.

    type Example struct {
      StartToken *lexer.Token `@?`
      SomeRuleThatMatches *[]string ` (@!";")+ ";" `
      EndToken *lexer.Token `@?`
    }
    

    Note that the @? is not a proposal, as I wonder what would make sense here.

    Now as to why I would need it: usually, lexers ignore whitespace. This of course makes for a simpler and cleaner grammar, as there is no need to mention optional whitespace everywhere.

    However, I sometimes need to access the "exact string" that was matched by a rule, discarded text included. This is because I'm writing parsers for incomplete languages where I do not care about the meaning of some constructs - I just want them "as is".

    Is there already a way to do such things? I tried to match @Token into a *lexer.Token, but that didn't work. Also, I think it would be more useful for token extraction if they didn't advance the parser.

    opened by ceymard 11
  • Use tag on structure

    I'm trying to use encoding/json with its tags to change the attribute names of my structure. But the issue is that the json token is not recognized. How can I escape the json tag?

    opened by kevinhassan 11
  • Bad grammar?

    package main
    
    import (
    	"fmt"
    
    	"github.com/alecthomas/participle/v2"
    	"github.com/alecthomas/participle/v2/lexer/stateful"
    )
    
    var configLexer = stateful.MustSimple([]stateful.Rule{
    	{Name: `whitespace`, Pattern: `\s+`, Action: nil},
    	{Name: `Number`, Pattern: `[\d]+`, Action: nil},
    	{Name: `Ident`, Pattern: `[\w.-]+`, Action: nil},
    	{Name: `String`, Pattern: `"(?:\\.|[^"])*"`, Action: nil},
    	{Name: `Path`, Pattern: `(^[\w\\.:\s]+?\.\w{2,4}$|^\/[a-zA-Z0-9_\/-]*[^\/]$)`, Action: nil},
    })
    
    type Config struct {
    	Servers []*Server `@@*`
    }
    
    type Server struct {
    	VirtualHost string      `"server" @Ident "{"`
    	Properties  *Properties `@@* "}"`
    }
    
    type Properties struct {
    	Listen int `"listen" ":" @Ident`
    }
    
    var parser = participle.MustBuild(&Config{},
    	participle.Lexer(configLexer),
    	participle.Unquote("String"),
    )
    
    var code = `
    server dzonerzy.net {
    	listen: 1234
    }
    `
    
    func main() {
    	cfg := &Config{}
    	err := parser.ParseString("", code, cfg)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(cfg.Servers)
    }
    

    Can you kindly explain what I'm doing wrong?

    opened by dzonerzy 9
  • possible bug in tag syntax

    Hello,

    it seems there is a difference between the "raw" and parser:"" tag syntax. The following repro shows what I stumbled upon.

    package tagsyntax
    
    import (
    	"testing"
    
    	"github.com/alecthomas/assert/v2"
    	"github.com/alecthomas/participle/v2"
    )
    
    type GoodAST struct {
    	Key        string   `parser:"@Ident '='"`
    	BlankLines []string `"\n"`
    }
    
    type BadAST struct {
    	Key string `parser:"@Ident '='"`
    	// Same as field in GoodAST, as explained in https://github.com/alecthomas/participle#tag-syntax
    	BlankLines []string `parser:"'\n'"`
    }
    
    func TestLiteralNotTerminatedGood(t *testing.T) {
    	_, err := participle.Build(&GoodAST{})
    
    	assert.NoError(t, err)
    }
    
    func TestLiteralNotTerminatedBad(t *testing.T) {
    	_, err := participle.Build(&BadAST{})
    
    	// The error is:
    	//
    	//     Key: <input>:1:2: literal not terminated
    	//
    	// which is confusing because it refers to the previous field in the struct (Key)
    	// and unclear?
    	assert.NoError(t, err)
    }
    
    opened by marco-m 1
  • Struct capture wrongly applying previous captures in a failed branch

    Hey, while working on parser code generation and implementing lookahead error recovery, I noticed a bug. Consider this example:

    type BugStructCapturesInBadBranch struct {
    	Bad bool                `parser:"(@'!'"`
    	A   BugFirstAlternative `parser:" @@) |"`
    	B   int                 `parser:"('!' '#' @Int)"`
    }
    
    type BugFirstAlternative struct {
    	Value string `parser:"'#' @Ident"`
    }
    
    func TestBug_GroupCapturesInBadBranch(t *testing.T) {
    	var out BugStructCapturesInBadBranch
    	require.NoError(t, MustBuild(&BugStructCapturesInBadBranch{}, UseLookahead(2)).ParseString("", "!#4", &out))
    	assert.Equal(t, BugStructCapturesInBadBranch{B: 4}, out)
    }
    

    I tried to make it as minimalistic as reasonable; it's quite an obscure bug that's unlikely to bother anyone, but I thought I'd report it anyway. strct.Parse will call ctx.Apply even if s.expr.Parse returned an error. The purpose of that is apparently to provide a partial AST in case the entire parse fails, but it has an unwanted side-effect: any captures added to parseContext.apply by the branch so far will be applied, even though the error may later be caught by a disjunction or a ?/*/+ group and recovered. I think this can only happen if lookahead is at least 2, as it requires one token for the unwanted capture and a second token for the strct to return an error instead of nil out.

    In the example above, the input is constructed to match the second disjunction alternative, but the first tokens will initially lead it into the first alternative and into the BugFirstAlternative struct. When attempting to match Ident for the Value field, the sequence will fail and return an error, but ctx.apply will already contain a capture for BugStructCapturesInBadBranch.Bad, which will be applied in strct.Parse, even though the disjunction recovers it and matches the second alternative.

    I don't think it's super important this is fixed, but my code generation parser's behavior will differ from this, because I'm trying to take a different approach to recovering failed branches - restoring a backup of the parsed struct when a branch fails instead of delaying applying captures.

    opened by petee-d 0
  • Generating parser code

    Hi there again @alecthomas,

    I'm already using this great parser in a large project (a GoLang implementation of Jinja, will be open sourced eventually) and one of the things that somewhat bothers me is the speed and GC pressure (allocations) of the parser. I've considered using the generated lexer to improve it, but it's just not enough. So what if this library could also generate code for the parser? Is this something you've already considered or even started playing with?

    I actually already did and have a very ugly prototype that can generate the parser code for this subset of features:

    • string and struct fields only
    • supported nodes (at least mostly) are strct, sequence, token reference, token literal, capture (@) and group (?, *, +, not ! yet)
    • totally trash error reporting
    • max lookahead, case insensitive tokens and probably other options are not respected yet

    Example grammar it can parse consistently with the native parser (except for error messages):

    type Stuff struct {
    	Pos    lexer.Position
    	A      string   `'test' @Ident @Ident`
    	Sub    SubStuff `@@+`
    	B      string   `@Ident`
    	EndPos lexer.Position
    }
    
    type SubStuff struct {
    	Object    string `'(' @Ident 'is'`
    	Adjective string `    @Ident ')'`
    }
    
    // Uses lexer.TextScannerLexer
    

    For this grammar and with a pre-built PeekingLexer (so that I compare only parsing speed), here are the benchmarks at this moment:

    Input string, pre-lexed: "test str ing (this is fast) (this is fast) (this is fast) (this is fast) (this is LAST) end"
    BenchmarkNative
    BenchmarkNative-8      	  282736	     18882 ns/op	   11536 B/op	     207 allocs/op
    BenchmarkGenerated
    BenchmarkGenerated-8   	10671034	       576.2 ns/op	       8 B/op	       1 allocs/op
    

    The reason for being >30x faster for this particular grammar (likely a very cherry-picked example) is that the generated code:

    • avoids allocations as much as humanly possible, in fact the only allocation it does in that example is when concatenating strings for Stuff.A
    • doesn't use Reflect, obviously - the huge benefit of generated code
    • avoids unnecessarily small functions - it generates 1 function per struct + 1 wrapper, uses goto (responsibly) to solve things that would otherwise need calling a function or duplicating code
    • also uses some optimizations that could be applied to the native parser (for example: allocation-free PeekingLexer.Clone alternative, avoiding string concatenation when capturing values, etc.)

    It's very possible I'll run into an issue I won't be able to overcome, but for now it seems like this should be very doable. The generated code isn't too long (160 LOC for the above grammar), is quite well isolated (adds one generic method to all nodes, plus a file for code generation utilities) and doesn't introduce any new dependencies. For now I would just like to let you know I'm working on this, so we can coordinate any similar efforts. :) I would also appreciate your support with a few questions (will post them in this issue later).

    What do you think? Cheers!

    opened by petee-d 9
  • Feature request: fuzzing support

    Since Participle already has access to the grammar, it can do grammar-driven fuzzing, which could integrate nicely with Go 1.18's new fuzz tests.

    Usagewise, it would prolly be something like this:

    var ast AST
    parser.Fuzz(&ast) // ast is filled with a random valid tree
    
    opened by pontaoski 0
  • bug: lexer codegen should validate token identifiers are valid

    Hi,

    I've been playing with Participle and I wanted to try the codegen feature, however the generated code seems to not escape some characters, leading to uncompilable source.

    See the following example:

    // ...
    } else if match := match:(l.s, l.p); match[1] != 0 {
    	sym = -47
    	groups = match[:]
    } else if match := match;(l.s, l.p); match[1] != 0 {
    	sym = -48
    	groups = match[:]
    } else if match := match|(l.s, l.p); match[1] != 0 {
    	sym = -49
    	groups = match[:]
    } else if match := match,(l.s, l.p); match[1] != 0 {
    	sym = -50
    	groups = match[:]
    } else if match := [email protected](l.s, l.p); match[1] != 0 {
    	sym = -51
    	groups = match[:]
    }
    // ...
    

    My rules used:

    {
      // ...
      {":", `:`, nil},
      {";", `;`, nil},
      {"|", `\|`, nil},
      {",", `,`, nil},
      {"@", `@`, nil},
      // ...
    }
    

    As far as I can tell it comes from https://github.com/alecthomas/participle/blob/master/lexer/codegen.go#L251

    While I do agree that I could rename my Rule name to make the whole thing work, I think it might be a good idea to come up with some strategy in order to avoid this when generating code.

    I'm fine with writing a PR if you have an idea for this.

    Cheers !

    opened by maxcleme 3