A dead simple parser package for Go


V2

This is an alpha of version 2 of Participle. It is still subject to change but should be mostly stable at this point.

See the Change Log for details.

Note: semantic versioning API guarantees do not apply to the experimental packages - the API may break between minor point releases.

It can be installed with:

$ go get github.com/alecthomas/participle/[email protected]

The latest version from v0 can be installed via:

$ go get github.com/alecthomas/[email protected]

Introduction

The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go programmer who has used the encoding/json package: struct field tags define what and how input is mapped to those same fields. This is not unusual for Go encoders, but is unusual for a parser.

Tutorial

A tutorial is available, walking through the creation of an .ini parser.

Tag syntax

Participle supports two forms of struct tag grammar syntax.

The easiest to read is when the grammar uses the entire struct tag content, eg.

Field string `@Ident @("," Ident)*`

However, this form does not coexist well with other tags such as json:"..." and may cause issues with linters. If this is a problem you can use the parser:"" tag format. In this form single quotes can be used to quote literals, making the tags somewhat easier to write, eg.

Field string `parser:"@Ident (',' Ident)*" json:"field"`

Overview

A grammar is an annotated Go structure used to both define the parser grammar, and be the AST output by the parser. As an example, following is the final INI parser from the tutorial.

type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Number *float64 `| @Float`
}

Note: Participle also supports named struct tags (eg. Hello string `parser:"@Ident"`).

A parser is constructed from a grammar and a lexer:

parser, err := participle.Build(&INI{})

Once constructed, the parser is applied to input to produce an AST:

ast := &INI{}
err := parser.ParseString("", "size = 10", ast)
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Number: &10}},
//   },
// }

Grammar syntax

Participle grammars are defined as tagged Go structures. Participle will first look for tags in the form parser:"...". It will then fall back to using the entire tag body.

The grammar format is:

  • @<expr> Capture expression into the field.
  • @@ Recursively capture using the field's own type.
  • <identifier> Match named lexer token.
  • ( ... ) Group.
  • "..." or '...' Match the literal (note that the lexer must emit tokens matching this literal exactly).
  • "...":<identifier> Match the literal, specifying the exact lexer token type to match.
  • <expr> <expr> ... Match expressions.
  • <expr> | <expr> | ... Match one of the alternatives. Each alternative is tried in order, with backtracking.
  • ~<expr> Match any token that is not the start of the expression (eg: @~";" matches anything but the ; character into the field).
  • (?= ... ) Positive lookahead group - requires the contents to match further input, without consuming it.
  • (?! ... ) Negative lookahead group - requires the contents not to match further input, without consuming it.

The following modifiers can be used after any expression:

  • * Expression can match zero or more times.
  • + Expression must match one or more times.
  • ? Expression can match zero or once.
  • ! Require a non-empty match (this is useful with a sequence of optional matches eg. ("a"? "b"? "c"?)!).

Notes:

  • Each struct is a single production, with each field applied in sequence.
  • @<expr> is the mechanism for capturing matches into the field.
  • If a struct field is not keyed with "parser", the entire struct tag will be used as the grammar fragment. This allows the grammar syntax to remain clear and simple to maintain.
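
To illustrate, here is a small hypothetical grammar fragment (the type, field, and token names are invented for this example) combining several of the constructs above:

type Import struct {
	Alias string   `"import" @Ident?`                // optional alias, captured if present
	Path  string   `@String`                         // the import path literal
	Tags  []string `("[" @Ident ("," @Ident)* "]")?` // zero or more comma-separated tags
}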

Capturing

Prefixing any expression in the grammar with @ will capture matching values for that expression into the corresponding field.

For example:

// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}

For slice and string fields, each instance of @ will accumulate into the field (including repeated patterns). Accumulation into other types is not supported.

For integer and floating point types, a successful capture will be parsed with strconv.ParseInt() and strconv.ParseFloat() respectively.

A successful capture match into a bool field will set the field to true.

Tokens can also be captured directly into fields of type lexer.Token and []lexer.Token.
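
For example, a small sketch (the struct and field names are invented) showing these capture conversions together:

type Assignment struct {
	Name  string  `@Ident "="`
	Value float64 `@Float` // parsed with strconv.ParseFloat
	Bang  bool    `@"!"?`  // set to true if "!" matched
}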

Custom control of how values are captured into fields can be achieved by a field type implementing the Capture interface (Capture(values []string) error).

Additionally, any field implementing the encoding.TextUnmarshaler interface is also capturable. One caveat is that UnmarshalText() will be called once for each captured token, so eg. @(Ident Ident Ident) will result in three calls to UnmarshalText().
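
A minimal sketch of a TextUnmarshaler-based capture (the Upper type here is invented for illustration):

type Upper string

func (u *Upper) UnmarshalText(text []byte) error {
	// Called once per captured token.
	*u = Upper(strings.ToUpper(string(text)))
	return nil
}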

Capturing boolean value

By default a boolean field is used to indicate that a match occurred, which turns out to be much more useful and common in Participle than parsing true or false literals. For example, parsing a variable declaration with a trailing optional syntax:

type Var struct {
  Name string `"var" @Ident`
  Type string `":" @Ident`
  Optional bool `@"?"?`
}

In practice this gives more useful ASTs. If bool were parsed literally then you'd need some alternate type for Optional, such as string or a custom type.

To capture literal boolean values such as true or false, implement the Capture interface like so:

type Boolean bool

func (b *Boolean) Capture(values []string) error {
	*b = values[0] == "true"
	return nil
}

type Value struct {
	Float  *float64 `  @Float`
	Int    *int     `| @Int`
	String *string  `| @String`
	Bool   *Boolean `| @("true" | "false")`
}

Streaming

Participle supports streaming parsing. Simply pass a channel of your grammar into Parse*(). The grammar will be repeatedly parsed and sent to the channel. Note that the Parse*() call will not return until parsing completes, so it should generally be started in a goroutine.

type token struct {
  Str string `  @Ident`
  Num int    `| @Int`
}

parser, err := participle.Build(&token{})

tokens := make(chan *token, 128)
err = parser.ParseString("", `hello 10 11 12 world`, tokens)
for token := range tokens {
  fmt.Printf("%#v\n", token)
}

Lexing

Participle relies on distinct lexing and parsing phases. The lexer takes raw bytes and produces tokens which the parser consumes. The parser transforms these tokens into Go values.

The default lexer, if one is not explicitly configured, is based on the Go text/scanner package and thus produces tokens for C/Go-like source code. This is surprisingly useful, but if you do require more control over lexing, the built-in participle/lexer/stateful lexer should cover most other cases. If that in turn is not flexible enough, you can implement your own lexer.

Configure your parser with a lexer using the participle.Lexer() option.
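
For example, reusing the INI grammar from above with a hypothetical lexer definition named iniLexer:

parser, err := participle.Build(&INI{},
	participle.Lexer(iniLexer), // iniLexer is a lexer.Definition
)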

To use your own Lexer you will need to implement two interfaces: Definition (and optionally StringsDefinition and BytesDefinition) and Lexer.

Stateful lexer

In addition to the default lexer, Participle includes an optional stateful/modal lexer which provides powerful yet convenient construction of most lexers. (Notably, indentation-based lexers cannot be expressed using the stateful lexer -- for discussion of how these lexers can be implemented, see #20).

It is sometimes the case that a simple lexer cannot fully express the tokens required by a parser. The canonical example of this is interpolated strings within a larger language. eg.

let a = "hello ${name + ", ${last + "!"}"}"

This is impossible to tokenise with a normal lexer due to the arbitrarily deep nesting of expressions.

To support this case, Participle's lexer is now stateful by default.

The lexer is a state machine defined by a map of rules keyed by the state name. Each rule within the state includes the name of the produced token, the regex to match, and an optional operation to apply when the rule matches.

As a convenience, any rule whose name starts with a lowercase letter will be elided from the output.

Lexing starts in the Root group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed.

A state change can be introduced with the Action Push(state). Pop() will return to the previous state.

To reuse rules from another state, use Include(state).

A special named rule Return() can also be used as the final rule in a state to always return to the previous state.

As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs. See the tests for an example of this, among others.
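
As a rough sketch of the heredoc case (the rule names here are invented; refer to the tests for the canonical version):

var heredocLexer = stateful.Must(stateful.Rules{
	"Root": {
		{"Heredoc", `<<(\w+)`, stateful.Push("Heredoc")},
	},
	"Heredoc": {
		{"End", `\1`, stateful.Pop()}, // \1 matches the marker captured by the parent rule
		{"EOL", `\n`, nil},
		{"Body", `[^\n]+`, nil},
	},
})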

Example stateful lexer

Here's a cut down example of the string interpolation described above. Refer to the stateful example for the corresponding parser.

var def = stateful.Must(stateful.Rules{
	"Root": {
		{"String", `"`, stateful.Push("String")},
	},
	"String": {
		{"Escaped", `\\.`, nil},
		{"StringEnd", `"`, stateful.Pop()},
		{"Expr", `\${`, stateful.Push("Expr")},
		{"Char", `[^$"\\]+`, nil},
	},
	"Expr": {
		stateful.Include("Root"),
		{"whitespace", `\s+`, nil},
		{"Oper", `[-+/*%]`, nil},
		{"Ident", `\w+`, nil},
		{"ExprEnd", `}`, stateful.Pop()},
	},
})

Example simple/non-stateful lexer

Other than the default and stateful lexers, it's easy to define your own stateless lexer using the stateful.MustSimple() and stateful.NewSimple() functions. These accept a slice of stateful.Rule{} values, each consisting of a token name and a regex pattern. (The stateful lexer replaced the old regex lexer.)

For example, the lexer for a form of BASIC:

var basicLexer = stateful.MustSimple([]stateful.Rule{
    {"Comment", `(?i)rem[^\n]*`, nil},
    {"String", `"(\\"|[^"])*"`, nil},
    {"Number", `[-+]?(\d*\.)?\d+`, nil},
    {"Ident", `[a-zA-Z_]\w*`, nil},
    {"Punct", `[-[[email protected]#$%^&*()+_={}\|:;"'<,>.?/]|]`, nil},
    {"EOL", `[\n\r]+`, nil},
    {"whitespace", `[ \t]+`, nil},
})

Experimental - code generation

Participle v2 now has experimental support for generating code to perform lexing. Use participle/experimental/codegen.GenerateLexer() to compile a stateful lexer to Go code.

This will generally provide around a 10x improvement in lexing performance while producing O(1) garbage.
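
A sketch of what an invocation might look like (treat the argument list as an assumption; the authoritative signature lives in the experimental codegen package):

// Assumed signature: GenerateLexer(w io.Writer, pkg string, def *stateful.Definition) error
f, err := os.Create("lexer_gen.go")
if err != nil {
	panic(err)
}
defer f.Close()
err = codegen.GenerateLexer(f, "mypackage", def) // def is a *stateful.Definition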

Options

The Parser's behaviour can be configured via Options.
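
Options are passed when constructing the parser. For example, combining options that appear elsewhere in this README (iniLexer is the hypothetical lexer definition from earlier):

parser := participle.MustBuild(&INI{},
	participle.Lexer(iniLexer),
	participle.Unquote("String"),
	participle.UseLookahead(2),
)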

Examples

There are several examples included:

Example    Description
BASIC      A lexer, parser and interpreter for a rudimentary dialect of BASIC.
EBNF       Parser for the form of EBNF used by Go.
Expr       A basic mathematical expression parser and evaluator.
GraphQL    Lexer and parser for GraphQL schemas.
HCL        A parser for the HashiCorp Configuration Language.
INI        An INI file parser.
Protobuf   A full Protobuf version 2 and 3 parser.
SQL        A very rudimentary SQL SELECT parser.
Stateful   A basic example of a stateful lexer and corresponding parser.
Thrift     A full Thrift parser.
TOML       A TOML parser.

Included below is a full GraphQL lexer and parser:

package main

import (
	"fmt"
	"os"

	"github.com/alecthomas/kong"
	"github.com/alecthomas/repr"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
	"github.com/alecthomas/participle/v2/lexer/stateful"
)

type File struct {
	Entries []*Entry `@@*`
}

type Entry struct {
	Type   *Type   `  @@`
	Schema *Schema `| @@`
	Enum   *Enum   `| @@`
	Scalar string  `| "scalar" @Ident`
}

type Enum struct {
	Name  string   `"enum" @Ident`
	Cases []string `"{" @Ident* "}"`
}

type Schema struct {
	Fields []*Field `"schema" "{" @@* "}"`
}

type Type struct {
	Name       string   `"type" @Ident`
	Implements string   `( "implements" @Ident )?`
	Fields     []*Field `"{" @@* "}"`
}

type Field struct {
	Name       string      `@Ident`
	Arguments  []*Argument `( "(" ( @@ ( "," @@ )* )? ")" )?`
	Type       *TypeRef    `":" @@`
	Annotation string      `( "@" @Ident )?`
}

type Argument struct {
	Name    string   `@Ident`
	Type    *TypeRef `":" @@`
	Default *Value   `( "=" @@ )?`
}

type TypeRef struct {
	Array       *TypeRef `(   "[" @@ "]"`
	Type        string   `  | @Ident )`
	NonNullable bool     `( @"!" )?`
}

type Value struct {
	Symbol string `@Ident`
}

var (
	graphQLLexer = stateful.MustSimple([]stateful.Rule{
		{"Comment", `(?:#|//)[^\n]*\n?`, nil},
		{"Ident", `[a-zA-Z]\w*`, nil},
		{"Number", `(?:\d*\.)?\d+`, nil},
		{"Punct", `[-[[email protected]#$%^&*()+_={}\|:;"'<,>.?/]|]`, nil},
		{"Whitespace", `[ \t\n\r]+`, nil},
	})
	parser = participle.MustBuild(&File{},
		participle.Lexer(graphQLLexer),
		participle.Elide("Comment", "Whitespace"),
		participle.UseLookahead(2),
	)
)

var cli struct {
	EBNF  bool     `help:"Dump EBNF."`
	Files []string `arg:"" optional:"" type:"existingfile" help:"GraphQL schema files to parse."`
}

func main() {
	ctx := kong.Parse(&cli)
	if cli.EBNF {
		fmt.Println(parser.String())
		ctx.Exit(0)
	}
	for _, file := range cli.Files {
		ast := &File{}
		r, err := os.Open(file)
		ctx.FatalIfErrorf(err)
		err = parser.Parse(file, r, ast)
		r.Close()
		repr.Println(ast)
		ctx.FatalIfErrorf(err)
	}
}

Performance

One of the included examples is a complete Thrift parser (shell-style comments are not supported). This gives a convenient baseline for comparison with the PEG-based pigeon, which is the parser used by go-thrift. Additionally, the pigeon parser is generated code, while the participle parser is constructed at run time.

You can run the benchmarks yourself, but here's the output on my machine:

BenchmarkParticipleThrift-12    	   5941	   201242 ns/op	 178088 B/op	   2390 allocs/op
BenchmarkGoThriftParser-12      	   3196	   379226 ns/op	 157560 B/op	   2644 allocs/op

On a real-life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.

Concurrency

A compiled Parser instance can be used concurrently. A LexerDefinition can be used concurrently. A Lexer instance cannot be used concurrently.

Error reporting

There are a few areas where Participle can provide useful feedback to users of your parser.

  1. Errors returned by Parser.Parse*() will be of type Error. This will contain positional information where available.
  2. Participle will make a best effort to return as much of the AST up to the error location as possible.
  3. Any node in the AST containing a field Pos lexer.Position will be automatically populated from the nearest matching token.
  4. Any node in the AST containing a field EndPos lexer.Position will be automatically populated from the token at the end of the node.
  5. Any node in the AST containing a field Tokens []lexer.Token will be automatically populated with all tokens captured by the node, including elided tokens.

These related pieces of information can be combined to provide fairly comprehensive error reporting.
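
As an illustrative sketch (the grammar fields are invented; Pos, EndPos and Tokens are the magic field names described above):

type Node struct {
	Pos    lexer.Position // populated from the nearest matching token
	EndPos lexer.Position // populated from the token at the end of the node
	Tokens []lexer.Token  // all tokens captured by the node, including elided ones

	Key   string `@Ident "="`
	Value string `@String`
}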

Limitations

Internally, Participle is a recursive descent parser with backtracking (see UseLookahead(K)).

Among other things, this means that it does not support left recursion. Left recursion must be eliminated by restructuring your grammar.
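
For example, a left-recursive rule such as Expr = Expr "+" Term | Term can be restructured as a repetition. A minimal sketch (the types are invented for illustration):

// Instead of Expr = Expr "+" Term | Term ...
type Expr struct {
	Head *Term     `@@`
	Tail []*OpTerm `@@*`
}

type OpTerm struct {
	Op   string `@("+" | "-")`
	Term *Term  `@@`
}

type Term struct {
	Value int `@Int`
}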

EBNF

The old EBNF lexer was removed in a major refactoring at 362b26 -- if you have an EBNF grammar you need to implement, you can either translate it into regex-style stateful.Rule{} syntax or implement your own EBNF lexer -- you might be able to use the old EBNF lexer as a starting point.

Participle supports outputting an EBNF grammar from a Participle parser. Once the parser is constructed, simply call String().

Participle also includes a parser for this form of EBNF (naturally).

eg. the GraphQL example results in the following EBNF:

File = Entry* .
Entry = Type | Schema | Enum | "scalar" ident .
Type = "type" ident ("implements" ident)? "{" Field* "}" .
Field = ident ("(" (Argument ("," Argument)*)? ")")? ":" TypeRef ("@" ident)? .
Argument = ident ":" TypeRef ("=" Value)? .
TypeRef = "[" TypeRef "]" | ident "!"? .
Value = ident .
Schema = "schema" "{" Field* "}" .
Enum = "enum" ident "{" ident* "}" .

Syntax/Railroad Diagrams

Participle includes a command-line utility to take an EBNF representation of a Participle grammar (as returned by Parser.String()) and produce a Railroad Diagram using tabatkins/railroad-diagrams.

Here's what the GraphQL grammar looks like:

EBNF Railroad Diagram

Issues
  • Calculate follow-set automatically from multiple production alternatives

    First of all, thanks a lot for sharing participle with the world! I felt very happy to discover what feels like a very novel approach to parser generation, parser libraries, etc.

    I took a stab at rewriting a grammar for LLVM IR to use participle, but hit the following stumbling block, and thought I'd reach out and ask whether it is intended by design (to keep the parser simple), whether I've done something wrong, or otherwise whether we could resolve it so that the follow set of a token is calculated from all the production alternatives present. In this case, the follow set of "target" should be {"datalayout", "triple"}.

    I wish to write the grammar as example 2, but have only gotten example 1 to work so far. Any ideas?

    Cheers :) /u

    Input

    LLVM IR input source:

    source_filename = "foo.c"
    target datalayout = "bar"
    target triple = "baz"
    

    Example 1

    Grammar:

    type Module struct {
    	Decls []*Decl `{ @@ }`
    }
    
    type Decl struct {
    	SourceFilename string      `  "source_filename" "=" @String`
    	TargetSpec     *TargetSpec `| "target" @@`
    }
    
    type TargetSpec struct {
    	DataLayout   string `  "datalayout" "=" @String`
    	TargetTriple string `| "triple" "=" @String`
    }
    

    Example run:

    $ low a.ll
    &main.Module{
        Decls: {
            &main.Decl{
                SourceFilename: "foo.c",
                TargetSpec:     (*main.TargetSpec)(nil),
            },
            &main.Decl{
                SourceFilename: "",
                TargetSpec:     &main.TargetSpec{DataLayout:"bar", TargetTriple:""},
            },
            &main.Decl{
                SourceFilename: "",
                TargetSpec:     &main.TargetSpec{DataLayout:"", TargetTriple:"baz"},
            },
        },
    }
    

    Example 2

    Grammar:

    type Module struct {
    	Decls []*Decl `{ @@ }`
    }
    
    type Decl struct {
    	SourceFilename string `  "source_filename" "=" @String`
    	DataLayout     string `| "target" "datalayout" "=" @String`
    	TargetTriple   string `| "target" "triple" "=" @String`
    }
    

    Example run:

    $ low a.ll
    2017/08/27 21:15:38 a.ll:3:7: expected ( "datalayout" ) not "triple"
    
    opened by mewmew 31
  • Proposal: interface type productions

    In the past, when I've written recursive descent parsers by hand, I have used interfaces to represent "classes" of productions, such as Stmt, Expr, etc - this simplifies the consuming code considerably.

    I would like to be able to do something similar using participle - for example, I would like to define an Expr interface, and then parse that manually while allowing the rest of the grammar to be managed by participle.

    Perhaps it could look something like this:

    // Expr is the interface implemented by expression productions
    type Expr interface { expr() }
    
    // These types all implement Expr, and are parsed manually
    type Atom struct { /* ... */ }
    type Unary struct { /* ... */ }
    type Binary struct { /* ... */ }
    
    // This is the parser definition, with a new `UseInterface` option to define a custom parsing func for the Expr interface
    var theParser = participle.MustBuild(&Grammar{}, participle.UseInterface(
        func (lex *lexer.PeekingLexer) (Expr, error) {
            /* Provide an implementation of the expression parser here */
        },
    ))
    
    // And then I can use `Expr` like this:
    type Stmt struct {
        Expr   Expr    `@@` // <- participle has registered `Expr` as an interface that can be parsed
        Assign *Assign `@@`
    }
    

    As an additional nice-to-have, it would be cool if there was a way to give a *lexer.PeekingLexer and a production to the parser, and have it fill in the data for me. Then I could compose participle-managed productions within the manual parsing code for Expr. This would allow me to break out of participle in a limited way, just for the things that I need (such as managing precedence climbing)

    opened by mccolljr 21
  • How to write all optional but at least one necessary

    I'm implementing the CSS Selector Grammar (https://www.w3.org/TR/selectors-4/#grammar) with participle, but I'm stuck implementing compound-selector.

    <compound-selector> = [ <type-selector>? <subclass-selector>*
                            [ <pseudo-element-selector> <pseudo-class-selector>* ]* ]!
    

    compound-selector requires at least one of its optional components to match, expressed by the trailing !. If this is not implemented, the parser goes into an infinite loop. Is it possible to implement this with participle?

    opened by tamayika 20
  • Support Matching EOF

    The parser actually supports matching EOF but then panics: panic: m:2:41: branch <eof> was accepted but did not progress the lexer at m:2:41 ("") [recovered]

    I think it would be beneficial to special-case this check to allow matching the end of a file.

    opened by tooolbox 14
  • Starting work on negation, wip

    Here is a naive interpretation of how the negation could function.

    If you want to do it entirely yourself, go ahead, but I'm willing to see this through as I really would like to see it come to fruition and this helps me hone my go funk.

    I'm looking for some feedback on that preliminary implementation.

    Am I missing something glaring? Is the clone stuff potentially a performance killer? Should Parse() be a little more involved?

    opened by ceymard 13
  • Support for sub-lexers

    To support more complex languages, it should be possible to elegantly define stateful lexers.

    Ideally this would support:

    • "here docs", eg. cat << EOF\nEOF (where EOF is a user-defined marker).
    • Runtime-selected sub-lexers, a la Markdown's ```<language> blocks - this would defer to some external function in order to set the lexer based on <language>.
    • Recursive lexers for eg. string interpolation, "${"${var}"}" - in this situation a new lexer is pushed onto the state when ${ is encountered, and popped when } is encountered.

    My hunch is that some kind of "stateful EBNF" could work, but it would need to be extensible programmatically, and it's not clear exactly how this would be expressed.

    proposal 
    opened by alecthomas 13
  • Optional comments capturing while normally eliding them

    For https://github.com/tawasprache/kompilierer, I'd like to be able to obtain the text of a comment before an AST node, for documentation generation purposes, while otherwise ignoring them in the parser.

    I envision the API looking similar to the Pos/EndPos API:

    type Item struct {
        Name string `"function" @Ident`

        PrecedingComment lexer.Comment
    }
    

    can parse

    // hello!
    function hi
    

    with the comment and

    function hi
    

    without one

    opened by pontaoski 12
  • antlr2participle

    This adds the following:

    • A parser for .g4 ANTLR files.
    • A generator that uses an ANTLR AST to create a Participle lexer & parser.

    This is an initial draft. Notes:

    • Documentation is on the way.
    • Lexer modes are not yet implemented.
    • Recursive lexing is not yet implemented.
    • The skip lexer command is supported. The channel lexer command acts like skip. No other lexer commands are supported yet.
    • Actions and predicates are not supported.
    • Rule element labels are partially supported.
    • Alternative labels are parsed but not supported in the generator.
    • Rule arguments are not supported.

    Feedback is appreciated.

    opened by tooolbox 11
  • Matching Tokens

    In line with the « Anything But » that I suggested in #104, I am looking for a way to get the tokens around a match. The idea would be that we could match a token without necessarily consuming it.

    type Example struct {
      StartToken *lexer.Token `@?`
      SomeRuleThatMatches *[]string ` (@!";")+ ";" `
      EndToken *lexer.Token `@?`
    }
    

    Note that the @? is not a concrete proposal, as I wonder what would make sense here.

    Now, as to why I would need it: usually, lexers ignore whitespace. This makes of course for a simpler and cleaner grammar, as there is no need to mention optional whitespace everywhere.

    However, I sometimes need to access the "exact string" that was matched by a rule, discarded text included. This is because I'm writing parsers for incomplete languages where I do not care about the meaning of some constructs - I just want them "as is".

    Is there already a way to do such things? I tried to match @Token into a *lexer.Token, but that didn't work. Also, I think it would be more useful for token extraction if tokens matched this way didn't advance the parser.

    opened by ceymard 11
  • Use tag on structure

    I'm trying to use encoding/json with its tags to change my structure's attribute names. But the issue is that the "json token is not recognized". How can I escape the json tag?

    opened by kevinhassan 11
  • Bad grammar?

    package main
    
    import (
    	"fmt"
    
    	"github.com/alecthomas/participle/v2"
    	"github.com/alecthomas/participle/v2/lexer/stateful"
    )
    
    var configLexer = stateful.MustSimple([]stateful.Rule{
    	{Name: `whitespace`, Pattern: `\s+`, Action: nil},
    	{Name: `Number`, Pattern: `[\d]+`, Action: nil},
    	{Name: `Ident`, Pattern: `[\w.-]+`, Action: nil},
    	{Name: `String`, Pattern: `"(?:\\.|[^"])*"`, Action: nil},
    	{Name: `Path`, Pattern: `(^[\w\\.:\s]+?\.\w{2,4}$|^\/[a-zA-Z0-9_\/-]*[^\/]$)`, Action: nil},
    })
    
    type Config struct {
    	Servers []*Server `@@*`
    }
    
    type Server struct {
    	VirtualHost string      `"server" @Ident "{"`
    	Properties  *Properties `@@* "}"`
    }
    
    type Properties struct {
    	Listen int `"listen" ":" @Ident`
    }
    
    var parser = participle.MustBuild(&Config{},
    	participle.Lexer(configLexer),
    	participle.Unquote("String"),
    )
    
    var code = `
    server dzonerzy.net {
    	listen: 1234
    }
    `
    
    func main() {
    	cfg := &Config{}
    	err := parser.ParseString("", code, cfg)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(cfg.Servers)
    }
    

    Can you kindly explain what I'm doing wrong?

    opened by dzonerzy 9
  • Improve error message for custom capture

    If a custom capture generates an error, the user will receive an error message containing the full struct name.

     diff --git a/_examples/ini/main.go b/_examples/ini/main.go
     index be6ec9e..e483923 100644
     --- a/_examples/ini/main.go
     +++ b/_examples/ini/main.go
     @@ -1,6 +1,7 @@
      package main
      
      import (
     +       "errors"
             "os"
      
             "github.com/alecthomas/repr"
     @@ -51,11 +52,17 @@ type String struct {
      func (String) value() {}
      
      type Number struct {
     -       Number float64 `@Float`
     +       Number N `@Float`
      }
      
      func (Number) value() {}
      
     +type N struct{}
     +
     +func (*N) Capture(values []string) error {
     +       return errors.New("problem")
     +}
     +
    
    

    panic: Number.Number: problem

    An error message with the type name or token name would be much better: <float>: problem

    opened by sievlev 0
  • An attempt to fix error message

    A node of type "strct" or "union" loses information from child nodes during formatting of the error message.

    Error message before fix: /dev/stdin:1:5: unexpected token "a" (expected Value)

    Error message after fix: /dev/stdin:1:5: unexpected token "a" (expected <string> | <float>)

    opened by sievlev 0
  • possible bug in tag syntax

    Hello,

    it seems there is a difference between the "raw" and "parser" tag syntaxes. The following repro shows what I stumbled upon.

    package tagsyntax
    
    import (
    	"testing"
    
    	"github.com/alecthomas/assert/v2"
    	"github.com/alecthomas/participle/v2"
    )
    
    type GoodAST struct {
    	Key        string   `parser:"@Ident '='"`
    	BlankLines []string `"\n"`
    }
    
    type BadAST struct {
    	Key string `parser:"@Ident '='"`
    	// Same as field in GoodAST, as explained in https://github.com/alecthomas/participle#tag-syntax
    	BlankLines []string `parser:"'\n'"`
    }
    
    func TestLiteralNotTerminatedGood(t *testing.T) {
    	_, err := participle.Build(&GoodAST{})
    
    	assert.NoError(t, err)
    }
    
    func TestLiteralNotTerminatedBad(t *testing.T) {
    	_, err := participle.Build(&BadAST{})
    
    	// The error is:
    	//
    	//     Key: <input>:1:2: literal not terminated
    	//
    	// which is confusing because it refers to the previous field in the struct (Key)
    	// and unclear?
    	assert.NoError(t, err)
    }
    
    opened by marco-m 1
  • Struct capture wrongly applying previous captures in a failed branch

    Hey, while working on parser code generation and implementing lookahead error recovery, I noticed a bug. Consider this example:

    type BugStructCapturesInBadBranch struct {
    	Bad bool                `parser:"(@'!'"`
    	A   BugFirstAlternative `parser:" @@) |"`
    	B   int                 `parser:"('!' '#' @Int)"`
    }
    
    type BugFirstAlternative struct {
    	Value string `parser:"'#' @Ident"`
    }
    
    func TestBug_GroupCapturesInBadBranch(t *testing.T) {
    	var out BugStructCapturesInBadBranch
    	require.NoError(t, MustBuild(&BugStructCapturesInBadBranch{}, UseLookahead(2)).ParseString("", "!#4", &out))
    	assert.Equal(t, BugStructCapturesInBadBranch{B: 4}, out)
    }
    

    I tried to make it as minimalistic as reasonable; it's quite an obscure bug that's unlikely to bother anyone, but I thought I'd report it anyway. strct.Parse will call ctx.Apply even if s.expr.Parse returned an error. The purpose of that is apparently to provide a partial AST in case the entire parse fails, but it has an unwanted side-effect: any captures added to parseContext.apply by the branch so far will be applied, even though the error may later be caught by a disjunction or a ?/*/+ group and recovered. I think this can only happen if lookahead is at least 2, as it requires one token for that unwanted capture and a second token for the strct to return an error instead of nil out.

    In the example above, the input is constructed to match the second disjunction alternative, but the first tokens will initially lead it into the first alternative and into the BugFirstAlternative struct. When attempting to match Ident for the Value field, the sequence will fail and return an error, but ctx.apply will already contain a capture for BugStructCapturesInBadBranch.Bad, which will be applied in strct.Parse, even though the disjunction recovers it and matches the second alternative.

    I don't think it's super important this is fixed, but my code generation parser's behavior will differ from this, because I'm trying to take a different approach to recovering failed branches - restoring a backup of the parsed struct when a branch fails instead of delaying applying captures.

    opened by petee-d 0
  • Generating parser code

    Generating parser code

    Hi there again @alecthomas,

    I'm already using this great parser in a large project (a GoLang implementation of Jinja, will be open sourced eventually) and one of the things that somewhat bothers me is the speed and GC pressure (allocations) of the parser. I've considered using the generated lexer to improve it, but it's just not enough. So what if this library could also generate code for the parser? Is this something you've already considered or even started playing with?

    I actually already did and have a very ugly prototype that can generate the parser code for this subset of features:

    • string and struct fields only
    • supported nodes (at least mostly) are strct, sequence, token reference, token literal, capture (@) and group (?, *, +, not ! yet)
    • totally trash error reporting
    • max lookahead, case insensitive tokens and probably other options are not respected yet

    Example grammar it can parse consistently with the native parser (except for error messages):

    type Stuff struct {
    	Pos    lexer.Position
    	A      string   `'test' @Ident @Ident`
    	Sub    SubStuff `@@+`
    	B      string   `@Ident`
    	EndPos lexer.Position
    }
    
    type SubStuff struct {
    	Object    string `'(' @Ident 'is'`
    	Adjective string `    @Ident ')'`
    }
    
    // Uses lexer.TextScannerLexer
    

    For this grammar and with a pre-built PeekingLexer (so that I compare only parsing speed), here are the benchmarks at this moment:

    Input string, pre-lexed: "test str ing (this is fast) (this is fast) (this is fast) (this is fast) (this is LAST) end"
    BenchmarkNative
    BenchmarkNative-8      	  282736	     18882 ns/op	   11536 B/op	     207 allocs/op
    BenchmarkGenerated
    BenchmarkGenerated-8   	10671034	       576.2 ns/op	       8 B/op	       1 allocs/op
    

    The reason for being >30x faster for this particular grammar (likely a very cherry-picked example) is that the generated code:

    • avoids allocations as much as humanly possible, in fact the only allocation it does in that example is when concatenating strings for Stuff.A
    • doesn't use Reflect, obviously - the huge benefit of generated code
    • avoids unnecessarily small functions - it generates 1 function per struct + 1 wrapper, uses goto (responsibly) to solve things that would otherwise need calling a function or duplicating code
    • also uses some optimizations that could be applied to the native parser (for example: allocation-free PeekingLexer.Clone alternative, avoiding string concatenation when capturing values, etc.)

    It's very possible I'll run into an issue I won't be able to overcome, but for now it seems like this should be very doable. The generated code isn't too long (160 LOC for the above grammar), is quite well isolated (adds one generic method to all nodes, plus a file for code generation utilities) and doesn't introduce any new dependencies. For now I would just like to let you know I'm working on this, so we can coordinate any similar efforts. :) I would also appreciate your support with a few questions (will post them in this issue later).

    What do you think? Cheers!

    opened by petee-d 9