Go package for syntax highlighting of code

Overview

syntaxhighlight

Package syntaxhighlight provides syntax highlighting for code. It currently uses a language-independent lexer and performs decently on JavaScript, Java, Ruby, Python, Go, and C.

The main AsHTML(src []byte, options ...Option) ([]byte, error) function outputs HTML that uses the same CSS classes as google-code-prettify, so any stylesheet written for google-code-prettify should also work with this package.

Documentation on Sourcegraph


Installation

go get -u github.com/sourcegraph/syntaxhighlight

Before running the command above, install the Go environment (download it or follow the Go getting started guide) and make sure the GOPATH and PATH environment variables are set correctly.

Example usage

The function AsHTML(src []byte, options ...Option) ([]byte, error) returns an HTML-highlighted version of src. The input source code can be in any language; the lexer is language independent. An OrderedList() option can be passed to produce an <ol>...</ol>-wrapped list to display line numbers.

package syntaxhighlight_test

import (
	"fmt"
	"os"

	"github.com/sourcegraph/syntaxhighlight"
)

func Example() {
	src := []byte(`
/* hello, world! */
var a = 3;

// b is a cool function
function b() {
  return 7;
}`)

	highlighted, err := syntaxhighlight.AsHTML(src)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	fmt.Println(string(highlighted))

	// Output:
	// <span class="com">/* hello, world! */</span>
	// <span class="kwd">var</span> <span class="pln">a</span> <span class="pun">=</span> <span class="dec">3</span><span class="pun">;</span>
	//
	// <span class="com">// b is a cool function</span>
	// <span class="kwd">function</span> <span class="pln">b</span><span class="pun">(</span><span class="pun">)</span> <span class="pun">{</span>
	//   <span class="kwd">return</span> <span class="dec">7</span><span class="pun">;</span>
	// <span class="pun">}</span>
}

Contributors

Contributions are welcome! Submit a pull request on GitHub.

Comments
  • Pass in `io.Reader` instead of `[]byte` to `NewScanner`

    io.Reader is a more generic abstraction for input: it could be a file or it could be a reader of stdin.

    The changes will enable ccat to use the NewScanner API to create a Scanner. See https://github.com/jingweno/ccat/commit/e7385e67afd8690c1ea677abfc13f2807aadfe05.

    opened by owenthereal 8
  • Support for <ol> to have line numbers

    Would you be interested in accepting https://github.com/iafan/syntaxhighlight/commit/0da6819d25a1bdbf3c6206cc9ffece21d771bb57 as a PR?

    This commit introduces an AsOrderedListHTML function which, in addition to highlighting the source, wraps each line in <li>..</li> and then wraps the whole result in <ol>..</ol>. The resulting highlighted source code now has line numbers.

    Commit comes with full test coverage.

    opened by iafan 7
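
The wrapping described in this request can be sketched as follows. This is a minimal illustration, not the code from the linked commit: wrapOrderedList is a hypothetical helper name, and the sketch assumes that no highlighted span crosses a line boundary (a multi-line comment span would need to be closed and reopened on each line).

```go
package main

import (
	"fmt"
	"strings"
)

// wrapOrderedList is a hypothetical helper (not part of the package's API).
// It wraps each line of already-highlighted HTML in <li>...</li> and the
// whole result in <ol>...</ol>, so that the browser numbers the lines.
// Assumption: no <span> in the input crosses a newline.
func wrapOrderedList(highlighted string) string {
	var b strings.Builder
	b.WriteString("<ol>\n")
	for _, line := range strings.Split(highlighted, "\n") {
		b.WriteString("<li>")
		b.WriteString(line)
		b.WriteString("</li>\n")
	}
	b.WriteString("</ol>")
	return b.String()
}

func main() {
	html := "<span class=\"kwd\">var</span> a\n<span class=\"kwd\">return</span> a"
	fmt.Println(wrapOrderedList(html))
}
```
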
  • Too many incidental dependencies.

    This issue is largely documented downstream at https://github.com/shurcooL/go/issues/19#issuecomment-97699653, but to quickly summarize:

    After #11 was merged, the syntaxhighlight package indirectly imports an extremely large number of packages that are largely unrelated to, and unneeded for, the core functionality of this package.

    https://godoc.org/github.com/sourcegraph/syntaxhighlight?import-graph&hide=1

    The vast majority of that is coming from the two imports sourcegraph.com/sourcegraph/go-sourcegraph/sourcegraph and sourcegraph.com/sourcegraph/vcsstore/vcsclient which are only needed to pull in the following types:

    Despite those types being small, the packages that contain them are higher level and import many other dependencies.

    We should fix this so that users of syntaxhighlight who only need the core functionality and do not use the NilAnnotator do not have to pull in that many dependencies.

    opened by dmitshur 4
  • Token kinds/classes API could be improved.

    Hi,

    I'm trying to use the public interface/API of this package externally, but this API could be better:

    const (
        WHITESPACE = iota
        STRING
        KEYWORD
        COMMENT
        TYPE
        LITERAL
        PUNCTUATION
        PLAINTEXT
        TAG
        HTMLTAG
        HTMLATTRNAME
        HTMLATTRVALUE
        DECIMAL
    )
    

    First, it's not idiomatic Go style to use ALLCAPS; these consts should be Whitespace, etc. See style guide section on Mixed Caps. As far as I can tell, it will not cause naming conflicts inside the package.

    Second, these kind values are untyped constants.

    I'd prefer if they were typed, for example syntaxhighlight.Kind, so that I can use that type in a struct in my package.

    According to http://godoc.org/github.com/sourcegraph/syntaxhighlight?importers, the only public importers of this package are all my packages. Do you use the current WHITESPACE, etc. outside this package?

    Do you agree it would be an improvement and would you take a PR that changes the above code to something like this?

    // Kind represents a syntax highlighting class which will be assigned to elements. A style will assign visual properties to each kind.
    type Kind uint8
    
    const (
        Whitespace Kind = iota
        String
        Keyword
        Comment
        Type
        Literal
        Punctuation
        Plaintext
        Tag
        HTMLTag
        HTMLAttrName
        HTMLAttrValue
        Decimal
    )
    

    Also, Keywords probably does not need to be exported, does it? Unless users of this package are meant to be able to make changes to it (if it's meant to be exported, it should be documented, it's currently not).

    opened by dmitshur 3
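
To illustrate why a typed Kind is convenient for callers, here is a small self-contained sketch. The Kind type and constant names are copied from the proposal above; Token and countKind are hypothetical caller-side names invented for this example and are not part of the package.

```go
package main

import "fmt"

// Kind mirrors the typed constants proposed in the issue above.
type Kind uint8

const (
	Whitespace Kind = iota
	String
	Keyword
	Comment
	Type
	Literal
	Punctuation
	Plaintext
	Tag
	HTMLTag
	HTMLAttrName
	HTMLAttrValue
	Decimal
)

// Token is a hypothetical struct in a caller's package. Because Kind is a
// named type, the field documents itself and the compiler rejects values of
// other named integer types.
type Token struct {
	Kind Kind
	Text string
}

// countKind is a hypothetical helper that counts tokens of one kind.
func countKind(tokens []Token, k Kind) int {
	n := 0
	for _, t := range tokens {
		if t.Kind == k {
			n++
		}
	}
	return n
}

func main() {
	tokens := []Token{
		{Keyword, "var"}, {Whitespace, " "}, {Plaintext, "a"},
		{Punctuation, "="}, {Decimal, "3"}, {Punctuation, ";"},
	}
	fmt.Println(countKind(tokens, Punctuation))
}
```
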
  • Try to fix Travis.

    • I'm guessing that trying to write to benchmark.txt inside travis is likely creating the error due to permissions. Any file changes inside Travis will be lost anyway, so there's no point in doing that.
    • Instead, I just opted to execute the test and benchmarks and show their output in Travis.
    • Also make sure the Go code is gofmted.
    • Use Go 1.3.
    opened by dmitshur 3
  • Css? Newlines?

    This is more a matter of me not understanding how to use this package, but I'm not finding any CSS in here. Are we supposed to bring our own?

    For example, compare the way GitHub displays simple.go:

    https://github.com/sourcegraph/syntaxhighlight/blob/master/testdata/simple.go

    Versus the HTML output of this package for that simple.go file:

    https://rawgithub.com/sourcegraph/syntaxhighlight/master/testdata/simple.go.html

    Are we supposed to create the CSS ourselves from scratch, or are there existing stylesheets that can be used?

    Also, all the newlines are missing. Is that because the CSS is missing, or is there another reason?

    Thanks!

    opened by dmitshur 3
  • Run gofmt -s which simplifies struct definitions

    I ran go test and everything passes, but please verify that nothing is wrong.

    Note: gofmt -s outputs code which might not be compatible with earlier versions of Go.

    opened by hariharan-uno 2
  • Fix tests and example.

    • Update testdata/underscore.go.html to match expected output after input change in b4219f1fb43b89dede6b8a2bc0c03f909cd599f7.
    • Fix example verification to occur (see http://godoc.org/testing#hdr-Examples), and update expected output to match actual output.
    • Copy latest version of example_test.go into README.md so they are in sync.
    opened by dmitshur 2
  • Use text/scanner.

    When I asked why this branch wasn't merged into master yet, you said the most likely reason was that you simply forgot about it.

    So I'm making this PR for your convenience as a reminder. As far as I can tell, this approach of using text/scanner should be better overall (more general and simple, handles Unicode, 42% slower).

    opened by dmitshur 2
  • Add an ability to produce an `<ol>...</ol>`-wrapped list to display line numbers

    This implements #20 using functional options:

    // default backward-compatible call
    got, err := AsHTML(input)
    
    // with optional parameter
    got, err = AsHTML(input, OrderedList())
    
    opened by iafan 1
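
For readers unfamiliar with the pattern this PR refers to, the following is a generic sketch of variadic option functions in Go. It is not the package's implementation: config and render are hypothetical stand-ins (render performs no highlighting), and only the names Option and OrderedList are taken from the text above.

```go
package main

import "fmt"

// config is a hypothetical settings struct for this sketch.
type config struct {
	orderedList bool
}

// Option modifies the settings; callers pass zero or more of them.
type Option func(*config)

// OrderedList returns an Option that requests <ol>-wrapped output.
func OrderedList() Option {
	return func(c *config) { c.orderedList = true }
}

// render is a stand-in for a function such as AsHTML. Existing callers that
// pass no options keep the old behavior, which is what makes the variadic
// signature backward compatible.
func render(src string, options ...Option) string {
	var c config
	for _, opt := range options {
		opt(&c)
	}
	if c.orderedList {
		return "<ol><li>" + src + "</li></ol>"
	}
	return src
}

func main() {
	fmt.Println(render("x"))
	fmt.Println(render("x", OrderedList()))
}
```
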
  • Add a more direct NewScannerReader func.

    If the user has an io.Reader, they can now use NewScannerReader to more directly create a Scanner.

    Previously, they would be forced to either read all bytes from the reader into a byte slice and use that (which then indirectly gets wrapped by a bytes.NewReader), or copy this func in their code.

    Resolves #14. /cc @jingweno

    opened by dmitshur 1
  • Dependency Dashboard

    This issue contains a list of Renovate updates and their statuses.

    This repository currently has no open or pending branches.


    • [ ] Check this box to trigger a request for Renovate to run again on this repository
    opened by renovate[bot] 0
  • Configure Renovate

    Welcome to Renovate! This is an onboarding PR to help you understand and configure settings before regular Pull Requests begin.

    🚦 To activate Renovate, merge this Pull Request. To disable Renovate, simply close this Pull Request unmerged.


    Configuration Summary

    Based on the default config's presets, Renovate will:

    • Start dependency updates only once this onboarding PR is merged
    • Enable Renovate Dependency Dashboard creation.
    • If Renovate detects semantic commits, it will use semantic commit type fix for dependencies and chore for all others.
    • Ignore node_modules, bower_components, vendor and various test/tests directories.
    • Autodetect whether to pin dependencies or maintain ranges.
    • Rate limit PR creation to a maximum of two per hour.
    • Limit to maximum 10 open PRs at any time.
    • Group known monorepo packages together.
    • Use curated list of recommended non-monorepo package groupings.
    • A collection of workarounds for known problems with packages.
    • Run Renovate on following schedule: on the 1st through 7th day of the month

    🔡 Would you like to change the way Renovate is upgrading your dependencies? Simply edit the renovate.json in this branch with your custom config and the list of Pull Requests in the "What to Expect" section below will be updated the next time Renovate runs.


    What to Expect

    It looks like your repository dependencies are already up-to-date and no Pull Requests will be necessary right away.


    ❓ Got questions? Check out Renovate's Docs, particularly the Getting Started section. If you need any further assistance then you can also request help here.


    This PR has been generated by Mend Renovate. View repository job log here.

    opened by renovate[bot] 0
  • Normalize (consolidate) tokens of the same kind

    The underlying text/scanner package produces two separate tokens for combined punctuation such as <=, >=, !=, :=, ||, :: and so on. For example, != would be represented in the highlighted output as follows:

    <span class="pun">!</span><span class="pun">=</span>
    

    This is suboptimal in terms of final HTML size, and also prevents font ligatures from being displayed properly (see e.g. Fira Code font).

    This PR implements token normalization so that adjacent tokens of the same kind are merged together. The example above now looks like this:

    <span class="pun">!=</span>
    

    ...and allows for proper ligature rendering.

    opened by iafan 6
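
The normalization idea can be sketched in a few lines. This is an illustration of merging adjacent tokens of the same kind, not the code from the PR: the token struct and the normalize function are hypothetical, and kinds are represented here by their CSS class names for brevity.

```go
package main

import "fmt"

// token is a hypothetical pair of CSS class and source text.
type token struct {
	kind string
	text string
}

// normalize merges each token into the previous one when both have the same
// kind, so that "!" followed by "=" becomes a single "!=" token.
func normalize(tokens []token) []token {
	var out []token
	for _, t := range tokens {
		if n := len(out); n > 0 && out[n-1].kind == t.kind {
			out[n-1].text += t.text
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	in := []token{{"pln", "a"}, {"pun", "!"}, {"pun", "="}, {"pln", "b"}}
	for _, t := range normalize(in) {
		fmt.Printf("<span class=%q>%s</span>", t.kind, t.text)
	}
	fmt.Println()
}
```

A real implementation would also HTML-escape the token text before emitting it; that step is omitted here to keep the merge logic in focus.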
  • More control over Go lang (and other lang) tagging.

    I am writing a debugger for Go and came across this project via Dmitri Shuralyov, whose use of it relies on go/scanner.

    One improvement I'd like to see is more fine-grained control over what gets tagged. Rather than reinvent things unnecessarily, I consulted Pygments to see what it has.

    This is from pygments/formatters/terminal.py:

        Whitespace:         ('lightgray',   'darkgray'),
        Comment:            ('lightgray',   'darkgray'),
        Comment.Preproc:    ('teal',        'turquoise'),
        Keyword:            ('darkblue',    'blue'),
        Keyword.Type:       ('teal',        'turquoise'),
        Operator.Word:      ('purple',      'fuchsia'),
        Name.Builtin:       ('teal',        'turquoise'),
        Name.Function:      ('darkgreen',   'green'),
        Name.Namespace:     ('_teal_',      '_turquoise_'),
        Name.Class:         ('_darkgreen_', '_green_'),
        Name.Exception:     ('teal',        'turquoise'),
        Name.Decorator:     ('darkgray',    'lightgray'),
        Name.Variable:      ('darkred',     'red'),
        Name.Constant:      ('darkred',     'red'),
        Name.Attribute:     ('teal',        'turquoise'),
        Name.Tag:           ('blue',        'blue'),
        String:             ('brown',       'brown'),
        Number:             ('darkblue',    'blue'),
    

    Some of the entries above aren't relevant to Go (but you might want to add them for other languages), and some entries should be added for Go, such as Rune and possibly Case label. So here is a list:

    • Whitespace
    • Comment
    • Keyword
    • Type name
    • Built in
    • Function
    • Variable
    • Exported name
    • Number
    • Rune

    Kind is a uint8 so there is plenty of space to add constants to this.

    If you would like me to create a pull request for this, let me know.

    Thanks.

    opened by rocky 12
Owner
Sourcegraph
Code search and navigation for teams (self-hosted, OSS)
📐 gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt.

gofmtmd gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt. Installation $ go get github.com/po3rin/gofm

po3rin 91 Oct 31, 2022
A general purpose syntax highlighter in pure Go

Chroma — A general purpose syntax highlighter in pure Go NOTE: As Chroma has just been released, its API is still in flux. That said, the high-level i

Alec Thomas 3.6k Dec 27, 2022
Toy scripting language with a syntax similar to Rust.

Dust - toy scripting language Toy scripting language with a syntax similar to Rust. Syntax similar to Rust; Loose JSON parsing; Calling host fu

shellyln 2 Sep 28, 2022
🌲 Parses indented code and returns a tree structure.

codetree Parses indented code (Python, Pug, Stylus, Pixy, codetree, etc.) and returns a tree structure. Installation go get github.com/aerogo/codetree

Aero 22 Sep 27, 2022
⚡ Transfer files over Wi-Fi from your computer to your mobile device by scanning a QR code without leaving the terminal.

$ qrcp Transfer files over Wi-Fi from your computer to a mobile device by scanning a QR code without leaving the terminal. You can support development

Claudio d'Angelis 9k Dec 28, 2022
Auto-gen fuzzing wrappers from normal code. Automatically find buggy call sequences, including data races & deadlocks. Supports rich signature types.

fzgen fzgen auto-generates fuzzing wrappers for Go 1.18, optionally finds problematic API call sequences, can automatically wire outputs to inputs acr

thepudds 78 Dec 23, 2022
Frequency of ASCII characters in TypeScript and JavaScript code

Tool to traverse JavaScript and TypeScript codebases counting the number of occurrences of each ASCII character. Useful for optimizing tokenizers / lexers

Elian Cordoba 0 Jan 31, 2022
Pryrite, interactively execute shell code blocks in a markdown file

Pryrite Pryrite is a command line tool that interactively runs executable blocks in a markdown file. One can think of pryrite as a console REPL/debugg

Rama Shenai 169 Dec 18, 2022
A golang package to work with Decentralized Identifiers (DIDs)

did did is a Go package that provides tools to work with Decentralized Identifiers (DIDs). Install go get github.com/ockam-network/did Example packag

Ockam 69 Nov 25, 2022
Genex package for Go

genex Genex package for Go Easy and efficient package to expand any given regex into all the possible strings that it can match. This is the code that

Alix Axel 68 Nov 2, 2022
A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library

goq Example import ( "log" "net/http" "astuart.co/goq" ) // Structured representation for github file name table type example struct { Title str

Andrew Stuart 222 Dec 12, 2022
Go (Golang) GNU gettext utilities package

Gotext GNU gettext utilities for Go. Features Implements GNU gettext support in native Go. Complete support for PO files including: Support for multil

Leonel Quinteros 363 Dec 18, 2022
htmlquery is golang XPath package for HTML query.

htmlquery Overview htmlquery is an XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression. htmlque

null 551 Jan 4, 2023
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

csvplus Package csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream processing operations, indices and joins. The

Maxim 67 Apr 9, 2022
[Go] Package of validators and sanitizers for strings, numerics, slices and structs

govalidator A package of validators and sanitizers for strings, structs and collections. Based on validator.js. Installation Make sure that Go is inst

Alex Saskevich 5.6k Dec 28, 2022
Package sanitize provides functions for sanitizing text in golang strings.

sanitize Package sanitize provides functions to sanitize html and paths with go (golang). FUNCTIONS sanitize.Accents(s string) string Accents replaces

Kenny Grant 322 Dec 5, 2022
Package strit introduces a new type of string iterator, along with a number of iterator constructors, wrappers and combinators.

strit Package strit (STRing ITerator) assists in development of string processing pipelines by providing a simple iteration model that allows for easy

Maxim 84 Jun 21, 2022
A markdown renderer package for the terminal

go-term-markdown go-term-markdown is a go package implementing a Markdown renderer for the terminal. Note: Markdown being originally designed to rende

Michael Muré 253 Nov 25, 2022
A minimalistic emoji package for Go (golang)

emoji emoji is a minimalistic emoji library for Go. It lets you use emoji characters in strings. Inspired by spatie/emoji Install go get g

Enes Çakır 381 Dec 14, 2022