A faster file programming language detector

Overview

go-enry

Programming language detector and toolbox to ignore binary or vendored files. enry started as a Go port of the original Linguist Ruby library and runs about 2x faster.

CLI

The CLI binary is hosted in a separate repository go-enry/enry.

Library

enry is also a Go library for guessing a programming language. It exposes its API through FFI to multiple programming environments.

Use cases

enry guesses a programming language using a sequence of matching strategies that are applied progressively to narrow down the possible options. Strategies differ in the type of input data they need to make a decision: file name, extension, the first line of the file, the full content of the file, etc.
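The progressive narrowing described above can be sketched as a loop over strategies. This is a minimal, self-contained illustration and not enry's actual implementation: the strategy type, the extension table and the content markers are hypothetical stand-ins for the data enry generates from Linguist.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// strategy returns the languages it considers possible for a file, given the
// candidates left over by earlier strategies. Hypothetical type for this sketch.
type strategy func(filename string, content []byte, candidates []string) []string

// extensionTable is a toy lookup table, not real Linguist data.
var extensionTable = map[string][]string{
	".go": {"Go"},
	".h":  {"C", "C++", "Objective-C"},
}

// byExtension needs only the file name.
func byExtension(filename string, _ []byte, _ []string) []string {
	return extensionTable[filepath.Ext(filename)]
}

// contentMarkers maps a language to a byte sequence that hints at it (made up).
var contentMarkers = map[string][]byte{
	"C++":         []byte("std::"),
	"Objective-C": []byte("@interface"),
}

// byContent keeps only the candidates whose marker occurs in the content.
func byContent(_ string, content []byte, candidates []string) []string {
	var kept []string
	for _, lang := range candidates {
		if marker, ok := contentMarkers[lang]; ok && bytes.Contains(content, marker) {
			kept = append(kept, lang)
		}
	}
	return kept
}

// detect applies the strategies in order. A strategy that returns nothing
// leaves the previous candidates untouched; detection stops early once a
// single candidate remains.
func detect(filename string, content []byte, strategies ...strategy) (langs []string, safe bool) {
	var candidates []string
	for _, s := range strategies {
		if next := s(filename, content, candidates); len(next) > 0 {
			candidates = next
		}
		if len(candidates) == 1 {
			return candidates, true
		}
	}
	return candidates, false
}

func main() {
	langs, safe := detect("foo.h", []byte("std::vector<int> v;"), byExtension, byContent)
	fmt.Println(langs, safe) // [C++] true

	langs, safe = detect("foo.h", []byte("int x;"), byExtension, byContent)
	fmt.Println(langs, safe) // [C C++ Objective-C] false
}
```

In this sketch the boolean result is true only when exactly one candidate is left, which is why the second call reports all three candidates as an unsafe guess.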

Depending on the available input data, the enry API can be roughly divided into the following categories or use cases.

By filename

The following functions require only the name of the file to make a guess:

  • GetLanguageByExtension uses only the file extension (which may be ambiguous)
  • GetLanguageByFilename is useful for cases like .gitignore, .bashrc, etc.
  • all filtering helpers

Please note that such guesses are expected not to be very accurate.

By text

To make a guess based only on the content of the file or a text snippet, use:

  • GetLanguageByShebang reads only the first line of text to identify the shebang.

  • GetLanguageByModeline for cases when a Vim/Emacs modeline, e.g. /* vim: set ft=cpp: */, may be present at the head or the tail of the text.

  • GetLanguageByClassifier uses a Bayesian classifier trained on all the ./samples/ from Linguist.

    It is usually a last-resort strategy used to disambiguate the guesses of the previous strategies, and thus it requires a list of "candidate" guesses. If more intelligent hypotheses are not available, one can provide a list of all known languages (the keys of data.LanguagesLogProbabilities) as possible candidates, at the price of possibly suboptimal accuracy.
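To illustrate why a classifier of this kind needs candidates, here is a toy naive Bayes scorer that only considers the languages it is given. It is a sketch under stated assumptions, not enry's classifier: the model type, its fields and all the numbers are made up, and tokenization is reduced to splitting on whitespace.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// model holds log-probabilities for a toy naive Bayes classifier.
// All names and numbers here are hypothetical.
type model struct {
	langLogProb  map[string]float64            // log P(language)
	tokenLogProb map[string]map[string]float64 // log P(token | language)
	unseen       float64                       // log-probability used for unknown tokens
}

// classify scores only the given candidates and returns the best one.
// It returns "" when no candidate is known to the model.
func (m model) classify(text string, candidates []string) string {
	best, bestScore := "", math.Inf(-1)
	tokens := strings.Fields(text)
	for _, lang := range candidates {
		score, known := m.langLogProb[lang]
		if !known {
			continue
		}
		for _, tok := range tokens {
			if lp, ok := m.tokenLogProb[lang][tok]; ok {
				score += lp
			} else {
				score += m.unseen
			}
		}
		if score > bestScore {
			best, bestScore = lang, score
		}
	}
	return best
}

// allLanguages returns every language the model knows, sorted, for use as a
// catch-all candidate list when nothing better is available.
func (m model) allLanguages() []string {
	langs := make([]string, 0, len(m.langLogProb))
	for lang := range m.langLogProb {
		langs = append(langs, lang)
	}
	sort.Strings(langs)
	return langs
}

var toy = model{
	langLogProb: map[string]float64{"Go": math.Log(0.5), "Python": math.Log(0.5)},
	tokenLogProb: map[string]map[string]float64{
		"Go":     {"func": -1, "import": -2},
		"Python": {"def": -1, "import": -2},
	},
	unseen: -10,
}

func main() {
	fmt.Println(toy.classify("def main", toy.allLanguages())) // Python
	fmt.Println(toy.classify("def main", []string{"Go"}))     // Go
}
```

The second call shows the effect of the candidate list: the scorer can only pick among the languages it was handed, so a narrower list from earlier strategies directly constrains the answer.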

By file

The most accurate guess is possible when both the file name and the content are available:

  • GetLanguagesByContent uses only the file extension and a set of regexp-based content heuristics.
  • GetLanguages uses the full set of matching strategies and is expected to be the most accurate.

Filtering: vendoring, binaries, etc

enry exposes a set of file-level helpers, Is*, to simplify filtering out files that are less interesting for the purpose of source code analysis:

  • IsBinary
  • IsVendor
  • IsConfiguration
  • IsDocumentation
  • IsDotFile
  • IsImage
  • IsTest
  • IsGenerated
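A path-based filter of this kind can be approximated with a list of regular expressions. The sketch below is an assumption about the general approach, not enry's implementation: the patterns are an illustrative subset, and isVendored and keepSources are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// vendorPatterns is an illustrative subset of path rules, not enry's rule list.
var vendorPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(^|/)vendor/`),
	regexp.MustCompile(`(^|/)node_modules/`),
	regexp.MustCompile(`(^|/)cache/`),
}

// isVendored reports whether any pattern matches the path.
func isVendored(path string) bool {
	for _, re := range vendorPatterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// keepSources drops the vendored paths and keeps the rest in order.
func keepSources(paths []string) []string {
	var kept []string
	for _, p := range paths {
		if !isVendored(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []string{
		"main.go",
		"vendor/pkg/dep.go",
		"web/node_modules/lib/index.js",
		"oslo_cache/_bmemcache_pool.py",
	}
	fmt.Println(keepSources(paths)) // [main.go oslo_cache/_bmemcache_pool.py]
}
```

Anchoring each pattern with (^|/) means a directory name has to match as a whole path segment, so oslo_cache/ is not mistaken for cache/.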

Language colors and groups

enry exposes functions to get a language's color (for example, to use when presenting statistics in graphs) and its group:

  • GetColor
  • GetLanguageGroup can be used to group similar languages together e.g. for Less this function will return CSS
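As an illustration of how grouping might be used when building statistics, the sketch below sums per-language byte counts into per-group totals. The groups table is a hypothetical stand-in for a group lookup (only the Less to CSS pair comes from the text above), and falling back to the language's own name when it has no group is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// groups is a hypothetical stand-in for a language-group lookup.
var groups = map[string]string{"Less": "CSS"}

// groupOf returns the group of a language, or the language itself when it
// has no group (an assumption of this sketch).
func groupOf(language string) string {
	if g, ok := groups[language]; ok {
		return g
	}
	return language
}

// aggregate sums per-language byte counts into per-group byte counts.
func aggregate(bytesByLanguage map[string]int) map[string]int {
	total := make(map[string]int)
	for lang, n := range bytesByLanguage {
		total[groupOf(lang)] += n
	}
	return total
}

func main() {
	stats := aggregate(map[string]int{"CSS": 50, "Less": 100, "Go": 10})

	// Sort the names so that the output order is stable.
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d bytes\n", name, stats[name])
	}
	// CSS: 150 bytes
	// Go: 10 bytes
}
```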

Languages

Go

In a Go module, add enry as a dependency by running:

go get github.com/go-enry/go-enry/v2

The rest of the examples will assume you have either done this or fetched the library into your GOPATH.

// The examples here and below assume you have imported the library.
import "github.com/go-enry/go-enry/v2"

lang, safe := enry.GetLanguageByExtension("foo.go")
fmt.Println(lang, safe)
// result: Go true

// <matlab-code>, <objc-code> and <cpp-code> are placeholders for real file contents
lang, safe = enry.GetLanguageByContent("foo.m", []byte("<matlab-code>"))
fmt.Println(lang, safe)
// result: Matlab true

lang, safe = enry.GetLanguageByContent("bar.m", []byte("<objc-code>"))
fmt.Println(lang, safe)
// result: Objective-C true

// all strategies together
lang = enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))
fmt.Println(lang)
// result: C++

Note that the returned boolean value safe is true if there is only one possible language detected.

A plural version of the same API allows getting a list of all possible languages for a given file.

langs := enry.GetLanguages("foo.h", []byte(""))
// result: []string{"C", "C++", "Objective-C"}

langs = enry.GetLanguagesByExtension("foo.asc", []byte(""), nil)
// result: []string{"AGS Script", "AsciiDoc", "Public Key"}

langs = enry.GetLanguagesByFilename("Gemfile", []byte(""), []string{})
// result: []string{"Ruby"}

Java bindings

Generated Java bindings using a C shared library and JNI are available under java.

A library is published on Maven as tech.sourced:enry-java for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Python bindings

Generated Python bindings using a C shared library and cffi are WIP under src-d/enry#154.

A library is going to be published on PyPI as enry for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Rust bindings

Generated Rust bindings using a C static library are available at https://github.com/go-enry/rs-enry.

Divergences from Linguist

The enry library is based on the data from github/linguist version v7.14.0.

When parsing linguist/samples, the following enry results differ from Linguist's:

For all the cases above that have an issue number, we plan to update enry to match Linguist's behavior.

Benchmarks

Enry's language detection has been compared with Linguist's on linguist/samples.

We got these results:

histogram

The histogram shows the number of files (y-axis) per time interval bucket (x-axis). Most of the files were detected faster by enry.

There are several cases where enry is slower than Linguist because Go's regexp engine is slower than Ruby's, which is based on the Oniguruma library, written in C.

See instructions for running enry with oniguruma.

Why Enry?

In the movie My Fair Lady, Professor Henry Higgins is a linguist who, at the very beginning of the movie, enjoys guessing the origin of people based on their accent.

"Enry Iggins" is how Eliza Doolittle pronounces the name of the Professor.

Development

To run the tests use:

go test ./...

Setting ENRY_TEST_REPO to the path of an existing checkout of Linguist will avoid cloning it and speed up the tests. Setting ENRY_DEBUG=1 will provide insight into the Bayesian classifier building done by make code-generate.

Sync with github/linguist upstream

enry re-uses parts of the original github/linguist to generate internal data structures. To update to the latest release of Linguist, run:

$ git clone https://github.com/github/linguist.git .linguist
$ cd .linguist; git checkout <release-tag>; cd ..

# put the new release's commit sha in the generator_test.go (to re-generate .gold test fixtures)
# https://github.com/go-enry/go-enry/blob/13d3d66d37a87f23a013246a1b0678c9ee3d524b/internal/code-generator/generator/generator_test.go#L18

$ make code-generate

To stay in sync, enry needs to be updated when a new release of Linguist includes changes to any of the following files:

There is no automation for detecting the changes in the Linguist project, so the process above has to be done manually from time to time.

When submitting a pull request syncing up to a new release, please make sure it only contains the changes in the generated files (in the data subdirectory).

Separating all the necessary "manual" code changes to a different PR that includes some background description and an update to the documentation on "divergences from linguist" is very much appreciated as it simplifies the maintenance (review/release notes/etc).

Misc

Running a benchmark & faster regexp engine

Benchmark

All benchmark scripts are in the benchmarks directory.

Dependencies

As the benchmarks depend on Ruby and the github-linguist gem, make sure you have:

  • Ruby (e.g. using rbenv) and bundler installed
  • Docker
  • native dependencies installed
  • built the gem: cd .linguist && bundle install && rake build_gem && cd -
  • installed it: gem install --no-rdoc --no-ri --local .linguist/github-linguist-*.gem

Quick benchmark

To run quicker benchmarks

make benchmarks

to get average times for the primary detection function and strategies for the whole samples set. If you want to see measures per sample file use:

make benchmarks-samples

Full benchmark

If you want to reproduce the same benchmarks as reported above:

  • Make sure all dependencies are installed
  • Install gnuplot (in order to plot the histogram)
  • Run ENRY_TEST_REPO="$PWD/.linguist" benchmarks/run.sh (takes ~15h)

It will run the benchmarks for enry and Linguist, parse the output, create csv files and plot the histogram.

Faster regexp engine (optional)

Oniguruma is CRuby's regular expression engine. It is very fast and performs better than Go's built-in regexp package. enry supports swapping between the two engines thanks to the rubex project. The typical overall speedup from using Oniguruma is 1.5-2x. However, it requires CGo and an external shared library. On macOS with Homebrew, install it with:

brew install oniguruma

On Ubuntu, install it with:

sudo apt install libonig-dev

To build enry with Oniguruma regexps, use the oniguruma build tag:

go get -v -t --tags oniguruma ./...

and then rebuild the project.

License

Apache License, Version 2.0. See LICENSE

Comments
  • Code generator Win support

    Fixes #4

    On Win make code-generate produces unreasonable Bayesian classifier weights from Linguist samples silently, failing only the final classification tests.

    TestPlan:

    • passing tests on Win CI
      go test ./internal/code-generator/... \
       -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
      
    opened by bzz 9
  • Expose IsTest and GetLanguageType methods & Fix test cases for Java Bindings

    Purpose

    [Fixes]:

    1. Correct failing test cases for Java Bindings so that make test command under Java does not fail.
    2. Expose isTest method in java bindings.

    [Features] :

    1. Export new function GetLanguageType at enry package in go & expose the same at Java Bindings.
    opened by UtsavChokshiCNU 5
  • Is there a prebuilt shared library?

    I couldn't find any way in this repo on how to get the shared library so it can be used from other languages.

    So is there a prebuilt one somewhere or some instructions on how to build your own?

    opened by CodeMyst 5
  • Linguist update automation opens multiple PRs

    The Linguist update automation runs once a day, so if the generated PR isn't merged by then, it will open another one!

    https://github.com/go-enry/go-enry/pull/68 #70

    help wanted 
    opened by look 4
  • Expose `LanguageInfo` with all Linguist data

    As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an API for accessing a LanguageInfo struct which is populated with all the data from the Linguist YAML source file. Functions are provided to access the LanguageInfo by name or ID.

    The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc. could in principle be implemented using this structure, which would simplify the code generation. But that would be a big change so I didn't do any of that. Perhaps in the next major version something like that would make sense.

    cc @tclem

    Closes https://github.com/go-enry/go-enry/issues/54

    opened by look 4
  • Python: API to expose highest-level enry.GetLanguage

    This is a blueprint for all other methods, dealing with go slice conversion.

    It still lacks on build automation (and there is no release automation whatsoever), but this is already useful.

    opened by bzz 4
  • data: replace substring package with regex package

    This PR removes the old substring package from @toqueteos (sorry dude) and uses the internal regex package instead, so that the oniguruma regexp engine can be used with all the regular expressions.

    opened by mcuadros 4
  • IsVendor() overmatching paths

    I discovered this through our Gitea server flagging files as vendored through its use of enry.IsVendor(). Paths like oslo_cache/_bmemcache_pool.py and playbooks/roles/create-venv/tasks/main.yaml are inappropriately marked vendored.

    My hunch is that the first path is matching https://github.com/go-enry/go-enry/blob/7168084e5e5de38b915b1874528ff73f20a86b69/data/vendor.go#L9 and the second is matching https://github.com/go-enry/go-enry/blob/7168084e5e5de38b915b1874528ff73f20a86b69/data/vendor.go#L110

    I've written a little reproducer that removes Gitea from the equation:

    package main
    
    import "fmt"
    import "regexp"
    import "github.com/go-enry/go-enry/v2"
    
    func main() {
    	input_str1 := "oslo_cache/_bmemcache_pool.py"
    
    	rawregex1, _ := regexp.MatchString(`(^|/)cache/`, input_str1)
    	fmt.Println("Raw regex:", rawregex1)
    
    	vendor1 := enry.IsVendor(input_str1)
    	fmt.Println("IsVendor:", vendor1)
    
    	input_str2 := "playbooks/roles/create-venv/tasks/main.yaml"
    
    	rawregex2, _ := regexp.MatchString(`(^|/)env/`, input_str2)
    	fmt.Println("Raw regex:", rawregex2)
    
    	vendor2 := enry.IsVendor(input_str2)
    	fmt.Println("IsVendor:", vendor2)
    }
    

    When you run this the results are:

    Raw regex: false
    IsVendor: true
    Raw regex: false
    IsVendor: true
    

    What this shows us is that the raw input regexes appear to behave as expected. Neither of our example input strings matches which is what we expect. But when we call IsVendor() the result becomes true. I suspect that the init function https://github.com/go-enry/go-enry/blob/7168084e5e5de38b915b1874528ff73f20a86b69/utils.go#L139-L246 is either adding rules that collide or introducing some bug to the expanded regex that causes this to happen.

    opened by cboylan 3
  • Mark `go.sum` as generated?

    Most diffs to go checksum files are pure noise, I'm wondering if anyone else agrees it should be marked as generated so tools like gitea can hide diffs on it? Linguist doesn't do it but I think diverging here is fine.

    enhancement wontfix 
    opened by silverwind 3
  • Use a deterministic branch name for Linguist updates

    Rather than creating the branch for the update PR ahead of time using the date, this changes it to use the short hash of the Linguist commit that was found, and updates the code so that if the branch already exists, it will exit without creating a PR.

    This branch name should be the same between runs of the workflow (unless the Linguist release tag is changed, which warrants another update anyway) and should address the problem of creating one PR a day until the update is merged.

    You can see an example of a PR created by this code here: https://github.com/look/go-enry/pull/8 (note the branch name)

    closes https://github.com/go-enry/go-enry/issues/69

    cc @bzz @lafriks

    opened by look 3
  • GitHub Actions workflow to automatically update Linguist

    This adds a GitHub Actions workflow that performs the steps necessary to update Linguist to the latest release tag and creates a PR.

    You can see an example of such a PR in my fork (it also includes the commits from this PR, but that won't happen in your repo once this is merged).

    I've included a workflow_dispatch trigger with an override for the Linguist tag, so you can test it out. In order to test it, the workflow file needs to exist on master, but once it does you can pick a branch and run the workflow from there. The Linguist tag is optional, but allows you to test updating the generated code when Linguist hasn't changed.

    I also included a schedule to trigger the workflow once a day. That will start running once the workflow is on master, but I haven't been able to test if it works yet. If not, likely it will require a small tweak to add an explicit secret.

    Hopefully this reduces the maintenance overhead of keeping go-enry up-to-date with Linguist releases! 🙇

    Closes #51

    opened by look 3
  • ci: fix Python profile after ubuntu-latest 20.04->22.04 update

    Addresses https://github.com/go-enry/go-enry/pull/144#issuecomment-1328810979

    GitHub has changed the ubuntu-latest runner and it does not ship Python 3.6 any more 🤷

    See https://github.com/actions/setup-python/issues/544#issuecomment-1320295576

    test plan:

    • CI profile for python is green
    opened by bzz 0
  • test: cover GetLanguageByContent confusing edge cases

    And clarify documentation wording, based on discussion at https://github.com/go-enry/go-enry/issues/145

    test plan:

    • go test -run '^Test_EnryTestSuite$' -testify.m '^(TestGetLanguageByContent)$' ./...
    opened by bzz 1
  • `GetLanguageByContent` returns an empty string

    Following code returns an empty string:

    language, _ := enry.GetLanguageByContent("foo.cpp", []byte("int main() { return 0; }"))
    

    Expected: language == "C++"

    opened by kuba-- 1
  • Refactoring tests

    Several cosmetic changes

    • API function declarations order follows tests order
    • Linguist lazy loading logic unified & re-used, as much as possible between tests&benchmark
    • Separate test suite extracted for running over Linguist samples/fixtures
    opened by bzz 3
  • Move vendor RE collation to codegen

    Initially, a part of the #138, this changes the way RE collation optimization is applied to VendorMatchers - from run-time at package initialization to build-time, at code generation 840981b2403cc2ea58ce46937d6a7b7393ad47da.

    The second commit is all the data re-generated with it.

    enhancement 
    opened by bzz 0
Releases(v2.8.3)
  • v2.8.3(Oct 6, 2022)

    What's Changed

    • Update Linguist to v7.21.0 by 🤖 in https://github.com/go-enry/go-enry/pull/131
    • A 🐛 in performance optimisation of IsVendor() was fixed by @cboylan #135
    • Backported from Linguist: catch files generated with go-to-protobuf and yarn .pnp files by @lafriks #83

    New Contributors

    • @cboylan made their first contribution in https://github.com/go-enry/go-enry/pull/136

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.2...v2.8.3

  • v2.8.2(Apr 11, 2022)

    What's Changed

    • Update Linguist to v7.20.0 by 🤖 in https://github.com/go-enry/go-enry/pull/124
    • Use a deterministic branch name for Linguist updates by @look fixing #69

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.1...v2.8.2

  • v2.8.1(Apr 11, 2022)

    What's Changed

    • Update Linguist to v7.19.0 by 🤖 in https://github.com/go-enry/go-enry/pull/95
    • A check for non-backtracking subexpressions added to the list of invalid regexes by @lafriks in https://github.com/go-enry/go-enry/pull/118
    • poetry.lock is detected as generated by @silverwind in https://github.com/go-enry/go-enry/pull/112
    • Java Bindings: expose .IsTest and .GetLanguageType methods by @UtsavChokshiCNU in https://github.com/go-enry/go-enry/pull/80

    New Contributors

    • @silverwind made their first contribution in https://github.com/go-enry/go-enry/pull/113
    • @UtsavChokshiCNU made their first contribution in https://github.com/go-enry/go-enry/pull/80

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.0...v2.8.1

  • v2.8.0(Nov 17, 2021)

    What's Changed

    • Expose LanguageInfo with all Linguist data by @look in https://github.com/go-enry/go-enry/pull/62
    • GitHub Actions workflow to automatically update Linguist version by @look in https://github.com/go-enry/go-enry/pull/61 #72
    • Test robustness w.r.t upstream language renames by @bzz in https://github.com/go-enry/go-enry/pull/67
    • Update Linguist to v7.17.0 (release notes) by @github-actions in https://github.com/go-enry/go-enry/pull/66

    New Contributors

    • @github-actions 🤖 made their first contribution 🎉 in https://github.com/go-enry/go-enry/pull/66

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.7.2...v2.8.0

  • v2.7.2(Sep 26, 2021)

    New Features

    • sync with the latest github/linguist v7.16.1 #60
    • improved GetLanguagesByShebang accuracy #56 , #58

    Infra

    • CI runs the latest go 1.16.x, 1.17.x releases #59

    Contributors

    • @lafriks
    • @rykov

    All the changes in v2.7.2 release

  • v2.7.1(Jun 18, 2021)

    New Features

    • sync with the latest github/linguist v7.14.0 #52

    Infra

    • CI runs the latest go 1.15.x, 1.16.x releases #53

    Contributors

    • @look
    • @mcuadros
  • v2.7.0(Apr 24, 2021)

    New Features

    • New GetLanguageID API introduced, to expose stable numerical IDs for all the languages #46
    • Rust bindings are available now at https://github.com/go-enry/rs-enry

    Fixes

    • GetLanguages now behaves exactly like Linguist.detect, resolving a long-standing src-d/enry#207 🎉 #47
    • IsVendor is optimized #44, benchmarked and tested better #45

    Contributors

    • @look
    • @6543
    • @zeripath
    • @vsmaxim
  • v2.6.1(Mar 12, 2021)

  • v2.6.0(Dec 3, 2020)

    New Features

    • sync with the latest github/linguist 7.12.1 #39 (w/o native tokenizer update for -tags flex)
    • new GetLanguagesByXML strategy detecting XML by the first 2 lines of file content #40
    • new GetLanguagesByManpage strategy detecting Roff manpages by filename #39
    • Python bindings are auto-generated now and have tests (not part of the release yet, see #31, and known to leak memory a little bit, see #36) #29

    Fixes

    Contributors

    • @vsmaxim
    • @lafriks
    • @villelaitila
  • v2.5.2(May 29, 2020)

  • v2.5.1(May 29, 2020)

  • v2.5.0(May 29, 2020)

  • v2.4.1(May 6, 2020)

  • v2.4.0(Apr 16, 2020)

  • v2.3.0(Mar 31, 2020)

    New Features

    • sync to the latest github/linguist v7.9.0 #3
    • getting HTML colors for languages falls back on a group default #2
    • Windows support 🎉 for building and running the Go library (w/o oniguruma) #4
    • new CI for all 3 platforms, based on GitHub Actions
    • this and some further releases will not include Java library published yet, until #6 is resolved

    Fixes

    • 33 new languages added, 2 reworked (Perl 6 -> Raku, Visual Basic -> VBA, VBScript)
    • 13 content-based heuristics were improved and new disambiguations were added (.v for Coq, V or Verilog; .s for Motorola assembly; .plist for XML, OpenStep; .odin, .p Gnuplot, OpenEdge ABL)
    • BUCK, BUILD, BUILD.bazel and WORKSPACE of Bazel/Pants are all now recognized as Starlark, instead of general Python, as before
    • new well-known filenames recognized: .dircolors as GNU dircolor, .inputrc as readline, .curlrc as cURL config, .npmrc as NPM config, troffrc as Roff, yarn.lock as YAML.
    • vendoring detection improvements: dotnet-install is ignored, as well as .yarn/releases

    Contributors

    • @lafriks
    • @bzz
    • @mcuadros
  • v2.2.0(Mar 19, 2020)

  • v2.1.0(Mar 19, 2020)

    New Features

    • sync to the latest github/linguist v7.5.1
    • a new API call for getting HTML colors for languages, enry.GetColor(language), generated from Linguist's languages.yml

    Fixes

    • content-based heuristics improved and new disambiguations were added (.vba for Vim script, .sql for TSQL, GraphQL)
    • shebang-based heuristic ignores osascript -l, which may specify a non-interpretable language
    • vendoring detection improvements: testdata is ignored as Go fixtures, bulma.css as well
    • 21 new languages added (1 removed: Bro)
      • Altium Designer
      • Cabal Config
      • Dhall
      • EditorConfig
      • HolyC
      • JavaScript+ERB
      • Jsonnet
      • Motorola 68K Assembly
      • ObjectScript
      • Rich Text Format
      • SSH Config
      • Svelte
      • TSQL
      • TSX
      • WebVTT
      • Wollok
      • ZAP
      • ZIL
      • Zeek
      • ZenScript
      • mcfunction

    New contributors

    @lafriks

  • v2.0.0(Mar 19, 2020)

    New features

    • First release with go module support.
    • Import directly from github.com/src-d/enry/v2
    • Optional Flex-based tokenizer, same as Linguist uses. Hidden behind -tags flex, improves content classifier accuracy.

    Fixes

    • Optional oniguruma-based tokenizer is now based on the new Oniguruma v6.x and produces results consistent with RE2 on all the samples from Linguist (including non-utf8).

    Full list of issues tracked under v2.0.0 milestone.

  • v1.7.3(Mar 19, 2020)

    New Features

    • CLI application, when used in file mode (enry <filename>), now includes information about file vendoring #217
    • CLI application defaults now follow GitHub Linguist #214:
      • only Programming Languages and Markup files are reported
      • -all allows for the previous behaviour
      • -prog was removed
      • -mode=bytes is the default (instead of files, as before)

    Fixes

    • -mode=lines/bytes produces actual results
    • unusable enry-java JAR artefact is not published on GH release any more. It is only distributed through Maven.

    New contributors

    @SuhaibMujahid

  • v1.7.2(Mar 19, 2020)

    New Features

    None

    Fixes

    • multiple candidates returned instead of empty slice (e.g for .h) #205
    • github.com/src-d/go-oniguruma is used now #206

    New contributors

    @kuba--

  • v1.7.1(Mar 19, 2020)

  • v1.7.0(Mar 19, 2020)

    New Features

    #189 sync to linguist v7.2.0

    Generation of the heuristics that disambiguate files with the same extension was simplified, which means:

    • quality of the judgements that enry makes about language was improved
    • updates from Linguist upstream will be done more frequently

    Summary of the upstream changes

    6 languages removed:

    • Arduino
    • KiCad Board
    • Matlab
    • PAWN
    • Sublime Text Config
    • XPM

    43 languages added:

    • AngelScript
    • Asymptote
    • Ballerina
    • Cloud Firestore Security Rules
    • CoNLL-U
    • Common Workflow Language
    • DataWeave
    • EML
    • Edje Data Collection
    • F*
    • FIGlet Font
    • Git Attributes
    • Git Config
    • Glyph Bitmap Distribution Format
    • HAProxy
    • HTML+Razor
    • HXML
    • HiveQL
    • Ignore List
    • JSON with Comments
    • Java Properties
    • KiCad Legacy Layout
    • LTspice Symbol
    • MATLAB
    • Modula-3
    • Nearley
    • Nextflow
    • Pawn
    • Pod 6
    • PostCSS
    • Quake
    • RPC
    • Roff Manpage
    • Slice
    • Solidity
    • SugarSS
    • Windows Registry Entries
    • X BitMap
    • X Font Directory Index
    • X PixMap
    • YARA
    • YASnippet
    • Zig

    Known Issues

    Although Languages and Heuristics were synced with upstream, it's not reproducing 100% of linguist yet

    • The missing parts are tracked under #155
    • Current difference is documented in https://github.com/src-d/enry/#divergences-from-linguist
  • v1.6.8(Mar 19, 2020)

  • v1.6.7(Mar 19, 2020)

  • v1.6.6(Mar 19, 2020)

  • v1.6.5(Mar 19, 2020)

  • v1.6.4(Mar 19, 2020)

  • v1.6.3(Mar 19, 2020)

  • v1.6.2(Mar 19, 2020)

    • Use a precompiled package for osxcross for CI (#133).
    • Fix crash by checking for empty filenames (#134). Solves #129.
    • java: bump version to 1.6.2 (#135).
  • v1.6.1(Mar 19, 2020)
