A faster file programming language detector

Overview

Programming language detector and toolbox to ignore binary or vendored files. enry started as a port to Go of the original Linguist Ruby library and has a 2x performance improvement over it.

CLI

The CLI binary is hosted in a separate repository go-enry/enry.

Library

enry is also a Go library for guessing a programming language; it exposes its API through FFI to multiple programming environments.

Use cases

enry guesses a programming language using a sequence of matching strategies that are applied progressively to narrow down the possible options. Each strategy varies on the type of input data that it needs to make a decision: file name, extension, the first line of the file, the full content of the file, etc.
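
The progressive narrowing described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not enry's implementation: the strategy type, the detect function, and the two toy strategies (byExtension with a hard-coded table, byKeyword with made-up markers) are hypothetical names invented for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// strategy (hypothetical) looks at the input plus the candidates left by
// earlier strategies and returns a possibly narrower list.
type strategy func(filename string, content []byte, candidates []string) []string

// byExtension is a toy stand-in backed by a hard-coded table, not real data.
func byExtension(filename string, _ []byte, _ []string) []string {
	table := map[string][]string{
		".go": {"Go"},
		".m":  {"MATLAB", "Objective-C"},
	}
	return table[strings.ToLower(filepath.Ext(filename))]
}

// byKeyword is a toy content heuristic: it keeps only the candidates whose
// (made-up) marker string occurs in the content.
func byKeyword(_ string, content []byte, candidates []string) []string {
	markers := map[string]string{"Objective-C": "@interface", "MATLAB": "function "}
	var kept []string
	for _, lang := range candidates {
		if m, ok := markers[lang]; ok && bytes.Contains(content, []byte(m)) {
			kept = append(kept, lang)
		}
	}
	return kept
}

// detect applies the strategies in order, stopping as soon as one of them
// leaves exactly one language.
func detect(filename string, content []byte, strategies ...strategy) []string {
	var candidates []string
	for _, s := range strategies {
		result := s(filename, content, candidates)
		if len(result) == 1 {
			return result // unambiguous: stop early
		}
		if len(result) > 0 {
			candidates = result // narrowed: pass on to the next strategy
		}
	}
	return candidates
}

func main() {
	langs := detect("foo.m", []byte("@interface Foo : NSObject\n@end\n"), byExtension, byKeyword)
	fmt.Println(langs, len(langs) == 1) // prints: [Objective-C] true
}
```

A strategy that returns nothing leaves the previous candidates untouched, so a weak heuristic can never make the guess worse than the one before it.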

Depending on the available input data, the enry API can be roughly divided into the following categories or use cases.

By filename

The following functions require only the name of the file to make a guess:

  • GetLanguageByExtension uses only the file extension (which may be ambiguous)
  • GetLanguageByFilename is useful for cases like .gitignore, .bashrc, etc.
  • all filtering helpers

Please note that such guesses are expected not to be very accurate.

By text

To make a guess based only on the content of the file or a text snippet, use:

  • GetLanguageByShebang reads only the first line of text to identify the shebang.

  • GetLanguageByModeline for cases when a Vim/Emacs modeline, e.g. /* vim: set ft=cpp: */, may be present at the head or the tail of the text.

  • GetLanguageByClassifier uses a Bayesian classifier trained on all the ./samples/ from Linguist.

    It is usually a last-resort strategy used to disambiguate the guesses of the previous strategies, and thus it requires a list of "candidate" guesses. If no better-informed hypotheses are available, one can pass all known languages (the keys of data.LanguagesLogProbabilities) as candidates, at the price of possibly suboptimal accuracy.

By file

The most accurate guess is made when both the file name and the content are available:

  • GetLanguagesByContent only uses file extension and a set of regexp-based content heuristics.
  • GetLanguages uses the full set of matching strategies and is expected to be most accurate.

Filtering: vendoring, binaries, etc

enry exposes a set of file-level helpers Is* to simplify filtering out the files that are less interesting for the purpose of source code analysis:

  • IsBinary
  • IsVendor
  • IsConfiguration
  • IsDocumentation
  • IsDotFile
  • IsImage
  • IsTest
  • IsGenerated
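
As a sketch of how path-based helpers of this kind can be combined, the example below filters a list of paths through a set of predicates. The predicates looksVendored and looksDotFile are hypothetical stand-ins written for this example with deliberately simplistic rules; they are not enry's rules.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// pathFilter (hypothetical) is a predicate that takes a path and reports
// whether the file should be skipped.
type pathFilter func(p string) bool

// Stand-in rules for illustration only; far simpler than a real rule set.
var vendorRe = regexp.MustCompile(`(^|/)(vendor|node_modules)/`)

func looksVendored(p string) bool { return vendorRe.MatchString(p) }
func looksDotFile(p string) bool  { return strings.HasPrefix(path.Base(p), ".") }

// keepInteresting returns the paths that none of the filters reject.
func keepInteresting(paths []string, skip ...pathFilter) []string {
	var kept []string
	for _, p := range paths {
		rejected := false
		for _, f := range skip {
			if f(p) {
				rejected = true
				break
			}
		}
		if !rejected {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []string{
		"main.go",
		"vendor/lib/a.go",
		".gitignore",
		"web/node_modules/x/index.js",
		"docs/guide.md",
	}
	fmt.Println(keepInteresting(paths, looksVendored, looksDotFile))
	// prints: [main.go docs/guide.md]
}
```

With the library imported, helpers that take a path and return a bool, such as enry.IsVendor shown in the reproducer further down this page, could presumably be passed in place of the stand-ins. Helpers that also need file content would need a different predicate shape.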

Language colors and groups

enry exposes functions to get a language's color (for example, to use when presenting statistics in graphs) and its group:

  • GetColor
  • GetLanguageGroup can be used to group similar languages together, e.g. for Less this function will return CSS
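
One way a group lookup can be used is to fold per-language statistics into per-group totals. The sketch below assumes a hypothetical groupOf function with a one-entry table taken from the Less example above; treating an empty string as "no group" is an assumption of this sketch, not a documented guarantee.

```go
package main

import "fmt"

// groupOf is a hypothetical stand-in for a group lookup. Returning "" for a
// language without a group is an assumption made for this example.
func groupOf(lang string) string {
	groups := map[string]string{"Less": "CSS"}
	return groups[lang]
}

// bytesByGroup folds per-language byte counts into per-group totals.
func bytesByGroup(bytesByLang map[string]int) map[string]int {
	totals := make(map[string]int)
	for lang, n := range bytesByLang {
		key := groupOf(lang)
		if key == "" {
			key = lang // no group: count the language under its own name
		}
		totals[key] += n
	}
	return totals
}

func main() {
	stats := bytesByGroup(map[string]int{"CSS": 1200, "Less": 300, "Go": 5000})
	fmt.Println(stats["CSS"], stats["Go"]) // prints: 1500 5000
}
```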

Languages

Go

In a Go module, import enry to the module by running:

go get github.com/go-enry/go-enry/v2

The rest of the examples will assume you have either done this or fetched the library into your GOPATH.

// The examples here and below assume you have imported the library.
import "github.com/go-enry/go-enry/v2"

lang, safe := enry.GetLanguageByExtension("foo.go")
fmt.Println(lang, safe)
// result: Go true

// "<matlab-code>" and the similar strings below are placeholders for real file content.
lang, safe = enry.GetLanguageByContent("foo.m", []byte("<matlab-code>"))
fmt.Println(lang, safe)
// result: Matlab true

lang, safe = enry.GetLanguageByContent("bar.m", []byte("<objective-c-code>"))
fmt.Println(lang, safe)
// result: Objective-C true

// all strategies together; GetLanguage returns only the language name
lang = enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))
fmt.Println(lang)
// result: C++

Note that the returned boolean value safe is true if there is only one possible language detected.

A plural version of the same API allows getting a list of all possible languages for a given file.

langs := enry.GetLanguages("foo.h", []byte(""))
// result: []string{"C", "C++", "Objective-C"}

langs = enry.GetLanguagesByExtension("foo.asc", []byte(""), nil)
// result: []string{"AGS Script", "AsciiDoc", "Public Key"}

langs = enry.GetLanguagesByFilename("Gemfile", []byte(""), []string{})
// result: []string{"Ruby"}

Java bindings

Generated Java bindings using a C shared library and JNI are available under java.

A library is published on Maven as tech.sourced:enry-java for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Python bindings

Generated Python bindings using a C shared library and cffi are WIP under src-d/enry#154.

A library is going to be published on PyPI as enry for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Rust bindings

Generated Rust bindings using a C static library are available at https://github.com/go-enry/rs-enry.

Divergences from Linguist

The enry library is based on the data from github/linguist version v7.14.0.

When parsing linguist/samples, the following enry results differ from Linguist's:

For all the cases above that have an issue number, we plan to update enry to match Linguist behavior.

Benchmarks

Enry's language detection has been compared with Linguist's on linguist/samples.

We got these results:

[Figure: histogram of detection times per file, enry vs. Linguist]

The histogram shows the number of files (y-axis) per time interval bucket (x-axis). Most of the files were detected faster by enry.

There are several cases where enry is slower than Linguist because Go's regexp engine is slower than Ruby's, which is based on the Oniguruma library, written in C.

See instructions for running enry with oniguruma.

Why Enry?

In the movie My Fair Lady, Professor Henry Higgins is a linguist who, at the very beginning of the movie, enjoys guessing the origin of people based on their accent.

"Enry Iggins" is how Eliza Doolittle pronounces the name of the Professor.

Development

To run the tests use:

go test ./...

Setting ENRY_TEST_REPO to the path of an existing checkout of Linguist will avoid cloning it and speed the tests up. Setting ENRY_DEBUG=1 will provide insight into the Bayesian classifier building done by make code-generate.

Sync with github/linguist upstream

enry re-uses parts of the original github/linguist to generate internal data structures. In order to update to the latest release of linguist do:

$ git clone https://github.com/github/linguist.git .linguist
$ cd .linguist; git checkout <release-tag>; cd ..

# put the new release's commit sha in the generator_test.go (to re-generate .gold test fixtures)
# https://github.com/go-enry/go-enry/blob/13d3d66d37a87f23a013246a1b0678c9ee3d524b/internal/code-generator/generator/generator_test.go#L18

$ make code-generate

To stay in sync, enry needs to be updated when a new release of the linguist includes changes to any of the following files:

There is no automation for detecting changes in the linguist project, so the process above has to be done manually from time to time.

When submitting a pull request syncing up to a new release, please make sure it only contains the changes in the generated files (in data subdirectory).

Separating all the necessary "manual" code changes to a different PR that includes some background description and an update to the documentation on "divergences from linguist" is very much appreciated as it simplifies the maintenance (review/release notes/etc).

Misc

Running a benchmark & faster regexp engine

Benchmark

All benchmark scripts are in the benchmarks directory.

Dependencies

As the benchmarks depend on Ruby and the github-linguist gem, make sure you have:

  • Ruby (e.g. using rbenv) with bundler installed
  • Docker
  • native dependencies installed
  • Build the gem: cd .linguist && bundle install && rake build_gem && cd -
  • Install it: gem install --no-rdoc --no-ri --local .linguist/github-linguist-*.gem

Quick benchmark

To run the quicker benchmarks, use

make benchmarks

to get average times for the primary detection function and strategies over the whole samples set. If you want to see measurements per sample file, use:

make benchmarks-samples

Full benchmark

If you want to reproduce the same benchmarks as reported above:

  • Make sure all dependencies are installed
  • Install gnuplot (in order to plot the histogram)
  • Run ENRY_TEST_REPO="$PWD/.linguist" benchmarks/run.sh (takes ~15h)

It will run the benchmarks for enry and Linguist, parse the output, create csv files and plot the histogram.

Faster regexp engine (optional)

Oniguruma is CRuby's regular expression engine. It is very fast and performs better than the one in the Go standard library. enry supports swapping between those two engines thanks to the rubex project. The typical overall speedup from using Oniguruma is 1.5-2x. However, it requires CGo and the external shared library. On macOS with Homebrew, install it with:

brew install oniguruma

On Ubuntu, install it with:

sudo apt install libonig-dev

To build enry with Oniguruma regexps, use the oniguruma build tag:

go get -v -t --tags oniguruma ./...

and then rebuild the project.

License

Apache License, Version 2.0. See LICENSE

Comments
  • Code generator Win support

    Fixes #4

    On Win make code-generate produces unreasonable Bayesian classifier weights from Linguist samples silently, failing only the final classification tests.

    TestPlan:

    • passing tests on Win CI
      go test ./internal/code-generator/... \
       -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
      
    opened by bzz 9
  • Expose IsTest and GetLanguageType methods & Fix test cases for Java Bindings

    Purpose

    [Fixes]:

    1. Correct failing test cases for Java Bindings so that make test command under Java does not fail.
    2. Expose isTest method in java bindings.

    [Features] :

    1. Export new function GetLanguageType at enry package in go & expose the same at Java Bindings.
    opened by UtsavChokshiCNU 5
  • Is there a prebuilt shared library?

    I couldn't find any way in this repo on how to get the shared library so it can be used from other languages.

    So is there a prebuilt one somewhere or some instructions on how to build your own?

    opened by CodeMyst 5
  • Refactoring tests

    Several cosmetic changes

    • API function declarations order follows tests order
    • Linguist lazy loading logic unified & re-used, as much as possible between tests&benchmark
    • Separate test suite extracted for running over Linguist samples/fixtures
    opened by bzz 4
  • Linguist update automation opens multiple PRs

    The Linguist update automation runs once a day, so if the generated PR isn't merged by then, it will open another one!

    https://github.com/go-enry/go-enry/pull/68 #70

    help wanted 
    opened by look 4
  • Expose `LanguageInfo` with all Linguist data

    As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an API for accessing a LanguageInfo struct which is populated with all the data from the Linguist YAML source file. Functions are provided to access the LanguageInfo by name or ID.

    The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc. could in principle be implemented using this structure, which would simplify the code generation. But that would be a big change so I didn't do any of that. Perhaps in the next major version something like that would make sense.

    cc @tclem

    Closes https://github.com/go-enry/go-enry/issues/54

    opened by look 4
  • Python: API to expose highest-level enry.GetLanguage

    This is a blueprint for all other methods, dealing with go slice conversion.

    It still lacks build automation (and there is no release automation whatsoever), but this is already useful.

    opened by bzz 4
  • data: replace substring package with regex package

    This PR removes the old substring package from @toqueteos (sorry dude) and uses the internal regex package, so that oniguruma regexps are used for all the regular expressions.

    opened by mcuadros 4
  • IsVendor() overmatching paths

    I discovered this through our Gitea server flagging files as vendored through its use of enry.IsVendor(). Paths like oslo_cache/_bmemcache_pool.py and playbooks/roles/create-venv/tasks/main.yaml are inappropriately marked vendored.

    My hunch is that the first path is matching https://github.com/go-enry/go-enry/blob/7168084e5e5de38b915b1874528ff73f20a86b69/data/vendor.go#L9 and the second is matching https://github.com/go-enry/go-enry/blob/7168084e5e5de38b915b1874528ff73f20a86b69/data/vendor.go#L110

    I've written a little reproducer that removes Gitea from the equation:

    package main
    
    import "fmt"
    import "regexp"
    import "github.com/go-enry/go-enry/v2"
    
    func main() {
    	input_str1 := "oslo_cache/_bmemcache_pool.py"
    
    	rawregex1, _ := regexp.MatchString(`(^|/)cache/`, input_str1)
    	fmt.Println("Raw regex:", rawregex1)
    
    	vendor1 := enry.IsVendor(input_str1)
    	fmt.Println("IsVendor:", vendor1)
    
    	input_str2 := "playbooks/roles/create-venv/tasks/main.yaml"
    
    	rawregex2, _ := regexp.MatchString(`(^|/)env/`, input_str2)
    	fmt.Println("Raw regex:", rawregex2)
    
    	vendor2 := enry.IsVendor(input_str2)
    	fmt.Println("IsVendor:", vendor2)
    }
    

    When you run this the results are:

    Raw regex: false
    IsVendor: true
    Raw regex: false
    IsVendor: true
    

    What this shows us is that the raw input regexes appear to behave as expected. Neither of our example input strings matches which is what we expect. But when we call IsVendor() the result becomes true. I suspect that the init function https://github.com/go-enry/go-enry/blob/7168084e5e5de38b915b1874528ff73f20a86b69/utils.go#L139-L246 is either adding rules that collide or introducing some bug to the expanded regex that causes this to happen.

    opened by cboylan 3
  • Mark `go.sum` as generated?

    Most diffs to go checksum files are pure noise, I'm wondering if anyone else agrees it should be marked as generated so tools like gitea can hide diffs on it? Linguist doesn't do it but I think diverging here is fine.

    enhancement wontfix 
    opened by silverwind 3
  • Use a deterministic branch name for Linguist updates

    Rather than creating the branch for the update PR ahead of time using the date, this changes it to use the short hash of the Linguist commit that was found, and updates the code so that if the branch already exists, it will exit without creating a PR.

    This branch name should be the same between runs of the workflow (unless the Linguist release tag is changed, which warrants another update anyway) and should address the problem of creating one PR a day until the update is merged.

    You can see an example of a PR created by this code here: https://github.com/look/go-enry/pull/8 (note the branch name)

    closes https://github.com/go-enry/go-enry/issues/69

    cc @bzz @lafriks

    opened by look 3
  • Syntax-aware regexp generation for configurable engines

    This is an alternative to #138 where, at build time, we always generate unaltered regexp syntax for all the rules and make runtime checks, similar to #65.

    This, by itself, does not solve the problem of dealing with more non-RE2 syntax coming from linguist, only renders it more visible. The only solution that I can see now that would not require everyone using native library (oniguruma) or compromising on predictive accuracy (due to missing rules for unsupported syntax) is to try shipping another regexp engine in go that would support the necessary syntax.

    TODOs

    • [x] update content heuristics generation
    • [ ] update vendor generation
    • [ ] add Oniguruma-only tests, as in #65
    • [ ] add https://github.com/dlclark/regexp2 backend option
    opened by bzz 1
  • [tentative] Check vendor regex at build-time

    This change does several things:

    • ~refactors optimization of vendor regexp collation for IsVendor()~
    • ~moves it to build-time "code generation" phase (instead of runtime at package initialisation)~
    • introduces RE2 syntax check for vendor.go (case that fails #137), the same as we use for heuristics from content.go (and skip the rules with unsupported syntax)
    • adds new CI profiles to test code generation

    An attempt is made for the checks to be RE-lib-specific, thus make code-generate now also respects the same --tags, which should be passed when the use of oniguruma is desired.

    opened by bzz 3
  • JNAerator throws exceptions in Docker for Apple Silicon

    I've found out that this project uses JNAerator (https://github.com/nativelibs4java/JNAerator), which does not seem to be maintained anymore: its last commit is dated Sep 30, 2015.

    The JNAerator in this project uses JNA version 4.1.0 and this version does not provide for 'linux-aarch64'. (4.2.0 and above does.)

    I am using go-enry in my project and it gives the following error in Docker for Apple Silicon:

    java.lang.UnsatisfiedLinkError: Native library (com/sun/jna/linux-aarch64/libjnidispatch.so) not found in resource path

    I hope there's a way to fix this!

    help wanted 
    opened by citygxoxo 1
  • support for incompatible heuristics using oniguruma

    This PR enables support for all the heuristics that are unsupported by the Go regexp engine.

    • Exposes the original regular expressions to the regex package.
    • The regex package now handles the conversions of the Ruby regular expression to the go-ish version.
    • The heuristics rules now support nil regular expressions since some of the heuristics can't compile using the standard library.

    I added some tests using the Linguist fixtures, and all the fixtures pass (using oniguruma).

    opened by mcuadros 5
  • Python bindings are memory leaking

    All Python wrappers leak memory.

    This may happen in several places:

    1. In CFFI - layer (when structures are converted between Python and Go runtimes)
    2. In CGo shared library (when C structures are converted in Go instances)
    3. Somewhere in enry library (nearly impossible, because all wrappers are leaking)

    You can reproduce it using this test script:

    import os
    
    import enry
    import psutil
    
    process = psutil.Process(os.getpid())
    
    content = "import os\nprint('Hello, world!')".encode()
    path = "test.py"
    initial_usage = process.memory_info().rss
    
    for _ in range(10000):
        enry.get_language(path, content)
    
    print(round(process.memory_info().rss / initial_usage * 100, 2))
    

    Here is some info about the memory usage of the get_language function (in % of initial RAM usage); you can see that it's leaking:

    • 1000 iterations: ~100.5
    • 10000 iterations: ~102.4
    • 100000 iterations: ~111.1

    bug 
    opened by vsmaxim 2
Releases(v2.8.3)
  • v2.8.3(Oct 6, 2022)

    What's Changed

    • Update Linguist to v7.21.0 by 🤖 in https://github.com/go-enry/go-enry/pull/131
    • A 🐛 in performance optimisation of IsVendor() was fixed by @cboylan #135
    • Backported from Linguist: catch files generated with go-to-protobuf and yarn .pnp files by @lafriks #83

    New Contributors

    • @cboylan made their first contribution in https://github.com/go-enry/go-enry/pull/136

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.2...v2.8.3

  • v2.8.2(Apr 11, 2022)

    What's Changed

    • Update Linguist to v7.20.0 by 🤖 in https://github.com/go-enry/go-enry/pull/124
    • Use a deterministic branch name for Linguist updates by @look fixing #69

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.1...v2.8.2

  • v2.8.1(Apr 11, 2022)

    What's Changed

    • Update Linguist to v7.19.0 by 🤖 in https://github.com/go-enry/go-enry/pull/95
    • A check for non-backtracking subexpressions added to the list of invalid regexes by @lafriks in https://github.com/go-enry/go-enry/pull/118
    • poetry.lock is detected as generated by @silverwind in https://github.com/go-enry/go-enry/pull/112
    • Java Bindings: expose .IsTest and .GetLanguageType methods by @UtsavChokshiCNU in https://github.com/go-enry/go-enry/pull/80

    New Contributors

    • @silverwind made their first contribution in https://github.com/go-enry/go-enry/pull/113
    • @UtsavChokshiCNU made their first contribution in https://github.com/go-enry/go-enry/pull/80

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.0...v2.8.1

  • v2.8.0(Nov 17, 2021)

    What's Changed

    • Expose LanguageInfo with all Linguist data by @look in https://github.com/go-enry/go-enry/pull/62
    • GitHub Actions workflow to automatically update Linguist version by @look in https://github.com/go-enry/go-enry/pull/61 #72
    • Test robustness w.r.t upstream language renames by @bzz in https://github.com/go-enry/go-enry/pull/67
    • Update Linguist to v7.17.0 (release notes) by @github-actions in https://github.com/go-enry/go-enry/pull/66

    New Contributors

    • @github-actions 🤖 made their first contribution 🎉 in https://github.com/go-enry/go-enry/pull/66

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.7.2...v2.8.0

  • v2.7.2(Sep 26, 2021)

    New Features

    • sync with the latest github/linguist v7.16.1 #60
    • improved GetLanguagesByShebang accuracy #56 , #58

    Infra

    • CI runs the latest go 1.16.x, 1.17.x releases #59

    Contributors

    • @lafriks
    • @rykov

    All the changes in v2.7.2 release

  • v2.7.1(Jun 18, 2021)

    New Features

    • sync with the latest github/linguist v7.14.0 #52

    Infra

    • CI runs the latest go 1.15.x, 1.16.x releases #53

    Contributors

    • @look
    • @mcuadros
  • v2.7.0(Apr 24, 2021)

    New Features

    • New GetLanguageID API introduced, to expose stable numerical IDs for all the languages #46
    • Rust bindings are available now at https://github.com/go-enry/rs-enry

    Fixes

    • GetLanguages now behaves exactly like Linguist.detect, resolving a long-standing src-d/enry#207 🎉 #47
    • IsVendor is optimized #44, benchmarked and tested better #45

    Contributors

    • @look
    • @6543
    • @zeripath
    • @vsmaxim
  • v2.6.1(Mar 12, 2021)

  • v2.6.0(Dec 3, 2020)

    New Features

    • sync with the latest github/linguist 7.12.1 #39 (w/o native tokenizer update for -tags flex)
    • new GetLanguagesByXML strategy detecting XML by first 2 lines of file content #40
    • new GetLanguagesByManpage strategy detecting Roff manpages by filename #39
    • Python bindings are auto-generated now and have test (not part of the release yet, see #31, and known to leak memory a little bit, see #36 ) #29

    Fixes

    Contributors

    • @vsmaxim
    • @lafriks
    • @villelaitila
  • v2.5.2(May 29, 2020)

  • v2.5.1(May 29, 2020)

  • v2.5.0(May 29, 2020)

  • v2.4.1(May 6, 2020)

  • v2.4.0(Apr 16, 2020)

  • v2.3.0(Mar 31, 2020)

    New Features

    • sync to the latest github/linguist v7.9.0 #3
    • getting HTML colors for languages falls back on a group default #2
    • Windows support 🎉 for building and running the Go library (w/o oniguruma) #4
    • new CI for all 3 platforms, based on Github Actions
    • this and some further releases will not include Java library published yet, until #6 is resolved

    Fixes

    • 33 new languages added, 2 reworked (Perl 6 -> Raku, Visual Basic -> VBA, VBScript)
    • 13 content-based heuristics were improved and new disambiguations were added (.v for Coq, V or Verilog; .s for Motorola assembly; .plist for XML, OpenStep; .odin, .p Gnuplot, OpenEdge ABL)
    • BUCK, BUILD, BUILD.bazel and WORKSPACE of Bazel/Pants are all now recognized as Starlark, instead of general Python, as before
    • new well-known filenames recognized: .dircolors as GNU dircolor, .inputrc as readline, .curlrc as cURL config, .npmrc as NPM config, troffrc as Roff, yarn.lock as YAML.
    • vendoring detection improvements: dotnet-install is ignored, as well as .yarn/releases

    Contributors

    • @lafriks
    • @bzz
    • @mcuadros
  • v2.2.0(Mar 19, 2020)

  • v2.1.0(Mar 19, 2020)

    New Features

    • sync to the latest github/linguist v7.5.1
    • a new API call for getting HTML colors for languages, enry.GetColor(language), generated from linguist languages.yml

    Fixes

    • content-based heuristics improved and new disambiguations were added (.vba for Vim script, .sql for TSQL, GraphQL)
    • the shebang-based heuristic ignores osascript -l, which can be a non-interpretable language
    • vendoring detection improvements: testdata is ignored as Go fixtures, bulma.css as well
    • 21 new languages added (1 removed: Bro)
      • Altium Designer
      • Cabal Config
      • Dhall
      • EditorConfig
      • HolyC
      • JavaScript+ERB
      • Jsonnet
      • Motorola 68K Assembly
      • ObjectScript
      • Rich Text Format
      • SSH Config
      • Svelte
      • TSQL
      • TSX
      • WebVTT
      • Wollok
      • ZAP
      • ZIL
      • Zeek
      • ZenScript
      • mcfunction

    New contributors

    @lafriks

  • v2.0.0(Mar 19, 2020)

    New features

    • First release with go module support.
    • Import directly from github.com/src-d/enry/v2
    • Optional Flex-based tokenizer, same as Linguist uses. Hidden behind -tags flex, improves content classifier accuracy.

    Fixes

    • Optional oniguruma-based tokenizer now is based on new Oniguruma v6.x and produces consistent with RE2 results on all the samples from Linguist (including non-utf8).

    Full list of issues tracked under v2.0.0 milestone.

  • v1.7.3(Mar 19, 2020)

    New Features

    • CLI application, when used in file mode enry <filename>, now includes information about file vendoring #217
    • CLI application defaults now follow GitHub Linguist #214
      • Only Programming Languages and Markup files are reported
      • -all allows for previous behaviour
      • -prog was removed
      • -mode=bytes is default (instead of files before)

    Fixes

    • -mode=lines/bytes produces actual results
    • unusable enry-java JAR artefact is not published on GH release any more. It is only distributed through Maven.

    New contributors

    @SuhaibMujahid

  • v1.7.2(Mar 19, 2020)

    New Features

    None

    Fixes

    • multiple candidates returned instead of empty slice (e.g for .h) #205
    • github.com/src-d/go-oniguruma is used now #206

    New contributors

    @kuba--

  • v1.7.1(Mar 19, 2020)

  • v1.7.0(Mar 19, 2020)

    New Features

    #189 sync to linguist v7.2.0

    Generation of the heuristics that disambiguate files with the same extension was simplified, which means:

    • quality of the judgements that enry makes about language was improved
    • updates with Linguist upstream will be done more frequently

    Summary of the upstream changes

    6 languages removed:

    • Arduino
    • KiCad Board
    • Matlab
    • PAWN
    • Sublime Text Config
    • XPM

    43 languages added:

    • AngelScript
    • Asymptote
    • Ballerina
    • Cloud Firestore Security Rules
    • CoNLL-U
    • Common Workflow Language
    • DataWeave
    • EML
    • Edje Data Collection
    • F*
    • FIGlet Font
    • Git Attributes
    • Git Config
    • Glyph Bitmap Distribution Format
    • HAProxy
    • HTML+Razor
    • HXML
    • HiveQL
    • Ignore List
    • JSON with Comments
    • Java Properties
    • KiCad Legacy Layout
    • LTspice Symbol
    • MATLAB
    • Modula-3
    • Nearley
    • Nextflow
    • Pawn
    • Pod 6
    • PostCSS
    • Quake
    • RPC
    • Roff Manpage
    • Slice
    • Solidity
    • SugarSS
    • Windows Registry Entries
    • X BitMap
    • X Font Directory Index
    • X PixMap
    • YARA
    • YASnippet
    • Zig

    Known Issues

    Although Languages and Heuristics were synced with upstream, it's not reproducing 100% of linguist yet

    • The missing parts are tracked under #155
    • Current difference is documented in https://github.com/src-d/enry/#divergences-from-linguist
  • v1.6.8(Mar 19, 2020)

  • v1.6.7(Mar 19, 2020)

  • v1.6.6(Mar 19, 2020)

  • v1.6.5(Mar 19, 2020)

  • v1.6.4(Mar 19, 2020)

  • v1.6.3(Mar 19, 2020)

  • v1.6.2(Mar 19, 2020)

    • Use a precompiled package for osxcross for CI (#133).
    • Fix crash by checking for empty filenames (#134). Solves #129.
    • java: bump version to 1.6.2 (#135).
  • v1.6.1(Mar 19, 2020)
