A faster file programming language detector

Overview

Programming language detector and toolbox to ignore binary or vendored files. enry started as a Go port of the original Linguist Ruby library and runs about 2x faster.

CLI

The CLI binary is hosted in a separate repository go-enry/enry.

Library

enry is also a Go library for guessing a programming language; it exposes its API through FFI to multiple programming environments.

Use cases

enry guesses a programming language using a sequence of matching strategies that are applied progressively to narrow down the possible options. Each strategy varies on the type of input data that it needs to make a decision: file name, extension, the first line of the file, the full content of the file, etc.
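The progressive narrowing described above can be sketched in plain Go. This is only an illustration under stated assumptions, not enry's implementation: the strategy type, the byExtension and byContent strategies, their tiny lookup table and the detect function are all hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// strategy narrows a list of candidate languages using some part of the input.
// An empty result means "no opinion". (Hypothetical type for this sketch.)
type strategy func(filename string, content []byte, candidates []string) []string

// byExtension is a toy strategy backed by a tiny hand-written table.
func byExtension(filename string, _ []byte, _ []string) []string {
	table := map[string][]string{
		".go": {"Go"},
		".h":  {"C", "C++", "Objective-C"},
	}
	return table[strings.ToLower(filepath.Ext(filename))]
}

// byContent is a toy content heuristic: it picks Objective-C when that is
// one of the candidates and the text contains "@interface".
func byContent(_ string, content []byte, candidates []string) []string {
	if !strings.Contains(string(content), "@interface") {
		return nil
	}
	for _, c := range candidates {
		if c == "Objective-C" {
			return []string{c}
		}
	}
	return nil
}

// detect applies the strategies in order, keeps the most recent non-empty
// answer, and stops as soon as exactly one candidate is left.
func detect(filename string, content []byte, strategies ...strategy) (langs []string, safe bool) {
	for _, s := range strategies {
		if narrowed := s(filename, content, langs); len(narrowed) > 0 {
			langs = narrowed
		}
		if len(langs) == 1 {
			return langs, true
		}
	}
	return langs, false
}

func main() {
	fmt.Println(detect("main.go", nil, byExtension, byContent))                    // [Go] true
	fmt.Println(detect("foo.h", []byte("@interface Foo"), byExtension, byContent)) // [Objective-C] true
	fmt.Println(detect("foo.h", []byte("int x;"), byExtension, byContent))         // [C C++ Objective-C] false
}
```

Cheap strategies run first, and more expensive ones only see the candidates that are still ambiguous.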

Depending on the available input data, the enry API can be roughly divided into the following categories or use cases.

By filename

The following functions require only the name of the file to make a guess:

  • GetLanguageByExtension uses only the file extension (which may be ambiguous)
  • GetLanguageByFilename is useful for cases like .gitignore, .bashrc, etc.
  • all filtering helpers

Please note that such guesses are not expected to be very accurate.

By text

To make a guess based only on the content of the file or a text snippet, use:

  • GetLanguageByShebang reads only the first line of text to identify the shebang.

  • GetLanguageByModeline for cases when a Vim/Emacs modeline, e.g. /* vim: set ft=cpp: */, may be present at the head or the tail of the text.

  • GetLanguageByClassifier uses a Bayesian classifier trained on all the ./samples/ from Linguist.

    It is usually a last-resort strategy used to disambiguate the guesses of the previous strategies, and thus it requires a list of "candidate" guesses. If more intelligent hypotheses are not available, one can provide a list of all known languages (the keys of data.LanguagesLogProbabilities) as possible candidates, at the price of possibly suboptimal accuracy.
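To give a feel for what a shebang-based strategy looks at, here is a simplified sketch that extracts the interpreter name from the first line of a text. It is not enry's implementation: interpreterFromShebang and the small languageByInterpreter table are hypothetical, and the sketch ignores corner cases such as flags passed to env.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// interpreterFromShebang returns the interpreter named on the first line of
// content, or "" when the text does not start with a shebang.
func interpreterFromShebang(content []byte) string {
	line := content
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		line = content[:i]
	}
	if !bytes.HasPrefix(line, []byte("#!")) {
		return ""
	}
	fields := strings.Fields(string(line[2:]))
	if len(fields) == 0 {
		return ""
	}
	name := path.Base(fields[0])
	// "#!/usr/bin/env python3" names the interpreter in the next field.
	if name == "env" && len(fields) > 1 {
		name = path.Base(fields[1])
	}
	return name
}

// languageByInterpreter is a tiny hand-written table used only for this sketch.
var languageByInterpreter = map[string]string{
	"bash":    "Shell",
	"sh":      "Shell",
	"python3": "Python",
}

func main() {
	samples := []string{
		"#!/usr/bin/env python3\nprint('hi')\n",
		"#!/bin/bash\necho hi\n",
		"echo 'no shebang here'\n",
	}
	for _, src := range samples {
		interp := interpreterFromShebang([]byte(src))
		fmt.Printf("interpreter=%q language=%q\n", interp, languageByInterpreter[interp])
	}
}
```

Only the first line is inspected, which is why this kind of strategy is cheap but works only for scripts that carry a shebang.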

By file

The most accurate guess is possible when both the file name and the content are available:

  • GetLanguagesByContent only uses file extension and a set of regexp-based content heuristics.
  • GetLanguages uses the full set of matching strategies and is expected to be most accurate.

Filtering: vendoring, binaries, etc

enry exposes a set of file-level helpers, Is*, to simplify filtering out files that are less interesting for the purpose of source code analysis:

  • IsBinary
  • IsVendor
  • IsConfiguration
  • IsDocumentation
  • IsDotFile
  • IsImage
  • IsTest
  • IsGenerated
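As an illustration of what such filters do, the sketch below combines two common heuristics: treating content with a NUL byte near the start as binary, and treating well-known dependency directories as vendored. The names looksBinary, isVendoredPath and worthAnalyzing, the 8000-byte sniff length and the two directory names are assumptions for this example; enry's own helpers rely on Linguist's much larger rule set.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// sniffLen is an assumed upper bound on how many leading bytes to inspect.
const sniffLen = 8000

// looksBinary reports whether a NUL byte appears in the first sniffLen bytes.
func looksBinary(data []byte) bool {
	if len(data) > sniffLen {
		data = data[:sniffLen]
	}
	return bytes.IndexByte(data, 0) != -1
}

// isVendoredPath reports whether any path segment is a well-known
// dependency directory (toy rule set for this sketch).
func isVendoredPath(p string) bool {
	for _, seg := range strings.Split(p, "/") {
		if seg == "vendor" || seg == "node_modules" {
			return true
		}
	}
	return false
}

// worthAnalyzing keeps files that are neither vendored nor binary.
func worthAnalyzing(p string, data []byte) bool {
	return !isVendoredPath(p) && !looksBinary(data)
}

func main() {
	fmt.Println(worthAnalyzing("cmd/app/main.go", []byte("package main\n")))   // true
	fmt.Println(worthAnalyzing("vendor/x/y.go", []byte("package y\n")))        // false
	fmt.Println(worthAnalyzing("bin/tool", []byte{0x7f, 'E', 'L', 'F', 0x00})) // false
}
```

Checking the path first avoids reading file content for anything that is going to be skipped anyway.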

Language colors and groups

enry exposes functions to get a language's color (for example, to present statistics in graphs) and its language group:

  • GetColor
  • GetLanguageGroup can be used to group similar languages together, e.g. for Less this function will return CSS

Languages

Go

To add enry to a Go module, run:

go get github.com/go-enry/go-enry/v2

The rest of the examples will assume you have either done this or fetched the library into your GOPATH.

// The examples here and below assume you have imported the library.
import "github.com/go-enry/go-enry/v2"

lang, safe := enry.GetLanguageByExtension("foo.go")
fmt.Println(lang, safe)
// result: Go true

// lang and safe are already declared above, so plain assignment is used from here on
lang, safe = enry.GetLanguageByContent("foo.m", []byte("<matlab-code>"))
fmt.Println(lang, safe)
// result: Matlab true

lang, safe = enry.GetLanguageByContent("bar.m", []byte("<objective-c-code>"))
fmt.Println(lang, safe)
// result: Objective-C true

// all strategies together; GetLanguage returns only the language name
lang = enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))
fmt.Println(lang)
// result: C++

Note that the returned boolean value safe is true if there is only one possible language detected.

A plural version of the same API allows getting a list of all possible languages for a given file.

langs := enry.GetLanguages("foo.h", []byte(""))
// result: []string{"C", "C++", "Objective-C"}

// langs is already declared above, so plain assignment is used from here on
langs = enry.GetLanguagesByExtension("foo.asc", []byte(""), nil)
// result: []string{"AGS Script", "AsciiDoc", "Public Key"}

langs = enry.GetLanguagesByFilename("Gemfile", []byte(""), []string{})
// result: []string{"Ruby"}

Java bindings

Generated Java bindings using a C shared library and JNI are available under java.

A library is published on Maven as tech.sourced:enry-java for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Python bindings

Generated Python bindings using a C shared library and cffi are WIP under src-d/enry#154.

A library is going to be published on PyPI as enry for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Rust bindings

Generated Rust bindings using a C static library are available at https://github.com/go-enry/rs-enry.

Divergences from Linguist

The enry library is based on the data from github/linguist version v7.14.0.

When parsing linguist/samples, the following enry results differ from Linguist:

For all the cases above that have an issue number, we plan to update enry to match Linguist behavior.

Benchmarks

Enry's language detection has been compared with Linguist's on linguist/samples.

We got these results:

histogram

The histogram shows the number of files (y-axis) per time interval bucket (x-axis). Most of the files were detected faster by enry.

There are several cases where enry is slower than Linguist because Go's regexp engine is slower than Ruby's, which is based on the oniguruma library, written in C.

See instructions for running enry with oniguruma.

Why Enry?

In the movie My Fair Lady, Professor Henry Higgins is a linguist who at the very beginning of the movie enjoys guessing the origin of people based on their accent.

"Enry Iggins" is how Eliza Doolittle, pronounces the name of the Professor.

Development

To run the tests use:

go test ./...

Setting ENRY_TEST_REPO to the path of an existing checkout of Linguist will avoid cloning it and speed the tests up. Setting ENRY_DEBUG=1 will provide insight into the Bayesian classifier building done by make code-generate.

Sync with github/linguist upstream

enry re-uses parts of the original github/linguist to generate internal data structures. In order to update to the latest release of linguist do:

$ git clone https://github.com/github/linguist.git .linguist
$ cd .linguist; git checkout <release-tag>; cd ..

# put the new release's commit sha in the generator_test.go (to re-generate .gold test fixtures)
# https://github.com/go-enry/go-enry/blob/13d3d66d37a87f23a013246a1b0678c9ee3d524b/internal/code-generator/generator/generator_test.go#L18

$ make code-generate

To stay in sync, enry needs to be updated when a new release of the linguist includes changes to any of the following files:

There is no automation for detecting changes in the linguist project, so the process above has to be done manually from time to time.

When submitting a pull request syncing up to a new release, please make sure it only contains the changes in the generated files (in data subdirectory).

Separating all the necessary "manual" code changes to a different PR that includes some background description and an update to the documentation on "divergences from linguist" is very much appreciated as it simplifies the maintenance (review/release notes/etc).

Misc

Running a benchmark & faster regexp engine

Benchmark

All benchmark scripts are in benchmarks directory.

Dependencies

As the benchmarks depend on Ruby and the Github-Linguist gem, make sure you have:

  • Ruby (e.g. using rbenv) and bundler installed
  • Docker
  • native dependencies installed
  • Build the gem cd .linguist && bundle install && rake build_gem && cd -
  • Install it gem install --no-rdoc --no-ri --local .linguist/github-linguist-*.gem

Quick benchmark

To run the quicker benchmarks, use:

make benchmarks

This reports average times for the primary detection function and strategies over the whole samples set. If you want to see measurements per sample file, use:

make benchmarks-samples

Full benchmark

If you want to reproduce the same benchmarks as reported above:

  • Make sure all dependencies are installed
  • Install gnuplot (in order to plot the histogram)
  • Run ENRY_TEST_REPO="$PWD/.linguist" benchmarks/run.sh (takes ~15h)

It will run the benchmarks for enry and Linguist, parse the output, create csv files and plot the histogram.

Faster regexp engine (optional)

Oniguruma is CRuby's regular expression engine. It is very fast and performs better than the one built into the Go runtime. enry supports swapping between those two engines thanks to the rubex project. The typical overall speedup from using Oniguruma is 1.5-2x. However, it requires CGo and an external shared library. On macOS with Homebrew, install it with:

brew install oniguruma

On Ubuntu, install it with:

sudo apt install libonig-dev

To build enry with Oniguruma regexps use the oniguruma build tag

go get -v -t --tags oniguruma ./...

and then rebuild the project.

License

Apache License, Version 2.0. See LICENSE

Issues
  • Code generator Win support

    Fixes #4

    On Win make code-generate produces unreasonable Bayesian classifier weights from Linguist samples silently, failing only the final classification tests.

    TestPlan:

    • passing tests on Win CI
      go test ./internal/code-generator/... \
       -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
      
    opened by bzz 9
  • Expose IsTest and GetLanguageType methods & Fix test cases for Java Bindings

    Purpose

    [Fixes]:

    1. Correct failing test cases for Java Bindings so that make test command under Java does not fail.
    2. Expose isTest method in java bindings.

    [Features] :

    1. Export new function GetLanguageType at enry package in go & expose the same at Java Bindings.
    opened by UtsavChokshiCNU 5
  • Is there a prebuilt shared library?

    I couldn't find any way in this repo on how to get the shared library so it can be used from other languages.

    So is there a prebuilt one somewhere or some instructions on how to build your own?

    opened by CodeMyst 5
  • Linguist update automation opens multiple PRs

    The Linguist update automation runs once a day, so if the generated PR isn't merged by then, it will open another one!

    https://github.com/go-enry/go-enry/pull/68 #70

    help wanted 
    opened by look 4
  • Expose `LanguageInfo` with all Linguist data

    As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an API for accessing a LanguageInfo struct which is populated with all the data from the Linguist YAML source file. Functions are provided to access the LanguageInfo by name or ID.

    The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc. could in principle be implemented using this structure, which would simplify the code generation. But that would be a big change so I didn't do any of that. Perhaps in the next major version something like that would make sense.

    cc @tclem

    Closes https://github.com/go-enry/go-enry/issues/54

    opened by look 4
  • Python: API to expose highest-level enry.GetLanguage

    This is a blueprint for all other methods, dealing with go slice conversion.

    It still lacks on build automation (and there is no release automation whatsoever), but this is already useful.

    opened by bzz 4
  • data: replace substring package with regex package

    This PR removes the old substring package from @toqueteos (sorry dude) and uses the internal regex package, so that oniguruma regexps are used for all the regular expressions.

    opened by mcuadros 4
  • Use a deterministic branch name for Linguist updates

    Rather than creating the branch for the update PR ahead of time using the date, this changes it to use the short hash of the Linguist commit that was found, and updates the code so that if the branch already exists, it will exit without creating a PR.

    This branch name should be the same between runs of the workflow (unless the Linguist release tag is changed, which warrants another update anyway) and should address the problem of creating one PR a day until the update is merged.

    You can see an example of a PR created by this code here: https://github.com/look/go-enry/pull/8 (note the branch name)

    closes https://github.com/go-enry/go-enry/issues/69

    cc @bzz @lafriks

    opened by look 3
  • GitHub Actions workflow to automatically update Linguist

    This adds a GitHub Actions workflow that performs the steps necessary to update Linguist to the latest release tag and creates a PR.

    You can see an example of such a PR in my fork (it also includes the commits from this PR, but that won't happen in your repo once this is merged).

    I've included a workflow_dispatch trigger with an override for the Linguist tag, so you can test it out. In order to test it, the workflow file needs to exist on master, but once it does you can pick a branch and run the workflow from there. The Linguist tag is optional, but allows you to test updating the generated code when Linguist hasn't changed.

    I also included a schedule to trigger the workflow once a day. That will start running once the workflow is on master, but I haven't been able to test if it works yet. If not, likely it will require a small tweak to add an explicit secret.

    Hopefully this reduces the maintenance overhead of keeping go-enry up-to-date with Linguist releases! 🙇

    Closes #51

    opened by look 3
  • Support tm_scope and other fields from languages.yml

    I'd like to know the tm_scope value for a given language. For example, if Go was detected as the programming language, the value would be source.go. Even better, it would be great for GetLanguage and friends (or some variant) to return a struct with all the information linguist stores with regard to a given language. It would also be fine to do this as secondary step:

    langName := enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))
    lang := enry.LookupLanguage(langName)
    // where lang is a instance of struct like
    type Language struct {
      Name string
      Type string
      Aliases []string
      Extensions []string
      TmScope string
      ... etc
    }
    
    opened by tclem 3
  • Inconsistent with linguist "languages.yml"

    Hi! I ran enry.get_language from the latest Python bindings on the file https://github.com/vim-scripts/gtags.vim/blob/master/plugin/gtags.vim and the output is "Vim script", while languages.yml from github/linguist has "Vim Script" (see https://github.com/github/linguist/blob/master/lib/linguist/languages.yml ). Can you please fix it?

    opened by DimaProskurin 2
  • Backport generated file detection changes

    https://github.com/github/linguist/blob/97bc889ce840208652bf09b45f3b7859de43fe8e/lib/linguist/generated.rb#L329 and https://github.com/github/linguist/blob/97bc889ce840208652bf09b45f3b7859de43fe8e/lib/linguist/generated.rb#L427

    Originally posted by @lafriks in https://github.com/go-enry/go-enry/issues/81#issuecomment-1048834841

    opened by lafriks 0
  • JNAerator throws exceptions in Docker for Apple Silicon

    I've found out that this project uses JNAerator (https://github.com/nativelibs4java/JNAerator), which does not seem to be maintained anymore: its last commit is dated Sep 30, 2015.

    The JNAerator in this project uses JNA version 4.1.0 and this version does not provide for 'linux-aarch64'. (4.2.0 and above does.)

    I am using go-enry in my project and it gives out the following error in Docker for Apple Silicon. java.lang.UnsatisfiedLinkError: Native library (com/sun/jna/linux-aarch64/libjnidispatch.so) not found in resource path

    I hope there's a way to fix this!

    help wanted 
    opened by citygxoxo 1
  • support for incompatible heuristics using oniguruma

    This PR enables support for all the heuristics that were unsupported because of the Go regexp engine.

    • Exposes the original regular expressions to the regex package.
    • The regex package now handles the conversions of the Ruby regular expression to the go-ish version.
    • The heuristics rules now support nil regular expressions since some of the heuristics can't compile using the standard library.

    I added some tests using the Linguist fixtures, and all of the fixtures pass (using oniguruma).

    opened by mcuadros 5
  • Python bindings are memory leaking

    All python wrappers are memory leaking.

    This may happen in several places:

    1. In CFFI - layer (when structures are converted between Python and Go runtimes)
    2. In CGo shared library (when C structures are converted in Go instances)
    3. Somewhere in enry library (nearly impossible, because all wrappers are leaking)

    You can reproduce it using this test script:

    import os
    
    import enry
    import psutil
    
    process = psutil.Process(os.getpid())
    
    content = "import os\nprint('Hello, world!')".encode()
    path = "test.py"
    initial_usage = process.memory_info().rss
    
    for _ in range(10000):
        enry.get_language(path, content)
    
    print(round(process.memory_info().rss / initial_usage * 100, 2))
    

    Here is some info about the memory usage of the get_language function (in % of the initial RAM usage); you can see that it's leaking:

    • 1000 iterations: ~100.5
    • 10000 iterations: ~102.4
    • 100000 iterations: ~111.1

    bug 
    opened by vsmaxim 2
  • synchronized in getLanguage function in Java library

    Hello,

    A lot of thanks for providing such a great project! I'm currently using enry to detect code language in my Java program and it works well,

    I have a question: the getLanguage function is synchronized, which will definitely slow performance in multi-threaded scenarios. Is it possible to remove the lock? Or maybe I can call the private EnryLibrary function with reflection? @mcuadros

    Best Regards, Jianquan. Ye

    enhancement 
    opened by yejianquan 1
Releases(v2.8.2)
  • v2.8.2(Apr 11, 2022)

    What's Changed

    • Update Linguist to v7.20.0 by 🤖 in https://github.com/go-enry/go-enry/pull/124
    • Use a deterministic branch name for Linguist updates by @look fixing #69

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.1...v2.8.2

  • v2.8.1(Apr 11, 2022)

    What's Changed

    • Update Linguist to v7.19.0 by 🤖 in https://github.com/go-enry/go-enry/pull/95
    • A check for non-backtracking subexpressions added to the list of invalid regexes by @lafriks in https://github.com/go-enry/go-enry/pull/118
    • poetry.lock is detected as generated by @silverwind in https://github.com/go-enry/go-enry/pull/112
    • Java Bindings: expose .IsTest and .GetLanguageType methods by @UtsavChokshiCNU in https://github.com/go-enry/go-enry/pull/80

    New Contributors

    • @silverwind made their first contribution in https://github.com/go-enry/go-enry/pull/113
    • @UtsavChokshiCNU made their first contribution in https://github.com/go-enry/go-enry/pull/80

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.8.0...v2.8.1

  • v2.8.0(Nov 17, 2021)

    What's Changed

    • Expose LanguageInfo with all Linguist data by @look in https://github.com/go-enry/go-enry/pull/62
    • GitHub Actions workflow to automatically update Linguist version by @look in https://github.com/go-enry/go-enry/pull/61 #72
    • Test robustness w.r.t upstream language renames by @bzz in https://github.com/go-enry/go-enry/pull/67
    • Update Linguist to v7.17.0 (release notes) by @github-actions in https://github.com/go-enry/go-enry/pull/66

    New Contributors

    • @github-actions 🤖 made their first contribution 🎉 in https://github.com/go-enry/go-enry/pull/66

    Full Changelog: https://github.com/go-enry/go-enry/compare/v2.7.2...v2.8.0

  • v2.7.2(Sep 26, 2021)

    New Features

    • sync with the latest github/linguist v7.16.1 #60
    • improved GetLanguagesByShebang accuracy #56 , #58

    Infra

    • CI runs the latest go 1.16.x, 1.17.x releases #59

    Contributors

    • @lafriks
    • @rykov

    All the changes in v2.7.2 release

  • v2.7.1(Jun 18, 2021)

    New Features

    • sync with the latest github/linguist v7.14.0 #52

    Infra

    • CI runs the latest go 1.15.x, 1.16.x releases #53

    Contributors

    • @look
    • @mcuadros
  • v2.7.0(Apr 24, 2021)

    New Features

    • New GetLanguageID API introduced, to expose stable numerical IDs for all the languages #46
    • Rust bindings are available now at https://github.com/go-enry/rs-enry

    Fixes

    • GetLanguages now behaves exactly like Linguist.detect, resolving a long-standing src-d/enry#207 🎉 #47
    • IsVendor is optimized #44, benchmarked and tested better #45

    Contributors

    • @look
    • @6543
    • @zeripath
    • @vsmaxim
  • v2.6.1(Mar 12, 2021)

  • v2.6.0(Dec 3, 2020)

    New Features

    • sync with the latest github/linguist 7.12.1 #39 (w/o native tokenizer update for -tags flex)
    • new GetLanguagesByXML strategy detecting XML by first 2 lines of file content #40
    • new GetLanguagesByManpage strategy detecting Roff manpages by filename #39
    • Python bindings are auto-generated now and have test (not part of the release yet, see #31, and known to leak memory a little bit, see #36 ) #29

    Fixes

    Contributors

    • @vsmaxim
    • @lafriks
    • @villelaitila
  • v2.5.2(May 29, 2020)

  • v2.5.1(May 29, 2020)

  • v2.5.0(May 29, 2020)

  • v2.4.1(May 6, 2020)

  • v2.4.0(Apr 16, 2020)

  • v2.3.0(Mar 31, 2020)

    New Features

    • sync to the latest github/linguist v7.9.0 #3
    • getting HTML colors for languages falls back on a group default #2
    • Windows support 🎉 for building and running the Go library (w/o oniguruma) #4
    • new CI for all 3 platforms, based on Github Actions
    • this and some further releases will not include a published Java library until #6 is resolved

    Fixes

    • 33 new languages added, 2 reworked (Perl 6 -> Raku, Visual Basic -> VBA, VBScript)
    • 13 content-based heuristics were improved and new disambiguations were added (.v for Coq, V or Verilog; .s for Motorola assembly; .plist for XML, OpenStep; .odin, .p Gnuplot, OpenEdge ABL)
    • BUCK, BUILD, BUILD.bazel and WORKSPACE of Bazel/Pants are all now recognized as Starlark, instead of general Python, as before
    • new well-known filenames recognized: .dircolors as GNU dircolor, .inputrc as readline, .curlrc as cURL config, .npmrc as NPM config, troffrc as Roff, yarn.lock as YAML.
    • vendoring detection improvements: dotnet-install is ignored, as well as .yarn/releases

    Contributors

    • @lafriks
    • @bzz
    • @mcuadros
  • v2.2.0(Mar 19, 2020)

  • v2.1.0(Mar 19, 2020)

    New Features

    • sync to the latest github/linguist v7.5.1
    • a new API call for getting HTML colors for languages enry.GetColor(language) Generated from linguist languages.yml

    Fixes

    • content-based heuristics improved and new disambiguations were added (.vba for Vim script, .sql for TSQL, GraphQL)
    • shebang-based heuristic ignores osascript -l, which can be a non-interpretable language
    • vendoring detection improvements: testdata is ignored as Go fixtures, bulma.css as well
    • 21 new languages added (1 removed: Bro)
      • Altium Designer
      • Cabal Config
      • Dhall
      • EditorConfig
      • HolyC
      • JavaScript+ERB
      • Jsonnet
      • Motorola 68K Assembly
      • ObjectScript
      • Rich Text Format
      • SSH Config
      • Svelte
      • TSQL
      • TSX
      • WebVTT
      • Wollok
      • ZAP
      • ZIL
      • Zeek
      • ZenScript
      • mcfunction

    New contributors

    @lafriks

  • v2.0.0(Mar 19, 2020)

    New features

    • First release with go module support.
    • Import directly from github.com/src-d/enry/v2
    • Optional Flex-based tokenizer, same as Linguist uses. Hidden behind -tags flex, improves content classifier accuracy.

    Fixes

    • Optional oniguruma-based tokenizer now is based on new Oniguruma v6.x and produces consistent with RE2 results on all the samples from Linguist (including non-utf8).

    Full list of issues tracked under v2.0.0 milestone.

  • v1.7.3(Mar 19, 2020)

    New Features

    • CLI application, when used in file mode enry <filename>, now includes information about file vendoring #217
    • CLI application defaults now follow GitHub Linguist #214
      • Only Programming Languages and Markup files are reported
      • -all allows for previous behaviour
      • -prog was removed
      • -mode=bytes is default (instead of files before)

    Fixes

    • -mode=lines/bytes produces actual results
    • unusable enry-java JAR artefact is not published on GH release any more. It is only distributed through Maven.

    New contributors

    @SuhaibMujahid

  • v1.7.2(Mar 19, 2020)

    New Features

    None

    Fixes

    • multiple candidates returned instead of empty slice (e.g for .h) #205
    • github.com/src-d/go-oniguruma is used now #206

    New contributors

    @kuba--

  • v1.7.1(Mar 19, 2020)

  • v1.7.0(Mar 19, 2020)

    New Features

    #189 sync to linguist v7.2.0

    Generation of the heuristics that disambiguate files with the same extension was simplified, which means:

    • quality of the judgements that enry makes about language was improved
    • updates from Linguist upstream will be done more frequently

    Summary of the upstream changes

    6 languages removed:

    • Arduino
    • KiCad Board
    • Matlab
    • PAWN
    • Sublime Text Config
    • XPM

    43 languages added:

    • AngelScript
    • Asymptote
    • Ballerina
    • Cloud Firestore Security Rules
    • CoNLL-U
    • Common Workflow Language
    • DataWeave
    • EML
    • Edje Data Collection
    • F*
    • FIGlet Font
    • Git Attributes
    • Git Config
    • Glyph Bitmap Distribution Format
    • HAProxy
    • HTML+Razor
    • HXML
    • HiveQL
    • Ignore List
    • JSON with Comments
    • Java Properties
    • KiCad Legacy Layout
    • LTspice Symbol
    • MATLAB
    • Modula-3
    • Nearley
    • Nextflow
    • Pawn
    • Pod 6
    • PostCSS
    • Quake
    • RPC
    • Roff Manpage
    • Slice
    • Solidity
    • SugarSS
    • Windows Registry Entries
    • X BitMap
    • X Font Directory Index
    • X PixMap
    • YARA
    • YASnippet
    • Zig

    Known Issues

    Although Languages and Heuristics were synced with upstream, it's not reproducing 100% of linguist yet

    • The missing parts are tracked under #155
    • Current difference is documented in https://github.com/src-d/enry/#divergences-from-linguist
  • v1.6.8(Mar 19, 2020)

  • v1.6.7(Mar 19, 2020)

  • v1.6.6(Mar 19, 2020)

  • v1.6.5(Mar 19, 2020)

  • v1.6.4(Mar 19, 2020)

  • v1.6.3(Mar 19, 2020)

  • v1.6.2(Mar 19, 2020)

    • Use a precompiled package for osxcross for CI (#133).
    • Fix crash by checking for empty filenames (#134). Solves #129.
    • java: bump version to 1.6.2 (#135).
  • v1.6.1(Mar 19, 2020)

  • v1.6.0(Mar 19, 2020)

    • Use rubex for faster regular expressions (#113).
    • Add the external test linguist dir from env var (#114).
    • make IsDotFile do not treat '.' as true (#127).
    • legal: Add DCO (#128).
    • Fixing getHeaderAndFooter issues (#130). Solves #129.