SQL interface to git repositories, written in Go. https://docs.sourced.tech/gitbase

Overview

gitbase is a SQL database interface to Git repositories.

This project is now part of source{d} Community Edition, which provides the simplest way to get started with a single command. Visit https://docs.sourced.tech/community-edition for more information.

It can be used to perform SQL queries about the Git history and about the Universal AST of the code itself. gitbase is being built to work on top of any number of git repositories.

gitbase implements the MySQL wire protocol, so it can be accessed using any MySQL client or library from any language.
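As a minimal sketch of what connecting from a client library involves, the Go snippet below builds a data source name in the `user[:password]@tcp(host:port)/dbname` layout used by the go-sql-driver/mysql driver. The user `root`, the empty password, the address `127.0.0.1:3306` and the database name `gitbase` are assumptions borrowed from the commands quoted in the issues further down this page, not documented defaults, and the helper name `gitbaseDSN` is hypothetical.

```go
package main

import "fmt"

// gitbaseDSN is a hypothetical helper that formats a DSN as
// user[:password]@tcp(host:port)/dbname, the layout expected by
// the go-sql-driver/mysql driver.
func gitbaseDSN(user, password, host string, port int, db string) string {
	cred := user
	if password != "" {
		cred += ":" + password
	}
	return fmt.Sprintf("%s@tcp(%s:%d)/%s", cred, host, port, db)
}

func main() {
	// Assumed values: root user, no password, local server on port 3306.
	fmt.Println(gitbaseDSN("root", "", "127.0.0.1", 3306, "gitbase"))
	// With the driver imported, the string would be passed to
	// sql.Open("mysql", dsn) from the standard database/sql package.
}
```

The snippet stays within the standard library so it runs without dependencies; an actual connection additionally needs the driver package imported for its side effects.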

src-d/go-mysql-server is the SQL engine implementation used by gitbase.

Status

The project is currently in alpha stage, meaning it's still lacking performance in a number of cases but we are working hard on getting a performant system able to process thousands of repositories in a single node. Stay tuned!

Examples

You can see some query examples in the gitbase documentation.

Motivation and scope

gitbase was born to ease the analysis of git repositories and their source code.

Also, by making it MySQL compatible, we provide maximum compatibility with languages and existing tools.

It comes as a single self-contained binary and it can be used as a standalone service. The service is able to process local repositories and integrates with existing tools and frameworks to simplify source code analysis on a large scale. The integration with Apache Spark is planned and is currently under active development.

Further reading

From here, you can go directly to getting started.

License

Apache License Version 2.0, see LICENSE

Issues
  • In memory caching lead to crash

    Issue

    In the context of topic modeling experiments, @m09 and I tried to use Gitbase to parse all blobs in tagged references of a given repository, in order to extract all identifiers, comments and literals. However, we have not been able to do so with Gitbase, and have had to switch to doing the parsing client side.

    The reason for that is that, when querying Gitbase, we see the following behavior:

    1. An increase in memory usage.
    2. No decrease after time goes by.
    3. When all available memory is consumed, an increase in block I/O and a quasi stagnation of the memory consumed by Gitbase at 99.999 ... %, indicating heavy use of Swap memory.
    4. Server crash if the query goes on for too long past that point.

    We still see the same behavior when retrieving only the blob contents from Gitbase; however, the memory consumed is not an issue then, as it is much less than when parsing UASTs. We inferred that some caching was going on, and after discussing the issue on the dev-processing channel, we tried to disable the caching, but it changed nothing. Javi told us that the caching we had disabled was the go-git cache, so it is probably something else.

    What we don't understand is why we cannot get rid of the behavior, i.e. why once a blob has been parsed and returned client side it seemingly remains in memory.

    Steps to reproduce

    Launch gitbase and babelfish containers:

    docker run -d --rm --name bblfshd --privileged -p 9432:9432 -m 4g bblfsh/bblfshd:v2.14.0-drivers
    docker run -d --rm --name gitbase -p 3306:3306 --link bblfshd:bblfshd -e BBLFSH_ENDPOINT=bblfshd:9432 -m 2g -v /path/to/repos:/opt/repos srcd/gitbase:latest
    

    With /path/to/repos pointing to a repository, for instance pytorch. Then open two more terminals to monitor what's happening with docker stats and to run queries such as this one for pytorch, for example with the MySQL client:

       SELECT
            cf.file_path,
            cf.blob_hash,
            LANGUAGE(cf.file_path) as lang,
            uast_extract(uast(f.blob_content, LANGUAGE(cf.file_path), '//uast:String'), "Value")
        FROM repositories r
            NATURAL JOIN refs rf
            NATURAL JOIN commit_files cf
            NATURAL JOIN files f
        WHERE r.repository_id = 'pytorch'
            AND is_tag(rf.ref_name)
            AND lang ='Python'
    

    You should see the memory usage of the gitbase container increase sharply until hitting 2 GB, then a heavy increase in BLOCK I/O, and finally the container will crash.

    question blocked research 
    opened by r0mainK 20
  • Empty results on a seemingly correct query

    My goal: only see files from the ~equivalent~ of HEAD of PGA siva files (PGA's head is fuzzy, I know).

    Download the same dataset:

    pga list -l java -f json | head -n 100 | jq -r '.sivaFilenames[]' | pga get -i -o repositories
    

    Index creation is very fast (there are about 125k rows in refs on my 185 siva repos):

    CREATE INDEX refs_name_substr ON refs USING pilosalib (SUBSTRING(refs.ref_name,1,15));
    

    My query (runs for 35 seconds and then returns empty results):

    SELECT 
        files.repository_id,
        files.file_path
    FROM files
    NATURAL JOIN commit_files
    NATURAL JOIN commits
    NATURAL JOIN refs
    WHERE 
        SUBSTRING(refs.ref_name,1,15) = 'refs/heads/HEAD';
    

    See if results are returned on my WHERE clause:

     SELECT * FROM refs WHERE SUBSTRING(refs.ref_name,1,15) = 'refs/heads/HEAD';
    +---------------+----------+-------------+
    | repository_id | ref_name | commit_hash |
    +---------------+----------+-------------+
    | /home/mthek/projects/demo-vt/repositories/siva/latest/03a0faf87e411ee894be474ac0ebd8e48652df69.siva | refs/heads/HEAD/01612921-7835-16ee-b6a3-e3381810c049 | 7824ae7845d63d5dfae4165f75b14f71d476248f |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/03a0faf87e411ee894be474ac0ebd8e48652df69.siva | refs/heads/HEAD/016129f9-edb3-15eb-2d16-e7dac4cd41f6 | 7824ae7845d63d5dfae4165f75b14f71d476248f |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0573f918d9b0822e9ce30b8a8f8a92bbab17300f.siva | refs/heads/HEAD/01612921-765b-ff7c-a5f3-2e12701794fc | 71afe993bd14ee3232caf92b64c05b8514235890 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0573f918d9b0822e9ce30b8a8f8a92bbab17300f.siva | refs/heads/HEAD/016129f9-ebd0-630f-2ae8-8ab9d76198ca | 71afe993bd14ee3232caf92b64c05b8514235890 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/09eccb718faf3ac3d2ac08eeb3deb3d5a403d5fa.siva | refs/heads/HEAD/01612921-7787-3bdf-bbce-e4e525a410ab | c3c7b957295cb8b7d61acf53060bddff4a317505 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/09eccb718faf3ac3d2ac08eeb3deb3d5a403d5fa.siva | refs/heads/HEAD/016129f9-ed12-034f-948f-d7c3a78a727e | c3c7b957295cb8b7d61acf53060bddff4a317505 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/09eccb718faf3ac3d2ac08eeb3deb3d5a403d5fa.siva | refs/heads/HEAD/016129fa-1dd1-944c-44d4-344b03342aad | 0119b6f175a57d57501e4e94ba5f9eafe32a9359 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0a78a10ff25754b510a2423fb40a00cb02f1a44d.siva | refs/heads/HEAD/01612921-76b0-8517-7dd5-1a6745a234e0 | 0a979d145683b62eff62796acbc21ac8766088a0 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0a78a10ff25754b510a2423fb40a00cb02f1a44d.siva | refs/heads/HEAD/016129f9-ec2f-0c49-2bed-25f216edf2c3 | 0a979d145683b62eff62796acbc21ac8766088a0 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0a78a10ff25754b510a2423fb40a00cb02f1a44d.siva | refs/heads/HEAD/016129fb-9cf2-2fc6-1654-045352989fd1 | 5c1da606814d97eaba28d4b7206d126bc23627b3 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0a78a10ff25754b510a2423fb40a00cb02f1a44d.siva | refs/heads/HEAD/016129fd-c3a2-140d-41ab-fca5cc71f6cf | 0a979d145683b62eff62796acbc21ac8766088a0 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0a78a10ff25754b510a2423fb40a00cb02f1a44d.siva | refs/heads/HEAD/01612a00-10e8-3ff7-896f-a970071eecbc | 5c1da606814d97eaba28d4b7206d126bc23627b3 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0be86af052a0368c65020a248d5d13efd1ec74f9.siva | refs/heads/HEAD/016129f9-eb5d-9c72-652b-9d6fa13ad169 | e73dab44213ffa76b3d6a853aecb109929c3e2b5 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0be86af052a0368c65020a248d5d13efd1ec74f9.siva | refs/heads/HEAD/016129fe-9027-203d-aeb5-116fcb9bbf95 | e73dab44213ffa76b3d6a853aecb109929c3e2b5 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0c3197aa444d192843ccb62b8eb80b04eddd2322.siva | refs/heads/HEAD/016129fb-03d5-c928-c444-da8daa361e21 | a6c0d95184c8985423331fe916edee59378f61fe |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0e06624a50bd6d3c46c611c24c8a419e995ad81b.siva | refs/heads/HEAD/01612921-7654-a83e-9a25-bbfb30d9d9ef | 21df6c124e90c6312301bf4fdd61ae98c5486109 |
    | /home/mthek/projects/demo-vt/repositories/siva/latest/0e06624a50bd6d3c46c611c24c8a419e995ad81b.siva | refs/heads/HEAD/016129f9-ebc8-ec4b-12c6-ef80c5d902c9 | 21df6c124e90c6312301bf4fdd61ae98c5486109 |
    +---------------+----------+-------------+
    17 rows in set (0.04 sec)
    

    (number of rows seems low but that is probably because I need to check another pattern for head or master, but that isn't relevant to this bug)

    Extra:

    I went and checked this query on our staging environment with gitbase-playground, and it works perfectly:

    SELECT 
        files.repository_id,
        files.file_path
    FROM files
    NATURAL JOIN commit_files
    NATURAL JOIN commits
    NATURAL JOIN refs
    WHERE refs.ref_name = 'HEAD'
    
    bug 
    opened by eiso 17
  • panic: runtime error: invalid memory address or nil pointer dereference

    gitbase v0.19.0-beta4

    query:

    SELECT 
    r.repository_id, SUM(ARRAY_LENGTH(SPLIT(b.blob_content, '\n'))) as lines_count
    FROM refs r
    NATURAL JOIN commit_blobs ct
    NATURAL JOIN blobs b
    WHERE r.ref_name = 'HEAD'
    GROUP BY r.repository_id
    ORDER BY lines_count DESC
    

    Traceback:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x87fe52]
    
    goroutine 6076 [running]:
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/plumbing/cache.(*ObjectLRU).Put(0xc000aa4ff0, 0x14440c0, 0xc01391da40)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/plumbing/cache/object_lru.go:64 +0x352
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).getFromUnpacked(0xc017bec868, 0xa9112fc6650d62b5, 0x83f2209191640cdc, 0xa7d9180f, 0x14440c0, 0xc01391da40, 0x0, 0x0)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem/object.go:344 +0x39a
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).EncodedObject(0xc017bec868, 0x112fc6650d62b503, 0xf2209191640cdca9, 0xa7d9180f83, 0x0, 0x0, 0x0, 0x0)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem/object.go:254 +0x3eb
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/plumbing/object.GetBlob(0x1441f00, 0xc017bec850, 0xa9112fc6650d62b5, 0x83f2209191640cdc, 0xa7d9180f, 0x650d62b5000081a4, 0x91640cdca9112fc6, 0xa7d9180f83f22091)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/plumbing/object/blob.go:23 +0x4e
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/plumbing/object.(*FileIter).Next(0xc01e6b4b40, 0x4211e8, 0xc00498dd10, 0x7efccf3a1cb3)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-git.v4/plumbing/object/file.go:100 +0x136
    github.com/src-d/gitbase.(*squashCommitBlobsIter).Advance(0xc017becbd0, 0xc, 0xc00113fb01)
    	/go/src/github.com/src-d/gitbase/squash_iterator.go:2754 +0x7c
    github.com/src-d/gitbase.(*squashCommitBlobBlobsIter).Advance(0xc02245c5a0, 0x5c6fc722, 0x27c2150a)
    	/go/src/github.com/src-d/gitbase/squash_iterator.go:3051 +0x49
    github.com/src-d/gitbase.(*chainableRowIter).Next(0xc00b3d0370, 0x5777dccfb2, 0x2118820, 0x20, 0x25, 0xc00113fc80)
    	/go/src/github.com/src-d/gitbase/squash.go:150 +0x37
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql.(*spanIter).Next(0xc02245c5f0, 0xc000063290, 0xc000062000, 0xc00113fcb0, 0x414d10, 0xc0044c4b90)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/session.go:346 +0x5d
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan.(*trackedRowIter).Next(0xc0227dad20, 0x50, 0x44b4f8, 0x52307915bd55c, 0x27c2138b, 0x27c2138b0113fd68)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan/process.go:145 +0x37
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan.(*FilterIter).Next(0xc00bb17e00, 0x5777dcce26, 0x2118820, 0x3, 0x3, 0xc00113feb2)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan/filter.go:105 +0x38
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql.(*spanIter).Next(0xc02245c780, 0xc00113fef8, 0x44b4f8, 0x52307915bd4b6, 0xc027c212db, 0x27c212db0113fe20)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/session.go:346 +0x5d
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan.(*iter).Next(0xc0227dad40, 0x5777dccd80, 0x2118820, 0x4dd96c, 0xc02226c440, 0xc026896120)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan/project.go:129 +0x38
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql.(*spanIter).Next(0xc02245c7d0, 0xc00113feac, 0x3, 0x2, 0x0, 0x0)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/session.go:346 +0x5d
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan.(*exchangeRowIter).iterPartition(0xc01b52a5a0, 0x142d620, 0xc00b3d01b0)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan/exchange.go:245 +0x251
    github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan.(*exchangeRowIter).start.func1(0xc01b52a5a0, 0xc0217396b0, 0x142d620, 0xc00b3d01b0)
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan/exchange.go:170 +0x3f
    created by github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan.(*exchangeRowIter).start
    	/go/src/github.com/src-d/gitbase/vendor/gopkg.in/src-d/go-mysql-server.v0/sql/plan/exchange.go:169 +0x10d
    
    bug 
    opened by smacker 16
  • Increase go-git cache size

    Right now there is no way of changing the default cache size in go-git, and its size is too small (96 MiB). I've been running tests with a larger value, and performance improved a lot.

    Repositories: linux (2013), numpy, tensorflow

    Number of rows: 395709

    Query: SELECT count(*) FROM commits c NATURAL JOIN ref_commits r WHERE r.ref_name = 'HEAD';

    Default cache: 1 row in set (54 min 22.17 sec)

    Cache size * 8: 1 row in set (20 min 43.69 sec)
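
    To put the two timings in proportion, here is a small sketch that converts them to durations and divides one by the other; the helper name `speedup` is hypothetical and the inputs are the two timings reported above. It works out to roughly 2.6 times faster with the larger cache.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // speedup returns how many times faster the second run was than the first.
    func speedup(before, after string) float64 {
    	b, err := time.ParseDuration(before)
    	if err != nil {
    		panic(err)
    	}
    	a, err := time.ParseDuration(after)
    	if err != nil {
    		panic(err)
    	}
    	return b.Seconds() / a.Seconds()
    }

    func main() {
    	// Timings copied from the default-cache and 8x-cache runs above.
    	fmt.Printf("%.2fx\n", speedup("54m22.17s", "20m43.69s"))
    }
    ```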

    Memory consumption is also not too big. gitbase used 1.3 GiB in this query.

    We should add an option to go-git Open to select cache size.

    proposal performance 
    opened by jfontan 16
  • Parsing C# doesn't work

    Run this query on any C# repository:

    SELECT UAST(f.blob_content, LANGUAGE(f.file_path, f.blob_content)) AS uast
    FROM refs AS r
    NATURAL JOIN commit_files
    NATURAL JOIN files AS f
    WHERE r.ref_name = 'HEAD' AND f.file_path REGEXP('.*.cs')
    LIMIT 5
    

    it will return empty uasts.
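
    A side note on the file filter in the query, assuming REGEXP follows Go regexp semantics with unanchored matching (an assumption, not something confirmed in this report): in the pattern '.*.cs' the dot before cs matches any character, so paths that are not C# files can also pass. The sketch below compares it with a stricter, anchored pattern; the sample paths are hypothetical. This does not explain the empty UASTs, it only shows that the filter may select more files than intended.

    ```go
    package main

    import (
    	"fmt"
    	"regexp"
    )

    // matches reports whether pattern matches anywhere in path,
    // panicking on an invalid pattern to keep the sketch short.
    func matches(pattern, path string) bool {
    	return regexp.MustCompile(pattern).MatchString(path)
    }

    func main() {
    	// Hypothetical file paths used only for illustration.
    	paths := []string{"src/Program.cs", "web/styles.css", "docs"}
    	for _, p := range paths {
    		// ".*.cs" is the pattern from the query; "\\.cs$" is a stricter alternative.
    		fmt.Println(p, matches(".*.cs", p), matches("\\.cs$", p))
    	}
    }
    ```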

    bug 3rd-party 
    opened by smacker 15
  • Couple of question / issues on the standalone installation of gitbase

    I have done a standalone installation of the gitbase server. I started the server, providing the git directory path. I then launch the mysql client using the following command:

    mysql -q -u root -h 127.0.0.1

    I get the mysql prompt. When I execute the following query:

    mysql> select * from repositories;
    +---------------+
    | repository_id |
    +---------------+
    +---------------+

    I get an empty result, and I am not sure how to troubleshoot this issue, as there is not much documentation on how to stop / purge the gitbase server.

    question 
    opened by deepakorantak 15
  • Make mysqldump work with gitbase

    Right now, if you try to run mysqldump, you get the following error:

    mysqldump --all-databases --port=3306 --host=localhost --protocol=tcp --user=root
    
    mysqldump: Couldn't execute '/*!40100 SET @@SQL_MODE='' */': unknown error: syntax error at position 30 (1105)
    
    feature 
    opened by ajnavarro 15
  • Negation Expression on indexed column not working correctly

    Executing the following query:

    mysql> select count(*) from commits where commit_author_email='[email protected]' group by repository_id;
    +----------+
    | COUNT(*) |
    +----------+
    |     1213 |
    +----------+
    1 row in set (0,37 sec)
    

    All appears to be good, but if we just want the count of commits that don't have that commit author email:

    mysql> select count(*) from commits where commit_author_email!='[email protected]' group by repository_id;
    +----------+
    | COUNT(*) |
    +----------+
    |     6569 |
    +----------+
    1 row in set (1,00 sec)
    
    

    The result appears to be incorrect. The complete row count, from the indexing log:

    DEBU[2114] finished pilosa indexing                      duration=34m42.734502461s id=commits_author_email_idx mapping=30.782425411s pilosa=7.745289374s rows=2567829
    

    So I suppose the second query should return 2567829-1213 = 2566616

    bug 
    opened by ajnavarro 14
  • Low perf on NATURAL JOINs

    I have these two requests which should perform basically the same:

    SELECT f.repository_id, COUNT(*) as n
    FROM   files AS f
           JOIN commit_files cf ON
                f.repository_id=cf.repository_id AND
                f.file_path=cf.file_path AND
                f.blob_hash=cf.blob_hash AND
                f.tree_hash=cf.tree_hash
           JOIN refs ON
                cf.repository_id = refs.repository_id AND
                cf.commit_hash = refs.commit_hash
    WHERE  ref_name = 'HEAD'
    GROUP BY f.repository_id
    ORDER BY n DESC
    

    and its NATURAL JOIN equivalent

    SELECT f.repository_id, COUNT(*) as n
    FROM   files AS f
           NATURAL JOIN commit_files cf
           NATURAL JOIN refs
    WHERE  ref_name = 'HEAD'
    GROUP BY f.repository_id
    ORDER BY n DESC
    

    Unfortunately, while the first one finishes after a couple of seconds, the second one takes double that. I analyzed their EXPLAIN output and saw there's a tiny difference and wonder whether this could be the culprit.

    For the first JOIN ON version, the plan is:

    Sort(n DESC)
     └─ Project(files.repository_id, COUNT(*) as n)
         └─ GroupBy
             ├─ Aggregate(files.repository_id, COUNT(*))
             ├─ Grouping(files.repository_id)
             └─ Exchange(parallelism=96)
                 └─ SquashedTable(refs, commit_files, files)
                     ├─ Columns
                     │   ├─ Column(repository_id, TEXT, nullable=false)
                     │   ├─ Column(file_path, TEXT, nullable=false)
                     │   ├─ Column(blob_hash, TEXT, nullable=false)
                     │   ├─ Column(tree_hash, TEXT, nullable=false)
                     │   ├─ Column(tree_entry_mode, TEXT, nullable=false)
                     │   ├─ Column(blob_content, BLOB, nullable=false)
                     │   ├─ Column(blob_size, INT64, nullable=false)
                     │   ├─ Column(repository_id, TEXT, nullable=false)
                     │   ├─ Column(commit_hash, TEXT, nullable=false)
                     │   ├─ Column(file_path, TEXT, nullable=false)
                     │   ├─ Column(blob_hash, TEXT, nullable=false)
                     │   ├─ Column(tree_hash, TEXT, nullable=false)
                     │   ├─ Column(repository_id, TEXT, nullable=false)
                     │   ├─ Column(ref_name, TEXT, nullable=false)
                     │   └─ Column(commit_hash, TEXT, nullable=false)
                     └─ Filters
                         ├─ commit_files.repository_id = refs.repository_id
                         ├─ commit_files.commit_hash = refs.commit_hash
                         ├─ files.repository_id = commit_files.repository_id
                         ├─ files.file_path = commit_files.file_path
                         ├─ files.blob_hash = commit_files.blob_hash
                         ├─ files.tree_hash = commit_files.tree_hash
                         └─ refs.ref_name = "HEAD"
    

    While for the one with NATURAL JOIN:

    Sort(n DESC)
     └─ Project(files.repository_id, COUNT(*) as n)
         └─ GroupBy
             ├─ Aggregate(files.repository_id, COUNT(*))
             ├─ Grouping(files.repository_id)
             └─ Exchange(parallelism=96)
                 └─ Project(files.repository_id, commit_files.commit_hash, files.file_path, files.blob_hash, files.tree_hash, files.tree_entry_mode, files.blob_content, files.blob_size, refs.ref_name)
                     └─ Filter(files.repository_id = refs.repository_id)
                         └─ SquashedTable(refs, commit_files, files)
                             ├─ Columns
                             │   ├─ Column(repository_id, TEXT, nullable=false)
                             │   ├─ Column(file_path, TEXT, nullable=false)
                             │   ├─ Column(blob_hash, TEXT, nullable=false)
                             │   ├─ Column(tree_hash, TEXT, nullable=false)
                             │   ├─ Column(tree_entry_mode, TEXT, nullable=false)
                             │   ├─ Column(blob_content, BLOB, nullable=false)
                             │   ├─ Column(blob_size, INT64, nullable=false)
                             │   ├─ Column(repository_id, TEXT, nullable=false)
                             │   ├─ Column(commit_hash, TEXT, nullable=false)
                             │   ├─ Column(file_path, TEXT, nullable=false)
                             │   ├─ Column(blob_hash, TEXT, nullable=false)
                             │   ├─ Column(tree_hash, TEXT, nullable=false)
                             │   ├─ Column(repository_id, TEXT, nullable=false)
                             │   ├─ Column(ref_name, TEXT, nullable=false)
                             │   └─ Column(commit_hash, TEXT, nullable=false)
                             └─ Filters
                                 ├─ commit_files.commit_hash = refs.commit_hash
                                 ├─ files.repository_id = commit_files.repository_id
                                 ├─ files.file_path = commit_files.file_path
                                 ├─ files.blob_hash = commit_files.blob_hash
                                 ├─ files.tree_hash = commit_files.tree_hash
                                 └─ refs.ref_name = "HEAD"
    

    Is it possible that the extra Project and Filter right above the SquashedTable can cause such a change in performance?

    question performance 
    opened by campoy 13
  • surprising performance issue

    I just ran this query on top of github.com/golang/go:

      SELECT
      	LANGUAGE(t.tree_entry_name, b.blob_content) as lang,
    	t.tree_entry_name as name,
           b.blob_content as code
      FROM refs r 
           JOIN commits c ON r.commit_hash = c.commit_hash
           JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
           JOIN tree_entries t ON ct.tree_hash = t.tree_hash
           JOIN blobs b ON t.blob_hash = b.blob_hash
    

    This finishes in 0.65s :tada:

    Unfortunately this other request takes forever:

      SELECT
    	t.tree_entry_name as name,
            b.blob_content as code
      FROM refs r 
           JOIN commits c ON r.commit_hash = c.commit_hash
           JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
           JOIN tree_entries t ON ct.tree_hash = t.tree_hash
           JOIN blobs b ON t.blob_hash = b.blob_hash
       WHERE LANGUAGE(t.tree_entry_name, b.blob_content) = 'go'
    

    Trying to see whether I could find a workaround I wrote this second query:

    SELECT name, code
    FROM 
    (
      SELECT
    	LANGUAGE(t.tree_entry_name, b.blob_content) = 'go' as lang,
    	t.tree_entry_name as name,    
            b.blob_content as code
      FROM refs r 
           JOIN commits c ON r.commit_hash = c.commit_hash
           JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
           JOIN tree_entries t ON ct.tree_hash = t.tree_hash
           JOIN blobs b ON t.blob_hash = b.blob_hash
    ) as blobs
    WHERE lang = 'go'
    

    Both of these requests take too long for me to wait.

    performance 
    opened by campoy 13
  • Gitbase doesn't work on windows with mounted directory for indexes

    Gitbase 0.19.0.

    Screenshots because it's hard to copy-paste from a remote Windows console.

    Client: Screenshot 2019-03-22 at 16 18 34

    Server: Screenshot 2019-03-22 at 16 17 19 Screenshot 2019-03-22 at 16 18 02

    The error:

    unable to save the index open file: open ...: invalid argument

    But the file is actually created on host file system: Screenshot 2019-03-22 at 16 33 19

    I tried to read the source code and see where the error can come from. I found this line: https://github.com/pilosa/pilosa/blob/f2994736585a8aafc2f2c47c3698b7acd3b95373/fragment.go#L199 which looks like the right place.

    I tried to reproduce it by creating a simple script and running it inside docker with a mounted directory:

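    package main
    
    import (
    	"fmt"
    	"os"
    )
    
    func main() {
    	// Path on the mounted directory, opened the way the linked Pilosa line appears to.
    	asPilosa := "/mounted/asPilosa"
    
    	f, err := os.OpenFile(asPilosa, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0666)
    	if err != nil {
    		fmt.Printf("open file as pilosa: %s\n", err)
    		return
    	}
    	// Close the handle so the reproduction does not leak the file descriptor.
    	defer f.Close()
    }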
    

    but no luck. It creates the file without error.

    It's super inconvenient to debug inside a remote desktop, so I didn't go further by rebuilding gitbase with extra debug messages.

    bug blocked 3rd-party 
    opened by smacker 12
  • The link of community-edition has broken

    The community-edition link in the README is broken and I can't reach it. Has something gone wrong and the documentation not been updated, or is it my unstable network?

    opened by mistricky 0
  • Java JDBC Connection Error: unknown error: expecting "EOF" but got 'V' instead

    Hi, I start gitbase with the command below:

    docker run -itd --name git_base --env GITBASE_PASSWORD=root -p 3344:3306 -v /Users/code/test:/opt/repos srcd/gitbase:latest

    and connect to it from Java over JDBC:

    Connection connection=JDBCUtils.getConnection("jdbc:mysql://127.0.0.1:3344/gitbase","root","root");

    but it prints this error:

    Exception in thread "main" java.sql.SQLException: unknown error: expecting "EOF" but got 'V' instead
    	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
    	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
    	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
    	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
    	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
    	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2526)
    	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
    	at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1446)
    	at com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3828)
    	at com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3268)
    	at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2278)
    	at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2064)
    	at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:790)
    	at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:44)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    	at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
    	at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:395)

    Is there any way to solve it? Thanks.

    opened by xlw712 0
  • Missing function in Gitbase DB (MariaDB)

    Hey,

    when I try to run the following example in the db

    SELECT repository_id, file_path,
           JSON_UNQUOTE(JSON_EXTRACT(bl, "$.linenum")),
           JSON_UNQUOTE(JSON_EXTRACT(bl, "$.author")),
           JSON_UNQUOTE(JSON_EXTRACT(bl, "$.text"))
    FROM   (SELECT repository_id, file_path,
                   EXPLODE(BLAME(repository_id, commit_hash, file_path)) AS bl
            FROM   ref_commits
                   NATURAL JOIN blobs
                   NATURAL JOIN commit_files
            WHERE  ref_name = 'HEAD'
                   AND NOT IS_BINARY(blob_content)
            ) as p
    WHERE  JSON_EXTRACT(bl, "$.text") LIKE '%// TODO%';
    

    I get the following error

    ERROR 1105 (HY000): unknown error: A function: 'blame' not found.
    

    I'm new to source{d} and using the community edition. Could you guys point me in the right direction? For some reason SHOW FUNCTION STATUS isn't working either, so I'm having problems debugging this.

    opened by ml-netent 4
  • Natural join seems to eliminate rows which it shouldn't

    MySQL [gitbase]> select blob_hash, repository_id from blobs natural join repositories where blob_hash in ('93ec5b4525363844ddb1981adf1586ebddbc21c1', 'aad34590345310fe813fd1d9eff868afc4cea10c', 'ed82eb69daf806e521840f4320ea80d4fe0af435');
    +------------------------------------------+-------------------------------------+
    | blob_hash                                | repository_id                       |
    +------------------------------------------+-------------------------------------+
    | aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/javascript-driver |
    | ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/enry               |
    | aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/python-driver     |
    | 93ec5b4525363844ddb1981adf1586ebddbc21c1 | github.com/src-d/go-mysql-server    |
    | aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/ruby-driver       |
    | ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/gitbase            |
    +------------------------------------------+-------------------------------------+
    6 rows in set (14.90 sec)
    
    MySQL [gitbase]> select blob_hash, repository_id from blobs where blob_hash in ('93ec5b4525363844ddb1981adf1586ebddbc21c1', 'aad34590345310fe813fd1d9eff868afc4cea10c', 'ed82eb69daf806e521840f4320ea80d4fe0af435');
    +------------------------------------------+-------------------------------------+
    | blob_hash                                | repository_id                       |
    +------------------------------------------+-------------------------------------+
    | aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/python-driver     |
    | aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/javascript-driver |
    | ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/enry               |
    | aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/ruby-driver       |
    | 93ec5b4525363844ddb1981adf1586ebddbc21c1 | github.com/src-d/gitbase            |
    | ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/gitbase            |
    | 93ec5b4525363844ddb1981adf1586ebddbc21c1 | github.com/src-d/go-mysql-server    |
    | ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/go-mysql-server    |
    +------------------------------------------+-------------------------------------+
    8 rows in set (0.13 sec)
    

    Also note that removing the natural join makes the query much faster. My understanding was that we normally want to join with repositories to benefit from some specific optimizations (although I'm guessing that filtering on blob_hash makes those optimizations moot).
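
    For comparison, here is a minimal sketch of the row counts one would expect under standard SQL semantics. It uses Python's built-in sqlite3 module rather than gitbase, and the tiny tables and their contents are hypothetical. It assumes, as in the queries above, that repository_id is the only column the two tables share and that every repository_id in blobs also appears in repositories. Under those assumptions the natural join should return exactly the rows the plain query returns.

```python
import sqlite3

# Hypothetical miniature of the two tables: repository_id is the only shared column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repositories (repository_id TEXT PRIMARY KEY);
    CREATE TABLE blobs (repository_id TEXT, blob_hash TEXT);
    INSERT INTO repositories VALUES ('repo-a'), ('repo-b'), ('repo-c');
    INSERT INTO blobs VALUES
        ('repo-a', 'hash-1'),
        ('repo-b', 'hash-1'),
        ('repo-b', 'hash-2'),
        ('repo-c', 'hash-2'),
        ('repo-c', 'hash-3');
""")

FILTER = "WHERE blob_hash IN ('hash-1', 'hash-2', 'hash-3')"

# Plain query, no join.
plain = conn.execute(
    f"SELECT blob_hash, repository_id FROM blobs {FILTER}"
).fetchall()

# Same query with the natural join on the shared repository_id column.
joined = conn.execute(
    f"SELECT blob_hash, repository_id FROM blobs NATURAL JOIN repositories {FILTER}"
).fetchall()

# Row order is not guaranteed, so compare the sorted results.
same_rows = sorted(plain) == sorted(joined)
print(len(plain), len(joined), same_rows)
```

    In the gitbase session above the two counts differ (6 rows against 8), which is the discrepancy being reported.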

    bug blocked 
    opened by alexpdp7 6
  • Schema introspection (SHOW FULL COLUMNS ...) became very slow

    Schema introspection (SHOW FULL COLUMNS ...) became very slow

    In gitbase 0.20 schema introspection is fast and full.

    The MySQL Connector/J JDBC metadata call that gets all columns for all tables at once, metaData.getColumns("gitbase", "", "%", "%"), is converted into calls like the following for each table: SHOW FULL COLUMNS FROM `commit_trees` FROM `gitbase` LIKE '%'

    In 0.23 and 0.24-rc these queries are very slow (several minutes), and in 0.23 they even fail completely for some tables (0.24 seems to fix that).

    This prevents using gitbase in DB tools like JetBrains DataGrip.
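
    To make the fan-out concrete, here is a small sketch that builds one statement per table in the shape quoted above. The helper name show_full_columns_statements is hypothetical, and the table list contains only names that appear on this page. This is not Connector/J's actual code, only an illustration of why one metadata call turns into many queries.

```python
def show_full_columns_statements(schema, tables, pattern="%"):
    """Build one SHOW FULL COLUMNS statement per table (hypothetical helper)."""
    return [
        f"SHOW FULL COLUMNS FROM `{table}` FROM `{schema}` LIKE '{pattern}'"
        for table in tables
    ]


# Table names taken from this page; a real gitbase server exposes more tables.
statements = show_full_columns_statements(
    "gitbase", ["commit_trees", "blobs", "repositories"]
)
for statement in statements:
    print(statement)
```

    Since one statement is issued per table, a slowdown of several minutes per statement is multiplied by the number of tables being introspected.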

    bug blocked performance 
    opened by gregsh 6
Releases(v0.24.0-rc3)
Owner
source{d}
A Simple and Comprehensive Vulnerability Scanner for Container Images, Git Repositories and Filesystems. Suitable for CI

A Simple and Comprehensive Vulnerability Scanner for Containers and other Artifacts, Suitable for CI. Abstract Trivy (tri pronounced like trigger, vy

Aqua Security 12.6k Jun 24, 2022
Quickly clone git repositories into nested folders like GOPATH.

cl cl clones git repositories into nested folders like GOPATH and outputs the path of the cloned directory. Example: cl https://github.com/foo/bar Is

Felix Geisendörfer 12 Jun 2, 2022
ReGit: A Tiny Git-Compatible Git Implementation written in Golang

ReGit is a tiny Git implementation written in Golang. It uses the same underlying file formats as Git. Therefore, all the changes made by ReGit can be checked by Git.

null 165 Jun 22, 2022
Git with a cup of tea, painless self-hosted git service

Gitea - Git with a cup of tea View the chinese version of this document Purpose The goal of this project is to make the easiest, fastest, and most pai

Gitea 30.6k Jun 27, 2022
A Git RPC service for handling all the git calls made by GitLab

Quick Links: Roadmap | Want to Contribute? | GitLab Gitaly Issues | GitLab Gitaly Merge Requests | Gitaly is a Git RPC service for handling all the gi

null 1 Nov 13, 2021
A simple cli tool for switching git user easily inspired by Git-User-Switch

gitsu A simple cli tool for switching git user easily inspired by Git-User-Switch Installation Binary releases are here. Homebrew brew install matsuyo

Masaya Watanabe 200 May 15, 2022
Removes unnecessarily saved git objects to optimize the size of the .git directory.

Git Repo Cleaner Optimizes the size of the .git directory by removing all of the files that are unnecessarily-still-saved as part of the git history.

Omar Yasser 2 Mar 24, 2022
Gum - Git User Manager (GUM) - Switch between git user profiles

Git User Manager (GUM) Add your profile info to config.yaml Build project: go bu

Mehmet Tevfik YÜKSEL 6 Feb 14, 2022
Git-now-playing - Git commits are the new AIM status messages

git-now-playing git-now-playing is an attempt to bring some of the panache of th

Paddy 1 Apr 4, 2022
A simple tool to help apply changes across many GitHub repositories simultaneously

A simple tool to help apply changes across many GitHub repositories simultaneously

Skyscanner 315 Jun 28, 2022
Find trending repositories on GitHub

octotrends.com A nifty little tool I wrote to try and find repos and languages that are rapidly growing on GitHub. Growth rates are based on % growth

Dominik Dabrowski 7 Jun 14, 2022
Simple git hooks written in go that installs globally to your machine

Go-hooks Simple git hooks written in go that installs globally to your machine Install curl -fsSL

Vadim Makerov 0 Nov 1, 2021
Gogs is a painless self-hosted Git service

Gogs - A painless self-hosted Git service 简体中文 Vision The Gogs (/gɑgz/) project aims to build a simple, stable and extensible self-hosted Git servi

Gogs 40.4k Jun 26, 2022
A highly extensible Git implementation in pure Go.

go-git is a highly extensible git implementation library written in pure Go. It can be used to manipulate git repositories at low level (plumbing) or

go-git 3.6k Jun 29, 2022
commit/branch/workdir explorer for git

gitin gitin is a commit/branch/status explorer for git gitin is a minimalist tool that lets you explore a git repository from the command line. You ca

Ibrahim Serdar Acikgoz 1.8k Jun 26, 2022
A command-line tool that makes git easier to use with GitHub.

hub is a command line tool that wraps git in order to extend it with extra features and commands that make working with GitHub easier. For an official

GitHub 21.8k Jun 20, 2022
Fast and powerful Git hooks manager for any type of projects.

Lefthook The fastest polyglot Git hooks manager out there Fast and powerful Git hooks manager for Node.js, Ruby or any other type of projects. Fast. I

Abroskin Alexander 2.1k Jun 27, 2022
Implementation of git internals from scratch in Go language

This project is part of a learning exercise to implement a subset of "git" commands. It can be used to create and maintain git objects, such as blobs, trees, commits, references and tags.

Shyamsunder Rathi 31 Apr 17, 2022
go mod vendor lets you check in your dependencies to git, but that's both bloaty (for developers) and tedious (remembering to update it).

go-mod-archiver Afraid of being unable to build historical versions of your Go program? go mod vendor lets you check in your dependencies to git, but

Tailscale 82 Jun 25, 2022