A high-speed data import tool for TiDB

Overview

TiDB Lightning


TiDB Lightning is a tool for the fast full import of large amounts of data into a TiDB cluster. Currently, it supports reading SQL dumps exported via mydumper.

[TiDB Lightning architecture diagram]

Contributing

Contributions are welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

License

TiDB Lightning is under the Apache 2.0 license. See the LICENSE file for details.


More resources

Issues
  • restore: Try to create tables in parallel

    restore: Try to create tables in parallel

    What problem does this PR solve?

    Issue Number: close #434

    What is changed and how it works?

    • Add schemaStmt, which holds one statement (create db|table|view)
    • Add schemaJob, which holds all the statements of one restore-schema job
    • Add restoreSchemaWorker, which produces restore-schema jobs from an async goroutine (the producer)
    • Use a hardcoded concurrency of 16 goroutines when restoreSchema#doJob is called (the consumer); a sketch of this pattern follows the list

    Benchmark

    Time costs of tests/restore/run.sh with $TABLE_COUNT=300 are reported below:

    Before

    
    ________________________________________________________
    Executed in  211.51 secs   fish           external 
       usr time   76.28 secs  187.00 micros   76.28 secs 
       sys time   44.62 secs  617.00 micros   44.62 secs 
    
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:964] ["restore all tables data completed"] [takeTime=1m9.093660687s] []
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:745] ["everything imported, stopping periodic actions"]
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:1409] ["skip full compaction"]
    [2020/12/08 17:06:33.411 +08:00] [INFO] [restore.go:294] ["the whole procedure completed"] [takeTime=1m45.325956477s] []
    [2020/12/08 17:06:33.411 +08:00] [INFO] [main.go:95] ["tidb lightning exit"]
    [2020/12/08 17:06:33.411 +08:00] [INFO] [checksum.go:425] ["service safe point keeper exited"]
    

    After

    
    ________________________________________________________
    Executed in  213.24 secs   fish           external 
       usr time   78.08 secs  140.00 micros   78.08 secs 
       sys time   44.92 secs  475.00 micros   44.92 secs 
    
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:820] ["restore all tables data completed"] [takeTime=1m9.754043571s] []
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:601] ["everything imported, stopping periodic actions"]
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:1265] ["skip full compaction"]
    [2020/12/08 16:55:15.840 +08:00] [INFO] [restore.go:293] ["the whole procedure completed"] [takeTime=1m42.140242288s] []
    [2020/12/08 16:55:15.840 +08:00] [INFO] [main.go:95] ["tidb lightning exit"]
    [2020/12/08 16:55:15.840 +08:00] [INFO] [checksum.go:425] ["service safe point keeper exited"]
    

    PS: this benchmark ran against a single-node TiDB rather than a cluster, which may mean that the one node, acting as both DDL owner and executor, throttled the whole run. We should benchmark again on a TiDB cluster with multiple DDL nodes.

    -------- Update ---------

    Benchmark on a 1 PD / 3 TiDB / 4 TiKV cluster (single machine)

    preset:

    mysql> set @@global.tidb_scatter_region = "1";
    

    Concurrency

    [2020/12/29 14:55:09.523 +08:00] [INFO] [restore.go:503] ["restore schema completed"] [takeTime=2m15.052150251s] []

    Serial

    [2020/12/29 15:04:52.746 +08:00] [INFO] [restore.go:357] ["restore schema completed"] [takeTime=2m47.520433308s] []

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    • Increased code complexity

    Related changes

    • Need to cherry-pick to the release branch
    • Need to be included in the release note
    status/LGT2 status/PTAL rewarded 
    opened by hidehalo 36
  • Try to create tables in parallel

    Try to create tables in parallel

    Feature Request

    Is your feature request related to a problem? Please describe:

    Currently we perform CREATE TABLE (tidbMgr.InitSchema) in sequence, but experience in BR (pingcap/br#377) shows that running them in parallel is faster.

    Describe the feature you'd like:

    Execute the CREATE TABLE statements in restoreSchema in parallel over 16 connections.

    Benchmark that by importing 300 small tables.

    Describe alternatives you've considered:

    Don't do it.

    Teachability, Documentation, Adoption, Optimization:

    N/A

    Score

    600

    SIG slack channel

    sig-migrate

    Mentor

    @glorv @lance6716

    feature-request priority/P3 difficulty/1-easy challenge-program high-performance picked 
    opened by kennytm 24
  • Update dependencies and remove juju/errors

    Update dependencies and remove juju/errors

    1. Replaced juju/errors with pingcap/errors (exported as pkg/errors due to how pingcap/tidb imports it) (LGPL-v3 → BSD-2-clause)

    2. Updated pingcap/tidb to v2.1.0-rc.4 to entirely remove juju/errors from the vendor directory.

      • Updated pingcap/pd to v2.1.0-rc.4
      • Updated pingcap/kvproto to a recent master commit
      • Updated pingcap/tipb to a recent master commit
      • Replaced golang/protobuf with gogo/protobuf (BSD-3-clause)
      • Added opentracing/basictracer-go (Apache-2.0)
    3. Removed the golang.org/x/net dependency, since we can use the built-in context package (the two are interchangeable after Go 1.7 anyway)

    4. Removed the explicit dependencies on pingcap/tidb-tools and siddontang/go; we're not using glide anymore

    5. Updated some direct dependencies:

      • Updated BurntSushi/toml from v0.3.0 to v0.3.1 (WTFPL → MIT)
      • Updated prometheus/client_golang from v0.8.0 to v0.9.0
      • Updated sirupsen/logrus from v0.11.6 to v1.1.1
      • Updated golang.org/x/sys to a recent master commit
      • Updated google.golang.org/grpc from v1.12.0 to v1.15.0
    6. Added the commercial license

    opened by kennytm 23
  • restore: update and restore GCLifeTime once when parallel

    restore: update and restore GCLifeTime once when parallel

    What problem does this PR solve?

    DoChecksum is not prepared for the parallel case.

    What is changed and how it works?

    There may be multiple DoChecksum calls running at once; the GC Life Time should not be set or reset while there are unfinished DoChecksum tasks.

    Lightning runs multiple restoreTable calls simultaneously from restoreTables, so we use the following logic:

    func restoreTables() {
        // init the shared helper: lock, running-jobs counter, original GC value
        for i := 0; i < concurrency; i++ {
            go restoreTable()
        }
    }

    func restoreTable() {
        // calls postProcess() -> ... -> DoChecksum()
    }

    func DoChecksum() {
        // uses the helper's pointers to the lock, the running-jobs counter,
        // and the original value
    }
    

    We use a variable to count the running checksum jobs. Before a remote checksum call starts: lock, increase the counter, and if it just rose from zero, back up the original value and run the set-GC-Life-Time logic; then unlock. After a remote checksum finishes: lock, decrease the counter, and if it just dropped to zero, run the reset-GC-Life-Time logic; then unlock.

    The lock, the counter, and the original value all point to the same locations within one restoreTables run (a sketch follows below).
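
    A hedged sketch of this reference-counted pattern, assuming a hypothetical gcLifeTimeManager helper (the tikv_gc_life_time row in mysql.tidb is the value being adjusted; everything else is illustrative, not the PR's code):

    package restore

    import (
        "database/sql"
        "sync"
    )

    type gcLifeTimeManager struct {
        mu       sync.Mutex
        running  int    // in-flight checksum jobs
        oldValue string // GC life time before the first job enlarged it
    }

    // beforeChecksum runs before each remote checksum: the first job to
    // arrive backs up the original GC life time and enlarges it.
    func (m *gcLifeTimeManager) beforeChecksum(db *sql.DB) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.running++
        if m.running == 1 {
            row := db.QueryRow("SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_life_time'")
            if err := row.Scan(&m.oldValue); err != nil {
                m.running--
                return err
            }
            _, err := db.Exec("UPDATE mysql.tidb SET VARIABLE_VALUE = '100h' WHERE VARIABLE_NAME = 'tikv_gc_life_time'")
            return err
        }
        return nil
    }

    // afterChecksum runs after each remote checksum: the last job to leave
    // restores the original value.
    func (m *gcLifeTimeManager) afterChecksum(db *sql.DB) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.running--
        if m.running == 0 {
            _, err := db.Exec("UPDATE mysql.tidb SET VARIABLE_VALUE = ? WHERE VARIABLE_NAME = 'tikv_gc_life_time'", m.oldValue)
            return err
        }
        return nil
    }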

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    Related changes

    • Need to cherry-pick to the release branch
    status/LGT2 type/bug-fix type/bug 
    opened by lance6716 20
  • restore: check row value count to avoid unexpected encode result

    restore: check row value count to avoid unexpected encode result

    What problem does this PR solve?

    ~Check the row field count before encoding; if the row value count is bigger than the table field count, directly return an error.~

    • Check the column count in the tidb encoder and return an error if it doesn't match the table column count (see the sketch after this list)
    • Be compatible with the special field _tidb_rowid in getColumnNames and the tidb encoder
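
    An illustrative sketch of the column-count check; the helper name and signature are hypothetical, not the actual encoder code:

    package main

    import "fmt"

    // checkColumnCount rejects a row that carries more values than the
    // target table has columns, tolerating one extra value for the special
    // _tidb_rowid field when it appears in the source file.
    func checkColumnCount(row []string, tableCols []string, hasRowID bool) error {
        expected := len(tableCols)
        if hasRowID {
            expected++
        }
        if len(row) > expected {
            return fmt.Errorf("row has %d values but the table has only %d columns", len(row), expected)
        }
        return nil
    }

    func main() {
        err := checkColumnCount([]string{"1", "a", "x"}, []string{"id", "name"}, false)
        fmt.Println(err) // row has 3 values but the table has only 2 columns
    }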

    What is changed and how it works?

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    Related changes

    Release Note

    • Fix the bug that the tidb backend panics if the source file has more columns than the target table
    status/LGT2 type/bug-fix 
    opened by glorv 17
  • restore: optimize SQL processing speed

    restore: optimize SQL processing speed

    DNM: Based on #109 for simplicity of development.

    What problem does this PR solve?

    Optimizing SQL processing speed

    Result:

    • PR 109: TableConcurrency = 20, RegionConcurrency = 40; the metrics data was lost because the cluster was cleaned up.

      Data size: 340G
      Rate: ~45MB/s
      Total time: ~= 2h30m (import time: 40m)
      
    • PR 110: TableConcurrency = 20, RegionConcurrency = 20, IOConcurrency = 5, Test1 metrics snapshot, Test2 metrics snapshot

      CANNOT REPRODUCE [IO delay unstable]

      Test1: 
      Data size: 146G
      Rate: 90~160MB/s
      Total time: ~= 48m (import time: 27m)
      
      Test2:
      Data size: 146G
      Rate: 130~190MB/s
      Total time: ~= 46m (import time: 28m)
      
      Test3
      coming ...
      
    • PR 110: TableConcurrency = 40, RegionConcurrency = 40, IOConcurrency = 10, data size 160G, Metrics

      	2018/12/30 01:16:32.871 restore.go:477: [info] restore all tables data takes 52m59.59960385s
      	2018/12/30 01:16:32.871 restore.go:366: [info] Everything imported, stopping periodic actions
      	2018/12/30 01:16:32.871 restore.go:208: [error] run cause error : [types:1292]invalid time format: '{2038 1 19 4 4 36 0}'
      	2018/12/30 01:16:32.871 restore.go:214: [info] the whole procedure takes 53m7.986292573s
      

    Early conclusion:

    Concurrent IO lengthens IO delays, which in turn lengthens SQL processing time

    What is changed and how it works?

    Limiting IO concurrency
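
    A minimal sketch of capping IO with a semaphore channel, assuming illustrative names (ioConcurrency, readChunk) rather than Lightning's real API:

    package main

    import "sync"

    func readChunk(chunk int) {
        // placeholder for the actual data-file read
    }

    func main() {
        const ioConcurrency = 5
        sem := make(chan struct{}, ioConcurrency) // at most 5 reads in flight

        var wg sync.WaitGroup
        for i := 0; i < 40; i++ { // e.g. one worker per region, RegionConcurrency = 40
            wg.Add(1)
            go func(chunk int) {
                defer wg.Done()
                sem <- struct{}{}        // acquire an IO token
                defer func() { <-sem }() // release it after the read
                readChunk(chunk)
            }(i)
        }
        wg.Wait()
    }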

    Check List

    Tests

    • Unit test

    Code changes

    Side effects

    Related changes

    status/LGT2 type/feature 
    opened by lonng 16
  • Support table routing rules (merging sharded tables)

    Support table routing rules (merging sharded tables)

    What problem does this PR solve?

    TOOL-142

    (Note: still won't handle UNIQUE/PRIMARY key conflict. This needs column-mapping)

    What is changed and how it works?

    Rename the tables while loading the files. Since we don't care about the table name when parsing the data files, it becomes very simple to support merging: just associate all of those data files with the target table (see the sketch below).

    This PR supersedes #54. Note that #54 is very large because it attempts to do some refactoring at the same time.
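
    An illustrative sketch of a routing rule, with a hypothetical rule struct rather than the actual router implementation:

    package main

    import (
        "fmt"
        "regexp"
    )

    // routeRule maps sharded source tables onto one merged target table.
    type routeRule struct {
        pattern *regexp.Regexp // matches source table names
        target  string         // merged target table
    }

    func main() {
        rule := routeRule{
            pattern: regexp.MustCompile(`^user_[0-9]+$`), // shards user_0, user_1, ...
            target:  "user",
        }
        for _, src := range []string{"user_0", "user_1", "orders"} {
            if rule.pattern.MatchString(src) {
                // the shard's data files get associated with the target table
                fmt.Printf("%s -> %s\n", src, rule.target)
            }
        }
    }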

    Check List

    Tests

    • Integration test

    Code changes

    Side effects

    Related changes

    • Need to update the documentation
    • Need to be included in the release note
    status/LGT2 Should Update Docs type/feature 
    opened by kennytm 16
  • backend: add local kv storage backend to get rid of importer

    backend: add local kv storage backend to get rid of importer

    What problem does this PR solve?

    Use local key-value storage as a new backend to get rid of the dependency on tikv-importer, thus making lightning easier to use. In our benchmark, the import speed in local mode is as good as in importer mode, so this change doesn't bring any performance loss, and it is a much better choice compared with the tidb backend.

    What is changed and how it works?

    The logic of local mode is as follows:

    1. Write the data read from csv/mydumper files to the local key-value storage pebble (a sketch follows this list).
    2. Batch-write the sorted kv pairs and generate an SST file at each tikv instance. https://github.com/tikv/tikv/pull/7459
    3. Ingest the SST files into the tikv cluster.
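
    A rough sketch of step 1, writing encoded kv pairs into a local pebble store (the key/value contents here are illustrative):

    package main

    import "github.com/cockroachdb/pebble"

    func main() {
        db, err := pebble.Open("/tmp/sorted-kv-dir", &pebble.Options{})
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // pebble keeps keys sorted, so a later scan yields sorted kv pairs
        // ready to be batched into SST files for each tikv instance (step 2).
        batch := db.NewBatch()
        if err := batch.Set([]byte("t\x80_r\x01"), []byte("encoded row"), nil); err != nil {
            panic(err)
        }
        if err := batch.Commit(pebble.Sync); err != nil {
            panic(err)
        }
    }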

    In local mode, because the sorted kv data is managed by the lightning process, there are some changes to checkpoint handling:

    • In the restore phase with checkpoints enabled, we save all the chunk checkpoints only after the engine is closed. This is because if lightning exits before the engine is closed, some data may have been written to the kv store but not flushed, and that data could be lost. Another approach is to flush after each processed chunk, but that is much slower.
    • Before we update an engine checkpoint to CheckpointStatusClosed, we flush the related index engine to make sure the related index kvs are saved.
    • Skip the CheckpointStatusAllWritten stage in local mode, because at that point the data/index kvs are not yet flushed; if lightning exits there, we cannot guarantee the local kv store contains all the key-values for this engine, so we have to restore the engine from the start.
    • Add a new meta file for each engine, stored with the engine db files. This meta contains some data used in the import phase and is generated when the engine is closed.

    Changes for tidb-lightning-ctl:

    • The import-engine command is not supported in local mode. Because the import phase is now done by lightning itself, this command is no longer meaningful.

    NOTE:

    • We recommend separating the sorted-kv-dir from the data disk if possible, because the data disk is read-heavy while the local storage dir is both read- and write-heavy; using another disk for this temporary store yields at least a 10% performance gain.

    TODO:

    • Maybe we should save chunk checkpoints more frequently. One possible approach: after writing a certain number of bytes to pebble, arrange a flush for both the data and index engines, so we can save the current chunk checkpoints safely.

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    • Reusing a checkpoint may take more time than with the importer backend, thus making #303 even worse.

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation
    • Need to update the tidb-ansible repository
    • Need to be included in the release note
    Should Update Docs status/PTAL type/feature 
    opened by glorv 15
  • Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure

    Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure

    What problem does this PR solve?

    Completely fix TOOL-462 by recording the _tidb_rowid on non-PkIsHandle tables to ensure idempotence when importing the same chunk twice.

    What is changed and how it works?

    1. Assign a Row ID to every row of a table before the import starts, so importing two chunks from the same table is no longer order-dependent (see the sketch after this list).
    2. To properly assign a Row ID, we need to know exactly how many rows each chunk has, so we replaced splitFuzzyRegion with an exact version again.
    3. Since we need to read the whole file before importing, we want this step to be as fast as possible. Therefore, I replaced the MDDataReader with a ragel-based parser, which is about 8x faster on my machine.
    4. We also need to record the Row IDs in the checkpoints. The checkpoint tables are modified to accommodate this change. Additionally, the checksums are stored as properties of a chunk instead of the whole table.
    5. To ensure the only global property, the allocator, won't interfere with the data output in future updates, I've created a custom allocator which panics on any unsupported operation.
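
    A sketch of the Row ID pre-assignment from points 1 and 2: each chunk receives a contiguous, deterministic _tidb_rowid range, so re-importing a chunk is idempotent (field names are illustrative):

    package main

    import "fmt"

    type chunk struct {
        rows      int64 // exact row count from the pre-scan
        rowIDBase int64 // first _tidb_rowid of the chunk
        rowIDMax  int64 // half-open upper bound
    }

    func main() {
        chunks := []chunk{{rows: 1000}, {rows: 500}, {rows: 2000}}
        var next int64 = 1
        for i := range chunks {
            chunks[i].rowIDBase = next
            next += chunks[i].rows
            chunks[i].rowIDMax = next
        }
        for _, c := range chunks {
            fmt.Printf("rows=%d rowid=[%d,%d)\n", c.rows, c.rowIDBase, c.rowIDMax)
        }
    }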

    Check List

    Tests

    • [x] Unit test
    • [x] Integration test
    • [ ] Manual test (add detailed scripts or steps below)
    • [ ] No code

    Code changes

    • [ ] Has exported function/method change
    • [ ] Has exported variable/fields change
    • [ ] Has interface methods change
    • [x] Has persistent data change

    Side effects

    • [ ] Possible performance regression
    • [x] Increased code complexity
    • [x] Breaking backward compatibility (only if you update Lightning after it saved a checkpoint)

    Related changes

    • [x] Need to cherry-pick to the release branch (2.1)
    • [ ] Need to update the tidb-ansible repository
    • [ ] Need to update the documentation
    • [ ] Need to be included in the release note
    opened by kennytm 15
  • Restore from S3 compatible API?

    Restore from S3 compatible API?

    A feature request for your roadmap:

    Would it be possible to restore directly from a mydumper backup stored in S3? In most cloud deployments this is where user backups will be stored (the S3 API is implemented by many other object stores).
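
    A hedged illustration of what reading a dump directly from S3 could look like with aws-sdk-go; the bucket and prefix are hypothetical:

    package main

    import (
        "fmt"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
        client := s3.New(sess)

        // List the dump's files instead of scanning a local data-source-dir.
        out, err := client.ListObjectsV2(&s3.ListObjectsV2Input{
            Bucket: aws.String("my-backups"),
            Prefix: aws.String("mydumper/2020-07-27/"),
        })
        if err != nil {
            panic(err)
        }
        for _, obj := range out.Contents {
            fmt.Println(*obj.Key, *obj.Size) // each object would be parsed as a data file
        }
    }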


    Value

    Value description

    Support restore to TiDB via S3.

    Value score

    • (TBD) / 5

    Workload estimation

    • (TBD)

    Time

    GanttStart: 2020-07-27 GanttDue: 2020-09-04 GanttProgress: 100%

    priority/P1 difficulty/2-medium feature/accepted 
    opened by morgo 15
  • restore: ensure the importer engine is closed before recycling the table worker

    restore: ensure the importer engine is closed before recycling the table worker

    The close-engine operation is extracted out of Flush() (which now only does ImportEngine). The engine count should now be strictly limited by table-concurrency.

    opened by kennytm 15
  • Add  progress bar and the final result(pass or failed) in command output

    Add progress bar and the final result(pass or failed) in command output

    Feature Request

    Is your feature request related to a problem? Please describe:

    When using the lightning command to import data, the user currently cannot get the progress status or the final result via the command output. The lightning log and the monitor can display this information, but the CLI cannot. For users, the most direct way to see the import progress and result is the CLI output, not a log file or a monitor.

    Describe the feature you'd like:

    Add a progress bar and the final result (pass or failed) to the command output.

    Describe alternatives you've considered:

    User friendly.

    Teachability, Documentation, Adoption, Optimization:

    feature-request 
    opened by Tammyxia 0
  • use system_time_zone to encode kv if tidb set it

    use system_time_zone to encode kv if tidb set it

    What problem does this PR solve?

    Resolve #562

    What is changed and how it works?

    If tidb's time_zone is "SYSTEM", try to use "system_time_zone" to set the time_zone for the lightning session (a sketch follows below).
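
    A hedged sketch of the described behavior (not the actual Lightning code): query both variables, and adopt system_time_zone for the session when time_zone is "SYSTEM".

    package restore

    import "database/sql"

    // setSessionTimeZone pins the session to the server's system_time_zone
    // when TiDB reports time_zone = "SYSTEM", so timestamps are encoded
    // consistently with the server.
    func setSessionTimeZone(db *sql.DB) error {
        var tz, sysTZ string
        if err := db.QueryRow("SELECT @@time_zone, @@system_time_zone").Scan(&tz, &sysTZ); err != nil {
            return err
        }
        if tz == "SYSTEM" && sysTZ != "" {
            _, err := db.Exec("SET SESSION time_zone = ?", sysTZ)
            return err
        }
        return nil
    }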

    Check List

    Tests

    • Manual test (add detailed scripts or steps below)

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation
    • Need to be included in the release note

    Release note

    • Fix the issue that lightning didn't use tidb's time zone to encode timestamp data.
    opened by 3pointer 2
  • local backend oom

    local backend oom

    Bug Report

    Please answer these questions before submitting your issue. Thanks!

    1. What did you do? If possible, provide a recipe for reproducing the error. I used lightning to restore csv files (tpcc 5000 warehouses).

    2. What did you expect to see? Restored successfully.

    3. What did you see instead? Lightning OOM. In January, the memory usage of lightning was about 20~30 GB, but now it uses at least 60 GB.

    4. Versions of the cluster

      • TiDB-Lightning version (run tidb-lightning -V):

        Release Version: v5.0.0-rc-21-g230eef2
        Git Commit Hash: 230eef2a6e16648a49a4c74910dca693781012c4
        Git Branch: master
        UTC Build Time: 2021-02-04 03:10:38
        Go Version: go version go1.15.6 linux/amd64
        
      • TiKV-Importer version (run tikv-importer -V):

        none
        
      • TiKV version (run tikv-server -V):

        TiKV
        Release Version:   5.0.0-rc.x
        Edition:           Community
        Git Commit Hash:   81c4de98a9a21e4dcf3cce6d7783793b1238044e
        Git Commit Branch: limit-write-batch-ingest
        UTC Build Time:    2021-02-04 11:43:07
        Rust Version:      rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)
        Enable Features:   jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb
        Profile:           release
        
      • TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

        Release Version: v4.0.0-beta.2-2067-g415d14b6a
        Edition: Community
        Git Commit Hash: 415d14b6ac65e3c73529d07b4331c2f4917b2701
        Git Branch: master
        UTC Build Time: 2021-01-27 15:27:10
        GoVersion: go1.13
        Race Enabled: false
        TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
        Check Table Before Drop: false
        
      • Other interesting information (system version, hardware config, etc):

    5. Operation logs

      • Please upload tidb-lightning.log for TiDB-Lightning if possible
      • Please upload tikv-importer.log from TiKV-Importer if possible
      • Other interesting logs
    6. Configuration of the cluster and the task

      • tidb-lightning.toml for TiDB-Lightning if possible
      • tikv-importer.toml for TiKV-Importer if possible
      • inventory.ini if deployed by Ansible
    7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

    type/bug severity/major 
    opened by gozssky 1
  • tidb-lightning alters the values of timestamp columns

    tidb-lightning alters the values of timestamp columns

    Bug Report

    1. What did you do? If possible, provide a recipe for reproducing the error. Used the tidb-lightning tool to restore full backup data.
    • In the full backup data, there is a table that has timestamp columns like this:

      CREATE TABLE `users` (
        `id` bigint(20) NOT NULL,
        `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        ...
      )
      
    • Set the timezone of all servers (TiDB, PD, TiKV, Lightning, ...) to UTC

    • In the script deploy/scripts/start_lightning.sh, set the timezone to Asia/Tokyo

      #!/bin/bash
      set -e
      ulimit -n 1000000
      cd "/home/ec2-user/deploy" || exit 1
      mkdir -p status
      
      export RUST_BACKTRACE=1
      
      export TZ=Asia/Tokyo
      
      echo -n 'sync ... '
      stat=$(time sync)
      echo ok
      echo $stat
      
      nohup ./bin/tidb-lightning -config ./conf/tidb-lightning.toml &> log/tidb_lightning_stderr.log &
      
      echo $! > "status/tidb-lightning.pid"
      
    • Start the tidb-lightning tool

      $ cd deploy/
      $ scripts/start_lightning.sh
      
    2. What did you expect to see? The tidb-lightning tool should respect the original data (in the full backup) and import it as it is.

    3. What did you see instead? The tidb-lightning tool altered the values of timestamp columns. For example,

      Original data (from the full backup):

      $ head -2 ./xxxxx.users.000000001.sql
      INSERT INTO `users` VALUES
      (123456789123456789,'2019-09-03 12:31:02','2019-09-03 12:35:18',...)
      

      Imported data:

      > select id, created_at, updated_at from users where id = 123456789123456789;
      +--------------------+---------------------+---------------------+
      | id                 | created_at          | updated_at          |
      +--------------------+---------------------+---------------------+
      | 123456789123456789 | 2019-09-03 03:31:02 | 2019-09-03 03:35:18 |
      +--------------------+---------------------+---------------------+
      

      So the tidb-lightning tool altered the values of the columns created_at and updated_at: the original values were shifted back by 9 hours (a small demonstration follows).
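
      A small demonstration of the arithmetic: a timestamp parsed in Asia/Tokyo and rendered in UTC loses exactly 9 hours.

      package main

      import (
          "fmt"
          "time"
      )

      func main() {
          tokyo, _ := time.LoadLocation("Asia/Tokyo")
          t, _ := time.ParseInLocation("2006-01-02 15:04:05", "2019-09-03 12:31:02", tokyo)
          fmt.Println(t.UTC().Format("2006-01-02 15:04:05")) // 2019-09-03 03:31:02
      }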

    4. Versions of the cluster

      • TiDB-Lightning version (run tidb-lightning -V):

        Release Version: v4.0.9
        Git Commit Hash: 56bc32daad19b9dff10104c55300292de959fde3
        Git Branch: heads/refs/tags/v4.0.9
        UTC Build Time: 2020-12-19 04:48:01
        Go Version: go version go1.13 linux/amd64
        
      • TiKV-Importer version (run tikv-importer -V)

        Didn't use
        
      • TiKV version (run tikv-server -V):

        TiKV
        Release Version:   4.0.10
        Edition:           Community
        Git Commit Hash:   2ea4e608509150f8110b16d6e8af39284ca6c93a
        Git Commit Branch: heads/refs/tags/v4.0.10
        UTC Build Time:    2021-01-15 03:16:35
        Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
        Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
        Profile:           dist_release
        
      • TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

        +---------------------+
        | version()           |
        +---------------------+
        | 5.7.25-TiDB-v4.0.10 |
        +---------------------+
        
      • Other interesting information (system version, hardware config, etc):

        > show variables like '%time_zone%';
        +------------------+--------+
        | Variable_name    | Value  |
        +------------------+--------+
        | system_time_zone | UTC    |
        | time_zone        | SYSTEM |
        +------------------+--------+
        
        $ cat /etc/os-release
        NAME="Amazon Linux"
        VERSION="2"
        ID="amzn"
        ID_LIKE="centos rhel fedora"
        VERSION_ID="2"
        PRETTY_NAME="Amazon Linux 2"
        ANSI_COLOR="0;33"
        CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
        HOME_URL="https://amazonlinux.com/"
        
    5. Operation logs

      • Please upload tidb-lightning.log for TiDB-Lightning if possible
      • Please upload tikv-importer.log from TiKV-Importer if possible
      • Other interesting logs
    6. Configuration of the cluster and the task

      • tidb-lightning.toml for TiDB-Lightning if possible
       # lightning Configuration
      
       [lightning]
       file = "/home/tidb/deploy/log/tidb_lightning.log"
       index-concurrency = 2
       io-concurrency = 5
       level = "info"
       max-backups = 14
       max-days = 28
       max-size = 128
       pprof-port = 8289
       table-concurrency = 6
      
       [checkpoint]
       enable = true
       schema = "tidb_lightning_checkpoint"
       driver = "file"
      
       [tikv-importer]
       backend = "local"
       sorted-kv-dir = "/home/tidb/deploy/sorted-kv-dir"
      
       [mydumper]
       data-source-dir = "/home/tidb/deploy/mydumper/scheduled-backup-20210120-044816"
       no-schema = false
       read-block-size = 65536
      
       [tidb]
       build-stats-concurrency = 20
       checksum-table-concurrency = 16
       distsql-scan-concurrency = 100
       host = "TIDB_HOST"
       index-serial-scan-concurrency = 20
       log-level = "error"
       password = "xxxxx"
       port = 4000
       status-port = 10080
       user = "root"
       pd-addr = "PD_HOST:2379"
      
       [post-restore]
       analyze = true
       checksum = true
      
       [cron]
       log-progress = "5m"
       switch-mode = "5m"
      
      • inventory.ini if deployed by Ansible
    7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

    type/bug severity/major 
    opened by ngocson2vn 1
  • In the TPC-C test, the following error occurred when using local-backend lightning

    In the TPC-C test, the following error occurred when using local-backend lightning

    Question

    CSV data: 110G. In the TPC-C test, the data was converted to CSV files and then imported into tidb using local-backend lightning; the following error occurred:

    read metadata from /data/tikv-20160/import/4ff45145-3a21-4e78-81da-2bca45104be3_10513_5_350_default.sst "read metadata from /data/tikv-20160/import/4ff45145-3a21-4e78-81da-2bca45104be3_10513_5_350_default.sst: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }")"]

    type/bug question 
    opened by newHE3DBer 1
  • panic in tidb backend in strict sql-mode

    panic in tidb backend in strict sql-mode

    Bug Report

    Please answer these questions before submitting your issue. Thanks!

    1. What did you do? If possible, provide a recipe for reproducing the error. Use lightning tidb backend to import data with config:
    [tidb]
    sql-mode = "STRICT_ALL_TABLES"
    

    panic backtrace:

    goroutine 578 [running]:
    github.com/pingcap/tidb/types.(*Datum).ConvertTo(0xc00a0fd220, 0xc0001dcdc0, 0xc0004a0be8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
      /go/pkg/mod/github.com/pingcap/[email protected]/types/datum.go:843 +0xcff
    github.com/pingcap/tidb/table.CastValue(0x2b48760, 0xc000c6e000, 0x5, 0x0, 0x25442e2, 0xb, 0xc00cb05371, 0x8, 0x8, 0x0, ...)
      /go/pkg/mod/github.com/pingcap/[email protected]/table/column.go:244 +0xf2
    github.com/pingcap/tidb-lightning/lightning/backend.(*tidbEncoder).appendSQL(0xc00810e040, 0xc000c22120, 0xc00a0fd660, 0xc008097770, 0x0, 0x0)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/backend/tidb.go:181 +0x594
    github.com/pingcap/tidb-lightning/lightning/backend.(*tidbEncoder).Encode(0xc00810e040, 0xc019f0a3c0, 0xc00062a480, 0xa, 0x10, 0x1, 0xc00a0ec060, 0xa, 0xb, 0x1f37685, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/backend/tidb.go:251 +0x32d
    github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).encodeLoop(0xc000c46040, 0x2b02ec0, 0xc008097680, 0xc019f0a360, 0xc008063aa0, 0xc019f0a3c0, 0x2adba00, 0xc00810e040, 0xc00a0ec000, 0xc0192fc000, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1902 +0x350
    github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore(0xc000c46040, 0x2b02ec0, 0xc008097680, 0xc008063aa0, 0x0, 0xc000170e00, 0xc000bb2180, 0xc0192fc000, 0x0, 0x0)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1976 +0x7a4
    github.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngine.func1(0xc00c3f8fe0, 0xc0192fc000, 0x2b02ec0, 0xc008097680, 0xc008063aa0, 0xc000000000, 0xc000170e00, 0xc000bb2180, 0xc000c46020, 0xc014f65060, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1086 +0x175
    created by github.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngine
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1078 +0x64c
    panic: should never happen
    

    Root Cause: The tidb backend's FetchRemoteTableModels implementation is not accurate: it only sets Flag in the FieldType and ignores the other fields. So when the tidb backend runs with strict sql-mode, table.CastValue panics because fieldtype.Tp is 0 (undefined).

    type/bug severity/major 
    opened by glorv 0
Releases: v4.0.11