Overview

grate

A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.

Why?

Grate focuses on speed and stability first, and makes no attempt to parse charts, figures, or other content types that may be embedded in the input files. It tries to perform as few allocations as possible and errs on the side of caution.

There are certainly still some bugs and edge cases, but we have run it successfully on a set of 400k .xls and .xlsx files to catch many bugs and error conditions. Please file an issue with any feedback and additional problem files.

Usage

Grate provides a simple standard interface for all supported filetypes, allowing access to both named worksheets in spreadsheets and single tables in plaintext formats.

package main

import (
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/pbnjay/grate"
    _ "github.com/pbnjay/grate/simple" // tsv and csv support
    _ "github.com/pbnjay/grate/xls"
    _ "github.com/pbnjay/grate/xlsx"
)

func main() {
    if len(os.Args) < 2 {
        log.Fatal("expected the path of the file to read as the first argument")
    }
    wb, err := grate.Open(os.Args[1]) // open the file
    if err != nil {
        log.Fatal(err)
    }
    sheets, err := wb.List() // list available sheets
    if err != nil {
        log.Fatal(err)
    }
    for _, s := range sheets { // enumerate each sheet name
        sheet, err := wb.Get(s) // open the sheet
        if err != nil {
            log.Fatal(err)
        }
        for sheet.Next() { // enumerate each row of data
            row := sheet.Strings() // get the row's content as []string
            fmt.Println(strings.Join(row, "\t"))
        }
    }
    if err := wb.Close(); err != nil {
        log.Fatal(err)
    }
}
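
The loop above prints every cell as text via Strings(). If a date cell comes back as a bare number such as 43983 (see the "Date column prints as days from epoch" report under Issues), that number is an Excel-style serial date: a count of days, with the time of day as the fractional part. The following is a minimal standalone sketch of converting such a value with the standard library only. It does not use or describe grate's own API; serialToDate and excelEpoch are hypothetical names, and it assumes the workbook uses the 1900 date system with serials of 61 or more (earlier serials are affected by the fictitious 1900-02-29, and the 1904 date system needs a different epoch).

```go
package main

import (
	"fmt"
	"time"
)

// excelEpoch is day 0 of the 1900 date system for serials >= 61.
// Assumption: the workbook does not use the 1904 date system.
var excelEpoch = time.Date(1899, time.December, 30, 0, 0, 0, 0, time.UTC)

// serialToDate converts a serial day number (with an optional fractional
// time-of-day part) into a time.Time in UTC.
func serialToDate(serial float64) time.Time {
	days := int(serial)
	frac := serial - float64(days)
	return excelEpoch.
		AddDate(0, 0, days).
		Add(time.Duration(frac * 24 * float64(time.Hour)))
}

func main() {
	// Values taken from the F_HATALY and F_HATAR columns in the issue report.
	for _, s := range []float64{43983, 44347, 44317} {
		fmt.Println(s, "->", serialToDate(s).Format("2006-01-02"))
	}
	// 43983 -> 2020-06-01
	// 44347 -> 2021-05-31
	// 44317 -> 2021-05-01
}
```

This is only a workaround on the caller's side: it requires knowing in advance which columns hold dates, which is exactly the information the cell's number format carries.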

License

All source code is licensed under the GNU GPLv3.

Issues
  • Date column prints as days from epoch

    grater xlsm_date_hataly_hatar.xlsm prints

    F_MODKOD        F_TIPUS F_ERTEK F_HATALY        F_HATAR F_TERITO
    11622   E       4.5     43983   44347   T
    F_MODKOD        F_DIJFIZGYAK    F_DIJFIZMOD     F_ERTEK F_HATALY        F_HATAR F
    13101   E       C       496     43983   44317   T
    F_MODKOD        F_TARTAMTOL     F_TARTAMIG      F_KEZD_MULT     F_NYK_MULT      F_EXTRA_MULT    F_BEF_MULT      F_BEF_MULT2     F_HATALY        F_HATALYIG      F_FL    F_MINIMALIS_POOL        F_INIT_KOCK_MULT        F_LAST_KOCK_MULT      F_RESZVISSZA_KTSG
    13103   1       99      50      3       3       0.12    0.08    43983   44347   0       7144    1       1       1744
    F_MODK  F_TAGSZAM       F_EVESDIJ       F_HATALYTOL     F_HATALYIG
    12410   1       1111    43983   44347
    F_HATALY        F_MODKOD        F_BEKOD F_BOSSZEG 2014  F_TERITO        F_SZAZTOL       F_SZAZIG
    43983   12410   E31001  123456  T       100     100
    

    Here, "F_HATALY", "F_HATAR", "F_TARTAMTOL", "F_TARTAMIG", "F_HATALYTOL", "F_HATALYIG" columns are dates.

    xlsm_date_hataly_hatar.xlsm.gz

    opened by tgulacsi 4
  • Store raw value beside formatted

    And use the Raw() interface{} for Scan.

    Needs more tests for .xls files; I don't have enough at hand.

    opened by tgulacsi 3
  • Fix compile for armv7

    Fix compile for armv7 as per https://github.com/golang/go/issues/23086#issuecomment-371017565

    opened by fcwoknhenuxdfiyv 2
  • Fix index out of bounds panic

    And a few small fixes.

    Cherry-pick if you wish.

    opened by tgulacsi 1
  • xls reads "0" on rows with many integer values

    • Attached is testing.xls test case
    • testing.tsv (filetype not supported by github) was created by copying testing.xls data into testing.tsv file
    • grate/xls/simple_test.go TestBasic was edited to use testing.xls and testing.tsv and to log all mismatches
    
    func TestBasic(t *testing.T) {
    	trueFile, err := os.ReadFile("../testdata/testing.tsv")
    	if err != nil {
    		t.Skip()
    	}
    	lines := strings.Split(string(trueFile), "\n")
    
    	fn := "../testdata/testing.xls"
    	wb, err := Open(fn)
    	if err != nil {
    		t.Fatal(err)
    	}
    
    	sheets, err := wb.List()
    	if err != nil {
    		t.Fatal(err)
    	}
    	for _, s := range sheets {
    		sheet, err := wb.Get(s)
    		if err != nil {
    			t.Fatal(err)
    		}
    
    
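    		i := 0
    		for sheet.Next() {
    			row := strings.Join(sheet.Strings(), "\t")
    			// Guard against sheets with more rows than the reference file.
    			if i >= len(lines) {
    				t.Fatalf("sheet %q has more rows than testing.tsv (%d lines)", s, len(lines))
    			}
    			if lines[i] != row {
    				t.Logf("line %d mismatch: '%s' <> '%s'", i, row, lines[i])
    			}
    			i++
    		}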
    	}
    
    	err = wb.Close()
    	if err != nil {
    		t.Fatal(err)
    	}
    }
    

    --- FAIL: TestBasic (0.00s)
        /Users/zeke/Programming/grate/xls/simple_test.go:71: line 2 mismatch: 'b	0	0	0' <> 'b	2	3	4'
        /Users/zeke/Programming/grate/xls/simple_test.go:71: line 4 mismatch: 'b	0	0	0' <> 'b	1	2	1'
        /Users/zeke/Programming/grate/xls/simple_test.go:71: line 5 mismatch: 'b	0	0	0' <> 'b	4	3	2'
        /Users/zeke/Programming/grate/xls/simple_test.go:71: line 6 mismatch: '0	0	0	0' <> '1	1	1   1'
    

    testing.xls

    opened by zvandehy 0
  • date formatting weekdays

    Needs some backtracking in makeFormatter: currently "dddd" becomes "Sunday", but then "d" is applied to that output, producing "Sun11ay".

    opened by pbnjay 0
Owner
Jeremy Jay