datatable is a Go package to manipulate tabular data, like an excel spreadsheet.

Overview

datatable

Circle CI Go Report Card GoDoc GitHub stars GitHub license

datasweet-logo

datatable is a Go package to manipulate tabular data, like an excel spreadsheet. datatable is inspired by the pandas python package and the data.frame R structure. Although it's production ready, be aware that we're still working on API improvements

Installation

go get github.com/datasweet/datatable

Features

  • Create custom Series (ie custom columns). Currently available, serie.Int, serie.String, serie.Time, serie.Float64.
  • Apply expressions
  • Selects (head, tail, subset)
  • Sorting
  • InnerJoin, LeftJoin, RightJoin, OuterJoin, Concats
  • Aggregate
  • Import from CSV
  • Export to map, slice

Creating a DataTable

package main

import (
	"fmt"

	"github.com/datasweet/datatable"
)

func main() {
	dt := datatable.New("test")
	dt.AddColumn("champ", datatable.String, datatable.Values("Malzahar", "Xerath", "Teemo"))
	dt.AddColumn("champion", datatable.String, datatable.Expr("upper(`champ`)"))
	dt.AddColumn("win", datatable.Int, datatable.Values(10, 20, 666))
	dt.AddColumn("loose", datatable.Int, datatable.Values(6, 5, 666))
	dt.AddColumn("winRate", datatable.Float64, datatable.Expr("`win` * 100 / (`win` + `loose`)"))
	dt.AddColumn("winRate %", datatable.String, datatable.Expr(" `winRate` ~ \" %\""))
	dt.AddColumn("sum", datatable.Float64, datatable.Expr("sum(`win`)"))

	fmt.Println(dt)
}

/*
CHAMP       CHAMPION    WIN    LOOSE  WINRATE    WINRATE %   SUM  
Malzahar                MALZAHAR                10              6               62.5                    62.5 %                  696              
Xerath                  XERATH                  20              5               80                      80 %                    696              
Teemo                   TEEMO                   666             666             50                      50 %                    696    
*/

Reading a CSV and aggregate

package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/datasweet/datatable"
	"github.com/datasweet/datatable/import/csv"
)

func main() {
	dt, err := csv.Import("csv", "phone_data.csv",
		csv.HasHeader(true),
		csv.AcceptDate("02/01/06 15:04"),
		csv.AcceptDate("2006-01"),
	)
	if err != nil {
		log.Fatalf("reading csv: %v", err)
	}

	dt.Print(os.Stdout, datatable.PrintMaxRows(24))

	dt2, err := dt.Aggregate(datatable.AggregateBy{Type: datatable.Count, Field: "index"})
	if err != nil {
		log.Fatalf("aggregate COUNT('index'): %v", err)
	}
	fmt.Println(dt2)

	groups, err := dt.GroupBy(datatable.GroupBy{
		Name: "year",
		Type: datatable.Int,
		Keyer: func(row datatable.Row) (interface{}, bool) {
			if d, ok := row["date"]; ok {
				if tm, ok := d.(time.Time); ok {
					return tm.Year(), true
				}
			}
			return nil, false
		},
	})
	if err != nil {
		log.Fatalf("GROUP BY 'year': %v", err)
	}
	dt3, err := groups.Aggregate(
		datatable.AggregateBy{Type: datatable.Sum, Field: "duration"},
		datatable.AggregateBy{Type: datatable.CountDistinct, Field: "network"},
	)
	if err != nil {
		log.Fatalf("Aggregate SUM('duration'), COUNT_DISTINCT('network') GROUP BY 'year': %v", err)
	}
	fmt.Println(dt3)
}

Creating a custom serie

To create a custom serie you must provide:

  • a caster function, to cast a generic value to your serie value. The signature must be func(i interface{}) T
  • a comparator, to compare your serie value. The signature must be func(a, b T) int

Example with a NullInt

// IntN is an alis to create the custom Serie to manage IntN
func IntN(v ...interface{}) Serie {
	s, _ := New(NullInt{}, asNullInt, compareNullInt)
	if len(v) > 0 {
		s.Append(v...)
	}
	return s
}

type NullInt struct {
	Int   int
	Valid bool
}

// Interface() to render the current struct as a value.
// If not provided, the serie.All() or serie.Get() wills returns the embedded value
// IE: NullInt{}
func (i NullInt) Interface() interface{} {
	if i.Valid {
		return i.Int
	}
	return nil
}

// asNullInt is our caster function
func asNullInt(i interface{}) NullInt {
	var ni NullInt
	if i == nil {
		return ni
	}

	if v, ok := i.(NullInt); ok {
		return v
	}

	if v, err := cast.ToIntE(i); err == nil {
		ni.Int = v
		ni.Valid = true
	}
	return ni
}

// compareNullInt is our comparator function
// used to sort
func compareNullInt(a, b NullInt) int {
	if !b.Valid {
		if !a.Valid {
			return Eq
		}
		return Gt
	}
	if !a.Valid {
		return Lt
  }
  if a.Int == b.Int {
		return Eq
	}
	if a.Int < b.Int {
		return Lt
	}
	return Gt
}

Who are we ?

We are Datasweet, a french startup providing full service (big) data solutions.

Questions ? problems ? suggestions ?

If you find a bug or want to request a feature, please create a GitHub Issue.

Contributors


Cléo Rebert

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Copyright 2017-2020 Datasweet 

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
You might also like...
Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures.

Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures. capabilities which owl provides include Process approval、sql Audit、sql execute and execute as crontab、data backup and recover .

Mantil-template-form-to-dynamodb - Receive form data and write it to a DynamoDB table
Mantil-template-form-to-dynamodb - Receive form data and write it to a DynamoDB table

This template is an example of serverless integration between Google Forms and DynamoDB

Simple Shamir's Secret Sharing (s4) - A go package giving a easy to use interface for the shamir's secret sharing algorithm

Simple Shamir's Secret Sharing (s4) With Simple Shamir's Secret Sharing (s4) I want to provide you an easy to use interface for this beautiful little

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

What is Miller? Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. What can M

A simple and light excel file reader to read a standard excel as a table faster | 一个轻量级的Excel数据读取库,用一种更`关系数据库`的方式解析Excel。

Intro | 简介 Expect to create a reader library to read relate-db-like excel easily. Just like read a config. This library can read all xlsx file correct

golang 在线预览word,excel,pdf,MarkDown(Online Preview Word,Excel,PPT,PDF,Image by Golang)
golang 在线预览word,excel,pdf,MarkDown(Online Preview Word,Excel,PPT,PDF,Image by Golang)

Go View File 在线体验地址 http://39.97.98.75:8082/view/upload (不会经常更新,保留最基本的预览功能。服务器配置较低,如果出现链接超时请等待几秒刷新重试,或者换Chrome) 目前已经完成 docker部署 (不用为运行环境烦恼) Wor

A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.

grate A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats. Why? Grate focuses on speed and stability first

A command line tool for mainly exporting logbook records from Google Spreadsheet to PDF file in EASA format
A command line tool for mainly exporting logbook records from Google Spreadsheet to PDF file in EASA format

Logbook CLI This is a command line tool for mainly exporting logbook records from Google Spreadsheet to PDF file in EASA format. It also supports rend

A Wrapper Client for Google Spreadsheet API (Sheets API)

Senmai A Wrapper Client for Google Spreadsheet API (Sheets API) PREPARATION Service Account and Key File Create a service account on Google Cloud Plat

sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document formats like CSV or Excel.

sq: swiss-army knife for data sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document fo

A go library to improve readability in terminal apps using tabular data

uitable uitable is a go library for representing data as tables for terminal applications. It provides primitives for sizing and wrapping columns to i

Gotabulate - Easily pretty-print your tabular data with Go

Gotabulate - Easily pretty-print tabular data Summary Go-Tabulate - Generic Go Library for easy pretty-printing of tabular data. Installation go get g

Create key value sqlite3 database from tabular data, fast.
Create key value sqlite3 database from tabular data, fast.

Turn tabular data into a lookup table using sqlite3. This is a working PROTOTYPE with limitations, e.g. no customizations, the table definition is fixed, etc.

Make a sqlite3 database from tabular data, fast.
Make a sqlite3 database from tabular data, fast.

MAKTA make a database from tabular data Turn tabular data into a lookup table using sqlite3. This is a working PROTOTYPE with limitations, e.g. no cus

Tabular simplifies printing ASCII tables from command line utilities

tabular Tabular simplifies printing ASCII tables from command line utilities without the need to pass large sets of data to it's API. Simply define th

tabitha is a no-frills tabular formatter for the terminal.

tabitha tabitha is a no-frills tabular formatter for the terminal. Features Supports padding output to the longest display text, honoring ANSI colors

Groupie Trackers consists on receiving a given API and manipulate the data contained in it, in order to create a site, displaying the information.

groupie-tracker Objectives Groupie Trackers consists on receiving a given API and manipulate the data contained in it, in order to create a site, disp

Golang package to manipulate time intervals.

timespan timespan is a Go library for interacting with intervals of time, defined as a start time and a duration. Documentation API Installation Insta

A Golang library to manipulate strings according to the word parsing rules of the UNIX Bourne shell.

shellwords A Golang library to manipulate strings according to the word parsing rules of the UNIX Bourne shell. Installation go get github.com/Wing924

Comments
  • Error handling

    Error handling

    Hi!

    I have changed hardcoded errors to either directly return one error from a list of global errors, or by wrapping the err with one error from the list. The goal is to be able to errors.Is(err, datatable.ErrX) on an error to determine programmatically what caused the problem without the need for a human to understand and contextualize it.

    opened by constantoine 1
  • How to add new row into datatable

    How to add new row into datatable

    For example

    dataTableExamForAdd := datatable.New("ExamForAdd") dataTableExamForAdd.AddColumn("ContactId", datatable.String) dataTableExamForAdd.AddColumn("CourseId", datatable.String) dataTableExamForAdd.AddColumn("ClassId", datatable.String) dataTableExamForAdd.AddColumn("ContentId", datatable.String) dataTableExamForAdd.AddColumn("ExamId", datatable.String)

    for i := 0; i < len(Data); i++ { dataRow := dataTableExamForAdd.NewRow() dataRow.Set("ContactId", ExamData[i].ContactId) dataRow.Set("CourseId", ExamData[i].CourseId) dataRow.Set("ClassId", ExamData[i].ClassId) dataRow.Set("ContentId", ExamData[i].ContentId) dataRow.Set("ExamId", ExamData[i].ExamId) }

    opened by atomsbaza 0
  • Need to be able to update master when working with subset

    Need to be able to update master when working with subset

    Hi there,

    Love the datatable concept. One thing that I am currently missing is the ability to update master/superset table while working with data in subset.

    An example is: Master table: timestamp, portfolio_id, portfolio_group, amount

    Subset table:

    • the same columns
    • records filtered using subset := master.Where ... timestamp & portfolio_group

    I do some more complex operations on the subset and record results for it in the subset.

    Now I need to propagate it to the master table.

    So, it would be great to know the row index of the row or that the row subset would be a collection of pointers and would update the master directly when doing subset.Update ...

    Or maybe I am missing something. I am not that proficient in Golang yet and there is very little documentation and even less can be found on the net.

    Thanks for the great tool and hope that you can help me.

    enhancement 
    opened by tomas-kucera 3
Releases(v0.3.2)
Owner
Datasweet
Datasweet
Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.

Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository. Connect to Dolt just like any MySQL database to run queries or update the data using SQL commands. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate's changes.

DoltHub 13.8k Dec 31, 2022
Efficient cache for gigabytes of data written in Go.

BigCache Fast, concurrent, evicting in-memory cache written to keep big number of entries without impact on performance. BigCache keeps entries on hea

Allegro Tech 6.2k Jan 4, 2023
:handbag: Cache arbitrary data with an expiration time.

cache Cache arbitrary data with an expiration time. Features 0 dependencies About 100 lines of code 100% test coverage Usage // New cache c := cache.N

Eduard Urbach 125 Jan 5, 2023
A simple, fast, embeddable, persistent key/value store written in pure Go. It supports fully serializable transactions and many data structures such as list, set, sorted set.

NutsDB English | 简体中文 NutsDB is a simple, fast, embeddable and persistent key/value store written in pure Go. It supports fully serializable transacti

徐佳军 2.7k Jan 1, 2023
Lightweight RESTful database engine based on stack data structures

piladb [pee-lah-dee-bee]. pila means stack or battery in Spanish. piladb is a lightweight RESTful database engine based on stack data structures. Crea

Fernando Álvarez 200 Nov 27, 2022
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order

etcd-io 42.3k Jan 9, 2023
VectorSQL is a free analytics DBMS for IoT & Big Data, compatible with ClickHouse.

NOTICE: This project have moved to fuse-query VectorSQL is a free analytics DBMS for IoT & Big Data, compatible with ClickHouse. Features High Perform

VectorEngine 251 Jan 6, 2023
This is a mongodb data comparison tool.

mongo-compare This is a mongodb data comparison tool. In the mongodb official tools, mongodb officially provides a series of tools such as mongodump,

null 32 Sep 13, 2022
Membin is an in-memory database that can be stored on disk. Data model smiliar to key-value but values store as JSON byte array.

Membin Docs | Contributing | License What is Membin? The Membin database system is in-memory database smiliar to key-value databases, target to effici

Membin 3 Jun 3, 2021
Distributed cache and in-memory key/value data store.

Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Burak Sezer 2.7k Dec 30, 2022