Churro - ETL for Kubernetes

churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline applications.

What is churro?

churro is an application that processes input files and streams by extracting their content and loading it into a database of your choice, all running on Kubernetes. Today, churro supports processing of JSON, XML, XLSX, and CSV files, along with JSON API streams. End users create churro pipelines to process data; pipelines can be created using the churro web app or directly via a churro pipeline custom resource (YAML).
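For illustration only, here is a minimal Go sketch of the kind of information a pipeline definition might carry. The Pipeline struct, its fields, and the apiVersion value are assumptions made up for this example, not churro's actual custom resource schema; see the churro documentation for the real definition.

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Pipeline is a hypothetical, simplified stand-in for a churro
// pipeline custom resource; the real schema is defined by the
// churro operator's CRD and may differ.
type Pipeline struct {
	APIVersion string       `yaml:"apiVersion"`
	Kind       string       `yaml:"kind"`
	Metadata   Metadata     `yaml:"metadata"`
	Spec       PipelineSpec `yaml:"spec"`
}

type Metadata struct {
	Name string `yaml:"name"`
}

// PipelineSpec captures the two pieces this README describes:
// an input format to extract and a database to load into.
type PipelineSpec struct {
	InputFormat string `yaml:"inputFormat"` // e.g. csv, json, xml, xlsx
	Database    string `yaml:"database"`    // e.g. cockroachdb, singlestore, mysql
}

func main() {
	p := Pipeline{
		APIVersion: "churro.example/v1alpha1", // hypothetical group/version
		Kind:       "Pipeline",
		Metadata:   Metadata{Name: "orders-csv"},
		Spec: PipelineSpec{
			InputFormat: "csv",
			Database:    "cockroachdb",
		},
	}
	out, err := yaml.Marshal(p)
	if err != nil {
		panic(err)
	}
	// Prints YAML in the general shape a custom resource would take.
	fmt.Print(string(out))
}
```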

Design

Some key aspects of the churro design:

  • churro is designed from the start to run within a Kubernetes cluster.
  • churro uses a microservice architecture to scale ETL processing.
  • churro defines extension points that allow customized processing to be performed per customer requirements (see the sketch after this list).
  • churro is written in Go.
  • churro currently supports persisting ingested data into CockroachDB, SingleStore, and MySQL databases.
  • churro implements a Kubernetes operator to handle GitOps-style provisioning of churro pipeline resources, including the pipeline database.
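To make the extension-point bullet above concrete, here is a minimal Go sketch of what a per-customer processing hook could look like. The Transformer interface and its signature are assumptions for this example, not churro's actual extension API, which is described in its documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// Transformer is a hypothetical extension-point interface: custom
// code that sits between extraction and loading.
type Transformer interface {
	// Transform receives one extracted record (column -> value) and
	// returns the record to be loaded, or an error to reject it.
	Transform(record map[string]string) (map[string]string, error)
}

// upperCaser is a toy customer-specific transformer that
// upper-cases every value before it is loaded.
type upperCaser struct{}

func (upperCaser) Transform(record map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(record))
	for k, v := range record {
		out[k] = strings.ToUpper(v)
	}
	return out, nil
}

func main() {
	var t Transformer = upperCaser{}
	rec, err := t.Transform(map[string]string{"city": "austin"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rec["city"]) // AUSTIN
}
```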

For more details on the churro design, check out the documentation at the churro GitHub Pages.

Docs

Detailed documentation is found at the churro GitHub Pages; additional content, such as blog posts, can be found at the churrodata.com web site.

Images

Today, churro container images are found on DockerHub. These images are multi-arch manifest images that include builds for both amd64 and arm64 architectures.

churro Cloud

Inquiries about churro Cloud can be directed to [email protected]. In the near future, users will be able to provision a churro instance on the cloud of their choice, with billing and management handled by churrodata.com.

Starting with churro

People generally start with churro by creating a Kubernetes cluster and then deploying churro to that cluster. Installation documentation is found on the churro GitHub Pages.

Contributing

Since churro is open source, you can view the source code and make contributions such as pull requests on our github.com site. We encourage users to log any issues they find in our GitHub issue tracker.

Support

churro enterprise support and services are provided by churrodata.com.
