Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.
Golang Version: 1.14
- Getting Started: a tutorial for writing your first omniparser schema.
- IDR: in-memory data representation of ingested data for omniparser.
- XPath Based Record Filtering and Data Extraction: xpath queries are essential to omniparser schema writing. Learn the concept and tricks in depth.
- All About Transforms: everything about
- Use of `custom_func`: how `custom_func` is used, especially the all mighty
- CSV Schema in Depth: everything about schemas for CSV input.
- Fixed-Length Schema in Depth: everything about schemas for fixed-length (e.g. TXT) input.
- JSON/XML Schema in Depth: everything about schemas for JSON or XML input.
- EDI Schema in Depth: everything about schemas for EDI input.
- Programmability: Advanced techniques for using omniparser (or some of its components) in your code.
- Custom Functions: a complete reference of all built-in custom functions.
- CSV Examples
- Fixed-Length Examples
- JSON Examples
- XML Examples
- EDI Examples
- Custom File Format
- Custom Funcs
In the example folders above you will find pairs of input files and their schema files. Then in the `.snapshots` subdirectory, you'll find their corresponding output files.
Use https://omniparser.herokuapp.com/ (you may need to wait a few seconds for the Heroku instance to wake up) to try out schemas and inputs, yours or the existing samples, and see how ingestion and transform work.
- No good ETL transform/parser library exists in Golang.
- Even in Java and other languages, the choices are few and all have limitations.
- Many of the parsers/transforms don't support streaming reads: they load the entire input into memory, which is not acceptable in some situations.
- Golang 1.14
Recent Major Feature Additions/Changes
- `Transform.RawRecord()` for callers of omniparser to access the raw ingested record.
- `custom_parse` is deprecated: it is still usable for backward compatibility, but it has been removed from all public docs and samples.
- `NonValidatingReader` EDI segment reader.
- Added fixed-length file format support in omniv21 handler.
- Added EDI file format support in omniv21 handler.
- Major restructure/refactoring
- Upgraded omni schema version to `omni.2.1` due to a number of incompatible schema changes:
- Changed how we handle custom functions: previously, input parameters and results were always strings. Now all types are supported for custom function input and output parameters.
- Changed how we package custom functions for extensions: previously we collected custom functions from all extensions and passed all of them to the extension in use. That felt odd, so now an extension uses only the custom functions it includes.
- A number of packages renamed.
- Added CSV file format support in omniv2 handler.
- Introduced IDR node cache for allocation recycling.
- Introduced IDR for in-memory data representation.
- Added trie based high performance
- Command line interface (one-off `transform` cmd or long-running HTTP server).
- JSON stream parser.
- Ability to provide custom functions.
- Ability to provide custom schema handler.
- Ability to customize the built-in omniv2 schema handler's parsing code.
- Ability to add support for a new file format to the built-in omniv2 schema handler.