Optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. It enables data analysts and engineers to transform their data by writing simple SQL queries and YAML configuration while Optimus handles dependency management, scheduling and all other aspects of running transformation jobs at scale.

Key Features

Discover why users choose Optimus as their main data transformation tool.

  • Warehouse management: Optimus allows you to create and manage your data warehouse tables and views through YAML-based configuration.
  • Scheduling: Optimus provides an easy way to schedule your SQL transformations through YAML-based configuration.
  • Automatic dependency resolution: Optimus parses your data transformation queries and builds a dependency graph automatically, instead of users defining their source and target dependencies in DAGs.
  • Dry runs: Before a SQL query is scheduled for transformation, it is dry-run during deployment to make sure it passes basic sanity checks.
  • Powerful templating: Optimus provides compile-time query templating with variables, loops, if statements, macros, etc., allowing users to write complex transformation logic.
  • Cross-tenant dependency: Optimus is a multi-tenant service. If two tenants, serviceA and serviceB, are registered, serviceB can write queries referencing serviceA as a source, and Optimus will handle this dependency as well.
  • Hooks: Optimus provides hooks for post-transformation logic, e.g. sinking BigQuery tables to Kafka.
  • Extensibility: Optimus supports Python transformations and allows writing custom plugins.
  • Workflows: Optimus provides industry-proven workflows for data warehouse management, with both git-based and REST/gRPC-based specification management.
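Automatic dependency resolution can be illustrated with a small Go sketch. It is not Optimus's actual implementation: it assumes each job writes a single destination table and that source tables appear as backtick-quoted, BigQuery-style `project.dataset.table` names after FROM or JOIN, which it extracts with a regular expression (a real resolver would use a SQL parser). It then links each job to the jobs that produce its sources and orders them with Kahn's topological sort, so upstream jobs come first and cycles are reported. The `Job` type and all job and table names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Job is a hypothetical, simplified job spec: one destination table and one query.
type Job struct {
	Name        string
	Destination string
	Query       string
}

// sourceRe matches backtick-quoted project.dataset.table names after FROM or JOIN.
// Assumption for illustration only; real SQL needs a real parser.
var sourceRe = regexp.MustCompile("(?i)(?:from|join)\\s+`([\\w-]+\\.[\\w-]+\\.[\\w-]+)`")

// Sources returns the distinct tables referenced by a query, in order of appearance.
func Sources(query string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range sourceRe.FindAllStringSubmatch(query, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			out = append(out, m[1])
		}
	}
	return out
}

// ExecutionOrder infers upstream jobs from table references and returns job names
// in topological order (Kahn's algorithm). Tables no job produces are treated as
// external sources and ignored.
func ExecutionOrder(jobs []Job) ([]string, error) {
	producer := map[string]string{} // destination table -> job name
	for _, j := range jobs {
		producer[j.Destination] = j.Name
	}
	indegree := map[string]int{}
	downstream := map[string][]string{}
	for _, j := range jobs {
		indegree[j.Name] = 0
		for _, src := range Sources(j.Query) {
			if up, ok := producer[src]; ok && up != j.Name {
				downstream[up] = append(downstream[up], j.Name)
				indegree[j.Name]++
			}
		}
	}
	var ready []string
	for name, d := range indegree {
		if d == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready) // deterministic starting order
	var order []string
	for len(ready) > 0 {
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		next := downstream[n]
		sort.Strings(next)
		for _, d := range next {
			indegree[d]--
			if indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(jobs) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	jobs := []Job{
		{Name: "daily_revenue", Destination: "proj.mart.daily_revenue",
			Query: "SELECT day, SUM(amount) FROM `proj.staging.orders_clean` GROUP BY day"},
		{Name: "orders_clean", Destination: "proj.staging.orders_clean",
			Query: "SELECT * FROM `proj.raw.orders` WHERE amount > 0"},
	}
	order, err := ExecutionOrder(jobs)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(order) // [orders_clean daily_revenue]
}
```

Users only write the queries; the ordering falls out of the table references, which is the point of not declaring sources and targets by hand.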

Usage

Optimus has two components: the Optimus service, which is the core orchestrator installed on the server side, and a CLI binary used to interact with this service. You can install the Optimus CLI using Homebrew on macOS:

$ brew install odpf/taps/optimus
$ optimus --help

optimus v0.0.2-alpha.1

optimus is a scaffolding tool for creating transformation job specs

Usage:
  optimus [command]

Available Commands:
  config      Manage optimus configuration required to deploy specifications
  create      Create a new job/resource
  deploy      Deploy current project to server
  help        Help about any command
  render      convert raw representation of specification to consumables
  replay      re-running jobs in order to update data for older dates/partitions
  serve       Starts optimus service
  version     Print the client version information

Flags:
  -h, --help       help for optimus
      --no-color   disable colored output

Additional help topics:
  optimus validate check if specifications are valid for deployment

Use "optimus [command] --help" for more information about a command.

Documentation

Explore the following resources to get started with Optimus:

  • Guides provides guidance on using Optimus.
  • Concepts describes all important Optimus concepts.
  • Reference contains details about configurations, metrics and other aspects of Optimus.
  • Contribute contains resources for anyone who wants to contribute to Optimus.

Running locally

Optimus requires the following dependencies:

  • Golang (version 1.16 or above)
  • Git

Run the following commands to compile Optimus from source:

$ git clone git@github.com:odpf/optimus.git
$ cd optimus
$ make build

Use the following command to run it:

$ ./optimus version

The Optimus service can be started with:

$ ./optimus serve

The serve command has a few required configurations that need to be set before it can start. Configuration can either be stored in a .optimus.yaml file or set as environment variables. Read more about it in the Getting Started guide.

Compatibility

Optimus is currently undergoing heavy development with frequent, breaking API changes. The current major version is zero (v0.x.x) to accommodate rapid development and fast iteration while getting early feedback from users (feedback on APIs is appreciated). The public API could change without a major version update before the v1.0.0 release.

Contribute

Development of Optimus happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Optimus.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Optimus.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

License

Optimus is Apache 2.0 licensed.

Comments
  • Support for External Sensor for Optimus Jobs

    Currently, Optimus supports sensors for job dependencies both within and outside a project, but only when those jobs are managed by the same Optimus server. It would be helpful if Optimus supported sensors for jobs managed by a different Optimus deployment: an organisation may run many deployments, and checking for data availability alone does not guarantee the completeness and correctness of data that Optimus dependencies guarantee.

    Expectation: The sensor checks the status of the jobs within the input window.

    Configuration:

    dependencies:
      job:
      type: external
      project:
      host:
      start_time: # start time of the data that the job depends on
      end_time: # end time of the data that the job depends on
    

    The Optimus server that receives the request checks, based on its window and schedule configuration, all the jobs that output data for the given window.

    One challenge of this approach is that dependencies break when a job name changes.

    enhancement 
    opened by sravankorumilli 23
  • Optimus Commands functions are all package scoped, it is better to group them under specific group command structs

    The current cmd package contains many implementations responsible for accepting user input on how to execute Optimus. Several issues have been observed with this approach:

    • many functionalities are defined within one package, making the package itself seem large, with a lot of functions, variables, and constants that are accessible across the package
    • with many components defined within one package, some of them can conflict with one another, and there were cases during development where two components (in this example, variables) were defined but served the same purpose
    • during development, IDE or text-editor suggestions get cluttered with unrelated functionality

    To address these issues, one approach is to restructure the package by grouping similar functionality; for example, everything for the job command could be put in one place, such as a struct and/or a package.

    opened by sravankorumilli 12
  • Optimus provide a mechanism to register a project & namespace through cli

    Currently, projects and namespaces cannot be registered through the Optimus CLI. Since most users are CLI users, it would be better to have a mechanism to register projects and namespaces.

    Acceptance Criteria

    1. User should be able to register a project without any namespace.
    2. User should be able to register a namespace only.
    3. User should be able to register both project & namespace together.
    4. On deploy, any new or modified project/namespace should be registered/updated.
    5. Remove the existing config init command.

    User experience

    1. optimus project/namespace register
    enhancement 
    opened by sravankorumilli 12
  • feat: add labels on job spec as tags in airflow2 dag

    Hi Maintainers,

    I need to organize my DAGs/jobs in the Airflow UI. Airflow already supports this through tags (https://airflow.apache.org/docs/apache-airflow/stable/howto/add-dag-tags.html), and job specs already have labels.

    So I need those labels rendered as tags in the DAG code that Optimus generates. I hope my code can be tested and reviewed so that Optimus gains this feature.

    Thank you

    enhancement 
    opened by novanxyz 10
  • Refactor observers across Optimus

    As part of this card, we expect observer usage and implementation to be standardized. Currently, using observers involves some processing on the client side; it would be better if observers passed all information in a directly consumable form so that clients only need to log it.

    Scope

    1. Events are not standardized (event naming).
    2. The Optimus CLI does extra processing after consuming these events; we can avoid this and just log the events.
    opened by sravankorumilli 8
  • Optimus Sensor with automated inference.

    Describe the solution you'd like: Given a resource and start and end dates, an Optimus sensor should check whether the corresponding jobs succeeded. If the Optimus sensor is used within an Optimus setup, it should be inferred automatically.

    Users

    1. Other Optimus users within the same organization managing a different Optimus deployment
    2. Users managing their pipelines with a system other than Optimus

    Describe alternatives you've considered: Relying on the respective storage sensors with automated inference, but this has its own challenges around data completeness and data quality.

    • [x] #398
    • [x] #399
    • [x] #400
    opened by sravankorumilli 7
  • feat: bind config with cobra flags to override the conf

    • [x] overriding the config via flags provided by cobra
    • [x] mapping the flag with - delimiter instead of . delimiter (--project-name instead of --project.name)
    • [x] change flag names for each command that needs config overriding

    TODO next (can be done in a separate PR):

    • [ ] bump salt version to support pflags
    • [ ] rename --project to --project-name in entrypoint.sh for each plugin
    opened by deryrahman 6
  • feat: enhance replay & backup to support multiple namespaces jobs

    Users should be able to back up and replay downstream jobs in a different namespace, as long as they are authorized to do so.

    • The Optimus CLI will accept, as a flag, the downstream namespaces that are allowed to be replayed/backed up
    • "*" means downstream jobs from all namespaces (within the same project) are allowed
    • the default is empty, meaning only downstream jobs from the same namespace are allowed

    Also adds an option to ignore downstream jobs in Replay.

    opened by arinda-arif 5
  • Event types like task, sensor, hook failure does not trigger slack alert

    Describe the bug: For any job configured with Slack alerts on failure, the alerts are not posted for failure events of type task, sensor, or hook.

    To Reproduce: Steps to reproduce the behavior:

    1. Configure a job's behavior to notify a Slack channel on failure.
    2. Run the job and while the job is running, mark any task as failed.

    Expected behavior: A failure message is posted on the configured Slack channel.

    This started happening from tag v0.3.0.

    bug 
    opened by SumitAgrawal03071989 4
  • Add Proper Migration Up and Down

    Background

    As of the latest commit (referring to this), a migration mechanism is used internally: up migrations are executed whenever the server runs. However, no down migration is provided, so if there is an issue with the current database schema, rolling back is quite tricky.

    Proposal

    To address this, this issue proposes providing such functionality in Optimus. At a high level, it would look roughly like the following:

    optimus server migrate up
    # it will execute all migrate `up`
    
    optimus server migrate down
    # it will execute all migrate `down`
    
    optimus server migrate {n}
    # n is an integer: positive means `up` n times, negative means `down` |n| times
    

    The command or the mechanism is flexible, but the point is that the up and down are both defined properly.

    Additional Context

    Since the mentioned commit uses golang-migrate/migrate, we can use its .Steps(int) method to migrate n times up or down.

    enhancement techdebt 
    opened by irainia 4
  • Plugin Manager : Support Basic Plugin Manager to install plugins

    Description: As part of this, a Plugin Manager component will be introduced to declaratively install plugins in the configured location. Plugins will be configured in config.yaml as per the specification. Refer to RFC: Simplify plugins. Depends on: #410

    Acceptance Criteria

    • [x] Install plugins that are available through HTTPS, GCS, and file locations.
    • [x] User should be able to install plugins in the configured directory.

    Out of Scope

    1. Installation of plugins that require authentication is out of scope.

    Tech Details

    enhancement 
    opened by sravankorumilli 4
  • fix: port changes from 0.4

    • default window version to 1 if not provided
    • fix bug when connecting to the upstream Optimus server to get the status of upstream jobs
    • change image pull policy to IfNotPresent
    opened by sravankorumilli 1
  • feat: change optimus architecture to be more domain oriented

    opened by sbchaos 1
  • Update code to support airflow version > 2.2.0

    Is your feature request related to a problem? Please describe. Optimus currently runs with Airflow version 2.1.4; it should support newer versions of Airflow.

    Describe the solution you'd like: base_dag.py in Optimus currently uses pod_launcher, which is deprecated in newer versions of Airflow, so the same DAG will not work with newer versions. We need to update the SuperKubernetesOperator to support newer versions of Airflow.

    improvement 
    opened by sbchaos 0
  • Move plugin install command in server side

    Description: The plugin command optimus plugin install, which is used on the server side, should belong to the server package. As part of separating the client and the server, this command should be moved to the server command package.

    Acceptance Criteria

    • [ ] Command optimus plugin install should be moved to the server command package

    Out of Scope: N/A

    Tech Details: TBD

    opened by deryrahman 0
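The flag naming rule from "feat: bind config with cobra flags to override the conf" (`--project-name` instead of `--project.name`) amounts to replacing the `.` delimiter of a config key with `-`. The Go sketch below shows that mapping plus the override rule that a flag the user explicitly set wins over the config value. `FlagName`, `Resolve`, and the map-based inputs are hypothetical; the PR itself binds flags through cobra.

```go
package main

import (
	"fmt"
	"strings"
)

// FlagName maps a dotted config key to a CLI flag name,
// e.g. "project.name" -> "project-name".
func FlagName(configKey string) string {
	return strings.ReplaceAll(configKey, ".", "-")
}

// Resolve returns the flag value when the user set that flag explicitly and
// falls back to the config value otherwise. setFlags holds only the flags that
// were actually passed on the command line.
func Resolve(configKey string, config, setFlags map[string]string) string {
	if v, ok := setFlags[FlagName(configKey)]; ok {
		return v
	}
	return config[configKey]
}

func main() {
	config := map[string]string{"project.name": "from-config", "host": "localhost:9100"}
	flags := map[string]string{"project-name": "from-flag"}
	fmt.Println(FlagName("project.name"))               // project-name
	fmt.Println(Resolve("project.name", config, flags)) // from-flag
	fmt.Println(Resolve("host", config, flags))         // localhost:9100
}
```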
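The namespace rules in "feat: enhance replay & backup to support multiple namespaces jobs" can be expressed as a small Go filter. From the PR: "*" allows every namespace within the same project, an empty list keeps only the requesting namespace, and downstream jobs can be ignored entirely. Two further points are assumptions of this sketch: the requesting namespace is always allowed even when other namespaces are listed, and explicitly listed namespaces are also resolved within the same project. The `Downstream` type and `FilterDownstream` function are hypothetical.

```go
package main

import "fmt"

// Downstream is a hypothetical reference to a downstream job.
type Downstream struct {
	Job       string
	Project   string
	Namespace string
}

// FilterDownstream keeps the downstream jobs a replay or backup may touch.
func FilterDownstream(all []Downstream, project, namespace string, allowed []string, ignoreDownstream bool) []Downstream {
	if ignoreDownstream {
		return nil
	}
	allowedSet := map[string]bool{namespace: true} // own namespace is always allowed (assumption)
	wildcard := false
	for _, ns := range allowed {
		if ns == "*" {
			wildcard = true
		}
		allowedSet[ns] = true
	}
	var kept []Downstream
	for _, d := range all {
		if d.Project != project {
			continue // never cross project boundaries, even with "*"
		}
		if wildcard || allowedSet[d.Namespace] {
			kept = append(kept, d)
		}
	}
	return kept
}

func main() {
	all := []Downstream{
		{Job: "a", Project: "p1", Namespace: "ns1"},
		{Job: "b", Project: "p1", Namespace: "ns2"},
		{Job: "c", Project: "p2", Namespace: "ns2"},
	}
	fmt.Println(len(FilterDownstream(all, "p1", "ns1", nil, false)))           // 1: same namespace only
	fmt.Println(len(FilterDownstream(all, "p1", "ns1", []string{"*"}, false))) // 2: every namespace in p1
	fmt.Println(len(FilterDownstream(all, "p1", "ns1", []string{"*"}, true)))  // 0: downstream ignored
}
```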
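The argument handling of the migrate command proposed in "Add Proper Migration Up and Down" can be sketched in Go. golang-migrate's `Steps(n int)` migrates up for a positive n and down for a negative n; this sketch stops one step earlier and only turns the CLI argument into a direction and a count, leaving the actual migration call out. The `Plan` type and `ParseMigrateArg` are hypothetical, and treating `up`/`down` as "all pending migrations" and rejecting 0 are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// Plan describes what a migrate invocation should do.
// All == true means "apply every pending migration in Direction".
type Plan struct {
	Direction string // "up" or "down"
	Steps     int    // number of migrations; ignored when All is true
	All       bool
}

// ParseMigrateArg interprets the argument of the proposed
// `optimus server migrate {up|down|n}` command.
func ParseMigrateArg(arg string) (Plan, error) {
	if arg == "up" || arg == "down" {
		return Plan{Direction: arg, All: true}, nil
	}
	n, err := strconv.Atoi(arg)
	if err != nil {
		return Plan{}, fmt.Errorf("expected up, down or an integer, got %q", arg)
	}
	switch {
	case n > 0:
		return Plan{Direction: "up", Steps: n}, nil
	case n < 0:
		return Plan{Direction: "down", Steps: -n}, nil
	default:
		return Plan{}, fmt.Errorf("step count must be non-zero")
	}
}

func main() {
	for _, arg := range []string{"up", "3", "-2", "0"} {
		plan, err := ParseMigrateArg(arg)
		if err != nil {
			fmt.Println(arg, "->", err)
			continue
		}
		fmt.Printf("%s -> %+v\n", arg, plan)
	}
}
```

Keeping the sign convention identical to `Steps` means a plan with a count could be passed straight through as a signed integer.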
Releases: v0.5.0-rc.1

Owner: Open Data Platform (next-gen collaborative, domain-driven and distributed data platform)