Katib is a Kubernetes-native project for automated machine learning (AutoML).

Overview

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports Hyperparameter Tuning, Early Stopping and Neural Architecture Search.

Katib is agnostic to machine learning (ML) frameworks. It can tune hyperparameters of applications written in any language of the user's choice and natively supports many ML frameworks, such as TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.

Katib can run training jobs using any Kubernetes Custom Resource, with out-of-the-box support for Kubeflow Training Operator, Argo Workflows, Tekton Pipelines, and many more.

Katib means "secretary" in Arabic.

Search Algorithms

Katib supports several search algorithms. Follow the Kubeflow documentation to learn more about each algorithm, and check the Suggestion service guide to implement your own custom algorithm.

Hyperparameter Tuning          Neural Architecture Search   Early Stopping
Random Search                  ENAS                         Median Stop
Grid Search                    DARTS
Bayesian Optimization
TPE
Multivariate TPE
CMA-ES
Sobol's Quasirandom Sequence
HyperBand
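
To give a feel for what the simplest algorithm in the table does, here is a minimal sketch of random search in plain Python. It is not Katib's implementation: the `SEARCH_SPACE` layout and the function names `sample_trial` and `random_search` are hypothetical, and the parameter ranges are borrowed from the kind of values used in Katib's example Experiments (`--lr`, `--num-layers`, `--optimizer`).

```python
import random

# Hypothetical search space for illustration only; the layout is an assumption,
# not Katib's internal representation.
SEARCH_SPACE = {
    "--lr": ("double", 0.01, 0.03),
    "--num-layers": ("int", 2, 5),
    "--optimizer": ("categorical", ["sgd", "adam", "ftrl"]),
}


def sample_trial(space, rng):
    """Draw one hyperparameter assignment uniformly at random."""
    assignment = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "double":
            assignment[name] = rng.uniform(spec[1], spec[2])
        elif kind == "int":
            assignment[name] = rng.randint(spec[1], spec[2])  # bounds are inclusive
        elif kind == "categorical":
            assignment[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter type: {kind}")
    return assignment


def random_search(space, objective, max_trials, seed=0):
    """Evaluate max_trials random assignments and keep the one with the highest score."""
    rng = random.Random(seed)
    best_score, best_assignment = float("-inf"), None
    for _ in range(max_trials):
        assignment = sample_trial(space, rng)
        score = objective(assignment)
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment, best_score
```

In a real Experiment the objective is a training job whose metric Katib collects; here `objective` is any Python callable that maps an assignment to a number to maximize.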

To run the above algorithms, Katib supports the following frameworks:

Installation

For the various ways to install Katib, check the Kubeflow guide. Follow the steps below to install Katib standalone.

Prerequisites

These are the minimal requirements to install Katib:

  • Kubernetes >= 1.17
  • kubectl >= 1.21
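
As a rough aid, the check below compares version strings against the minimums listed above. It is only a sketch: it does not call `kubectl` itself, and the names `MIN_VERSIONS`, `parse_major_minor`, and `meets_minimum` are made up for this example. You would pass in the version strings you read from `kubectl version`.

```python
import re

# Minimum versions taken from the prerequisites list above.
MIN_VERSIONS = {"kubernetes": (1, 17), "kubectl": (1, 21)}


def parse_major_minor(version):
    """Extract (major, minor) from strings such as 'v1.21.3' or '1.17'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(component, version):
    """Return True if the version satisfies the minimum for the given component."""
    return parse_major_minor(version) >= MIN_VERSIONS[component]
```

For example, `meets_minimum("kubernetes", "v1.15.5")` evaluates to `False`, because 1.15 is older than the required 1.17.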

Latest Version

To install the latest Katib version, run this command:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

Release Version

To install a specific Katib release (for example, v0.11.1), run this command:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.11.1"
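
The two commands differ only in the `ref` query parameter. If you script installs, a small helper like the following can build the command for any branch or tag. This is an illustrative sketch; `install_command` is a hypothetical name, and the function only returns the string, it does not run `kubectl`.

```python
# Kustomize path copied from the install commands above.
KUSTOMIZE_BASE = (
    "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone"
)


def install_command(ref="master"):
    """Build the kubectl command that installs Katib standalone at the given Git ref."""
    if not ref or any(ch.isspace() for ch in ref):
        raise ValueError("ref must be a non-empty Git ref without whitespace")
    return f'kubectl apply -k "{KUSTOMIZE_BASE}?ref={ref}"'


print(install_command("v0.11.1"))
```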

Make sure that all Katib components are running:

$ kubectl get pods -n kubeflow

NAME                                READY   STATUS      RESTARTS   AGE
katib-cert-generator-rw95w          0/1     Completed   0          35s
katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
katib-db-manager-57cd769cdb-4g99m   1/1     Running     0          36s
katib-mysql-7894994f88-5d4s5        1/1     Running     0          36s
katib-ui-5767cfccdc-pwg2x           1/1     Running     0          36s
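
If you want to automate this check, one option is to parse the table that `kubectl get pods` prints. The sketch below assumes the default column layout shown above (NAME, READY, STATUS first) and treats `Completed` pods, such as the cert generator job, as healthy. `unhealthy_pods` is a hypothetical helper, not part of Katib.

```python
SAMPLE_OUTPUT = """\
NAME                                READY   STATUS      RESTARTS   AGE
katib-cert-generator-rw95w          0/1     Completed   0          35s
katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
katib-db-manager-57cd769cdb-4g99m   1/1     Running     0          36s
katib-mysql-7894994f88-5d4s5        1/1     Running     0          36s
katib-ui-5767cfccdc-pwg2x           1/1     Running     0          36s
"""


def unhealthy_pods(kubectl_output):
    """Return names of pods that are neither Completed nor fully ready and Running."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    problems = []
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_now, ready_total = (int(part) for part in ready.split("/"))
        if status == "Completed":
            continue  # one-shot jobs finish with 0/1 ready, which is expected
        if status != "Running" or ready_now != ready_total:
            problems.append(name)
    return problems


print(unhealthy_pods(SAMPLE_OUTPUT))
```

An empty list means every long-running component reported all of its containers ready.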

For Katib Experiments, check the complete examples list.

Documentation

Community

We are always growing our community and invite new users and AutoML enthusiasts to contribute to the Katib project. The following links provide information about getting involved in the community:

Contributing

Please feel free to test the system! The developer guide is a good starting point for our developers.

Blog posts

Events

Citation

If you use Katib in a scientific publication, we would appreciate citations to the following paper:

A Scalable and Cloud-Native Hyperparameter Tuning System, George et al., arXiv:2006.02085, 2020.

Bibtex entry:

@misc{george2020katib,
    title={A Scalable and Cloud-Native Hyperparameter Tuning System},
    author={Johnu George and Ce Gao and Richard Liu and Hou Gang Liu and Yuan Tang and Ramdoot Pydipaty and Amit Kumar Saha},
    year={2020},
    eprint={2006.02085},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}
Comments
  • how to collect the indicator of training results???

    /kind bug

    After completion of bayesianoptimization automated training, the corresponding indicator results cannot be collected. Could you please tell me how to collect the indicator of training results? My yaml file is as follows:

    apiVersion: "kubeflow.org/v1alpha3"
    kind: Experiment
    metadata:
      namespace: kubeflow
      labels:
        controller-tools.k8s.io: "1.0"
      name: bayesianoptimization-example
    spec:
      objective:
        type: maximize
        goal: 0.99
        objectiveMetricName: Validation-accuracy
        additionalMetricNames:
          - accuracy
      algorithm:
        algorithmName: bayesianoptimization
        algorithmSettings:
          - name: "random_state"
            value: "10"
      parallelTrialCount: 3
      maxTrialCount: 12
      maxFailedTrialCount: 3
      MetricsCollectorSpec:
        Collector:
          Kind: stdOut
      parameters:
        - name: --lr
          parameterType: double
          feasibleSpace:
            min: "0.01"
            max: "0.03"
        - name: --num-layers
          parameterType: int
          feasibleSpace:
            min: "2"
            max: "5"
        - name: --optimizer
          parameterType: categorical
          feasibleSpace:
            list:
              - sgd
              - adam
              - ftrl
      trialTemplate:
        goTemplate:
          rawTemplate: |-
            apiVersion: batch/v1
            kind: Job
            metadata:
              name: {{.Trial}}
              namespace: {{.NameSpace}}
            spec:
              template:
                spec:
                  containers:
                  - name: {{.Trial}}
                    image: docker.io/katib/mxnet-mnist-example
                    command:
                    - "python"
                    - "/mxnet/example/image-classification/train_mnist.py"
                    - "--batch-size=64"
                    {{- with .HyperParameters}}
                    {{- range .}}
                    - "{{.Name}}={{.Value}}"
                    {{- end}}
                    {{- end}}
                  restartPolicy: Never

    What steps did you take and what happened: [A clear and concise description of what the bug is.]

    What did you expect to happen:

    Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

    Environment:

    • Kubeflow version:0.7.0
    • Minikube version:
    • Kubernetes version: (use kubectl version):1.15.5
    • OS (e.g. from /etc/os-release):CentOS Linux release 7.7.1908
    kind/bug 
    opened by cleveryg 99
  • Disable dynamic creation for admission hooks and update dependencies

    Fixes: https://github.com/kubeflow/katib/issues/1405.

    This PR introduces new mechanism to get certificate for webhooks. I updated YAMLs for our webhooks. I added initContainer to Katib controller which executes cert-generator.sh script. This script creates CertificateSigningRequest, katib-webhook-cert secret and patches webhooks configurations with appropriate caBundle. Since we have katib-webhook-cert secret in the manifest, cleanup process should delete everything.

    So we don't need to deploy cert-manager for Katib.

    @gaocegege @johnugeorge @yanniszark @kuikuikuizzZ @knkski What do you think about this approach ?

    Also I updated controller-runtime to v0.8.2 and k8s.io deps to v0.20.4. That requires some changes:

    • Change some packages location
    • Change the arguments for client calls (List, Get, etc.)
    • In the newer Kubernetes versions we can't add owner reference for cluster-scoped objects (e.g. PV) with namespace-scoped object (e.g. Suggestion). Thus, I have to disable owner reference for the PV which is created when Experiment has FromVolume resume policy. For that reason, I added PersistentVolumeReclaimPolicy: Delete for the PV and once PVC is garbage collected, PV should also be deleted.
    • I removed PyTorch operator from the dependencies because of this problem.

    I still need to make some tests and create new image for cert generator. It would be great if you can start to review this.

    /cc @gaocegege @johnugeorge

    lgtm size/XXL approved 
    opened by andreyvelich 62
  • [feature] Reconsider the design of Trial Template

    /kind feature

    Describe the solution you'd like [A clear and concise description of what you want to happen.]

    We need to marshal the TFJob to JSON string then use it to create experiments if we are using K8s client-go. It is not good. And, go template is ugly, too.

    Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

    priority/p0 kind/feature 
    opened by gaocegege 56
  • Switch to AWS CI/CD

    Related: https://github.com/kubeflow/katib/issues/1332. I will debug the infra in this PR.

    I also made few changes to improve CI/CD quality.

    /cc @gaocegege @johnugeorge /cc @Jeffwan @PatrickXYS @jlewi @Bobgy

    lgtm size/XXL approved 
    opened by andreyvelich 55
  • Katib v1alpha2 API for CRDs

    @YujiOshima @gaocegege @johnugeorge @alexandraj777 @hougangliu @xyhuang

    This is an initial proposal for the Katib v1alpha2 API. The changes here reflect the discussion in https://github.com/kubeflow/katib/issues/370.

    Comments and suggestions are welcome.

    Please note that the NAS APIs are not included here since the feature is still in early phase.


    lgtm approved size/L 
    opened by richardsliu 54
  • Studyctl crd

    Add StudyController CRD: studycontroller.kubeflow.org Operator: StudyController

    Update examples. This implementation is polling workers status in go process of StudyController. Though I understand this is not an elegant implementation, this is the least impact to existing codes.

    Next step we should make worker CRD and its controller and support multi-type jobs (k8s, TF-Job..). Assign @gaocegege


    lgtm size/XXL approved 
    opened by YujiOshima 50
  • Population based training

    What this PR does / why we need it:

    Support the discovery of modulated hyperparameters rather than attempting to find a fixed set over the entire training process. The paper has more details about the technique.

    Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

    This PR provides some initial support for PBT within Katib (#1382).

    Checklist:

    • [ ] Docs included if any changes are user facing
    lgtm size/XXL approved ok-to-test 
    opened by a9p 46
  • Improve Katib README

    Related: #1332. I will debug the infra in this PR.

    • [x] This is the PR to see if we can trigger AWS Presubmit.
    • [x] This is the PR to see if Github UI integrate aws-kf-ci-bot
    size/XS lgtm approved 
    opened by PatrickXYS 44
  • can't set up CRD

    can't set up CRD "Experiment"

    when I deploy katib_v1alpha3 with scripts/v1alpha3/deploy.sh, the katib-controller pod gives the following error:

    {"level":"info","ts":1578296376.3173876,"logger":"entrypoint","msg":"Config:","experiment-suggestion-name":"default","cert-local-filesystem":false}
    {"level":"info","ts":1578296376.375878,"logger":"entrypoint","msg":"Registering Components."}
    {"level":"info","ts":1578296376.3765948,"logger":"entrypoint","msg":"Setting up controller"}
    {"level":"info","ts":1578296376.3766346,"logger":"experiment-controller","msg":"Using the default suggestion implementation"}
    {"level":"info","ts":1578296376.3767953,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"experiment-controller","source":"kind source: /, Kind="}
    {"level":"error","ts":1578296376.3768966,"logger":"kubebuilder.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":{"Group":"kubeflow.org","Kind":"Experiment"},"error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:89\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Watch\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.addWatch\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:119\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:107\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
    {"level":"error","ts":1578296376.377135,"logger":"experiment-controller","msg":"Experiment watch failed","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.addWatch\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:121\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:107\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
    {"level":"error","ts":1578296376.3772092,"logger":"experiment-controller","msg":"Trial watch failed","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:108\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
    {"level":"error","ts":1578296376.377267,"logger":"entrypoint","msg":"unable to register controllers to the manager","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:91\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

    And the ui pod gives the following error: 2020/01/06 06:56:46 CreateExperiment from YAML failed: no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"

    lifecycle/stale 
    opened by wyong16 41
  • Katib experiments run indefinitely without completing a single trial

    /kind bug

    Hi, I'm setting a Katib job through the Kale deployment panel - after creating a Kale pipeline. The pipeline builds successfully but the Katib experiments run forever and don't complete a single trial.

    I expect the Katib jobs to run successfully, but to no avail.

    Any way/suggestion to go about this?

    Environment:

    • Kubeflow version (kfctl version):
    • Minikube version (minikube version):
    • Kubernetes version: (use kubectl version):
    • OS (e.g. from /etc/os-release):
    kind/bug 
    opened by Dampolo03 39
  • ERROR:grpc._server:Exception calling application: Method not implemented!

    /kind bug

    Hi, I'm having trouble using katib v1alpha3. First, I installed katib by the followings

    1. git clone https://github.com/kubeflow/katib
    2. sh katib/scripts/v1alpha3/deploy.sh

    And I tried to apply random-example.yaml kubectl apply -f random-example.yaml (example in katib/examples/v1alpha3)

    Results:

    kubectl get pods -n kubeflow
    NAME                                     READY   STATUS    RESTARTS   AGE
    katib-controller-6c6974678d-zsnlc        1/1     Running   1          24m
    katib-db-558f649dc6-8cd9t                1/1     Running   0          24m
    katib-manager-5f74bdff84-4d78z           1/1     Running   0          24m
    katib-ui-6568bd6b44-qbq5k                1/1     Running   0          24m
    random-example-random-846dc99654-bxb8j   1/1     Running   0          23m

    kubectl get trials -n kubeflow
    NAME                      TYPE      STATUS   AGE
    random-example-drpkvb4b   Running   True     23m
    random-example-k7xv6ktt   Running   True     23m
    random-example-w6jlwdp2   Running   True     23m

    kubectl get experiment -n kubeflow -oyaml
    apiVersion: v1
    items:
    - apiVersion: kubeflow.org/v1alpha3
      kind: Experiment
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"kubeflow.org/v1alpha3","kind":"Experiment","metadata":{"annotations":{},"labels":{"controller-tools.k8s.io":"1.0"},"name":"random-example","namespace":"kubeflow"},"spec":{"algorithm":{"algorithmName":"random"},"maxFailedTrialCount":3,"maxTrialCount":12,"objective":{"additionalMetricNames":["accuracy"],"goal":0.99,"objectiveMetricName":"Validation-accuracy","type":"maximize"},"parallelTrialCount":3,"parameters":[{"feasibleSpace":{"max":"0.03","min":"0.01"},"name":"--lr","parameterType":"double"},{"feasibleSpace":{"max":"5","min":"2"},"name":"--num-layers","parameterType":"int"},{"feasibleSpace":{"list":["sgd","adam","ftrl"]},"name":"--optimizer","parameterType":"categorical"}],"trialTemplate":{"goTemplate":{"rawTemplate":"apiVersion: batch/v1\nkind: Job\nmetadata:\n name: {{.Trial}}\n namespace: {{.NameSpace}}\nspec:\n template:\n spec:\n containers:\n - name: {{.Trial}}\n image: docker.io/kubeflowkatib/mxnet-mnist-example\n command:\n - "python"\n - "/mxnet/example/image-classification/train_mnist.py"\n - "--batch-size=64"\n {{- with .HyperParameters}}\n {{- range .}}\n - "{{.Name}}={{.Value}}"\n {{- end}}\n {{- end}}\n restartPolicy: Never"}}}}
        creationTimestamp: "2019-12-20T07:58:52Z"
        finalizers:
        - update-prometheus-metrics
        generation: 2
        labels:
          controller-tools.k8s.io: "1.0"
        name: random-example
        namespace: kubeflow
        resourceVersion: "11682124"
        selfLink: /apis/kubeflow.org/v1alpha3/namespaces/kubeflow/experiments/random-example
        uid: 9005bab0-22fe-11ea-8cf0-0679676001a5
      spec:
        algorithm:
          algorithmName: random
          algorithmSettings: null
        maxFailedTrialCount: 3
        maxTrialCount: 12
        metricsCollectorSpec:
          collector:
            kind: StdOut
        objective:
          additionalMetricNames:
          - accuracy
          goal: 0.99
          objectiveMetricName: Validation-accuracy
          type: maximize
        parallelTrialCount: 3
        parameters:
        - feasibleSpace:
            max: "0.03"
            min: "0.01"
          name: --lr
          parameterType: double
        - feasibleSpace:
            max: "5"
            min: "2"
          name: --num-layers
          parameterType: int
        - feasibleSpace:
            list:
            - sgd
            - adam
            - ftrl
          name: --optimizer
          parameterType: categorical
        trialTemplate:
          goTemplate:
            rawTemplate: |-
              apiVersion: batch/v1
              kind: Job
              metadata:
                name: {{.Trial}}
                namespace: {{.NameSpace}}
              spec:
                template:
                  spec:
                    containers:
                    - name: {{.Trial}}
                      image: docker.io/kubeflowkatib/mxnet-mnist-example
                      command:
                      - "python"
                      - "/mxnet/example/image-classification/train_mnist.py"
                      - "--batch-size=64"
                      {{- with .HyperParameters}}
                      {{- range .}}
                      - "{{.Name}}={{.Value}}"
                      {{- end}}
                      {{- end}}
                    restartPolicy: Never
      status:
        conditions:
        - lastTransitionTime: "2019-12-20T07:58:52Z"
          lastUpdateTime: "2019-12-20T07:58:52Z"
          message: Experiment is created
          reason: ExperimentCreated
          status: "True"
          type: Created
        - lastTransitionTime: "2019-12-20T08:00:22Z"
          lastUpdateTime: "2019-12-20T08:00:22Z"
          message: Experiment is running
          reason: ExperimentRunning
          status: "True"
          type: Running
        currentOptimalTrial:
          observation:
            metrics: null
          parameterAssignments: null
        startTime: "2019-12-20T07:58:52Z"
        trials: 3
        trialsRunning: 3
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

    kubectl logs -n kubeflow random-example-random-846dc99654-bxb8j
    INFO:hyperopt.utils:Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
    INFO:hyperopt.fmin:Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
    ERROR:grpc._server:Exception calling application: Method not implemented!
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/grpc/_server.py", line 434, in _call_behavior
        response_or_iterator = behavior(argument, context)
      File "/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1alpha3/python/api_pb2_grpc.py", line 135, in ValidateAlgorithmSettings
        raise NotImplementedError('Method not implemented!')
    NotImplementedError: Method not implemented!

    What can I do to fix it? Thank you for your help in solving this problem.

    • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5+icp", GitCommit:"903c3b31caddc675ce2d8bddf62aa0f875c2a3bc", GitTreeState:"clean", BuildDate:"2019-05-08T06:16:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5+icp", GitCommit:"903c3b31caddc675ce2d8bddf62aa0f875c2a3bc", GitTreeState:"clean", BuildDate:"2019-05-08T06:16:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

    • OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908 (Core)

    kind/bug 
    opened by devxoxo 38
  • Support Kubernetes v1.26

    /kind feature

    Describe the solution you'd like [A clear and concise description of what you want to happen.] We need to support Kubernetes v1.26 since that version was released on 2022-12-9.

    https://kubernetes.io/releases/#release-v1-26

    Maybe, we can support that version after the next katib release. This means supporting v1.26 is out of scope in katib v0.15.0.

    Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]


    Love this feature? Give it a 👍 We prioritize the features with the most 👍

    kind/feature 
    opened by tenzen-y 0
  • The `operators` directory is a bit out of date

    /kind discussion

    Describe the solution you'd like [A clear and concise description of what you want to happen.] The operators directory corresponds to katib v0.12.0, which is a bit out of date.

    Also, it looks like the latest Charmed katib-operator exists at https://github.com/canonical/katib-operators. Those Charmed katib-operators don't seem to sync. This situation is likely to be confusing for users.

    @DomFleischmann @DnPlas @ca-scribner @knkski Would you like to keep maintaining both kubeflow/katib/operators and canonical/katib-operators? Or would you like to remove Charmed katib-operator from this repository (katib repo)?

    /cc @kubeflow/wg-automl-leads

    Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]


    Love this feature? Give it a 👍 We prioritize the features with the most 👍

    kind/discussion 
    opened by tenzen-y 0
  • Remove Chocolate Suggestion Service

    Signed-off-by: Yuki Iwai

    What this PR does / why we need it: I removed all code related to the Chocolate Suggestion Service.

    Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Part-of #2058

    Checklist:

    • [ ] Docs included if any changes are user facing
    approved size/XL do-not-merge/hold 
    opened by tenzen-y 2
  • Add Label to Disable Katib Webhooks

    In this PR: https://github.com/kubeflow/katib/pull/2018#issuecomment-1330713221, I proposed to introduce label for disabling Katib Webhooks (validator, defaulter, mutator). For example: katib.kubeflow.org/webhooks: disabled. Let's discuss if that would be useful for the users with large-scale environment.

    Currently, if user's namespace has katib.kubeflow.org/metrics-collector-injection: enabled label, Katib Mutation Webhook runs for every Pod in that namespace. That might increase latency in the Kubernetes API server. Some users might want to use Katib Experiments and run other pods in their namespaces without Webhook execution.

    What do you think @gaocegege @johnugeorge @tenzen-y @anencore94 @terrytangyuan ?

    /kind discussion


    Love this feature? Give it a 👍 We prioritize the features with the most 👍

    kind/discussion 
    opened by andreyvelich 3
  • kwa(front): Sort conditions table by timestamp

    This is a follow-up PR to this and updates the COMMIT file to checkout to the latest KF commit and have the conditions table sorted by timestamp by default.

    size/XS 
    opened by orfeas-k 3
Releases(v0.14.0)
  • v0.14.0(Aug 19, 2022)

    This is the Katib v0.14.0 release.

    New Features

    Core Features

    • Population based training (#1833 by @a9p)
    • Support JSON format logs in file-metrics-collector (#1765 by @tenzen-y)
    • Include MetricsUnavailable condition to Complete in Trial (#1877 by @tenzen-y)
    • Allow running examples on Apple Silicon M1 and fix image build errors for arm64 (#1898 by @tenzen-y)
    • Configurable job name and service name for cert generator (#1889 by @shaowei-su)

    UI Features and Enhancements

    • Add PBT to experiment creation form (#1909 by @a9p)
    • Distinct page for each Trial in the UI (#1783 by @d-gol)

    Bug fixes

    Documentation

    Misc

    • Updating the training operator image in CI (#1910 by @johnugeorge)
    • Upgrade Python and Pytorch versions for some examples (#1906 by @tenzen-y)
    • Linting for K8s YAML files (#1901 by @Rishit-dagli)
    • Change integration test system from KinD Cluster to Minikube Cluster (#1899 by @tenzen-y)
    • Upgrade mysql version to v8.0.29 (#1897 by @tenzen-y)
    • Upgrade tensorflow-aarch64 version to v2.9.1 (#1891 by @tenzen-y)
    • chore: Upgrade Go libraries to resolve some security issues in the katib-controller (#1888 by @tenzen-y)
    • Migrate kubeflow-katib-presubmit to GitHub Actions (#1882 by @tenzen-y)
    • Add semicolon when using command command in Makefile (#1885 by @tenzen-y)
    • Fix HAS_SHELLCHECK and HAS_SETUP_ENVTEST in Makefile (#1884 by @tenzen-y)
    • Remove presubmit tests depending on optional-test-infra (#1871 by @aws-kf-ci-bot)
    • Upgrade the Tensorflow version to address some security issues (#1870 by @tenzen-y)
    • Upgrade the grpc_health_probe version to v0.4.11 to resolve security vulnerability CVE-2022-27191 (#1875 by @tenzen-y)
    • additional metric names should not include objective metric name (#1874 by @henrysecond1)
    • Upgrade the Kubernetes Python client to 22.6.0 (#1869 by @tenzen-y)
    • Upgrade the kubebuilder to v3.2.0 and Kubernetes Go libraries to v1.22.2 (#1861 by @tenzen-y)
    • Update FPGA XGBoost example (#1865 by @eliaskoromilas)
    • Fix kubeflowkatib/mxnet-mnist image (#1866 by @tenzen-y)
    • pins pip and setuptools versions operators to avoid installation issues (#1867 by @DnPlas)
    • Add shellcheck (#1857 by @tenzen-y)
    • Bump kubeflow-katib and kfp version in notebook examples (#1849 by @tenzen-y)
    • Add prometheus scraping and grafana support to charmed katib-controller operator (#1839 by @jardon)
    • Upgrade Black to fix linting (#1842 by @jardon)

    Change Log

    Check the Full Change Log.

  • v0.14.0-rc.0(Jun 30, 2022)

  • v0.13.0(Mar 4, 2022)

    This is the Katib v0.13.0 release.

    Breaking changes:

    1. Namespace label for Metrics collector enabled Katib namespaces is changed to katib.kubeflow.org/metrics-collector-injection=enabled #1740
    2. Current request number field in gRPC API is renamed to current_request_number #1728
    3. training.kubeflow.org prefix is added to the default primary pod labels job-role and replica-type of the Training Operators #1813
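
To illustrate the first breaking change, here is a small sketch that builds a Namespace manifest carrying the new label. The namespace name `my-experiments` and the helper `namespace_manifest` are placeholders for this example. The output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json

# Label key and value taken from breaking change 1 above.
LABEL_KEY = "katib.kubeflow.org/metrics-collector-injection"


def namespace_manifest(name):
    """Return a Namespace manifest with Metrics Collector injection enabled."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {LABEL_KEY: "enabled"}},
    }


# "my-experiments" is a placeholder namespace name.
print(json.dumps(namespace_manifest("my-experiments"), indent=2))
```

Namespaces that still carry only the pre-v0.13.0 label would need to be relabeled for the Metrics Collector to be injected.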

    New Features

    Algorithms and Components

    • Implement validation for Early Stopping (#1709 by @tenzen-y)
    • Change namespace label for Metrics Collector injection (#1740 by @andreyvelich)
    • Modify gRPC API with Current Request Number (#1728 by @andreyvelich)
    • Allow to remove each resource in Katib config (#1729 by @andreyvelich)
    • Support leader election for Katib Controller (#1713 by @tenzen-y)
    • Change default Metrics Collect format (#1707 by @anencore94)
    • Bump Python version to 3.9 (#1731 by @tenzen-y)
    • Update Go version to 1.17 (#1683 by @andreyvelich)
    • Create Python script to run e2e Argo Workflow (#1674 by @andreyvelich)
    • Reimplement Katib Cert Generator in Go (#1662 by @tenzen-y)
    • SDK: change list apis to return objects as default (#1630 by @anencore94)

    UI Features

    • Enhance Katib UI feasible space (#1721 by @seong7)
    • Handle missing TrialTemplates in Katib UI (#1652 by @kimwnasptd)
    • Add Prettier devDependency in Katib UI (#1629 by @seong7)

    Documentation

    • Fix a link for GRPC API documentation (#1786 by @tenzen-y)
    • Add my presentations that include Katib (#1753 by @terrytangyuan)
    • Add Akuity to list of adopters (#1749 by @terrytangyuan)
    • Change Argo -> Argo Workflows (#1741 by @terrytangyuan)
    • Update Algorithm Service Doc for the new CI script (#1724 by @andreyvelich)
    • Update link to Training Operator (#1699 by @terrytangyuan)
    • Refactor examples folder structure (#1691 by @andreyvelich)
    • Fix README in examples directory (#1687 by @tenzen-y)
    • Add Kubeflow MXJob example (#1688 by @andreyvelich)
    • Update FPGA examples (#1685 by @eliaskoromilas)
    • Refactor README (#1667 by @andreyvelich)
    • Change the minimal Kustomize version in the developer guide (#1675 by @tenzen-y)
    • Add Katib release process guide (#1641 by @andreyvelich)

    Bug Fixes

    • Remove unrecognized keys from metadata.yaml in Charmed operators (#1759 by @DnPlas)
    • Fix the default Metrics Collector regex (#1755 by @andreyvelich)
    • Fix Status Handling in Charmed Operators (#1743 by @DomFleischmann)
    • Fix bug on list type HP in Katib UI (#1704 by @seong7)
    • Fix Range for Int and Double values in Grid search (#1732 by @andreyvelich)
    • Check if parameter references exist in Experiment parameters (#1726 by @henrysecond1)
    • Fix same set for HyperParameters in Bayesian Optimization algorithm (#1701 by @fabianvdW)
    • Close MySQL statement and rows resources when SQL exec ends (#1720 by @chenwenjun-github)
    • Fix Cluster Role of Katib Controller to access image pull secrets (#1725 by @henrysecond1)
    • Emit events when fails to reconcile all Trials (#1706 by @henrysecond1)
    • Missing metrics port annotation (#1715 by @alexeykaplin)
    • Fix absolute value in Katib UI (#1676 by @anencore94)
    • Add missing omitempty parameter to APIs (#1645 by @andreyvelich)
    • Reconcile semantics for Suggestion Algorithms (#1633 by @johnugeorge)
    • Fix default label for Training Operators (#1813 by @andreyvelich)
    • Update supported Python version for Katib SDK (#1798 by @tenzen-y)

    Misc

    • Use release tags for Trial images (#1757 by @andreyvelich)
    • Upgrade cert-manager API from v1alpha2 to v1 (#1752 by @haoxins)
    • Add Workflow to Publish Katib Images (#1746 by @andreyvelich)
    • Update Charmed Katib Operators + CI to 0.12 (#1717 by @knkski)
    • Updating Katib CI to use Training Operator (#1710 by @midhun1998)
    • Update OWNERS for Charmed operators (#1718 by @ca-scribner)
    • Implement some unit tests for the Katib Config package (#1690 by @tenzen-y)
    • Add GitHub Actions for Python unit tests (#1677 by @andreyvelich)
    • Add OWNERS file for the Katib new UI (#1681 by @kimwnasptd)
    • Add envtest to check reconcileRBAC (#1678 by @tenzen-y)
    • Use golangci-lint as linter for Go (#1671 by @tenzen-y)
  • v0.12.0(Oct 6, 2021)

    This is the Katib v0.12.0 release.

    The major changes:

    • Optuna Suggestion service with new algorithms. Big thanks to @g-votte and @c-bata.
    • Sobol's Quasirandom Sequence algorithm and the IPOP-CMA-ES and BIPOP-CMA-ES restart strategies. Big thanks to @c-bata.
    • Katib can run Argo Workflows. Big thanks to @andreyvelich.

    New Features

    Algorithms and Components

    • Add Optuna based suggestion service (#1613 by @g-votte)
    • Support Sobol's Quasirandom Sequence using Goptuna. (#1523 by @c-bata)
    • Bump the Goptuna version up to v0.8.0 with IPOP-CMA-ES and BIPOP-CMA-ES support. (#1519 by @c-bata)
    • Validate possible operations for Grid suggestion (#1205 by @andreyvelich)
    • Validate for Bayesian Optimization algorithm settings (#1600 by @anencore94)
    • Add Support for Argo Workflows (#1605 by @andreyvelich)
    • Add Support for XGBoost Operator with LightGBM example (#1603 by @andreyvelich)
    • Allow empty resources for CPU and Memory in Katib config (#1564 by @andreyvelich)
    • Add kustomization overlay: katib-openshift (#1513 by @maanur)
    • Switch to SDI in Katib Charm (#1555 by @knkski)

    UI Features

    • Add Multivariate TPE to Katib UI (#1625 by @andreyvelich)
    • Update Katib UI with Optuna Algorithm Settings (#1626 by @andreyvelich)
    • Change the default image for the new Katib UI (#1608 by @andreyvelich)

    Documentation

    • Add Katib 2021 ROADMAP (#1524 by @andreyvelich)
    • Add AutoML and Training WG Summit July 2021 (#1615 by @andreyvelich)
    • Add the new Katib presentations 2021 (#1539 by @andreyvelich)
    • Add Doc checklist to PR template (#1568 by @andreyvelich)
    • Fix typo in operators/README (#1557 by @evilnick)
    • Adds docs on how to use Katib Charm within KF (#1556 by @RFMVasconcelos)
    • Fix a link to Kustomize manifest for new Katib UI (#1521 by @c-bata)

    Bug Fixes

    • Fix UI for handling missing params (#1657 by @kimwnasptd)
    • Reconcile semantics for Suggestion Algorithms (#1644 by @johnugeorge)
    • Fix Metrics Collector error in case of non-existing Process (#1614 by @andreyvelich)
    • Fix mysql version in docker image (#1594 by @munagekar)
    • Fix grep in Tekton Experiment Doc (#1578 by @andreyvelich)
    • Error messages corrected (#1522 by @himanshu007-creator)
    • Install charmcraft 1.0.0 (#1593 by @DomFleischmann)

    Misc

    • Modify XGBoostJob example for the new Controller (#1623 by @andreyvelich)
    • Modify Labels for controller resources (#1621 by @andreyvelich)
    • Modify Labels for Katib Components (#1611 by @andreyvelich)
    • Upgrade CRDs to apiextensions.k8s.io/v1 (#1610 by @andreyvelich)
    • Update Katib SDK with OpenAPI generator (#1572 by @andreyvelich)
    • Disable default PV for Experiment with resume from volume (#1552 by @andreyvelich)
    • Remove PV from MySQL component (#1527 by @andreyvelich)
    • feat: add naming regex check on validating webhook (#1541 by @anencore94)

    Change Log

    Check the Full Change Log.

  • v0.11.1(Jun 11, 2021)

    This is the Katib v0.11.1 release.

    Bug Fixes

    • Fix Katib manifest for Kubeflow 1.3 (https://github.com/kubeflow/katib/pull/1503 by @yanniszark)
    • Fix Katib release script (https://github.com/kubeflow/katib/pull/1510 by @andreyvelich)

    Enhancements

    • Remove Application CR (https://github.com/kubeflow/katib/pull/1509 by @yanniszark)
    • Modify Katib manifest to support newer Kustomize version (https://github.com/kubeflow/katib/pull/1515 by @DavidSpek and @andreyvelich)

    Check the Full Change Log.

  • v0.11.0(Mar 22, 2021)

    This is the Katib v0.11.0 release. The major changes:

    • Katib now supports Kubernetes >= 1.18.
    • The new Katib UI can be deployed. Big thanks to @kimwnasptd!
    • Juju operator support. Big thanks to @DomFleischmann, @knkski and @RFMVasconcelos!

    New Features

    Core Features

    • Disable dynamic Webhook creation (https://github.com/kubeflow/katib/pull/1450 by @andreyvelich and @tenzen-y)
    • Add the waitAllProcesses flag to the Katib config (https://github.com/kubeflow/katib/pull/1394 by @robbertvdg)
    • Migrate Katib to Go modules (https://github.com/kubeflow/katib/pull/1438 by @andreyvelich)
    • Update Katib SDK with the get_success_trial_details API (https://github.com/kubeflow/katib/pull/1442 by @Adarsh2910)
    • Add release process script (https://github.com/kubeflow/katib/pull/1473 by @andreyvelich)
    • Refactor the Katib installation using Kustomize (https://github.com/kubeflow/katib/pull/1464 by @andreyvelich)

    UI Features and Enhancements

    • First step for the Katib new UI implementation (https://github.com/kubeflow/katib/pull/1427 by @kimwnasptd)
    • Add missing fields to the Katib new UI (https://github.com/kubeflow/katib/pull/1463 by @kimwnasptd)
    • Add instructions to install the new Katib UI (https://github.com/kubeflow/katib/pull/1476 by @kimwnasptd)

    Katib Juju operator

    • Add Juju operator support for Katib (https://github.com/kubeflow/katib/pull/1403 by @knkski and @RFMVasconcelos)
    • Add GitHub Actions for the Juju operator (https://github.com/kubeflow/katib/pull/1407 by @knkski)
    • Add install docs for the Juju operator (https://github.com/kubeflow/katib/pull/1411 by @RFMVasconcelos)
    • Modify ClusterRoles for the Juju operator (https://github.com/kubeflow/katib/pull/1426 by @DomFleischmann)
    • Update the Juju operator with the new Katib Webhooks (https://github.com/kubeflow/katib/pull/1465 by @knkski)

    Bug Fixes

    • Fix compare step for Early Stopping (https://github.com/kubeflow/katib/pull/1386 by @andreyvelich)
    • Fix Early Stopping in the Goptuna Suggestion (https://github.com/kubeflow/katib/pull/1404 by @andreyvelich)
    • Fix SDK examples to work with Katib 0.10 (https://github.com/kubeflow/katib/pull/1402 by @andreyvelich)
    • Fix links in the TFEvent Metrics Collector (https://github.com/kubeflow/katib/pull/1417 by @zuston)
    • Fix the gRPC build script (https://github.com/kubeflow/katib/pull/1492 by @andreyvelich)

    Documentation

    • Modify docs for Katib 0.10 (https://github.com/kubeflow/katib/pull/1392 by @andreyvelich)
    • Add Katib presentation list (https://github.com/kubeflow/katib/pull/1446 by @andreyvelich)
    • Add Canonical to the Katib Adopters (https://github.com/kubeflow/katib/pull/1401 by @RFMVasconcelos)
    • Update developer guide with the Katib controller flags (https://github.com/kubeflow/katib/pull/1449 by @annajung)
    • Add Fuzhi to the Katib Adopters (https://github.com/kubeflow/katib/pull/1451 by @Planck0591)
    • Fix Katib broken links to the Kubeflow guides (https://github.com/kubeflow/katib/pull/1477 by @theofpa)
    • Add the Katib Webhook docs (https://github.com/kubeflow/katib/pull/1486 by @andreyvelich)

    Misc

    • Add recreate strategy for the MySQL deployment (https://github.com/kubeflow/katib/pull/1393 by @andreyvelich)
    • Modify worker image for the Katib AWS CI/CD (https://github.com/kubeflow/katib/pull/1423 by @PatrickXYS)
    • Add the SVG logo for Katib (https://github.com/kubeflow/katib/pull/1414 by @knkski)
    • Verify empty Objective in the Experiment defaults (https://github.com/kubeflow/katib/pull/1445 by @andreyvelich)
    • Move the Katib manifests upstream (https://github.com/kubeflow/katib/pull/1432 by @yanniszark)
    • Build the Trial images in the Katib CI (https://github.com/kubeflow/katib/pull/1457 by @andreyvelich)
    • Add script to update the boilerplates (https://github.com/kubeflow/katib/pull/1491 by @andreyvelich)

    Change Log

    Check the Full Change Log.

  • v0.10.0(Nov 7, 2020)

    This is the Katib 0.10 release for Kubeflow 1.2. It introduces the new Katib v1beta1 API version.

    New Features

    Core Features

    • The new Trial template design (https://github.com/kubeflow/katib/issues/1208)
    • Support custom Kubernetes CRD in the Trial template (https://github.com/kubeflow/katib/issues/1214)
      • Add example for the Tekton Pipeline (https://github.com/kubeflow/katib/pull/1339)
      • Add example for the Kubeflow MPIJob (https://github.com/kubeflow/katib/pull/1342)
    • Support early stopping with the Median Stopping Rule (https://github.com/kubeflow/katib/pull/1344)
    • Resume Experiment from the volume (https://github.com/kubeflow/katib/pull/1275)
      • Support volume settings in the Katib config (https://github.com/kubeflow/katib/pull/1291)
    • Extract the Experiment metrics in multiple ways (https://github.com/kubeflow/katib/pull/1140)
    • Update the Python SDK for the v1beta1 version (https://github.com/kubeflow/katib/pull/1252)

    UI Features and Enhancements

    • Show the Trial parameters on the submit Experiment page (https://github.com/kubeflow/katib/pull/1224)
    • Allow setting the Trial template YAML from the submit Experiment page (https://github.com/kubeflow/katib/pull/1363)
    • Optimise the Katib UI image (https://github.com/kubeflow/katib/pull/1232)
    • Enable sorting in the Trial list table (https://github.com/kubeflow/katib/pull/1251)
    • Add pages to the Trial list table (https://github.com/kubeflow/katib/pull/1262)
    • Use the V4 version for the Material UI (https://github.com/kubeflow/katib/pull/1254)
    • Automatically delete an empty ConfigMap with Trial templates (https://github.com/kubeflow/katib/pull/1260)
    • Create a ConfigMap with Trial templates (https://github.com/kubeflow/katib/pull/1265)
    • Support metrics strategies on the submit Experiment page (https://github.com/kubeflow/katib/pull/1364)
    • Add the resume policy to the submit Experiment page (https://github.com/kubeflow/katib/pull/1362)
    • Allow creating an early stopping Experiment from the submit Experiment page (https://github.com/kubeflow/katib/pull/1373)

    Bug Fixes

    • Check the Trials count before deleting it (https://github.com/kubeflow/katib/pull/1223)
    • Check that Trials are deleted (https://github.com/kubeflow/katib/pull/1288)
    • Fix the out of range error in the Hyperopt suggestion (https://github.com/kubeflow/katib/pull/1315)
    • Fix the pod ownership to inject the metrics collector (https://github.com/kubeflow/katib/pull/1303)

    Misc

    • Switch the test infra to AWS (https://github.com/kubeflow/katib/pull/1356)
    • Use the docker.io/kubeflowkatib registry to release images (https://github.com/kubeflow/katib/pull/1372)

    Change Log

    See the Full Change Log.

  • v0.9.0(Jun 16, 2020)

  • v0.6.0-rc.0(Jun 28, 2019)

  • v0.1.2-alpha(Jun 5, 2018)

    Full Changelog

    Closed issues:

    • [request] Invite libbyandhelen as reviewer for algorithm support #82
    • cli failed to connect #80
    • CreateStudy RPC error: Objective_Value_Name is required #73
    • [cli] Use cobra to refactor the cli #54
    • Reduce time it takes to build all images #50
    • [release] Ksonnet the katib #32

    Merged pull requests:

  • v0.1.1-alpha(Apr 26, 2018)

    Full Changelog

    Closed issues:

    • [upstream] Update name in kubernetes/test-infra #63
    • [go] Update the package name, again #62
    • [test] Fix broken unit test cases #58
    • Provide a cli binary for macOS / darwin #57
    • Error running katib on latest master (04/13) #44
    • Upload existing models to modelDB interface #43
    • [release] Add cli to v0.1.0-alpha #31
    • [discussion] Find a new way to install CLI #26
    • [maintainance] Setup the repository #8
    • Existing approaches and design for hyperparameter-tuning #2

    Merged pull requests:

  • v0.1.0-alpha(Apr 10, 2018)

    Closed issues:

    • [suggestion] Move the logic about random service to random package #18
    • [build-release] Reuse the vendor during the image building process #14
    • [go] Rename the package from mlkube/katib to this repo #7
    • [go] Establish vendor dependencies for go #5
    • Rename to hyperparameter-tuning ? #1

    Merged pull requests:
