Weaviate is a cloud-native, modular, real-time vector search engine

Overview

Demo of Weaviate

Weaviate GraphQL demo on news article dataset containing: Transformers module, GraphQL usage, semantic search, _additional{} features, Q&A, and Aggregate{} function. You can try the demo on this dataset in the GUI here: semantic search, Q&A, Aggregate.

Description

Weaviate is a cloud-native, real-time vector search engine (aka neural search engine or deep search engine). There are modules for specific use cases such as semantic search, plugins to integrate Weaviate in any application of your choice, and a console to visualize your data.

GraphQL - RESTful - vector search engine - vector database - neural search engine - semantic search - HNSW - deep search - machine learning - kNN

Features

Weaviate makes it easy to use state-of-the-art AI models while giving you the scalability, ease of use, safety and cost-effectiveness of a purpose-built vector database. Most notably:

  • Fast queries
    Weaviate typically performs a 10-nearest-neighbor (10-NN) search over millions of objects in considerably less than 100ms.

  • Any media type with Weaviate Modules
    Use state-of-the-art AI model inference (e.g. Transformers) for text, images, etc. at search and query time to let Weaviate manage the process of vectorizing your data for you, or import your own vectors.

  • Combine vector and scalar search
    Weaviate allows for efficient combined vector and scalar searches, e.g. “articles related to the COVID-19 pandemic published within the past 7 days”. Weaviate stores both your objects and the vectors and makes sure the retrieval of both is always efficient. There is no need for third-party object storage.

  • Real-time and persistent
    Weaviate lets you search through your data even if it’s currently being imported or updated. In addition, every write is written to a Write-Ahead-Log (WAL), so writes are persisted immediately, even when a crash occurs.

  • Horizontal Scalability
    Scale Weaviate for your exact needs, e.g. High-Availability, maximum ingestion, largest possible dataset size, maximum queries per second, etc. (Currently under development, ETA Fall 2021)

  • Cost-Effectiveness
    Very large datasets do not need to be kept entirely in memory in Weaviate. At the same time, available memory can be used to increase the speed of queries. This allows for a conscious speed/cost trade-off to suit every use case.

  • Graph-like connections between objects
    Make arbitrary connections between your objects in a graph-like fashion to resemble real-life connections between your data points. Traverse those connections using GraphQL.

Documentation

You can find detailed documentation in the developers section of our website or directly go to one of the docs using the links in the list below.

Additional reading

Examples

You can find code examples here

Support

Contributing

Comments
  • Vectorization mask for classes

    Currently all string/text values as well as the class name and property names are considered in the vectorization. However, not all property names and values have to be important for the context. Take the following meta class of a table as an example:

    {
                    "class": "Column",
                    "description": "",
                    "properties":[
                        {
                            "cardinality": "atMostOne",
                            "description": "",
                            "dataType": ["int"],
                            "keywords": [],
                            "name": "index"
                        },
                        {
                            "cardinality": "atMostOne",
                            "description": "",
                            "dataType": ["text"],
                            "keywords": [],
                            "name": "name"
                        },
                        {
                            "cardinality": "atMostOne",
                            "description": "",
                            "dataType": ["string"],
                            "keywords": [],
                            "name": "dataType"
                        }
                    ]
                }
    

    In this case the vector would be created based on Column, index, name, data, type and the values of the properties. However, the class name and property names shift the vector into the context of tables, while the context should be based solely on the column name and dataType values.

    Proposal:

    • If nothing is specified the vector gets created automatically based on all information in the class
    • The user can explicitly mask information away from the vectorization in the schema:
    {
                    "class": "Column",
                    "description": "",
                    "vectorizeClassName": false,
                    "properties":[
                        {
                            "cardinality": "atMostOne",
                            "description": "",
                            "dataType": ["int"],
                            "keywords": [],
                            "name": "index"
                        },
                        {
                            "cardinality": "atMostOne",
                            "description": "",
                            "dataType": ["text"],
                            "keywords": [],
                            "name": "name",
                            "vectorizePropertyName": false,
                            "vectorizePropertyValue": true
                        },
                        {
                            "cardinality": "atMostOne",
                            "description": "",
                            "dataType": ["string"],
                            "keywords": [],
                            "name": "dataType",
                            "vectorizePropertyName": false,
                            "vectorizePropertyValue": true
                        }
                    ]
                }
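
The masking proposal can be illustrated with a small Python sketch. It is not Weaviate's implementation: the helper name `build_vectorization_corpus` and the sample object are hypothetical, and the flags are assumed to default to true, matching "if nothing is specified, everything is used".

```python
def build_vectorization_corpus(class_schema, obj):
    """Collect the strings that would feed the vectorizer, honoring the mask flags.

    Assumption: every flag defaults to True when it is absent from the schema.
    """
    parts = []
    if class_schema.get("vectorizeClassName", True):
        parts.append(class_schema["class"])
    for prop in class_schema.get("properties", []):
        name = prop["name"]
        if prop.get("vectorizePropertyName", True):
            parts.append(name)
        value = obj.get(name)
        # Only string/text values take part in the vectorization.
        if isinstance(value, str) and prop.get("vectorizePropertyValue", True):
            parts.append(value)
    return parts


schema = {
    "class": "Column",
    "vectorizeClassName": False,
    "properties": [
        {"name": "index", "dataType": ["int"]},
        {"name": "name", "dataType": ["text"],
         "vectorizePropertyName": False, "vectorizePropertyValue": True},
        {"name": "dataType", "dataType": ["string"],
         "vectorizePropertyName": False, "vectorizePropertyValue": True},
    ],
}
column = {"index": 3, "name": "birth date", "dataType": "timestamp"}  # hypothetical object
corpus = build_vectorization_corpus(schema, column)
```

Note that the property name index still ends up in the corpus here, because the proposed schema does not mask it.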
    
    opened by fefi42 30
  • Hybrid Search (combining Vector + Sparse search)

    WHAT

    Combining vector search with sparse (e.g. BM25) search in one query

    WHY

    To bridge the gap between sparse search and vector search.

    Longer why

    Vector search (using dense vectors, computed by ML models) works well in-domain, but has poor performance out-of-domain. BM25 search works well out-of-domain, because it uses sparse methods (keyword matching), but can't perform context-based search. Combining both methods will improve search results out-of-domain.

    HOW

    • This issue and implementation depend on issue https://github.com/semi-technologies/weaviate/issues/2133
    • Do both a dense and BM25 search using a query (in parallel)
    • You should be able to define a function to combine the results into 1 result list, using the scores of data in both candidate lists.
    • A default function can be defined in the Weaviate setup, but can be overwritten in the GraphQL query.
    • The BM25 score is unbounded. Score normalization or scaling is not a good idea in general, because you lose information on how good the results are textually. However, when combining BM25 with dense search, some form of normalization may be handy, so we can choose to offer normalization methods anyway. We can explain strategies in the documentation. Possible strategies are:
      • minmax - To leave the distribution intact, the best normalization method is a minmax approach, which takes the minimum BM25 score and maximum BM25 score into account. Taking the maximum BM25 score of a particular query as the maximum in the formula is not a good idea, because then you're setting this result to the maximum score regardless of the actual score. So a (theoretical) maximum needs to be defined, although BM25 is unbounded. This can be achieved by running a number of different queries and recording the maximum value. Since this is quite complex to implement as a feature, we can start by offering a setting with a default value, which the user can change at runtime. An example can be: min((x/10), 1) if the guessed maximum is 10.
      • Arctangent - An arctan compresses large values so that they saturate. A function to scale scores to (-1, 1) (in practice [0, 1), because x will be a non-negative score) is: 2/pi*arctan(x) (see https://www.mdpi.com/2227-7390/10/8/1335)
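
The two normalization strategies can be sketched in a few lines of Python. This is an illustration only: the function names and the default maximum of 10 are assumptions taken from the example in the list, and the minmax variant caps with `min` so the result never exceeds 1.

```python
import math


def minmax_normalize(score, assumed_max=10.0):
    """Scale a non-negative BM25 score into [0, 1] using a configured (guessed) maximum."""
    return min(score / assumed_max, 1.0)


def arctan_normalize(score):
    """Squash a non-negative BM25 score into [0, 1) with 2/pi * arctan(x)."""
    return 2.0 / math.pi * math.atan(score)


bm25_scores = [0.0, 1.0, 5.0, 25.0]  # made-up scores for illustration
minmax_scaled = [minmax_normalize(s) for s in bm25_scores]
arctan_scaled = [arctan_normalize(s) for s in bm25_scores]
```

With the minmax variant, every score at or above the guessed maximum collapses to 1, whereas the arctan variant keeps the ordering of all scores but compresses the high end.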

    Design - Weaviate setup

    Requirements:

    • Dense retrieval and sparse retrieval independently (with the same or different query)
    • Combine the result of both methods using a scoring function

    Schema: The settings should be configured in the schema, per class:

    {
      "class": "string",
      "vectorIndexType": "hnsw",                
      "vectorIndexConfig": {
        ...                                     
      },
      "vectorizer": "text2vec-transformers",
      "moduleConfig": {
        "text2vec-transformers": {  
          "vectorizeClassName": true            
        }
      },                       
      "sparseIndexType": "bm25",                
      "sparseIndexConfig": {                   
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        }
      },
      "properties": [                            
        {
          "name": "string",                     
          "description": "string",              
          "dataType": [                         
            "string"
          ],
          "sparseIndexConfig": {                
            "bm25": {
              "b": 0.75,
              "k1": 1.2
            }
          },
          "indexInverted": true                 
        }
      ]
    }
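
To show what the b and k1 values in sparseIndexConfig control, here is a sketch of a per-term BM25 score in Python. It uses the common Okapi formulation with a Lucene-style IDF; whether Weaviate would use exactly this variant is an assumption, and the function name is hypothetical.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document.

    k1 controls how quickly repeated occurrences of a term saturate;
    b controls how strongly long documents are penalized (0 = no length normalization).
    """
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Same term frequency, but the second document is twice the average length.
short_doc = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
```

Setting b per property, as in the schema above, would let a short title field and a long abstract field use different length penalties.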
    

    Docker-compose: In case we need to let Weaviate know on startup whether to enable sparse search, we can introduce an env var like ENABLE_SPARSE_INDEX:

    ---
    version: '3.4'
    services:
      weaviate:
        command:
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --scheme
        - http
        image: semitechnologies/weaviate:1.14.1
        ports:
        - 8080:8080
        restart: on-failure:0
        environment:
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          DEFAULT_VECTORIZER_MODULE: text2vec-transformers
          ENABLE_MODULES: text2vec-transformers
          CLUSTER_HOSTNAME: 'node1'
          ENABLE_SPARSE_INDEX: 'true' # <== NEW. Which method (e.g. bm25) can be specified in the schema. Not sure if this variable is needed in the docker-compose actually.
      t2v-transformers:
        image: semitechnologies/transformers-inference:sentence-transformers-msmarco-distilroberta-base-v2
        environment:
          ENABLE_CUDA: 0 # set to 1 to enable
    ...
    

    Design - GraphQL Queries

    The current API is documented in this comment.

    Below is the more elaborate original proposal.

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              nearText: {               # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7   
              },
              weight: 0.8
            }],
            type: Sum                   # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score    # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    

    Multiple sparse searches should also be supported, like:

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query 2",
                properties: ["title"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                },
                limit: 100
              },
              weight: 0.2
            }, {
              nearText: {              # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7,
                limit: 100   
              },
              weight: 0.6
            }],
            type: Sum                  # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score        # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
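
How the type field could combine the operand results is sketched below in Python. This is a minimal illustration under assumptions, not the implemented API: each operand is assumed to return a mapping from object id to an already normalized score, Average is assumed to mean a weighted average, and the RRF constant of 60 is a commonly used default rather than something the proposal specifies.

```python
def fuse(result_lists, weights, fusion_type="Sum", rrf_k=60):
    """Combine per-operand {object_id: score} mappings into one ranked list."""
    fused = {}
    if fusion_type == "RRF":
        # Reciprocal rank fusion only looks at ranks, so the weights are discarded.
        for scores in result_lists:
            ranked = sorted(scores, key=scores.get, reverse=True)
            for rank, object_id in enumerate(ranked, start=1):
                fused[object_id] = fused.get(object_id, 0.0) + 1.0 / (rrf_k + rank)
    elif fusion_type in ("Sum", "Average"):
        for scores, weight in zip(result_lists, weights):
            for object_id, score in scores.items():
                fused[object_id] = fused.get(object_id, 0.0) + weight * score
        if fusion_type == "Average":
            total_weight = sum(weights)
            fused = {object_id: s / total_weight for object_id, s in fused.items()}
    else:
        raise ValueError("unknown fusion type: " + str(fusion_type))
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


sparse = {"a": 0.9, "b": 0.4}  # made-up normalized BM25 scores
dense = {"b": 0.8, "c": 0.7}   # made-up nearText certainties
ranked_sum = fuse([sparse, dense], weights=[0.2, 0.8], fusion_type="Sum")
ranked_rrf = fuse([sparse, dense], weights=[0.2, 0.8], fusion_type="RRF")
```

In this toy data the object found by both searches ends up first under either fusion type, while the order of the remaining two depends on whether scores and weights or only ranks are used.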
    
    planned-1.17 
    opened by laura-ham 28
  • Add _snippets in GraphQL and REST API

    When an object is indexed on a larger text item (e.g., a paragraph like in the news article demo) certain search terms can be found in sentences.

    The idea is to add the start and end position of the most important part of the text in the __meta end-point as a potentialAnswer, which can be enabled or disabled by setting a distanceToAnswer. This can work both for explore filters and for where filters.

    Idea

    I was searching for something on WikiPedia under the search term: Is herbalife a pyramid scheme? and got this response.

    Because Google isn't giving the actual answer but a location for the answer, we should be able to calculate something similar.

    [Screenshot of the search response, 2020-06-05]

    Explore example

    {
      Get{
        Things{
          Article(
            explore: {
              concepts: ["I want a spare rib"],
              certainty: 0.7,
              moveAwayFrom: {
                concepts: ["bacon"],
                force: 0.45
              }
            }
          ){
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result, where start and end give the starting and ending position, and property names the property in which the answer (the most important part) can be found:

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple properties might be indexed
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Where example

    {
      Get {
        Things {
          Article(where: {
                path: ["name"],
                operator: Like,
                valueString: "New *"
            }) {
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result, where start and end give the starting and ending position, and property names the property in which the answer (the most important part) can be found:

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple properties might be indexed
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Suggested (first) implementation

    1. Results are returned like the current implementation.
    2. The vectorization of the query is used to find the closest match of a word in a sentence. *
    3. When the closest word is found, the start and endpoint are found at the beginning and end of the sentence.
    4. The distanceToAnswer is the minimal distance. If it is not set, no start and end points will be available. If multiple sentences make the mark, they will all be part of the array.

    * There might be potential to also do this on groups of words or complete sentences.
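
The suggested steps can be sketched in Python as follows. This is a toy illustration under assumptions: `embed` is a hypothetical callable that maps a word to a vector (in Weaviate this would come from the vectorizer), cosine distance stands in for whatever distance the engine uses, and sentences are split naively on '.', '!' and '?'.

```python
import math
import re


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def potential_answers(text, query_vector, embed, distance_to_answer=None, prop="name"):
    """Return start/end offsets of sentences containing a word close enough to the query."""
    if distance_to_answer is None:
        return []  # step 4: without distanceToAnswer no positions are returned
    answers = []
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        sentence = match.group()
        words = re.findall(r"\w+", sentence)
        if not words:
            continue
        # Step 2: closest word in the sentence to the vectorized query.
        best = min(cosine_distance(query_vector, embed(word)) for word in words)
        if best <= distance_to_answer:
            # Step 3: the answer spans the whole sentence (leading whitespace skipped).
            lead = len(sentence) - len(sentence.lstrip())
            answers.append({"property": prop, "distanceToQuery": best,
                            "start": match.start() + lead, "end": match.end()})
    return answers


toy_vectors = {"ribs": [1.0, 0.0], "bacon": [0.0, 1.0]}  # made-up 2-d embeddings


def toy_embed(word):
    return toy_vectors.get(word.lower(), [0.5, 0.5])


text = "Bacon is salty. Spare ribs are great. Salad is fine."
answers = potential_answers(text, [1.0, 0.0], toy_embed, distance_to_answer=0.1)
```

Here only the second sentence qualifies, and slicing text with the returned start and end yields that sentence.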

    Related

    #1136 #1139 #1155 #1156

    graphql Contextionary autoclosed _underscoreProp 
    opened by bobvanluijt 22
  • Add geotype to datatypes

    Todos

    • [x] design decisions
      • [x] name of the field
        • current proposals: geoCoordinate, geoLocation, geoPoint
          • my personal favorite being geoCoordinate
          • cc @laura-ham @bobvanluijt
      • [x] design of the where filter
        • @laura-ham suggestions?
    • [x] spike out happy path in janusgraph only
      • [x] index creation
      • [x] adding property
      • [x] searching by property within range
    • [x] add new data type on import, goal: an import with geoCoordinates field succeeds
      • [x] allow in schema creation
      • [x] allow in class instance creation
      • [x] validation?
        • [x] add basic validation
        • [x] refactor validateSchemaInBody (it's way too long and extremely difficult to read/extend)
      • [x] janus graph create vertex
    • [x] include on simple read queries
      • [x] Local Get
      • [x] Network Get
    • [x] filter by property
      • [x] Local Filters
        • [x] extract filter from graphql
        • [x] set required validations, so that required fields cannot be omitted
        • [x] apply filter in connectors (Janusgraph)
      • ~~Network Filters~~
        • nothing to do here, they use the same code as local filters
    • [x] deal with property in GetMeta and Aggregate
      • proposal for now to simply not support those fields there
      • long-term?
    • [x] rename according to latest decisions
      • [x] pluralize name geoCoordinate -> geoCoordinates
      • [x] restructure where filter
        • [x] WithinRange -> WithinGeoRange
        • [x] valueRange -> valueGeoRange
        • [x] wrap distance and geoCoordinates in separate objects
    • [x] Update docs (@laura-ham volunteered to help out here)
    • [x] e2e/acceptance test

    Original Content below: for most up-to-date summary see comments below

    Abstract Examples

    • A location in a 2-dimensional space, i.e. x=3, y=5
    • Coordinates that point to a location on a world map

    Research Questions

    • Are geo types always two-dimensional or can they also be more dimensional?
      • Part 1: Do we have use cases for more than 2d?
      • Part 2: Is there technical support for more than 2d?
    • Optimal way to query: e.g. "within 500m of coordinates x,y" mixes the concepts of metric distance and coordinates. What API do we want?
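
For the "within 500m" question, one way to relate a metric distance to coordinates on a world map is the haversine formula. The Python sketch below is only an illustration: the function names and the latitude/longitude keys are assumptions, and it treats the Earth as a sphere.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_geo_range(point, center, max_distance_m):
    """True if point lies within max_distance_m of center (dicts with latitude/longitude)."""
    distance = haversine_m(point["latitude"], point["longitude"],
                           center["latitude"], center["longitude"])
    return distance <= max_distance_m


center = {"latitude": 52.0, "longitude": 4.0}   # made-up coordinates
nearby = {"latitude": 52.001, "longitude": 4.0}  # roughly 111 m to the north
far = {"latitude": 52.01, "longitude": 4.0}      # roughly 1.1 km to the north
```

A filter built this way keeps the distance (in meters) and the coordinates (in degrees) as separate inputs, which sidesteps mixing the two concepts in one value.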

    Features we'd need

    • Import/update geo coordinates
    • use geo in where filters (see research question)
    • do we want to allow aggregation functions like Aggregate or GetMeta
      • if so, what should they look like?
      • if so, is there technical support in our current stack?
      • if so, right from the start or later?

    cc @bobvanluijt @laura-ham

    enhancement Core implementation graphql discussion documentation API design & UX data types 
    opened by etiennedi 22
  • Dropping 'things' or 'actions' from the filter.

    In the section on filters in the GraphQL documentation, the paths of the filters are prefixed with "things" and "actions".

    This is superfluous information. The class names cannot overlap in any case.

    Current query:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Things", "Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Things", "Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    

    After removing the kind of class:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
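
For clients that already hold filter paths in the old form, the change amounts to dropping the first path element when it names a kind. A minimal Python sketch, with a hypothetical helper name:

```python
KINDS = {"Things", "Actions"}


def drop_kind_prefix(path):
    """Return the filter path without a leading "Things" or "Actions" element."""
    if path and path[0] in KINDS:
        return list(path[1:])
    return list(path)


old_path = ["Things", "Animal", "inZoo", "Zoo", "name"]
new_path = drop_kind_prefix(old_path)
```

Paths that are already in the new form pass through unchanged, so the helper can be applied to every path without checking first.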
    
    opened by moretea 19
  • batch_create does not report on 'store is read-only' errors

    When using the Python client with client.batch after having crossed the DISK_USE_READONLY_PERCENTAGE threshold, my ingest fails when using data_object.create but seemingly succeeds when using batches. However, the ingest that succeeded using batch did not contain any items when running a Get query, even though the items had been processed by my t2v-transformers container.

    There might be other errors that are not covered by batch.create. I only found this out after running into my issue, through a helpful comment in the Weaviate Slack. This also seems to have been the issue with https://github.com/semi-technologies/weaviate/issues/1929 .
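
Until the client reports such failures itself, the per-object results of a batch call can be inspected by hand. The Python sketch below works on plain dictionaries and makes an assumption about their shape: each item is expected to carry its errors under result.errors.error[].message, as in the batch REST response. Verify this against the client version in use; the helper name is hypothetical.

```python
def collect_batch_errors(batch_results):
    """Return (object id, messages) for every batch item that reports errors.

    Assumed item shape: {"id": ..., "result": {"errors": {"error": [{"message": ...}]}}}
    """
    failures = []
    for item in batch_results or []:
        errors = (item.get("result") or {}).get("errors") or {}
        messages = [entry.get("message", "") for entry in errors.get("error", [])]
        if messages:
            failures.append((item.get("id"), messages))
    return failures


# Made-up response: the first object succeeded, the second hit the read-only store.
sample_results = [
    {"id": "1", "result": {}},
    {"id": "2", "result": {"errors": {"error": [{"message": "store is read-only"}]}}},
]
failures = collect_batch_errors(sample_results)
```

Raising or logging when the returned list is non-empty would make a silently failing batch visible.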

    autoclosed 
    opened by chris-aeviator 18
  • (DOC) REST API authentication to a WCS cluster

    Hi,

    I'm trying to connect to an authentication enabled WCS cluster, with the REST API (for WooCommerce owners who want to use WCS hosting rather than installing Weaviate).

    Could you point me to the documentation? After the cluster was created, I received a link that returns a 404: https://www.semi.technology/developers/weaviate/v1.8.0/configuration/authentication

    I noticed the Enterprise token, but I do not think it is what I need. I suspect https://www.semi.technology/developers/weaviate/current/configuration/authentication.html is only about self-hosted setups.

    opened by eostis 18
  • Docker-compose DB fails: "Could not create SSTable component"

    Weaviate doesn't start with Docker-compose and stays in a failing loop:

    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:30:53,471 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:03,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:03,472 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:03,472 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:13,472 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:13,510 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:13,510 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt because it doesn't exist.
    
    bug docker 
    opened by bobvanluijt 18
  • Can Aggregate depend on Get ?

    I noticed that the Aggregate function does not depend on the Get query. For instance, on the following query, facets are always the same, whatever the 'wpsolr_type' condition.

    {
      results: Get {
        WeaviateWooCommerce(
          limit: 10
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
        ) {
          wpsolr_title
          wpsolr_product_cat_str
        }
      }
      facets: Aggregate {
        WeaviateWooCommerce(groupBy: ["wpsolr_product_cat_str"]) {
          meta {
            count
          }
          groupedBy {
            value
          }
        }
      }
    }
    
    

    I'd like to get facets related to the results instead.

    Is it possible?

    autoclosed 
    opened by eostis 17
  • Suggestion: auto cut-off Explore search results

    Problem & current behaviour

    If the Explore filter in a GraphQL Get query is used, it is unclear how to set and control the certainty (or distance) parameter. This parameter controls which results are returned, but with the current design, the user does not know what value to set it to in order to get optimal results.

    You don't want to see 'bad' results amongst the results, but you don't know beforehand where the cut off point of the certainty is. Additionally, we observed that the user prefers not to see any results if there are no good results at all.

    Proposed solution

    Automatically find a cut-off threshold for which results to show. This threshold can be calculated by e.g. an elbow in distances between each result and the query, or between a cluster of results and the query.
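
A simple version of the elbow idea is to cut at the largest gap between consecutive distances. The Python sketch below is one possible reading of the proposal, not a settled design: the function name and the min_jump parameter (the smallest gap that counts as an elbow) are assumptions.

```python
def auto_cutoff(distances, min_jump=0.1):
    """Return how many results to keep, cutting at the largest gap between distances.

    If no gap reaches min_jump, there is no clear elbow and all results are kept.
    """
    ordered = sorted(distances)
    if not ordered:
        return 0
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps or max(gaps) < min_jump:
        return len(ordered)
    return gaps.index(max(gaps)) + 1


distances = [0.10, 0.12, 0.15, 0.45, 0.47]  # made-up distances to the query
keep = auto_cutoff(distances)
```

This sketch does not cover the case where even the closest result is poor; hiding all results in that situation would need an additional absolute threshold.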

    [Image: IMG_20200508_144440458]

    Questions

    1. Should the user have the option to set whether they want to enable this auto function on their query?
    2. What value to set the (relative) cut-off point to? (=how big should the gap between the points or clusters relatively be?)
    graphql Developer Experience API design & UX DX autoclosed 
    opened by laura-ham 17
  • SUGGESTION: Dump vectors

    Some data scientists might want to leverage the vectorization mechanism in Weaviate to train new models. The result would be similar to /things and /things/{UUID} but with a focus on a matrix to download all objects.

    RESTful URL suggestion: /c11y/vectors and /c11y/vectors/{UUID}.

    c11y/vectors/{UUID}

    returns:

    {
        "type": "thing",
        "vector": [
            0,
            0,
            0,
            0,
            //etc
        ]
    }
    

    c11y/vectors?page=0

    returns:

    {
        "result": {
            "UUID_1": {
                "type": "thing",
                "vector": [ //Array is same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            },
            "UUID_2": {
                "type": "thing",
                "vector": [ //Array is same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            }
        },
        "pages": 100 // total pages
    }
    
    wontfix 
    opened by bobvanluijt 17
  • Push docker container to github registry

    What's being changed:

    This should enable docker image downloads from other repos without docker logins

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
    opened by dirkkul 1
  • Roaring allow list rebased


    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
    opened by antas-marcin 1
  • Support hybrid search in Aggregate


    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.

    Closes #2482

    opened by parkerduckworth 1
  • creationTimeUnix is being updated during upsert

    I’ve noticed that creationTimeUnix is being updated (and is the same as lastUpdateTimeUnix) when I upsert a record (batch write with the same id). I had hoped to determine whether a record had been upserted by comparing those two fields in the batch result.

    I've tested this with two documents: one was already there, the other was being inserted. This was the result:

    [
    {
    //This one should have been updated
    'class':'P_8062f70f_d2c7_4b9b_b0b7_1c587ed49c2d',
    'creationTimeUnix': 1672845088023,
    'id':'cebcaf20-2a4b-5a42-93b9-d662c02928bc',
    'lastUpdateTimeUnix': 1672845088023,
    'properties': {'_creationTimeUnix': 1672845088023, '_lastUpdateTimeUnix': 1672845088023, ...},
    'vector': [...],
    'deprecations':None,
    'result':{}
    },
    {
    //This one should have been inserted
    'class':'P_8062f70f_d2c7_4b9b_b0b7_1c587ed49c2d',
    'creationTimeUnix': 1672845088023,
    'id':'bb323fed-3026-500b-8d61-e686b75aa19e',
    'lastUpdateTimeUnix': 1672845088023,
    'properties': {'_creationTimeUnix': 1672845088023, '_lastUpdateTimeUnix': 1672845088023, ...},
    'vector': [...],
    'deprecations':None,
    'result':{}
    },
    ]
    

    According to @byronvoorbach it should work as expected and the upserted document should have only lastUpdateTimeUnix updated. Currently, there seems to be no direct way to determine how many documents were updated, inserted, upserted, and so on...
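
The comparison the reporter had in mind can be sketched as follows. It assumes the expected behaviour, where an upsert leaves creationTimeUnix untouched; with the output shown in this issue both objects would be classified as inserted. The helper name `classify_batch_result` and the sample timestamps are hypothetical.

```python
def classify_batch_result(objects):
    """Split a batch result into inserted and updated ids.

    Assumes an update keeps creationTimeUnix and only moves
    lastUpdateTimeUnix forward, which is the behaviour the issue expects.
    """
    inserted, updated = [], []
    for obj in objects:
        if obj["creationTimeUnix"] == obj["lastUpdateTimeUnix"]:
            inserted.append(obj["id"])
        else:
            updated.append(obj["id"])
    return inserted, updated


# Hypothetical result under the expected behaviour.
expected_result = [
    {"id": "cebcaf20-2a4b-5a42-93b9-d662c02928bc",
     "creationTimeUnix": 1672800000000, "lastUpdateTimeUnix": 1672845088023},
    {"id": "bb323fed-3026-500b-8d61-e686b75aa19e",
     "creationTimeUnix": 1672845088023, "lastUpdateTimeUnix": 1672845088023},
]

inserted, updated = classify_batch_result(expected_result)
```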

    opened by ju-bezdek 0
  • Improve decoding values in LSM SetDecoder by not maintaining values original order


    Access to the map has been reduced to a minimum

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
    opened by redouan-rhazouani 1
Releases(v1.17.0)
  • v1.17.0(Dec 20, 2022)

    Breaking Changes

    none

    New Features

    Leaderless Replication

    Weaviate v1.17 introduces leaderless replication with tunable consistency. With replication, Weaviate becomes highly available and can be scaled to increase throughput. Imports can be automatically replicated across the cluster. Existing datasets can be scaled up to allow for higher read-throughput and fail-safety.

    To learn more about the architecture behind Weaviate's leaderless replication, how to use replication, what limitations there are, and what the future roadmap looks like, please see the documentation.
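
To give a feel for what "tunable consistency" usually means in leaderless systems, here is a generic sketch of how many replica acknowledgements a request waits for. The level names ONE, QUORUM and ALL are the conventional ones and are an assumption here, as is the helper `required_acks`; see the documentation for what v1.17 actually supports.

```python
def required_acks(replication_factor, level):
    """Number of replicas that must acknowledge a request (generic sketch)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def overlaps(replication_factor, read_level, write_level):
    """True when every read set must intersect every write set (r + w > n)."""
    r = required_acks(replication_factor, read_level)
    w = required_acks(replication_factor, write_level)
    return r + w > replication_factor


quorum_of_three = required_acks(3, "QUORUM")
```

Lower levels trade consistency for availability and latency: with three replicas, QUORUM reads combined with QUORUM writes always overlap, while ONE combined with ONE does not.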

    Contributed by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2344, by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2361, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2362, by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2365, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2371, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2373, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2377, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2380, by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2382, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2384, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2389, by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2395, by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2420, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2418, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2446, by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2447, by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2463, by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2461, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2464, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2399, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2415, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2402, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2417, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2437, by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2426

    Hybrid Search (Dense Vector & BM25F)

    Weaviate v1.17 introduces the ability to perform a hybrid search consisting of a BM25(F) and dense vector search. The results are combined using rank fusion. In addition to Hybrid Search, Weaviate now also supports pure BM25 and BM25F search.

    To learn more about Hybrid Search and BM25 Search, see the documentation.
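
The release note only says the results are "combined using rank fusion". As an illustration of the general idea, the sketch below implements reciprocal rank fusion, one common variant; it is not necessarily the exact formula Weaviate uses. The function name, the constant `k=60` and the sample rankings are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list, best first.

    Each id scores 1 / (k + rank) per list it appears in; scores are summed
    across lists. Ties are broken by id to keep the output deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]    # hypothetical keyword (BM25) result order
vector_ranking = ["b", "c", "d"]  # hypothetical dense vector result order

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Only ranks enter the formula, so the BM25 scores and vector distances never need to be put on a common scale.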

    Contributed by @donomii in https://github.com/semi-technologies/weaviate/pull/2319, by @donomii in https://github.com/semi-technologies/weaviate/pull/2381, by @donomii in https://github.com/semi-technologies/weaviate/pull/2450, by @donomii in https://github.com/semi-technologies/weaviate/pull/2467, by @aliszka in https://github.com/semi-technologies/weaviate/pull/2469, by @aliszka in https://github.com/semi-technologies/weaviate/pull/2468, by @donomii in https://github.com/semi-technologies/weaviate/pull/2466, by @donomii in https://github.com/semi-technologies/weaviate/pull/2391

    Other Features

    • Dynamically Add Nodes To Running Cluster after data has been imported by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2350
    • Make startup cluster schema sync backward-compatible by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2413
    • Add TTLs to cluster-wide transactions (Schema, Classifications) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2364
    • Adjust Memtable Size Dynamically by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2425

    Fixes

    Performance

    • Improve Startup Time by persisting more LSM Segment Information by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2385
    • Improve p9999 Batch Import Latency by precomputing segment metadata for newly compacted segments by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2421
    • Improve HNSW startup time by improving allocation efficiency in Deserializer by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2392

    UX

    • UX: No error message on empty schema by @trengrj in https://github.com/semi-technologies/weaviate/pull/2345
    • Updated contributor guide by @databyjp in https://github.com/semi-technologies/weaviate/pull/2353
    • Update README.md by @erika-cardenas in https://github.com/semi-technologies/weaviate/pull/2354
    • Fix "schema out of sync" error when upgrading to v1.17 by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2419
    • nit: Improve GitHub Community Standards Score by @MarcusSorealheis in https://github.com/semi-technologies/weaviate/pull/2409

    Other / Internal / Security

    • Bugfix: created and last updated timestamps missing from reference additional fields by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2342
    • Fix typos by @trengrj in https://github.com/semi-technologies/weaviate/pull/2374
    • Add Go-specific Vulnerability Scanner by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2379
    • misc: fix typo in indexTimestamps prop name by @dandv in https://github.com/semi-technologies/weaviate/pull/2396
    • Fix broken logger in tx manager by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2411
    • update go/x/net version by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2427
    • Change references from Travis to GitHub actions in PR summary by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2422
    • Add Codeowners file by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2404
    • Run docker push step on larger machine by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2435
    • Fix file descriptor leak in compaction pre-compute by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2440

    New Contributors

    • @databyjp made their first contribution in https://github.com/semi-technologies/weaviate/pull/2353
    • @erika-cardenas made their first contribution in https://github.com/semi-technologies/weaviate/pull/2354
    • @dandv made their first contribution in https://github.com/semi-technologies/weaviate/pull/2396
    • @MarcusSorealheis made their first contribution in https://github.com/semi-technologies/weaviate/pull/2409

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.9...v1.17.0

    Source code(tar.gz)
    Source code(zip)
  • v1.16.9(Dec 18, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix incorrect object count in nodes API by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2460
    • Fix issue where text2vec-openai ignores model version when vectorizing queries by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2459

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.8...v1.16.9

  • v1.16.8(Dec 16, 2022)

    What's Changed

    • Fix autoschema when adding objects with varying properties by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2444
    • Support new text-embedding-ada-002 model in text2vec-openai module by allowing user to set modelVersion by @kcm in https://github.com/semi-technologies/weaviate/pull/2448

    New Contributors

    • @kcm made their first contribution in https://github.com/semi-technologies/weaviate/pull/2448

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.7...v1.16.8

  • v1.16.7(Dec 15, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix group with nearVector by @trengrj in https://github.com/semi-technologies/weaviate/pull/2423
    • Improve defaults for DefaultCohereModel and DefaultTruncate by @bobvanluijt in https://github.com/semi-technologies/weaviate/pull/2434

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.6...v1.16.7

  • v1.16.6(Dec 6, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Provide alternative to OpenAI deprecation of "Answers" endpoint through qna-openai module by @byronvoorbach in https://github.com/semi-technologies/weaviate/pull/2346
    • Skips re-vectorize objects on PATCH if not necessary by @aliszka in https://github.com/semi-technologies/weaviate/pull/2383
    • Fix potential deadlock by releasing the index lock in case of error by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2401

    Internal Fixes & Refactoring

    • Additional tokenization tests by @aliszka in https://github.com/semi-technologies/weaviate/pull/2339

    New Contributors

    • @byronvoorbach made their first contribution in https://github.com/semi-technologies/weaviate/pull/2346

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.5...v1.16.6

  • v1.16.5(Nov 21, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Update Dependencies to fix known vulnerabilities by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2378

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.4...v1.16.5

  • v1.16.4(Nov 18, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix length and null state filtering for empty arrays by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2367
    • Prevent IndexNullState and IndexPropertyLength to be updated by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2368
    • Remove obsolete invertedIndexConfig.cleanupIntervalSeconds by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2376

    Internal Fixes

    • Self-Service Docker Images by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2369
    • adjusts batch size to decrease test execution time by @aliszka in https://github.com/semi-technologies/weaviate/pull/2372

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.3...v1.16.4

  • v1.16.3(Nov 15, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix another potential SEGFAULT issue during compactions by @aliszka in https://github.com/semi-technologies/weaviate/pull/2363

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.2...v1.16.3

  • v1.16.2(Nov 15, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix SEGFAULT error (a race between reads and compaction) by @aliszka in https://github.com/semi-technologies/weaviate/pull/2349
    • Fix "stuck API" (deadlock situation) by @aliszka in https://github.com/semi-technologies/weaviate/pull/2349

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.1...v1.16.2

  • v1.16.1(Nov 10, 2022)

    This release was made with ❤️ , ☕ , 🍕, and 🍷 in 🇮🇹 . Learn more.

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix filters with len() in path by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2340
    • Fix concurrent Write / Read Performance Regression (introduced in v1.16.0) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2352

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.16.0...v1.16.1

  • v1.16.0(Oct 31, 2022)

    Breaking Changes

    none

    New Features

    Null Property & Prop Length Indexing & Filtering

    It is now possible to efficiently filter for all props that are set, not set, or have a specific length. Note that null-state and prop-length indexing is optional and needs to be activated prior to importing data.

    • Add filtering for null properties by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2209
    • Add property length indexing and filtering by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2236
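
A minimal sketch of how this could look from a client's point of view, written as plain Python dictionaries. The setting names `indexNullState` and `indexPropertyLength` under `invertedIndexConfig`, the `IsNull` operator and the `len(...)` path syntax are given to the best of my knowledge and should be checked against the documentation. The class name, property name and helper functions are hypothetical.

```python
# Class definition that opts in to both indexes before any data is imported.
article_class = {
    "class": "Article",  # hypothetical class name
    "invertedIndexConfig": {
        "indexNullState": True,       # assumed setting name
        "indexPropertyLength": True,  # assumed setting name
    },
    "properties": [{"name": "title", "dataType": ["string"]}],
}


def is_null_filter(prop, is_null=True):
    """Where-filter matching objects whose property is (not) set."""
    return {"path": [prop], "operator": "IsNull", "valueBoolean": is_null}


def length_filter(prop, operator, length):
    """Where-filter comparing the length of a property."""
    return {"path": [f"len({prop})"], "operator": operator, "valueInt": length}


missing_title = is_null_filter("title")
long_title = length_filter("title", "GreaterThan", 20)
```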

    Distributed Backups

    v1.15 introduced one-command backups, but they were limited to single-node setups. v1.16 lifts this limitation, and backups now work for distributed (multi-node) setups as well. Backups created in v1.15 remain compatible.

    • Add new type coordinator to coordinate distributed backup operation among shards by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2231
    • Implement the coordinator selector by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2243
    • Get node name from sharding state by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2241
    • Implement shard's endpoints used to coordinate creation of a distributed backup by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2246
    • Implement endpoints of a shard participating in a coordinated backup restoration by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2250
    • Backup: query status of a shard participating in the creation of a coordinated backup by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2251
    • Implement endpoints for communication between coordinator and shards by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2248
    • Assign backup operations to coordinators (part 1) by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2254
    • Multinode backup integration testing preparation by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2252
    • Implement scheduler’s endpoint for getting the status of a backup request. by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2257
    • Introduce coordinated cluster backup integration tests by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2259
    • Implement scheduler’s endpoint for getting the status of a restoration request by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2261
    • Backup manager get node name from schema manager by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2266
    • Implement scheduler’s endpoint for restoring distributed backups by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2267
    • Bind restful backup API to the scheduler to allow for execution distributed backup operations by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2268
    • Fix acceptance tests for the backup journey by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2270
    • Cluster backup/restore module journey tests by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2273
    • Make distributed backups backward compatible with existing single node backups by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2277
    • Fix restoring empty classes and returning status of a restoration request by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2280
    • Backwards compatibility bug fixes for backup/restore operations by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2286
    • Fix backup restoration race condition by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2289
    • Fix single-node backups to work without needing cluster env vars by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2295
    • Prevent cluster BRO when backend is local filesystem by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2298
    • Cancel restoration when failing to store initial metadata in the specified backend. by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2326

    Modules

    • Introduce the ref2vec-centroid module by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2205
    • Mark all shards as read-only if memory threshold is reached by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2232
    • Change to OpenAI embedding API by @samos123 in https://github.com/semi-technologies/weaviate/pull/2306
    • Create text2vec-cohere module by @DasithEdirisinghe in https://github.com/semi-technologies/weaviate/pull/2323
    • Add support for HuggingFace Inference API by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2328
    • Add support for experimental multilingual-2210-alpha cohere model by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2332

    UX & Operations

    • Add API to view cluster nodes status by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2249
      • Renames nodes status response by @aliszka in https://github.com/semi-technologies/weaviate/pull/2284
    • Added support for OpenID scopes setting by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2275
    • Allow creating class schema with references to itself by @aliszka in https://github.com/semi-technologies/weaviate/pull/2317
    • Add default vector distance metric by @adlerfaulkner in https://github.com/semi-technologies/weaviate/pull/2300

    Fixes

    • Remove certainty deprecation warning by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2244
    • Performance: Reduce allocations when unmarshalling properties by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2269
    • Fix vector_index_size metric by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2321
    • Fix data race by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2288
    • Fix empty references bug by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2320

    Other / Internal / Refactoring / Improved Testing

    • Remove unused code to make it easier to navigate the code base by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2235
    • Fix inverted index test compilation error by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2242
    • Track vector dimensions in inverted index by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2230
    • Fix tracking vector dimensions metric when index gets dropped by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2255
    • Fix for handling stopping of the tracking vector dimensions goroutine by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2258
    • Enable aggregation integration tests with filters by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2265
    • Run PR tests as github actions by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2272
    • Migrate all tests to GH actions by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2287
    • Fixes cyclemanager flaky tests by @aliszka in https://github.com/semi-technologies/weaviate/pull/2294
    • Add a migration option to fill dimensions stored index at startup by @donomii in https://github.com/semi-technologies/weaviate/pull/2264
    • Migrate swagger check to Github action by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2293
    • Add tests for empty references by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2312
    • Adjust deprecations generation script and remove the timestamp information by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2314
    • Adds exhaustive linter by @aliszka in https://github.com/semi-technologies/weaviate/pull/2309
    • Fix syntax in error message by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2315
    • Remove vector_index_dimensions_total metric in favor of using vector_dimensions_sum metric by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2325
    • Fix releasing resources associated with time.Ticker in the code base by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2327

    Fixes since v1.15.0

    Not every user upgrades whenever a patch is released. The following fixes were already included in v1.15.x patch releases. If you haven't upgraded since v1.15.0, these will be new to you:

    • Fix total of 6 bugs related to indexing into LSM Store and HNSW index by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2191
    • Fix potential SEGFAULT on concurrent nested filter and hash bucket compaction by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2212
    • Fix issues with sorting when multiple properties are set by @aliszka in https://github.com/semi-technologies/weaviate/pull/2214
    • Fix date aggregation with multiple shards + numerical median computation by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2164
    • Fix seven aggregations bugs by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2192
    • Mark PointingTo as deprecated/experimental by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2216
    • Set restoration status in the sync part of RestoreBackup endpoint by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2187
    • Bugfix: Deleting a referenced class panics on GQL schema rebuild by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2189
    • Wrap backend error when checking for metadata existence by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2200
    • Fix HuggingFace module class settings by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2201
    • Add support for HuggingFace module error warnings in Weaviate's response by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2202
    • Fix issues around explicit nil values for properties by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2207
    • Set default for vector cache limit to unlimited by @bobvanluijt in https://github.com/semi-technologies/weaviate/pull/2217
    • Fix nil pointer error on empty aggregate group (accidentally introduced in v1.15.1)
    • Fix issue with vector cache limit introduced in v1.15.1 by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2240
    • Fix various aggregation issues for non-array data types by @aliszka in https://github.com/semi-technologies/weaviate/pull/2237
    • Fix race conditions when adding objects and in HNSW by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2253
    • Fix cross-class near object when using a beacon by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2260
    • Fix out-of-range error on old dimensions tracking logic by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2262
    • Fix combining of results with multiple shards by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2263
    • Fix module dependency class configuration support by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2278
    • Add support for all AWS IAM-based authorizations by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2271
    • Fix updating (PATCH) empty arrays by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2285
    • Fix retrieving vector inside cross-reference by @danieldaeschle in https://github.com/semi-technologies/weaviate/pull/2059 and https://github.com/semi-technologies/weaviate/pull/2301
    • Fix /.well-known/openid-configuration path construction by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2307

    New Contributors

    • @samos123 made their first contribution in https://github.com/semi-technologies/weaviate/pull/2306
    • @adlerfaulkner made their first contribution in https://github.com/semi-technologies/weaviate/pull/2300

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.15.5...v1.16.0

  • v1.15.5(Oct 18, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix updating (PATCH) empty arrays by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2285
    • Fix retrieving vector inside cross-reference by @danieldaeschle in https://github.com/semi-technologies/weaviate/pull/2059 and https://github.com/semi-technologies/weaviate/pull/2301
    • Fix /.well-known/openid-configuration path construction by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2307

    New Contributors

    • @danieldaeschle made their first contribution in #2059

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.15.4...v1.15.5

  • v1.15.4(Oct 11, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix various aggregation issues for non-array data types by @aliszka in https://github.com/semi-technologies/weaviate/pull/2237
    • Fix race conditions when adding objects and in HNSW by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2253
    • Fix cross-class near object when using a beacon by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2260
    • Fix out-of-range error on old dimensions tracking logic by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2262
    • Fix combining of results with multiple shards by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2263
    • Fix module dependency class configuration support by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2278
    • Add support for all AWS IAM-based authorizations by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2271

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.15.3...v1.15.4

  • v1.15.3(Sep 28, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix issue with vector cache limit introduced in v1.15.1 by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2240
      v1.15.1 changed the default for the vector cache limit to the maximum value of an int64. This could lead to issues when an implicit int->float conversion happened. Depending on the architecture, this would result in either an error or an overflow, and thus a negative cache limit. This version fixes this by explicitly handling the error and setting the default to a safer value (1 trillion), which effectively acts as unlimited.
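
The arithmetic behind this bug can be reproduced in a few lines. The sketch below is a Python illustration, not Weaviate's Go code: it shows that the largest int64 is not exactly representable as a 64-bit float, so a round trip lands one past the maximum and wraps to a negative number when forced back into 64 bits. `SAFE_LIMIT` mirrors the "1 trillion" default mentioned above; the variable names are made up for this example.

```python
import ctypes

INT64_MAX = 2**63 - 1

# A 64-bit float has a 53-bit mantissa, so INT64_MAX rounds up to 2**63.
as_float = float(INT64_MAX)

# Converting back yields a value that no longer fits into an int64.
round_trip = int(as_float)

# ctypes performs no overflow check, so the value wraps around to negative.
wrapped = ctypes.c_int64(round_trip).value

# One trillion survives the same round trip unchanged.
SAFE_LIMIT = 10**12
safe_round_trip = int(float(SAFE_LIMIT))
```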

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.15.2...v1.15.3

  • v1.15.2(Sep 26, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix nil pointer error on empty aggregate group (accidentally introduced in v1.15.1)

    Internal Fixes or refactors

    • Improve weaviate docker image build time by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2219
    • Move hnsw.UserConfig to entities pkg by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2218
    • Add a simple way to determine memory usage percentage by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2220
    • Testmatrix for aggregation - array data types by @aliszka in https://github.com/semi-technologies/weaviate/pull/2234

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.15.1...v1.15.2

  • v1.15.1(Sep 21, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    Indexing & Stability

    • Fix total of 6 bugs related to indexing into LSM Store and HNSW index by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2191
    • Fix potential SEGFAULT on concurrent nested filter and hash bucket compaction by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2212

    Sorting

    • Fix issues with sorting when multiple properties are set by @aliszka in https://github.com/semi-technologies/weaviate/pull/2214

    Aggregation

    • Fix date aggregation with multiple shards + numerical median computation by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2164
    • Fix seven aggregations bugs by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2192
    • Mark PointingTo as deprecated/experimental by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2216

    UX

    • Set restoration status in the sync part of RestoreBackup endpoint by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2187
    • Bugfix: Deleting a referenced class panics on GQL schema rebuild by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2189
    • Wrap backend error when checking for metadata existence by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2200
    • Fix HuggingFace module class settings by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2201
    • Add support for HuggingFace module error warnings in Weaviate's response by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2202
    • Fix issues around explicit nil values for properties by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2207
    • Set default for vector cache limit to unlimited by @bobvanluijt in https://github.com/semi-technologies/weaviate/pull/2217

    Other & Internal

    • Fix github actions for forked PR's by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2170
    • Add script to measure weaviate start time by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2178
    • Eliminate use of ObjectByID method where possible by @donomii in https://github.com/semi-technologies/weaviate/pull/2158
    • Normalize Grafana PromQL expressions in example dashboards by @trengrj in https://github.com/semi-technologies/weaviate/pull/2199
    • Add more unit tests for creating and restoring backups and fix metrics initialisation. by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2204
    • Fix duplicate metrics collector registration attempted panic by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2210
    • Fix sum-transformers module acceptance tests by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2215

    New Contributors

    • @trengrj made their first contribution in https://github.com/semi-technologies/weaviate/pull/2199

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.15.0...v1.15.1

  • v1.15.0(Sep 7, 2022)

    Breaking Changes

    none

    New Features

    Cloud-Native Backups

    Weaviate v1.15 introduces the ability to back up to and restore from any cloud storage, such as (AWS) S3 and GCS. The backup implementation is minimally intrusive and even allows writes while a backup is running. Learn more about how to use & configure backups here.

    • Add Skeleton for backup/snapshot provider by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2022
    • Introduce module types in the module system by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2036
    • REST API definitions of create and restore snapshots endpoints by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2034
    • Introduce modules provider method for getting storage providers by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2041
    • Filesystem backup storage by @aliszka in https://github.com/semi-technologies/weaviate/pull/2042
    • Coordinate snapshotting at LSM bucket level by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2028
    • Coordinate snapshots at HNSW Store level by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2039
    • Add "S3" backup storage provider as Weaviate Module by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2046
    • Support snapshot creation at the index level by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2074
    • Add "Google Cloud Storage" backup storage provider as Weaviate Module by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2069
    • Adjust GCS storage provider to new modules API by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2081
    • Fixes generate code from swagger script by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2082
    • Add restore functionality to storage-filesystem by @donomii in https://github.com/semi-technologies/weaviate/pull/2075
    • Support snapshot releasing at the index level by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2089
    • Add S3 storage to restore-snapshot functionality by @donomii in https://github.com/semi-technologies/weaviate/pull/2079
    • Add restore snapshot functionality to storage GCS module by @donomii in https://github.com/semi-technologies/weaviate/pull/2094
    • Add new methods to SnapshotStorage interface by @aliszka in https://github.com/semi-technologies/weaviate/pull/2098
    • Add className to snapshot, gcs, s3 restore by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2099
    • Add GetMetaStatus / SetMetaStatus to GCS and S3 provider modules by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2101
    • Backup Manager by @aliszka in https://github.com/semi-technologies/weaviate/pull/2097
    • Filesystem storage: implements DestinationPath, GetMetaStatus, SetMetaStatus, and adds className where needed by @donomii in https://github.com/semi-technologies/weaviate/pull/2107
    • Add DestinationPath to s3 and gcs modules by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2106
    • Initiate snapshot creation, get creation status by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2114
    • Restore from snapshot by @donomii in https://github.com/semi-technologies/weaviate/pull/2121
    • Create backup for filesystem, GCS, and S3 by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2120
    • Backup manager unit tests by @aliszka in https://github.com/semi-technologies/weaviate/pull/2130
    • Fix the design of back & restore feature by moving business logic back to the usecases by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2132
    • Require gcs/s3 bucket to exist for snapshot operation by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2131
    • Fix getting snapshotter when index doesn't exist by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2136
    • Restore Snapshots stored in GCS/S3 by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2140
    • Increase snapshot timeout settings and unify module acceptance tests by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2141
    • Add monitoring for backup and restore by @donomii in https://github.com/semi-technologies/weaviate/pull/2137
    • Ignore WAL files and make creating backups and writing objects mutually exclusive. by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2138
    • Decouple backup & restore use cases and endpoints from schema by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2152
    • Use consistent names for backup & restore and git rid of snapshot by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2161
    • Update snapshot meta, populate new meta fields at snapshotter level by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2154
    • Update swagger spec with revised backups API by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2162
    • Fix saving of the snapshot.json file in storage-filesystem module which may lead to an error during status fetch operation by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2166
    • Backup and restore multiple classes in the same request by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2171
    • prevent from corrupted backup files during backup restoration by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2185
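
    As a rough illustration of how a backup covering one or more classes might be requested over REST, the sketch below builds (but does not send) such a request. The endpoint shape (POST /v1/backups/<backend>), the JSON fields id and include, and the helper name newCreateBackupRequest are assumptions made for this example; consult the backup documentation and the swagger spec for the authoritative API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// backupRequest mirrors the assumed JSON body of a backup-creation call.
type backupRequest struct {
	ID      string   `json:"id"`
	Include []string `json:"include,omitempty"`
}

// newCreateBackupRequest (hypothetical helper) builds a POST request to
// <baseURL>/v1/backups/<backend>. The path and body shape are assumptions.
func newCreateBackupRequest(baseURL, backend, id string, classes []string) (*http.Request, error) {
	body, err := json.Marshal(backupRequest{ID: id, Include: classes})
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/v1/backups/%s", baseURL, url.PathEscape(backend))
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The backend name "s3", the backup ID and the class name are example values.
	req, err := newCreateBackupRequest("http://localhost:8080", "s3", "my-first-backup", []string{"Article"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

    Passing the request to http.DefaultClient.Do would send it; the sketch stops short of that so it can run without a Weaviate instance.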

    Go 1.19 (GOMEMLIMIT)

    Weaviate v1.15 is compiled using Go 1.19 which allows the usage of GOMEMLIMIT.

    • Upgrade to Go 1.19 by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2073
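
    GOMEMLIMIT is an environment variable read by the Go runtime (Go 1.19 and later) that sets a soft memory limit, for example GOMEMLIMIT=2GiB. The same limit can be inspected or changed from code through runtime/debug.SetMemoryLimit. The sketch below is a generic Go illustration rather than Weaviate code; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// currentSoftLimit reports the soft memory limit in bytes without changing it.
// A negative argument to debug.SetMemoryLimit leaves the limit untouched.
func currentSoftLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

// setSoftLimit sets the soft memory limit in bytes and returns the previous one.
func setSoftLimit(bytes int64) int64 {
	return debug.SetMemoryLimit(bytes)
}

func main() {
	// Without GOMEMLIMIT set, the runtime starts with an effectively
	// unlimited value.
	fmt.Println("limit at startup:", currentSoftLimit())

	// Programmatic equivalent of starting the process with GOMEMLIMIT=2GiB.
	const twoGiB = 2 << 30
	previous := setSoftLimit(twoGiB)
	fmt.Println("previous:", previous, "now:", currentSoftLimit())
}
```

    The limit is soft: the garbage collector works harder as the heap approaches it, but the runtime does not refuse allocations, so it complements rather than replaces container memory limits.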

    Performance & Memory Improvements

    Weaviate v1.15 contains various performance and memory improvements. Most notably: the introduction of a Red-Black Tree in the LSM Store, which vastly improves the performance of ordered imports, much improved filtered aggregations, and considerably lower memory requirements for the HNSW index:

    • Improve performance of memtable for sequential writes by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2056
    • Reduce memory allocations on filtered aggregation, startup and import by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2068
    • Increase HNSW Lock Performance, Remove Contention by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2057
    • Improve implementation of the visited list (fixes, new memory allocation strategy) by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2023
    • Add thread pooling for batch requests by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2104
    • Small aggregation memory allocation improvements by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2129
    • Improve Memory Footprint of HNSW Connections by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2146
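
    To see why ordered imports are a worst case for a tree that does not rebalance, the sketch below inserts the same keys into a plain binary search tree in two different orders and compares the resulting heights. It is an illustration only and assumes a simple unbalanced BST; it is not Weaviate's memtable code and does not implement a red-black tree. A self-balancing tree such as a red-black tree keeps the height logarithmic regardless of insertion order, which is the property the release note refers to.

```go
package main

import "fmt"

type node struct {
	key         int
	left, right *node
}

// insert adds key to a plain (non-balancing) binary search tree.
func insert(root *node, key int) *node {
	n := &node{key: key}
	if root == nil {
		return n
	}
	cur := root
	for {
		if key < cur.key {
			if cur.left == nil {
				cur.left = n
				return root
			}
			cur = cur.left
		} else {
			if cur.right == nil {
				cur.right = n
				return root
			}
			cur = cur.right
		}
	}
}

// build inserts the keys in the given order and returns the root.
func build(keys []int) *node {
	var root *node
	for _, k := range keys {
		root = insert(root, k)
	}
	return root
}

// height returns the number of nodes on the longest root-to-leaf path.
func height(n *node) int {
	if n == nil {
		return 0
	}
	l, r := height(n.left), height(n.right)
	if l > r {
		return l + 1
	}
	return r + 1
}

// ascending returns the keys 1..n in sorted order, as in an ordered import.
func ascending(n int) []int {
	keys := make([]int, n)
	for i := range keys {
		keys[i] = i + 1
	}
	return keys
}

// midpointOrder returns lo..hi ordered so that each range's midpoint comes
// first, which yields a balanced tree even without rebalancing.
func midpointOrder(lo, hi int, out []int) []int {
	if lo > hi {
		return out
	}
	mid := lo + (hi-lo)/2
	out = append(out, mid)
	out = midpointOrder(lo, mid-1, out)
	return midpointOrder(mid+1, hi, out)
}

func main() {
	const n = 1023
	fmt.Println("ascending inserts, height:", height(build(ascending(n))))
	fmt.Println("midpoint-first inserts, height:", height(build(midpointOrder(1, n, nil))))
}
```

    With ascending keys the tree degenerates into a linked list, so every insert walks all existing nodes; a balanced tree of the same size needs only about log2(n) steps per insert.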

    New Distance Metrics

    • Add "Manhattan distance" as an additional distance metric. by @sky-2002 in https://github.com/semi-technologies/weaviate/pull/1974
    • Add "Hamming distance" as a distance metric by @sky-2002 in https://github.com/semi-technologies/weaviate/pull/2048
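
    For reference, the two metrics can be written down in a few lines. This is a sketch of the textbook definitions, not the implementation from the PRs above: Manhattan distance sums the absolute per-dimension differences, and Hamming distance is taken here to be the number of dimensions in which the two vectors differ.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errLength = errors.New("vectors must have the same length")

// manhattan returns the L1 distance: the sum of absolute per-dimension differences.
func manhattan(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	var sum float64
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum, nil
}

// hamming returns the number of dimensions in which the two vectors differ.
func hamming(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	var diff float64
	for i := range a {
		if a[i] != b[i] {
			diff++
		}
	}
	return diff, nil
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{4, 0, 3}
	m, _ := manhattan(a, b) // |1-4| + |2-0| + |3-3|
	h, _ := hamming(a, b)   // dimensions 0 and 1 differ
	fmt.Println(m, h)       // prints "5 2"
}
```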

    New Modules

    • Create HuggingFace module by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2143
    • Create sum-transformers module by @DasithEdirisinghe in https://github.com/semi-technologies/weaviate/pull/2142
      • Fix linter errors in sum-transformers module and enable linter build for forked repositories by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2167

    New Monitoring Metrics

    • Monitor LSM Memtable Vitals (Current Size, Durations of Operations) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2031
    • Monitor Concurrent Requests (Read, Write) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2033
    • Reduce Allocations of Prometheus Instructions by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2054
    • Include usage dimensions on Get Requests in optional Monitoring by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2139
    • Track vector index dimensions by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2163

    Fixes

    Various Bug Fixes

    Weaviate v1.15 fixes many smaller bugs. Thanks to everyone who reported them and helped us reproduce them.

    • Handle Common Request Errors that would otherwise be shown as panics by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2047
    • Fix a potential deadlock situation when shutting down or deleting a class by @aliszka in https://github.com/semi-technologies/weaviate/pull/2062
    • Fix an issue where async HNSW Cleanup Operations would not be cancelled correctly on Shutdown & Class Delete by @aliszka in https://github.com/semi-technologies/weaviate/pull/2072
    • Fix KNN classification failing with only one of the where filters being set by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2092
    • Fix nearObject together with certainty parameter gives inconsistent results by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2110
    • Fix get/update shard status to include remote shards by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2116
    • Fixes OpenAI module does not support nearText on aggregations by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2113
    • Fix integer overflow in byte operations by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2117
    • Fixes Aggregate queries throw error when grouping over array properties with nearObject filter present by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2119
    • Fix re-computing vector representation when patching a data object by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2127
    • Fix multiple aggregation edge cases by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2150
    • Fix potential nil-pointer panic in HNSW with "partially cleaned up nodes" by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2157
    • Fix numerical aggregation with empty results by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2160

    Refactoring & Developer Productivity

    • Standardize SegmentGroup receiver name by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2025
    • Parallelize test pipelines by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2029
    • Add performance tracking by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2027
    • Improve Dev-Server Script by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2045
    • Add Scripts to automatically execute performance benchmarks on Remote VM by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2067
    • Unify all errorCompounder definitions by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2065
    • Add SIFT and profiling data to dockerignore by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2070
    • Improve benchmark tests by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2051
    • Enable all tests for external PRs by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2077
    • Update golangci-lint to v1.48 and go1.19 on CI by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2080
    • Tombstones cleanup tests flakiness workaround by @aliszka in https://github.com/semi-technologies/weaviate/pull/2083
    • Fix pipelines for external PRs by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2084
    • Add information about an external build to travis pipelines by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2086
    • Adjust cyclemanager test times to get rid of flakiness by @aliszka in https://github.com/semi-technologies/weaviate/pull/2091
    • Fix generate code from swagger script (2) by @aliszka in https://github.com/semi-technologies/weaviate/pull/2093
    • Fix generate code from swagger script (3) by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2096
    • Use go testing temporary folders for testdata by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2122
    • Add gofumpt precommit hook by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2148
    • Add pull request template by @dirkkul in https://github.com/semi-technologies/weaviate/pull/2115
    • Flaky backup manager tests fix by @aliszka in https://github.com/semi-technologies/weaviate/pull/2156

    Improved Code-Level Documentation

    • Add package-level comment for lsmkv package by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2151
    • Document user-facing CRUD methods of lsmkv.Store and Bucket by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2159

    New Contributors

    • @dirkkul made their first contribution in https://github.com/semi-technologies/weaviate/pull/2029
    • @sky-2002 made their first contribution in https://github.com/semi-technologies/weaviate/pull/1974
    • @donomii made their first contribution in https://github.com/semi-technologies/weaviate/pull/2075
    • @DasithEdirisinghe made their first contribution in https://github.com/semi-technologies/weaviate/pull/2142

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.14.1...v1.15.0

  • v1.15.0-alpha1(Aug 26, 2022)

    Preview version (not feature complete) of upcoming v1.15.0-release:

    Contained:

    • Various performance fixes (Aggregations, Ordered Imports)
    • Cloud-native Backup features

    Still to come:

    • API changes in backup feature expected
    • Documentation on new features
    • Client support for new features
    • Multi-node support for the backup feature

    This is a pre-release. Use at your own risk.

  • v1.14.1(Jul 8, 2022)

    Breaking Changes

    None

    Fixes

    • Prevent a potential "startup race" leading to nil-pointer panics by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2021

      Prior to this fix, it was possible for the HNSW index to start its delete cycle before the LSM store was ready. If this ever occurred, the result would be a nil-pointer panic as the HNSW delete cycle would try to access vectors from the LSM store which may not have existed. This fix makes sure that the delete cycle is only initiated as part of the PostStartup() routine which guarantees that the LSM store is fully up and running.

    New Features

    None

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.14.0...v1.14.1

  • v1.14.0(Jul 7, 2022)

    Breaking Changes

    none

    Fixes

    • Fix a critical bug in compaction logic that could in extreme cases lead to data loss

      This bug could occur on setups that have frequent updates or deletes. There was a rare but critical error in the compaction logic that could lead to the compaction operation either corrupting or completely losing data elements.
      • This could result in a variety of symptoms:
        • Filters that should match a specific number of objects matched fewer objects than expected
        • Retrieving an object by its ID would lead to a different result than retrieving the object using a filter on the id property
        • Objects missing completely
        • Filters with limit=1 would not return any results when there should be exactly one element, but increasing the limit would then include the object
        • Filters would return results with null ids
      • Changes:
        • Fixes segment map and collection cursors to always properly set the next offset by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2009
        • Fixes a problem where duplicate ids might have been created during the compaction process by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/2004
    • Improve MTTR

      If Weaviate crashes unexpectedly, the Write-Ahead-Logs (WALs) need to be parsed at startup to aid in recovery and make sure no data loss could have occurred. In extreme situations this recovery could take hours. This fix makes sure that the recovery takes seconds or minutes at most.
      • Improve Object/Inverted Store Startup Time in normal and recovery situations by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1989
      • Flush any memtable that has been idle for a configured threshold by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2001
      • Optimize MTTR for Object and Hash buckets (LSM Store) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2014
    • Fix an issue that would slow down large-scale (25M+) imports

      Prior to this, importing would slow down because of lock contention when frequently growing the index. This became most notable on large-scale setups where the index was already so big that growing it took a noticeable amount of time.
      • Improve HNSW Index Growth Operations, Monitor HNSW inserts & deletes by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1976
    • Fix various situations where Weaviate could panic
      • Panic on hnsw.ReadAddLinks operation by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1960
      • Fix index out of range panic when trying to read links in hnsw index by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1967
      • Fix an issue in the vector cache that could lead to "index out of range" panics by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2007
      • Fixes index out of range panic on parallel reset and cleanup tombstones methods calls by @aliszka in https://github.com/semi-technologies/weaviate/pull/2012
    • Flush hnsw commit log when shutting down shard by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1952
    • Fix batch delete by ref bug, unify whereFilter parsing by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1956
    • Properly shutdown vector index to remove dangling goroutines by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1959
    • Fix where filter ref prop bug by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1961
    • Fix Get query bug which fails to group when filter/sort included by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1966
    • Disable Explore queries for OpenAI module because of conflicting dimensionality across classes by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1965
    • Bugfix: id filter returns empty when obj has no props by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1972
    • Fix aggregate groupBy/nearMedia bug: certainty was ignored by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1975
    • Fix object patch with nil vector failure occurring after restart by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1979
    • Fix an issue where a delete would not free up space in the vector cache by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1997
    • Prevent Caching of Filters in Batch Delete by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1999
    • Fixes data race warnings by @aliszka in https://github.com/semi-technologies/weaviate/pull/1985

    New Features

    • Support for Prometheus Monitoring

      See the Documentation on Monitoring for details. This release adds support for Prometheus-compatible monitoring. The monitoring can be used to assess the general health of the Weaviate setup, as well as to debug query times, etc. Various examples that run a Weaviate+Prometheus+Grafana setup, as well as sample dashboards, are included.

      • Add minimal Prometheus Monitoring (Import time metrics) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1964
      • Improve HNSW Monitoring Capabilities (Operations, Deletes, Cleanup) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1971
      • Improve LSM Store Monitoring (Segment & Compaction Details) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1973
      • Extend Prometheus Monitoring to include Metrics for Startup & Crash Recovery by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1988
      • Add Monitoring for Batch Delete Operations by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2006
      • Monitor number of imported objects by @etiennedi in https://github.com/semi-technologies/weaviate/pull/2005
    • Support Distance Metrics other than cosine

      See the Documentation on Distances for details. Prior to this release, distance metrics other than cosine only had experimental support. This was mainly due to limitations in the API, which made assumptions about the distance metric being cosine. This release officially supports cosine, l2-squared, and dot distances. It also paves the way for adding new metrics in the future. (Contributions welcome.)

      • Support dot product as distance metric by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2015
      • Display distance, score where certainty is displayed, deprecate certainty by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1992
      • Remove score as a user-facing similarity metric by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2013
      • Distance can be specified anywhere certainty can. Certainty to be deprecated by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2011
      • Check for the presence of certainty gql prop, handle according to distance type by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/2020
    • Add REST endpoints that respect the class as a namespace

      Prior to this release, there was ambiguity in the APIs. Many endpoints referred to an object only by its ID. This could lead to confusion if an ID existed in multiple classes. This release introduces new endpoints that respect the class as a namespace. As a result, an operation such as DELETE can no longer have an undesired side effect outside of the target class. The old endpoints are still present for backward compatibility but are considered deprecated. They will be removed in a future version.

      • Extend REST-API with new endpoints to uniquely manipulate data objects of a specific class by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/1969
      • Extend REST-API with a new endpoint to add a cross-reference to a data object of a specific class by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/1994
      • Extend REST-API with a new endpoint to update cross-references of an object of a specific class by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2000
      • Extend REST-API with a new endpoint to delete cross-references of an object of a specific class by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2008
      • Change beacon structure to include class name by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2016
      • Extend HRef with the class name when available by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2018
      • Add warnings for deprecated endpoints and bump the rest-API version to 1.14.0 by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2017
      • Query objects of a specific class using class query parameter by @redouan-rhazouani in https://github.com/semi-technologies/weaviate/pull/2019
    • Add support for aggregating date fields by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1987
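
    The three officially supported metrics named above (cosine, l2-squared, and dot) can be sketched as follows. These are the usual textbook forms and not Weaviate's implementation. One assumption is labeled in the code: the dot "distance" is taken to be the negated dot product, so that, as with the other two metrics, a smaller value means a closer match.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errLength = errors.New("vectors must have the same length")

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// cosineDistance returns 1 - cosine similarity.
func cosineDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	normA, normB := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine distance is undefined for zero vectors")
	}
	return 1 - dot(a, b)/(normA*normB), nil
}

// l2SquaredDistance returns the squared Euclidean distance.
func l2SquaredDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum, nil
}

// dotDistance returns the negated dot product.
// Assumption: negation is the sign convention, so smaller means closer.
func dotDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	return -dot(a, b), nil
}

func main() {
	a, b := []float64{1, 2}, []float64{3, 4}
	c, _ := cosineDistance(a, b)
	l, _ := l2SquaredDistance(a, b)
	d, _ := dotDistance(a, b)
	fmt.Println("cosine:", c, "l2-squared:", l, "dot:", d)
}
```

    Unlike certainty, which presumes a cosine-style bounded score, a raw distance is meaningful for all three metrics, which fits the move from certainty to distance described in this release.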

    Deprecations

    • Start using distance in favor of certainty. Read more here.
    • Stop using endpoints that refer to an object only by its ID without including the class name. These will be removed in the future. Instead, use newer endpoints that include the class name. Read more here.
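
    The difference between the deprecated ID-only form and the class-namespaced form comes down to the URL path. The sketch below assumes the paths /v1/objects/<id> and /v1/objects/<ClassName>/<id>; these paths and the helper name objectURL are illustrative assumptions, so check the REST documentation for the exact routes.

```go
package main

import (
	"fmt"
	"net/url"
)

// objectURL (hypothetical helper) builds the URL of a single data object.
// With an empty className it falls back to the deprecated ID-only form.
// Both path shapes are assumptions made for this illustration.
func objectURL(baseURL, className, id string) string {
	if className == "" {
		// Deprecated: the ID alone may be ambiguous across classes.
		return fmt.Sprintf("%s/v1/objects/%s", baseURL, url.PathEscape(id))
	}
	// Preferred: the class name acts as a namespace for the ID.
	return fmt.Sprintf("%s/v1/objects/%s/%s", baseURL, url.PathEscape(className), url.PathEscape(id))
}

func main() {
	// The class name and UUID are example values.
	const id = "36ddd591-2dee-4e7e-a3cc-eb86d30a4303"
	fmt.Println(objectURL("http://localhost:8080", "Article", id))
	fmt.Println(objectURL("http://localhost:8080", "", id))
}
```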

    New Contributors

    • @redouan-rhazouani made their first contribution in https://github.com/semi-technologies/weaviate/pull/1969

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.13.2...v1.14.0

  • v1.13.2(May 20, 2022)

    Breaking Changes

    none

    New Features (Preview)

    • L2 distance by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1953

      This is in a preview/experimental state and will be fully supported in v1.14.0.

    Fixes

    • Bugfix: Patching object without vector by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1941
    • WVT-91: node values not changed on flatten operation by @aliszka in https://github.com/semi-technologies/weaviate/pull/1948
    • Panic on filtered vector search with flat-search-cutoff by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1945
    • Fix ReadDeleteNode method in deserializer by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1950
    • Fix for multi shard unlimited vector search by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1955
    • WVT-31: missing return on equal keys for setTombstone by @aliszka in https://github.com/semi-technologies/weaviate/pull/1954

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.13.1...v1.13.2

  • v1.13.1(May 3, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fix HNSW Delete performance degradation on concurrent deletes in #1942 by @etiennedi

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.13.0...v1.13.1

  • v1.13.0(May 3, 2022)

    Breaking Changes

    none

    New Features

    • Faceted Search / Aggregate + near<Media> by @antas-marcin, @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1790

      This release allows combining a vector search (using nearVector, nearObject, nearText, etc.) with an Aggregation. This allows for faceted vector search. For such an Aggregation to work, the vector search needs to limit the search space. This can happen either by specifying an explicit limit or by specifying a desired target certainty/distance.

    • Sorting by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1886, https://github.com/semi-technologies/weaviate/pull/1924

      This release adds the ability to sort search results. Sorting does not currently make use of a columnar storage mechanism specified for this property and instead needs to read parts of the affected objects from disk. This has a performance impact on very large datasets, and an improved solution is expected to follow later on.

    • Support filtering by timestamp by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1930

      Prior to this release, timestamps such as creationTimeUnix and lastUpdateTimeUnix could not be used in filters. This release adds the ability to optionally include those fields in the inverted index. If included, they can be used in filters using a special underscore (_) notation, e.g. path: ["_creationTimeUnix"].

    • Batch Delete by Filter by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1935

      This release adds a new /v1/batch endpoint which allows for deleting all objects that match a specific filter.

    • Support DPR transformers models in text2vec-transformers by @aliszka in https://github.com/semi-technologies/weaviate/pull/1911

      These models differ from "regular" transformers models in that they use two separate models for encoding the query and the passage instead of using the same model for both. This two-model configuration is now supported.
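
    To make the batch delete by filter described above more concrete, the sketch below assembles a JSON body that selects objects of one class with a where filter. Everything about the wire format here is an assumption for illustration: the field names (match, class, where, path, operator, valueString, dryRun), the class Article, and the property title are not taken from this changelog, and the exact endpoint under /v1/batch should be looked up in the REST documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// whereFilter is an assumed representation of a single filter condition.
type whereFilter struct {
	Path        []string `json:"path"`
	Operator    string   `json:"operator"`
	ValueString string   `json:"valueString,omitempty"`
}

// match names the class and the filter that selects the objects to delete.
type match struct {
	Class string      `json:"class"`
	Where whereFilter `json:"where"`
}

// batchDeleteBody is the assumed request body of a batch delete call.
type batchDeleteBody struct {
	Match  match `json:"match"`
	DryRun bool  `json:"dryRun"`
}

// newBatchDeleteBody (hypothetical helper) serializes the request body.
func newBatchDeleteBody(class string, where whereFilter, dryRun bool) ([]byte, error) {
	return json.Marshal(batchDeleteBody{
		Match:  match{Class: class, Where: where},
		DryRun: dryRun,
	})
}

func main() {
	// Hypothetical filter: all Article objects whose title equals "draft".
	filter := whereFilter{Path: []string{"title"}, Operator: "Equal", ValueString: "draft"}
	// A dry run only reports what would be deleted, which is a safe first step.
	body, err := newBatchDeleteBody("Article", filter, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

    The same filter shape would apply to the timestamp fields mentioned above by using a path such as ["_creationTimeUnix"], provided those fields were included in the inverted index.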

    Fixes

    • Reduce allocations for map compaction: Overall 264 GB -> 126 GB by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1909
    • Fix flaky aggregate acceptance test by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1925
    • Updated DocumentationHref to redirect to correct links by @Asmit2952 in https://github.com/semi-technologies/weaviate/pull/1923
    • Grammatically updated readme file by @omarzain27 in https://github.com/semi-technologies/weaviate/pull/1858

    New Contributors

    • @Asmit2952 made their first contribution in https://github.com/semi-technologies/weaviate/pull/1923
    • @omarzain27 made their first contribution in https://github.com/semi-technologies/weaviate/pull/1858

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.12.2...v1.13.0

  • v1.12.2(Apr 13, 2022)

    Breaking Changes

    none

    New Features

    none

    Bug Fixes

    • Bugfix for #1903 (LSM crash recovery journey for "Map" type) by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1904 Prior to this release a crash or other unexpected interruption may have left the inverted index in an unrecoverable state. It would return a panic on startup after a crash.

    • Fix limiting unlimited vector search by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1906 A new bug was introduced in v1.12.0 where - if both a limit and certainty were set on a vector search - the limit may have been ignored in some cases. This release fixes this and increases test coverage around this area to prevent further issues.

    • Modules: init dependencies logic panics on specific module init order by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1902 Prior to this release, starting Weaviate with modules that depend on other modules resulted in startup errors in some (rare) cases. This fix makes dependencies between modules more explicit and solves the startup issues.

    • gh-1900 add nil-check on findBestEntryPoint by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1910 A missing nil-pointer check in the HNSW delete logic may have returned errors in rare cases.

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.12.1...v1.12.2

  • v1.12.1(Apr 7, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Index out of range panic #1897 This release fixes an issue that was introduced in v1.12.0 if upgrading from a v1.11.0 or prior. See #1897 for details. If you have run into this issue with v1.12.0, use v1.12.1 instead. If you have imported from scratch into v1.12.0 you should not have been affected by this issue, but upgrading to v1.12.1 is still recommended.
  • v1.12.0(Apr 5, 2022)

    Important: This release may introduce a new bug if you are upgrading from v1.11.0. Please use v1.12.1 instead where this bug has been fixed.

    Breaking Changes

    none

    New Features

    • Index full string field by @aliszka in https://github.com/semi-technologies/weaviate/pull/1862, #1821 This new feature allows turning off tokenization for string fields, so that instead of splitting and indexing at the word boundary, the whole field is indexed. This allows for matching a string including spaces, and avoiding undesired partial string matching, such as returning "light grey" when the search was for "grey".

    • Make Inverted Index stopword lists fully configurable by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1870 This feature introduces a fully configurable stopword list to all inverted-index features. This is in anticipation of BM25 support (and mixed BM25/dense vector search) coming soon, but the feature also applies to exact matches on the inverted index.

    • Unlimited vector search by Certainty by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1883 Prior to this feature, a vector search with a specified certainty might have been cut off too early if the internal limit was reached first. For example, if the search returned exactly 100 results, but the last result was still within the desired certainty range, there may have been more matches that were not returned. This is especially critical when doing a vector-search-based aggregation (coming soon). This feature allows returning all certainty matches, no matter how many. A global maximum can be configured to prevent a query that matches the whole DB from provoking an OOM situation, which would be a potential attack vector.

    • Shard API (Mark shard(s) as read-only) by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1860 This new feature exposes the status of the individual shards over the API and allows for marking a shard as ready that was previously marked as read-only. When a shard is marked read-only all read queries can continue but write queries are prohibited.

    • Feature/periodically scan disk by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1861 This new feature is the first to make use of the new shard-status API. There are two new configurable thresholds for disk pressure. If the disk usage exceeds a certain percentage (e.g. 80%) a warning is printed. If the disk pressure continues to rise and a second threshold (e.g. 90%) is crossed, all shards on that particular node will be automatically marked read-only.
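
    The two-threshold behavior of the disk scan can be sketched as a small decision function. This is an illustration only, not Weaviate's implementation: the type and function names are hypothetical, and the 80% and 90% values are simply the examples given above.

```go
package main

import "fmt"

// DiskAction is a hypothetical result type for one disk scan.
type DiskAction int

const (
	ActionNone     DiskAction = iota // usage is below both thresholds
	ActionWarn                       // print a warning
	ActionReadOnly                   // mark all shards on this node read-only
)

// checkDisk compares the used percentage against the two thresholds.
// "Exceeds" is interpreted as strictly greater than the threshold.
func checkDisk(usedBytes, totalBytes uint64, warnPct, readOnlyPct float64) DiskAction {
	if totalBytes == 0 {
		return ActionNone // nothing to measure
	}
	pct := float64(usedBytes) / float64(totalBytes) * 100
	switch {
	case pct > readOnlyPct:
		return ActionReadOnly
	case pct > warnPct:
		return ActionWarn
	default:
		return ActionNone
	}
}

func main() {
	for _, used := range []uint64{50, 85, 95} {
		fmt.Println(used, checkDisk(used, 100, 80, 90))
	}
}
```

    Checking the read-only threshold first matters: once usage is above the higher threshold it is also above the lower one, and the stricter action should win.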

    Fixes

    • Improve import performance on many-core machines by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1879 tl;dr: With this improvement, we have been able to see 20% faster imports on machines with many cores (e.g. 60 cores) while reducing memory spikes. The long version: Please see #1879 for what changes were made internally. The main changes are limiting import workers to the number of available CPU cores and reducing the need for locking by copying more memory to a local thread.

    • Fix HNSW commit log issue where the index would be too large after restart or crash by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1871, #1868 The compaction process for the HNSW index commit logs was losing some information, leading to a situation where the links inside the HNSW graph were appended indefinitely instead of being replaced. This led to massive index sizes after restarts that degraded performance and led to unnecessarily large memory usage. This fix makes sure that all information is propagated correctly and indices are identical whether initially built in memory or rebuilt from commit logs that were individually compacted.

    • Fix broken dynamic ef calculation by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1880, #1878 Version v1.9.0 introduced more control over setting ef at runtime. However, it did not work as expected. This commit fixes the values. Those who have never touched the ef setting and were using small limits will see an improvement in vector search quality with all default ef parameters due to this fix. If you had manually set ef already, this fix has no effect on you.

    • The following internal/non-user-facing fixes were made either to improve reliability or to improve the DX for Weaviate contributors (fix flaky tests, etc):

      • (internal) Upgrade and fix golangci lint errors by @aliszka in https://github.com/semi-technologies/weaviate/pull/1864
      • (internal) Build tag regex pattern update by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1869
      • (internal) gh-1872 clean up disk segment list properly on shutdown by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1873
      • (internal) WEAVIATE-62 Remove obsolete hnsw files by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1875
      • (internal) gh-1868 fix broken locks in commit logger, make hnsw clean up after themselves by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1877
      • (internal) gh-1884 WEAVIATE-70 fix flaky multi-shard integration test by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1885
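
    The import-performance fix above caps the number of import workers at the number of available CPU cores. A minimal sketch of that idea, assuming a generic work function rather than Weaviate's actual import path (all names here are hypothetical):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processAll applies work to every item using one worker per CPU core.
// Each worker writes only to its own result slot, so no lock is needed.
func processAll(items []int, work func(int) int) []int {
	results := make([]int, len(items))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = work(items[i])
			}
		}()
	}

	// Feed item indices to the workers, then signal that no more will follow.
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	squares := processAll([]int{1, 2, 3, 4}, func(x int) int { return x * x })
	fmt.Println(squares)
}
```

    Spawning more workers than cores mostly adds scheduling and memory overhead for CPU-bound work, which is consistent with the reduced memory spikes reported above.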

    New Contributors

    • @aliszka made their first contribution in https://github.com/semi-technologies/weaviate/pull/1864

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.11.0...v1.12.0

  • v1.11.0(Mar 14, 2022)

    Changes

    Breaking Changes

    none

    New Features

    • Open AI module: provide API key at query time by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1817 For untrusted server environments, it is not advised to store the third-party API key along with the setup. Instead it should be provided at runtime which is now possible in this release.
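
    A hedged sketch of supplying the key per request instead of storing it with the setup. The header name X-OpenAI-Api-Key, the base URL, the Article class, and the helper function are assumptions for illustration; the example only constructs the request and does not send it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newGraphQLRequest builds a GraphQL POST request that carries the
// third-party API key in a request header (assumed header name).
func newGraphQLRequest(baseURL, query, openAIKey string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/graphql", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-OpenAI-Api-Key", openAIKey)
	return req, nil
}

func main() {
	query := `{ Get { Article(nearText: {concepts: ["vector search"]}, limit: 3) { title } } }`
	req, err := newGraphQLRequest("http://localhost:8080", query, "sk-placeholder")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("key header set:", req.Header.Get("X-OpenAI-Api-Key") != "")
}
```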

    Fixes

    • Improve way objects are counted on disk #1811 Prior to this release a meta { count } in Aggregate would require reading every object from disk. With this change the net additions of each segment are calculated when initializing the segment, and count only has to sum up each segment's value, which is orders of magnitude faster.

    • Internally version index/shard changes more precisely #1833 This release introduces a new internal shard/index versioning system that will allow introducing breaking changes in a non-breaking fashion. For example, new indexes created with v1.11.0 will store the keys of the Map type in the LSM store in an always-sorted fashion for additional performance (see below). Indexes built prior to v1.11.0 will still work with this version as they will simply be sorted at read-time.

    • Delete using filter leaves objects searchable by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1845 #1836 This release fixes an issue where duplicate IDs across classes led to issues on delete. Since DELETE /v1/objects/{id} would previously only delete the first object found, there could be a situation where the specified UUID still existed on another class. With this fix, every object with the specified ID - regardless of class - will be deleted. As part of this investigation we have found out that the current DELETE API is suboptimal: It does not take the class name as a parameter and therefore prevents classes from acting as real namespaces. We will deprecate this API in a future release and will add an alternative as part of the same release.

    • HNSW index fails if the initial insert has doc id > 24999 by @antas-marcin in https://github.com/semi-technologies/weaviate/pull/1851 #1848 With the introduction of importing objects without a vector and then adding a vector later in one of the previous releases, a new buggy situation was created: When more than 25,000 objects without a vector were imported, the next import with a vector would fail as the initial size of the HNSW index was smaller than the doc id and there was no check for the initial insert. This release fixes this.

    • Fix segfault by copying memory by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1843 #1837 Prior to this release, it was possible to run into a SEGFAULT when importing objects and querying the inverted index concurrently. The cause for this was memory that was shared longer than a lock was held in the LSM store. Therefore an LSM compaction could remove old segments while the memory of that segment was still in use, thus leading to a SEGFAULT. Since it is not reasonable to hold such a lock for excessively long times, this was fixed by copying the respective memory on read which effectively makes it read-only and thus thread-safe.

    • Fix create/update timestamp issues in GraphQL and REST by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1847 #1844 There were two issues: (1) the create timestamp would be overwritten on an update, (2) neither the create, nor the update timestamp could be retrieved using GraphQL. This release fixes both.

    • Fix LSM net additions count and add WAL threshold by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1855 #1830 Prior to this release very frequent object updates, such as during a POST /v1/batch/references request, would lead to very frequent flushing of the memtable, leading to a lot of small segments. Initializing and merging those unnecessarily small segments cost a lot of time later on during large imports. This was caused by the memtable size assuming each write was an addition. Thus any replaced value would count into the flush size counter, leading to a premature flush. This release fixes this behavior by considering the net additions of a write. In addition, a new threshold is introduced to make sure that update-only requests, which don't increase the memtable size counter, do not lead to excessively large WALs.

    • Improve how inverted index is stored on disk #1832 For performance reasons keys of the LSM store type Map will now be stored in an always-sorted fashion. This allows for faster merging and scoring at runtime. This change only affects new indices built from this version on and is non-breaking to older versions due to #1833.

    • Sort memtable KV pairs on read by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1853 #1852 Fixes an unreleased performance regression that would have been introduced by #1832

    • Fix nil-pointer in segment cursor by @etiennedi in https://github.com/semi-technologies/weaviate/pull/1859 #1850 Prior to this fix, it was possible to run into errors when listing objects while importing them. The cause for this bug was an incorrectly placed lock that was obtained slightly too late, thus allowing for the possibility of a race to occur. This release fixes the issue by placing the lock correctly.

    • Fix typos/documentation

      • Fix deprecation log field typo by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1823
      • Restore build tags that travisbot removed by @parkerduckworth in https://github.com/semi-technologies/weaviate/pull/1857
      • Fixed typo in error message by @illagrenan in https://github.com/semi-technologies/weaviate/pull/1818
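
    The object-count change in the Fixes list above (#1811) replaces a full scan with a sum over per-segment values. A simplified sketch of that idea, using a hypothetical segment type rather than the real LSM store structures:

```go
package main

import "fmt"

// segment is a hypothetical stand-in for an on-disk LSM segment.
type segment struct {
	// netAdditions is computed once, when the segment is initialized.
	netAdditions int
}

// newSegment records how many objects the segment adds on balance.
// In this simplified model, deletions cancel out earlier additions.
func newSegment(added, deleted int) segment {
	return segment{netAdditions: added - deleted}
}

// count sums the precomputed values; no object is read from disk.
func count(segments []segment) int {
	total := 0
	for _, s := range segments {
		total += s.netAdditions
	}
	return total
}

func main() {
	segments := []segment{
		newSegment(100, 0), // initial import
		newSegment(20, 5),  // later writes with some deletes
		newSegment(0, 15),  // delete-only segment
	}
	fmt.Println("objects:", count(segments))
}
```

    The cost of counting then depends on the number of segments rather than the number of objects, which is where the speedup comes from.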

    New Contributors

    • @illagrenan made their first contribution in https://github.com/semi-technologies/weaviate/pull/1818
    • @parkerduckworth made their first contribution in https://github.com/semi-technologies/weaviate/pull/1823

    Full Changelog: https://github.com/semi-technologies/weaviate/compare/v1.10.1...v1.10.2

  • v1.10.1(Feb 1, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Fixes a bug where grouping in Get {} would error #1810
      • group was returning an error during a type: merge operation due to lack of information about the object's class which is now mandatory to pass due to updates in the module system introduced in v1.9.x
      • group with type:merge was returning an error when there was a reference requested in a GraphQL query. If there was a match and a reference existed the implementation of the group made incorrect assumptions about the internal type of the id field
  • v1.10.0(Jan 27, 2022)

    Breaking Changes

    none

    New Features

    • Open AI module This module gives you a very convenient way to integrate OpenAI embeddings into Weaviate. The module will act as a vectorizer for both importing documents and vectorizing queries. You can choose from any of the supported Open AI models and at inference time Weaviate will send requests to OpenAI. Requires a valid OpenAI API Key.

      See the Weaviate Open AI Module docs page for full usage instructions.

    • QnA rerank based on answer quality #1779 Prior to this version, QnA would always extract the answer from the top 1 result as determined by the semantic search as part of ask: { question: "foo?" }. Now, multiple answer "candidates" can be taken from the top n results. If one of the lower down results has a better qna-specific score than a previous result, this result is pushed up in the results. To enable, set ask: { rerank: true }. Note that this feature also removes the limitation that answer extraction would only happen on the first result (#1791). Be aware that a high limit value will lead to a high number of QnA inference calls.

    • HNSW EF boundaries & better control #1789 This feature gives you more control over the HNSW query time ef parameter and prevents it from dropping too low. Prior to this feature result quality could degrade on requests with low limits if ef was set to -1 which is the default and means "let Weaviate pick". You can now set a lower boundary (dynamicEfMin) which defaults to 100 and an upper boundary (dynamicEfMax) which defaults to 500. You can also alter the factor (dynamicEfFactor) which defaults to 8 and controls how ef is automatically derived from limit.

    • HEAD /v1/objects/{id} #1784 You can now send a HEAD request instead of a GET request to /v1/objects/{id} to efficiently check if an object exists without having to load the entire object from disk and unmarshal all its properties. The response has no body and returns either 204 when the object exists or 404 when it doesn't.

    • Allow adding objects without a vector #1800 Prior to this version a class would either skip vector indexing and never have a vector or it would allow indexing, but then a vector was required. Now you can import objects without a vector and later update them to include a vector.

    • Manually overwrite vector - even when vectorizer module is present #1801 Prior to this version, you would either use a vectorizer or none. But if you decided to go for a vectorizer you could not manually influence the vector. With this new feature, you can now override it. You still have to make sure that your vector is compatible with both its form (same dims), as well as semantics (i.e. matching vector space).
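
    The HNSW ef boundaries described above can be illustrated with a short sketch. It assumes that ef is derived as limit multiplied by dynamicEfFactor and then bounded by dynamicEfMin and dynamicEfMax; the final step, which keeps ef from falling below limit, is an additional assumption. The function is hypothetical and not taken from the Weaviate code base.

```go
package main

import "fmt"

// dynamicEf returns the ef used at query time.
// An ef of -1 means "let Weaviate pick"; any other value is used as-is.
func dynamicEf(limit, ef, efMin, efMax, efFactor int) int {
	if ef != -1 {
		return ef // explicitly configured
	}
	derived := limit * efFactor
	if derived > efMax {
		derived = efMax
	}
	if derived < efMin {
		derived = efMin
	}
	// Assumption: ef should not be smaller than the number of requested results.
	if derived < limit {
		derived = limit
	}
	return derived
}

func main() {
	// Defaults named above: dynamicEfMin=100, dynamicEfMax=500, dynamicEfFactor=8.
	for _, limit := range []int{5, 20, 100} {
		fmt.Printf("limit=%d ef=%d\n", limit, dynamicEf(limit, -1, 100, 500, 8))
	}
}
```

    With these defaults a query with limit 5 would derive 40 and be raised to the lower boundary of 100, which is the low-limit quality problem the feature is meant to avoid.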

    Fixes

    • Fix a bug where incorrect module defaults would prevent updating HNSW settings #1799

    • Fix unnecessarily strict class name restrictions #1786 The only restrictions are now that a class name starts with an uppercase letter (to distinguish it in ref types from primitive props, which are all lowercase) and that it's GraphQL-compatible. So a class name such as A_super_awesome_cl4ss_9001 is now valid.
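
    The relaxed class name rule can be approximated with a regular expression. This sketch assumes that "GraphQL-compatible" means the name consists only of letters, digits, and underscores; the pattern and function name are illustrative and not taken from Weaviate's validation code.

```go
package main

import (
	"fmt"
	"regexp"
)

// classNameRE requires an uppercase first letter followed by any number of
// letters, digits, or underscores (the characters allowed in GraphQL names).
var classNameRE = regexp.MustCompile(`^[A-Z][_0-9A-Za-z]*$`)

// validClassName reports whether name satisfies the assumed rule.
func validClassName(name string) bool {
	return classNameRE.MatchString(name)
}

func main() {
	for _, name := range []string{"A_super_awesome_cl4ss_9001", "article", "My-Class"} {
		fmt.Printf("%-28s valid=%v\n", name, validClassName(name))
	}
}
```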

  • v1.9.1(Jan 19, 2022)

    Breaking Changes

    none

    New Features

    none

    Fixes

    • Allow running "conflicting" modules in the same setup (#1744)

      Prior to this release, Weaviate would not start up if multiple modules tried to provide the same search operators, such as nearText. For example, text2vec-contextionary and text2vec-transformers could not run in the same setup. The reason for this was that Explore{}, which searches across classes, would not be able to handle incompatible vector spaces. This release makes sure that the provided search operator belongs to the configured vectorizer. In turn, cross-class searching across incompatible vector spaces, such as using Explore {}, will be deactivated if conflicting modules are present.

    • Grouping by ref prop leads to error (#1778)

      Thanks to Alex Cannan for discovering this

    • Delete fails on multi-node setup (#1780)

      Thanks to ayoub louati for discovering this.

    • Bug: Querying Date attributes fail when sharding (#1775)

      Thanks to @zoltan-fedor for discovering this

    • where filter with 2 Anded Like clauses and a nearText filter causes weaviate to panic. (#1772)

      Thanks to @StefanBogdan for discovering this

    • PATCH (merge) fails on multi-node cluster (#1781)

      Thanks to @zoltan-fedor for discovering this

    • Bug: Chained filter finds results when it shouldn't since v1.8.0 (#1770)

      Thanks to Pranav Pawar for discovering this

    • Bug: (another) potential data race in compaction logic (#1762)

    • [Bug] Limit removes viable results when data has been previously deleted (#1765)

      Thanks to @ywchan2005 for your help in investigating this

Owner
SeMI Technologies
SeMI Technologies creates database software like the Weaviate vector search engine