This is a community-editable list of Weaviate Frequently Asked Questions. Contributions are welcome!
A1: Have a look at Explore instead.
A2: If you have a vector close to your target objects, and the objects have vectors with the same number of dimensions, you can use nearVector to search across classes.
Yes, you have multiple options:
- If you need pure filtering, you can use the Equal operator, which matches at the word boundary by default (could also be configured to match at the field boundary). For example, Equal “Hello world” would match any text that contains both “hello” and “world”. This can be combined with similarity search.
- Weaviate supports BM25 scoring, if you also want full-text ranking.
- BM25 scoring can also be combined with vector search by using hybrid search
One limitation to keep in mind is that Weaviate doesn’t yet have support for stemming, but this is on the roadmap.
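To make the word-boundary versus field-boundary behavior concrete, here is a small Python model of the matching rule described above. It is only an illustration, not Weaviate's implementation: the function names are hypothetical, and the tokenizer (lowercase, split on anything that is not a letter or digit) is a simplifying assumption.

```python
import re


def tokenize_word(text):
    """Assumed 'word' tokenization: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-zA-Z]+", text.lower()) if t]


def equal_matches(query, field_value, tokenization="word"):
    """Hypothetical model of an Equal filter on a text property.

    word:  every query token must appear somewhere in the field.
    field: the query must equal the whole field (whitespace trimmed).
    """
    if tokenization == "word":
        field_tokens = set(tokenize_word(field_value))
        return all(token in field_tokens for token in tokenize_word(query))
    if tokenization == "field":
        return query.strip() == field_value.strip()
    raise ValueError(f"unsupported tokenization: {tokenization}")


if __name__ == "__main__":
    # Word boundary: both "hello" and "world" are present, so this matches.
    print(equal_matches("Hello world", "Well, hello there, world!"))
    # No stemming: "worlds" is a different token than "world".
    print(equal_matches("Hello world", "hello worlds"))
    # Field boundary: only an exact whole-field match counts.
    print(equal_matches("Hello world", "Hello world again", "field"))
```

The second call shows the stemming limitation in this model: without stemming, “worlds” and “world” are treated as unrelated tokens.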
The behavior of the Equal operator depends on the tokenization property. For Equal to only match objects in which the valueText is equal to the entire field, set the tokenization to
Weaviate parallelizes requests based on the number of CPU cores the server has. For example, if there are 16 CPUs and the batch size is 320, each CPU has a backlog of 20 objects to import per request. It’s a matter of balancing the request time (keep it as short as possible) against making sure Weaviate does not run out of work. A very rough rule of thumb is to use 20 * number_of_cpu_cores. Note that this assumes you provide your own vectors. If you have a vectorization step, which may be a bottleneck depending on your setup, the ideal batch size may be much smaller.
The goal is for a single request to take no more than a few seconds. If a single request takes 10 or 15 seconds, there is no benefit over sending 10 or 15 requests that take 1 second each, but there is a big risk of running into timeouts.
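As a rough illustration of the rule of thumb above, the following Python sketch computes a starting batch size from the core count and splits a list of objects into batches of that size. The function names and the default of 20 objects per core are only a restatement of the heuristic, not a Weaviate API, and the sketch assumes you bring your own vectors.

```python
def suggested_batch_size(cpu_cores, objects_per_core=20):
    """Rule-of-thumb starting point: 20 * number_of_cpu_cores."""
    if cpu_cores < 1 or objects_per_core < 1:
        raise ValueError("cpu_cores and objects_per_core must be positive")
    return objects_per_core * cpu_cores


def chunked(objects, batch_size):
    """Yield consecutive slices of `objects` with at most `batch_size` items."""
    for start in range(0, len(objects), batch_size):
        yield list(objects[start:start + batch_size])


if __name__ == "__main__":
    size = suggested_batch_size(16)  # 16 cores, as in the example above
    batches = list(chunked(list(range(1000)), size))
    print(size, [len(b) for b in batches])
```

Treat the result as a starting value: measure how long one request takes and lower the batch size if a request runs longer than a few seconds.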
- Go is incredibly good at handling tons and tons of network requests
- Realistically the load that each request brings will be a problem long before the requests themselves will be a problem
Let’s say you have 1,000 parallel batches of 100 objects each. That’s 100,000 concurrent imports, which is a much bigger performance drain than having 1,000 requests open.
In other words, requests will time out once Weaviate can no longer handle the import load, not because of the network volume or the number of connections.
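The arithmetic behind this can be spelled out in a few lines of Python. This is only a back-of-the-envelope sketch with hypothetical function names. The 16-core figure is borrowed from the batch size example earlier in this FAQ, and spreading the work evenly across cores is a simplifying assumption.

```python
def concurrent_objects(parallel_batches, batch_size):
    """Objects in flight when all batches are sent at the same time."""
    return parallel_batches * batch_size


def backlog_per_core(parallel_batches, batch_size, cpu_cores):
    """Objects queued per core, assuming the work is spread evenly."""
    return concurrent_objects(parallel_batches, batch_size) / cpu_cores


if __name__ == "__main__":
    # 1,000 open requests, but 100,000 objects to import.
    print(concurrent_objects(1000, 100))
    # Per-core backlog on an assumed 16-core server.
    print(backlog_per_core(1000, 100, 16))
```

Compared with the backlog of roughly 20 objects per core suggested by the batch size rule of thumb, this is a far larger queue, which is why the import load rather than the request count becomes the limit.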
The img2vec module is not offered by default in WCS. Please contact us to set it up.
After importing millions of objects, I get a “Weaviate document store is read only” error. Is there a limit to the number of objects Weaviate can store?
No. The error means you are running low on disk space. You should increase your disk space and then refer to our docs to remove the error message.
The quality of the search results is determined by the model that produces the vector embeddings. Weaviate only performs the vector search portion of the search. To improve search results quality, consider using a different model, or creating a custom module for vectorization.