This is a community-editable list of Weaviate Frequently Asked Questions. Contributions are welcome!
How can a Get query search across multiple classes, or without targeting a specific class?
A1: Use an Explore query instead of Get. Explore searches across all classes.
A2: If you have a vector close to your target objects, and the objects in each of those classes have vectors with the same number of dimensions, you can use nearVector to search across classes.
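As a sketch of both answers, the Python below only builds GraphQL query strings and does not contact Weaviate. The class names `Article` and `Author`, their properties, and the helper function names are hypothetical. The sketch assumes every listed class stores vectors with the same number of dimensions as the query vector.

```python
import json


def near_vector_get_query(classes, vector, limit=5):
    """Build one Get query that runs the same nearVector search on several classes.

    `classes` maps a (hypothetical) class name to the properties to return,
    e.g. {"Article": ["title"], "Author": ["name"]}.
    """
    vec = json.dumps(vector)  # a JSON list of numbers is also a valid GraphQL list
    parts = []
    for cls, props in classes.items():
        fields = " ".join(props)
        parts.append(
            f"{cls}(nearVector: {{vector: {vec}}}, limit: {limit}) "
            f"{{ {fields} _additional {{ distance }} }}"
        )
    return "{ Get { " + " ".join(parts) + " } }"


def near_vector_explore_query(vector, limit=5):
    """Build an Explore query, which searches across all classes at once."""
    vec = json.dumps(vector)
    return (
        f"{{ Explore(nearVector: {{vector: {vec}}}, limit: {limit}) "
        f"{{ className beacon distance }} }}"
    )


if __name__ == "__main__":
    print(near_vector_get_query({"Article": ["title"], "Author": ["name"]}, [0.1, 0.2, 0.3]))
    print(near_vector_explore_query([0.1, 0.2, 0.3]))
```

Either string would be sent as the `query` field of a JSON body to Weaviate's `/v1/graphql` endpoint. Unlike Explore, the Get version returns a separate result list per class, so merging those lists by distance is left to the client.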
Does Weaviate support full-text filtering?
Yes, you have multiple options:
- If you need pure filtering, you can use the Equal operator, which matches at the word boundary by default (it can also be configured to match at the field boundary). For example, Equal “Hello world” would match any text that contains both “hello” and “world”. This can be combined with similarity search.
- Weaviate supports BM25 scoring if you also want full-text ranking.
- BM25 scoring can also be combined with vector search by using hybrid search.
One limitation to keep in mind is that Weaviate doesn’t yet have support for stemming, but this is on the roadmap.
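The three options can be sketched as GraphQL query arguments. The Python below only builds query strings; the class `Article`, the property `title`, and the helper names are hypothetical. The `hybrid` example assumes the class has a vectorizer module configured, since no vector is passed in; `alpha` weights the two scores (0 is pure BM25, 1 is pure vector search).

```python
import json


def _s(value):
    """Quote a Python string as a GraphQL string literal (JSON escaping is compatible)."""
    return json.dumps(value)


def equal_filter(prop, text):
    """where-filter argument: Equal on a text property."""
    return "where: {path: [%s], operator: Equal, valueText: %s}" % (_s(prop), _s(text))


def near_vector(vector):
    """Similarity-search argument using a caller-supplied vector."""
    return "nearVector: {vector: %s}" % json.dumps(vector)


def bm25(query, properties=None):
    """BM25 keyword-ranking argument, optionally restricted to some properties."""
    arg = "query: %s" % _s(query)
    if properties:
        arg += ", properties: [%s]" % ", ".join(_s(p) for p in properties)
    return "bm25: {%s}" % arg


def hybrid(query, alpha=0.5):
    """Hybrid argument: blends BM25 and vector scores, weighted by alpha."""
    return "hybrid: {query: %s, alpha: %s}" % (_s(query), alpha)


def get_query(cls, fields, *args):
    """Wrap one or more search arguments into a Get query for a single class."""
    return "{ Get { %s(%s) { %s } } }" % (cls, ", ".join(args), " ".join(fields))


if __name__ == "__main__":
    # Pure filtering combined with similarity search
    print(get_query("Article", ["title"], equal_filter("title", "Hello world"), near_vector([0.1, 0.2])))
    # Full-text ranking only
    print(get_query("Article", ["title"], bm25("hello world", ["title"])))
    # Keyword ranking blended with vector search
    print(get_query("Article", ["title"], hybrid("hello world", 0.75)))
```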
Why does the Equal operator on a text field return objects that merely contain the operand instead of matching it exactly?
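The behavior of the Equal operator depends on the property's tokenization setting. With the default word tokenization, Equal matches any object whose field contains all the words in the operand. For Equal to match only objects in which the valueText is equal to the entire field value, set the property's tokenization to field.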
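To make the difference concrete, here is a hypothetical class definition (shaped like a JSON body for the `/v1/schema` endpoint, assuming a Weaviate version whose text properties accept `field` tokenization), followed by a deliberately simplified model of how the two tokenizations affect Equal. The matching functions are an illustration written for this FAQ, not Weaviate's actual tokenizer or index code.

```python
import re

# Hypothetical class: `title` is matched exactly, `body` word by word.
ARTICLE_CLASS = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"], "tokenization": "field"},
        {"name": "body", "dataType": ["text"], "tokenization": "word"},
    ],
}


def tokenize(value, tokenization):
    """Simplified, ASCII-only model of the two tokenization settings."""
    if tokenization == "field":
        return [value.strip()]  # the whole (trimmed) value is one token
    # "word": lowercase, then split on anything that is not a letter or digit
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def equal_matches(field_value, value_text, tokenization):
    """Model of Equal: every token of valueText must appear among the field's tokens."""
    field_tokens = set(tokenize(field_value, tokenization))
    return all(t in field_tokens for t in tokenize(value_text, tokenization))


if __name__ == "__main__":
    print(equal_matches("Hello world, again", "Hello world", "word"))   # contains both words
    print(equal_matches("Hello world, again", "Hello world", "field"))  # not the entire field
    print(equal_matches("Hello world", "Hello world", "field"))        # exact field match
```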
What is the optimal batch size?
Weaviate parallelizes requests based on the number of CPU cores on the server. For example, if there are 16 CPUs and the batch size is 320, each CPU has a backlog of 20 objects to import per request. It’s a matter of balancing the request time (keep it as short as possible) against making sure Weaviate does not run out of work. A very, very rough rule is to use a batch size of 20 * number_of_cpu_cores. Note that this assumes you provide your own vectors. If there is a vectorization step, which may be a bottleneck depending on the setup, the ideal batch size may be much smaller.
The goal is for a single request to take no more than a few seconds. If a single request takes 10 or 15 seconds, there is no benefit over sending 10 or 15 requests that take 1 second each, but there is a big risk of running into timeouts.
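The rough rule above can be written down as a starting point. This is a minimal sketch with hypothetical helper names; `cpu_cores` should be the core count of the Weaviate server, and `os.cpu_count()` is only a fallback that reports the machine running the script.

```python
import os


def suggested_batch_size(cpu_cores=None, objects_per_core=20):
    """Very rough starting point: about 20 objects per server CPU core."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # fallback: the local machine, not the server
    return objects_per_core * cpu_cores


def batches(objects, batch_size):
    """Yield consecutive slices of at most batch_size objects."""
    for start in range(0, len(objects), batch_size):
        yield objects[start:start + batch_size]


if __name__ == "__main__":
    size = suggested_batch_size(16)  # 16 server cores -> 320 objects per request
    print(size, size // 16)          # 20 objects of backlog per core
    print([len(b) for b in batches(list(range(1000)), size)])
```

Treat the result as an upper bound to start from: if a single batch request takes more than a few seconds (for example because vectorization is involved), lower the batch size rather than raise it.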
How many simultaneous network requests are allowed before requests start to time out?
- Weaviate is written in Go, which is very good at handling large numbers of network requests.
- Realistically, the load each request brings becomes a problem long before the number of requests does.
Let’s say you have 1,000 parallel batches of 100 objects each. That’s 100,000 concurrent imports, which is a much bigger performance drain than having 1,000 requests open.
In other words, requests will time out once Weaviate can no longer handle the import load, not because of network volume or the number of connections.
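Since the import load is what matters, one client-side option is to cap how many batches are in flight at once. This is a minimal sketch using `asyncio.Semaphore`; `send_batch` is a placeholder for whatever coroutine actually posts a batch to Weaviate (for example, to the `/v1/batch/objects` endpoint), and the demo replaces it with a fake sender that only sleeps.

```python
import asyncio


async def import_all(batches, send_batch, max_in_flight=4):
    """Send batches concurrently, but never more than max_in_flight at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def send_one(batch):
        async with semaphore:  # waits here while max_in_flight batches are open
            return await send_batch(batch)

    return await asyncio.gather(*(send_one(b) for b in batches))


async def demo():
    """Fake import of 20 batches of 100 objects, tracking peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_send(batch):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for network and import time
        in_flight -= 1
        return len(batch)

    batch_list = [list(range(100)) for _ in range(20)]
    results = await import_all(batch_list, fake_send, max_in_flight=4)
    return sum(results), peak


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (objects imported, peak batches in flight)
```

Capping concurrency this way keeps the total number of objects being imported at any moment bounded, which is the quantity the answer above identifies as the real cause of timeouts.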
Is the img2vec module included in WCS?
The img2vec module is not offered by default in WCS. Please contact us to set it up.
Does auto-schema work with WCS?
Yes, auto-schema is enabled on Weaviate Cloud Services.
After importing millions of objects, I get a “Weaviate document store is read only” error. Is there a limit to the number of objects Weaviate can store?
No. The error means the node is running low on disk space, so Weaviate has marked its store as read-only. Increase the available disk space, then follow the steps in our docs to clear the read-only status.
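To catch this before it happens, you can monitor disk usage on the volume that holds Weaviate's data. This is a minimal sketch using `shutil.disk_usage`; the data path is whatever your deployment uses, and the 90% threshold is an assumption here. Weaviate's read-only threshold is configurable (recent versions document it as `DISK_USE_READONLY_PERCENTAGE`), so check the docs for your version and match the value.

```python
import shutil


def percent_used(total, used):
    """Share of the disk in use, as a percentage."""
    return 100.0 * used / total


def disk_usage_percent(path):
    """Disk usage of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.total, usage.used)


def near_read_only(path, threshold=90.0):
    """True once usage reaches the (assumed) read-only threshold."""
    return disk_usage_percent(path) >= threshold


if __name__ == "__main__":
    path = "."  # replace with your Weaviate data directory
    print(f"{disk_usage_percent(path):.1f}% used, near read-only: {near_read_only(path)}")
```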
Why does Weaviate appear to produce low-quality search results?
The quality of the search results is determined by the model that produces the vector embeddings. Weaviate only performs the vector search portion of the search. To improve the quality of search results, consider using a different model or creating a custom vectorizer module.
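The toy example below illustrates why: a vector search ranks objects purely by the distance between vectors, so the same documents embedded by two different models can come back in a different order. The two-dimensional vectors are made up for illustration, and the brute-force ranking is a simplification; Weaviate uses an approximate (HNSW) index, with cosine distance as its default metric.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank(query_vector, objects):
    """Brute-force nearest-neighbour ranking of {name: vector} by cosine distance."""
    return sorted(objects, key=lambda name: cosine_distance(query_vector, objects[name]))


# Made-up embeddings of the same three documents from two hypothetical models
model_a = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
model_b = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [0.7, 0.7]}

if __name__ == "__main__":
    query = [1.0, 0.1]
    print(rank(query, model_a))  # order depends only on model_a's vectors
    print(rank(query, model_b))  # same search, different embeddings, different order
```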