Description
We’re currently using Pinecone at our company and would like to expand to an engine that lets us perform hybrid search, since doing this in Pinecone is non-trivial. We’ve looked at OpenSearch’s Neural Search plugin, but it also doesn’t allow pre-filtering with Boolean queries, apparently because of an incompatibility with Lucene.
The way we use our vector database is a single index that contains a lot of different vectors. One example of why we need pre-filtering: our clients use different languages. If a user’s query is in English, for example, we only want to search within the subset of English vectors (i.e., `metadata.lang == "en"`).
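To make the pre- vs post-filtering distinction concrete, here is a toy brute-force sketch in plain Python (no Weaviate involved; `INDEX` and the function names are made up for illustration). Filtering before ranking always yields up to k matching vectors, whereas ranking first and filtering afterwards can drop every English vector if none of them are in the global top k.

```python
import math
from typing import Dict, List

# Toy single "index": each entry has a vector plus a language tag.
INDEX: List[Dict] = [
    {"id": "a", "vec": [1.0, 0.0], "lang": "de"},
    {"id": "b", "vec": [0.9, 0.1], "lang": "de"},
    {"id": "c", "vec": [0.8, 0.2], "lang": "fr"},
    {"id": "d", "vec": [0.1, 0.9], "lang": "en"},
    {"id": "e", "vec": [0.0, 1.0], "lang": "en"},
]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def top_k(query: List[float], items: List[Dict], k: int) -> List[Dict]:
    # Rank by cosine similarity, most similar first.
    return sorted(items, key=lambda it: cosine(query, it["vec"]), reverse=True)[:k]


def pre_filter_search(query: List[float], k: int, lang: str) -> List[Dict]:
    # Filter first, then rank: returns up to k items in the requested language.
    return top_k(query, [it for it in INDEX if it["lang"] == lang], k)


def post_filter_search(query: List[float], k: int, lang: str) -> List[Dict]:
    # Rank first, then filter: matches outside the global top k are lost.
    return [it for it in top_k(query, INDEX, k) if it["lang"] == lang]
```

With query `[1.0, 0.0]` and `k=2`, `pre_filter_search` returns `d` and `e`, while `post_filter_search` returns nothing because the global top 2 (`a` and `b`) are German.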
I thought that Weaviate supported this, but it seems like it doesn’t?
Here’s my setup:
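```python
from weaviate.classes.query import Filter, HybridFusion, MetadataQuery

# weaviate_index is a v4 Collection (e.g. from client.collections.get(...));
# query_text / query_embedding_vector are the user's query and its embedding.

# Both conditions must hold: type == "dummy-type" AND lang == "en".
filters = (
    Filter.by_property("type").equal("dummy-type")
    & Filter.by_property("lang").equal("en")
)

# Pure vector search with the filter: correctly returns no objects.
dense_search_results = weaviate_index.query.near_vector(
    near_vector=query_embedding_vector,
    limit=20,
    return_metadata=MetadataQuery(distance=True),
    filters=filters,
)

# Hybrid search with the same filter: returns objects that don't match it.
hybrid_search_results = weaviate_index.query.hybrid(
    query=query_text,
    vector=query_embedding_vector,
    alpha=0.5,
    limit=20,
    return_metadata=MetadataQuery(score=True),
    filters=filters,
    fusion_type=HybridFusion.RELATIVE_SCORE,
)
```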
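For reference, Weaviate’s docs describe relative score fusion roughly as: min-max normalize the scores of each result set (best becomes 1, worst becomes 0), then combine them weighted by `alpha` for the vector side and `1 - alpha` for the keyword side. The sketch below illustrates that idea; it is not Weaviate’s actual implementation, and the function names are hypothetical. Since the filtered `near_vector` query behaves correctly, one possible (untested) workaround would be to run filtered vector and keyword queries separately and fuse them client-side like this, converting vector distances to higher-is-better scores first (e.g. by negating them).

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale so the best score in this result set is 1.0 and the worst is 0.0.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if all scores are equal, treat every hit as the best.
        return {obj_id: 1.0 for obj_id in scores}
    return {obj_id: (s - lo) / (hi - lo) for obj_id, s in scores.items()}


def relative_score_fusion(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    # Inputs map object id -> score, where higher is better.
    v, kw = min_max(vector_scores), min_max(keyword_scores)
    fused = {}
    for obj_id in set(v) | set(kw):
        # An object missing from one result set contributes 0 from that side.
        fused[obj_id] = alpha * v.get(obj_id, 0.0) + (1 - alpha) * kw.get(obj_id, 0.0)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With `alpha=0.5`, an object that ranks mid-pack on the vector side but first on the keyword side (normalized scores 0.5 and 1.0) ends up at 0.75 and outranks the top vector-only hit at 0.5.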
As you can see, I’m using a type called `"dummy-type"` for testing purposes, so no object should match the filter. The dense search correctly returns no objects (`[]`), but the hybrid search just returns a bunch of different vectors that seem to completely disregard the filtering logic.
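To confirm that the hybrid results really violate the filter, a small helper can compare each returned object’s properties with the expected values. This is a hypothetical helper, not part of the Weaviate client; with the v4 Python client the property dicts could be collected with something like `[o.properties for o in hybrid_search_results.objects]`, assuming `type` and `lang` are among the returned properties.

```python
from typing import Any, Dict, List


def filter_violations(
    results: List[Dict[str, Any]], expected: Dict[str, Any]
) -> List[Dict[str, Any]]:
    # Return every result whose properties don't match all expected values.
    return [
        props
        for props in results
        if any(props.get(key) != value for key, value in expected.items())
    ]


# Example property dicts standing in for returned objects.
sample = [
    {"type": "dummy-type", "lang": "en"},
    {"type": "article", "lang": "en"},
    {"type": "dummy-type", "lang": "de"},
]
print(filter_violations(sample, {"type": "dummy-type", "lang": "en"}))
```

Any non-empty result for the hybrid query (while the same check on the dense query stays empty) would show the filter being ignored rather than merely being looser than expected.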
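Adding post-filtering logic isn’t really an option right now, since making that work reliably doesn’t seem easy either: if the matching objects aren’t among the top results, a post-filter can leave us with far fewer than `limit` hits.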
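A sketch of why post-filtering gets fiddly: the usual mitigation is to over-fetch and retry with a larger limit, but no fixed cap guarantees k matches, and each retry costs another query. `search` here is a hypothetical callable (it could wrap a `near_vector` or `hybrid` call with a given `limit`), and the doubling strategy is just one assumed choice.

```python
from typing import Callable, Dict, List


def post_filter_with_overfetch(
    search: Callable[[int], List[Dict]],
    predicate: Callable[[Dict], bool],
    k: int,
    max_limit: int = 1000,
) -> List[Dict]:
    # search(limit) returns the top `limit` hits, best first.
    limit = k
    while True:
        hits = search(limit)
        kept = [h for h in hits if predicate(h)]
        # Stop once we have k matches, the backend has no more hits, or we hit the cap.
        if len(kept) >= k or len(hits) < limit or limit >= max_limit:
            return kept[:k]
        # Otherwise double the limit and query again.
        limit = min(limit * 2, max_limit)


# Toy ranked corpus where only every tenth hit is English.
corpus = [{"id": i, "lang": "en" if i % 10 == 9 else "de"} for i in range(100)]
is_en = lambda hit: hit["lang"] == "en"
print([h["id"] for h in post_filter_with_overfetch(lambda n: corpus[:n], is_en, k=3)])
```

With a generous cap this finds three English hits after several rounds of queries; with `max_limit=20` it gives up holding only two, which is exactly the unreliability described above.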
Any opinions are appreciated. Thanks!
Server Setup Information
- Weaviate Server Version: 1.30.0
- Deployment Method: Docker
- Multi Node? Number of Running Nodes: 1
- Client Language and Version: Python 3.12
- Multitenancy?: No.