Description
I tried to add the group_by feature to my search code (hybrid, similarity, and keyword search), but I got some weird results:
- I set return_metadata=MetadataQuery(distance=True, score=True, explain_score=True).
- I ran three searches (hybrid, similarity, and keyword) against the same collection with the same query.
- Hybrid search results: no score or explain_score; only distance is returned.
- Similarity search results: no score or explain_score (expected, since similarity search does not produce them); the distance values seem reasonable.
- Keyword search results: no score or explain_score; every distance is 0 (the 0 itself is understandable, since keyword search does not produce a distance).
- When I remove the group_by parameter, all three responses return the metadata correctly.
It seems the group_by feature follows a built-in rule: return only the distance, and if there is no distance, return 0.
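To make the comparison above easier to repeat, here is a minimal sketch of a helper that reports which metadata fields are actually populated in a grouped response. The function name `populated_metadata_fields` and the `SimpleNamespace` stand-in are hypothetical; the only assumption about the client is the shape visible in my printed output, where each group has `objects` and each object has `metadata`.

```python
from types import SimpleNamespace

# The three fields requested through MetadataQuery in this post.
METADATA_FIELDS = ("distance", "score", "explain_score")


def populated_metadata_fields(groups):
    """Map each group name to the sorted metadata fields set on its objects."""
    report = {}
    for name, group in groups.items():
        found = set()
        for obj in group.objects:
            for field in METADATA_FIELDS:
                # getattr with a default: the field may be absent from the
                # metadata class entirely, not just set to None.
                if getattr(obj.metadata, field, None) is not None:
                    found.add(field)
        report[name] = sorted(found)
    return report


# Stand-in data mimicking the grouped output (hypothetical, not a real response).
fake_groups = {
    "6666": SimpleNamespace(
        objects=[SimpleNamespace(metadata=SimpleNamespace(distance=0.59))]
    )
}
print(populated_metadata_fields(fake_groups))
```

With a real query, the same call would presumably be `populated_metadata_fields(hybrid_response.groups)`.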
Here is the relevant code and its output:
....
# groupby setting
groupby_setting = GroupBy(prop="file_id",
                          number_of_groups=5,
                          objects_per_group=1)
# hybrid search
alpha = config.get_setting('search')['revrieval_alpha']
hybrid_response = self.db_instance.child_collection.query.hybrid(
    query=query_content,
    query_properties=["content"],
    target_vector="content_vector",
    return_metadata=MetadataQuery(distance=True, score=True, explain_score=True),
    limit=config.get_setting("search")["revrieval_num"],
    alpha=alpha,
    group_by=groupby_setting,
    filters=condition_filter)
# keyword search
keyword_rsponse = self.db_instance.child_collection.query.bm25(
    query=query_content,
    query_properties=["content"],
    # target_vector="content_vector",
    return_metadata=MetadataQuery(distance=True, score=True, explain_score=True),
    limit=config.get_setting("search")["revrieval_num"],
    # alpha=1-alpha,
    group_by=groupby_setting,
    filters=condition_filter)
# similarity search
similar_response = self.db_instance.child_collection.query.near_text(
    query=query_content,
    # query_properties=["content"],
    target_vector="content_vector",
    return_metadata=MetadataQuery(distance=True, score=True, explain_score=True),
    limit=config.get_setting("search")["revrieval_num"],
    # alpha=alpha,
    group_by=groupby_setting,
    filters=condition_filter)
print('hybrid search')
for key, value in hybrid_response.groups.items():
    print(key)
    print(value)
print('#################')
print('keywords search')
for key, value in keyword_rsponse.groups.items():
    print(key)
    print(value)
print('#################')
print('similarity search')
for key, value in similar_response.groups.items():
    print(key)
    print(value)
print('#################')
Printed output:
hybrid search
6666
Group(name='6666', min_distance=0.5924214124679565, max_distance=0.5924214124679565, number_of_objects=1, objects=[GroupByObject(uuid=_WeaviateUUIDInt('7a455a5f-8287-428c-b3b6-fade1e4395e2'), metadata=GroupByMetadataReturn(distance=0.5924214124679565), properties={'chunk_id': 28, 'parent_uuid': 'f874a9c4-0f81-489e-bd26-2940f3dea768', 'file_id': '6666', 'user_id': 'tangliuzhao', 'chunk_page_number': -1, 'chunk_type': 'text', 'content': '文件的文件名 “他是个伟大的人。'}, references=None, vector={}, collection='Knowledge_child_collection', belongs_to_group='6666')], rerank_score=0.0)
#################
keywords search
6666
Group(name='6666', min_distance=0.0, max_distance=0.0, number_of_objects=1, objects=[GroupByObject(uuid=_WeaviateUUIDInt('7ee72cda-f72b-43b2-a156-1c4477910fa8'), metadata=GroupByMetadataReturn(distance=0.0), properties={'chunk_type': 'text', 'parent_uuid': 'f6b7e7a8-1c3a-4034-913c-58dff065e138', 'file_id': '6666', 'user_id': 'tangliuzhao', 'chunk_page_number': 0, 'chunk_id': 4, 'content': '文件的文件名 - Slide Page: 1 Huang Nan: A versatile talent with multiple fields of development'}, references=None, vector={}, collection='Knowledge_child_collection', belongs_to_group='6666')], rerank_score=0.0)
#################
similarity search
6666
Group(name='6666', min_distance=0.5924214124679565, max_distance=0.5924214124679565, number_of_objects=1, objects=[GroupByObject(uuid=_WeaviateUUIDInt('7a455a5f-8287-428c-b3b6-fade1e4395e2'), metadata=GroupByMetadataReturn(distance=0.5924214124679565), properties={'chunk_id': 28, 'parent_uuid': 'f874a9c4-0f81-489e-bd26-2940f3dea768', 'file_id': '6666', 'user_id': 'tangliuzhao', 'chunk_page_number': -1, 'chunk_type': 'text', 'content': '文件的文件名 “他是个伟大的人。'}, references=None, vector={}, collection='Knowledge_child_collection', belongs_to_group='6666')], rerank_score=0.0)
#################
What I want to know: how can I get group_by results that are based on score in hybrid and keyword search, rather than only on distance?
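One possible workaround, sketched under assumptions: run the hybrid or keyword query without group_by (where the metadata comes back correctly, as noted above), ask for a larger limit, and group on the client side by score. The helper `group_by_score` and the `make` stand-in objects below are hypothetical; the sketch assumes each returned object has a `properties` dict and a `metadata.score` attribute.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_score(objects, prop, number_of_groups, objects_per_group):
    """Group objects by a property and rank groups by their best score."""

    def score(obj):
        # Treat a missing or None score as 0.0 so sorting never fails.
        value = getattr(obj.metadata, "score", None)
        return value if value is not None else 0.0

    groups = defaultdict(list)
    for obj in objects:
        groups[obj.properties[prop]].append(obj)

    # Within each group, put the highest-scoring objects first.
    for members in groups.values():
        members.sort(key=score, reverse=True)

    # Rank the groups by the score of their best member.
    ranked = sorted(groups.items(), key=lambda kv: score(kv[1][0]), reverse=True)
    return {
        name: members[:objects_per_group]
        for name, members in ranked[:number_of_groups]
    }


def make(file_id, score):
    """Build a stand-in object (hypothetical, not a real client type)."""
    return SimpleNamespace(
        properties={"file_id": file_id},
        metadata=SimpleNamespace(score=score),
    )


sample = [make("a", 0.2), make("b", 0.9), make("a", 0.7), make("c", 0.1)]
grouped = group_by_score(sample, "file_id", number_of_groups=2, objects_per_group=1)
for name, members in grouped.items():
    print(name, [m.metadata.score for m in members])
```

With a real response this would presumably be called as `group_by_score(hybrid_response.objects, "file_id", 5, 1)`. The trade-off is that the server-side limit applies before grouping, so the limit may need to be well above `number_of_groups * objects_per_group` to fill every group.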
Server Setup Information
- Weaviate Server Version: 1.31.2
- Deployment Method: docker
- Multi Node? Number of Running Nodes: 1 node
- Client Language and Version: Python client 4.15.2
- Multitenancy?: No