I have a question about how scores are calculated for hybrid search.
Chunks that rank near the top in both the BM25 and the vector search results are pushed further down in the hybrid search results.
The calculated results also include additional scores that I cannot account for.
Why are these scores, marked in green, added?
Hi! Sorry for the delay here.
Could you put together a Python notebook that lets us reproduce this?
That would help a lot when escalating this to the core team.
Thanks!
Hello,
I have attached the source below.
Please take a look.
# vector
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_near_text({"concepts": ["무지개다리위로금(강아지, 사망)【갱신계약】"]})
    .with_limit(5)
    .with_additional(["id", "distance"])
    .do()
)
response
# bm25
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_bm25(
        query="무지개다리위로금(강아지, 사망)【갱신계약】",
    )
    .with_additional("score")
    .with_limit(5)
    .do()
)
response
# hybrid
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_hybrid(
        query="무지개다리위로금(강아지, 사망)【갱신계약】",
        alpha=0.5,
    )
    .with_limit(5)
    .with_additional(["id", "score", "explainScore"])
    .do()
)
response
class_obj = {
    "class": "KB_MD_1000_0115",
    "description": "KB document",
    "properties": [
        {
            "dataType": ["text"],
            "description": "Content",
            "name": "text",
            "indexFilterable": True,
            "indexSearchable": True,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "dataType": ["text"],
            "description": "subject",
            "name": "subject",
            "indexFilterable": True,
            "indexSearchable": True,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "dataType": ["text"],
            "description": "Document source",
            "name": "source",
            "indexFilterable": True,
            "indexSearchable": False,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False
                }
            }
        }
    ],
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text",
            "vectorizeClassName": False
        }
    }
}
weaviate_client.schema.create_class(class_obj)
Thanks
Hi! That is still hard to reproduce, since we have neither the data nor the code used to ingest it.
Can you produce a Python notebook? Notebooks have proven to be the best way to share this kind of issue.
Please feel free to ping me on our Slack so I can help you with that.
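In the meantime, here is a rough sketch of how a rank-based fusion can push a top BM25 or vector result further down in the hybrid results. It is not Weaviate's actual code. It assumes the hybrid query uses ranked fusion, where each result list contributes weight / (rank + 60) per object, ranks start at 0, and alpha weights the vector list. The function name ranked_fusion, the constant k, and the ids are made up for illustration. If this assumption holds for your version, the values in explainScore would be these per-list contributions rather than the raw BM25 score or the vector distance.

```python
from collections import defaultdict


def ranked_fusion(keyword_ids, vector_ids, alpha=0.5, k=60):
    """Fuse two ranked id lists into one list of (id, score), best first.

    Assumption: each list contributes weight / (rank + k) for every object
    it contains, with rank starting at 0. alpha weights the vector list and
    (1 - alpha) weights the keyword (BM25) list.
    """
    scores = defaultdict(float)
    for weight, ids in ((1 - alpha, keyword_ids), (alpha, vector_ids)):
        for rank, obj_id in enumerate(ids):
            scores[obj_id] += weight / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up ids, ordered best first in each individual search.
bm25_top = ["a", "b", "c"]
vector_top = ["d", "b", "a"]

fused = ranked_fusion(bm25_top, vector_top, alpha=0.5)
for obj_id, score in fused:
    print(obj_id, score)
```

With these made-up lists, "d" is first in the vector results but only third after fusion, because "a" and "b" appear in both lists and collect a contribution from each. Only the rank matters in this scheme, so a large BM25 score or a small distance gives no extra advantage.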
Thanks!