Hybrid search score calculation anomaly

Eunjoo · January 19, 2024, 1:04am

I have a question about how hybrid search scores are calculated.
Chunks that rank near the top in the separate BM25 and vector searches are pushed further down in the hybrid search results.
There are also scores of unknown origin added to the calculated results.
Why are the scores marked in green added?

DudaNogueira · January 24, 2024, 4:48pm

Hi! Sorry for the delay here.

Could you share a Python notebook where we can reproduce this?

That would help a lot when escalating this to the core team.

Thanks!

Eunjoo · January 24, 2024, 11:53pm

Dear sir

I have attached the source below.

Please check it.

# Query text is Korean; roughly "Rainbow Bridge consolation payment (dog, death) [renewal contract]"

# vector
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_near_text({"concepts": ["무지개다리위로금(강아지, 사망)【갱신계약】"]})
    .with_limit(5)
    .with_additional(["id", "distance"])
    .do()
)
response

# bm25
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_bm25(
        query="무지개다리위로금(강아지, 사망)【갱신계약】",
    )
    .with_additional("score")
    .with_limit(5)
    .do()
)
response

# hybrid
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_hybrid(
        query="무지개다리위로금(강아지, 사망)【갱신계약】",
        alpha=0.5,
    )
    .with_limit(5)
    .with_additional(["id", "score", "explainScore"])
    .do()
)
response
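To compare the hybrid output with the BM25 and vector results, it may help to print each result's score next to its explainScore. This is a minimal sketch, assuming the response has the usual `{"data": {"Get": {<class>: [...]}}}` shape returned by the queries above. `summarize_hybrid` is a hypothetical helper, and `sample_response` is made-up placeholder data, not output from this thread.

```python
def summarize_hybrid(response, class_name):
    """Return (id, score, explainScore) tuples from a GraphQL-style response dict."""
    rows = []
    for obj in response["data"]["Get"][class_name]:
        extra = obj["_additional"]
        # float() accepts both numeric and string-encoded scores
        rows.append((extra["id"], float(extra["score"]), extra.get("explainScore", "")))
    return rows


# Placeholder response for illustration only (NOT real data from the thread)
sample_response = {
    "data": {
        "Get": {
            "KB_MD_1000_0115": [
                {"subject": "s1", "text": "t1",
                 "_additional": {"id": "id-1", "score": "0.0164",
                                 "explainScore": "(placeholder explanation)"}},
                {"subject": "s2", "text": "t2",
                 "_additional": {"id": "id-2", "score": "0.0083",
                                 "explainScore": "(placeholder explanation)"}},
            ]
        }
    }
}

for obj_id, score, explanation in summarize_hybrid(sample_response, "KB_MD_1000_0115"):
    print(obj_id, score, explanation)
```

With a real `response`, the same call lists which search contributed to each hybrid score.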

class_obj = {
    "class": "KB_MD_1000_0115",
    "description": "KB document",
    "properties": [
        {
            "dataType": ["text"],
            "description": "Content",
            "name": "text",
            "indexFilterable": True,
            "indexSearchable": True,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "dataType": ["text"],
            "description": "subject",
            "name": "subject",
            "indexFilterable": True,
            "indexSearchable": True,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "dataType": ["text"],
            "description": "Document source",
            "name": "source",
            "indexFilterable": True,
            "indexSearchable": False,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False
                }
            }
        }
    ],
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text",
            "vectorizeClassName": False
        }
    }
}

weaviate_client.schema.create_class(class_obj)

Thanks

DudaNogueira · January 30, 2024, 2:10pm

Hi! That is still hard to reproduce, as we do not have the data or know how it was ingested.

Can you produce a Python notebook? Those have proven to be the best way to share this kind of issue.

Please feel free to ping me on our Slack so I can help you with that.

Thanks!
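One possible reading of the reordering, not confirmed anywhere in this thread: if the hybrid results are combined with a rank-based fusion, each chunk's score is the sum of a weighted 1/(rank + constant) term from the BM25 list and another from the vector list. A chunk that appears in both lists can then overtake a chunk that is first in only one of them. The sketch below is plain Python, not Weaviate code. `ranked_fusion`, the constant 60, and the toy ids are all assumptions for illustration.

```python
from collections import defaultdict

RANK_CONSTANT = 60  # assumption: a commonly used reciprocal-rank constant


def ranked_fusion(bm25_ids, vector_ids, alpha=0.5, k=RANK_CONSTANT):
    """Fuse two ranked id lists.

    alpha weights the vector list and (1 - alpha) weights the BM25 list.
    Ranks start at 0, so the top hit of a list contributes weight / k.
    """
    scores = defaultdict(float)
    for rank, doc_id in enumerate(bm25_ids):
        scores[doc_id] += (1 - alpha) / (rank + k)
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] += alpha / (rank + k)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rankings (hypothetical ids): "c" is second in both lists
bm25_ids = ["a", "c", "d"]
vector_ids = ["b", "c", "e"]

fused = ranked_fusion(bm25_ids, vector_ids, alpha=0.5)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Here "c" is only second in each individual list, yet it tops the fused list because it collects a contribution from both searches. The same summing may explain the extra score terms that show up in explainScore.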
