Hybrid search score calculation anomaly

I have a question about how scores are calculated in hybrid search.
Chunks that rank high in the BM25 and vector search results are pushed further down in the hybrid search results.
There are also unexplained scores added to the calculated results.
Why are these scores, marked in green, added?

Hi! Sorry for the delay here.

Could you produce a Python notebook with which we can reproduce this?

That would help a lot when escalating this to the core team.

Thanks!

Dear sir,

I have attached the source below.

Please check it.

# vector
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_near_text({"concepts": ["무지개다리위로금(강아지, 사망)【갱신계약】"]})
    .with_limit(5)
    .with_additional(["id", "distance"])
    .do()
)
response

# bm25
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_bm25(
        query="무지개다리위로금(강아지, 사망)【갱신계약】",
    )
    .with_additional("score")
    .with_limit(5)
    .do()
)
response

# hybrid
response = (
    weaviate_client.query
    .get("KB_MD_1000_0115", ["subject", "text"])
    .with_hybrid(
        query="무지개다리위로금(강아지, 사망)【갱신계약】",
        alpha=0.5,
    )
    .with_limit(5)
    .with_additional(["id", "score", "explainScore"])
    .do()
)
response

class_obj = {
    "class": "KB_MD_1000_0115",
    "description": "KB document",
    "properties": [
        {
            "dataType": ["text"],
            "description": "Content",
            "name": "text",
            "indexFilterable": True,
            "indexSearchable": True,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "dataType": ["text"],
            "description": "subject",
            "name": "subject",
            "indexFilterable": True,
            "indexSearchable": True,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "dataType": ["text"],
            "description": "Document source",
            "name": "source",
            "indexFilterable": True,
            "indexSearchable": False,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False
                }
            }
        }
    ],
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text",
            "vectorizeClassName": False
        }
    }
}

weaviate_client.schema.create_class(class_obj)

Thanks

Hi! This is still hard to reproduce, as we do not have the data or know how it was ingested.

Can you produce a Python notebook? Notebooks have proven to be the best way to share this kind of issue.

Please feel free to ping me on our Slack so I can help you with that.

Thanks!
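
In the meantime, here is a minimal sketch of how rank-based fusion can push a top-ranked chunk down in the hybrid results. It is only an illustration under assumptions: the function name ranked_fusion, the constant k=60, the 1-based ranks, and the document IDs are all hypothetical, and the fusion algorithm your Weaviate version actually uses may differ. The point is that a chunk ranked first in only one of the two searches can end up below chunks that appear in both.

```python
from collections import defaultdict


def ranked_fusion(result_lists, weights, k=60):
    """Fuse ranked ID lists: each hit adds weight / (k + rank), with rank starting at 1.

    Hypothetical helper for illustration; not taken from the Weaviate code base.
    """
    fused = defaultdict(float)
    for ids, weight in zip(result_lists, weights):
        for rank, doc_id in enumerate(ids, start=1):
            fused[doc_id] += weight / (k + rank)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


alpha = 0.5                      # same weighting as in the hybrid query above
vector_ids = ["a", "b", "c"]     # made-up vector search ranking
bm25_ids = ["d", "b", "c"]       # made-up BM25 ranking

fused = ranked_fusion([vector_ids, bm25_ids], [alpha, 1 - alpha])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

In this made-up example "a" is first in the vector search and "d" is first in BM25, yet both land below "b" and "c", which appear in both lists and therefore collect a contribution from each search. Comparing the explainScore output of the hybrid query against the two individual rankings should show whether something similar is happening with your data.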