Hybrid score between 2 sentences via Weaviate hybrid search

Hi community,
I would like to use a hybrid metric (sparse + dense) to compute the similarity between two sentences, but I am struggling with Weaviate's hybrid search. Is it possible to compute the similarity between two sentences using only where_filter and with_hybrid?
When I do this:

code_content = "CNT16421"
clean_user_def_mem_recall = "La biodiversité, c'est l'ensemble des espèces vivantes qui existent sur la planète et qui vivent ensemble en interdépendance."

where_filter = {
    "path": ["code"],
    "operator": "Equal",
    "valueString": code_content
}

## Use hybrid search to compute similarity score between theoretical and user memory recall definitions
payload = {"text_list": [clean_user_def_mem_recall]}
clean_user_def_mem_recall_embed_list = await asyncio.create_task(post_query_main(api_sent_embed_address_mr, payload))

search = client.query\
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])\
    .with_where(where_filter)\
    .with_additional('score')\
    .with_hybrid(clean_user_def_mem_recall, alpha=0.75, vector=clean_user_def_mem_recall_embed_list[0])\
    .with_limit(1)\
    .do()

I get the following result:
[{'_additional': {'score': '0.016393442'}, 'code': 'CNT16421', 'memoryRecallDefinition': "La biodiversité, c'est l'ensemble des espèces vivantes qui existent sur la planète et qui vivent ensemble en interdépendance."}]

But if I modify the query:

code_content = "CNT16421"
clean_user_def_mem_recall = "C'est la biodiversité quoi."

where_filter = {
    "path": ["code"],
    "operator": "Equal",
    "valueString": code_content
}

## Use hybrid search to compute similarity score between theoretical and user memory recall definitions
payload = {"text_list": [clean_user_def_mem_recall]}
clean_user_def_mem_recall_embed_list = await asyncio.create_task(post_query_main(api_sent_embed_address_mr, payload))

search = client.query\
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])\
    .with_where(where_filter)\
    .with_additional('score')\
    .with_hybrid(clean_user_def_mem_recall, alpha=0.75, vector=clean_user_def_mem_recall_embed_list[0])\
    .with_limit(1)\
    .do()

I still get the exact same score:

[{'_additional': {'score': '0.016393442'}, 'code': 'CNT16421', 'memoryRecallDefinition': "La biodiversité, c'est l'ensemble des espèces vivantes qui existent sur la planète et qui vivent ensemble en interdépendance."}]

In contrast, if I use pure dense search on the two input queries, I naturally get different scores. This query:

code_content = "CNT16421"
clean_user_def_mem_recall = "La biodiversité, c'est l'ensemble des espèces vivantes qui existent sur la planète et qui vivent ensemble en interdépendance."

where_filter = {
    "path": ["code"],
    "operator": "Equal",
    "valueString": code_content
}

## Use pure dense (near-vector) search to compute similarity score between theoretical and user memory recall definitions
payload = {"text_list": [clean_user_def_mem_recall]}
clean_user_def_mem_recall_embed_list = await asyncio.create_task(post_query_main(api_sent_embed_address_mr, payload))

nearVector = {"vector": clean_user_def_mem_recall_embed_list[0]}
search = client.query\
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])\
    .with_where(where_filter)\
    .with_additional('certainty')\
    .with_near_vector(nearVector)\
    .with_limit(1)\
    .do()

gives this result:

[{'_additional': {'certainty': 0.999999612569809}, 'code': 'CNT16421', 'memoryRecallDefinition': "La biodiversité, c'est l'ensemble des espèces vivantes qui existent sur la planète et qui vivent ensemble en interdépendance."}]

whereas this one:

code_content = "CNT16421"
clean_user_def_mem_recall = "C'est la biodiversité quoi."

where_filter = {
    "path": ["code"],
    "operator": "Equal",
    "valueString": code_content
}

## Use pure dense (near-vector) search to compute similarity score between theoretical and user memory recall definitions
payload = {"text_list": [clean_user_def_mem_recall]}
clean_user_def_mem_recall_embed_list = await asyncio.create_task(post_query_main(api_sent_embed_address_mr, payload))

nearVector = {"vector": clean_user_def_mem_recall_embed_list[0]}
search = client.query\
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])\
    .with_where(where_filter)\
    .with_additional('certainty')\
    .with_near_vector(nearVector)\
    .with_limit(1)\
    .do()

gives this result:

[{'_additional': {'certainty': 0.8760087788105011}, 'code': 'CNT16421', 'memoryRecallDefinition': "La biodiversité, c'est l'ensemble des espèces vivantes qui existent sur la planète et qui vivent ensemble en interdépendance."}]

Could anyone advise me on how to adequately compute a hybrid score between two sentences, knowing that the reference sentence is stored in the Weaviate database (located via where_filter) while the second one is the query provided as input to the Weaviate search?

Hi @mattvan83,

(warning! I am making assumptions)

First, the identical hybrid score is actually expected, I believe: hybrid fuses the keyword and vector result lists with reciprocal rank fusion, so the score is derived from each result's rank (roughly 1/(60 + rank)), not from the raw similarity. Your filter matches a single object and you use with_limit(1), so that object is always rank 1 in both lists, and the fused score never changes.
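Here is a quick arithmetic check of that assumption (the constant 60 and the fusion formula are my reading of how hybrid scoring works internally, not something your snippets show):

k = 60          # rank-fusion constant (assumed)
rank = 1        # the single filtered object is rank 1 in both result lists
alpha = 0.75

keyword_score = 1 / (k + rank)
vector_score = 1 / (k + rank)
fused = (1 - alpha) * keyword_score + alpha * vector_score
print(fused)    # ~0.016393442..., matching the '0.016393442' you observed

So in this setup the hybrid score cannot tell you how close the two sentences are. I have a feeling that you don't need hybrid search at all; what you need is: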

  1. where_filter – to find the sentence you want to compare against
  2. near_text – to calculate the similarity between two sentences
import json

where_filter = {
    "path": ["code"],
    "operator": "Equal",
    "valueString": "the code of the stored sentence goes here"  # e.g. "CNT16421"
}

response = (
    client.query
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])
    .with_where(where_filter)
    .with_near_text({
        "concepts": ["the query sentence goes here"]
    })
    .with_limit(1)
    .with_additional(["distance"])
    .do()
)

print(json.dumps(response, indent=2))
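One caveat, since your snippets embed sentences with your own endpoint and pass the vectors in explicitly: with_near_text only works when the class has a vectorizer module configured. If MemoryRecallLeitner has none, the same pattern works with with_near_vector. Here is a minimal sketch, reusing your post_query_main helper and api_sent_embed_address_mr endpoint from above:

## Embed the query sentence with your own service (as in your snippets),
## then compare it against the single object selected by where_filter.
payload = {"text_list": ["the query sentence goes here"]}
query_embed_list = await asyncio.create_task(post_query_main(api_sent_embed_address_mr, payload))

response = (
    client.query
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])
    .with_where(where_filter)
    .with_near_vector({"vector": query_embed_list[0]})
    .with_limit(1)
    .with_additional(["distance"])
    .do()
)

print(json.dumps(response, indent=2))

Either way, with the default cosine metric the returned distance is 1 - cosine similarity (and the certainty from your dense queries is, if I remember correctly, just 1 - distance / 2), so a smaller distance means more similar.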

I hope this helps.
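P.S. If you do end up wanting a hybrid score that reflects similarity rather than rank, newer versions added a relative-score fusion mode that blends the normalized raw scores instead of the ranks, plus an explainScore additional property that breaks the fused score into its keyword and vector parts. A sketch, under the assumption that your server and v3 Python client are recent enough to expose HybridFusion and the fusion_type parameter (please verify against your versions):

from weaviate.gql.get import HybridFusion  # assumed import path for newer v3 clients

response = (
    client.query
    .get("MemoryRecallLeitner", ["code", "memoryRecallDefinition"])
    .with_where(where_filter)
    .with_additional(["score", "explainScore"])
    .with_hybrid(
        "the query sentence goes here",
        alpha=0.75,
        fusion_type=HybridFusion.RELATIVE_SCORE,  # assumed enum member; blends normalized raw scores
    )
    .with_limit(1)
    .do()
)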