Retrieved document score returns 1.0 (100% relevant) when used with LlamaIndex

Description

I am building a RAG pipeline using LlamaIndex and Weaviate as the vector database. I followed the setup guide in the LlamaIndex documentation here: Weaviate Vector Store - LlamaIndex

When performing a query with the query engine, it returns a tuple containing the response and the source nodes. The source nodes contain information such as file metadata and a score.

The raw score is between 0 and 1, where 0 means irrelevant and 1 means an exact match.

Below is an example:

Query Index

query_engine = index.as_query_engine()
response = query_engine.query("Did the author live on Mars?")

source_nodes = response.source_nodes
for node in source_nodes:
    print(node.score)  # It printed 1.0

When querying something that is not found in the document, it is still returned as 100% relevant. This raises a question: how do I know if the retrieved document(s) were actually relevant to me?

I am unsure whether the scores were not returned from Weaviate, or if LlamaIndex couldn’t accept the scores provided and generated its own.

Server Setup Information

  • Weaviate Server Version: 1.24.0
  • Deployment Method: Docker
  • Running with LlamaIndex version 0.10.13.post1

Hi @SoftwearEnginear!

Can you try that with the latest version? I recall some issue with scores that may lead to this.

Also, do you see the same scores when performing the query directly against Weaviate? I am not sure what happens between LlamaIndex and Weaviate, so isolating that would help with debugging.

Thanks!

Hi @DudaNogueira, sorry for my late response.

I am unable to use the latest LlamaIndex version, as it is incompatible with LangChain’s LLM Predict and no response can be generated. I have two things to address below.

Inconsistent Models Used:
After fiddling around, I noticed that it could be because different embedding models were used in LlamaIndex and in the Weaviate database.

I am using the hkunlp/instructor-xl · Hugging Face model in LlamaIndex, while Weaviate only has pre-built images such as sentence-transformers/all-mpnet-base-v2 when using the text2vec-transformers module.

Since my setup is in an air-gapped environment, I won’t be able to use any external API calls. It seems to me that there isn’t a way to use hkunlp/instructor-xl in Weaviate at this moment.

Please advise if I should be using the all-mpnet-base-v2 model in my LlamaIndex pipeline as well.
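If that is the way to go, I assume the LlamaIndex side would be configured roughly like this (just a sketch; the model name could also be a local path in my air-gapped setup):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# use the same model on the LlamaIndex side as the Weaviate text2vec-transformers image
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)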

As for performing the query directly:
I took a look at Queries in detail | Weaviate - Vector Database to learn how to query the database directly, but did not find how to get the certainty score when using client = weaviate.connect_to_local().

Below is my code:

import weaviate
import json

client = weaviate.connect_to_local()
collection_name = "Example"
result = (
    client.query
    .get(collection_name)
    .with_near_text(nearText)
    .with_limit(2)
    .with_additional(["certainty"])
    .with_where(where_filter)
    .do()
)

print(json.dumps(result, indent=4))

Result:

AttributeError: 'WeaviateClient' object has no attribute 'query'

Is there a method to get the score from a query when connected to Weaviate locally? I am aiming to get the cosine similarity score for the retrieved documents and display it to the user.

Hi!

Regarding your code, I believe you are mixing the Python client v3 syntax with the v4 syntax.

Check here in our quickstart the differences:

or here on this migration guide:
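For reference, a rough v4 equivalent of your query would look something like this (just a sketch; the collection name and the filter property are placeholders, and near_text needs a vectorizer module configured on the collection, otherwise you would pass your own vector with near_vector):

import weaviate
from weaviate.classes.query import MetadataQuery, Filter

client = weaviate.connect_to_local()
collection = client.collections.get("Example")  # placeholder collection name

response = collection.query.near_text(
    query="your search text",
    limit=2,
    filters=Filter.by_property("source").equal("example.pdf"),  # placeholder for your where_filter
    return_metadata=MetadataQuery(certainty=True),
)

for o in response.objects:
    print(o.properties, o.metadata.certainty)

client.close()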

Regarding models, Weaviate can support any Hugging Face model with its integration:

When using those frameworks (LlamaIndex or LangChain, for example), they will usually take care of the collection creation, configuration, and generation of the embeddings.
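For example, with LlamaIndex a setup roughly like this (a sketch, assuming the llama-index-vector-stores-weaviate package and a local ./data folder) lets the framework create and populate the collection for you:

import weaviate
from llama_index.core import VectorStoreIndex, StorageContext, SimpleDirectoryReader
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()

# LlamaIndex creates and configures the collection and writes the embeddings it generates locally
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Example")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)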

I have created a recipe here on how to properly use LangChain that allows querying both through LangChain and directly in Weaviate:

Let me know if this helps :slight_smile:


Thank you for the reference links! Could you check the GitHub repo? I found that it has missing files.

I am still confused about using the Hugging Face models in text2vec-huggingface | Weaviate - Vector Database, as it seems to require an API key.

Correct me if I am wrong, but it seems to be connecting to Hugging Face via an API in that configuration. My embedding model is loaded into GPU memory locally using LlamaIndex, and I don't see how that setup can actually access it.

Embed Model loaded with LlamaIndex:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os

EMBED_MODEL_PATH = os.getenv("EMBED_MODEL_PATH")  # get the saved embedding model path
embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL_PATH)

Assuming Python Client v4:

import weaviate
from weaviate.classes.query import MetadataQuery, Move
import os

client = weaviate.connect_to_local(
    headers={
        "X-HuggingFace-Api-Key": "YOUR_HUGGINGFACE_APIKEY",
    }
)

publications = client.collections.get("Publication")

response = publications.query.near_text(
    query="fashion",
    distance=0.6,
    move_to=Move(force=0.85, concepts="haute couture"),
    move_away=Move(force=0.45, concepts="finance"),
    return_metadata=MetadataQuery(distance=True),
    limit=2,
)

for o in response.objects:
    print(o.properties)
    print(o.metadata)

client.close()

Hi @DudaNogueira, just checking in if you saw my reply above. Thank you!

Hi! Sorry for the delay here :grimacing:

We have moved some of the recipes around:

The text2vec-huggingface module will use Hugging Face as a service, so it requires an API call and performs the vectorization in the cloud.
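If you want to keep vectorization fully local (LlamaIndex generating the embeddings on your GPU), one option is to create the collection with no server-side vectorizer and bring your own vectors, roughly like this (a sketch; the collection name and property are placeholders):

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()

client.collections.create(
    name="Example",  # placeholder name
    vectorizer_config=Configure.Vectorizer.none(),  # vectors are supplied by the client, e.g. LlamaIndex
    properties=[Property(name="text", data_type=DataType.TEXT)],
)

client.close()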

From your code, I assume you are creating the collection and configuring the same model you are using to generate the vectors, right?

Also notice that from your query you are getting a distance, so the bigger that number, the farther the object is from your query:
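If you prefer a 0 to 1 similarity, you can also request certainty in the metadata or derive it from the distance; for the default cosine metric the relationship is certainty = 1 - distance / 2. A rough sketch, reusing the publications collection from your snippet:

from weaviate.classes.query import MetadataQuery

response = publications.query.near_text(
    query="fashion",
    limit=2,
    return_metadata=MetadataQuery(distance=True, certainty=True),
)

for o in response.objects:
    # certainty is only defined for the cosine metric: certainty = 1 - distance / 2
    print(o.metadata.distance, o.metadata.certainty)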

I noticed the same recently. I'll let @DudaNogueira and the Weaviate team confirm, though.

TL;DR: switch your LlamaIndex retriever config to hybrid and set alpha to 1. Scores will go back to being between 0 and 1, but they will be slightly different (see why below). I am aiming to open an issue and a fix in LlamaIndex sometime next week.
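In LlamaIndex that configuration would look roughly like this (a sketch; it assumes the index is backed by a WeaviateVectorStore):

query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",  # route the query through Weaviate's hybrid search
    alpha=1.0,  # keep only the vector part of the fused score
    similarity_top_k=2,
)

response = query_engine.query("Did the author live on Mars?")
for node in response.source_nodes:
    print(node.score)  # a fused score between 0 and 1 instead of a constant 1.0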

–

Since the upgrade of the Weaviate client from v3 to v4, LlamaIndex updated their code to support the new client (see here).

While upgrading the client, they also changed how Weaviate is queried:

  • Before the update: they would construct the query builder to contain with_near_vector, which in Weaviate v4 is done through collection.query.near_vector.
  • After the update: whether you ask for vector-only search or hybrid search, it always calls hybrid search (just setting alpha=1 to give you vector-only results when you ask for them). It does this using collection.query.hybrid in Weaviate v4.

In Weaviate:

  • Calling collection.query.near_vector returns scores based on the distance metric you pick (check Weaviate's distance metrics docs). So if your query vector exactly matches the first result's vector you'll get 1.0; if not, you get a score between 0 and 1.
  • Calling collection.query.hybrid (even with alpha=1.0) will use the fusion algorithm to combine the text search scores and the vector search scores. Reading through this blog, you'll notice that the default fusion algorithm, relativeScoreFusion, scales the most similar item to 1 even if the first returned vector is not exactly the same as the query, whereas with near_vector the first returned item can have a score lower than 1. So expect your scores to vary a bit, because fusion is always used; see the sketch after this list comparing the two calls.
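A minimal sketch of the difference with the v4 client (the collection name and the embedding call are placeholders):

from weaviate.classes.query import MetadataQuery

collection = client.collections.get("Example")  # placeholder
query_vector = embed_model.get_query_embedding("Did the author live on Mars?")  # however you embed the query

# pure vector search: returns a distance, which is what LlamaIndex expects
vector_results = collection.query.near_vector(
    near_vector=query_vector,
    limit=2,
    return_metadata=MetadataQuery(distance=True),
)

# hybrid search, even with alpha=1.0: returns a fused score and no distance
hybrid_results = collection.query.hybrid(
    query="Did the author live on Mars?",
    vector=query_vector,
    alpha=1.0,
    limit=2,
    return_metadata=MetadataQuery(score=True),
)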

@DudaNogueira: When using hybrid with alpha=0 or alpha=1, why is Weaviate still performing a fusion? Shouldn't it just return the BM25 or cosine score?

@SoftwearEnginear: Why is it always 1?
→ The LlamaIndex code expects a similarity key named "distance" in the metadata, which is returned when you do collection.query.near_vector but not when you do collection.query.hybrid. Since that metadata is None, it returns 1 all the time.
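Illustratively (a simplification, not the actual LlamaIndex source), the behavior amounts to something like:

def node_similarity(metadata: dict, similarity_key: str = "distance") -> float:
    # hybrid results carry a score but no distance, so this lookup comes back empty
    distance = metadata.get(similarity_key)
    if distance is None:
        return 1.0  # missing distance -> every node is reported as fully similar
    return 1.0 - float(distance)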

The reason is that the format of the return should stay the same. I think it would be more confusing if you suddenly got completely different scores when changing alpha.

I agree the format of the return should stay the same, but the score values when alpha=1 or alpha=0 are a bit confusing. I was expecting to see the same score values as when using pure BM25 or pure vector search (i.e. the same math happening), but since the hybrid method uses the fusion algorithm all the time, the scores change.