Retrieved document score returns 1.0 (100% relevant) when used with LlamaIndex

Description

I am building a RAG pipeline using LlamaIndex and Weaviate as the vector database. I followed the setup guide in the LlamaIndex documentation here: Weaviate Vector Store - LlamaIndex

When performing a query with the query engine, it returns a tuple containing the response and the source nodes. The source nodes contain information such as file metadata and a score.

The raw score is between 0 and 1, where 0 means irrelevant and 1 means an exact match.

Below is an example:

Query Index

query_engine = index.as_query_engine()
response = query_engine.query("Did the author live on Mars?")

source_nodes = response.source_nodes
for node in source_nodes:
    print(node.score)  # It printed 1.0

When querying something that is not found in the document, it is still returned as 100% relevant. This raises a question: how do I know if the retrieved document(s) were actually relevant to me?

I am unsure whether the scores were not returned from Weaviate, or if LlamaIndex couldn’t accept the scores provided and generated its own.

Server Setup Information

  • Weaviate Server Version: 1.24.0
  • Deployment Method: Docker
  • Running with LlamaIndex version 0.10.13.post1

Hi @SoftwearEnginear!

Can you try that with the latest version? I recall some issue with scores that may lead to this.

Also, do you see the same scores when performing the query directly against Weaviate? I am not sure what happens between LlamaIndex and Weaviate, so isolating that would help with debugging.

Thanks!

Hi @DudaNogueira, sorry for my late response.

I am unable to use the latest LlamaIndex version, as it is incompatible with LangChain’s LLM Predict and no response can be generated. I have two things to address below.

Inconsistent Models Used:
After fiddling around, I noticed that it could be because different embedding models were used in LlamaIndex and in the Weaviate database.

I am using the hkunlp/instructor-xl · Hugging Face model in LlamaIndex, while Weaviate only has pre-built images such as sentence-transformers/all-mpnet-base-v2 when using the text2vec-transformers module.

Since my setup is in an air-gapped environment, I won’t be able to use any external API calls. It seems to me that there isn’t a way to use hkunlp/instructor-xl in Weaviate at this moment.

Please advise if I should be using the all-mpnet-base-v2 model in my LlamaIndex pipeline as well.
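If that is the way to go, I assume the LlamaIndex side would be configured roughly like this (just a sketch; the model name could also be a local path in my air-gapped setup):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# use the same model on the LlamaIndex side as the Weaviate text2vec-transformers image
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)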

As for performing the query directly:
I took a look at Queries in detail | Weaviate - Vector Database to learn how to query the database directly, but did not find how to get the certainty score when using client = weaviate.connect_to_local().

Below is my code:

import weaviate
import json

client = weaviate.connect_to_local()
collection_name = "Example"
result = (
    client.query
    .get(collection_name)
    .with_near_text(nearText)
    .with_limit(2)
    .with_additional(["certainty"])
    .with_where(where_filter)
    .do()
)

print(json.dumps(result, indent=4))

Result:

AttributeError: 'WeaviateClient' object has no attribute 'query'

Is there a method to get the score from a query when connected to Weaviate locally? I am aiming to get the cosine similarity score for the retrieved documents and display it to the user.

Hi!

Regarding your code, I believe you are mixing the Python client v3 syntax with the v4 syntax.

Check here in our quickstart the differences:

or here on this migration guide:
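For reference, a rough v4 equivalent of your query would look something like this (just a sketch; the collection name and the filter property are placeholders, and near_text needs a vectorizer module configured on the collection, otherwise you would pass your own vector with near_vector):

import weaviate
from weaviate.classes.query import MetadataQuery, Filter

client = weaviate.connect_to_local()
collection = client.collections.get("Example")  # placeholder collection name

response = collection.query.near_text(
    query="your search text",
    limit=2,
    filters=Filter.by_property("source").equal("example.pdf"),  # placeholder for your where_filter
    return_metadata=MetadataQuery(certainty=True),
)

for o in response.objects:
    print(o.properties, o.metadata.certainty)

client.close()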

Regarding models, Weaviate can support any Hugging Face model with its integration:

When using those frameworks (LlamaIndex or LangChain, for example), they will usually take care of the collection creation, configuration, and generation of the embeddings.
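For example, with LlamaIndex a setup roughly like this (a sketch, assuming the llama-index-vector-stores-weaviate package and a local ./data folder) lets the framework create and populate the collection for you:

import weaviate
from llama_index.core import VectorStoreIndex, StorageContext, SimpleDirectoryReader
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()

# LlamaIndex creates and configures the collection and writes the embeddings it generates locally
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Example")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)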

I have created a recipe here on how to properly use LangChain that allows querying both through LangChain and directly in Weaviate:

Let me know if this helps :slight_smile:


Thank you for the reference links! Could you check the GitHub repo? I found that it has missing files.

I am still confused about using the Hugging Face models in text2vec-huggingface | Weaviate - Vector Database, as it seems to require an API key.

Correct me if I am wrong, but it seems to be connecting to Hugging Face via an API in that configuration. My embedding model is loaded into GPU memory locally using LlamaIndex, and I don't see how that setup can actually access it.

Embed Model loaded with LlamaIndex:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os

EMBED_MODEL_PATH = os.getenv("EMBED_MODEL_PATH")  # get the saved embedding model path
embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL_PATH)

Assuming Python Client v4:

import weaviate
from weaviate.classes.query import MetadataQuery, Move
import os

client = weaviate.connect_to_local(
    headers={
        "X-HuggingFace-Api-Key": "YOUR_HUGGINGFACE_APIKEY",
    }
)

publications = client.collections.get("Publication")

response = publications.query.near_text(
    query="fashion",
    distance=0.6,
    move_to=Move(force=0.85, concepts="haute couture"),
    move_away=Move(force=0.45, concepts="finance"),
    return_metadata=MetadataQuery(distance=True),
    limit=2,
)

for o in response.objects:
    print(o.properties)
    print(o.metadata)

client.close()

Hi @DudaNogueira, just checking in if you saw my reply above. Thank you!

Hi! Sorry for the delay here :grimacing:

We have moved some of the recipes around:

The text2vec-huggingface module will use Hugging Face as a service, so it requires an API call and performs the vectorization in the cloud.
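If you want to keep vectorization fully local (LlamaIndex generating the embeddings on your GPU), one option is to create the collection with no server-side vectorizer and bring your own vectors, roughly like this (a sketch; the collection name and property are placeholders):

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()

client.collections.create(
    name="Example",  # placeholder name
    vectorizer_config=Configure.Vectorizer.none(),  # vectors are supplied by the client, e.g. LlamaIndex
    properties=[Property(name="text", data_type=DataType.TEXT)],
)

client.close()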

From your code, I assume you are creating the collection and configuring the same model you are using to generate the vectors, right?

Also notice that from your query you are getting a distance, so the bigger that number, the farther the object is from your query:
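If you prefer a 0 to 1 similarity, you can also request certainty in the metadata or derive it from the distance; for the default cosine metric the relationship is certainty = 1 - distance / 2. A rough sketch, reusing the publications collection from your snippet:

from weaviate.classes.query import MetadataQuery

response = publications.query.near_text(
    query="fashion",
    limit=2,
    return_metadata=MetadataQuery(distance=True, certainty=True),
)

for o in response.objects:
    # certainty is only defined for the cosine metric: certainty = 1 - distance / 2
    print(o.metadata.distance, o.metadata.certainty)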

I noticed the same recently. I'll let @DudaNogueira and the Weaviate team confirm, though.

TL;DR: switch your LlamaIndex retriever config to hybrid and set alpha to 1. Scores will go back to being between 0 and 1, but they will be slightly different (see why below). I am aiming to open an issue and a fix in LlamaIndex sometime next week.
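In LlamaIndex that configuration would look roughly like this (a sketch; it assumes the index is backed by a WeaviateVectorStore):

query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",  # route the query through Weaviate's hybrid search
    alpha=1.0,  # keep only the vector part of the fused score
    similarity_top_k=2,
)

response = query_engine.query("Did the author live on Mars?")
for node in response.source_nodes:
    print(node.score)  # a fused score between 0 and 1 instead of a constant 1.0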

–

Since the upgrade of the Weaviate client from v3 to v4, LlamaIndex updated their code to support the new client (see here).

While upgrading the client, they also changed how Weaviate is queried:

  • Before the update: they would construct the query builder to contain with_near_vector, which in Weaviate v4 is done through collection.query.near_vector.
  • After the update: whether you ask for vector-only search or hybrid search, it always calls hybrid search (just setting alpha=1 to give you vector-only results when you ask for them). It does this using collection.query.hybrid in Weaviate v4.

In Weaviate:

  • Calling collection.query.near_vector returns scores based on the distance metric you pick (check Weaviate's distance metrics docs). So if your query vector exactly matches the first result's vector you'll get 1.0; if not, you get a score between 0 and 1.
  • Calling collection.query.hybrid (even with alpha=1.0) will use the fusion algorithm to combine the text search scores and the vector search scores. Reading through this blog, you'll notice that the default fusion algorithm, relativeScoreFusion, scales the most similar item to 1 even if the first returned vector is not exactly the same as the query, whereas with near_vector the first returned item can have a score lower than 1. So expect your scores to vary a bit, because fusion is always used; see the sketch after this list comparing the two calls.
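A minimal sketch of the difference with the v4 client (the collection name and the embedding call are placeholders):

from weaviate.classes.query import MetadataQuery

collection = client.collections.get("Example")  # placeholder
query_vector = embed_model.get_query_embedding("Did the author live on Mars?")  # however you embed the query

# pure vector search: returns a distance, which is what LlamaIndex expects
vector_results = collection.query.near_vector(
    near_vector=query_vector,
    limit=2,
    return_metadata=MetadataQuery(distance=True),
)

# hybrid search, even with alpha=1.0: returns a fused score and no distance
hybrid_results = collection.query.hybrid(
    query="Did the author live on Mars?",
    vector=query_vector,
    alpha=1.0,
    limit=2,
    return_metadata=MetadataQuery(score=True),
)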

@DudaNogueira: When using hybrid with alpha=0 or alpha=1, why is Weaviate still performing a fusion? Shouldn't it just return the BM25 or cosine score?

@SoftwearEnginear: Why is it always 1?
→ The LlamaIndex code expects a similarity key named "distance" in the metadata, which is returned when you do collection.query.near_vector but not when you do collection.query.hybrid. Since that metadata is None, it returns 1 all the time.
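Illustratively (a simplification, not the actual LlamaIndex source), the behavior amounts to something like:

def node_similarity(metadata: dict, similarity_key: str = "distance") -> float:
    # hybrid results carry a score but no distance, so this lookup comes back empty
    distance = metadata.get(similarity_key)
    if distance is None:
        return 1.0  # missing distance -> every node is reported as fully similar
    return 1.0 - float(distance)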

The reason is that the format of the return should stay the same. I think it would be more confusing if you suddenly got completely different scores when changing alpha.

I agree the format of the return should stay the same, but the score values when alpha=1 or alpha=0 are a bit confusing. I was expecting to see the same score values as when using pure BM25 or pure vector search (i.e. the same math happening), but since the hybrid method uses the fusion algorithm all the time, the scores change.