I am building a RAG pipeline using LlamaIndex and Weaviate as the vector database. I referred to the setup guide in the LlamaIndex documentation here: Weaviate Vector Store - LlamaIndex
When performing a query with the query engine, the response it returns contains both the generated answer and the source nodes. The source nodes contain information such as file metadata and a score.
The raw score is between 0 and 1, where 0 means irrelevant and 1 means exactly relevant.
Below is an example:
Query Index
query_engine = index.as_query_engine()
response = query_engine.query("Did the author live on Mars?")
source_nodes = response.source_nodes
for node in source_nodes:
    print(node.score)  # It printed 1.0
When querying for something that is not found in the documents, the result is still returned as 100% relevant. This raises a question: how do I know whether the retrieved document(s) were actually relevant to my query?
I am unsure whether the scores were not returned from Weaviate, or if LlamaIndex couldn't accept the scores provided and generated its own.
Can you try that with the latest version? I recall some issue with scores that may lead to this.
Also, do you see the same scores when performing the query directly? Not sure what happens between llamaindex and Weaviate, so isolating that would be interesting to debug.
I am unable to use the latest LlamaIndex version, as it is incompatible with LangChain's LLM Predict and no response can be generated. I have two things to address below.
Inconsistent Models Used:
After fiddling around, I noticed that it could be because different embedding models were used in LlamaIndex and in the Weaviate database.
Since my setup is in an air-gapped environment, I won't be able to use any external API calls. It seems to me that there isn't a way to use hkunlp/instructor-xl in Weaviate at this moment.
Please advise whether I should be using the all-mpnet-base-v2 model in my LlamaIndex pipeline as well.
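If so, I assume the pipeline change would look something like this (a sketch; the Settings API shown is from newer LlamaIndex releases, older ones use ServiceContext.from_defaults(embed_model=...) instead, and the model name is an assumption):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use the same model for indexing and querying that the Weaviate
# vectorizer uses, so all vectors live in one embedding space.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"  # assumed model
)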
As for performing the query directly:
I took a look at Queries in detail | Weaviate - Vector Database on how to query the database directly, but could not find how to get the certainty score when using client = weaviate.connect_to_local().
AttributeError: 'WeaviateClient' object has no attribute 'query'
Is there a method to get the score from a query when connected to Weaviate locally? I am aiming to get the cosine similarity score for the retrieved documents and display it to the user.
Regarding your code, I believe you are mixing the Python client v3 syntax with the Python client v4 syntax.
Check the differences here in our quickstart:
or here in this migration guide:
Regarding models, Weaviate can support any Hugging Face model with its integration:
While using those frameworks (LlamaIndex or LangChain, for example), they will usually take care of the collection creation, configuration, and generation of the embeddings.
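For example, with LlamaIndex (a minimal sketch; the collection name and data folder are placeholders, and the package paths are from recent llama-index releases):

import weaviate
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Connect, then let LlamaIndex create and configure the collection and
# generate the embeddings while building the index.
client = weaviate.connect_to_local()
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Documents")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)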
I have created a recipe here on how to properly use LangChain that allows querying both through LangChain and directly in Weaviate:
Correct me if I am wrong, but it seems to be connecting to Hugging Face via its API in that configuration. My embedding model is loaded into GPU memory locally using LlamaIndex, and I don't see how the code can actually access it.
Embed Model loaded with LlamaIndex:
import os

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

EMBED_MODEL_PATH = os.getenv("EMBED_MODEL_PATH")  # path to the locally saved embedding model
embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL_PATH)
Assuming Python Client v4:
import weaviate
from weaviate.classes.query import MetadataQuery, Move
import os
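Putting it together, here is roughly what I am attempting (a sketch; "Documents" is a placeholder collection name, and I am assuming return_metadata=MetadataQuery(distance=True, certainty=True) is the way to request the scores):

client = weaviate.connect_to_local()
collection = client.collections.get("Documents")  # placeholder name

# Embed the query with the same local model loaded above via LlamaIndex
query_vector = embed_model.get_query_embedding("Did the author live on Mars?")

response = collection.query.near_vector(
    near_vector=query_vector,
    limit=5,
    return_metadata=MetadataQuery(distance=True, certainty=True),
)
for obj in response.objects:
    # For cosine: distance = 1 - similarity, certainty = (1 + similarity) / 2
    print(obj.metadata.distance, obj.metadata.certainty)

client.close()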
Noticed the same recently… I'll let @DudaNogueira and the Weaviate team confirm, though.
TL;DR: switch your LlamaIndex retriever config to hybrid and set alpha to 1. Scores will go back to being between 0 and 1 but will be slightly different (see why below). I am aiming to open an issue and a fix in LlamaIndex sometime next week.
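Concretely, the workaround is something like this (a sketch using the standard LlamaIndex retriever kwargs; adjust to your setup):

# Ask for hybrid mode explicitly; alpha=1.0 keeps the search effectively
# vector-only while still populating the hybrid score metadata.
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    alpha=1.0,
)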
Since the Weaviate client upgrade from v3 to v4, LlamaIndex updated their code to support the new client (see here).
While upgrading the client, they also changed how Weaviate is queried.
Before the update: they would construct a query builder containing with_near_vector, which in the Weaviate v4 client is done through collection.query.near_vector.
After the update: whether you ask for vector-only search or hybrid search, it always calls hybrid search (just overriding alpha to 1, thus giving you vector-only results when you ask for them). It does this using collection.query.hybrid in the Weaviate v4 client.
In Weaviate:
Calling collection.query.near_vector returns scores according to the distance metric you pick (check Weaviate's distance metrics docs). So if your query vector matches the first result vector exactly, you will get 1.0; if not, you get a score between 0 and 1.
Calling collection.query.hybrid (even with alpha=1.0) will use the fusion algorithm to combine text search scores and vector search scores. Reading through this blog, you will notice that the default fusion algorithm, relativeScoreFusion, puts the most similar item at 1 even if the first returned vector is not exactly the same as the query, whereas with near_vector the first returned item can have a score lower than 1. So expect your scores to vary a bit, because fusion is always used.
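To illustrate (a toy sketch of the min-max normalization step described in that blog; the real implementation also matches documents across the two result sets, which this skips):

def relative_score_fusion(vector_scores, bm25_scores, alpha=1.0):
    # Min-max normalize each result set independently, so the best hit
    # in each set becomes 1.0 regardless of its raw score.
    def normalize(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 1.0 for s in scores]

    vec = normalize(vector_scores)
    bm25 = normalize(bm25_scores)
    return [alpha * v + (1 - alpha) * b for v, b in zip(vec, bm25)]

# The best cosine similarity is only 0.82, yet after normalization it is
# reported as 1.0, matching the behavior described above.
print(relative_score_fusion([0.82, 0.75, 0.60], [2.1, 1.4, 0.3], alpha=1.0))
# [1.0, 0.6818..., 0.0]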
@DudaNogueira: When using hybrid with alpha=0 or alpha=1, why is Weaviate still performing a fusion? Shouldn't it just return the BM25 or cosine score?
@SoftwearEnginear: Why is it always 1?
→ The LlamaIndex code expects a similarity key in the metadata of type 'distance', which is returned when you do collection.query.near_vector but not when you do collection.query.hybrid. That metadata being None, it returns 1 all the time.
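Roughly, the failure mode is something like this (a paraphrase with illustrative names and values, not the actual LlamaIndex source):

metadata = {"score": 0.73}  # a hybrid hit carries 'score' but no 'distance' (illustrative)
distance = metadata.get("distance")  # None, because hybrid returns 'score' instead
similarity = 1.0 if distance is None else 1.0 - float(distance)
print(similarity)  # 1.0, regardless of how relevant the hit actually was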
The reason is that the format of the return should stay the same. I think it would be more confusing if you suddenly got completely different scores when changing alpha.
I agree that the format of the return should be the same. But the score values when alpha=1 or alpha=0 are a bit confusing. I was expecting to see the same score values as with pure BM25 or pure vector search (i.e., the same math happening). But since the hybrid method uses the fusion algorithm all the time, the scores change.