Similarity search issues

Hamza_Rezgui · August 24, 2024, 9:01am

Description

I'm using Weaviate to store chunks of my documents so that I can run a similarity search between an incoming query and those chunks. My problem is that the chunks returned are not the ones most related to my query. How can I verify that I set up the schema correctly and that I'm using the right similarity search function?

Weaviate setup

import weaviate
client = weaviate.connect_to_local(host="weaviate")

Schema creation & Initialization

from weaviate.classes.config import Configure, DataType, Property

def create_schema():
    client.collections.create(
        "DocumentChunk",
        vectorizer_config=[
            Configure.NamedVectors.text2vec_transformers(
                name="vector",
                source_properties=["chunk"],
            )
        ],
        properties=[
            Property(name="source_document", data_type=DataType.TEXT),
            Property(name="chunk", data_type=DataType.TEXT),
        ],
    )

def initialize_schema():
    try:
        # Check if the collection already exists
        if not client.collections.exists("DocumentChunk"):
            create_schema()
            print("Schema created.")
        else:
            print("Schema already exists.")
    except Exception as e:
        print(f"Error initializing schema: {str(e)}")

Similarity search to my query

from collections import OrderedDict

def search_vectors_comment(query):
    collection = client.collections.get("DocumentChunk")
    response = collection.query.near_text(
        query=query,  # The model provider integration will automatically vectorize the query
        limit=3,
        distance=0.75,
    )
    search_results = []
    for obj in response.objects:
        result = OrderedDict([
            ("title", obj.properties["source_document"]),
            ("snippet", obj.properties["chunk"]),
            ("distance", obj.metadata)
        ])
        search_results.append(result)
    return search_results

The metadata attribute in the return values gives me this object for each returned chunk:

{
    distance: {
      certainty: null,
      creation_time: null,
      distance: null,
      explain_score: null,
      is_consistent: null,
      last_update_time: null,
      rerank_score: null,
      score: null
    }
}

I would really like to know what the issue is and how I can fix it. This is all new to me. Thanks!

Fakhri_Prayatna_Putr · August 24, 2024, 12:52pm

Where is the target vector in your query?

Hamza_Rezgui · August 25, 2024, 6:25am

I did specify a target vector in my search query, but it did not change anything!

DudaNogueira · August 26, 2024, 12:37pm

hi @Hamza_Rezgui !!

You need to specify which metadata you want to return in your query, like so:

import weaviate.classes as wvc

response = collection.query.near_text(
    query="teste",  # The model provider integration will automatically vectorize the query
    limit=3,
    distance=0.75,
    return_metadata=wvc.query.MetadataQuery(
        distance=True, creation_time=True, certainty=True, explain_score=True,
        is_consistent=True, last_update_time=True, score=True
    )
)
for obj in response.objects:
    print(obj.metadata)

Note that explain_score and score, for example, should come back empty here, as they are only populated for hybrid and bm25 searches.
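
Once the metadata is requested, the distance can be read as a plain number from each object instead of storing the whole metadata object. Here is a minimal sketch of that step. The names format_results and fake_objects are hypothetical, and the stand-in objects only mimic the properties / metadata.distance shape of response.objects, so the snippet runs without a Weaviate server:

```python
from collections import OrderedDict
from types import SimpleNamespace


def format_results(objects):
    """Flatten query results into dicts that carry a numeric distance."""
    results = []
    for obj in objects:
        results.append(OrderedDict([
            ("title", obj.properties["source_document"]),
            ("snippet", obj.properties["chunk"]),
            # This stays None unless the query requested distance metadata
            ("distance", obj.metadata.distance),
        ]))
    return results


# Hypothetical stand-ins for response.objects (assumed shape, not real data)
fake_objects = [
    SimpleNamespace(
        properties={"source_document": "a.pdf", "chunk": "first chunk"},
        metadata=SimpleNamespace(distance=0.21),
    ),
    SimpleNamespace(
        properties={"source_document": "b.pdf", "chunk": "second chunk"},
        metadata=SimpleNamespace(distance=None),  # metadata not requested
    ),
]

for row in format_results(fake_objects):
    print(row["title"], row["distance"])
```

With a live collection, you would pass response.objects from a near_text call that sets return_metadata, instead of fake_objects.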

Let me know if this helps.

Thanks!
