Similarity search issues

Description

I'm using Weaviate to store chunks of my documents so that I can run a similarity search between an incoming query and those chunks. My problem is that the chunks I get back are not really related to my query. How can I check that I set up the schema correctly and that I'm using the right similarity search function?

Weaviate setup

import weaviate
client = weaviate.connect_to_local(host="weaviate")

Schema creation & Initialization

from weaviate.classes.config import Configure, DataType, Property

def create_schema():
    client.collections.create(
        "DocumentChunk",
        vectorizer_config=[
            Configure.NamedVectors.text2vec_transformers(
                name="vector",
                source_properties=["chunk"]
            )
        ],
        properties=[
            Property(name="source_document", data_type=DataType.TEXT),
            Property(name="chunk", data_type=DataType.TEXT),
        ]
    )

def initialize_schema():
    try:
        # Check whether the collection already exists
        if not client.collections.exists("DocumentChunk"):
            create_schema()
            print("Schema created.")
        else:
            print("Schema already exists.")
    except Exception as e:
        print(f"Error initializing schema: {str(e)}")

Similarity search to my query

def search_vectors_comment(query):
    collection = client.collections.get("DocumentChunk")
    response = collection.query.near_text(
        query=query,  # The model provider integration will automatically vectorize the query
        limit=3,
        distance=0.75
    )
    search_results = []
    for obj in response.objects:
        result = OrderedDict([
            ("title", obj.properties["source_document"]),
            ("snippet", obj.properties["chunk"]),
            ("distance", obj.metadata)
        ])
        search_results.append(result)
    return search_results

The metadata attribute on each returned chunk gives me this object:

{
    distance: {
      certainty: null,
      creation_time: null,
      distance: null,
      explain_score: null,
      is_consistent: null,
      last_update_time: null,
      rerank_score: null,
      score: null
    }
}

I would really like to know what the issue is and how I can fix it. This is all new to me. Thanks!

Where is the target vector in your query?
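For reference, since the collection defines a named vector called "vector", the query can name it explicitly through the target_vector argument of near_text. The sketch below is only an illustration: search_with_target_vector is a hypothetical helper name, and it assumes collection is the object returned by client.collections.get("DocumentChunk").

```python
def search_with_target_vector(collection, query, limit=3, max_distance=0.75):
    """Run near_text against the named vector "vector" from create_schema."""
    return collection.query.near_text(
        query=query,
        # Must match the name passed to Configure.NamedVectors in the schema
        target_vector="vector",
        limit=limit,
        distance=max_distance,
    )
```

It would be called as search_with_target_vector(client.collections.get("DocumentChunk"), "my question").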

So I did specify a target vector in my search query, but it did not change anything!

hi @Hamza_Rezgui !!

You need to specify which metadata you want to return in your query, like so:

import weaviate.classes as wvc

response = collection.query.near_text(
    query="teste",  # The model provider integration will automatically vectorize the query
    limit=3,
    distance=0.75,
    return_metadata=wvc.query.MetadataQuery(
        distance=True, creation_time=True, certainty=True, explain_score=True,
        is_consistent=True, last_update_time=True, score=True
    )
)
for obj in response.objects:
    print(obj.metadata)

Note that explain_score and score, for example, will stay empty here, as they are only populated for hybrid and BM25 searches.
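Applied to the search function from the question, this means reading obj.metadata.distance rather than storing the whole metadata object under "distance". A minimal sketch, assuming the same DocumentChunk properties: build_results is a hypothetical helper name, and response is assumed to come from a near_text call that requested return_metadata=MetadataQuery(distance=True).

```python
def build_results(response):
    """Turn a near_text response into plain dicts with a numeric distance."""
    results = []
    for obj in response.objects:
        results.append({
            "title": obj.properties["source_document"],
            "snippet": obj.properties["chunk"],
            # Stays None unless the query asked for it via return_metadata
            "distance": obj.metadata.distance,
        })
    return results
```

If the distance still comes back as None with this in place, the query most likely did not request it.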

Let me know if this helps.

Thanks!