Description
So I'm using Weaviate to store chunks of my documents so that I can run a similarity search between an incoming query and those chunks. My problem is that I'm not getting back the chunks that are actually related to my query. So my question is: how can I tell whether I set up the schema correctly and whether I'm using the right similarity search function?
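For context, this is roughly what a similarity search ranks on: the query and every chunk are embedded as vectors and then compared by distance. The sketch below is plain Python under stated assumptions. It uses cosine distance (Weaviate's default metric), made-up 3-dimensional vectors in place of real text2vec-transformers embeddings, and the hypothetical helpers `cosine_distance` and `rank_chunks`, which are not part of the Weaviate client.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, limit=3):
    """Return (chunk_id, distance) pairs sorted closest first, capped at `limit`."""
    scored = [(cid, cosine_distance(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:limit]


# Hypothetical toy embeddings; in the real setup they come from text2vec-transformers.
chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
print(rank_chunks([1.0, 0.1, 0.0], chunks, limit=2))
```

Because cosine distance only looks at direction, a chunk ranks first when its embedding points the same way as the query's, regardless of vector length.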
Weaviate setup
import weaviate
client = weaviate.connect_to_local(host="weaviate")
Schema creation & Initialization
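from weaviate.classes.config import Configure, DataType, Property

def create_schema():
    # One named vector ("vector"), built by text2vec-transformers from the "chunk" property only
    client.collections.create(
        "DocumentChunk",
        vectorizer_config=[
            Configure.NamedVectors.text2vec_transformers(
                name="vector",
                source_properties=["chunk"]
            )
        ],
        properties=[
            Property(name="source_document", data_type=DataType.TEXT),
            Property(name="chunk", data_type=DataType.TEXT),
        ]
    )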
def initialize_schema():
    try:
        # Check if the collection already exists. In the v4 client, list_all() returns a dict
        # keyed by collection name (there is no "classes" key), so exists() is used instead.
        if not client.collections.exists("DocumentChunk"):
            create_schema()
            print("Schema created.")
        else:
            print("Schema already exists.")
    except Exception as e:
        print(f"Error initializing schema: {e}")
Similarity search for my query
from collections import OrderedDict

def search_vectors_comment(query):
    collection = client.collections.get("DocumentChunk")
    response = collection.query.near_text(
        query=query,  # The model provider integration will automatically vectorize the query
        limit=3,
        distance=0.75  # maximum allowed distance; objects farther away are dropped
        # no return_metadata argument is passed here
    )
    search_results = []
    for obj in response.objects:
        result = OrderedDict([
            ("title", obj.properties["source_document"]),
            ("snippet", obj.properties["chunk"]),
            ("distance", obj.metadata)  # the whole metadata object, not just its distance
        ])
        search_results.append(result)
    return search_results
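In `search_vectors_comment`, `distance=0.75` is an upper bound on distance and `limit=3` caps how many objects come back. Assuming cosine distance, where similarity = 1 - distance, a cutoff of 0.75 still admits chunks whose cosine similarity is as low as 0.25. The plain-Python sketch below shows that filter-then-cap behavior; `apply_cutoff` is a hypothetical helper, the distances are invented, and the tighter cutoff of 0.30 is an arbitrary value for comparison, not a recommendation.

```python
def apply_cutoff(scored, max_distance, limit):
    """Keep (chunk_id, distance) pairs with distance <= max_distance, closest first, at most `limit`."""
    kept = [pair for pair in scored if pair[1] <= max_distance]
    kept.sort(key=lambda pair: pair[1])
    return kept[:limit]


# Invented distances for four chunks.
scored = [("intro", 0.72), ("pricing", 0.18), ("faq", 0.41), ("legal", 0.90)]

loose = apply_cutoff(scored, max_distance=0.75, limit=3)
strict = apply_cutoff(scored, max_distance=0.30, limit=3)
print(loose)   # "intro" (cosine similarity about 0.28) still makes the cut
print(strict)  # only "pricing" survives
```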
The `metadata` attribute in the returned results gives me this object for every returned chunk:
{
  distance: {
    certainty: null,
    creation_time: null,
    distance: null,
    explain_score: null,
    is_consistent: null,
    last_update_time: null,
    rerank_score: null,
    score: null
  }
}
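In the v4 Python client, these metadata fields are opt-in: they are only filled when the query passes something like `return_metadata=MetadataQuery(distance=True)` (with `MetadataQuery` imported from `weaviate.classes.query`). Once `distance` is returned, it can be compared with `certainty`, which Weaviate defines only for cosine distance as certainty = 1 - distance / 2. A plain-Python sketch of that conversion, using the hypothetical helpers `distance_to_certainty` and `certainty_to_distance`:

```python
def distance_to_certainty(distance):
    """Certainty for cosine distance: 1 - distance / 2, mapping [0, 2] onto [1, 0]."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must lie in [0, 2]")
    return 1.0 - distance / 2.0


def certainty_to_distance(certainty):
    """Inverse mapping: distance = 2 * (1 - certainty)."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in [0, 1]")
    return 2.0 * (1.0 - certainty)


# The distance cutoff used in search_vectors_comment, expressed as a certainty:
print(distance_to_certainty(0.75))  # 0.625
```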
I'd really like to know what the issue is and how I can fix it. This is all new to me, so thanks!