I’m creating an index in Weaviate named “sample_index” and populating it with the following content and vectors:
content1 = [
    {
        "title": "title1",
        "article_id": "id1"
    },
    {
        "title": "title2",
        "article_id": "id2"
    }
]
vector1 = {
    "id1": [0.1, 0.2],
    "id2": [0.3, 0.4]
}
Now, when attempting to push another set of data into the same “sample_index” class, I encounter an error due to the varying vector sizes:
content2 = [
    {
        "title": "title3",
        "article_id": "id3"
    },
    {
        "title": "title4",
        "article_id": "id4"
    }
]
vector2 = {
    "id3": [0.1, 0.2, 0.3, 0.4],
    "id4": [0.5, 0.6, 0.7, 0.8]
}
The error message states:
{'error': [{'message': "insert to vector index: insert doc id 3 to vector index: find best entrypoint: calculate distance between insert node and entry point at level 1: vector lengths don't match: 2 vs 4"}]}
{'error': [{'message': "insert to vector index: insert doc id 4 to vector index: find best entrypoint: calculate distance between insert node and entry point at level 1: vector lengths don't match: 2 vs 4"}]}
Although the error occurs, the new data seems to be indexed in the “sample_index” class, as observed when attempting to extract all "article_id"s from the index.
To avoid this, the vector size needs to be validated before indexing: a check ahead of each insert that every vector has the dimension already used by the class (2 in this example), rejecting anything that does not match. Enforcing one consistent dimension across the index would prevent these errors.
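A minimal sketch of such a validation step in plain Python, assuming the vectors are kept in dicts keyed by article_id as in the snippets above. The names `split_by_dimension` and `expected_dim` are hypothetical helpers for illustration and are not part of the Weaviate client; only the entries in `valid` would then be handed to whatever insert call is in use.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def split_by_dimension(
    vectors: Dict[str, Vector], expected_dim: int
) -> Tuple[Dict[str, Vector], Dict[str, int]]:
    """Separate vectors that match expected_dim from those that do not.

    Returns (valid, rejected): valid maps id -> vector, rejected maps
    id -> the offending vector length. Hypothetical helper, not a Weaviate API.
    """
    valid: Dict[str, Vector] = {}
    rejected: Dict[str, int] = {}
    for article_id, vector in vectors.items():
        if len(vector) == expected_dim:
            valid[article_id] = vector
        else:
            rejected[article_id] = len(vector)
    return valid, rejected


# Example with the data from this post; the class already holds 2-d vectors.
vector1 = {"id1": [0.1, 0.2], "id2": [0.3, 0.4]}
vector2 = {"id3": [0.1, 0.2, 0.3, 0.4], "id4": [0.5, 0.6, 0.7, 0.8]}

valid, rejected = split_by_dimension({**vector1, **vector2}, expected_dim=2)
# valid holds only the entries that are safe to insert;
# rejected lists the ids to skip or to route to a separate class.
```

This only catches the mismatch on the client side before anything is sent. It assumes the expected dimension is known up front, for example from the embedding model that filled the class in the first place.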
Does anyone have suggestions on how to effectively manage such discrepancies in vector sizes within Weaviate indexing? Any insights or best practices would be greatly appreciated. Thank you.