Hi there, I'm following the steps in this tutorial [Bring your own vectors | Weaviate - Vector Database] exactly, to store custom vectors using insert_many. I can see that the embeddings are being produced; however, once I store them, I am unable to retrieve them with the query below (the returned vector field is empty, {}). Am I missing something? Thank you!
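```python
import weaviate

client = weaviate.connect_to_local()
course_collection = client.collections.get("AgentName")
response = course_collection.query.fetch_objects(include_vector=True)
for obj in response.objects:
    print(obj.uuid, obj.vector)  # obj.vector is a dict keyed by vector name
```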
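To make the symptom easier to check across many objects, here is a minimal sketch of a helper that lists which fetched objects came back without a vector. It assumes, as in the v4 Python client, that each returned object exposes `.uuid` and a `.vector` dict keyed by vector name (`"default"` for a single unnamed vector), so an empty `{}` means no vector was returned. The helper name and the stand-in objects are illustrative only.

```python
from types import SimpleNamespace


def objects_missing_vectors(objects, vector_name="default"):
    """Return the UUIDs of objects whose vector dict has no `vector_name` entry.

    Assumes each object exposes `.uuid` and `.vector` (a dict keyed by vector
    name); an empty dict means no vector came back for that object.
    """
    missing = []
    for obj in objects:
        vectors = obj.vector or {}
        if not vectors.get(vector_name):
            missing.append(obj.uuid)
    return missing


# Stand-ins for response.objects, so the helper can be tried without a server.
fake_objects = [
    SimpleNamespace(uuid="a", vector={"default": [0.1, 0.2]}),
    SimpleNamespace(uuid="b", vector={}),
]
print(objects_missing_vectors(fake_objects))  # ['b']
```

With a real client you would call it as `objects_missing_vectors(response.objects)` after the `fetch_objects(include_vector=True)` query above.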
Hi @Jennifer_Jordache,
Welcome to our community!
The lines you shared are correct for that. However, it is hard to tell where the issue lies without seeing more of your code.
Could you please share your scripts, including the method that imports the data?
Thank you, and have a lovely weekend!
Thanks so much @Mohamed_Shahin! I've pasted my code below. While troubleshooting, I noticed that when I inserted the objects individually in the for loop (using .insert instead of .insert_many), the vectors were stored and retrieved correctly. Is it possible that my usage of insert_many is nulling out the vectors? Also, I'd love to look at the definition of insert_many, but I'm having trouble finding it.
```python
import os

import weaviate.classes as wvc


# Method of the ingestion class; `client` is a connected Weaviate client defined elsewhere.
def store_in_weaviate(self, document_name, address, structured_text):
    docname = os.path.split(address)[1]
    # Copy the original file
    self.mkdir(f"{self.course_content_store}/{document_name}")
    self.filecopy(address, f"{self.course_content_store}/{document_name}/{docname}")
    embeddings_data_objs = list()
    # skills will unnest and become property, with the agent name being the collection
    for page in structured_text:
        page_number = page["page"]
        text_df = page["df"]  # with possible columns 'text', 'heading', 'summary', 'clean'
        embeddings = page["embeddings"]
        summary_embeddings = page["summary_embeddings"]
        # store the text in the vector db
        for i, row in text_df.iterrows():
            # store the clean text
            properties = {
                "document_name": document_name,
                "page_number": page_number,
                "chunk_number": i,
                "raw_text": row["text"],
                "heading": row["heading"],
                "encoded_text": row["clean"],
                "clean_text": row["clean"],
                "is_summary": False,
            }
            embeddings_data_objs.append(
                wvc.data.DataObject(properties=properties, vector=embeddings[i])
            )
            # now store the summary
            properties = {
                "document_name": document_name,
                "page_number": page_number,
                "chunk_number": i,
                "raw_text": row["text"],
                "heading": row["heading"],
                "encoded_text": row["summary"],
                "clean_text": row["clean"],
                "is_summary": True,
            }
            embeddings_data_objs.append(
                wvc.data.DataObject(
                    properties=properties, vector=summary_embeddings[i]
                )
            )
    course_collection = client.collections.get(self.collection_name)
    course_collection.data.insert_many(embeddings_data_objs)
```
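One thing that may be worth ruling out in the code above: `text_df.iterrows()` yields the DataFrame's index labels, not positions, so `embeddings[i]` only lines up with the row if the index is a plain 0..n-1 range. Below is a minimal sketch that pairs rows and vectors by position instead; plain dicts stand in for DataFrame rows, and the helper name is hypothetical. With pandas, calling `text_df.reset_index(drop=True)` before iterating achieves the same alignment.

```python
def pair_rows_with_vectors(rows, vectors):
    """Pair each row with the vector at the same position.

    `rows` is any iterable of row dicts (a stand-in for DataFrame rows) and
    `vectors` is a sequence of embeddings in the same order.
    """
    rows = list(rows)
    if len(rows) != len(vectors):
        raise ValueError(f"{len(rows)} rows but {len(vectors)} vectors")
    # enumerate() gives positions 0..n-1, unlike DataFrame.iterrows(),
    # which yields index labels that may not start at 0 or be contiguous.
    return [(row, vectors[pos]) for pos, row in enumerate(rows)]


pairs = pair_rows_with_vectors([{"text": "a"}, {"text": "b"}], [[1.0], [2.0]])
print(pairs[1])  # ({'text': 'b'}, [2.0])
```

Each `(row, vector)` pair can then be turned into a `wvc.data.DataObject(properties=..., vector=...)` exactly as in the original loop.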
Hey @Jennifer_Jordache,
Awesome, you spotted it! It happens to me too; it's always the little things.
Here is the definition of the insert_many function:
```python
def insert_many(
    self,
    objects: Sequence[Union[Properties, DataObject[Properties, Optional[ReferenceInputs]]]],
) -> BatchObjectReturn:
    """Insert multiple objects into the collection.

    Arguments:
        `objects`
            The objects to insert. This can be either a list of `Properties` or `DataObject[Properties, ReferenceInputs]`
                If you didn't set `data_model` then `Properties` will be `Data[str, Any]` in which case you can insert simple dictionaries here.
                If you want to insert references, vectors, or UUIDs alongside your properties, you will have to use `DataObject` instead.

    Raises:
        `weaviate.exceptions.WeaviateGRPCBatchError`:
            If any unexpected error occurs during the batch operation.
        `weaviate.exceptions.WeaviateInsertInvalidPropertyError`:
            If a property is invalid. I.e., has name `id` or `vector`, which are reserved.
        `weaviate.exceptions.WeaviateInsertManyAllFailedError`:
            If every object in the batch fails to be inserted. The exception message contains details about the failure.
    """
```
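Going by the Raises section above, insert_many only raises when every object fails, so partial failures have to be read from the returned BatchObjectReturn. Here is a minimal sketch of a helper that collects them, assuming the return value exposes a boolean `has_errors` and an `errors` dict mapping each failed object's position in the input list to an error object with a `.message` attribute. The helper name and the stand-in result are illustrative only.

```python
from types import SimpleNamespace


def summarize_batch_errors(result):
    """Return {input_index: message} for objects that failed in insert_many.

    Assumes `result` has a boolean `has_errors` and an `errors` dict mapping
    the object's position in the input list to an error with `.message`.
    """
    if not result.has_errors:
        return {}
    return {index: err.message for index, err in result.errors.items()}


# Stand-in result; in real code, pass the value returned by
# course_collection.data.insert_many(embeddings_data_objs).
fake_result = SimpleNamespace(
    has_errors=True,
    errors={3: SimpleNamespace(message="example failure")},
)
print(summarize_batch_errors(fake_result))  # {3: 'example failure'}
```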
Additionally, here is the client repo if you want to read more: Weaviate Python Client
As for .insert_many itself, it should not null out the vectors if they are passed correctly. I would check embeddings[i] to make sure it is not null or incorrectly formatted before adding it to a DataObject. I would also add logging to verify that the vectors are passed correctly to DataObject and to insert_many.
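Following that advice, a minimal sketch of a check to run on each embedding before wrapping it in a DataObject: it converts any iterable of numbers (a list or a 1-D NumPy array, for example) to a plain list of Python floats, rejects None, empty, non-finite, or wrongly sized vectors, and logs the type and dimension. The function name and the `expected_dim` parameter are hypothetical; use whatever dimension your embedding model produces.

```python
import logging
import math

logger = logging.getLogger(__name__)


def clean_vector(vector, expected_dim=None):
    """Return `vector` as a list of Python floats, or raise ValueError.

    Rejects None, empty vectors, non-finite values, and unexpected dimensions.
    """
    if vector is None:
        raise ValueError("vector is None")
    values = [float(x) for x in vector]
    if not values:
        raise ValueError("vector is empty")
    if expected_dim is not None and len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(values)}")
    if not all(math.isfinite(x) for x in values):
        raise ValueError("vector contains NaN or infinity")
    logger.debug("vector ok: type=%s dim=%d", type(vector).__name__, len(values))
    return values


print(clean_vector((1, 2), expected_dim=2))  # [1.0, 2.0]
```

In the original loop this would wrap the argument, e.g. `vector=clean_vector(embeddings[i])`, so a bad embedding fails loudly instead of being stored silently.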
I would recommend you read through:
I hope this helps, and I wish you a happy Sunday!
Happy Coding
Thank you again for the reply and for the links, super helpful!
Just to clarify: I still haven't figured out why insert_many results in null vectors while calling insert in a for loop populates them correctly. I've decided to stop using insert_many because of this, but if you have any ideas on why this might be happening, I would appreciate it! I'm also not using threading, so it's quite confusing that insert_many behaves this way. Either way, thank you for the help, especially on a weekend!
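One way to narrow this down is to insert the same small slice both ways and compare what was sent with what comes back. A minimal sketch of the comparison step follows. It assumes you set your own UUIDs on each DataObject (for example with `weaviate.util.generate_uuid5`) so objects can be matched after reading back, and that fetched vectors are gathered as `{str(obj.uuid): obj.vector for obj in response.objects}` from a `fetch_objects(include_vector=True)` query; the function name is hypothetical. Stored vectors may be float32, so values are compared with a small tolerance rather than exactly.

```python
import math


def roundtrip_mismatches(sent, fetched, vector_name="default"):
    """Return UUIDs whose stored vector is missing or differs from what was sent.

    `sent` maps uuid -> the vector passed to DataObject; `fetched` maps
    uuid -> the `.vector` dict read back with include_vector=True.
    """
    bad = []
    for uid, expected in sent.items():
        stored = (fetched.get(uid) or {}).get(vector_name)
        if not stored or len(stored) != len(expected):
            bad.append(uid)
        elif any(
            not math.isclose(a, b, rel_tol=1e-5, abs_tol=1e-6)
            for a, b in zip(expected, stored)
        ):
            bad.append(uid)
    return bad


sent = {"u1": [0.1, 0.2], "u2": [0.3, 0.4], "u3": [1.0]}
fetched = {"u1": {"default": [0.1, 0.2]}, "u2": {}}
print(roundtrip_mismatches(sent, fetched))  # ['u2', 'u3']
```

If the insert_many slice reports mismatches while the per-object insert slice does not, that points at the call rather than at the embeddings themselves.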