[Question] Unable to store custom vectors in local instance

Hi there, I’m following the steps in the "Bring your own vectors" tutorial from the Weaviate docs exactly, storing custom vectors with insert_many. I can see that the embeddings are being produced, but once I store them I can’t retrieve them with the query below (the vector field comes back empty: {}). Am I missing something? Thank you!

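import weaviate
client = weaviate.connect_to_local()
course_collection = client.collections.get("AgentName")
# The v4 client's keyword is include_vector (singular)
response = course_collection.query.fetch_objects(include_vector=True)
for obj in response.objects:
    print(obj.uuid, obj.vector)
client.close()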
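To see at a glance which returned objects came back without a vector, a small helper can summarise `response.objects` from a `fetch_objects(include_vector=True)` call. This is a sketch, not part of the client: `describe_vectors` is a hypothetical name, and it assumes each object's `.vector` is either a dict keyed by vector name (e.g. `{"default": [...]}`, as in recent v4 clients) or a plain list in older ones.

```python
from typing import Any, Dict, Iterable, List


def describe_vectors(objects: Iterable[Any]) -> List[Dict[str, Any]]:
    """Summarise the vector attached to each returned object.

    Assumes each object exposes `.uuid` and `.vector`, where `.vector` is
    either a dict keyed by vector name or a plain list of floats.
    """
    report = []
    for obj in objects:
        vec = getattr(obj, "vector", None)
        if isinstance(vec, dict):
            # Named-vector style: record the length of each stored vector
            dims = {name: len(v) for name, v in vec.items()}
        elif vec:
            # Older style: a single unnamed list of floats
            dims = {"default": len(vec)}
        else:
            # None, [] or {} all mean no vector came back
            dims = {}
        report.append({
            "uuid": getattr(obj, "uuid", None),
            "dims": dims,
            "empty": not any(dims.values()),
        })
    return report


# Hypothetical usage with the query above:
# for row in describe_vectors(response.objects):
#     print(row)
```

An `{}` like the one in the question would show up as `"empty": True` for every object.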

Hi @Jennifer_Jordache,

Welcome to our community! :slightly_smiling_face:

The lines you shared are indeed correct for retrieving vectors. However, it’s hard to tell where the issue lies without looking at your code.

Could you please share your script, including the method that imports the data?

Thank you, and have a lovely weekend!


Thanks so much @Mohamed_Shahin! I’ve pasted my code below. While troubleshooting, I noticed that whenever I inserted the objects individually inside the for loop (using .insert instead of .insert_many), the vectors were stored and retrieved correctly. Is it possible that the way I’m using insert_many is nulling out the vectors? Also, I’d love to look at the definition of insert_many, but I’m having trouble finding it :sweat_smile:

import os
import weaviate.classes as wvc

def store_in_weaviate(self, document_name, address, structured_text):
        docname = os.path.split(address)[1]
        # Copy the original file
        self.mkdir(f"{self.course_content_store}/{document_name}")
        self.filecopy(address, f"{self.course_content_store}/{document_name}/{docname}")

        embeddings_data_objs = list()

        # skills will unnest and become property, with the agent name being the collection
        for page in structured_text:
            page_number = page["page"]
            text_df = page[
                "df"
            ]  # with possible columns 'text', 'heading', 'summary', 'clean'
            embeddings = page["embeddings"]
            summary_embeddings = page["summary_embeddings"]

            # store the text in the vector db
            # i is the DataFrame index label, so embeddings[i] assumes a default 0..n-1 index
            for i, row in text_df.iterrows():
                # store the clean text
                properties = {
                    "document_name": document_name,
                    "page_number": page_number,
                    "chunk_number": i,
                    "raw_text": row["text"],
                    "heading": row["heading"],
                    "encoded_text": row["clean"],
                    "clean_text": row["clean"],
                    "is_summary": False,
                }

                embeddings_data_objs.append(
                    wvc.data.DataObject(properties=properties, vector=embeddings[i])
                )
            
                # now store the summary
                properties = {
                    "document_name": document_name,
                    "page_number": page_number,
                    "chunk_number": i,
                    "raw_text": row["text"],
                    "heading": row["heading"],
                    "encoded_text": row["summary"],
                    "clean_text": row["clean"],
                    "is_summary": True,
                }
                
                embeddings_data_objs.append(
                    wvc.data.DataObject(
                        properties=properties, vector=summary_embeddings[i]
                    )
                )
                
        # `client` is assumed to be a module-level Weaviate client created elsewhere
        course_collection = client.collections.get(self.collection_name)
        course_collection.data.insert_many(embeddings_data_objs)

Hey @Jennifer_Jordache,

Awesome, you spotted it! It happens to me too, little things like that :sweat_smile:

Here is the definition of the insert_many function:

def insert_many(
    self,
    objects: Sequence[Union[Properties, DataObject[Properties, Optional[ReferenceInputs]]]],
) -> BatchObjectReturn:
    """Insert multiple objects into the collection.

    Arguments:
        `objects`
            The objects to insert. This can be either a list of `Properties` or `DataObject[Properties, ReferenceInputs]`
                If you didn't set `data_model` then `Properties` will be `Data[str, Any]` in which case you can insert simple dictionaries here.
                    If you want to insert references, vectors, or UUIDs alongside your properties, you will have to use `DataObject` instead.

    Raises:
        `weaviate.exceptions.WeaviateGRPCBatchError`:
            If any unexpected error occurs during the batch operation.
        `weaviate.exceptions.WeaviateInsertInvalidPropertyError`:
            If a property is invalid. I.e., has name `id` or `vector`, which are reserved.
        `weaviate.exceptions.WeaviateInsertManyAllFailedError`:
            If every object in the batch fails to be inserted. The exception message contains details about the failure.
    """

Additionally, here is the client repository, where you can read more: Weaviate Python Client

As for .insert_many itself, it should not null out the vectors as long as they are passed correctly.

Personally, I would check embeddings[i] to make sure it is not null or incorrectly formatted before adding it to the DataObject. I would also add logging to verify that the vectors are passed correctly to DataObject and insert_many.
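One way to follow this advice is to normalise each embedding into a plain list of Python floats and validate it before building the DataObject. The sketch below assumes embeddings may arrive as numpy arrays (anything with `.tolist()`) or plain sequences; `to_float_vector` and its `expected_dim` check are hypothetical, not part of the Weaviate client.

```python
import math
from typing import Any, List, Optional


def to_float_vector(vec: Any, expected_dim: Optional[int] = None) -> List[float]:
    """Convert an embedding to a list of Python floats, raising if it looks wrong."""
    if vec is None:
        raise ValueError("embedding is None")
    # numpy arrays (and similar) expose .tolist(); plain sequences are copied
    values = vec.tolist() if hasattr(vec, "tolist") else list(vec)
    floats = [float(x) for x in values]
    if not floats:
        raise ValueError("embedding is empty")
    if any(math.isnan(x) or math.isinf(x) for x in floats):
        raise ValueError("embedding contains NaN or inf")
    if expected_dim is not None and len(floats) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(floats)}")
    return floats


# Hypothetical usage inside the loop from the question:
# vector = to_float_vector(embeddings[i])
# print(i, len(vector))  # simple logging of what is about to be stored
# embeddings_data_objs.append(wvc.data.DataObject(properties=properties, vector=vector))
```

Raising early means a bad embedding fails loudly at the row that produced it, rather than surfacing later as an empty vector in a query.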

I would recommend you read through:

I hope this helps, and I wish you a happy Sunday!

Happy Coding :partying_face:


Thank you again for the reply and for the links, super helpful!

Just to clarify: I still haven’t figured out why insert_many results in null vectors while calling insert in a for loop populates them correctly. Because of this I’ve stopped using insert_many, but if you have any ideas about why this happens, I’d appreciate it! I’m not using threading either, so it’s quite confusing that insert_many behaves this way. Either way, thank you for the help, especially on a weekend!
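For reference, the per-object workaround described here can reuse the same DataObject list through a small helper. This is a sketch assuming each item carries DataObject's `properties`, `uuid`, `vector` and `references` fields and that `collection.data.insert` accepts them as keyword arguments; `insert_individually` is a hypothetical name.

```python
from typing import Any, Iterable, List


def insert_individually(collection: Any, data_objects: Iterable[Any]) -> List[Any]:
    """Insert objects one at a time with collection.data.insert.

    Assumes each item has DataObject-style fields (properties, uuid, vector,
    references); returns the UUIDs reported by insert, in input order.
    """
    uuids = []
    for obj in data_objects:
        uuids.append(
            collection.data.insert(
                properties=obj.properties,
                references=getattr(obj, "references", None),
                uuid=getattr(obj, "uuid", None),
                vector=obj.vector,
            )
        )
    return uuids


# Hypothetical usage in place of insert_many:
# uuids = insert_individually(course_collection, embeddings_data_objs)
```

Since this sends one request per object, it is likely to be noticeably slower than insert_many on large imports.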