Slow deletion when using filter (and updating chunked documents)

emhagman · June 26, 2023, 3:47pm

We’re working on periodically updating our vector store with some data and we use chunking (~2000 characters per document) to keep things small. Because of this, we can’t easily “upsert” documents using UUID since those documents actually belong to a parent document. For example, take a Google Doc that is 10,000 words long. We might chunk that into 20 Weaviate documents with a “parent_uuid” that matches across all of them. If the content changes somewhere in the middle of the document, it’s very hard to chunk the document the same way.

Our solution was to use a “parent_uuid” that references the document and is available as metadata on each Weaviate document. Our “upsert” becomes a deletion based on “parent_uuid” and then new inserts.
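For readers following along, here is a minimal sketch of the delete-then-insert "upsert" described above. It is not our production code and makes no Weaviate client calls: `InMemoryStore`, `delete_where_parent`, `insert`, `chunk_text` and `upsert_document` are hypothetical names used only for illustration, and the fixed-size character chunking is an assumption. In a real setup the two store methods would presumably map to a filtered batch delete on “parent_uuid” followed by a batch import.

```python
from dataclasses import dataclass, field
from uuid import uuid4

CHUNK_SIZE = 2000  # characters per chunk, matching the ~2000 figure above


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split text into fixed-size character chunks (naive, no overlap)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the vector store; not a Weaviate API."""
    objects: dict[str, dict] = field(default_factory=dict)

    def delete_where_parent(self, parent_uuid: str) -> int:
        """Delete every chunk whose parent_uuid matches; return the count."""
        doomed = [key for key, obj in self.objects.items()
                  if obj["parent_uuid"] == parent_uuid]
        for key in doomed:
            del self.objects[key]
        return len(doomed)

    def insert(self, obj: dict) -> str:
        """Store one chunk under a fresh random id and return that id."""
        key = str(uuid4())
        self.objects[key] = obj
        return key


def upsert_document(store: InMemoryStore, parent_uuid: str, text: str) -> tuple[int, int]:
    """Replace all chunks of one parent document: delete by parent, re-insert."""
    deleted = store.delete_where_parent(parent_uuid)
    chunks = chunk_text(text)
    for index, chunk in enumerate(chunks):
        store.insert({"parent_uuid": parent_uuid,
                      "chunk_index": index,
                      "content": chunk})
    return deleted, len(chunks)
```

Calling `upsert_document(store, parent_uuid, new_text)` each time the source document changes sidesteps the need to keep chunk boundaries stable, at the cost of re-inserting (and re-vectorizing) every chunk of that document.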

The issue we’re running into is that deleting by “parent_uuid” via the bulk API is extremely slow, even with only 20 documents to delete from Weaviate. We have an index on “parent_uuid”.

Has anyone run into issues like this or have a better solution for “upserting” documents that are chunked in Weaviate? Should we not chunk things like this in Weaviate and just insert the entire document, no matter what size? Is this what the relationships in Weaviate are for? I feel like we are missing something obvious here.

emhagman · June 27, 2023, 4:41pm

Update: We figured out our own issue. Our code was checking whether the index existed every time we called our import code, which was slowing things down. Moving that check out so it runs once during startup fixed the problem.
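Roughly, the change looks like the sketch below. The names are made up for illustration (`FakeSchemaClient`, `Importer`, `ensure_index`, `import_objects`) and the schema client is a counting stand-in, not the Weaviate client; the point is only that the existence check runs once at startup instead of on every import call.

```python
class FakeSchemaClient:
    """Hypothetical stand-in that counts how often the existence check runs."""

    def __init__(self) -> None:
        self.exists_calls = 0
        self.created = False

    def index_exists(self) -> bool:
        self.exists_calls += 1
        return self.created

    def create_index(self) -> None:
        self.created = True


class Importer:
    """Runs the index check once in the constructor, not per import call."""

    def __init__(self, schema: FakeSchemaClient) -> None:
        self.schema = schema
        self.imported: list[dict] = []
        self.ensure_index()  # checked once, at startup

    def ensure_index(self) -> None:
        if not self.schema.index_exists():
            self.schema.create_index()

    def import_objects(self, objects: list[dict]) -> int:
        # No existence check here: the hot path only writes.
        self.imported.extend(objects)
        return len(objects)
```

With this shape, five import calls trigger one existence check in total rather than five.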

jphwang · June 30, 2023, 4:23pm

Hi @emhagman - Welcome, and I’m glad to hear that you guys worked it out!
