Description
Python client weaviate-client==4.4.1
I have a schema with Document and Chunk collections, linked in both directions: Document->hasChunks and Chunk->ofDocument. The document I’m testing has 260 Chunks.
If I use client.batch.dynamic(), all of the Document->hasChunks references are valid, but only about 40 of the Chunk->ofDocument references are added; the rest are null. No errors are reported.
If I use client.batch.fixed_size(100), more Chunks get an ofDocument reference (usually around 200).
If I use client.batch.fixed_size(50), even more do (around 220).
I’m stumped.
Code:
with client.batch.dynamic() as batch:
    for index, element in enumerate(chunks):
        wv_chunk = Chunk.from_element(element, index)
        # Queue the Chunk object; the batch returns the UUID it will be stored under.
        chunk_uuid = batch.add_object(
            collection="Chunk",
            properties=wv_chunk.get_data(),
        )
        print(f"Cross ref Doc id: {doc_uuid} hasChunks-> Chunk id: {chunk_uuid}")
        # Document -> Chunk direction (these references all arrive).
        ref = batch.add_reference(
            from_collection="Document",
            from_uuid=doc_uuid,
            from_property="hasChunks",
            to=chunk_uuid,
        )
        print(f"Cross ref Chunk id: {chunk_uuid} ofDocument-> Doc id: {doc_uuid}")
        # Chunk -> Document direction (most of these end up null).
        ref = batch.add_reference(
            from_collection="Chunk",
            from_uuid=chunk_uuid,
            from_property="ofDocument",
            to=doc_uuid,
        )
        print('-' * 80)

# Inspect failures after the batch context has exited and flushed.
failed_objs = client.batch.failed_objects
failed_refs = client.batch.failed_references
print(f"Failed batch objects: {failed_objs}")
print(f"Failed batch refs: {failed_refs}")
client.close()
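One thing that may be worth trying, as a sketch rather than a confirmed fix: attach the Chunk->ofDocument reference when the Chunk object itself is created, so that it does not depend on a separate reference request arriving after the object exists. This assumes that batch.add_object in the v4 client accepts `references` and a client-generated `uuid`, and that the Document already exists as in the code above. The helper name `add_chunks_with_refs` and the `chunk_props` list are hypothetical.

```python
import uuid


def add_chunks_with_refs(batch, doc_uuid, chunk_props):
    """Queue Chunk objects with their ofDocument reference set at creation.

    `batch` is anything exposing add_object/add_reference with the keyword
    arguments used below (`references` and `uuid` are assumed to be supported).
    """
    chunk_uuids = []
    for props in chunk_props:
        # Generate the UUID client-side so it is known before the object is sent.
        chunk_uuid = str(uuid.uuid4())
        batch.add_object(
            collection="Chunk",
            properties=props,
            uuid=chunk_uuid,
            references={"ofDocument": doc_uuid},
        )
        # The Document -> Chunk direction stays a separate batch reference,
        # as in the original code.
        batch.add_reference(
            from_collection="Document",
            from_uuid=doc_uuid,
            from_property="hasChunks",
            to=chunk_uuid,
        )
        chunk_uuids.append(chunk_uuid)
    return chunk_uuids
```

It would be called inside the same `with client.batch.dynamic() as batch:` block, passing `batch`, `doc_uuid`, and a list of the `wv_chunk.get_data()` dictionaries.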
Output:
Cross ref Doc id: 69438102-4464-4edb-a2d4-32887b5281e4 hasChunks-> Chunk id: 6e2c4288-5538-42c5-b525-0838329125f0
Cross ref Chunk id: 6e2c4288-5538-42c5-b525-0838329125f0 ofDocument-> Doc id: 69438102-4464-4edb-a2d4-32887b5281e4
--------------------------------------------------------------------------------
Cross ref Doc id: 69438102-4464-4edb-a2d4-32887b5281e4 hasChunks-> Chunk id: c3a53462-f70d-4290-b0b9-90869790cd8b
Cross ref Chunk id: c3a53462-f70d-4290-b0b9-90869790cd8b ofDocument-> Doc id: 69438102-4464-4edb-a2d4-32887b5281e4
--------------------------------------------------------------------------------
Failed batch objects: []
Failed batch refs: []
Now count the Chunks that have an ofDocument reference to this Document:
{
  Aggregate {
    Chunk(where: {
      operator: Equal,
      path: ["ofDocument", "Document", "id"],
      valueString: "69438102-4464-4edb-a2d4-32887b5281e4"
    }) {
      content {
        count
      }
    }
  }
}
{
  "data": {
    "Aggregate": {
      "Chunk": [
        {
          "content": {
            "count": 45
          }
        }
      ]
    }
  }
}
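To make the shortfall explicit, the count can be pulled out of the Aggregate response and compared with the number of Chunks that were inserted. This is only a small sketch: `missing_references` is a hypothetical helper, and the expected total of 260 comes from the description above.

```python
import json


def missing_references(response_text, collection, expected):
    """Return how many objects lack the reference, given an Aggregate response."""
    data = json.loads(response_text)
    found = data["data"]["Aggregate"][collection][0]["content"]["count"]
    return expected - found


# The response from the client.batch.dynamic() run, condensed to one line.
response_text = """
{"data": {"Aggregate": {"Chunk": [{"content": {"count": 45}}]}}}
"""
print(missing_references(response_text, "Chunk", 260))  # 215
```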
Here’s a run changing the code to client.batch.fixed_size(50):
Cross ref Doc id: 69438102-4464-4edb-a2d4-32887b5281e4 hasChunks-> Chunk id: 0aa245c0-7cba-4623-91de-3936503def2d
Cross ref Chunk id: 0aa245c0-7cba-4623-91de-3936503def2d ofDocument-> Doc id: 69438102-4464-4edb-a2d4-32887b5281e4
--------------------------------------------------------------------------------
Cross ref Doc id: 69438102-4464-4edb-a2d4-32887b5281e4 hasChunks-> Chunk id: 269d625e-ef04-46a4-99f3-bcf2e63d54b1
Cross ref Chunk id: 269d625e-ef04-46a4-99f3-bcf2e63d54b1 ofDocument-> Doc id: 69438102-4464-4edb-a2d4-32887b5281e4
--------------------------------------------------------------------------------
Failed batch objects: []
Failed batch refs: []
Now count:
{
  Aggregate {
    Chunk(where: {
      operator: Equal,
      path: ["ofDocument", "Document", "id"],
      valueString: "69438102-4464-4edb-a2d4-32887b5281e4"
    }) {
      content {
        count
      }
    }
  }
}
{
  "data": {
    "Aggregate": {
      "Chunk": [
        {
          "content": {
            "count": 240
          }
        }
      ]
    }
  }
}
That was a good run, but 20 of the 260 Chunks are still missing their ofDocument reference.
Any idea what is going on?
Server Setup Information
- Weaviate Server Version: 1.23.7
- Deployment Method: Weaviate cluster
- Multi Node? Number of Running Nodes: