Description
I tried replicating this example with my own data: Bring your own vectors | Weaviate - Vector Database
With 1,000 documents everything works fine. But when I try to add 350k documents using <collection>.data.insert_many(document_objs), I get this error:
Query call with protocol GRPC batch failed with message Sent message larger than max (1546098086 vs. 104858000).
Has anybody else seen this? How would you suggest I debug it?
Here is my minimal code for context:
import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local()
try:
    ai_papers = client.collections.create(
        "AIPapers",
        vectorizer_config=wvc.config.Configure.Vectorizer.none(),
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(
            distance_metric=wvc.config.VectorDistances.COSINE  # select preferred distance metric
        ),
    )
    # load_data() and embedding_model are my own helpers (not shown here)
    texts, embeddings = load_data()
    ai_papers_objs = []
    for i in range(len(texts)):
        ai_papers_objs.append(wvc.data.DataObject(
            properties={
                "abstract": texts[i],
            },
            vector=embeddings[i]
        ))
    # this is the call that fails with 350k objects
    ai_papers.data.insert_many(ai_papers_objs)
    # retrieve documents
    query = "I am trying to build a RAG system, what LLM literature should I read?"
    query_embedding = embedding_model.embed_query(query)
    response = ai_papers.query.near_vector(
        near_vector=query_embedding,
        limit=5,
    )
except Exception as e:
    print(e)
finally:
    client.close()
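One thing that might be worth trying, as a sketch rather than a verified fix: the error reports a message of 1546098086 bytes against a limit of 104858000 bytes (about 1.5 GB vs. about 100 MB). If that single message carried all 350k objects, that is roughly 4.4 KB per object, so sending the objects in smaller groups should keep each request well under the limit. The helper names `chunked` and `insert_in_chunks` and the default chunk size of 1000 below are my own assumptions, not part of the Weaviate client; the only client call used is the same `insert_many` as above.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def chunked(items: Iterable[Any], chunk_size: int) -> Iterator[List[Any]]:
    """Yield successive lists holding at most chunk_size items each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(collection: Any, objects: Iterable[Any], chunk_size: int = 1000) -> int:
    """Call collection.data.insert_many once per chunk.

    Hypothetical helper: `collection` is expected to expose
    `.data.insert_many(list)` like the collection object above.
    Returns the number of objects sent.
    """
    sent = 0
    for chunk in chunked(objects, chunk_size):
        collection.data.insert_many(chunk)
        sent += len(chunk)
    return sent
```

In the code above this would replace the single call with `insert_in_chunks(ai_papers, ai_papers_objs, chunk_size=1000)`. The chunk size is a guess; with larger vectors or longer abstracts a smaller value may be needed.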
Server Setup Information
- Weaviate Version: weaviate-client==4.4.2
- Deployment Method: docker
- Multi Node? Number of Running Nodes: No
- Used Client Language and Version: Python v4