Hi
Just starting out with Weaviate. I have 1,000 documents that I split into chunks of 200 characters each. I then attempted to import them into Weaviate, mostly following the tutorial and getting-started guides.
Below are the schema and the import code.
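For reference, the fixed-size chunking described above can be sketched like this (a minimal version with no overlap; real splitters usually overlap adjacent chunks):

```python
def chunk_text(text, size=200):
    """Split text into consecutive fixed-size pieces (no overlap)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = chunk_text("a" * 450)
# 450 characters -> three chunks of 200, 200, and 50 characters
```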
```python
class_obj = {
    'class': 'className',
    'description': 'description',
    'properties': [
        {
            'name': 'title',
            'description': 'Title',
            'dataType': ['text']
        },
        {
            'name': 'source',
            'description': 'Source',
            'dataType': ['text']
        },
        {
            'name': 'content',
            'description': 'Content',
            'dataType': ['text']
        },
    ],
    'vectorizer': 'text2vec-openai',
    'moduleConfig': {
        'text2vec-openai': {  # this must match the vectorizer used
            'vectorizeClassName': False,
            'model': 'ada',
            'modelVersion': '002',
            'type': 'text'
        }
    }
}
```
```python
# ===== Import data =====
# Configure the batch import
client.batch.configure(
    batch_size=100,
)

for document in documents:
    properties = {
        "title": document.metadata["title"],
        "content": document.page_content,
        "source": document.metadata["source"]
    }
    try:
        client.batch.add_data_object(properties, "className")
    except Exception as e:
        print(e)
        print(document.metadata["title"])

client.batch.flush()
```
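One thing worth noting: `add_data_object` only queues the object locally, so errors that happen server-side during vectorization (including OpenAI 429s) show up in the batch results rather than in the `try/except` above. A sketch of a callback that surfaces them, assuming the v3 Python client's batch result shape (a list of per-object dicts with messages under `result -> errors -> error[0] -> message`):

```python
def check_batch_result(results):
    """Collect per-object error messages from a Weaviate batch response."""
    messages = []
    for result in results or []:
        errors = (result.get("result") or {}).get("errors")
        if errors and errors.get("error"):
            messages.append(errors["error"][0].get("message"))
    return messages

# Hypothetical wiring into the batch (v3 client):
# client.batch.configure(batch_size=100, callback=check_batch_result)
```

If OpenAI is throttling the vectorizer, the messages collected here should say so explicitly.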
However, this took quite a bit of time: after 30+ minutes, only about 60% of the documents had been added to Weaviate. I am using OpenAI to generate the embeddings.
What is the bottleneck in this situation?
My theory is that I'm being rate limited by OpenAI.
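If rate limiting does turn out to be the cause, one mitigation is pacing the import client-side so requests stay under the budget. A minimal sketch, assuming a requests-per-minute figure (the 3,000 below is a placeholder; the actual limit depends on your OpenAI account tier):

```python
import time

def throttled(items, per_minute=3000):
    """Yield items no faster than `per_minute`, sleeping between yields."""
    interval = 60.0 / per_minute
    last = None
    for item in items:
        if last is not None:
            remaining = interval - (time.monotonic() - last)
            if remaining > 0:
                time.sleep(remaining)
        last = time.monotonic()
        yield item

# usage: wrap the import loop
# for document in throttled(documents, per_minute=3000):
#     ...
```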