Description
Hi, I'm trying to import ~30 small chunks of text into a collection. The import code runs without any visible errors, but when I check via http://localhost:8080/v1/objects or through code, only 2 or 3 objects have actually been created.
The code is below:
```python
import weaviate

client = weaviate.connect_to_local()

schema = {
    "class": "WeaviateBlogChunk",
    "description": "A snippet from a Weaviate blogpost.",
    "vectorIndexType": "hnsw",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "skip": False,
            "vectorizeClassName": False,
            "vectorizePropertyName": False,
            "apiVersion": "<api version>",
            "baseURL": "<base url>",
            "deploymentId": "<model>",
            "resourceName": "<resource name>"
        },
        "generative-openai": {
            "apiVersion": "<api version>",
            "baseURL": "<base url>",
            "deploymentId": "<model>",
            "resourceName": "<resource name>"
        }
    },
    "properties": [
        {
            "name": "content",
            "dataType": ["text"],
            "description": "The text content of the podcast clip",
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False,
                    "vectorizeClassName": False,
                    "apiVersion": "<api version>",
                    "baseURL": "<base url>",
                    "deploymentId": "<model>",
                    "resourceName": "<resource name>"
                }
            }
        }
    ]
}

client.collections.create_from_dict(schema)
collection = client.collections.get("WeaviateBlogChunk")

# blog_chunks: the list of ~30 text strings, loaded elsewhere
with collection.batch.fixed_size(batch_size=1) as batch:
    for idx, blog_chunk in enumerate(blog_chunks):
        batch.add_object(
            properties={"content": blog_chunk},
        )

# Release the client's connections when done
client.close()
```
This code is very similar to the Weaviate tutorial script `import_blogs.py` (weaviate-tutorials/Hurricane on GitHub, main branch). What's also strange is that if I modify some of the module configs, for example by removing the `generative-openai` config, the number of successfully imported chunks changes, and I don't know why.
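One thing worth ruling out: the schema repeats the Azure connection settings (`apiVersion`, `baseURL`, `deploymentId`, `resourceName`) and `vectorizeClassName` inside the property-level `text2vec-openai` block. My reading of the Weaviate module docs is that the property level is meant for `skip` and `vectorizePropertyName` only, but that is an assumption. A minimal sketch for producing a trimmed copy of the schema to retry with; `trim_property_module_config` is a hypothetical helper, not part of the Weaviate client:

```python
import copy


def trim_property_module_config(schema, module="text2vec-openai",
                                allowed=("skip", "vectorizePropertyName")):
    """Return a copy of a collection dict in which each property's settings
    for `module` keep only the keys in `allowed`.

    Assumption: `allowed` reflects the property-level options for the module.
    Class-level moduleConfig is left untouched; the input is not modified.
    """
    trimmed = copy.deepcopy(schema)
    for prop in trimmed.get("properties", []):
        cfg = prop.get("moduleConfig", {}).get(module)
        if cfg is not None:
            # Drop every key the property level is not expected to take
            prop["moduleConfig"][module] = {
                key: value for key, value in cfg.items() if key in allowed
            }
    return trimmed
```

After deleting the existing collection with `client.collections.delete("WeaviateBlogChunk")`, passing the trimmed dict to `client.collections.create_from_dict(...)` and re-running the import would show whether the count still moves when only the class-level block carries the Azure details.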
Overall I'm pretty confused by this behavior: everything seems to run fine, there are no errors or warnings, and yet only a very small number of text chunks get imported.
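As far as I understand, the v4 Python client does not raise on per-object batch failures. It collects them instead, and `collection.batch.failed_objects` exposes them after the `with` block exits, so "no errors" may only mean nothing was printed. A minimal sketch of a helper that groups those failures by message; `summarize_failures` is a hypothetical name, and the `.message` attribute is assumed from the client's batch error objects (anything without it falls back to `str()`):

```python
from collections import Counter


def summarize_failures(failed_objects, max_messages=5):
    """Group failed batch objects by their error message.

    Assumes each item has a `.message` attribute (as the v4 client's
    batch error objects appear to); otherwise falls back to str(item).
    """
    messages = [getattr(err, "message", str(err)) for err in failed_objects]
    lines = [f"{len(messages)} object(s) failed"]
    # Most frequent messages first, so one recurring error stands out
    for message, count in Counter(messages).most_common(max_messages):
        lines.append(f"{count}x: {message}")
    return "\n".join(lines)
```

Calling `print(summarize_failures(collection.batch.failed_objects))` right after the batch block would show whether the missing chunks all failed with the same message, for example an error returned by the vectorizer.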
Here's an example of a chunk that doesn't get imported, although I don't think the issue has anything to do with the text itself, since it happens regardless of the content:
```text
The Hurricane front-end user story is illustrated below:
<figure>
<video width="100%" autoplay loop muted controls>
<source src={demo} type="video/mp4" />
Your browser does not support the video tag. </video>
<figcaption>A walkthrough of the Hurricane user story</figcaption>
</figure>
- A user enters the question they want to write a blog post about. - Hurricane acknowledges the request and streams its progress while writing. - Hurricane returns a blog post to the user and updates the counter of AI-generated blog posts in Weaviate. As illuminated by [Arize Phoenix](https://docs.arize.com/phoenix), running Hurricane with GPT-3.5-Turbo takes about 45 seconds to convert a question into a blog post.
```
Server Setup Information
- Weaviate Server Version: 1.25.6
- Deployment Method: local docker container
- Multi Node? Number of Running Nodes: only one node
- Client Language and Version: Python, 4.6.5
- Multitenancy?: not specified in collection schema