I’m trying to build a RAG pipeline where I put OCR output into Weaviate and then run queries against the resulting index (I think; sorry if I get any terminology wrong, feel free to correct me). I’ve put a simplified version of my code here: example.py · GitHub, with unnecessary things like secret loading stripped out and everything in one file so it’s easier to read. When I run my code, the index.as_retriever().retrieve(prompt) call never returns any nodes, no matter how I phrase the prompt/query. Likewise, the index.as_query_engine(output_cls=Invoice, response_mode="compact").query(prompt).response call returns Empty Response.
Server Setup Information
Weaviate Server Version: 1.23.7
Deployment Method: docker-compose
Multi Node? Number of Running Nodes: 1
Client Language and Version: python weaviate-client 4.6.4
Multitenancy?: no
Any additional Information
I already tried asking the llama-index help chat on their discord, but they couldn’t help: Discord
Depends what you mean by directly. I just tried using the weaviate python client directly instead of via llama index, and I couldn’t get any sign that any collection had any data in it. However I didn’t use the graphql api directly.
Here’s what I used to check whether Weaviate had any data: [client.collections.get(collection).query.fetch_objects() for collection in client.collections.list_all()]
Assuming I’m checking correctly, Weaviate creates the collections but does not populate them when I run the code linked above. Any idea why? I definitely pass a bunch of TextNodes to the constructor: I printed them out right before the call, and they exist.
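For anyone who wants to repeat the check, here is a slightly more explicit version as a minimal sketch. It asks each collection for its total object count instead of fetching objects, so an empty collection shows up as a plain 0. The helper name count_objects_per_collection is my own, and main() assumes a local instance reachable on the default ports via weaviate.connect_to_local(); adjust that if your docker-compose maps different ports.

```python
def count_objects_per_collection(client):
    """Return {collection_name: total_object_count} for every collection.

    `client` is expected to be a connected weaviate-client v4 client.
    """
    counts = {}
    # list_all() returns a mapping keyed by collection name.
    for name in client.collections.list_all():
        collection = client.collections.get(name)
        # Aggregate over the whole collection and request only the count.
        result = collection.aggregate.over_all(total_count=True)
        counts[name] = result.total_count
    return counts


def main():
    # Imported here so the helper above can be reused without the dependency.
    import weaviate  # weaviate-client v4

    # Assumption: Weaviate is running locally on the default ports.
    client = weaviate.connect_to_local()
    try:
        for name, count in count_objects_per_collection(client).items():
            print(f"{name}: {count} objects")
    finally:
        client.close()
```

Calling main() after the indexing script has finished should print one line per collection. If every count is 0, the collections were created but nothing was written to them.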
I updated the gist to contain an entire reproduction environment. You can just download all the files into one dir, and run docker compose up to run it.
The nodes are not being stored by weaviate, which is very strange.