The file is huge and contains nested JSON, which makes it difficult to upload to Weaviate.
Hi @Vruti_Dobariya - it depends on what you want to do with it.
For example, if you were going to turn it into one vector, you could stringify it and save it that way. It will vectorize fine, and you could potentially use the Weaviate string filters with it. Would that work for your use case?
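To make the "stringify it" idea concrete, here is a minimal sketch using only the standard library. The `spec` fragment and the property name `content` are made up for illustration; the real Stripe spec is far larger, and how you then load `data_object` depends on your client version.

```python
import json

# Hypothetical fragment standing in for a nested OpenAPI spec.
spec = {
    "paths": {
        "/v1/customers": {
            "get": {"summary": "List all customers"},
        }
    }
}

# Serialize the whole nested structure into a single string.
# sort_keys keeps the output stable between runs.
spec_text = json.dumps(spec, sort_keys=True)

# One flat object with one text property ("content" is an assumed name).
data_object = {"content": spec_text}
print(type(data_object["content"]).__name__, len(data_object["content"]))
```

The nesting survives inside the string, so `json.loads` can recover the original structure later if needed.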
Thanks for replying!
I want to store the JSON content there and vectorize it and ultimately perform a vector search on it.
Okay. Are you storing a bunch of different OpenAPI specs? Or do you want to search through the one (Stripe) OpenAPI spec?
If you are storing a bunch of different OpenAPI specs and want to find similar ones, you can store each whole spec as one string.
On the other hand, if you want to search through the one spec, then you will have to split it into bits that you want to find.
I am storing only Stripe’s. The problem is that it is highly nested, so dividing it up will be a very difficult task.
What is your use case? Hard to make recommendations without knowing what your end goal is.
I want to perform semantic search on this data, to create a personal AutoGPT for Stripe API services.
I would chunk your input by route then and embed each separately. I have had good success building API help bots this way.
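As a rough sketch of what "chunk by route" could look like, assuming the spec has already been parsed into a Python dict with the standard OpenAPI `paths` layout. The helper name `chunk_by_route` and the sample spec are hypothetical, and this only produces the text chunks; embedding and import are left out.

```python
import json

# HTTP methods that OpenAPI allows as operation keys under a path.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}

def chunk_by_route(spec):
    """Return one text chunk per (path, HTTP method) pair in an OpenAPI spec dict."""
    chunks = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            chunks.append({
                "path": path,
                "method": method,
                # The nested operation is flattened to a string for embedding.
                "text": json.dumps(operation, sort_keys=True),
            })
    return chunks

# Tiny made-up spec for illustration.
spec = {
    "paths": {
        "/v1/customers": {
            "get": {"summary": "List all customers"},
            "post": {"summary": "Create a customer"},
        },
        "/v1/charges": {
            "parameters": [],
            "get": {"summary": "List all charges"},
        },
    }
}

chunks = chunk_by_route(spec)
for chunk in chunks:
    print(chunk["method"].upper(), chunk["path"])
```

Each chunk keeps its `path` and `method` as separate fields, so they can be used for filtering alongside the embedded text.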
Yes. I am trying to do it that way. Thank you for your help!
Why can’t I define a class inside a class while defining the Weaviate schema?
Hi @Vruti_Dobariya classes aren’t configured to allow that - each class is like a SQL table, so just like you can’t have nested tables, you can’t have nested classes.
You can cross-reference them though. Does that help? (Cross-references | Weaviate - vector database)
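For illustration, a cross-reference is declared as a property whose `dataType` names another class instead of a primitive type. The class and property names below (`StripeRoute`, `StripeSchema`, `usesSchema`) are hypothetical, and this sketch only builds the schema dict; it does not talk to a Weaviate instance.

```python
# Two flat classes instead of one nested class.
schema = {
    "classes": [
        {
            "class": "StripeSchema",
            "properties": [
                {"name": "name", "dataType": ["text"]},
            ],
        },
        {
            "class": "StripeRoute",
            "properties": [
                {"name": "path", "dataType": ["text"]},
                # Cross-reference: dataType is the name of the target class.
                {"name": "usesSchema", "dataType": ["StripeSchema"]},
            ],
        },
    ]
}

# Collect the properties that point at another class in the schema.
class_names = {c["class"] for c in schema["classes"]}
references = [
    (c["class"], p["name"], p["dataType"][0])
    for c in schema["classes"]
    for p in c["properties"]
    if p["dataType"][0] in class_names
]
print(references)
```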
Thanks @jphwang this problem was solved! However, I have one other doubt.
I am using generative search and I am gathering responses over a list of elements:
for string in my_list:
    response = (
        client.query
        .get("StripeAPI", ["path", "key", "value", "dataType"])
        .with_limit(1123)
        .with_where({
            "path": ["path"],
            "operator": "Equal",
            "valueText": string
        })
        .do()
    )
    print(len(response['data']['Get']['StripeAPI']))
Here is the snippet.
Here, in the list of strings that I am traversing over, the first string returns 1123 objects i.e. the max number of objects. But, that string, only has 1054 objects. The problem remains the same even when I change the max limit. How do I solve it?
Hi @Vruti_Dobariya - sorry for the late reply, I’ve been away for a few days. Can you help me understand the question better?
Could you clarify what you mean when you say this?
“the first string returns 1123 objects i.e. the max number of objects. But, that string, only has 1054 objects”
If you mean that regardless of the input, it’s returning the same number of hits as the limit, that is the expected behaviour.
Given that this is a vector search, all objects are “similar” to some degree. So Weaviate will return the n best matching objects. If I’m misunderstanding the question, please let me know.
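A toy illustration of that point in plain Python (this is a brute-force stand-in, not how Weaviate is implemented, and the vectors are made up): every stored vector gets a similarity score, so a top-n search fills the limit whenever at least n objects exist.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_n(query, vectors, n):
    """Rank every stored vector by similarity and keep the n best."""
    ranked = sorted(
        vectors,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return ranked[:n]

vectors = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [0.7, 0.7]),
    ("d", [-1.0, 0.2]),
]

results = top_n([1.0, 0.0], vectors, n=3)
print([name for name, _ in results])  # ['a', 'c', 'b']
```

Even "b", which is orthogonal to the query, is returned, because nothing excludes poorly matching objects unless a distance threshold or filter is applied.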
Hey, @jphwang. Yes. So, there was a glitch where, regardless of the number of objects present in a string, for the first string, it would generate 1123 objects, i.e., the maximum limit. But it was solved automatically without any changes once I reran my Docker instance. Thanks.
Fantastic! Glad to hear it’s been resolved
Hey @jphwang!
I have another query, if you can help me with that.
# Wrap the existing Weaviate client in a LlamaIndex vector store.
vector_store = WeaviateVectorStore(weaviate_client=client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# from_documents embeds `data` and writes it to Weaviate each time this line runs.
index = VectorStoreIndex.from_documents(data, storage_context=storage_context)
query_engine = index.as_query_engine()
response = query_engine.query("xyz")
print(response)
In the code above, I am loading data into the Weaviate vector store using LlamaIndex. The problem is that every time I run a query, the data is loaded into Weaviate all over again. What I want is that once the data is indexed and stored in Weaviate, running a query uses the already loaded data instead of indexing it again. How can I solve this issue?
Hi @Vruti_Dobariya - could you please make a new thread per issue? That would help us to track each question and get help as needed. Thanks!
Yes, sure thing! Sorry for the inconvenience.