How do I store Stripe's OpenAPI JSON file (which is a highly unstructured JSON file) in Weaviate?

Vruti_Dobariya · June 12, 2023, 6:24am

The file is huge and contains nested JSON, which makes it difficult to upload to Weaviate.

jphwang · June 12, 2023, 6:41pm

Hi @Vruti_Dobariya - it depends on what you want to do with it.

For example, if you were going to turn it into one vector, you could stringify it and save it that way. It will vectorize fine, and you could potentially use the Weaviate string filters with it. Would that work for your use case?
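A minimal sketch of the "stringify it" option, assuming the spec has already been loaded into a Python dict. The tiny `spec` dict and the property names `name` and `content` are placeholders for illustration, not part of Stripe's spec or of any Weaviate schema mentioned in this thread.

```python
import json

# Tiny stand-in for a loaded OpenAPI spec (the real one would come from json.load).
spec = {
    "openapi": "3.0.0",
    "info": {"title": "Example API", "version": "1.0"},
    "paths": {"/v1/customers": {"get": {"summary": "List all customers"}}},
}

# Serialize the whole spec to one compact string; sort_keys keeps the output stable.
spec_text = json.dumps(spec, sort_keys=True, separators=(",", ":"))

# Hypothetical object with the whole spec in a single text property.
spec_object = {"name": "example-api", "content": spec_text}
```

This yields one object, and therefore one vector, per spec, which fits comparing whole specs rather than searching inside one.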

Vruti_Dobariya · June 13, 2023, 8:14am

Thanks for replying!
I want to store the JSON content there, vectorize it, and ultimately perform a vector search on it.

jphwang · June 13, 2023, 9:46am

Okay. Are you storing a bunch of different OpenAPI specs? Or do you want to search through the one (Stripe) OpenAPI spec?

If you are storing a bunch of different OpenAPI specs and want to find similar ones, you can store the whole spec as one string.

On the other hand, if you want to search through the one spec, then you will have to split it into bits that you want to find.

Vruti_Dobariya · June 13, 2023, 10:22am

I am storing only Stripe’s. The problem is that it is highly nested, so dividing it up will be very difficult.

felixthekraut · June 13, 2023, 11:43am

What is your use case? Hard to make recommendations without knowing what your end goal is.

Vruti_Dobariya · June 13, 2023, 12:05pm

I want to perform semantic search on this data, in order to build a personal Auto-GPT for Stripe API services.

felixthekraut · June 13, 2023, 12:08pm

I would chunk your input by route then and embed each separately. I have had good success building API help bots this way.
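One possible way to chunk by route, sketched under assumptions: the spec follows the standard OpenAPI layout (`paths` mapping each path to its HTTP methods), and each chunk is a plain dict with hypothetical keys `path`, `method`, and `text`. The function name `chunk_by_route` and the sample spec are made up for illustration. This is not necessarily how the poster above built their bots.

```python
import json

# HTTP methods that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def chunk_by_route(spec):
    """Return one chunk per (path, method) pair found under spec["paths"]."""
    chunks = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            # Build the text to embed: route line, summary, description, parameters.
            text = "\n".join([
                f"{method.upper()} {path}",
                operation.get("summary", ""),
                operation.get("description", ""),
                json.dumps(operation.get("parameters", []), sort_keys=True),
            ])
            chunks.append({"path": path, "method": method.upper(), "text": text})
    return chunks


# Tiny stand-in spec; a real one would be loaded with json.load().
sample_spec = {
    "paths": {
        "/v1/customers": {
            "get": {"summary": "List all customers"},
            "post": {"summary": "Create a customer"},
            "parameters": [],
        }
    }
}

chunks = chunk_by_route(sample_spec)
for chunk in chunks:
    print(chunk["method"], chunk["path"])
```

Each chunk could then be stored as its own Weaviate object, so that a query retrieves the most relevant route rather than the whole spec.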

Vruti_Dobariya · June 13, 2023, 12:09pm

Yes. I am trying to do it that way. Thank you for your help!

Vruti_Dobariya · June 14, 2023, 11:16am

Why can’t I define a class inside a class while defining the Weaviate schema?

jphwang · June 14, 2023, 6:28pm

Hi @Vruti_Dobariya classes aren’t configured to allow that - each class is like a SQL table, so just like you can’t have nested tables, you can’t have nested classes.

You can cross-reference them though. Does that help? (Cross-references | Weaviate - vector database)
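As a rough illustration of cross-referencing instead of nesting, here is a sketch of a schema with two classes. The class and property names (`Endpoint`, `Parameter`, `hasParameters`) are hypothetical. The only Weaviate convention relied on is that a property becomes a cross-reference when its `dataType` names another class rather than a primitive type such as `text`.

```python
# Two flat classes; Endpoint points at Parameter through a cross-reference
# property instead of containing it.
schema = {
    "classes": [
        {
            "class": "Parameter",
            "properties": [
                {"name": "name", "dataType": ["text"]},
                {"name": "description", "dataType": ["text"]},
            ],
        },
        {
            "class": "Endpoint",
            "properties": [
                {"name": "path", "dataType": ["text"]},
                # dataType naming a class makes this a cross-reference.
                {"name": "hasParameters", "dataType": ["Parameter"]},
            ],
        },
    ]
}


def find_cross_references(schema):
    """List (class, property, target class) triples.

    Heuristic: class names are capitalized, primitive types are lowercase.
    """
    refs = []
    for cls in schema["classes"]:
        for prop in cls.get("properties", []):
            for data_type in prop["dataType"]:
                if data_type[:1].isupper():
                    refs.append((cls["class"], prop["name"], data_type))
    return refs


cross_refs = find_cross_references(schema)
print(cross_refs)
```

With the v3 Python client, a dict shaped like this could be passed to `client.schema.create(schema)`. The referenced class is listed first here as a precaution.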

Vruti_Dobariya · June 22, 2023, 10:19am

Thanks @jphwang, this problem is solved! However, I have one other question.

Vruti_Dobariya · June 22, 2023, 10:22am

I am using generative search, and I am collecting the response for each element of a list:

for string in my_list:
    response = (
        client.query
        .get("StripeAPI", ["path", "key", "value", "dataType"])
        .with_limit(1123)
        .with_where({
            "path": ["path"],
            "operator": "Equal",
            "valueText": string
        })
        .do()
    )
    print(len(response['data']['Get']['StripeAPI']))

Here is the snippet.
Here, in the list of strings that I am traversing over, the first string returns 1123 objects i.e. the max number of objects. But, that string, only has 1054 objects. The problem remains the same even when I change the max limit. How do I solve it?

jphwang · June 30, 2023, 4:47pm

Hi @Vruti_Dobariya - sorry for the late reply, I’ve been away for a few days. Can you help me understand the question better?

Could you clarify what you mean when you say this?

the first string returns 1123 objects i.e. the max number of objects. But, that string, only has 1054 objects

If you mean that regardless of the input, it’s returning the same number of hits as the limit, that is the expected behaviour.

Given that this is a vector search, all objects are “similar” to some degree. So Weaviate will return the n best matching objects. If I’m misunderstanding the question, please let me know.

Vruti_Dobariya · July 3, 2023, 7:09am

Hey, @jphwang. Yes. So, there was a glitch where, regardless of the number of objects present in a string, for the first string, it would generate 1123 objects, i.e., the maximum limit. But it was solved automatically without any changes once I reran my Docker instance. Thanks.

jphwang · July 3, 2023, 3:41pm

Fantastic! Glad to hear it’s been resolved

Vruti_Dobariya · July 5, 2023, 12:55pm

Hey @jphwang!
I have another query, if you can help me with that.

vector_store = WeaviateVectorStore(weaviate_client=client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(data, storage_context=storage_context)
query_engine = index.as_query_engine()
response = query_engine.query("xyz")

print(response)

In the code above, I am loading data into the Weaviate vector store using LlamaIndex. The problem is that every time I run a query, the data is loaded into Weaviate all over again. Once the data has been indexed and stored in Weaviate, I want my queries to use the already loaded data instead of indexing it again. How can I solve this issue?

jphwang · July 5, 2023, 7:22pm

Hi @Vruti_Dobariya - could you please make a new thread per issue? That would help us to track each question and get help as needed. Thanks!

Vruti_Dobariya · July 6, 2023, 7:14am

Yes, sure thing! Sorry for the inconvenience.
