How to debug Weaviate

I set up a local Weaviate instance with Docker Compose, then followed the Quickstart tutorial to insert some data into it. In my case, I configured an OpenAI API key and base URL.

import json

import requests
import weaviate
import weaviate.classes as wvc

OPENAI_APIKEY = "xxx"
OPENAI_BASE_URL = "xxx"

client = weaviate.connect_to_local(
    host="192.168.5.116",
    headers={
        "X-OpenAI-BaseURL": OPENAI_BASE_URL,
        "X-OpenAI-Api-Key": OPENAI_APIKEY  # Replace with your inference API key
    }
)

try:
    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
        # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()
        # Ensure the `generative-openai` module is used for generative queries
    )

    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)

finally:
    client.close()

When I run it, I get this error:

weaviate.exceptions.WeaviateInsertManyAllFailedError: Every object failed during insertion. Here is the set of all errors: update vector: unmarshal response body: invalid character '<' looking for beginning of value

In Wireshark, I can see that Weaviate is talking to the base URL over HTTPS, but I cannot see what is sent or what the response is. How can I debug this? Do I need to build Weaviate from source and trace it?

Hi @Charles_Ju! Welcome to our community :hugs:

Have you tried not specifying the X-OpenAI-BaseURL header? Also, what value are you using for it? This looks like an issue that occurs while vectorizing your objects.

This worked for me:

import json
import os

import requests
import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local(
    headers={
        "X-OpenAI-Api-Key": os.environ.get("OPENAI_APIKEY")
    }
)
client.collections.delete("Question")
try:
    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
        # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()
        # Ensure the `generative-openai` module is used for generative queries
    )

    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)
    print(questions.aggregate.over_all(total_count=True))

finally:
    client.close()
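
On the question of seeing what the endpoint returns: the `invalid character '<'` part of the error suggests the response body began with `<`, which is typical of an HTML page rather than JSON. One way to check this without building Weaviate from source is to send a small embeddings request to the base URL yourself. The sketch below uses only the Python standard library. `describe_body` and `probe_embeddings` are hypothetical helper names, and the `/v1/embeddings` path and the model name are assumptions based on OpenAI's public API, so adjust them to whatever your proxy expects.

```python
import json
import urllib.error
import urllib.request


def describe_body(body: str) -> str:
    """Classify a response body as 'html', 'json', or 'other'."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # A leading '<' is what a JSON decoder trips over when it gets HTML.
        return "html"
    try:
        json.loads(stripped)
        return "json"
    except json.JSONDecodeError:
        return "other"


def probe_embeddings(base_url: str, api_key: str, timeout: float = 10.0):
    """POST a tiny embeddings request; return (status, kind, body preview)."""
    # Assumption: the proxy mirrors OpenAI's path layout.
    url = base_url.rstrip("/") + "/v1/embeddings"
    payload = json.dumps(
        {"input": "hello", "model": "text-embedding-ada-002"}  # assumed model
    ).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error responses still carry a body worth inspecting.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    return status, describe_body(body), body[:200]
```

Calling `probe_embeddings(OPENAI_BASE_URL, OPENAI_APIKEY)` and printing the result shows the status code and the first 200 characters of the reply. If the kind comes back as `html`, the proxy is probably serving an error or landing page, and the base URL or its path likely needs adjusting.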

I cannot use OpenAI directly, so I took a guess and set X-OpenAI-BaseURL to an OpenAI proxy. Anyway, I have studied Go, so I will try building Weaviate from source and tracing through the code. It looks like a fun project to dive into.
