Help Needed: Issue with Vectorizers in Weaviate Collections Console

Frosslee · July 25, 2024, 7:22am

Hello everyone,

I’m currently working on a project involving two different bots that use different vectorizers for creating and managing schemas and databases in Weaviate. However, only one of the bots’ vectorizers previews correctly in the Weaviate collections console. The bot with the problem uses amazon.titan-embed-text-v2:0 through AWS Bedrock.

TL;DR
The vectorizer for GPT is working correctly, but the one for AWS Bedrock is not. Attempts to fix it have been unsuccessful so far.

Details of Implementations:

ChatGPT Bot

Get Client Function:

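import time

import weaviate


def get_client(database_name):
    # database_picker is defined elsewhere in the project; it returns the
    # credentials and class name for the requested database.
    WCS_API_KEY, WCS_CLUSTER_URL, OPENAI_APIKEY, class_name = database_picker(database_name)
    max_retries = 7
    retry_delay_seconds = 2
    for attempt in range(1, max_retries + 1):
        try:
            client = weaviate.connect_to_wcs(
                cluster_url=WCS_CLUSTER_URL,
                auth_credentials=weaviate.auth.AuthApiKey(WCS_API_KEY),
                headers={"X-OpenAI-Api-Key": OPENAI_APIKEY},
                skip_init_checks=True
            )
            return client
        except Exception as e:
            print(f"Attempt {attempt} failed with error: {str(e)}")
            if attempt < max_retries:
                print(f"Retrying in {retry_delay_seconds} seconds...")
                time.sleep(retry_delay_seconds)
    # Every attempt failed: raise instead of silently returning None.
    raise ConnectionError(f"Could not connect to Weaviate after {max_retries} attempts")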

Generate Database Function:

import weaviate.classes as wvc


def generate_database(class_name, client):
    try:
        client.collections.create(
            name=class_name,
            vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
            generative_config=wvc.config.Configure.Generative.openai(),
            properties=[
                wvc.config.Property(
                    name=class_name,
                    data_type=wvc.config.DataType.TEXT
                )
            ]
        )
    except Exception as e:
        return {"error": str(e)}
    finally:
        client.close()

Bedrock Bot

This bot uses amazon.titan-embed-text-v2:0 as its embedding model.

Get Client Function:

for attempt in range(0, RETRY_AMOUNT):
    try:
        client = None
        client = weaviate.connect_to_wcs(
            cluster_url=os.getenv("WCS_CLUSTER_URL"),
            auth_credentials=AuthApiKey(os.getenv("WCS_API_KEY")),
            skip_init_checks=os.getenv("W_SKIP_INIT_CHECK"),
            headers={
                "X-AWS-Access-Key": os.getenv("ACCESS_KEY_ID"),
                "X-AWS-Secret-Key": os.getenv("SECRET_ACCESS_KEY"),
            },
        )
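        # Poll until the cluster reports ready, pausing between checks
        # instead of spinning in a tight loop.
        while not client.is_ready():
            time.sleep(1)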
        return client
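
One detail in the snippet above that may be worth checking: `os.getenv` always returns a string (or `None`), so `skip_init_checks=os.getenv("W_SKIP_INIT_CHECK")` passes text rather than a boolean, and any non-empty text, including `"False"`, is truthy in Python. The sketch below shows one way to parse such a variable. The helper name `env_flag` and the set of accepted spellings are assumptions made for illustration; they are not part of the original code or of the Weaviate client.

```python
import os

# Spellings treated as "true"; this set is an assumption for the sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read an environment variable as a boolean (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Passing the raw string through keeps "False" truthy.
os.environ["W_SKIP_INIT_CHECK"] = "False"
raw_value_is_truthy = bool(os.getenv("W_SKIP_INIT_CHECK"))
parsed_value = env_flag("W_SKIP_INIT_CHECK")
print(raw_value_is_truthy, parsed_value)
```

With a helper like this, the connection call could use `skip_init_checks=env_flag("W_SKIP_INIT_CHECK")` so that the flag reflects what the environment variable says.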

Generate Database Function:

try:
    client.collections.create(
        class_name,
        vectorizer_config=[
            Configure.NamedVectors.text2vec_aws(
                name=class_name,
                region=os.getenv("WEAVIATE_REGION"),
                service=os.getenv("WEAVIATE_SERVICE"),
                model=os.getenv("WEAVIATE_MODEL"),
            )
        ],
        properties=[
            wvc.config.Property(
                name=class_name,
                data_type=wvc.config.DataType.TEXT
            )
        ],
    )
    return json.dumps({"result": "Database class created successfully."})
except Exception as e:
    return {"error": str(e)}

Issue:
While both bots create schemas and databases successfully, only the ChatGPT bot’s vectorizer appears correctly in the Weaviate collections console. The Bedrock bot’s vectorizer does not show up as expected.

Things I’ve Tried:

Double-checked the configuration settings for both bots.
Verified API keys and permissions.
Ensured that both bots have the same class name and data type configurations.

Request for Help:

Has anyone faced a similar issue with different vectorizers in Weaviate?
Are there specific settings or configurations that I might be missing for the Bedrock bot?
Any troubleshooting steps or advice would be greatly appreciated.

Thank you in advance for your help!

[Image: the GPT collection in the console]

[Image: the amazon.titan-embed-text-v2:0 collection in the console]

DudaNogueira · July 25, 2024, 9:47pm

Hi @Frosslee!

Welcome back!

Do the collections and queries work as expected?

This may be a UI problem in our console.

Let me know if this is just the UI. I’ll try to reproduce it in the meantime.

Thanks!
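
One way to tell a console display problem apart from a configuration problem is to read the collection configuration back through the client and look at what is actually stored. Note that the two snippets in the original post configure vectorizers differently: the OpenAI one uses `Configure.Vectorizer`, while the Bedrock one uses `Configure.NamedVectors`, so the two collections may keep their vectorizer settings in different places. The sketch below summarizes both places. It is duck-typed and assumes the config object exposes a collection-level `vectorizer` attribute and a `vector_config` mapping whose entries carry a nested `vectorizer.vectorizer` value; these attribute names are assumptions about the v4 Python client and should be verified against the installed version. The demo uses stand-in objects, not real Weaviate responses.

```python
from types import SimpleNamespace


def summarize_vectorizers(config) -> dict:
    """Summarize the vectorizers a collection config declares.

    Attribute names are assumptions about the v4 Python client's
    config object; adjust them if the installed client differs.
    """
    summary = {"(collection-level)": str(getattr(config, "vectorizer", None))}
    named = getattr(config, "vector_config", None) or {}
    for vector_name, vector_cfg in named.items():
        vectorizer = getattr(vector_cfg, "vectorizer", None)
        # Named vector entries are assumed to nest the module name one level down.
        summary[vector_name] = str(getattr(vectorizer, "vectorizer", vectorizer))
    return summary


# Stand-in objects shaped like the two setups in the post (hypothetical data).
openai_like = SimpleNamespace(vectorizer="text2vec-openai", vector_config=None)
bedrock_like = SimpleNamespace(
    vectorizer="none",
    vector_config={
        "Docs": SimpleNamespace(vectorizer=SimpleNamespace(vectorizer="text2vec-aws"))
    },
)

print(summarize_vectorizers(openai_like))
print(summarize_vectorizers(bedrock_like))
```

Against a live cluster, the config would come from `client.collections.get(class_name).config.get()`. If the stored configuration shows the AWS vectorizer while the console does not, that points toward a display issue rather than a missing vectorizer.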

Frosslee · July 26, 2024, 4:36am

Hi @DudaNogueira

Querying works fine for GPT, where I get distances lower than 0.24, but for AWS Bedrock nothing is really under 0.4. I ran multiple tests and can’t get any distance below 0.4 (cosine). The data returned is unusable or completely jumbled, and it doesn’t even include a chunk with the correct information.

DudaNogueira · July 26, 2024, 11:47pm

Well, an absolute comparison of distances between models doesn’t tell you much per se, since distances are relative to the other objects. The similarity/relevance of the results to the query is the main thing to look for.

Have you seen this blog post?

It looks like you are into some benchmarks, so it may be relevant here.

Other than that, unless something is going wrong in the second batch import, it points to how the model vectorized your dataset.

Let me know if this helps
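
To illustrate the point about absolute distances, here is a minimal sketch in plain Python. The vectors are made-up toy values standing in for two different embedding models; they are not output from OpenAI or Titan. It computes cosine distance by hand and shows that one model can produce larger distances across the board while still ranking the relevant document first.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query, docs):
    """Return document names ordered from nearest to farthest."""
    return sorted(docs, key=lambda name: cosine_distance(query, docs[name]))


# Toy embeddings for the same query and documents under two hypothetical models.
model_a = {
    "query": [1.0, 0.0],
    "docs": {"relevant": [0.9, 0.1], "unrelated": [0.0, 1.0]},
}
model_b = {
    "query": [1.0, 1.0],
    "docs": {"relevant": [1.0, 0.0], "unrelated": [-1.0, 0.5]},
}

for label, model in (("A", model_a), ("B", model_b)):
    order = rank_by_distance(model["query"], model["docs"])
    best = cosine_distance(model["query"], model["docs"][order[0]])
    print(label, order, round(best, 3))
```

Both toy models put the relevant document first even though model B's best distance is much larger. When the top results are themselves wrong, as described earlier in the thread, that is a separate signal worth investigating on its own.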

Frosslee · July 30, 2024, 7:35am

Thank you for the blog post recommendation. I appreciate the resource on choosing embedding models.

However, the issue I’m facing isn’t with selecting the embedding model itself. The problem lies in the console’s inability to preview the vectorizer, which results in significant discrepancies in the semantic search. The distances returned are quite poor, as shown in the images and code I provided. I’m uncertain if there’s a mistake in my implementation of the Bedrock code.

Could you provide any guidance or insight on this?
