Help Needed: Issue with Vectorizers in Weaviate Collections Console

Hello everyone,

I’m currently working on a project involving two different bots that use different vectorizers for creating and managing schemas and databases in Weaviate. However, I’m encountering an issue where only one of the bots’ vectorizers previews correctly in the Weaviate collections console. The one that does not preview correctly is the Bedrock bot, which uses amazon.titan-embed-text-v2:0.

TLDR
The vectorizer for GPT is working correctly, but it’s not functioning for AWS Bedrock. Attempts to fix it have been unsuccessful so far.

Details of Implementations:

ChatGPT Bot

Get Client Function:

import time

import weaviate

def get_client(database_name):
    # database_picker is a project helper defined elsewhere
    WCS_API_KEY, WCS_CLUSTER_URL, OPENAI_APIKEY, class_name = database_picker(database_name)
    max_retries = 7
    retry_delay_seconds = 2
    for attempt in range(1, max_retries + 1):
        try:
            client = weaviate.connect_to_wcs(
                cluster_url=WCS_CLUSTER_URL,
                auth_credentials=weaviate.auth.AuthApiKey(WCS_API_KEY),
                headers={"X-OpenAI-Api-Key": OPENAI_APIKEY},
                skip_init_checks=True
            )
            return client
        except Exception as e:
            print(f"Attempt {attempt} failed with error: {str(e)}")
            if attempt < max_retries:
                print(f"Retrying in {retry_delay_seconds} seconds...")
                time.sleep(retry_delay_seconds)
    # Every attempt failed: raise instead of silently returning None
    raise ConnectionError(f"Could not connect to Weaviate after {max_retries} attempts")

Generate Database Function:

import weaviate.classes as wvc

def generate_database(class_name, client):
    try:
        # Collection-level vectorizer and generative module, both OpenAI
        client.collections.create(
            name=class_name,
            vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
            generative_config=wvc.config.Configure.Generative.openai(),
            properties=[
                wvc.config.Property(
                    name=class_name,
                    data_type=wvc.config.DataType.TEXT
                )
            ]
        )
    except Exception as e:
        return {"error": str(e)}
    finally:
        # The client is closed whether or not creation succeeded
        client.close()

Bedrock Bot

Embedding model: amazon.titan-embed-text-v2:0

Get Client Function:

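import os
import time

import weaviate
from weaviate.auth import AuthApiKey

def get_client():
    # RETRY_AMOUNT is a project constant defined elsewhere
    for attempt in range(RETRY_AMOUNT):
        try:
            client = weaviate.connect_to_wcs(
                cluster_url=os.getenv("WCS_CLUSTER_URL"),
                auth_credentials=AuthApiKey(os.getenv("WCS_API_KEY")),
                # os.getenv returns a string, so convert it to a real boolean
                skip_init_checks=os.getenv("W_SKIP_INIT_CHECK", "").lower() in ("1", "true", "yes"),
                headers={
                    "X-AWS-Access-Key": os.getenv("ACCESS_KEY_ID"),
                    "X-AWS-Secret-Key": os.getenv("SECRET_ACCESS_KEY"),
                },
            )
            # Poll until the cluster is ready, sleeping instead of busy-waiting
            while not client.is_ready():
                time.sleep(1)
            return client
        except Exception as e:
            print(f"Attempt {attempt + 1} failed with error: {e}")
    raise ConnectionError(f"Could not connect to Weaviate after {RETRY_AMOUNT} attempts")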
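
A side note on the readiness loop above: a loop with no time limit can wait forever if the cluster never reports ready. The following is a minimal sketch of a bounded wait, assuming any zero-argument callable such as `client.is_ready`. The helper name `wait_until`, the default timings, and the `fake_is_ready` stand-in are invented for illustration and are not part of the Weaviate client.

```python
import time


def wait_until(predicate, timeout_seconds=30.0, interval_seconds=0.5):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between polls so the loop does not spin the CPU
        time.sleep(interval_seconds)


# Hypothetical stand-in for client.is_ready(): reports ready on the third call.
calls = {"count": 0}


def fake_is_ready():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(fake_is_ready, timeout_seconds=2.0, interval_seconds=0.01))
```

With the real client this could be called as `wait_until(client.is_ready)`, and a False result could be treated as a failed connection attempt.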

Generate Database Function:

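import json
import os

import weaviate.classes as wvc
from weaviate.classes.config import Configure

def generate_database(class_name, client):
    try:
        client.collections.create(
            class_name,
            # Named-vector configuration: the AWS vectorizer is attached
            # to a vector that is itself called class_name
            vectorizer_config=[
                Configure.NamedVectors.text2vec_aws(
                    name=class_name,
                    region=os.getenv("WEAVIATE_REGION"),
                    service=os.getenv("WEAVIATE_SERVICE"),
                    model=os.getenv("WEAVIATE_MODEL"),
                )
            ],
            properties=[
                wvc.config.Property(
                    name=class_name,
                    data_type=wvc.config.DataType.TEXT
                )
            ],
        )
        return json.dumps({"result": "Database class created successfully."})
    except Exception as e:
        return {"error": str(e)}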

Issue:
While both bots create schemas and databases successfully, only the ChatGPT bot’s vectorizer appears correctly in the Weaviate collections console. The Bedrock bot’s vectorizer does not show up as expected.

Things I’ve Tried:

  1. Double-checked the configuration settings for both bots.
  2. Verified API keys and permissions.
  3. Ensured that both bots have the same class name and data type configurations.

Request for Help:

  • Has anyone faced a similar issue with different vectorizers in Weaviate?
  • Are there specific settings or configurations that I might be missing for the Bedrock bot?
  • Any troubleshooting steps or advice would be greatly appreciated.

Thank you in advance for your help!

Image: the GPT collection in the console

Image: the collection using amazon.titan-embed-text-v2:0 in the console

Hi @Frosslee!

Welcome back :slight_smile:

Do the collection and queries work as expected?

This may be a UI problem in our console.

Let me know if this is just the UI. I'll try to reproduce this in the meantime.
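
One way to tell a console display problem apart from a configuration problem is to look at the collection definition directly. The sketch below is only an illustration: it works on a plain dict shaped like a class entry from Weaviate's schema endpoint, and it assumes that a single-vector collection carries a top-level `vectorizer` key while named vectors are listed under `vectorConfig`, each with its own `vectorizer` module entry. The helper name `summarize_vectorizers` and the sample dicts are made up for the example; fetching the real schema is left out.

```python
def summarize_vectorizers(class_schema):
    """Return {vector_name: module_name} for a schema-style class dict.

    Assumed layout (not taken from the thread): a top-level "vectorizer"
    string for a single default vector, and/or a "vectorConfig" mapping of
    named vectors whose "vectorizer" value is {module_name: settings}.
    """
    found = {}
    top_level = class_schema.get("vectorizer")
    if top_level and top_level != "none":
        found["default"] = top_level
    for name, cfg in (class_schema.get("vectorConfig") or {}).items():
        modules = list((cfg.get("vectorizer") or {}).keys())
        found[name] = modules[0] if modules else "none"
    return found


# Hypothetical sample data, shaped like the two collections in this thread.
openai_style = {"class": "GptDocs", "vectorizer": "text2vec-openai"}
bedrock_style = {
    "class": "BedrockDocs",
    "vectorConfig": {
        "BedrockDocs": {
            "vectorizer": {"text2vec-aws": {"model": "amazon.titan-embed-text-v2:0"}}
        }
    },
}

print(summarize_vectorizers(openai_style))
print(summarize_vectorizers(bedrock_style))
```

If the Bedrock collection reports its vectorizer only under a named vector, the vectorizer is configured and the console may simply not be previewing named vectors. That is a guess to verify, not a confirmed cause.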

Thanks!

Hi @DudaNogueira

Querying works fine for GPT, where I get distances lower than 0.24, but for AWS Bedrock nothing is really under 0.4. I ran multiple tests and can’t get any distance below 0.4 (cosine). The data returned is unusable or completely jumbled, and it doesn’t even return a chunk with the correct information.

Well, comparing absolute distances between models doesn’t tell you much by itself, since a distance is only meaningful relative to the other objects embedded by the same model. The similarity/relevance of the results against the query is the main thing to look for.
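
To make the point about relative distances concrete, here is a small sketch with made-up 3-dimensional vectors (real embeddings are much larger). It computes cosine distance as 1 minus cosine similarity and ranks documents by distance to a query. The vectors and the names `cosine_distance` and `rank_by_distance` are illustrative assumptions, not output from either model.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query, docs):
    """Return (doc_id, distance) pairs sorted from closest to farthest."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up vectors standing in for two different embedding models.
# "Model A" packs related items tightly; "model B" spreads everything out.
query_a = [1.0, 0.0, 0.0]
docs_a = {"relevant": [0.9, 0.1, 0.0], "unrelated": [0.7, 0.7, 0.0]}

query_b = [1.0, 0.0, 0.0]
docs_b = {"relevant": [0.6, 0.8, 0.0], "unrelated": [0.0, 1.0, 0.0]}

for label, query, docs in (("A", query_a, docs_a), ("B", query_b, docs_b)):
    for doc_id, dist in rank_by_distance(query, docs):
        print(f"model {label}: {doc_id} distance={dist:.3f}")
```

In this toy data, model B's best match sits at about 0.4 while model A's unrelated document is at about 0.29, yet both models rank the relevant document first. That is why the order and relevance of the results matter more than the raw numbers. It does not rule out an import or configuration problem on the Bedrock side.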

Have you seen this blog post?

Looks like you are into some benchmarks, so it may be relevant here :slight_smile:

Other than that, unless there is something going wrong on the second batch import, it points to how the model vectorized your dataset :grimacing:

Let me know if this helps

Thank you for the blog post recommendation. I appreciate the resource on choosing embedding models.

However, the issue I’m facing isn’t with selecting the embedding model itself. The problem lies in the console’s inability to preview the vectorizer, which results in significant discrepancies in the semantic search. The distances returned are quite poor, as shown in the images and code I provided. I’m uncertain if there’s a mistake in my implementation of the Bedrock code.

Could you provide any guidance or insight on this?