Identical Text Returns Distance ~0.18 Instead of ~0.0 with Azure OpenAI text-embedding-3-large

Description

I’m experiencing an issue where identical text strings are returning a distance of approximately 0.18 instead of 0.0 when using nearText queries. This suggests that the vectors created by Weaviate’s text2vec-openai module are different from vectors created by direct calls to the same Azure OpenAI model.

Problem Details:

  1. I have configured my collection to use text2vec-openai with Azure OpenAI’s text-embedding-3-large model

  2. When I insert a record with text XYZ, Weaviate creates a vector that starts with: -0.013, -0.040, -0.003

  3. When I call Azure OpenAI directly with the same text XYZ using text-embedding-3-large, I get a vector starting with: 0.006469, -0.019619, -0.011873

  4. When I query Weaviate using nearText with the exact same text that was stored, the distance is ~0.18 instead of 0.0
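Comparing only the first three components does not show how far apart the two vectors really are, so it may help to compute the distance over all 3072 dimensions. Below is a minimal sketch, assuming Weaviate's cosine distance is 1 minus cosine similarity. `VectorCheck` and `CosineDistance` are hypothetical names used for illustration and are not part of any Weaviate or Azure client.

```csharp
using System;

public static class VectorCheck
{
    // Cosine distance = 1 - cosine similarity.
    // 0 means the vectors point in the same direction, 1 means orthogonal.
    public static double CosineDistance(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];     // accumulate the dot product
            normA += a[i] * a[i];   // accumulate squared length of a
            normB += b[i] * b[i];   // accumulate squared length of b
        }

        if (normA == 0 || normB == 0)
            throw new ArgumentException("Cosine distance is undefined for a zero vector.");

        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Passing in the complete stored vector and the complete vector from the direct Azure OpenAI call would show whether the two already differ at import time (a clearly nonzero result) or whether the gap only appears when the query text is vectorized.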

Collection Configuration:

csharp

var classDefinition = new
{
    @class = "MyContainer",
    vectorizer = "text2vec-openai",
    vectorIndexConfig = new Dictionary<string, object>
    {
        ["distance"] = "cosine"
    },
    moduleConfig = new Dictionary<string, object>
    {
        ["text2vec-openai"] = new Dictionary<string, object>
        {
            ["model"] = "text-embedding-3-large",
            ["dimensions"] = 3072,
            ["sourceProperties"] = new[] { "textWithoutTags" },
            ["vectorizeClassName"] = false,
            ["vectorizePropertyName"] = false,
            ["resourceName"] = "my-azure-resource",
            ["deploymentId"] = "text-embedding-3-large",
            ["baseURL"] = "https://my-azure-resource.openai.azure.com",
            ["apiKey"] = "my-api-key",
            ["apiVersion"] = "2024-02-01"
        }
    }
    // ... properties
};

Query Used:

graphql

{
    Get {
        MyContainer(
            nearText: { concepts: ["exact same text"], distance: 0.5 }
            where: {
                path: ["agentId"], operator: Equal, valueString: "test-agent"
            }
        ) {
            _additional { id distance }
            textWithoutTags
        }
    }
}
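To see the vector Weaviate actually stored for the object, the same filter can be reused with `_additional { vector }` in place of the nearText clause. The following is a rough sketch that posts such a query to the `/v1/graphql` endpoint with `HttpClient`. The class name `StoredVectorQuery`, its method names, and the `baseUrl` and `apiKey` parameters are placeholders assumed for illustration, and Bearer authentication is an assumption about how the cluster is secured.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class StoredVectorQuery
{
    // Builds the JSON body for a GraphQL request that returns the stored vector.
    public static string BuildRequestBody(string agentId)
    {
        // JSON string quoting is reused for the GraphQL string literal;
        // this is assumed to be adequate for plain values such as "test-agent".
        string agentLiteral = JsonSerializer.Serialize(agentId);

        string query =
            "{ Get { MyContainer(" +
            "limit: 1 " +
            "where: { path: [\"agentId\"], operator: Equal, valueString: " + agentLiteral + " }" +
            ") { textWithoutTags _additional { id vector } } } }";

        // GraphQL over HTTP expects an object with a "query" field.
        return JsonSerializer.Serialize(new { query });
    }

    // Sends the request and returns the raw JSON response as a string.
    public static async Task<string> FetchAsync(
        HttpClient http, string baseUrl, string apiKey, string agentId)
    {
        using var request = new HttpRequestMessage(
            HttpMethod.Post, baseUrl.TrimEnd('/') + "/v1/graphql");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = new StringContent(
            BuildRequestBody(agentId), Encoding.UTF8, "application/json");

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The returned `textWithoutTags` value is also worth comparing byte for byte with the query string, since a difference in whitespace or casing would be enough to change the embedding.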

Expected Behavior: Distance should be 0.0 (or very close) when searching for identical text

Actual Behavior: Distance is approximately 0.18

Server Setup Information

  • Weaviate Server Version: 1.31.5

  • Deployment Method: Weaviate Cloud

  • Multi Node? Number of Running Nodes: Single node (Weaviate Cloud)

  • Client Language and Version: C# / .NET

  • Multitenancy?: No

Any additional Information

  • I have verified that the Azure OpenAI deployment is working correctly when called directly

  • The API key and deployment configuration are correct (no authentication errors)

  • I’ve set both vectorizeClassName = false and vectorizePropertyName = false to ensure only the specified properties are vectorized without additional metadata

  • The issue occurs consistently across different text inputs

  • Azure OpenAI logs show successful API calls from Weaviate

Questions:

  1. Could there be preprocessing differences between Weaviate’s text2vec-openai module and direct Azure OpenAI calls?

  2. Is there a way to verify that Weaviate is actually using my configured Azure OpenAI endpoint rather than falling back to OpenAI’s direct API?

  3. Are there any known issues with Azure OpenAI integration in version 1.31.5?

Thanks for your help!