Description
I’m experiencing an issue where searching for text identical to a stored record returns a distance of approximately 0.18 instead of 0.0 when using `nearText` queries. This suggests that the vectors created by Weaviate’s `text2vec-openai` module differ from the vectors created by direct calls to the same Azure OpenAI model.
Problem Details:
- I have configured my collection to use `text2vec-openai` with Azure OpenAI’s `text-embedding-3-large` model.
- When I insert a record with the text `XYZ`, Weaviate creates a vector that starts with: `-0.013, -0.040, -0.003`
- When I call Azure OpenAI directly with the same text `XYZ` using `text-embedding-3-large`, I get a vector that starts with: `0.006469, -0.019619, -0.011873`
- When I query Weaviate using `nearText` with the exact text that was stored, the distance is ~0.18 instead of 0.0.
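To compare the two vectors in full rather than by their first three components, a helper like the one below can be used. It is a minimal sketch: `VectorCompare` and `CosineDistance` are hypothetical names, and it assumes both vectors are available as `float[]` arrays. It relies on Weaviate’s `cosine` metric being reported as 1 minus cosine similarity, so a distance of ~0.18 corresponds to a cosine similarity of ~0.82.

```csharp
using System;

public static class VectorCompare
{
    // Hypothetical helper: cosine distance = 1 - cosine similarity,
    // matching how Weaviate reports distance for the "cosine" metric.
    public static double CosineDistance(float[] a, float[] b)
    {
        // A length mismatch (e.g. 3072 vs 1536) would itself indicate a config difference.
        if (a.Length != b.Length)
            throw new ArgumentException($"Dimension mismatch: {a.Length} vs {b.Length}");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }
        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Passing the full stored vector and the full vector from the direct Azure call should return a value near 0.0 if both were produced from the same input text by the same model and dimension setting.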
Collection Configuration:
```csharp
using System.Collections.Generic;

var classDefinition = new
{
    @class = "MyContainer",
    vectorizer = "text2vec-openai",
    vectorIndexConfig = new Dictionary<string, object>
    {
        ["distance"] = "cosine"
    },
    moduleConfig = new Dictionary<string, object>
    {
        ["text2vec-openai"] = new Dictionary<string, object>
        {
            ["model"] = "text-embedding-3-large",
            ["dimensions"] = 3072,
            ["sourceProperties"] = new[] { "textWithoutTags" },
            ["vectorizeClassName"] = false,
            ["vectorizePropertyName"] = false,
            ["resourceName"] = "my-azure-resource",
            ["deploymentId"] = "text-embedding-3-large",
            ["baseURL"] = "https://my-azure-resource.openai.azure.com",
            ["apiKey"] = "my-api-key",
            ["apiVersion"] = "2024-02-01"
        }
    }
    // ... properties
};
```
Query Used:
```graphql
{
  Get {
    MyContainer(
      nearText: { concepts: ["exact same text"], distance: 0.5 }
      where: {
        path: ["agentId"], operator: Equal, valueString: "test-agent"
      }
    ) {
      _additional { id distance }
      textWithoutTags
    }
  }
}
```
Expected Behavior: Distance should be 0.0 (or very close to it) when searching for identical text.

Actual Behavior: Distance is approximately 0.18.
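One way to get the full stored vector for comparison is to run the same query with `vector` added to `_additional`. The sketch below posts it to Weaviate’s `/v1/graphql` endpoint with `HttpClient`. It is illustrative only: `WeaviateVectorProbe`, `BuildQuery` and `RunAsync` are hypothetical names, and it assumes a Weaviate Cloud API key sent as a Bearer token plus the Azure key passed in the `X-Azure-Api-Key` request header so Weaviate can vectorize the `nearText` concepts.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class WeaviateVectorProbe
{
    // Same nearText/where query as above, plus "vector" in _additional.
    // JsonSerializer.Serialize produces a quoted, escaped string literal.
    public static string BuildQuery(string text, string agentId) =>
        "{ Get { MyContainer(" +
        $"nearText: {{ concepts: [{JsonSerializer.Serialize(text)}], distance: 0.5 }} " +
        $"where: {{ path: [\"agentId\"], operator: Equal, valueString: {JsonSerializer.Serialize(agentId)} }}" +
        ") { _additional { id distance vector } textWithoutTags } } }";

    public static async Task<string> RunAsync(HttpClient http, string weaviateUrl,
        string weaviateApiKey, string azureApiKey, string query)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post,
            $"{weaviateUrl.TrimEnd('/')}/v1/graphql");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", weaviateApiKey);
        // Assumption: Azure OpenAI key supplied per request via this header.
        request.Headers.Add("X-Azure-Api-Key", azureApiKey);
        var body = JsonSerializer.Serialize(new { query });
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        // Raw JSON response; the stored vector is under data.Get.MyContainer[i]._additional.vector.
        return await response.Content.ReadAsStringAsync();
    }
}
```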
Server Setup Information
- Weaviate Server Version: 1.31.5
- Deployment Method: Weaviate Cloud
- Multi Node? Number of Running Nodes: Single node (Weaviate Cloud)
- Client Language and Version: C# / .NET
- Multitenancy?: No
Any additional Information
- I have verified that the Azure OpenAI deployment is working correctly when called directly.
- The API key and deployment configuration are correct (no authentication errors).
- I’ve set both `vectorizeClassName = false` and `vectorizePropertyName = false` to ensure only the specified properties are vectorized, without additional metadata.
- The issue occurs consistently across different text inputs.
- Azure OpenAI logs show successful API calls from Weaviate.
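For reference, a direct call to the same deployment can be made roughly as follows, so its output can be compared against the stored vector. This is a sketch, not my exact code: `AzureEmbeddingProbe`, `BuildEmbeddingsUri` and `EmbedAsync` are hypothetical names, and it assumes the Azure OpenAI REST route `/openai/deployments/{deployment}/embeddings?api-version=...` with the key in an `api-key` header. No `dimensions` parameter is sent, on the assumption that `text-embedding-3-large` returns 3072 dimensions by default.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class AzureEmbeddingProbe
{
    // Azure OpenAI embeddings route for a given resource, deployment and API version.
    public static Uri BuildEmbeddingsUri(string resourceName, string deploymentId, string apiVersion) =>
        new Uri($"https://{resourceName}.openai.azure.com/openai/deployments/" +
                $"{Uri.EscapeDataString(deploymentId)}/embeddings?api-version={Uri.EscapeDataString(apiVersion)}");

    public static async Task<float[]> EmbedAsync(HttpClient http, string resourceName,
        string deploymentId, string apiVersion, string apiKey, string text)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post,
            BuildEmbeddingsUri(resourceName, deploymentId, apiVersion));
        request.Headers.Add("api-key", apiKey);
        // Send the text exactly as stored in textWithoutTags: no trimming or case changes.
        var body = JsonSerializer.Serialize(new { input = text });
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());

        // Response shape: { "data": [ { "embedding": [ ... ] } ], ... }
        var embedding = doc.RootElement.GetProperty("data")[0].GetProperty("embedding");
        var vector = new float[embedding.GetArrayLength()];
        int i = 0;
        foreach (var value in embedding.EnumerateArray())
            vector[i++] = value.GetSingle();
        return vector;
    }
}
```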
Questions:
- Could there be preprocessing differences between Weaviate’s `text2vec-openai` module and direct Azure OpenAI calls?
- Is there a way to verify that Weaviate is actually using my configured Azure OpenAI endpoint rather than falling back to OpenAI’s direct API?
- Are there any known issues with the Azure OpenAI integration in version 1.31.5?
Thanks for your help!