Query call with protocol GRPC search failed with message connection to: Azure OpenAI API failed with status: 404 error: Resource not found

I am trying to create a locally hosted Weaviate RAG setup that calls Azure Government Cloud OpenAI models for generation, but it is failing with the error above. I have specified the correct API key, deployment ID, resource name, API version, and base URL (which contains azure.us). I have verified through a PowerShell script that the API call works and the model is accessible (all API and endpoint details are correct), but it fails when called from Weaviate. Can someone help with this, or does anyone know the solution?

The config is as follows:

client = weaviate.connect_to_local(
    headers={
        "X-Azure-Api-Key": AZURE_APIKEY,
        # "X-Azure-Api-Version": AZURE_API_VERSION,
        "X-Azure-Deployment-Id": AZURE_DEPLOYMENT,
        "X-Azure-Resource-Name": AZURE_RESOURCE,
        # "X-Openai-Baseurl": AZURE_ENDPOINT
    },
)

generative_config=Configure.Generative.azure_openai(
    resource_name="**********",
    deployment_id="***********",
    base_url="$($openai.api_base)/openai/deployments/$($openai.name)/chat/completions?api-version=$($openai.api_version)"
)

Here is the complete traceback error:

Error type: WeaviateQueryError
Error in RAG search: WeaviateQueryError("Query call with protocol GRPC search failed with message connection to: Azure OpenAI API failed with status: 404 error: Resource not found.")
Full traceback:
Traceback (most recent call last):
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\connect\v4.py", line 955, in grpc_search
    res = _Retry(4).with_exponential_backoff(
        0,
        ...<4 lines>...
        timeout=self.timeout_config.query,
    )
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\retry.py", line 54, in with_exponential_backoff
    raise e
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\retry.py", line 50, in with_exponential_backoff
    return f(*args, **kwargs)
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\grpc\_channel.py", line 1166, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\grpc\_channel.py", line 996, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "connection to: Azure OpenAI API failed with status: 404 error: Resource not found"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:\"connection to: Azure OpenAI API failed with status: 404 error: Resource not found\", grpc_status:2}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\app.py", line 210, in run_example_queries
    response = collection.generate.near_text(
        query="project status updates",
        ...<2 lines>...
        limit=3  # Retrieve top 3 messages
    )
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\collections\queries\near_text\generate\executor.py", line 491, in near_text
    return executor.execute(
        response_callback=resp,
        method=self._connection.grpc_search,
        request=request,
    )
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\connect\executor.py", line 99, in execute
    return cast(T, exception_callback(e))
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\connect\executor.py", line 38, in raise_exception
    raise e
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\connect\executor.py", line 80, in execute
    call = method(*args, **kwargs)
  File "c:\Users\j.kaur\AI\librechat-local\docker\weaviate_perftest\.venv\Lib\site-packages\weaviate\connect\v4.py", line 968, in grpc_search
    raise WeaviateQueryError(str(error.details()), "GRPC search")  # pyright: ignore

Hi @jasnoor ,

Good Day!

Welcome to Weaviate Community!

To help you further with this issue, could you confirm the following:

  1. Could you verify where exactly you are seeing this error? Is it during the connect_to_local() call, or when creating the collection and defining generative_config?
  2. Ensure the generative-openai module is enabled on your local Weaviate instance. To enable the module, you may check our documentation; see the notes for self-hosted users.
  3. Verify the values of the header parameters used when connecting to your instance. In particular, please verify that base_url is equivalent to the endpoint URL, e.g. https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>. See the Azure OpenAI API documentation.
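For illustration, here is a rough sketch of how the endpoint URL is assembled from resource_name and deployment_id (a simplified helper, not Weaviate's internal code; the names are placeholders). A common cause of 404s is passing a full /chat/completions URL as base_url, which duplicates the path segments Weaviate appends itself:

```python
# Sketch: how the Azure OpenAI endpoint is assembled from resource_name and
# deployment_id. Simplified illustration only, not Weaviate's internal code.
def azure_openai_base(resource_name: str, deployment_id: str,
                      domain: str = "openai.azure.com") -> str:
    # Weaviate appends the deployments path (and /chat/completions plus the
    # api-version query string) itself, so a custom base_url should stop
    # at the resource host.
    return f"https://{resource_name}.{domain}/openai/deployments/{deployment_id}"

print(azure_openai_base("my-resource", "my-deployment"))
# https://my-resource.openai.azure.com/openai/deployments/my-deployment

# For Azure Government Cloud, only the domain differs:
print(azure_openai_base("my-resource", "my-deployment", domain="openai.azure.us"))
# https://my-resource.openai.azure.us/openai/deployments/my-deployment
```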

Hope this helps.

This is how my docker-compose file looks:

weaviate:
  image: cr.weaviate.io/semitechnologies/weaviate:1.32.2
  ports:
    - "8080:8080"
    - "50051:50051"
  environment:
    ENABLE_MODULES: text2vec-model2vec,generative-openai,Generative.azure_openai,text2vec-openai
    MODEL2VEC_INFERENCE_API: http://text2vec-model2vec:8080
    LOG_LEVEL: 'debug'
    DEBUG: "true"
    AZURE_APIKEY: ${AZURE_APIKEY}
    AZURE_RESOURCE: ${AZURE_RESOURCE}
    AZURE_DEPLOYMENT: ${AZURE_DEPLOYMENT}
    AZURE_API_VERSION: ${AZURE_API_VERSION}
    AZURE_ENDPOINT: ${AZURE_ENDPOINT}

text2vec-model2vec:
  image: cr.weaviate.io/semitechnologies/model2vec-inference:minishlab-potion-base-32M

  1. I am getting this error at the collection.generate.near_text() call.
  2. I have enabled the generative-openai module.

This is the output snippet for meta_info = client.get_meta()

{'grpcMaxMessageSize': 104858000, 'hostname': 'http://[::]:8080', 'modules': {'generative-openai': {'documentationHref': 'https://platform.openai.com/docs/api-reference/completions', 'name': 'Generative Search - OpenAI'}, 'text2vec-model2vec': {'apply_pca': 512, 'apply_zipf': True, 'architectures': ['StaticModel'], 'hidden_dim': 512, 'model_type': 'model2vec', 'normalize': True, 'seq_length': 1000000, 'tokenizer_name': 'baai/bge-base-en-v1.5'}}}
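As a quick programmatic sanity check, the module list in the get_meta() output can be inspected directly (a sketch using the dict shape shown above, with values abbreviated):

```python
# Sketch: verify the generative module appears in client.get_meta() output.
# The dict below mirrors the shape of the meta info shown above (abbreviated).
meta = {
    "modules": {
        "generative-openai": {"name": "Generative Search - OpenAI"},
        "text2vec-model2vec": {"model_type": "model2vec"},
    }
}

enabled = set(meta["modules"])
print("generative-openai" in enabled)  # True -> the module is loaded
```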

  1. Header params are as follows:

     "X-Azure-Api-Key": "90db************************8b",
     "X-Azure-Api-Version": "2024-10-21",
     "X-Azure-Deployment-Id": "aidev-gpt-4o-mini",
     "X-Azure-Resource-Name": "usga-oai-dev-oai",
     "X-Openai-Baseurl": "https://usga-oai-dev-oai.openai.azure.us/openai/deployments/aidev-gpt-4o-mini/chat/completions?api-version=2024-10-21"

Yes, the base URL is equivalent to the endpoint URL.
I added the API version and chat/completions to the URL, but the error was the same even before adding them. Please note that my URL has azure.us rather than azure.com because I am using Azure Gov Cloud.

@jasnoor , Thanks for the update.

Were you able to try using the following endpoint when connecting or creating the collection?
https://usga-oai-dev-oai.openai.azure.us/openai/deployments/aidev-gpt-4o-mini

Additionally, could you please let me know which client version you are currently using? This will help us further investigate and narrow down the issue.

Looking forward to your response.

Hello,

I have tried the above-mentioned URL but got the same error.

The weaviate-client version I am working with is 4.19.2.

Please help with this, since I am unable to debug it further.

Hi @jasnoor !!

Here is reproducible code to use Azure OpenAI in Weaviate:

import weaviate
from weaviate.classes.config import Configure

client = weaviate.connect_to_local(
    headers = {
        "X-Azure-Api-Key": "9vL.......YOUR-API-KEY-HERE"
    },
)

client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vector_config=Configure.Vectors.text2vec_azure_openai(
        name="default",
        deployment_id="text-embedding-3-small",
        resource_name="teste-duda",
    ),
    generative_config=Configure.Generative.azure_openai(
        resource_name="teste-duda",
        deployment_id="gpt-4.1-mini"
    )
)
collection.data.insert({"text": "This is an embedding test"})
obj = collection.generate.fetch_objects(
    limit=1,
    include_vector=True,
    single_prompt="translate to Portuguese: {text}"
).objects[0]

print("Vector Dimension", len(obj.vector.get("default")))
print(obj.generative.text)

This should be the output:

Vector Dimension 1536
Este é um teste de incorporação.
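For reference, the {text} placeholder in single_prompt is filled with the matching property of each retrieved object before the prompt reaches the model. A rough sketch of that substitution (illustrative only, not Weaviate's actual implementation):

```python
import re

def fill_single_prompt(template: str, properties: dict) -> str:
    # Replace each {prop} placeholder with the matching object property,
    # leaving unknown placeholders untouched.
    return re.sub(
        r"\{(\w+)\}",
        lambda m: str(properties.get(m.group(1), m.group(0))),
        template,
    )

prompt = fill_single_prompt(
    "translate to Portuguese: {text}",
    {"text": "This is an embedding test"},
)
print(prompt)  # translate to Portuguese: This is an embedding test
```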

I have deployed both gpt-4.1-mini (for generation) and text-embedding-3-small (for embeddings) in a resource called teste-duda. This is how my Azure Portal looks:

Let me know if this helps!

It is working now, thanks! Could you point me to where I can read more about fine-tuning RAG parameters, or about the different Weaviate functions for configuring a RAG pipeline?

1 Like

hi @jasnoor !

I recommend taking a look at this cool project we have: https://elysia.weaviate.io/

It is open source, so you can use it with your own collections locally.

You can learn more about it here: Elysia: Building an end-to-end agentic RAG app | Weaviate

And buckle up! You are in for a ride 🙂

Let me know if this helps!

1 Like

Hi @jasnoor,

This might be helpful for you, so I thought I would share it. I'm currently working on a series of Weaviate Optimization Guides covering different areas of WeaviateDB. They're designed to be concise, practical, and to the point, highlighting key DOs and DON'Ts for building with Weaviate efficiently.

The first one I have today, focused on the Inverted Index, is especially important for RAG performance. You can find it here: https://github.com/Shah91n/WeaviateDB-Docs-Snippets-Python-Client/tree/main/Optimization_Guides

I’ll keep adding more guides on topics like Vector Index Optimization, Batching, Querying, and others.

In the meantime, I highly recommend reading this blog post for a deep dive into advanced RAG techniques.

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00 / +01:00)

1 Like