Help diagnosing a 502 error in QnA module

Hey all!

I’m running a basic self-hosted Weaviate instance with the following Docker Swarm configuration (secrets redacted, obviously):

version: "3.7"

services:
  weaviate:
    image: semitechnologies/weaviate:1.19.6
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    networks:
      - default
    environment:
      QNA_INFERENCE_API: 'http://qna-transformers:8080'
      NER_INFERENCE_API: 'http://ner-transformers:8080'
      SUM_INFERENCE_API: 'http://sum-transformers:8080'
      OPENAI_APIKEY: 'xxxxx'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false'
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_USERS: 'xxx'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'xxx'
      AUTHORIZATION_ADMINLIST_ENABLED: 'true'
      AUTHORIZATION_ADMINLIST_USERS: 'xxx'
      AUTHORIZATION_ADMINLIST_READONLY_USERS: 'xxx'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,qna-transformers,ner-transformers,sum-transformers,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
    volumes:
    - weaviate-data:/var/lib/weaviate
  qna-transformers:
    image: semitechnologies/qna-transformers:bert-large-uncased-whole-word-masking-finetuned-squad
    environment:
      ENABLE_CUDA: '0'
    networks:
      - default
  ner-transformers:
    image: semitechnologies/ner-transformers:dbmdz-bert-large-cased-finetuned-conll03-english
    environment:
      ENABLE_CUDA: '0'
    networks:
      - default
  sum-transformers:
    image: semitechnologies/sum-transformers:facebook-bart-large-cnn-1.0.0
    environment:
      ENABLE_CUDA: '0'
    networks:
      - default

volumes:
  weaviate-data:
    driver: zfs

Everything is coming up fine, I can create my class and embed some documents. However, when I try to execute an ask query via the QnA module, like so:

ask = {
  "question": "Which papers deal with aquatic biomes?",
  "properties": ["text"]
}

result = (
  client.query
  .get("Paper", ["title", "_additional {answer {hasAnswer certainty property result startPosition endPosition} }"])
  .with_ask(ask)
  .with_limit(1)
  .do()
)

print(result)

I can see the module working (all 24 cores on my server go to 100% for about a minute or two), but after a while the client gets a 502 status code:

---------------------------------------------------------------------------
UnexpectedStatusCodeException             Traceback (most recent call last)
Cell In[7], line 11
      1 ask = {
      2   "question": "Which papers deal with aquatic biomes?",
      3   "properties": ["text"]
      4 }
      6 result = (
      7   client.query
      8   .get("Paper", ["title", "_additional {answer {hasAnswer certainty property result startPosition endPosition} }"])
      9   .with_ask(ask)
     10   .with_limit(1)
---> 11   .do()
     12 )
     14 print(result)

File ~/Library/Caches/pypoetry/virtualenvs/natgpt-GWGRGYEc-py3.11/lib/python3.11/site-packages/weaviate/gql/get.py:1295, in GetBuilder.do(self)
   1293     return results
   1294 else:
-> 1295     return super().do()

File ~/Library/Caches/pypoetry/virtualenvs/natgpt-GWGRGYEc-py3.11/lib/python3.11/site-packages/weaviate/gql/filter.py:81, in GraphQL.do(self)
     79 if response.status_code == 200:
     80     return response.json()  # success
---> 81 raise UnexpectedStatusCodeException("Query was not successful", response)

UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 502, with response body: None.

Unfortunately, the error code is not very useful. The Weaviate service logs don’t give any information at all (at least not at the default log level), but the QnA service logs do say:

INFO:     Started server process [7]
INFO:     Waiting for application startup.
INFO:     Running on CPU
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Token indices sequence length is longer than the specified maximum sequence length for this model (7748 > 512). Running this sequence through the model will result in indexing errors
INFO:     10.0.13.4:50306 - "POST /answers/ HTTP/1.1" 200 OK
INFO:     10.0.13.4:52976 - "GET /meta HTTP/1.1" 200 OK
INFO:     10.0.13.4:42930 - "GET /meta HTTP/1.1" 200 OK
INFO:     10.0.13.4:45152 - "GET /meta HTTP/1.1" 200 OK
INFO:     10.0.13.4:42404 - "GET /meta HTTP/1.1" 200 OK
INFO:     10.0.13.4:48024 - "GET /meta HTTP/1.1" 200 OK
INFO:     10.0.13.4:34494 - "GET /meta HTTP/1.1" 200 OK

I have no idea whether that warning about the sequence length is related to the error (strangely, neither the QnA logs nor the main service logs show any 502 being returned, even though the client says it got one), but I thought it might be. I’m not sure how to interpret it: obviously some of my documents are longer than 512 tokens, and I’m assuming the module handles that under the hood. (The question itself is nowhere near 512 tokens; it’s 6 words.)

So my question overall is:

  • Does anyone have any suspicions as to what the culprit is here for the particular issue I’m seeing with my ask query?
  • In general, how can I get more visibility into Weaviate so I can diagnose these kinds of problems better myself? I can’t even seem to get the server logs to admit they returned a 502, let alone get them to give me an error message or traceback.
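One way to get a bit more visibility is to take the Python client out of the loop: send the same ask query as raw GraphQL and keep the status, headers, and body of whatever comes back, including a 502. The sketch below uses only the standard library. The GraphQL string is my reconstruction of what the client call above sends (an assumption; compare it with what the client actually produces), and `ask_query_body` / `post_graphql` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, List, Tuple


def ask_query_body(class_name: str, question: str, properties: List[str],
                   fields: List[str], limit: int = 1) -> str:
    """Build the JSON body for POST /v1/graphql carrying an ask query.

    The GraphQL shape is an assumption modelled on the client call above.
    """
    # json.dumps yields double-quoted, escaped literals that GraphQL accepts
    ask_args = f"question: {json.dumps(question)}, properties: {json.dumps(properties)}"
    query = (
        "{ Get { "
        f"{class_name}(ask: {{{ask_args}}}, limit: {limit}) "
        f"{{ {' '.join(fields)} }} "
        "} }"
    )
    return json.dumps({"query": query})


def post_graphql(base_url: str, api_key: str, body: str,
                 timeout_s: float = 600.0) -> Tuple[int, Dict[str, str], str]:
    """Send the body; return (status, headers, text) for success and HTTP errors alike."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # API-key auth is enabled in the compose file; the key goes in a bearer token
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status, dict(resp.headers), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # A 502 lands here; keep its headers and body instead of discarding them
        return err.code, dict(err.headers), err.read().decode("utf-8", "replace")
```

The compose file starts Weaviate with `--scheme http`, so if it is reached over HTTPS there is presumably some proxy in front of it. In that case, headers such as `Server` on the 502 response may hint at which component in the path actually generated it.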

Thanks a ton!
-Adrian

Hello,

The qna-transformers module has a built-in windowing feature, so no matter how long the texts are, the module will handle the incoming question.

This warning is a standard one, and you don’t need to worry about it:

Token indices sequence length is longer than the specified maximum sequence length for this model (7748 > 512). Running this sequence through the model will result in indexing errors
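To make the windowing idea concrete: a common approach (Hugging Face's question-answering pipeline exposes it as a `doc_stride` setting) is to split the tokenized text into overlapping windows that each fit the model's limit and run the model on every window. The sketch below only computes the window spans for the 7748-token text from the warning; the 128-token overlap is an assumption, not the module's actual setting, and in practice the question and special tokens also count toward the 512, so real windows hold fewer document tokens.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int = 512,
                    overlap: int = 128) -> List[Tuple[int, int]]:
    """Return (start, end) token spans that cover [0, n_tokens).

    Consecutive spans share `overlap` tokens, so text near a window boundary
    is seen with some surrounding context in two windows.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    if n_tokens <= 0:
        return []
    step = window - overlap
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans


spans = sliding_windows(7748)  # token count taken from the warning
print(len(spans), spans[-1])  # 20 (7296, 7748)
```

Under these assumptions, one long document becomes about 20 separate BERT-large forward passes, which may help explain the minute or two at 100% CPU.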

I noticed one thing: you are running the transformers containers without CUDA support. In that case QnA runs on the CPU, which is significantly slower than running on a GPU. Maybe the 502 error you are getting is in fact a timeout from Weaviate, because it can’t get an answer from qna-transformers in time? Can you also paste the logs from Weaviate?

If you can’t switch to CUDA, you can also try our ONNX Docker images. ONNX is optimized for CPU, so you can try this image:

semitechnologies/qna-transformers:bert-large-uncased-whole-word-masking-finetuned-squad-onnx-avx512_vnni-1.5.0

The image above should work faster on modern x86-64 CPUs with the AVX512_VNNI instruction set, so maybe those 502 errors would then disappear?
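Before pulling that image, you can check whether the host CPU advertises AVX512_VNNI by looking for the `avx512_vnni` flag in `/proc/cpuinfo`. A minimal sketch, assuming an x86-64 Linux host (other platforms expose CPU features differently); the function names are hypothetical:

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the feature flags from the 'flags' lines of /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx512_vnni(cpuinfo_text: str) -> bool:
    """True if the kernel reports the avx512_vnni feature flag."""
    return "avx512_vnni" in cpu_flags(cpuinfo_text)
```

On the host itself: `has_avx512_vnni(open("/proc/cpuinfo").read())`.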

Let me know if that setup works better for you!

Hi Marcin! Thanks for the reply :slight_smile:

The qna-transformers module has a built-in windowing feature, so no matter how long the texts are, the module will handle the incoming question.

This warning is a standard one, and you don’t need to worry about it

Okay cool, that’s what I thought, thanks for confirming.

I noticed one thing: you are running the transformers containers without CUDA support. In that case QnA runs on the CPU, which is significantly slower than running on a GPU. Maybe the 502 error you are getting is in fact a timeout from Weaviate, because it can’t get an answer from qna-transformers in time? Can you also paste the logs from Weaviate?

Yeah, I’m running it on a relatively old server for now during the experimentation phase, when performance is not a concern for me. I did consider the timeout possibility; it’s certainly not a timeout on the client side because I created the Weaviate client object with timeout_config=(2,600), and it’s getting the 502 back in way less than 600 seconds.
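Since the 502 arrives well before the 600-second client timeout, it may help to record exactly how long the failing call takes over a few runs: if it always fails after roughly the same duration, that would point to a fixed timeout somewhere between the client and the QnA container rather than to the model itself. A small, hypothetical standard-library helper:

```python
import time
from typing import Any, Callable, Optional, Tuple


def time_call(fn: Callable[[], Any]) -> Tuple[Optional[Any], Optional[Exception], float]:
    """Run fn once; return (result, exception, elapsed seconds).

    Exceptions are captured rather than raised, so the elapsed time is always
    reported, including when the client raises UnexpectedStatusCodeException.
    """
    start = time.perf_counter()
    try:
        result, error = fn(), None
    except Exception as exc:  # record any failure together with its timing
        result, error = None, exc
    return result, error, time.perf_counter() - start
```

Usage: wrap the ask query from the first post in a lambda, e.g. `time_call(lambda: client.query.get("Paper", ["title"]).with_ask(ask).with_limit(1).do())`, and compare the elapsed seconds across runs.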

It could still be a timeout between Weaviate and the QnA service, but there’s nothing in the Weaviate logs that would indicate that. I didn’t post the Weaviate service logs originally because there’s nothing in them that indicates this query happened at all, even at LOG_LEVEL=debug. It’s all just a bunch of requests which I assume are coming from console.weaviate.cloud:

2023-06-15T13:28:05Z DBG action=restapi_request method=GET msg=received HTTP request url={"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/.well-known/openid-configuration","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}

2023-06-15T13:28:05Z DBG action=restapi_request method=POST msg=received HTTP request url={"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/graphql","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}

2023-06-15T13:28:06Z DBG action=restapi_request method=GET msg=received HTTP request url={"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/.well-known/openid-configuration","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}

2023-06-15T13:28:06Z DBG action=restapi_request method=POST msg=received HTTP request url={"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/graphql","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}

2023-06-15T13:28:08Z DBG action=restapi_request method=GET msg=received HTTP request url={"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/.well-known/openid-configuration","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}

2023-06-15T13:28:08Z DBG action=restapi_request method=POST msg=received HTTP request url={"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/graphql","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}
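As far as I can tell from the traceback (`GraphQL.do` in `gql/filter.py`), the Python client sends the ask query as an ordinary POST to `/v1/graphql`, so it would show up in these debug logs looking just like the console's requests. A hypothetical filter for those lines could help line up timestamps with when the query was run. The regex assumes the exact field order shown above, and the `url` field looks like a JSON-encoded Go `url.URL`.

```python
import json
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumes the exact field order shown in the debug lines above
LINE_RE = re.compile(
    r"^(?P<ts>\S+) DBG action=restapi_request method=(?P<method>\S+) "
    r"msg=received HTTP request url=(?P<url>\{.*\})\s*$"
)


class LoggedRequest(NamedTuple):
    ts: str
    method: str
    path: str


def parse_request_line(line: str) -> Optional[LoggedRequest]:
    """Parse one restapi_request debug line; return None for any other line."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    url = json.loads(m.group("url"))  # JSON object with Scheme, Host, Path, ...
    return LoggedRequest(m.group("ts"), m.group("method"), url.get("Path", ""))


def graphql_posts(lines: Iterable[str]) -> List[LoggedRequest]:
    """Keep only POST /v1/graphql requests, i.e. the ones a GraphQL query produces."""
    parsed = (parse_request_line(line) for line in lines)
    return [r for r in parsed
            if r is not None and r.method == "POST" and r.path == "/v1/graphql"]
```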

If I make a direct request like curl "https://weaviate.apetre.sc/v1/objects", that shows up too. But I don’t see anything directly related to the QnA request (although the request is clearly going through, since the QnA service goes to 100% CPU on all 24 cores). Is there some other place that access logs are written to besides stdout/stderr on the docker container?

The image above should work faster on modern x86-64 CPUs with the AVX512_VNNI instruction set, so maybe those 502 errors would then disappear?

Unfortunately my CPU doesn’t have AVX512_VNNI either :frowning: It’s an Intel(R) Xeon(R) CPU E5-2620. At this phase of experimentation I don’t care how slow it is, though; is the timeout between Weaviate and the QnA module configurable somewhere? If I could just increase that, I’d be satisfied.

Thanks again!
-Adrian

Actually, we don’t set any timeout value in Weaviate for the http.Client that connects to the qna-transformers container, and that may in fact be the cause of your errors.

I think we need to define a timeout value there, so that those long-running requests won’t get timed out that easily.

Could you test one thing: query the qna-transformers container directly and check how long it takes to respond? (Here is smoke_tests.py, where you can see how to send a direct query to the qna-transformers container.)
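A sketch of such a direct, timed request using only the standard library. The `/answers/` path matches the `POST /answers/` lines in the qna-transformers logs; the `{"text": ..., "question": ...}` payload is an assumption, so check it against smoke_tests.py. The hostname `qna-transformers` only resolves inside the swarm network, so run this from a container on that network or point it at a published port. Function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Dict, Tuple


def build_answers_request(base_url: str, text: str, question: str) -> urllib.request.Request:
    """Build a POST to the qna-transformers /answers/ endpoint (payload shape assumed)."""
    body = json.dumps({"text": text, "question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/answers/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_answer(base_url: str, text: str, question: str,
                 timeout_s: float = 1800.0) -> Tuple[Dict[str, Any], float]:
    """Send one question and return (parsed JSON response, elapsed seconds)."""
    req = build_answers_request(base_url, text, question)
    start = time.perf_counter()
    # Generous socket timeout so this script is not the component that gives up
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload, time.perf_counter() - start
```

For example, call `timed_answer("http://qna-transformers:8080", text, "Which papers deal with aquatic biomes?")` with `text` set to the `text` property of one of the longer `Paper` objects, and compare the elapsed time with how long the failing ask query takes.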