Hey all!
I’m running a basic self-hosted Weaviate instance with the following Docker Swarm configuration (secrets redacted, obviously):
version: "3.7"
services:
  weaviate:
    image: semitechnologies/weaviate:1.19.6
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    networks:
      - default
    environment:
      QNA_INFERENCE_API: 'http://qna-transformers:8080'
      NER_INFERENCE_API: 'http://ner-transformers:8080'
      SUM_INFERENCE_API: 'http://sum-transformers:8080'
      OPENAI_APIKEY: 'xxxxx'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false'
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_USERS: 'xxx'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'xxx'
      AUTHORIZATION_ADMINLIST_ENABLED: 'true'
      AUTHORIZATION_ADMINLIST_USERS: 'xxx'
      AUTHORIZATION_ADMINLIST_READONLY_USERS: 'xxx'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,qna-transformers,ner-transformers,sum-transformers,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate-data:/var/lib/weaviate
  qna-transformers:
    image: semitechnologies/qna-transformers:bert-large-uncased-whole-word-masking-finetuned-squad
    environment:
      ENABLE_CUDA: '0'
    networks:
      - default
  ner-transformers:
    image: semitechnologies/ner-transformers:dbmdz-bert-large-cased-finetuned-conll03-english
    environment:
      ENABLE_CUDA: '0'
    networks:
      - default
  sum-transformers:
    image: semitechnologies/sum-transformers:facebook-bart-large-cnn-1.0.0
    environment:
      ENABLE_CUDA: '0'
    networks:
      - default
volumes:
  weaviate-data:
    driver: zfs
Everything comes up fine: I can create my class and embed some documents. However, when I try to execute an ask query via the QnA module, like so:
ask = {
    "question": "Which papers deal with aquatic biomes?",
    "properties": ["text"]
}

result = (
    client.query
    .get("Paper", ["title", "_additional {answer {hasAnswer certainty property result startPosition endPosition} }"])
    .with_ask(ask)
    .with_limit(1)
    .do()
)

print(result)
I can see the module working (all 24 cores on my server go to 100% for about a minute or two), but after a while the client gets a 502 status code:
---------------------------------------------------------------------------
UnexpectedStatusCodeException Traceback (most recent call last)
Cell In[7], line 11
1 ask = {
2 "question": "Which papers deal with aquatic biomes?",
3 "properties": ["text"]
4 }
6 result = (
7 client.query
8 .get("Paper", ["title", "_additional {answer {hasAnswer certainty property result startPosition endPosition} }"])
9 .with_ask(ask)
10 .with_limit(1)
---> 11 .do()
12 )
14 print(result)
File ~/Library/Caches/pypoetry/virtualenvs/natgpt-GWGRGYEc-py3.11/lib/python3.11/site-packages/weaviate/gql/get.py:1295, in GetBuilder.do(self)
1293 return results
1294 else:
-> 1295 return super().do()
File ~/Library/Caches/pypoetry/virtualenvs/natgpt-GWGRGYEc-py3.11/lib/python3.11/site-packages/weaviate/gql/filter.py:81, in GraphQL.do(self)
79 if response.status_code == 200:
80 return response.json() # success
---> 81 raise UnexpectedStatusCodeException("Query was not successful", response)
UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 502, with response body: None.
Unfortunately, the error code is not very useful. The Weaviate service logs don’t give any information at all (at least not at this default log level), but the QnA service logs do say:
INFO: Started server process [7]
INFO: Waiting for application startup.
INFO: Running on CPU
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Token indices sequence length is longer than the specified maximum sequence length for this model (7748 > 512). Running this sequence through the model will result in indexing errors
INFO: 10.0.13.4:50306 - "POST /answers/ HTTP/1.1" 200 OK
INFO: 10.0.13.4:52976 - "GET /meta HTTP/1.1" 200 OK
INFO: 10.0.13.4:42930 - "GET /meta HTTP/1.1" 200 OK
INFO: 10.0.13.4:45152 - "GET /meta HTTP/1.1" 200 OK
INFO: 10.0.13.4:42404 - "GET /meta HTTP/1.1" 200 OK
INFO: 10.0.13.4:48024 - "GET /meta HTTP/1.1" 200 OK
INFO: 10.0.13.4:34494 - "GET /meta HTTP/1.1" 200 OK
I have no idea whether that warning about the sequence length is related to the error, but I thought it might be. Strangely, neither the QnA logs nor the main service logs show a 502 being returned, even though the client received one. I’m not sure how to interpret the warning: obviously some of my documents are longer than 512 tokens, but I am assuming the module can handle that under the hood. (The question itself is nowhere near 512 tokens; it was 6 words.)
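One experiment I could run to see whether document length matters is to split the long text into shorter passages before importing it. Below is a minimal sketch of that idea, not something I have confirmed fixes anything. The function name `chunk_words` and its parameters are my own, and it counts whitespace-separated words as a rough stand-in for model tokens (the real BERT tokenizer generally produces more tokens than words, so `max_words` would need to sit well below 512).

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words words.

    Assumption: whitespace-separated words only approximate the model's
    token count, so max_words is a conservative guess, not a hard limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk could then be stored as its own object, so the passage sent to the QnA module stays short. The overlap is there so that an answer spanning a chunk boundary is less likely to be cut in half.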
So my questions are:
- Does anyone have any suspicions as to what is causing the 502 on my ask query?
- In general, how can I get more visibility into Weaviate so I can diagnose these kinds of problems better myself? I can’t even seem to get the server logs to admit they returned a 502, let alone get them to give me an error message or traceback.
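In the meantime, one thing I can try on my side is sending the same GraphQL query over plain HTTP, so I can see the raw status code, response headers, body, and elapsed time instead of only the client exception. This is a rough sketch using only the standard library. It assumes the usual `/v1/graphql` endpoint and that the API key is passed as a Bearer token; the helper names `build_request` and `send` are my own.

```python
import json
import time
import urllib.error
import urllib.request


def build_request(url, query, api_key=None):
    """Build a POST request carrying a GraphQL query as JSON."""
    payload = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: API-key auth is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def send(request, timeout=600):
    """Send the request; return (status, headers, body, seconds).

    HTTP error statuses such as 502 are returned rather than raised,
    so the headers and body can still be inspected.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            resp_headers = dict(response.headers)
            body = response.read()
    except urllib.error.HTTPError as err:
        status = err.code
        resp_headers = dict(err.headers)
        body = err.read()
    elapsed = time.monotonic() - started
    return status, resp_headers, body.decode("utf-8", errors="replace"), elapsed
```

Running `send(build_request(url, query, api_key))` against the instance and printing the four values should at least show how long the request ran before failing and which headers came back with the 502. If those headers look like they came from something other than Weaviate, that might explain why neither service logged the error, but that is only a guess on my part.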
Thanks a ton!
-Adrian