Query score 0

Description

When I perform a query against the named vectors I have set up, I get a score of 0.

import weaviate.classes as wvc

# `client` is an already connected weaviate-client v4 instance
q = 'example query'
chunks = client.collections.get("Chunks")
response = chunks.query.near_text(
    query=q,
    limit=2,
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True, distance=True),
    target_vector=["textvector"],
)

for r in response.objects:
    print(r.metadata.distance)
    print(r.metadata.score)
    print(r.metadata.explain_score)

Example output:

0.18377846479415894
0.0
...
0.5245028734207153
0.0

  • A score of exactly 0 makes me suspect that the encoding of my query is going wrong, or that there is an issue with the score calculation for the query.
  • Additionally, metadata.explain_score is not populated.
  • I do see sensible distances between the objects. I have also confirmed (through the REST API) that vectors have been calculated for my objects.

My questions:

  • Is my assumption correct that a near_text query should return scores > 0? If not, which attributes should I use to understand the results?
  • Any ideas on how to debug this? What are the typical areas to start investigating when encountering query issues?

Server Setup Information

  • Weaviate Server Version: 1.28
  • Deployment Method: Docker Compose (cr.weaviate.io/semitechnologies/weaviate:1.28.0)
  • Multi Node?: No (Single Node)
  • Client Language and Version: Python
    pip list | grep weav
    weaviate                 0.1.2
    weaviate-cli             3.0.2
    weaviate-client          4.9.6
    
  • Multitenancy?: Not specified

Any Additional Information

Here is the docker-compose.yml file:

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.0
    restart: on-failure:0
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
      CLUSTER_HOSTNAME: 'node1'
  
  t2v-transformers:
    image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-paraphrase-multilingual-MiniLM-L12-v2
    environment:
      ENABLE_CUDA: 0 # set to 1 to enable
      # NVIDIA_VISIBLE_DEVICES: all # enable if running with CUDA

Hi @chunker !!

Welcome to our community :hugs:

Because near_text is a pure vector search, you get a distance instead of a score, so a score of 0 is expected for this query.
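
To make the distance easier to interpret: assuming the collection uses the cosine distance metric (an assumption, since the collection config is not shown in the post), the distance is 1 minus the cosine similarity, so it can be converted back. A minimal sketch using the two distances from the example output above; the helper name is hypothetical and not part of the Weaviate client:

```python
def cosine_distance_to_similarity(distance: float) -> float:
    """Convert a cosine distance (0 = identical, 2 = opposite) to a similarity.

    Assumes the distance was computed as 1 - cosine_similarity.
    """
    return 1.0 - distance


# Distances copied from the example output in the question.
distances = [0.18377846479415894, 0.5245028734207153]
similarities = [cosine_distance_to_similarity(d) for d in distances]

for d, s in zip(distances, similarities):
    print(f"distance={d:.4f}  similarity={s:.4f}")
```

A smaller distance means a closer match, so the first object is the better hit for the query.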

With bm25 you get a score, and the same goes for hybrid, where the vector distance and the bm25 score are fused into a single score.

explain_score is populated only for hybrid queries. It shows the distance and score obtained for the hybrid query, along with the normalized values.
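
To give a rough idea of what "fused" and "normalized" mean here, the following is a simplified sketch of a normalize-then-weight fusion. It is an illustration under stated assumptions, not Weaviate's actual implementation: the function names, the input numbers, the alpha value, and the handling of equal values are all made up for the example.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]. If all values are equal, map them all to 1.0
    (a choice made for this sketch only)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(vector_similarities, keyword_scores, alpha):
    """Weighted sum of normalized vector and keyword scores.

    alpha = 1.0 uses only the vector side, alpha = 0.0 only the keyword side.
    """
    norm_vec = min_max_normalize(vector_similarities)
    norm_kw = min_max_normalize(keyword_scores)
    return [alpha * v + (1.0 - alpha) * k for v, k in zip(norm_vec, norm_kw)]


# Hypothetical numbers for three objects (not taken from the original post).
vector_similarities = [0.82, 0.48, 0.65]
keyword_scores = [1.2, 3.4, 0.0]

fused = fuse_scores(vector_similarities, keyword_scores, alpha=0.5)
for i, score in enumerate(fused):
    print(f"object {i}: fused score = {score:.4f}")
```

In this sketch an object that ranks well on only one side can still end up with a non-zero fused score, which is why a hybrid query reports a score while a pure near_text query only reports a distance.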

Let me know if that helps!

Thanks!