NearText Queries returning NULL on Distance

SomebodySysop · April 10, 2024, 7:12am

Description

My NearText queries where I request the “distance” are returning NULL for distance. Here is a sample query:

{
  Get {
    SolrCopy (
      limit: 10
      hybrid: {
        query: "Key concerns surrounding genocide and its impact on social justice"
        alpha: 0.8
      }
      where: {
        operator: And
        operands: [
          { path: ["site"], operator: Equal, valueText: "https://bible.booksai.org/" },
          {
            operator: Or
            operands: [
              { path: ["groups"], operator: Equal, valueText: "Tanakh" },
              { path: ["groups"], operator: Equal, valueText: "Talmud" },
              { path: ["groups"], operator: Equal, valueText: "Jewish Thought" }
            ]
          }
        ]
      }
    ) {
      _additional { distance }
      docId site title nid type public url content taxonomy groups date summary questions sourceUrl
    }
  }
}

And this is what is returned:

{
  "data": {
    "Get": {
      "SolrCopy": [
        {
          "_additional": null,
          "content": "blah, blah...",
          "date": "",
          "docId": "demo9-13891-01-fid-147876-6",
          "groups": [
            "Public",
            "Jewish Thought",
            "Theology",
            "Abraham Joshua Heschel"
          ],
          "nid": 13891,
          "public": "N",
          "questions": "",
          "site": "https://bible.booksai.org/",
          "sourceUrl": "https://archive.org/details/manisnotalonephi0000abra/page/179/mode/2up",
          "summary": "",
          "taxonomy": [],
          "title": "Heschel | Man is Not Alone | 18. The Problem of Needs",
          "type": "file",
          "url": "https://bible.booksai.org/system/files/[Man is Not Alone]_18 The Problem of Needs.pdf"
        },

and so on…

Server Setup Information

Weaviate Server Version: 1.24.6
Deployment Method:

Any additional Information

This query does return the distance property:

{
  Get {
    SolrCopy (
      limit: 10
      nearText: {
        concepts: ["show only content with nid equal to 3041"]
      }
      where: {
        operator: Equal
        valueInt: 3041
        path: ["nid"]
      }
    ) {
      _additional {
        distance
      }
      docId
      site
      title
      nid
      public
      url
      content
    }
  }
}

{
  "data": {
    "Get": {
      "SolrCopy": [
        {
          "_additional": {
            "distance": 0.22641623
          },
          "content": "t control",
          "docId": "realestatebooksai-3041-03-pid-3745-1",
          "nid": 3041,
          "public": "Y",
          "site": "https://ca.realestatebooksai.com/",
          "title": "Do a city’s annual rental inspections violate tenants’ rights to privacy or state law?",
          "url": "https://ca.realestatebooksai.com/node/3041"
        },

I also noticed that if I query for both id and distance:

_additional { id distance }

I get:

{
  "data": {
    "Get": {
      "SolrCopy": [
        {
          "_additional": {
            "distance": null,
            "id": "2da32b6a-6899-445b-5f4b-a7dcb4be4e3e"
          },
          "content": "pecial attention paid to some object. But do we pay

Confused. Is this a bug? How do I address this?

DudaNogueira · April 11, 2024, 7:02pm

Hi!

As discussed in our support ticket, the issue is that nearText returns a distance, while hybrid (as well as bm25) returns a score. Your first query uses hybrid, so distance comes back null; request score in _additional there instead.
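To make the distinction concrete, here is a small Python sketch that picks which _additional fields to request based on the search operator. The mapping follows the rule above (near* vector searches report distance, hybrid and bm25 report score); the function name additional_clause and the exact set of operators covered are illustrative assumptions, not part of any Weaviate client.

```python
# Hypothetical helper (not a Weaviate API): choose the _additional fields
# that are actually populated for a given GraphQL search operator.
VECTOR_SEARCHES = {"nearText", "nearVector", "nearObject"}
KEYWORD_OR_HYBRID_SEARCHES = {"hybrid", "bm25"}


def additional_clause(search_operator: str) -> str:
    """Return an '_additional { ... }' selection for the given operator."""
    if search_operator in VECTOR_SEARCHES:
        # Vector searches report how far each result is from the query vector.
        fields = ["id", "distance"]
    elif search_operator in KEYWORD_OR_HYBRID_SEARCHES:
        # Keyword and hybrid searches report a relevance score instead.
        fields = ["id", "score"]
    else:
        raise ValueError(f"unknown search operator: {search_operator}")
    return "_additional { " + " ".join(fields) + " }"


print(additional_clause("nearText"))
print(additional_clause("hybrid"))
```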

For example, take this code, which uses the v4 Python client:

import weaviate.classes as wvc

# `collection` is assumed to be a handle from client.collections.get(<collection name>)
result = collection.query.near_text(
    "Searching for a near text", limit=10,
    return_metadata=wvc.query.MetadataQuery(distance=True, score=True)
)

for e in result.objects:
    print(e.properties.get("title"), e.metadata.score, e.metadata.distance)

You will end up with something like:

Universal Reinforcement Learning 0.0 0.19682461023330688
The Role of Time in the Creation of Knowledge 0.0 0.2159237265586853
…

Notice that the score is 0, while the distance differs for each result (lower means closer).

Now, for a hybrid search:

result = collection.query.hybrid(
    "reinforcement learning", limit=10,
    return_metadata=wvc.query.MetadataQuery(distance=True, score=True)
)

for e in result.objects:
    print(e.properties.get("title"), e.metadata.score, e.metadata.distance)

This time the score differs for each result (higher means a better match) and the distance is None:

Statistical Mechanics of Nonlinear On-line Learning for Ensemble Teachers 0.48838376998901367 None
Evolving Classifiers: Methods for Incremental Learning 0.44732069969177246 None
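One way to see why a hybrid result carries a score but no distance: its rank blends a vector signal with a keyword signal. The sketch below illustrates the idea behind Weaviate's relativeScoreFusion fusion type, where each result list is scaled to [0, 1] and the two are weighted by alpha. It is a simplified illustration with made-up inputs, not Weaviate's implementation; in particular, negating distances before scaling and giving 0 to objects missing from one list are assumptions of this sketch.

```python
# Simplified sketch of relative score fusion: min-max scale each result
# list to [0, 1], then blend with alpha. Not Weaviate's actual code.
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores so the best becomes 1.0 and the worst 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(vector_distances: Dict[str, float],
         bm25_scores: Dict[str, float],
         alpha: float) -> Dict[str, float]:
    """Blend vector and keyword results; alpha=1 means pure vector search."""
    # Lower distance is better, so negate it before scaling (assumption).
    vec = min_max({k: -d for k, d in vector_distances.items()})
    kw = min_max(bm25_scores)
    ids = set(vec) | set(kw)
    # An object missing from one list contributes 0 from that list.
    return {i: alpha * vec.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0)
            for i in ids}


# Hypothetical object ids, distances and BM25 scores.
fused = fuse({"a": 0.20, "b": 0.25, "c": 0.30},
             {"a": 1.5, "b": 4.0, "d": 2.0},
             alpha=0.8)
for obj_id, score in sorted(fused.items(), key=lambda kv: kv[1], reverse=True):
    print(obj_id, round(score, 3))
```

Because the fused value no longer corresponds to a position in vector space, there is no single distance left to report.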

SomebodySysop · April 11, 2024, 8:46pm

Yes, I think I’ve got this part. I’m now using certainty for nearText and score for hybrid.

They obviously have very little in common, except that both range from 0 to 1 and a higher value means a closer match to the query.
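For reference, certainty is derived from distance. Assuming the collection uses the cosine distance metric (certainty is only defined for cosine), Weaviate computes certainty = 1 - distance / 2, which maps the cosine distance range of 0 to 2 onto 1 to 0. A minimal sketch, with a hypothetical function name:

```python
# Convert a cosine distance into Weaviate-style "certainty".
# Assumes the cosine metric; certainty is not defined for other metrics.

def cosine_distance_to_certainty(distance: float) -> float:
    """Cosine distance lies in [0, 2]; certainty rescales it to [1, 0]."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must lie in [0, 2]")
    return 1.0 - distance / 2.0


# Distances taken from the nearText output earlier in this thread.
for d in (0.19682461023330688, 0.2159237265586853):
    print(round(d, 4), "->", round(cosine_distance_to_certainty(d), 4))
```

Certainty is just a rescaled distance, whereas the hybrid score is a fused ranking value, so numbers from the two query types are not directly comparable.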
