nearVector search in a 3D vector space returns unexpected results

Hi there. I am completely new to Weaviate and am experimenting with loading and querying some simple objects. Each object has three number properties, x, y, and z, and a 3D vector made up of those same three numbers.

Here is my schema:

{
  'class': "XYZ",
  'description': 'A 3d vector class',
  'vectorizeClassName': false,
  'vectorizer': 'none',
  'properties': [
    {
      'name': 'x',
      'dataType': [ 'number' ],
      'description': 'x',
      'vectorizePropertyName': false
    },
    {
      'name': 'y',
      'dataType': [ 'number' ],
      'description': 'y',
      'vectorizePropertyName': false
    },
    {
      'name': 'z',
      'dataType': [ 'number' ],
      'description': 'z',
      'vectorizePropertyName': false
    },
  ]
}

I load 1000 randomly generated objects, and they seem to load fine:

curl -s "http://localhost:8080/v1/objects?class=XYZ&limit=1&include=vector" | jq .
{
  "deprecations": [],
  "objects": [
    {
      "class": "XYZ",
      "creationTimeUnix": 1689563219367,
      "id": "00d3baec-56ec-43eb-b860-4958afc8da07",
      "lastUpdateTimeUnix": 1689563219367,
      "properties": {
        "x": 178,
        "y": 840,
        "z": 330
      },
      "vector": [
        178,
        840,
        330
      ],
      "vectorWeights": null
    }
  ],
  "totalResults": 1
}
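For anyone trying to reproduce this, here is a minimal Python sketch of the setup: it builds random XYZ objects whose vector is `[x, y, z]` and checks a returned object the same way the curl output above is being eyeballed. The value range 0 to 999 and both helper names are assumptions (the post does not say how the numbers were generated). The payload shape is the one Weaviate's batch endpoint `POST /v1/batch/objects` accepts; actually sending it is left out.

```python
import json
import random


def make_random_xyz_objects(n, low=0, high=999, seed=None):
    """Build n XYZ objects whose vector repeats the x, y, z properties.

    Hypothetical helper; the integer range [low, high] is an assumption.
    """
    rng = random.Random(seed)
    objects = []
    for _ in range(n):
        x, y, z = (rng.randint(low, high) for _ in range(3))
        objects.append({
            "class": "XYZ",
            "properties": {"x": x, "y": y, "z": z},
            # With "vectorizer": "none", the vector has to be supplied.
            "vector": [x, y, z],
        })
    return objects


def vector_matches_properties(obj):
    """Check one object as returned by GET /v1/objects?include=vector."""
    props = obj["properties"]
    return obj["vector"] == [props["x"], props["y"], props["z"]]


# Request body for the batch endpoint, POST /v1/batch/objects.
payload = json.dumps({"objects": make_random_xyz_objects(1000, seed=42)})
```

Applied to the object in the curl output above, `vector_matches_properties` returns `True`.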

However, no matter which vector I pass to my nearVector query, I always get the same top-n results back:

echo '{
    "query": "{
      Get{
        XYZ(
          nearVector: {
            vector: [100, 100, 100]
          }
          limit: 1
        ){
          x
          y
          z
        }
      }
    }"
  }' | curl -s -X POST -H 'Content-Type: application/json' -d @- http://localhost:8080/v1/graphql | jq .
{
  "data": {
    "Get": {
      "XYZ": [
        {
          "x": 454,
          "y": 552,
          "z": 508
        }
      ]
    }
  }
}

and

echo '{
    "query": "{
      Get{
        XYZ(
          nearVector: {
            vector: [800, 800, 800]
          }
          limit: 1
        ){
          x
          y
          z
        }
      }
    }"
  }' | curl -s -X POST -H 'Content-Type: application/json' -d @- http://localhost:8080/v1/graphql | jq .
{
  "data": {
    "Get": {
      "XYZ": [
        {
          "x": 454,
          "y": 552,
          "z": 508
        }
      ]
    }
  }
}
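One thing worth checking, assuming the class was created without a `vectorIndexConfig` and therefore uses Weaviate's default `cosine` distance: cosine distance (1 minus cosine similarity) depends only on the angle between two vectors, not on their length. `[100, 100, 100]` and `[800, 800, 800]` point in exactly the same direction, so every stored vector is equally far from both queries and the ranking cannot change between them. This sketch compares cosine and squared Euclidean distance for the object that came back in both queries; the function names are illustrative, not a Weaviate API.

```python
import math

# The object returned for both queries in the post.
returned = [454, 552, 508]


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity behind Weaviate's "cosine" metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)


def l2_squared(a, b):
    """Squared Euclidean distance, as used by Weaviate's "l2-squared" metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


for query in ([100, 100, 100], [800, 800, 800]):
    # Cosine distance is the same for both queries (scaling cancels out);
    # the squared Euclidean distance is not.
    print(query,
          "cosine:", round(cosine_distance(query, returned), 6),
          "l2-squared:", l2_squared(query, returned))
```

If Euclidean nearness is what you want, the metric can be chosen when the class is created, for example `'vectorIndexConfig': { 'distance': 'l2-squared' }` in the schema. As far as I know the distance metric cannot be changed on an existing class, so the class would need to be recreated and the objects reloaded.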

Can someone help point me to where I’m going wrong?

Thanks!
–Scott

Hi, HNSW does not work well with random data. If you want to test it, you need something more realistic, such as the SIFT dataset: http://corpus-texmex.irisa.fr/
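To try that suggestion: the SIFT files on that page (for example `sift_base.fvecs`) use the `.fvecs` format, where each vector is stored as a 4-byte integer dimension followed by that many 4-byte floats (128 per vector for SIFT). Below is a minimal pure-Python reader, assuming little-endian byte order as the common readers do; `read_fvecs` is a hypothetical helper name, not part of any library.

```python
import struct


def read_fvecs(path):
    """Read all vectors from a TEXMEX .fvecs file into a list of lists.

    Each record: int32 dimension d, then d float32 values (little-endian).
    """
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            if dim < 0:
                raise ValueError(f"invalid dimension {dim}")
            body = f.read(4 * dim)
            if len(body) != 4 * dim:
                raise ValueError("truncated vector record")
            vectors.append(list(struct.unpack(f"<{dim}f", body)))
    return vectors
```

Each returned list can then be sent as the `vector` of a Weaviate object, the same way the XYZ vectors were. Reading the whole 1M-vector base file this way is slow and memory-hungry, so slicing off the first few thousand records is a reasonable start.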

Thanks. Can you recommend a good resource on what the “shape” of the vectors or dataset needs to be to get effective results? All real-world data has some randomness in it, so I’m wondering whether some degree of randomness is acceptable.

I can’t answer that in general, but I ran into a similar problem while writing a test recently, and what I did here worked.