Is it possible to combine keyword search with (geo)spatial constraints, e.g. based on a bounding box?

I am storing GML and associated PDF documents in Weaviate. Each GML file has a bounding box. I would like to provide keyword search across these documents, and I am wondering if it is possible to do keyword search combined with a spatial query. In this scenario, the user would be searching for a keyword or attribute value within a user defined bounding box.

Do you have any hints on how to go about this?

Hi @Kate :wave:, and welcome.

You should be able to do this with a combination of a filter with a search (BM25 or vector, for example).

A filter can be used to provide conditions like less than / greater than or equal to, etc. So you could put the spatial parameters there, and combine that with the keyword search as you suggested.

A filter looks like this:

{
    "path": ["points"],
    "operator": "GreaterThan",
    "valueInt": 200
}

As to how to add the spatial data, you could use the geoCoordinates data type and filter on it. It might lead to slightly verbose queries (see "GraphQL - Conditional filters" in the Weaviate docs), as I believe geo filters are not natively supported by the client libraries yet.

Another option is to save the geo data as individual number properties (see "Data types" in the Weaviate docs) and filter on those with the usual comparison operators.
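To make the number-property approach concrete, here is a minimal sketch that builds a `where` filter matching stored bounding boxes that overlap a user-defined box. The property names `min_lat`, `min_lon`, `max_lat` and `max_lon` are hypothetical, and the sketch ignores boxes that cross the antimeridian.

```python
from typing import Any, Dict


def bbox_intersects_filter(
    min_lat: float, min_lon: float, max_lat: float, max_lon: float
) -> Dict[str, Any]:
    """Build a where filter for stored boxes that overlap the query box.

    Assumes each object stores its bounding box in four number properties
    named min_lat, min_lon, max_lat, max_lon (hypothetical names).
    """

    def compare(prop: str, operator: str, value: float) -> Dict[str, Any]:
        return {"path": [prop], "operator": operator, "valueNumber": value}

    # Two boxes overlap when each one starts before the other one ends,
    # on both the latitude and the longitude axis.
    return {
        "operator": "And",
        "operands": [
            compare("min_lat", "LessThanEqual", max_lat),
            compare("max_lat", "GreaterThanEqual", min_lat),
            compare("min_lon", "LessThanEqual", max_lon),
            compare("max_lon", "GreaterThanEqual", min_lon),
        ],
    }


where = bbox_intersects_filter(52.3, 4.7, 52.5, 4.9)

# With the v3 Python client, the filter could then be combined with a
# keyword search roughly like this (not executed here, query text is made up):
# client.query.get("XPlan", ["name"]).with_bm25(query="Wohngebiet").with_where(where).do()
```

The filter is applied alongside the BM25 search, so only objects whose stored box overlaps the query box are ranked by keyword.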

I hope that answers your question - but lmk of course if you need any more information.


Hi,

Are there any plans to implement an R-Tree spatial index [1], like what is used by Oracle Spatial and PostGIS, for example? Or some other type of spatial index?

Do I understand correctly that I can only search within a radius around a point location?

Kind regards,
Kate

[1] A. Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching (1984), Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, p. 47–57

I don’t believe there are current plans to do so. Sorry about that.

If that’s something that you would like to see, I suggest opening an issue on our GitHub page (Issues · weaviate/weaviate · GitHub). We take these issues and their votes into account. I actually opened one myself earlier today!
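In the meantime, since a WithinGeoRange filter takes a center point and a maximum distance in meters, one possible workaround is to cover the bounding box with a circle and then discard results outside the box on the client side. The sketch below is only an illustration under that assumption: the helper names are hypothetical, and the covering circle is only reliable for small boxes away from the poles and the antimeridian.

```python
import math
from typing import Any, Dict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def covering_circle_filter(
    prop: str, min_lat: float, min_lon: float, max_lat: float, max_lon: float
) -> Dict[str, Any]:
    """Build a WithinGeoRange filter whose circle covers the given box."""
    center_lat = (min_lat + max_lat) / 2
    center_lon = (min_lon + max_lon) / 2
    corners = [
        (min_lat, min_lon), (min_lat, max_lon),
        (max_lat, min_lon), (max_lat, max_lon),
    ]
    # The radius is the distance from the center to the farthest corner.
    radius = max(haversine_m(center_lat, center_lon, lat, lon) for lat, lon in corners)
    return {
        "path": [prop],
        "operator": "WithinGeoRange",
        "valueGeoRange": {
            "geoCoordinates": {"latitude": center_lat, "longitude": center_lon},
            "distance": {"max": radius},
        },
    }


geo_filter = covering_circle_filter("lower_corner", 52.3, 4.7, 52.5, 4.9)
```

Because the circle is larger than the box, the results still need an exact latitude/longitude check after the query returns.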


Hi,

Unfortunately I am still struggling to make a spatial query.

Here is my schema:

# create schema for Xplan GML data

schema = {
  "classes": [{
    "class": "XPlan",
    "description": "Simplied XPlanung schema that contains information relevant to search",
    "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "properties": [
      {
        "dataType": [
          "string"
        ],
        "description": "XPlanung namespace",
        "name": "namespace"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "BP_Plan id",
        "name": "bp_plan_id"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "BP_Plan name",
        "name": "name"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "BP_Plan number",
        "name": "number"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "BP_Plan acceptance data",
        "name": "acceptance_date"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "Gemeinde name",
        "name": "gemeinde_name"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "Plan type",
        "name": "plan_type"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "Legal status",
        "name": "legal_status"
      },
      {
        "dataType": [
          "string"
        ],
        "description": "BauNVO date",
        "name": "baunvo_date"
      },
      {
        "dataType": [
            "geoCoordinates"
        ],
        "description": "Lower corner of bounding box",
        "name": "lower_corner"
      },
      {
        "dataType": [
          "geoCoordinates"
        ],
        "description": "Upper corner of bounding box",
        "name": "upper_corner"
      },  
    ]
  }}]}

client.schema.create(schema)
client.schema.get()

Here I load the data:

# batch load csv file of gml data to Weaviate instance
import csv

inputFile = file_name_gml
i = 0
client.batch.configure(
  batch_size=100, 
  dynamic=True,
  timeout_retries=3,
  callback=None,
)

with open(inputFile, 'r', encoding='utf-8', newline='') as f:
  reader = csv.DictReader(f) 
  for row in reader:                                           
    try:
        properties = {
            "gemeinde_name": row["GemeindeName"],
            "name": row["BP_PlanName"],
            "bp_plan_id": row["BP_PlanID"],
            "namespace": row["XPlanNamespace"],
            "baunvo_date": row["BauNVODate"],
            "legal_status": row["LegalStatus"],
            "lower_corner": {"latitude": row["LowerCorner1"], "longitude": row["LowerCorner2"]},
            "upper_corner": {"latitude": row["UpperCorner1"], "longitude": row["UpperCorner2"]}
        }
   
        client.batch.add_data_object(properties, "XPlan")
        i += 1
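    except Exception as e:
        # Report what went wrong instead of silently swallowing the exception
        print("Error at row " + str(i) + ": " + str(e))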
    #print(properties)
        
    if (i % 100 == 0):
       client.batch.flush()
       print(i)
                                                                    

  # Flush the remaining buffer to make sure all objects are imported
  client.batch.flush()
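One thing that may be worth checking in the loader: csv.DictReader yields every value as a string, while geoCoordinates expects numeric latitude and longitude in decimal degrees. The helper below is a hypothetical sketch (the name `to_geo` is not part of any library) that converts and range-checks a corner before it is added to the batch. If the GML stores coordinates in a projected reference system rather than latitude/longitude, they would need converting first, which this sketch does not do.

```python
from typing import Dict


def to_geo(lat_text: str, lon_text: str) -> Dict[str, float]:
    """Convert CSV strings to a geoCoordinates-style dict of floats.

    Raises ValueError if a value is not numeric or is outside the valid
    range for decimal degrees.
    """
    lat = float(lat_text)
    lon = float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"latitude": lat, "longitude": lon}


corner = to_geo("52.42", "4.82")
```

In the loop above, this could be used as `"lower_corner": to_geo(row["LowerCorner1"], row["LowerCorner2"])`, so that a bad row is reported by the existing error handler instead of being imported with invalid coordinates.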

…and when I try to run my query I get an error response that I don’t understand…

get_results_where = """
  {
    Get {
      XPlan(where: {
        operator: WithinGeoRange,
        valueGeoRange: {
          geoCoordinates: {
            latitude: 52.42,    
            longitude: 4.82   
          },
          distance: {
            max: 2000           
          }
        },
        path: ["lower_corner"] 
      }) {
        name
        lower_corner {
          latitude
          longitude
        }
      }
    }
  }
"""

query_result = client.query.raw(get_results_where)
print(query_result)

Error:

{'data': {'Get': {'XPlan': None}}, 'errors': [{'locations': [{'column': 7, 'line': 4}], 'message': 'explorer: list class: search: object search at index xplan: local shard object search xplan_jichOTq0Uljq: fetch doc ids for prop/value pair: geo index range search on prop "lower_corner": entrypoint was deleted in the object store, it has been flagged for cleanup and should be fixed in the next cleanup cycle', 'path': ['Get', 'XPlan']}]}

Do you have any hints on where this is going wrong?