An object is somehow only visible in the REST API but not via GraphQL

Description

I saw an error in production where a vector was not found:

explorer: get class: concurrentTargetVectorSearch): explorer: get class: vectorize search vector: nearObject params: vector not found

Upon further inspection, I noticed the object being queried does not exist when querying via GraphQL:

query {
  Get {
    ContentOpenAI(where: {
      path: ["datagraph_id"]
      operator: Equal
      valueString: "cqpnth0vub2s5hpce720"
    }) {
      datagraph_id
      datagraph_type
      name
    }
  }
}

However, when fetching the object via REST, it is there:

GET {{baseUrl}}/objects/9f27c258-05d1-5987-bc13-7c39436d3a8a?include=vector

{
    "class": "ContentOpenAI",
    "creationTimeUnix": 1723041716137,
    "id": "9f27c258-05d1-5987-bc13-7c39436d3a8a",
    "lastUpdateTimeUnix": 1723041716137,
    "properties": {
        "content": "Things that make me more money\r\n\r\n",
        "datagraph_id": "cqpnth0vub2s5hpce720",
        "datagraph_type": "profile",
        "description": "Things that make me more money",
        "name": "david"
    },
    "vector": [ ... ]
}

Note how the “datagraph_id” is present in the response, the exact same one used in the GQL query above. I’ve also pulled all objects via GQL and it’s not in the list.

Am I misunderstanding something about Weaviate here? Are there cases where objects appear in one API but not in another? I wondered if it had not indexed correctly via embeddings, but that seems not the case because a full vector is present (omitted from the post for readability.)
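
For anyone trying to reproduce this, here is a minimal sketch of how the two lookups could be compared side by side using only the Python standard library. `BASE_URL`, `graphql_where_query`, `fetch_json`, and `compare` are hypothetical names for illustration; the REST path and the GraphQL filter mirror the requests shown above.

```python
import json
import urllib.request

# Assumption: base URL of the instance, including the /v1 prefix.
BASE_URL = "http://localhost:8080/v1"


def graphql_where_query(class_name, prop, value, fields):
    """Build a GraphQL Get payload filtered on a single string property."""
    query = (
        "{ Get { %s(where: {path: [%s], operator: Equal, valueString: %s}) { %s } } }"
        % (class_name, json.dumps(prop), json.dumps(value), " ".join(fields))
    )
    return {"query": query}


def fetch_json(url, payload=None):
    """GET the URL, or POST the payload as JSON when one is given."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # urlopen raises urllib.error.HTTPError on a 404 from the REST lookup.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def compare(object_id, class_name, prop, value):
    """Return (REST object carries the value, number of GraphQL matches)."""
    rest_obj = fetch_json(f"{BASE_URL}/objects/{object_id}")
    gql = fetch_json(
        f"{BASE_URL}/graphql",
        graphql_where_query(class_name, prop, value, [prop]),
    )
    hits = ((gql.get("data") or {}).get("Get") or {}).get(class_name) or []
    return rest_obj.get("properties", {}).get(prop) == value, len(hits)
```

Calling `compare("9f27c258-05d1-5987-bc13-7c39436d3a8a", "ContentOpenAI", "datagraph_id", "cqpnth0vub2s5hpce720")` against the affected instance would return `(True, 0)` in the situation described here: the object is present via REST but the filter matches nothing via GraphQL.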

Server Setup Information

  • Weaviate Server Version: 1.26.1
  • Deployment Method: Fly.io
  • Client Language and Version: Just hitting GQL/REST directly

Hi @southclaws !!

Welcome to our community! :hugs:

Can you try:

1 - Searching in GraphQL for that ID.

query {
  Get {
    ContentOpenAI(where: {
      path: ["id"]
      operator: Equal
      valueString: "9f27c258-05d1-5987-bc13-7c39436d3a8a"
    }) {
      datagraph_id
      datagraph_type
      name
    }
  }
}

2 - Searching with the Python v4 client. This will search using the gRPC endpoint, so we can try to isolate this to GraphQL.
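
A rough sketch of what that check could look like with the Python v4 client. The `lookup` and `main` helpers are hypothetical names, and `connect_to_local()` is an assumption: a remote deployment would need a different connection helper with its own host and credentials.

```python
def lookup(collection, object_id, id_filter, prop_filter):
    """Try three lookups for the same object and report which ones find it."""
    by_id = collection.query.fetch_object_by_id(object_id)
    by_id_filter = collection.query.fetch_objects(filters=id_filter, limit=1)
    by_prop = collection.query.fetch_objects(filters=prop_filter, limit=1)
    return {
        "fetch_by_id": by_id is not None,
        "filter_on_id": len(by_id_filter.objects),
        "filter_on_property": len(by_prop.objects),
    }


def main():
    # Imported here so lookup() can be exercised without the client installed.
    import weaviate
    from weaviate.classes.query import Filter

    object_id = "9f27c258-05d1-5987-bc13-7c39436d3a8a"
    # Assumption: a locally reachable instance; adjust for your deployment.
    client = weaviate.connect_to_local()
    try:
        collection = client.collections.get("ContentOpenAI")
        print(
            lookup(
                collection,
                object_id,
                Filter.by_id().equal(object_id),
                Filter.by_property("datagraph_id").equal("cqpnth0vub2s5hpce720"),
            )
        )
    finally:
        client.close()
```

Running `main()` prints one dictionary. If the direct fetch finds the object while both filtered lookups return zero matches, the problem is more likely in the filter index than in the GraphQL layer.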

The error you pointed out should not be related to that query, because you are not doing any kind of vector search, only filtering :thinking:

Are you sure this error pops up whenever the query is run?

Another thing to try is aggregating the objects using it as a filter.
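
As a starting point, the aggregation could be sketched like this. The helper names are hypothetical; the query string is meant to be sent as the `query` field of a POST to the GraphQL endpoint, and the filter reuses the one from the original post.

```python
import json


def aggregate_count_query(class_name, prop, value):
    """Build an Aggregate query that counts objects matching one string filter."""
    where = "{path: [%s], operator: Equal, valueString: %s}" % (
        json.dumps(prop),
        json.dumps(value),
    )
    return "{ Aggregate { %s(where: %s) { meta { count } } } }" % (class_name, where)


def count_from_response(response, class_name):
    """Read meta.count out of a parsed Aggregate response body."""
    rows = response["data"]["Aggregate"][class_name]
    return rows[0]["meta"]["count"] if rows else 0


query = aggregate_count_query("ContentOpenAI", "datagraph_id", "cqpnth0vub2s5hpce720")
print(query)
```

A count of 0 here, while the REST lookup still returns the object, would point to the filter not matching the stored object rather than to anything vector-related.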

Let me know if you need help crafting those queries.

Thanks!

Hey, thanks! Searching for the ID did not yield a result either. Sadly, we've re-indexed since this issue occurred, so the problem is no longer present for debugging.

It did occur repeatedly, though, whenever a GQL query was run. The query was a vector nearby search, specifically this one: storyden/app/services/semdex/weaviate/relevance.go at main · Southclaws/storyden · GitHub

Oh!

We would love to have a way to reproduce this bug :grimacing: We have a chaos pipeline, but it can't always create these cases.

So after reindexing, you didn’t have this issue anymore, right?

This could be an index corruption. We are working on mitigating this kind of issue with some async read/repair operations.

Let us know if you face any other issues so we can help you on this journey :slight_smile:

Thanks!