AND/OR query logic - is this a BUG?

I have been trying to wrap my head around this for a while and thought of asking the community if this is a BUG and needs to be reported.

Let’s imagine we have a simple set of documents in a Weaviate collection which, for simplicity, can be represented as this list:

test_documents = [
    {"document_name": "test_document_1.txt", "content": "This is the first test document."},
    {"document_name": "test_document_2.txt", "content": "Another document for testing."},
    {"document_name": "project_plan.txt", "content": "This document contains the project plan."},
    {"document_name": "summary_report.txt", "content": "The summary of all reports."},
    {"document_name": "test_document_3.txt", "content": "This document is for additional tests."},
]

We then have this GraphQL query:

{
  Aggregate {
    Test_collection(
      where: {
        operator: And, 
        operands: [
          {
            operator: Or, 
            operands: [
              {path: ["document_name"], operator: Like, valueText: "*test*"}, 
              {path: ["content"], operator: NotEqual, valueText: "This is the first test document."}
            ]
          }, 
          {
            operator: Or, 
            operands: [
              {path: ["document_name"], operator: Equal, valueText: "project_plan.txt"}, 
              {path: ["content"], operator: Like, valueText: "*project*"}
            ]
          }
        ]
      }, 
      tenant: "test_collection"
    ) {
      meta {
        count
      }
    }
  }
}

The query should return 1 document (project_plan.txt), but it returns 0.

I broke the query down and ran the two operands separately to check my logic.

The first part matches all 5 documents:

operator: Or,
operands: [
  {path: ["document_name"], operator: Like, valueText: "*test*"},
  {path: ["content"], operator: NotEqual, valueText: "This is the first test document."}
]

The second part matches only 1 document:

operator: Or,
operands: [
  {path: ["document_name"], operator: Equal, valueText: "project_plan.txt"},
  {path: ["content"], operator: Like, valueText: "*project*"}
]

Combining a filter that matches all 5 documents with a filter that matches 1 document using AND should return that 1 document.
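
As a sanity check on the reasoning above, here is a minimal sketch that evaluates the same boolean structure in plain Python over the five sample documents. It assumes Like behaves as a simple case-sensitive wildcard match on the whole string (approximated here with fnmatch.fnmatchcase) and ignores Weaviate's tokenization, so it only checks the expected logic, not what the server actually does. The helper names first_or and second_or are made up for illustration.

```python
from fnmatch import fnmatchcase

test_documents = [
    {"document_name": "test_document_1.txt", "content": "This is the first test document."},
    {"document_name": "test_document_2.txt", "content": "Another document for testing."},
    {"document_name": "project_plan.txt", "content": "This document contains the project plan."},
    {"document_name": "summary_report.txt", "content": "The summary of all reports."},
    {"document_name": "test_document_3.txt", "content": "This document is for additional tests."},
]


def first_or(doc):
    # document_name Like "*test*" OR content NotEqual <first document's content>
    return (
        fnmatchcase(doc["document_name"], "*test*")
        or doc["content"] != "This is the first test document."
    )


def second_or(doc):
    # document_name Equal "project_plan.txt" OR content Like "*project*"
    return (
        doc["document_name"] == "project_plan.txt"
        or fnmatchcase(doc["content"], "*project*")
    )


first_matches = [d["document_name"] for d in test_documents if first_or(d)]
second_matches = [d["document_name"] for d in test_documents if second_or(d)]
# Top-level And: a document must satisfy both Or groups.
both_matches = [
    d["document_name"] for d in test_documents if first_or(d) and second_or(d)
]

print(len(first_matches), len(second_matches), both_matches)
# prints: 5 1 ['project_plan.txt']
```

Under these assumptions the combined filter selects exactly project_plan.txt, which is the count of 1 I expected from the Aggregate query.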

Am I missing something here?

Good morning @Analitiq,

Thank you so much for the details in this post.

Would you mind sharing your schema creation method as well?

I will look into it and replicate if needed, but I would like to understand the schema behind the scenes.

Thanks!

@Mohamed_Shahin Thank you for your reply. The schema is created automatically.

I chunk the data like this:

    Chunk(
        content=chunk_of_text,
        source=metadata["source"],
        document_type=metadata["document_type"],
        document_name=metadata["document_name"],
        document_num_char=len(chunk),
        chunk_num_char=len(chunk),
        date_loaded=datetime.now(timezone.utc),
        content_kw=keyword_extractions.extract_keywords(chunk),
    )

Once the chunk is created, I pass it to Weaviate Cloud:

# Excerpt from a class method; generate_uuid5 comes from weaviate.util
collection = self.__get_tenant_collection_object(collection_name, tenant_name)

with collection.batch.dynamic() as batch:
    for chunk in chunks:
        # Deterministic UUID derived from the chunk's properties
        uuid = generate_uuid5(chunk.model_dump())
        # The vector is computed client-side and passed in explicitly
        hf_vector = self.vectorizer.vectorize(chunk.content)
        try:
            response = batch.add_object(
                properties=chunk.model_dump(),
                uuid=uuid,
                vector=hf_vector,
            )
            logger.info(response)
            chunks_loaded += 1
        except Exception as e:
            raise e

I am running the latest Weaviate, 4.8.1, and Python 3.10.

Here is how the schema looks in the Weaviate cloud.

Hey @Analitiq,

Let’s break this down. In your query:

  1. The first OR condition matches documents where document_name contains “test” OR content is not equal to “This is the first test document.” This is broad and will match almost all documents.

  2. The second OR condition matches documents where document_name equals “project_plan.txt” OR content contains “project”. This is more specific and applies to just one document.

I think you expected that combining these with AND would return “project_plan.txt,” but because the first condition is too broad and the second is narrow, it’s unlikely that both will be true for the same document. The AND operator requires both sets of conditions to match a single document.

If your goal is to find documents related to “test” or “project,” I suggest adjusting the OR conditions to focus on those specific terms, while grouping them together in a single OR condition with more refined filters, right?

Have you tried switching the main operator from AND to OR, and turning the two OR conditions into ANDs?
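
For reference, that restructuring (OR at the top level, AND inside each group) can be sketched in plain Python against the sample documents from the original post. This is only an illustration of the boolean logic: it assumes Like is a simple case-sensitive wildcard match on the whole string (approximated with fnmatch.fnmatchcase) and does not model Weaviate's tokenization. The name swapped_filter is hypothetical.

```python
from fnmatch import fnmatchcase

test_documents = [
    {"document_name": "test_document_1.txt", "content": "This is the first test document."},
    {"document_name": "test_document_2.txt", "content": "Another document for testing."},
    {"document_name": "project_plan.txt", "content": "This document contains the project plan."},
    {"document_name": "summary_report.txt", "content": "The summary of all reports."},
    {"document_name": "test_document_3.txt", "content": "This document is for additional tests."},
]

FIRST_CONTENT = "This is the first test document."


def swapped_filter(doc):
    # Group 1 (And): name contains "test" AND content differs from FIRST_CONTENT
    test_branch = (
        fnmatchcase(doc["document_name"], "*test*")
        and doc["content"] != FIRST_CONTENT
    )
    # Group 2 (And): name is project_plan.txt AND content contains "project"
    project_branch = (
        doc["document_name"] == "project_plan.txt"
        and fnmatchcase(doc["content"], "*project*")
    )
    # Top-level Or: either group is enough
    return test_branch or project_branch


swapped_matches = [d["document_name"] for d in test_documents if swapped_filter(d)]
print(swapped_matches)
# prints: ['test_document_2.txt', 'project_plan.txt', 'test_document_3.txt']
```

Note that this is a different filter from the original one: under these assumptions it selects three documents rather than only project_plan.txt.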