Retrieving fields following two cross-references


I have a question about querying vs. filtering with cross-references…

In our database, we have three data classes: Chunk, Document, and Corpus.
Corpora contain Documents: each Document has a corpus property that cross-references the Corpus object it belongs to.
Likewise, Documents contain Chunks: each Chunk has a document property that cross-references the Document object it belongs to.

Cross-reference scheme: Chunk → Document → Corpus
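For reference, the three classes could be declared roughly like this in the dict-based schema format of the v3 Python client. This is only a sketch: the names corpus, document, url and source come from the post, while the text property on Chunk and the exact data types are assumptions.

```python
# Hypothetical schema for the Chunk -> Document -> Corpus model.
# A cross-reference property uses the target class name as its dataType.
corpus_class = {
    "class": "Corpus",
    "properties": [
        {"name": "source", "dataType": ["text"]},
    ],
}

document_class = {
    "class": "Document",
    "properties": [
        {"name": "url", "dataType": ["text"]},
        # Cross-reference: each Document points to the Corpus it belongs to
        {"name": "corpus", "dataType": ["Corpus"]},
    ],
}

chunk_class = {
    "class": "Chunk",
    "properties": [
        {"name": "text", "dataType": ["text"]},  # assumed content property
        # Cross-reference: each Chunk points to the Document it belongs to
        {"name": "document", "dataType": ["Document"]},
    ],
}

# Create the referenced classes first so the cross-reference targets exist, e.g.:
# for cls in (corpus_class, document_class, chunk_class):
#     client.schema.create_class(cls)
```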

When querying chunks, I would like to retrieve fields of the chunks themselves as well as fields of the cross-referenced Document and of the Corpus that this Document cross-references. In other words, starting from a chunk object I would like to follow the path
["document", "Document", "corpus", "Corpus", "source"], where source is a property of class Corpus.
Filtering along this path (ending in id instead of source) works fine:

        # Get all chunks for the corpus
        where_clause = {
            "path": ["document", "Document", "corpus", "Corpus", "id"],
            "operator": "Equal",
            "valueText": corpus_id,
        }
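The filter path alternates between a reference property and the class it points to, and ends with the property being compared. A small hypothetical helper (ref_path is not part of the Weaviate client) makes that pattern explicit; the UUID below is a placeholder.

```python
def ref_path(hops, prop):
    """Build a where-filter path from (reference property, target class) hops.

    hops: list of (property_name, class_name) tuples to follow, in order.
    prop: name of the property on the last class to filter on.
    """
    path = []
    for ref_prop, target_class in hops:
        path.extend([ref_prop, target_class])
    path.append(prop)
    return path


corpus_id = "00000000-0000-0000-0000-000000000000"  # placeholder UUID

where_clause = {
    "path": ref_path([("document", "Document"), ("corpus", "Corpus")], "id"),
    "operator": "Equal",
    "valueText": corpus_id,
}
```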

However, I am only able to get properties of the cross-referenced Document object but not of the further cross-referenced Corpus object when querying chunks like this:

        result = self.client.query \
            .get("Chunk", [
                "document {... on Document {url}}",
                "document {... on Document { corpus {... on Corpus {source}}}}"
            ]) \
            .with_near_vector({"vector": query_vector}) \
            .do()

The result always contains the url property from the cross-referenced Document object, but not the source property from the nested cross-referenced Corpus object. Instead, I get a corpus property with a null value back:
'document': [{'corpus': None, 'url': ''}].
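Cross-references come back as lists (one entry per referenced object), so once source is actually returned, reading it from a chunk means walking two nested lists. A defensive sketch, assuming the v3 client's response shape {"data": {"Get": {"Chunk": [...]}}}; the sample response is hand-written for illustration, not real output.

```python
def corpus_sources(chunk):
    """Collect `source` values reachable via chunk -> document -> corpus.

    Reference fields may be None (as in the output above) or lists of objects,
    so every level is treated as an optional list.
    """
    sources = []
    for doc in chunk.get("document") or []:
        for corpus in doc.get("corpus") or []:
            if corpus.get("source") is not None:
                sources.append(corpus["source"])
    return sources


# Illustrative response, hand-written to mimic the expected shape
result = {
    "data": {
        "Get": {
            "Chunk": [
                {"document": [{"url": "", "corpus": None}]},
                {"document": [{"url": "https://example.org/a",
                               "corpus": [{"source": "example-source"}]}]},
            ]
        }
    }
}

for chunk in result["data"]["Get"]["Chunk"]:
    print(corpus_sources(chunk))
# []
# ['example-source']
```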

Since filtering works fine with these multi-level cross-references, I assume querying should work as well. However, I have found no way to make it work. The syntax validation of the property string tells me that document {... on Document { corpus {... on Corpus {source}}}} is well-formed (and everything else I tried is not). I cannot rule out that I am simply using the wrong syntax or approach, but I also did not find any hints in the Weaviate documentation.

Would it be a possible solution to add a raw query using .with_additional?
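On the raw-query idea: as far as I know, .with_additional in the v3 Python client only adds fields under _additional (such as id or distance), whereas a hand-written GraphQL string is sent with client.query.raw(...). A sketch that builds such a query with a single nested document selection; build_raw_query is a hypothetical helper, and the limit and field list are assumptions.

```python
import json


def build_raw_query(query_vector, limit=10):
    """Build a GraphQL Get query for Chunk with a two-hop cross-reference selection."""
    vector = json.dumps([float(v) for v in query_vector])
    return (
        "{ Get { Chunk("
        f"nearVector: {{vector: {vector}}}, limit: {limit}"
        ") { "
        "document { ... on Document { url corpus { ... on Corpus { source } } } } "
        "} } }"
    )


gql = build_raw_query([0.1, 0.2, 0.3])
print(gql)
# With a v3 client instance, this string could be sent as:
# result = client.query.raw(gql)
```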

Thank you very much for any help or feedback!

Hi Jan!

Welcome to our community 🤗

I will need to set up a Python notebook to reproduce this environment.

One way to debug this further is to call .build() instead of .do() and inspect the generated GraphQL.

Maybe declaring both “document…” properties is messing up the final query.
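If the duplicated document field is the culprit, the fix would be to request url and the nested corpus inside a single document selection. A hypothetical helper (not part of the Weaviate client) that merges several property paths into one nested selection string per top-level field:

```python
def merge_selections(paths):
    """Merge reference paths into one GraphQL selection per top-level field.

    Each path alternates (reference property, target class) pairs and ends
    with a scalar property, e.g. ["document", "Document", "url"].
    """
    tree = {}
    for path in paths:
        node = tree
        hops, leaf = path[:-1], path[-1]
        for i in range(0, len(hops), 2):
            # Same (property, class) pair -> same subtree, so fields get merged
            node = node.setdefault((hops[i], hops[i + 1]), {})
        node.setdefault(leaf, None)
    return [_render(key, child) for key, child in tree.items()]


def _render(key, child):
    if child is None:  # scalar property
        return key
    ref_prop, target_class = key
    inner = " ".join(_render(k, c) for k, c in child.items())
    return f"{ref_prop} {{... on {target_class} {{{inner}}}}}"


properties = merge_selections([
    ["document", "Document", "url"],
    ["document", "Document", "corpus", "Corpus", "source"],
])
print(properties)
# ['document {... on Document {url corpus {... on Corpus {source}}}}']
```

The resulting list could be passed to .get("Chunk", properties); calling .build() instead of .do() would then show whether the generated GraphQL contains a single document field.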

I will try to reproduce this scenario in the next few days.


Hi @DudaNogueira!

Thank you very much for looking into this and also for the tip about using .build() instead of .do(). I’ll try this out and see what GraphQL query is being produced.

Thanks a lot!
