Long text, chunking, top document, aggregate results

I’ve implemented a system with two classes, Document and Paragraph, that uses chunking for very long documents. Currently, my query retrieves the top 3 most similar paragraphs, but I want to extend it to return the top 3 unique documents that contain those paragraphs. Can you suggest a good query for this scenario? P.S. I can consider either schema, with or without a cross-reference.

Hi @Poliakova_A_Anna ! Welcome to our community :hugs:

Sorry, I completely missed this question :frowning:

Have you tried the groupBy feature?

It’s the only feature I recall that could help here.

Either that, or request a larger number of paragraphs than you need and then post-process the results to find the set of unique documents they belong to.
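
For the over-fetch-and-post-process route, here is a minimal sketch of the client-side step in plain Python. It assumes each returned paragraph carries some identifier for its parent document and a distance score; the field names `document_id`, `paragraph`, and `distance`, and the helper `top_unique_documents`, are hypothetical and would need to match your own schema and query output. It makes no Weaviate calls, so the sample `hits` list stands in for real query results.

```python
def top_unique_documents(hits, k=3):
    """Return the best-scoring paragraph hit for each of the k closest documents.

    hits: iterable of dicts with a 'document_id' and a 'distance'
          (lower distance = more similar). Field names are assumptions.
    """
    best = {}
    for hit in hits:
        doc_id = hit["document_id"]
        # Keep only the closest paragraph seen so far for each document.
        if doc_id not in best or hit["distance"] < best[doc_id]["distance"]:
            best[doc_id] = hit
    # Rank documents by their best paragraph and keep the first k.
    return sorted(best.values(), key=lambda h: h["distance"])[:k]


# Stand-in for an over-fetched paragraph result set (e.g. limit 6 instead of 3).
hits = [
    {"document_id": "doc-A", "paragraph": 4, "distance": 0.10},
    {"document_id": "doc-A", "paragraph": 7, "distance": 0.12},
    {"document_id": "doc-B", "paragraph": 1, "distance": 0.15},
    {"document_id": "doc-A", "paragraph": 2, "distance": 0.18},
    {"document_id": "doc-C", "paragraph": 9, "distance": 0.21},
    {"document_id": "doc-D", "paragraph": 3, "distance": 0.30},
]

top_docs = top_unique_documents(hits, k=3)
print([h["document_id"] for h in top_docs])  # ['doc-A', 'doc-B', 'doc-C']
```

One caveat with this approach: if a single document dominates the results, the over-fetched set may still contain fewer than 3 distinct documents, so you may need to raise the limit and query again.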

Let me know if that helps or if you were able to find a solution for this.

Thanks!