Hi all! I have a question about how to build a search query correctly. I have two classes in my schema:
Metadata for the documents:
Document {
docname: …,
author: …,
…
}
Chunks of the documents, each with a reference to its document:
Chunk {
textBody: …,
refToDocument: reference to the Document object
}
So, searching through the chunks works well and gives good results, but the problem starts when I want results for 30 documents. Because I'm searching in the chunks, the ‘limit’ field doesn't help here. The only way I see is to run the search with the maximum limit, then “page” the results manually, splitting them into blocks of 30 documents. That's an ugly and potentially resource-inefficient solution. Is there a better way to do this with Weaviate itself?
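The manual paging described above could look roughly like this in Python. This is a minimal sketch, assuming the chunk hits have already been fetched in ranked order and reduced to (document ID, chunk text) pairs; `page_by_document` and the pair format are hypothetical, not part of the Weaviate client:

```python
from typing import Dict, List, Tuple

# One search hit: (ID of the referenced Document, chunk text), in ranked order.
# This pair format is an assumption for the sketch, not the client's result type.
ChunkHit = Tuple[str, str]


def page_by_document(
    hits: List[ChunkHit], page: int, docs_per_page: int = 30
) -> Dict[str, List[str]]:
    """Return {document ID: chunk texts} for the documents on a 0-based page.

    Documents are ordered by the rank of their first (best-ranked) chunk.
    """
    grouped: Dict[str, List[str]] = {}  # dicts preserve insertion order (Python 3.7+)
    for doc_id, text in hits:
        grouped.setdefault(doc_id, []).append(text)

    doc_ids = list(grouped)
    start = page * docs_per_page
    return {doc_id: grouped[doc_id] for doc_id in doc_ids[start:start + docs_per_page]}


if __name__ == "__main__":
    hits = [("doc1", "a"), ("doc2", "b"), ("doc1", "c"), ("doc3", "d")]
    print(page_by_document(hits, page=0, docs_per_page=2))  # {'doc1': ['a', 'c'], 'doc2': ['b']}
```

This only works if every hit is already in memory, which is exactly the expensive part when a query matches hundreds of thousands of chunks.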
No, my scenario is different. For example, I have 100K documents, each with 10 chunks (so 1M chunks in total). A search for “animals” returns 100K results: 100K chunk texts that contain something interesting about animals (possibly fewer, if a chunk contains two or more different sentences about animals). But several chunks may refer to doc1, one chunk to doc2, 10 chunks to doc3, and so on, and I don't know the exact number of documents in advance: it could be 10K, 40K or 60K. I need only doc1, …, doc30 from the results, not all 10K or more documents. In other words, I need the chunks that refer to the first 30 unique Document objects, and of course I can't predict which documents those will be. After that, I'll need doc31, …, doc60. But in my case the ‘limit’ and ‘offset’ options work only on chunks, not on documents. That's why I'm looking for a solution similar to what I could do in an SQL database.
As I wrote earlier, the only solution I see is to fetch ALL the search results (there could be hundreds of thousands), process them in Python (or another client) to build a “cache” for the “animals” query, and work with that cache until the search request changes. To me, that's a bad solution :(
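One way to avoid fetching everything up front is to pull the ranked chunks in batches with ‘limit’ and ‘offset’ and stop as soon as 30 unique documents have appeared. A sketch, assuming a hypothetical `fetch_chunks(limit, offset)` callable that wraps the chunk search and returns (document ID, chunk text) pairs; note that chunks of those documents ranked below the stopping point are not collected:

```python
from typing import Callable, Dict, List, Tuple

ChunkHit = Tuple[str, str]  # (ID of the referenced Document, chunk text)
# Hypothetical wrapper around the chunk search: fetch_chunks(limit, offset) -> ranked hits.
FetchChunks = Callable[[int, int], List[ChunkHit]]


def first_documents(
    fetch_chunks: FetchChunks, n_docs: int = 30, batch_size: int = 1000
) -> Dict[str, List[str]]:
    """Collect chunks for the first n_docs unique documents, fetching in batches."""
    grouped: Dict[str, List[str]] = {}
    offset = 0
    while len(grouped) < n_docs:
        batch = fetch_chunks(batch_size, offset)
        for doc_id, text in batch:
            if doc_id in grouped:
                grouped[doc_id].append(text)
            elif len(grouped) < n_docs:
                grouped[doc_id] = [text]
            # Chunks of documents beyond the first n_docs are ignored.
        if len(batch) < batch_size:
            break  # the search has no more results
        offset += batch_size
    return grouped


if __name__ == "__main__":
    hits = [("doc1", "a"), ("doc2", "b"), ("doc1", "c"), ("doc3", "d")]

    def fake_fetch(limit: int, offset: int) -> List[ChunkHit]:
        # Stands in for the real chunk search.
        return hits[offset:offset + limit]

    print(first_documents(fake_fetch, n_docs=3, batch_size=2))
    # {'doc1': ['a', 'c'], 'doc2': ['b'], 'doc3': ['d']}
```

The batch size is a trade-off: larger batches mean fewer round trips but more chunks fetched past the point where 30 documents were already found.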
I could also update the previous code to use the ‘after’ parameter instead of ‘offset’, which looks more suitable for my needs. But it seems it doesn't work with search queries; I still need to check.
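As far as I know, Weaviate's ‘after’ cursor is intended for iterating over objects in ID order and can't be combined with search operators such as nearText or bm25, so it probably won't help with ranked results (worth confirming in the docs for your version). The cursor idea can still be applied on the client side: remember where the previous document page stopped instead of rescanning from offset 0. A sketch, with a hypothetical `fetch_chunks(limit, offset)` wrapper around the chunk search and a hypothetical `DocCursor` type:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

ChunkHit = Tuple[str, str]  # (ID of the referenced Document, chunk text)
# Hypothetical wrapper around the chunk search: fetch_chunks(limit, offset) -> ranked hits.
FetchChunks = Callable[[int, int], List[ChunkHit]]


@dataclass(frozen=True)
class DocCursor:
    offset: int = 0  # position in the ranked chunk list where the next page starts
    seen: FrozenSet[str] = frozenset()  # documents already returned on earlier pages


def next_document_page(
    fetch_chunks: FetchChunks,
    cursor: DocCursor,
    docs_per_page: int = 30,
    batch_size: int = 1000,
) -> Tuple[Dict[str, List[str]], DocCursor]:
    """Return the next page of documents and a cursor for the page after it."""
    page: Dict[str, List[str]] = {}
    offset = cursor.offset
    while True:
        batch = fetch_chunks(batch_size, offset)
        for i, (doc_id, text) in enumerate(batch):
            if doc_id in page:
                page[doc_id].append(text)
            elif doc_id not in cursor.seen:
                if len(page) == docs_per_page:
                    # First chunk of a document for the next page: resume here next time.
                    return page, DocCursor(offset + i, cursor.seen | frozenset(page))
                page[doc_id] = [text]
            # Chunks of documents returned on earlier pages are skipped.
        if len(batch) < batch_size:
            # Results exhausted: the cursor points past the last chunk.
            return page, DocCursor(offset + len(batch), cursor.seen | frozenset(page))
        offset += batch_size


if __name__ == "__main__":
    hits = [("doc1", "a"), ("doc2", "b"), ("doc1", "c"), ("doc3", "d"), ("doc2", "e"), ("doc4", "f")]

    def fake_fetch(limit: int, offset: int) -> List[ChunkHit]:
        # Stands in for the real chunk search.
        return hits[offset:offset + limit]

    page1, cur = next_document_page(fake_fetch, DocCursor(), docs_per_page=2, batch_size=2)
    page2, cur = next_document_page(fake_fetch, cur, docs_per_page=2, batch_size=2)
    print(page1)  # {'doc1': ['a', 'c'], 'doc2': ['b']}
    print(page2)  # {'doc3': ['d'], 'doc4': ['f']}
```

The cursor only stays valid while the query and the data are unchanged; if chunks are added or deleted between calls, the stored offset no longer points at the same position in the ranking.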