I’m exploring Weaviate to provide semantic search over the content of attachments in an enterprise application.
I currently split each attachment file (usually a few dozen to a few hundred pages) into chunks. Each chunk record also carries metadata such as page number, file name, owner application, and record ID.
Currently, when querying with “with_near_text”, the results usually include several different chunks from the same file.
I’d like to get back one chunk per document (file name), or one chunk per unique application / record ID pair.
Is this possible?
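If server-side grouping is not available, one fallback is to over-fetch chunks and keep only the closest chunk per key on the client. The sketch below is plain Python deduplication, not a Weaviate API. It assumes each hit has already been flattened into a dict that holds the chunk metadata plus the `distance` value Weaviate returns under `_additional`. The field names `fileName`, `appName`, and `recordId` and the sample data are hypothetical.

```python
from typing import Callable, Hashable, Iterable


def best_chunk_per_key(
    hits: Iterable[dict],
    key: Callable[[dict], Hashable],
) -> list:
    """Keep only the closest chunk for each key (lower distance = closer)."""
    best: dict = {}
    for hit in hits:
        k = key(hit)
        # Replace the stored hit if this one is closer to the query.
        if k not in best or hit["distance"] < best[k]["distance"]:
            best[k] = hit
    # Return one hit per key, ordered from closest to farthest.
    return sorted(best.values(), key=lambda h: h["distance"])


# Hypothetical flattened search results (field names are illustrative only).
hits = [
    {"fileName": "a.pdf", "appName": "crm", "recordId": 1, "pageNumber": 3, "distance": 0.12},
    {"fileName": "a.pdf", "appName": "crm", "recordId": 1, "pageNumber": 7, "distance": 0.15},
    {"fileName": "b.pdf", "appName": "crm", "recordId": 2, "pageNumber": 1, "distance": 0.20},
    {"fileName": "c.pdf", "appName": "crm", "recordId": 1, "pageNumber": 2, "distance": 0.18},
]

# One chunk per file name.
per_file = best_chunk_per_key(hits, key=lambda h: h["fileName"])
# One chunk per (application, record ID) pair.
per_record = best_chunk_per_key(hits, key=lambda h: (h["appName"], h["recordId"]))
```

Because deduplication happens after retrieval, the query `limit` has to be large enough that the desired number of distinct documents survives the filtering.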
After going through the documentation, my solution is to have two classes:
- DocumentChunk - properties: text, pageNumber, document (reference to Document class)
- Document - properties: name, summary, ownerId, ownerTable, appName
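A minimal sketch of the two classes as Weaviate schema definitions (Python dicts in the class-object format). The data types are assumptions: `ownerId` is treated as text and `pageNumber` as an integer. Vectorizer settings are omitted, although `nearText` requires a vectorizer module to be configured. With the v3 Python client, each dict could be passed to `client.schema.create_class(...)`, creating `Document` first so that the reference target exists.

```python
import json


def build_schema() -> list:
    """Return class definitions: parent Document and child DocumentChunk."""
    document = {
        "class": "Document",
        "properties": [
            {"name": "name", "dataType": ["text"]},
            {"name": "summary", "dataType": ["text"]},
            # Assumption: ownerId is stored as a string identifier.
            {"name": "ownerId", "dataType": ["text"]},
            {"name": "ownerTable", "dataType": ["text"]},
            {"name": "appName", "dataType": ["text"]},
        ],
    }
    chunk = {
        "class": "DocumentChunk",
        "properties": [
            {"name": "text", "dataType": ["text"]},
            # Assumption: page numbers are integers.
            {"name": "pageNumber", "dataType": ["int"]},
            # Cross-reference: the dataType is the name of the target class.
            {"name": "document", "dataType": ["Document"]},
        ],
    }
    # Order matters: the referenced class should be created first.
    return [document, chunk]


schema = build_schema()
print(json.dumps(schema, indent=2))
```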
Query:
```graphql
{
  Get {
    DocumentChunk(
      nearText: {
        concepts: ["overpressure"]
      }
      groupBy: {
        path: ["document"]
        groups: 3
        objectsPerGroup: 2
      }
    ) {
      text
      pageNumber
      document {
        ... on Document {
          name
        }
      }
    }
  }
}
```
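The same query can be generated from Python with the search concept and the grouping path as parameters. This helps when the cross-reference property has a different name. The helper `group_by_query` is hypothetical; it only builds the string. With the v3 Python client, the result could be sent using `client.query.raw(...)`. `json.dumps` is used to quote and escape the concept and path strings.

```python
import json


def group_by_query(
    concept: str,
    ref_property: str = "document",
    groups: int = 3,
    objects_per_group: int = 2,
) -> str:
    """Build the nearText + groupBy query, grouping on a cross-reference property."""
    # json.dumps yields a double-quoted, escaped string literal.
    return f"""{{
  Get {{
    DocumentChunk(
      nearText: {{ concepts: [{json.dumps(concept)}] }}
      groupBy: {{
        path: [{json.dumps(ref_property)}]
        groups: {groups}
        objectsPerGroup: {objects_per_group}
      }}
    ) {{
      text
      pageNumber
      {ref_property} {{
        ... on Document {{
          name
        }}
      }}
    }}
  }}
}}"""


query = group_by_query("overpressure")
print(query)
```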
Hi @Viet_Tran - yeah I think grouping by the parent document makes sense.
Did you know you can group by the cross-referenced property?
So, depending on what your x-ref is called, you can replace “document” here with the name of the cross-reference property.