As we grow with Weaviate, we are realizing that retaining document structure is important. For example, when vectorizing FAQ docs, we want to keep track of which answer belongs to which question. Or, when modeling Word documents, we want to track which portion of the document is a Title vs. a Paragraph vs. a Section Header, etc. All of these document structure details are important.
I realize we can build a schema for each structure, i.e.:
FAQSchema
TitleParagraphSectionSchema
But this feels tedious and manual. Instead, I imagine using metadata tags.
GeneralSchema {
  # FAQ implementation
  metadata: {
    is-question: True
    answer: <id-of-answer-object>
  }
}
Each document type would impose structure through this metadata.
The problem is that, as far as I understand, Weaviate schemas are fixed. So the tags would either have to be known a priori, or they would be stored inefficiently in a list like:
metadata: ['tag1:value1', 'tag2:value2']
It would then be inefficient or impossible to do searches like “where tag = foo”.
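To make the concern concrete, here is a rough sketch in plain Python of the two data shapes. It does not use Weaviate's API at all; the objects, the field names (`is_question`, `answer`), and both helper functions are made up for illustration. It only shows that filtering on 'tag:value' strings means splitting every string first, whereas named fields can be compared directly.

```python
# Hypothetical in-memory objects; this is NOT Weaviate's API, only the data shapes.
objects_with_tag_list = [
    {"text": "How do I reset my password?", "metadata": ["is-question:true", "answer:a1"]},
    {"text": "Use the reset link.", "metadata": ["is-question:false"]},
]

objects_with_fields = [
    {"text": "How do I reset my password?", "is_question": True, "answer": "a1"},
    {"text": "Use the reset link.", "is_question": False, "answer": None},
]


def filter_tag_list(objects, tag, value):
    """Match 'tag:value' strings; every string must be split before comparing."""
    matches = []
    for obj in objects:
        for entry in obj["metadata"]:
            key, _, val = entry.partition(":")
            if key == tag and val == value:
                matches.append(obj)
                break
    return matches


def filter_fields(objects, field, value):
    """Match a named, typed field directly."""
    return [obj for obj in objects if obj.get(field) == value]


questions_from_list = filter_tag_list(objects_with_tag_list, "is-question", "true")
questions_from_fields = filter_fields(objects_with_fields, "is_question", True)
```

Note that the list version also loses types: `true` is just a string, so the caller has to agree on a spelling.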
Or am I wrong, and schemas can/should be dynamic? I.e., we can add additional fields and it's OK as long as they are roughly the same throughout (i.e., all the docs in the schema have the same metadata)?
Do you guys (Weaviate team) have any recommendations for best practices on how to approach something like this? My initial thoughts are:
- Go with dynamic schemas but try to keep objects with similar metadata isolated (i.e., FAQ docs aren't mixed with other docs)
- Add nameless placeholder tags into the shared schema and let callers manage them
GeneralSchema {
  ...
  tag1: str
  tag2: str
  tag3: str
}
These can be empty, or optionally used like:
# FAQ impl
GeneralSchema {
tag1: "some question"
tag2: "answer-id"
}
Here it is up to the caller to understand that tag1 is a question, tag2 is an answer ID, and so on. By the way, does adding empty fields like these tags add overhead to indexing/search performance, even if they remain empty for most use cases?
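For the placeholder-tag option, the caller-side bookkeeping I have in mind would look roughly like the sketch below. Again this is plain Python rather than Weaviate code, and `FAQ_TAGS`, `PLACEHOLDERS`, `to_general`, and `from_general` are hypothetical names. Each document type would keep its own mapping from meaningful field names to the generic tags.

```python
# Hypothetical caller-side mapping from meaningful names to placeholder tags.
FAQ_TAGS = {"question": "tag1", "answer_id": "tag2"}

# Assumed set of placeholder properties on the shared schema.
PLACEHOLDERS = ("tag1", "tag2", "tag3")


def to_general(fields, tag_map):
    """Pack named fields into placeholder tags; unused tags stay empty."""
    obj = {tag: "" for tag in PLACEHOLDERS}
    for name, value in fields.items():
        obj[tag_map[name]] = value
    return obj


def from_general(obj, tag_map):
    """Recover the named fields from the placeholder tags."""
    return {name: obj[tag] for name, tag in tag_map.items()}


stored = to_general({"question": "some question", "answer_id": "answer-id"}, FAQ_TAGS)
restored = from_general(stored, FAQ_TAGS)
```

The downside is visible right away: nothing in the stored object says what tag1 means, so every reader needs the same mapping.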