I have a use case where each user has many documents. No user should be able to access any other user's documents. Also, each user can select which of their files they want to access.
I am using the weaviate-python client and langchain (RetrievalQAWithSourcesChain).
First, I tried creating a single class “Data” with the properties “content” and “source”, so that the user can filter the data using the “source” property. But this method has a problem: even after filtering, the user is able to access other users' files.
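For reference, a minimal sketch of what such a “source” filter could look like, assuming Weaviate's dictionary-style where filter (`path`, `operator`, `valueText`, `operands`). The helper name `build_source_filter` and the file names are hypothetical and only serve as an illustration.

```python
from typing import Dict, List


def build_source_filter(sources: List[str]) -> Dict:
    """Build a where filter that matches any of the given "source" values.

    Hypothetical helper: it only assembles the filter dictionary and does
    not talk to Weaviate.
    """
    if not sources:
        raise ValueError("at least one source is required")

    # One "Equal" condition per selected source.
    operands = [
        {"path": ["source"], "operator": "Equal", "valueText": source}
        for source in sources
    ]

    # A single condition needs no "Or" wrapper.
    if len(operands) == 1:
        return operands[0]
    return {"operator": "Or", "operands": operands}


# Made-up file names, standing in for the files a user selected.
where_filter = build_source_filter(["a.pdf", "b.pdf"])
print(where_filter)
```

A dictionary like this could then be passed to a query, for example through `with_where(...)` in the v3 weaviate-python client. Note that it only restricts results by “source”; on its own it says nothing about which user owns a document.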
Then I tried another method: a class for each user, and inside the user class a “data” field that links to the “Data” class.
Below is the schema.
"classes": [
    {
        "class": username,
        "description": f"Class for user {username}",
        "properties": [
            {
                "name": "username",
                "description": "Username of the user",
                "dataType": ["text"]
            },
            {
                "name": "data",
                "description": "Data associated with the user",
                "dataType": ["Data"]
            }
        ]
    },
    {
        "class": "Data",
        "description": "Documents/data in the system",
        "vectorizer": "text2vec-openai",
        "moduleConfig": {"text2vec-openai": {"model": "ada", "type": "text"}},
        "properties": [
            {
                "name": "content",
                "description": "The content of the paragraph",
                "dataType": ["text"],
                "moduleConfig": {
                    "text2vec-openai": {
                        "skip": False,
                        "vectorizePropertyName": False,
                    }
                },
            },
            {
                "name": "source",
                "description": "The link to the document",
                "dataType": ["text"]
            }
        ],
    }
]
I am using the code below to create a vectorstore.
vectorstore = Weaviate(client, user, "data{ ... on Data { source content }}", attributes=["data { ... on Data { source } }"], embedding=embed)
- I am getting the error below:
KeyError: 'data{ ... on Data { source content }}'
- How can I retrieve specific data using the “source” from the user class? Is filtering a good approach?
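One possible reading of the KeyError above, sketched under an assumption that is not confirmed here: that the vectorstore uses its text key argument as a literal dictionary key when reading each result row, while a query that follows a cross-reference returns the linked objects nested under the property name `data`. The row below is made up for illustration and no Weaviate or langchain call is involved.

```python
# Hypothetical shape of one result row when the "data" cross-reference
# is followed; the values are invented for illustration.
row = {
    "data": [
        {"source": "report.pdf", "content": "First paragraph of the report."},
    ]
}

# The GraphQL fragment that was passed as the text key in the question.
text_key = "data{ ... on Data { source content }}"


def pop_text(result_row: dict, key: str):
    """Look the text up by treating the key as a plain dictionary key."""
    return result_row.pop(key)


try:
    pop_text(dict(row), text_key)
    outcome = "ok"
except KeyError as exc:
    # The row has the key "data", not the whole fragment string.
    outcome = f"KeyError: {exc}"

print(outcome)

# The nested text is reachable through the property name instead.
print(row["data"][0]["content"])
```

If that assumption holds, the fragment string can work as a query attribute but not as the key used to read the text back, which would match the message in the error.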
Can anyone help me with this? Thanks in advance.