Hi guys,
I need some help with searching objects using GraphQL.
Background:
I’m using Weaviate in a RAG context to retrieve legal document elements, which are then fed to the answering model (OpenAI).
The query contains the question and 0 to x “precisions” that help target the element. The idea behind the “precisions” is to let users move the targeted vector centroid by adding some keywords/phrases to their question, showing Weaviate what the text might look like in the DB.
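As an idealized picture of what “moving the vector” means here: the model below assumes plain linear interpolation towards the centroid of the precisions, and Weaviate’s exact moveTo formula may differ. The question vector is q, the precision vectors are p_1 … p_k, and the force is f:

```latex
\mathbf{c} = \frac{1}{k}\sum_{i=1}^{k} \mathbf{p}_i,
\qquad
\mathbf{q}' = (1 - f)\,\mathbf{q} + f\,\mathbf{c},
\qquad 0 \le f \le 1
```

Under this model, f = 0 leaves the question vector unchanged and f = 1 places it exactly on the centroid of the precisions.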
Query example:
```graphql
{
  Get {
    Element(
      nearText: {
        concepts: ["Does the contract specify the duration?"],
        moveTo: {
          concepts: ["The present contract has a duration of", "This agreement is signed for ... years"],
          force: 0.63
        },
        certainty: 0.689
      },
      where: {
        path: ["document"],
        operator: Equal,
        valueText: "abcde-12345"
      },
      autocut: 3,
      limit: 15
    ) {
      content,
      title,
      name,
      outline,
      path,
      parentPath,
      parentName,
      document,
      order,
      type,
      _additional {
        id,
        certainty
      }
    }
  }
}
```
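The query above can be generated from parameters instead of being edited by hand. This is a minimal sketch, and the function name is hypothetical. It relies on json.dumps producing quoted, escaped strings that are also valid GraphQL string literals for ordinary text. It leaves out moveTo entirely when there are zero precisions, on the assumption that an empty moveTo is better avoided. The resulting dict is the JSON body for POST /v1/graphql.

```python
import json


def build_element_query(question, precisions, document_id,
                        force=0.63, certainty=0.689, autocut=3, limit=15):
    """Return the JSON body for POST /v1/graphql (same shape as the query above)."""
    # 0..x precisions: only emit moveTo when there is at least one.
    move_to = ""
    if precisions:
        move_to = f"moveTo: {{concepts: {json.dumps(precisions)}, force: {force}}},"
    query = (
        "{ Get { Element("
        f"nearText: {{concepts: {json.dumps([question])}, {move_to} certainty: {certainty}}}, "
        f'where: {{path: ["document"], operator: Equal, valueText: {json.dumps(document_id)}}}, '
        f"autocut: {autocut}, limit: {limit}"
        ") { content title name outline path parentPath parentName document order type "
        "_additional { id certainty } } } }"
    )
    return {"query": query}


body = build_element_query(
    "Does the contract specify the duration?",
    ["The present contract has a duration of", "This agreement is signed for ... years"],
    "abcde-12345",
)
print(body["query"])
```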
The problem:
- In about 5% of cases I get SSL errors while the concepts of this query are being vectorized. I implemented a retry strategy (3 attempts max), so I still get the result, but the retries add extra time that I would love to eliminate. Details about the errors: Random SSL/TLS Errors while vectorizing strings inside docker container
- I’m not sure how Weaviate handles the concept vectors in the background, in particular whether they are cached and for how long. Since my concepts (strings) are mostly the same (up to 1k distinct strings for now, possibly growing to about 100k later), I’d like to make sure they are not vectorized again and again.
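A retry can at least be bounded and made explicit while the root cause is unknown. This is a minimal sketch. It assumes the vectorizer failure reaches the client as an `errors` entry in the GraphQL response body, which is the standard GraphQL error shape, rather than as a client-side exception. `VectorizationError`, `check_graphql_response`, `with_retries` and the backoff values are all hypothetical:

```python
import time


class VectorizationError(Exception):
    """Hypothetical: raised when a GraphQL response carries an 'errors' list."""


def check_graphql_response(body):
    # Standard GraphQL error shape: {"errors": [{"message": ...}], "data": ...}
    if body.get("errors"):
        raise VectorizationError(body["errors"][0].get("message", "unknown error"))
    return body["data"]


def with_retries(func, attempts=3, base_delay=0.5, retry_on=(VectorizationError,)):
    """Call func(); on a listed error, sleep base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
```

With the defaults, this adds at most 1.5 s of sleeping on top of the failed calls. It caps the extra latency but does not remove it.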
What I tried:
I added another class, Vector, to the schema, with a single property, value, holding the text of the string. My hope was to reuse its stored vectors in the query for document elements.
Vector objects have UUID v5 IDs: deterministic UUIDs generated in my namespace, with the normalized string value as the name, so the same string gets the same UUID on every installation.
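A minimal sketch of deterministic IDs with Python’s standard uuid module. The namespace value is a placeholder, and the normalization (Unicode NFC, trimmed, internal whitespace collapsed) is an assumption. Whatever normalization is chosen must be identical on every installation, or the same string will map to different UUIDs.

```python
import unicodedata
import uuid

# Hypothetical namespace: replace with your own fixed UUID.
NAMESPACE = uuid.UUID("6f1c2b9e-3a4d-4e5f-8a7b-0c1d2e3f4a5b")


def normalize(text):
    """Assumed normalization: NFC, trimmed, internal whitespace collapsed."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def vector_uuid(text):
    """Same normalized string -> same UUID v5, on any machine."""
    return uuid.uuid5(NAMESPACE, normalize(text))
```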
Before running the query above (Get Element), I add the strings as Vector objects via the REST batch endpoint so that I can use their UUIDs in the query. From the forum I understand that the batch endpoint does not re-vectorize an object that is submitted unmodified: running the same batch a second time is always much faster than the first, and if the first run had object-level errors, the second run fixes them.
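A minimal sketch of the batch step, assuming the Vector class with its single value property and (UUID, text) pairs computed beforehand. The helper names are hypothetical. The body shape ({"objects": [{"class", "id", "properties"}]}) is what POST /v1/batch/objects accepts. The per-object `result.errors` check reflects the batch response as I understand it, so treat that part as an assumption.

```python
def batch_body(items, class_name="Vector"):
    """items: iterable of (uuid_str, text). Returns a body for POST /v1/batch/objects."""
    by_id = {}
    for oid, text in items:
        by_id.setdefault(oid, text)  # same string -> same UUID: send it only once
    return {"objects": [
        {"class": class_name, "id": oid, "properties": {"value": text}}
        for oid, text in by_id.items()
    ]}


def failed_ids(batch_response):
    """IDs whose per-object result reports errors (candidates for a second run)."""
    return [o["id"] for o in batch_response if o.get("result", {}).get("errors")]
```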
Then I tried a nearObject search (Search operators | Weaviate - Vector Database) referencing the UUID of the question (only the question at first, as a test). No luck: I got an error saying the object was not found.
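For reference, this is a minimal sketch of the kind of nearObject query described above, built as a request body so the UUID can be injected. The helper name, the selected fields and the certainty value are assumptions carried over from the Element query; only the `nearObject: {id: ..., certainty: ...}` argument shape is the operator itself.

```python
import json


def near_object_query(class_name, object_id, certainty=0.689, limit=15):
    """Return a POST /v1/graphql body searching class_name near an object ID."""
    query = (
        f"{{ Get {{ {class_name}("
        f"nearObject: {{id: {json.dumps(object_id)}, certainty: {certainty}}}, "
        f"limit: {limit}"
        ") { content _additional { id certainty } } } }"
    )
    return {"query": query}
```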
Then I tried nearText (I’m new on this forum, so no link: new users get only 2 links per post) with moveTo.objects for the “precisions”. Same error: the objects were not found.
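A minimal sketch of that nearText variant, where the question stays a text concept and the precisions are passed as object IDs in `moveTo.objects`. The helper name and the force/certainty values are assumptions taken from the Element query.

```python
import json


def near_text_move_to_objects(question, precision_ids, force=0.63,
                              certainty=0.689, limit=15):
    """Return a POST /v1/graphql body: nearText with moveTo.objects by ID."""
    objects = ", ".join(f"{{id: {json.dumps(i)}}}" for i in precision_ids)
    query = (
        "{ Get { Element("
        f"nearText: {{concepts: {json.dumps([question])}, "
        f"moveTo: {{objects: [{objects}], force: {force}}}, "
        f"certainty: {certainty}}}, limit: {limit}"
        ") { content _additional { id certainty } } } }"
    )
    return {"query": query}
```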
All objects are accessible via the REST endpoint v1/objects/{UUID} both before and after the query. Could the fact that the referenced objects belong to another class play a role here?
What would be ideal:
First, I handle the Vector objects before the query, making sure they are vectorized and stored in the DB.
Then I run a GraphQL query that selects Element objects, using the question’s Vector object UUID to target them. I still want to be able to “correct” the question vector by moving it, with a force predefined in a parameter, towards the centroid of the vectors of the “precisions” Vector objects, which are passed by UUID as parameters.
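One way to approximate this flow client-side is sketched below; it is untested against a live instance. First fetch the stored vectors of the Vector objects, for example with GET /v1/objects/Vector/{UUID}?include=vector. Then move the question vector towards the precisions’ centroid locally and pass the result with nearVector. The interpolation formula is an assumption and may not match what moveTo does internally. The helper names are hypothetical.

```python
import json


def move_towards_centroid(question_vec, precision_vecs, force):
    """Assumed model: linear interpolation towards the precisions' centroid."""
    if not precision_vecs:
        return list(question_vec)
    k = len(precision_vecs)
    centroid = [sum(col) / k for col in zip(*precision_vecs)]
    return [(1 - force) * q + force * c for q, c in zip(question_vec, centroid)]


def near_vector_query(vector, document_id, certainty=0.689, limit=15):
    """Return a POST /v1/graphql body: Element objects near a raw vector."""
    query = (
        "{ Get { Element("
        f"nearVector: {{vector: {json.dumps(vector)}, certainty: {certainty}}}, "
        f'where: {{path: ["document"], operator: Equal, valueText: {json.dumps(document_id)}}}, '
        f"limit: {limit}"
        ") { content _additional { id certainty } } } }"
    )
    return {"query": query}
```

Moving the vector locally means the force can be any parameter the caller chooses, and the search itself triggers no vectorization at all.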
Any help getting this working would be welcome.