Help needed with moving from nearText + moveTo to nearObject or nearVector

SergeLiatko · July 24, 2024, 9:22am

Hi guys,

I need some help with searching objects in graphql.

Background:

I'm using Weaviate in a RAG context to retrieve legal document elements that are then fed to the answering model (OpenAI).

The query contains the question plus zero or more “precisions” that help target the right element. The idea behind the “precisions” is to let users move the target vector centroid by adding keywords or phrases to their question (showing Weaviate what the text might look like in the DB).

Query example:

{
  Get {
    Element(
      nearText: {
        concepts: ["Does the contract specify the duration?"],
        moveTo: {
          concepts: ["The present contract has a duration of", "This agreement is signed for ... years"],
          force: 0.63
        },
        certainty: 0.689
      },
      where: {
        path: ["document"],
        operator: Equal,
        valueText: "abcde-12345"
      },
      autocut: 3,
      limit: 15,
    ) {
      content,
      title,
      name,
      outline,
      path,
      parentPath,
      parentName,
      document,
      order,
      type,
      _additional {
        id,
        certainty
      }
    }
  }
}

The problem

In about 5% of cases I get SSL errors while vectorizing the concepts of this query. Even though I implemented a retry strategy to get the result anyway (3 attempts max), it still adds extra time that I would love to eliminate. Here are the details of the errors: Random SSL/TLS Errors while vectorizing strings inside docker container
I’m not sure how Weaviate handles the concept vectors in the background, in particular whether they are cached for a long period. But since my concepts (strings) are pretty much always the same (up to 1k distinct strings for now, possibly growing to about 100k later), I’d like to be sure they are not vectorized again and again.

What I tried

I added another class to the instance schema: Vector with simply one property: value, containing the text of the string, hoping I can reuse the stored vectors in the query for document elements.

Vector items have UUID v5 (deterministic UUID generated in my namespace using normalized string value as name to get the same UUID for the same string on all installations).
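A minimal sketch of such a deterministic ID scheme, written in Python for illustration (the setup in this thread is PHP). The namespace value and the normalization rules below are assumptions, since the post does not specify them; any fixed namespace and any stable normalization would serve the same purpose.

```python
import unicodedata
import uuid

# Hypothetical namespace: any fixed UUID works, as long as every
# installation derives IDs from the same one.
VECTOR_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "vectors.example.com")


def normalize(text: str) -> str:
    """Assumed normalization: Unicode NFKC, collapsed whitespace, lower case."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split()).lower()


def vector_uuid(text: str) -> str:
    """Return the same UUID v5 for any two strings that normalize identically."""
    return str(uuid.uuid5(VECTOR_NAMESPACE, normalize(text)))
```

Strings that differ only in case or whitespace then map to the same Vector object, so they are stored (and vectorized) once.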

Before running the query above (GET Element) the strings are added as Vector objects via the REST batch endpoint in the hope of using their UUIDs in the query (from the forum I see the batch does not re-vectorize the objects if the object submitted is not modified, so running the same batch a second time always goes way faster than the first, and if the first batch had errors at object level - the second run corrects them).

Then I tried using the nearObject ( Search operators | Weaviate - Vector Database ) search referencing the UUID of the question (question only in the beginning to test) - no luck, got an error about the object not being found.

Then I tried nearText ( I’m a noob on this forum so no link as only 2 links max for me ) with moveTo.objects for the “precisions” - same error of objects not being found.

All objects are accessible via REST endpoint v1/objects/{UUID} before and after the query. Maybe the fact that the referenced objects are of another class plays a role here?

What would be ideal:

I handle the Vector objects before the query to make sure they are vectorized and stored in DB.

Then I use graphql query to select Element objects where I use question Vector object UUID to target the Element objects and still have the ability to “correct” the question vector by moving it with (predefined in a parameter) force to the centroid of the vectors of the “precisions” Vector objects passed by their UUID in parameters.

Any help to get this thing done would be welcome.

DudaNogueira · July 24, 2024, 1:27pm

hi @SergeLiatko !!

The objects referenced by UUID, whether in nearObject or in moveTo / moveAwayFrom, must be in the same collection as the one being queried.

Considering that you can use UUIDs for moveTo and moveAwayFrom, one solution would be to:

Include those strings that you want to use in moveto in your collection
filter them out while querying (using a flag property)
reference the UUID in moveto or moveaway

Let me know if this helps!

SergeLiatko · July 24, 2024, 3:08pm

@DudaNogueira

Again thanks for the help.

I actually do not want to mix classes, especially since elements are likely to become multi-tenant in the future, while vectors are more “internal” and carry no particular security concerns.

I definitely need more info about whether the concept vectors are stored for a long duration (to avoid revectorizing the same thing again and again), so any link would be welcome.

As for the approach, I think the cleaner and more future proof solution would be to:

keep Vector class as is
make sure all strings in nearX and moveTo/moveAway are vectorized before running the query
use any solution to calculate mean centroid of the “precisions”
use same tool to move the “question” vector towards/away from the calculated centroid by the force specified in the parameter
get the resulting vector and use it with the nearVector search operator
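
The steps above can be sketched as follows, in Python for illustration. This is plain linear interpolation toward (or away from) the centroid. That is an assumption, not necessarily the formula Weaviate applies internally for moveTo, so a given force value may not reproduce the ranking of the original nearText query exactly.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of the "precision" vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def move_towards(source: Sequence[float], target: Sequence[float], force: float) -> Vector:
    """Linear interpolation: force=0 keeps the source, force=1 lands on the target."""
    return [s + force * (t - s) for s, t in zip(source, target)]


def move_away(source: Sequence[float], target: Sequence[float], force: float) -> Vector:
    """Push the source in the direction opposite to the target."""
    return [s - force * (t - s) for s, t in zip(source, target)]
```

With the question vector `q` and the precision vectors `ps`, the query vector would be `move_towards(q, centroid(ps), force)`. The same few lines translate directly to PHP array operations.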

This way, even if it seems a bit more “intimidating” at first, I keep all the control and flexibility of Element targeting inside my app and reduce the Weaviate-specific breaking points to a minimum (only nearVector queries will be needed, and those are supported by most vector DB solutions).
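
For illustration, the original query could then be sent with nearVector instead of nearText. The helper below is a hypothetical sketch in Python that only renders the GraphQL text; the function name and defaults are assumptions, and the property list is shortened compared with the original query.

```python
import json
from typing import Sequence


def near_vector_query(
    vector: Sequence[float],
    document_id: str,
    certainty: float = 0.689,
    limit: int = 15,
    autocut: int = 3,
) -> str:
    """Render a Get { Element } query that searches by a pre-computed vector."""
    vector_literal = json.dumps([float(x) for x in vector])
    # JSON string quoting also escapes any quotes inside the document id.
    document_literal = json.dumps(document_id)
    return f"""{{
  Get {{
    Element(
      nearVector: {{ vector: {vector_literal}, certainty: {certainty} }}
      where: {{ path: ["document"], operator: Equal, valueText: {document_literal} }}
      autocut: {autocut}
      limit: {limit}
    ) {{
      content
      title
      _additional {{ id certainty }}
    }}
  }}
}}"""
```

Because the vector is computed beforehand, no vectorizer call happens at query time, which is the point of the approach.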

Can you point me to anything that would let me do the vector calculations on my end (do you have an API for such calls, or modules I could install and use inside my setup)? I’m on PHP, if that makes a difference.

SergeLiatko · July 24, 2024, 3:12pm

Or an even crazier idea, which might actually work if the text2vec-openai module allows it: point the module at my own URL, where I would keep a cache of vectors. If a request comes in for a string I don’t have in the cache, I produce the vector (ask OpenAI for it) and store it; otherwise I serve the vector directly from the cache.
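
The core of that idea is a read-through cache. A minimal in-memory sketch in Python, where `embed` stands for whatever call actually reaches the embedding provider (an assumption; here it is injected so that the cache logic stays independent of any HTTP client or API shape):

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Serve vectors from the cache; call the provider only on a miss."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed  # hypothetical provider call, e.g. a request to OpenAI
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Hash the text so that keys have a fixed size regardless of input length.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            self._store[key] = self._embed(text)
        return self._store[key]
```

In a real deployment the dictionary would be replaced by persistent storage, and the retry logic for the occasional SSL error would live inside `embed`, so that only cache misses can ever be affected by it.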

DudaNogueira · July 24, 2024, 9:09pm

Both approaches seem doable and really interesting.

There are a lot of tools nowadays that will “capture” embedding and LLM payloads.

For example, I just learned about LM Studio, which can run models and emulate the OpenAI API for both generative and embedding models.

So you can basically run any model and “trick” text2vec-openai into believing it is actually using the OpenAI APIs.

The calculation part is also possible, as you will end up with a “biased” vector query.

I am wondering if it would be interesting to allow passing something like movetoVectors and moveawaytoVectors

SergeLiatko · July 24, 2024, 10:44pm

I was thinking of setting up another Docker container on the same network with an API gateway. It would use the same Weaviate instance, where I store my vector collection, as a cache, plus a middleware to pass the request on to OpenAI when needed. Might actually work out, especially if I use gRPC for the cache lookups and HTTP towards OpenAI.

SergeLiatko · July 24, 2024, 10:46pm

I don’t see it in the documentation: does the nearVector search operator support moveTo.vectors? That would be an even easier way to solve my problem.

DudaNogueira · July 25, 2024, 3:44pm

This feature doesn’t exist. Yet.

Please, feel free to open a feature request on this in:

With an issue, we can track the popularity of this feature and where it stands in our roadmap.