Distance doesn't correspond to projected points

micartey · August 22, 2023, 2:34pm

Hello there,

After resolving my previous question (“Visualize Database Contents”), I plotted the data using Python’s matplotlib library and… let’s say I am a tiny bit confused.

I understand that there might be some losses when projecting something from n dimensions to 3.
But I cannot answer 2 questions:

Which ANN algorithm is used? Is it Euclidean?
Can I change the ANN algorithm to one of the other built-in functions?

EDIT: The interesting behavior is that 7 is nearer than everything else but only second nearest. 10 should be second nearest, but is 6. (Expected was 1 and 2 to be right near the query which is marked as a red x)

EDIT 2: I just realized that while the distance is always deterministic - the projection is not… I generated the same response several times and realized that the left values remain constant (as expected) but the representation changes, which means that while the distance is the same, the vectors are different. Sometimes the result looks good on the right and sometimes it doesn’t… This might be a huge design flaw if my understanding is right. It doesn’t change anything for queries, though.

EDIT 3: I also generated a 2D view and increased the iterations, since more iterations are said to “[…] lead to more stable results […]”.

The same result described in EDIT can be seen.

Thanks in advance

trengrj · August 24, 2023, 1:25am

Hi @micartey,

The algorithm used for the feature projection feature is t-SNE, via the danaugrs/go-tsne library on GitHub (t-Distributed Stochastic Neighbor Embedding (t-SNE) in Go).

More details are provided on Laurens van der Maaten’s t-SNE page, including why the projection is different for each query:

Every time I run t-SNE, I get a (slightly) different result?

In contrast to, e.g., PCA, t-SNE has a non-convex objective function. The objective function is minimized using a gradient descent optimization that is initiated randomly. As a result, it is possible that different runs give you different solutions. Notice that it is perfectly fine to run t-SNE a number of times (with the same data and parameters), and to select the visualization with the lowest value of the objective function as your final visualization.

By ANN algorithm, I think you are referring to the distance metric. From looking at the go-tsne library, it seems to assume Euclidean distance, and this is not configurable. scikit-learn has a t-SNE implementation with a metric parameter that you could test with your data (sklearn.manifold.TSNE, scikit-learn 1.3.2 documentation).
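
A minimal sketch of that test, assuming scikit-learn and NumPy are installed. It uses scikit-learn's TSNE rather than go-tsne, so it will not reproduce Weaviate's projection exactly. The helper name `best_tsne`, the random stand-in vectors, and the seed and perplexity values are illustrative assumptions. It runs t-SNE a few times with a non-Euclidean metric and keeps the run with the lowest objective value, as the quoted FAQ suggests:

```python
import numpy as np
from sklearn.manifold import TSNE


def best_tsne(vectors, metric="cosine", seeds=(0, 1, 2), perplexity=5.0):
    """Run t-SNE once per seed and return (kl, seed, embedding) of the best run."""
    best = None
    for seed in seeds:
        tsne = TSNE(
            n_components=2,
            metric=metric,          # distance used in the original space
            init="random",          # random start, so each seed gives a different layout
            perplexity=perplexity,  # must be smaller than the number of samples
            random_state=seed,
        )
        embedding = tsne.fit_transform(vectors)
        # kl_divergence_ is the final value of the t-SNE objective for this run
        if best is None or tsne.kl_divergence_ < best[0]:
            best = (float(tsne.kl_divergence_), seed, embedding)
    return best


# Stand-in data: 30 random 16-dimensional vectors (replace with your own vectors)
rng = np.random.default_rng(42)
vectors = rng.normal(size=(30, 16))

kl, seed, embedding = best_tsne(vectors)
print(f"best seed: {seed}, KL divergence: {kl:.3f}, shape: {embedding.shape}")
```

Fixing `random_state` also makes a single run reproducible, which helps when comparing plots.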

micartey · August 24, 2023, 8:18am

Hi @trengrj

Thank you for the response.
That solves half of my problems and helps me with further understanding.

Do you happen to know why the distance doesn’t correspond with the projection at all?
When running the same query several times, it looks different each time, but mainly just from an angle (it rotates). But the distance on the left and the points on the right have nothing to do with each other…

micartey · August 24, 2023, 8:35am

Could it be that the query is not at (0, 0) but somewhere else in space? If that is the case, how do I get the (projected) vector of my query?

trengrj · August 24, 2023, 1:35pm

Yes, that could be the issue. As featureProjection is an _additional property, it is only returned for each object in Weaviate and not for the vector supplied to nearVector.
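
As a rough illustration of why the reference point matters, here is a small sketch with made-up 2D coordinates (the labels and positions are assumptions, not values from the plot above). Ranking the points by distance from the origin gives a different order than ranking them by distance from the query's own position, and only the latter survives the layout being rotated or shifted between runs:

```python
import math


def rotate(point, angle):
    """Rotate a 2D point around the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def shift(point, offset):
    """Translate a 2D point by `offset`."""
    return (point[0] + offset[0], point[1] + offset[1])


def ranking(points, reference):
    """Labels sorted by Euclidean distance to `reference`, nearest first."""
    return sorted(points, key=lambda label: math.dist(points[label], reference))


# Made-up projected positions; the query is NOT at (0, 0)
query = (2.0, 1.0)
neighbours = {"1": (2.5, 1.0), "2": (1.0, 1.0), "7": (0.2, 0.1)}

from_origin = ranking(neighbours, (0.0, 0.0))  # wrong reference point
from_query = ranking(neighbours, query)        # correct reference point

# Rotate and shift the whole layout, as a different t-SNE run might
angle, offset = math.radians(75), (-4.0, 3.0)
moved_query = shift(rotate(query, angle), offset)
moved = {k: shift(rotate(p, angle), offset) for k, p in neighbours.items()}
from_moved_query = ranking(moved, moved_query)

print(from_origin)       # order seen when measuring from (0, 0)
print(from_query)        # order seen when measuring from the query
print(from_moved_query)  # same order as from_query after rotating and shifting
```

Even with the right reference point, t-SNE does not promise to preserve distance rankings exactly, so some mismatch with the distance column can remain.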

One workaround for this could be to use nearObject instead of nearVector. In this case the original vector / object will usually be returned in the results list (as it will be the closest vector to itself).
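
The workaround could be sketched roughly as follows. The class name `Document`, the object ID, and the sample response are hypothetical placeholders. The code only builds the GraphQL query string and post-processes a response dictionary, so sending the query with your client of choice is left out:

```python
import math


def build_query(class_name, object_id, limit=10):
    """Build a Get query that uses nearObject and asks for a 2D projection."""
    return f"""
{{
  Get {{
    {class_name}(nearObject: {{id: "{object_id}"}}, limit: {limit}) {{
      _additional {{
        id
        distance
        featureProjection(dimensions: 2) {{ vector }}
      }}
    }}
  }}
}}"""


def projected_distances(response, class_name, object_id):
    """Distance of every projected result to the projected query object."""
    hits = response["data"]["Get"][class_name]
    points = {
        hit["_additional"]["id"]: hit["_additional"]["featureProjection"]["vector"]
        for hit in hits
    }
    reference = points[object_id]  # the query object appears in its own results
    return {
        obj_id: math.dist(point, reference)
        for obj_id, point in points.items()
        if obj_id != object_id
    }


# Hand-written sample response with made-up IDs and coordinates
sample = {"data": {"Get": {"Document": [
    {"_additional": {"id": "q", "distance": 0.0,
                     "featureProjection": {"vector": [3.0, 4.0]}}},
    {"_additional": {"id": "a", "distance": 0.1,
                     "featureProjection": {"vector": [3.0, 5.0]}}},
    {"_additional": {"id": "b", "distance": 0.2,
                     "featureProjection": {"vector": [6.0, 8.0]}}},
]}}}

query_text = build_query("Document", "q")
distances = projected_distances(sample, "Document", "q")
print(query_text)
print(distances)
```

Plotting the reference point from `projected_distances` as the red x, instead of (0, 0), puts the query in the same projected space as the results.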

micartey · August 24, 2023, 1:42pm

I am actually using nearText. Is there a way to get the query vector from within the result (e.g. as an _additional property)? I am using the Cohere model, so I am not really able to get the vector that is calculated for the query.
