Latency benchmarking for generative search?

Wayne_Wang · May 24, 2024, 7:06pm

Description

Hi! I am switching from Pinecone to self-hosted Weaviate on AWS. I am worried about latency because I have an extremely large volume of data flowing through.

My application is read-heavy. I want to use generative search to pick the best option based on the content it looks up from the database. For example, I want to look up my client’s number and name, which in the database are linked to several possible actions like call, email, etc. I then want generative search to decide whether the action should be a call or an email and return that action to me. Is this possible?

bobvanluijt · May 26, 2024, 7:12pm

That’s certainly possible. The generative step will probably take the longest in a request (not the actual retrieval). Good luck building!
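To check where the time goes in your own setup, you could time the retrieval-only path and the retrieve-plus-generate path separately. Below is a minimal sketch of such a harness. It assumes nothing about the Weaviate client: `retrieve_only` and `retrieve_and_generate` are hypothetical stand-ins that just sleep, and you would replace their bodies with your real query calls.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_stage(fn: Callable[[], object], runs: int = 20) -> List[float]:
    """Call fn `runs` times and return each wall-clock duration in milliseconds."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms: List[float]) -> Dict[str, float]:
    """Return the median and the nearest-rank 95th percentile of the samples."""
    ordered = sorted(durations_ms)
    # Nearest-rank percentile: rank = ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {"p50_ms": statistics.median(ordered), "p95_ms": ordered[rank - 1]}


# Hypothetical stand-ins (assumptions, not real client calls).
def retrieve_only() -> None:
    time.sleep(0.002)  # placeholder for a plain vector search


def retrieve_and_generate() -> None:
    time.sleep(0.02)  # placeholder for search plus the LLM call


if __name__ == "__main__":
    for name, fn in [("retrieval", retrieve_only),
                     ("retrieval + generation", retrieve_and_generate)]:
        print(name, summarize(time_stage(fn, runs=10)))
```

Comparing the two summaries gives a rough estimate of how much the generative step adds on top of retrieval. Looking at p95 as well as the median helps, since LLM response times tend to vary more than vector search times.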

waynewang1119 · May 26, 2024, 8:44pm

Thank you so much for the reply!

Also curious: are there any benchmark results on the latency? Or is it just the same as using the raw OpenAI endpoints?

bobvanluijt · May 26, 2024, 9:00pm

The benchmarks for Weaviate itself can be found here: ANN Benchmark | Weaviate - Vector Database

It’s a bit tricky for us to include OpenAI (or any other service) because we have no control over their services. If that’s important to you, you might want to look into self-hosting, e.g., Weaviate + Ollama.

