Latency benchmarking for generative search?


Hi! I am switching from Pinecone to self-hosted Weaviate on AWS. I am worried about latency because my application handles an extremely large volume of data traffic.

My application is read-heavy. I want to use generative search to pick the best option based on the content it retrieves from the database. For example, I want to look up a client’s number and name, which are linked in the database to several possible actions such as call, email, etc. I then want generative search to decide whether the action should be a call or an email and return that action to me. Is this possible?

That’s certainly possible. The generative step will probably take up most of the time in each request, not the actual retrieval. Good luck building!

Thank you so much for the reply!

Also curious: are there any benchmark results on the latency? Or is it just the same as using the raw OpenAI endpoints?

The benchmarks for Weaviate itself can be found here: ANN Benchmark | Weaviate - Vector Database

It’s a bit tricky for us to include OpenAI (or any other service) because we have no control over their services. If that’s important to you, you might want to look into self-hosting, e.g., Weaviate + Ollama
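
Since the published benchmarks don't cover the generative step, one option is to time it in your own setup. Below is a minimal sketch of how that could look in Python, not an official benchmark script. The helper names (`time_calls`, `summarize`) and the two stand-in functions are hypothetical: they only sleep, so you would swap in your real retrieval-only query and your real generative query before trusting any numbers.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fn: Callable[[], object], runs: int = 20) -> List[float]:
    """Call fn `runs` times; return each wall-clock duration in milliseconds."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms: List[float]) -> Dict[str, float]:
    """Return the median and the nearest-rank 95th percentile in milliseconds."""
    ordered = sorted(durations_ms)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"p50_ms": statistics.median(ordered), "p95_ms": ordered[p95_index]}


# Hypothetical stand-ins: replace the sleeps with your real calls,
# e.g. a plain vector search and the same search with generation enabled.
def retrieval_only() -> None:
    time.sleep(0.002)


def retrieval_plus_generation() -> None:
    time.sleep(0.02)


if __name__ == "__main__":
    for name, fn in [
        ("retrieval only", retrieval_only),
        ("retrieval + generation", retrieval_plus_generation),
    ]:
        print(name, summarize(time_calls(fn)))
```

Running both variants against the same queries shows how much of the total request time comes from the LLM call rather than from the database lookup. With a read-heavy workload, the p95 is usually more informative than the average.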