Best practice for fast embedding with OpenAI (or similar performance)?

Hi there!

We’ve been integrating Weaviate with LangChain into our agentic application, and the performance of the vector searches has been great. Lately, though, I’ve noticed a bottleneck coming from the embedding step: even very simple queries take around 1.5-2 seconds to complete.

We’re using the langchain-weaviate library to integrate Weaviate and currently rely on it to build the vector store like this:

    WeaviateVectorStore.from_documents(
        documents,
        client=self.client,
        embedding=self.embeddings_model,
        index_name=index_name,
        uuids=uuids
    )

This creates the Weaviate vector store without a configured vectorizer; instead, we use our own embedding model to run near_vector searches.
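
For context, the query path then looks roughly like this (a simplified sketch with our own names; `user_message` and `vector_store` are placeholders from our app code, not library internals):

    # We embed the query text ourselves via the OpenAI embeddings model,
    # then hand the resulting vector to the LangChain store, which runs
    # a near_vector search against Weaviate.
    query_vector = self.embeddings_model.embed_query(user_message)      # round trip to OpenAI
    docs = vector_store.similarity_search_by_vector(query_vector, k=4)  # near_vector search in Weaviate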

I’ve noticed that this adds roughly 0.5-0.8s of latency just to embed a short message, because every query requires a round trip to the OpenAI embeddings API endpoint.
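
To see where the time goes, I timed the embedding step separately from the end-to-end search (rough, hypothetical snippet, averaged over a few runs):

    import time

    start = time.perf_counter()
    _ = self.embeddings_model.embed_query("a short user message")
    print(f"embedding only: {time.perf_counter() - start:.2f}s")     # ~0.5-0.8s for us

    start = time.perf_counter()
    _ = vector_store.similarity_search("a short user message", k=4)
    print(f"embed + search: {time.perf_counter() - start:.2f}s")     # the Weaviate search itself is fast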

What’s the best practice with Weaviate to mitigate this latency while still maintaining high-quality embeddings for similarity search?

Any insights or advice would be highly appreciated!

Thanks in advance.

Kind regards,
Jasper