Vectorizer for hybrid search

llmwill · January 19, 2024, 3:08pm

Hi, I am fairly new to large language models. My goal is to implement hybrid search in RAG. I am not sure which vectorizer to use. Any suggestions? (I read online that BM25 requires sparse vectors and semantic search requires dense vectors, so how can I narrow it down to one type of vectorizer?)

Thank you!!

DudaNogueira · January 22, 2024, 9:36pm

Hi @llmwill !!

Welcome! I believe you are in the best place to put all this together.

We have some great recipes that can get you up and running in no time:

This one, for example, will guide you through Generative Search/RAG:

github.com

weaviate/recipes/blob/main/generative-search/generative_search_openai.ipynb

This other recipe covers hybrid search (without generating an answer):

github.com

weaviate/recipes/blob/main/hybrid-search/hybrid_search_openai.ipynb

Combining both, you should end up with something like:

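import json

# Assumes `client` is an already-connected weaviate.Client (Python client v3 syntax)
# and that a generative module is enabled on the instance.
generateTask = "Explain why these Jeopardy questions are under the Animals category."

result = (
    client.query
    .get("JeopardyQuestion", ["question"])
    .with_generate(grouped_task=generateTask)
    # Pure vector search alternative, replaced here by hybrid search:
    # .with_near_text({
    #     "concepts": ["Elephants"]
    # })
    .with_hybrid(
        query="Elephants",
        properties=["question"],  # properties used for the keyword (BM25) part
        alpha=0.80,               # 0 = pure keyword search, 1 = pure vector search
    )
    .with_limit(3)
).do()

print(json.dumps(result, indent=1))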
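On the sparse vs. dense question: in the query above only the dense side needs a vectorizer, since the keyword (BM25) side is scored by Weaviate from its inverted index rather than from sparse vectors you supply. To give a feel for what `alpha` does, here is a minimal, simplified sketch of score-based fusion in plain Python. It is an illustration only, not Weaviate's actual implementation; the function names and the example scores are made up.

```python
def min_max(scores):
    """Scale a dict of doc_id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.8):
    """Fuse keyword and vector scores; alpha weights the vector side.

    Hypothetical helper for illustration: alpha=0 uses only keyword
    scores, alpha=1 uses only vector scores.
    """
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for four documents (a document may appear in only one list).
keyword_scores = {"q1": 2.0, "q2": 0.5, "q3": 1.0}
vector_scores = {"q1": 0.2, "q2": 0.9, "q4": 0.6}

for doc, score in hybrid_rank(keyword_scores, vector_scores, alpha=0.8):
    print(doc, round(score, 3))
```

With `alpha = 0.80`, documents that score well on the vector side dominate the ranking; lowering `alpha` shifts the weight toward exact keyword matches.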

By the way, check out our events page. We have some great free workshops that will help you.

Let me know if that helps!
