Vectorizer for hybrid search

Hi, I am kind of new to large language models. My goal is to implement hybrid search in RAG. I am not quite sure which vectorizer to use, any suggestions? (I was reading on the internet that BM25 requires sparse vectors and semantic search requires dense vectors, so how can I narrow it down to one type of vectorizer?)

Thank you!!

Hi @llmwill !!

Welcome! I believe you are in the right place to put all this together :slight_smile:

We have some great recipes that can get you up and running in no time:

This one, for example, will guide you through Generative Search/RAG:

This other recipe covers hybrid search (without generating an answer):

Combining both, you should end up with something like:

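import json

# Assumes `client` is an already-connected weaviate.Client (v3 Python client)
# and that the JeopardyQuestion class has a generative module enabled.
generate_task = "Explain why these Jeopardy questions are under the Animals category."

result = (
    client.query
    .get("JeopardyQuestion", ["question"])
    .with_generate(grouped_task=generate_task)
    # Pure vector search alternative, left here for comparison:
    # .with_near_text({
    #     "concepts": ["Elephants"]
    # })
    .with_hybrid(
        query="Elephants",
        properties=["question"],
        alpha=0.80,  # 1 = pure vector search, 0 = pure keyword (BM25) search
    )
    .with_limit(3)
).do()

print(json.dumps(result, indent=1))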
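
On the "one vectorizer" part of the question: as far as I understand it, the vectorizer only produces the dense vectors. The keyword (BM25) side is scored from the text itself, so there is no second, sparse vectorizer to pick. The two result lists are then blended using `alpha`. Below is a rough sketch of that blending idea in plain Python. It is not Weaviate's actual implementation; the helper names (`min_max`, `hybrid_fuse`), the min-max normalization, and the scores are all made up for illustration.

```python
def min_max(scores):
    """Scale a {doc_id: score} dict to [0, 1]; all-equal scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(keyword_scores, vector_scores, alpha):
    """Blend normalized scores: alpha=1 is vector-only, alpha=0 is keyword-only."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    docs = set(kw) | set(vec)
    # A document missing from one result list contributes 0 for that side.
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three hypothetical documents.
keyword_scores = {"doc_a": 2.0, "doc_b": 6.0, "doc_c": 4.0}  # e.g. BM25 scores
vector_scores = {"doc_a": 0.9, "doc_b": 0.1, "doc_c": 0.5}   # e.g. cosine similarity

ranking = hybrid_fuse(keyword_scores, vector_scores, alpha=0.80)
print(ranking)
```

With `alpha=0.80` the document that scores best on the vector side (`doc_a`) ends up first, while `alpha=0.0` would put the best keyword match (`doc_b`) on top. So in practice you choose a single dense vectorizer and use `alpha` to decide how much weight the keyword side gets.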

By the way, check out our events page. We have some great free workshops that will help you.

Let me know if that helps :slight_smile:
