Hi, I am fairly new to large language models. My goal is to implement hybrid search in RAG. I am not sure which vectorizer to use. Any suggestions? (I read online that BM25 requires sparse vectors and semantic search requires dense vectors, so how can I narrow it down to one type of vectorizer?)
Thank you!!
Hi @llmwill !!
Welcome! I believe you are in the best place to put all this together.
We have some great recipes that can get you up and running in no time.
This one, for example, will guide you through Generative Search/RAG:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "rJD9aP9eVcsT"
},
"source": [
"## Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ReE5TWeXSDTe"
},
"outputs": [],
"source": [
"!pip install weaviate-client"
This other recipe covers hybrid search (without generating an answer):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "rJD9aP9eVcsT"
},
"source": [
"## Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3IgZm3pYwWa8"
},
"outputs": [],
"source": [
"!pip install weaviate-client"
Combining both, you should end up with something like:
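import json

# Assumes `client` is an already-connected Weaviate client (v3 Python client API).
generateTask = "Explain why these Jeopardy questions are under the Animals category."

result = (
    client.query
    .get("JeopardyQuestion", ["question"])
    .with_generate(grouped_task=generateTask)
    # Pure vector search alternative (replaced here by hybrid search):
    # .with_near_text({"concepts": ["Elephants"]})
    .with_hybrid(
        query="Elephants",
        properties=["question"],  # properties searched by the keyword (BM25) part
        alpha=0.80,               # 0 = pure keyword search, 1 = pure vector search
    )
    .with_limit(3)
).do()

print(json.dumps(result, indent=1))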
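On the vectorizer question: as far as I understand it, the keyword (BM25) half of a hybrid query is served from the inverted index, so you only need to choose a dense vectorizer for the semantic half. To make the `alpha` weighting more concrete, here is a minimal sketch of a weighted score fusion. It is a simplified illustration, not Weaviate's actual fusion code, and the function names (`normalize`, `fuse`) and the toy scores are made up for this example.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores into [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(keyword_scores: Dict[str, float],
         vector_scores: Dict[str, float],
         alpha: float) -> Dict[str, float]:
    """Blend both score sets: alpha weights the vector side, (1 - alpha) the keyword side."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    docs = set(kw) | set(vec)
    # A document missing from one result set contributes 0 for that side.
    return {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in docs
    }


# Hypothetical scores for three documents (not real query output).
keyword_scores = {"q1": 2.0, "q2": 1.0, "q3": 0.0}   # BM25-style, higher is better
vector_scores = {"q1": 0.2, "q2": 0.9, "q3": 0.6}    # similarity, higher is better

fused = fuse(keyword_scores, vector_scores, alpha=0.80)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

With `alpha = 0.80` the vector scores dominate, so `q2` ranks first even though `q1` has the best keyword score. Setting `alpha = 0` in this sketch reproduces the pure keyword ordering.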
By the way, check out our events page. We have some great free workshops that will help you.
Let me know if that helps