Description: I need help resolving an issue where vector searches in Weaviate return no results. Any insights into what might be going wrong, or what additional configuration might be needed, would be greatly appreciated. Thank you!
Setup: I have set up a Weaviate instance on an Amazon EC2 instance using Docker Compose, following the official Weaviate installation documentation. My setup generates embeddings with Amazon Titan and runs vector searches in Weaviate. Despite following the instructions and modifying the docker-compose.yml file to include the necessary modules, vector search does not work: the search returns no results. Below is a detailed description of my setup and the issue.
Installation and Version: My Weaviate 1.26.1 server runs with Docker Compose on an Amazon EC2 instance, and I connect to it with the v4 Python client. I followed these instructions to set up Weaviate on AWS EC2:
Steps Taken in Code:
- Connecting to Weaviate Instance on AWS EC2:
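```python
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout
from weaviate.connect import ConnectionParams

host_dns = "host_dns.amazonaws.com"  # public DNS name of the EC2 instance

client = weaviate.WeaviateClient(
    connection_params=ConnectionParams.from_params(
        http_host=host_dns,
        http_port=8080,
        http_secure=False,
        grpc_host=host_dns,
        grpc_port=50051,  # the v4 client queries over gRPC, so this port must be reachable
        grpc_secure=False,
    ),
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=120),
    ),
    skip_init_checks=True,  # skips the client's startup connectivity checks
)
client.connect()  # Connect to Weaviate
```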
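Because `skip_init_checks=True` turns off the client's startup checks, an unreachable server may only show up later as an odd query result. A minimal standard-library sketch that builds and calls Weaviate's REST readiness endpoint (`/v1/.well-known/ready`) before any queries run; `readiness_url` and `is_ready` are hypothetical helper names, and the host is the placeholder used above.

```python
import urllib.request


def readiness_url(host: str, port: int = 8080, secure: bool = False) -> str:
    """Build the URL of Weaviate's REST readiness endpoint (hypothetical helper)."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/v1/.well-known/ready"


def is_ready(host: str, port: int = 8080, secure: bool = False, timeout: float = 5.0) -> bool:
    """Return True only if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(host, port, secure), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, `is_ready("host_dns.amazonaws.com")` should return `True` once the container is up and port 8080 is open in the EC2 security group.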
- Creating Embeddings Using Amazon Titan:
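```python
import json

import boto3
from botocore.exceptions import ClientError

modelId = "amazon.titan-embed-text-v1"
accept = "application/json"
contentType = "application/json"
client_titan = boto3.client("bedrock-runtime", region_name="us-west-2")


def generate_embedding(value):
    """Return the Titan embedding (a list of floats) for one text."""
    try:
        body = json.dumps({"inputText": value})
        response = client_titan.invoke_model(
            body=body, modelId=modelId, accept=accept, contentType=contentType
        )
        response_body = json.loads(response.get("body").read())
        embeddings = response_body["embedding"]
        return embeddings
    except ClientError as error:
        print(error)
        raise  # re-raise so a failed call does not silently yield a None embedding


# df_new is the pandas DataFrame holding the documents to import
embeddings_updated = []
for index, row in df_new.iterrows():
    embeddings_updated.append(generate_embedding(row["title_chunks"]))
df_new["embeddings_updated"] = embeddings_updated  # read by the import step below
```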
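Before importing, it can help to confirm that every entry in `embeddings_updated` is a real vector of the expected length, since a `None` or wrongly sized vector would otherwise only surface later as a failed import or an empty search. A sketch under two assumptions: that `amazon.titan-embed-text-v1` returns 1536-dimensional vectors under the `"embedding"` key read above, and that any object with a `.read()` method returning JSON bytes can stand in for the Bedrock response body. `parse_titan_body` and `validate_embeddings` are hypothetical helpers.

```python
import io
import json
from typing import List, Optional, Protocol, Sequence

# Assumption: Titan Embeddings G1 - Text (v1) produces 1536-dimensional vectors.
EXPECTED_DIM = 1536


class Readable(Protocol):
    """Anything with .read() returning bytes, e.g. the invoke_model "body" stream."""

    def read(self) -> bytes: ...


def parse_titan_body(stream: Readable) -> List[float]:
    """Decode an invoke_model response body and return its "embedding" as floats."""
    payload = json.loads(stream.read())
    return [float(x) for x in payload["embedding"]]


def validate_embeddings(
    vectors: Sequence[Optional[Sequence[float]]], expected_dim: int = EXPECTED_DIM
) -> None:
    """Raise ValueError for the first vector that is missing or has the wrong length."""
    for i, vec in enumerate(vectors):
        if vec is None:
            raise ValueError(f"row {i}: embedding is None (the Bedrock call probably failed)")
        if len(vec) != expected_dim:
            raise ValueError(f"row {i}: expected {expected_dim} dimensions, got {len(vec)}")


# Offline usage with a fake response body instead of a real Bedrock call:
fake_body = io.BytesIO(json.dumps({"embedding": [0.1] * EXPECTED_DIM}).encode())
validate_embeddings([parse_titan_body(fake_body)])
```

In the real loop, `validate_embeddings(embeddings_updated)` would run once after all rows are embedded.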
- Defining the Collection for the ‘Document’ Class:
```python
collection_new = {
    "class": "Document_new",
    "description": "A class to represent documents",
    "vectorizer": "none",  # Set to "none" because embeddings are provided
    "moduleConfig": {
        # "text2vec-openai": {},  # Configure if using OpenAI vectorization
        # "generative-openai": {}  # Configure if using generative queries
    },
    "properties": [
        {"name": "url", "dataType": ["string"]},
        {"name": "title", "dataType": ["string"]},
        {"name": "chunks", "dataType": ["string"]},
        {"name": "embeddings_updated", "dataType": ["blob"]},  # "blob" (base64 data) or "number[]" (list of floats)
    ],
}
```
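The class definition above stores the embedding as an ordinary property, but `near_vector` searches the vector attached to each object, not property values. A sketch of the same definition using current data type names (`text` in place of the deprecated `string`) and without the embedding property, assuming the vectors are supplied separately at import time. `build_document_schema` is a hypothetical helper; with the v4 Python client, I believe a dict like this can be passed to `client.collections.create_from_dict(...)`.

```python
from typing import Any, Dict


def build_document_schema(class_name: str = "Document_new") -> Dict[str, Any]:
    """Return a class definition whose vectors come from the client, not from a property."""
    return {
        "class": class_name,
        "description": "A class to represent documents",
        # "none": Weaviate does not vectorize; each object's vector is passed at import time.
        "vectorizer": "none",
        "properties": [
            {"name": "url", "dataType": ["text"]},
            {"name": "title", "dataType": ["text"]},
            {"name": "chunks", "dataType": ["text"]},
            # No "embeddings_updated" property: the embedding becomes the object's vector.
        ],
    }
```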
- Batch Import Objects from the DataFrame into Weaviate:
```python
with client.batch.dynamic() as batch:
    for i, row in df_new.iterrows():
        print(f"Importing document: {i+1}")
        properties = {
            "url": row["url"],
            "title": row["title"],
            "chunks": row["chunks"],
            "embeddings_updated": row["embeddings_updated"],
        }
        batch.add_object(
            collection="Document_new",  # Specify the collection name here
            properties=properties,
        )
print("Batch import completed successfully.")
```
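In the import loop above, the embedding travels inside `properties`, so the objects reach Weaviate without a vector. In addition, a `blob` property expects base64-encoded data, so a list of floats there may make those objects fail to import; I believe the v4 client collects such failures in `client.batch.failed_objects` rather than raising. A sketch that splits each row into properties and a vector, with plain dicts standing in for DataFrame rows; `split_row` is a hypothetical helper. With the v4 client the two parts would then go to `batch.add_object(collection="Document_new", properties=properties, vector=vector)`.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Text properties to keep; the embedding is deliberately excluded.
PROPERTY_KEYS = ("url", "title", "chunks")


def split_row(
    row: Mapping[str, Any], vector_key: str = "embeddings_updated"
) -> Tuple[Dict[str, Any], List[float]]:
    """Separate the text properties from the embedding so the vector is passed on its own."""
    properties = {key: row[key] for key in PROPERTY_KEYS}
    vector = [float(x) for x in row[vector_key]]
    return properties, vector


row = {"url": "https://example.com", "title": "T", "chunks": "C", "embeddings_updated": [1, 2]}
properties, vector = split_row(row)
print(properties)  # → {'url': 'https://example.com', 'title': 'T', 'chunks': 'C'}
print(vector)      # → [1.0, 2.0]
```

A pandas row also supports `row[key]` lookups, so the same helper should work on the rows yielded by `df_new.iterrows()`.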
- Performing the Vector Search in Weaviate:
```python
from weaviate.classes.query import MetadataQuery

document_collection = client.collections.get("Document_new")

# Generate query embeddings (prompt holds the search text)
query_vector = generate_embedding(prompt)

response = document_collection.query.near_vector(
    near_vector=query_vector,  # your query vector goes here
    limit=10,
    return_metadata=MetadataQuery(distance=True),
)
for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
```
Result: No results are returned from the vector search.
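`metadata.distance` is easier to sanity-check with the metric written out. A sketch of cosine distance (1 minus cosine similarity), which is Weaviate's default metric when no `distance` is configured; it also makes explicit that a query vector must have the same number of dimensions as the stored vectors. `cosine_distance` is a hypothetical helper, not part of the Weaviate client.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0.0 for same direction, 1.0 orthogonal, 2.0 opposite."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # → 1.0
```

If objects had vectors, every search would return up to `limit` results with distances in this range; an empty response therefore points at the stored objects (missing vectors or failed imports) rather than at the query.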
Docker-Compose Configuration: Following the resolution suggested to another user who reported the same issue, I modified docker-compose.yml to add text2vec-aws as the default vectorizer module. However, the issue persists: even after adding or enabling the modules, I get an error saying that the model is not present.
Docker-Compose.yml:
```yaml
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.1
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,generative-openai,generative-cohere,generative-palm,generative-claude,ref2vec-centroid,reranker-cohere,qna-openai'
      CLUSTER_HOSTNAME: 'node1'
      AWS_REGION: 'us-west-2'
      BEDROCK_ENDPOINT: 'https://bedrock.us-west-2.amazonaws.com'
      TITAN_ENDPOINT: 'https://titan.us-west-2.amazonaws.com'
      CLAUDE_ENDPOINT: 'https://claude.us-west-2.amazonaws.com'  # Add Claude endpoint if required
volumes:
  weaviate_data:
```
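Whether a module is enabled depends on the `ENABLE_MODULES` value the container actually sees. A small standard-library sketch that splits that comma-separated value and reports which expected modules are missing; applied to the value in the file above, it shows that `text2vec-aws` is not in the list. (With `vectorizer` set to `none` and vectors supplied by the client, no vectorizer module should be needed for `near_vector` at all.) `enabled_modules` and `missing_modules` are hypothetical helpers.

```python
from typing import Iterable, List


def enabled_modules(env_value: str) -> List[str]:
    """Split an ENABLE_MODULES value into module names, ignoring blanks and stray spaces."""
    return [name.strip() for name in env_value.split(",") if name.strip()]


def missing_modules(env_value: str, wanted: Iterable[str]) -> List[str]:
    """Return the wanted module names absent from ENABLE_MODULES, in the order given."""
    enabled = set(enabled_modules(env_value))
    return [name for name in wanted if name not in enabled]


# The ENABLE_MODULES value from the docker-compose.yml above.
ENABLE_MODULES = (
    "text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,"
    "generative-openai,generative-cohere,generative-palm,generative-claude,"
    "ref2vec-centroid,reranker-cohere,qna-openai"
)
print(missing_modules(ENABLE_MODULES, ["text2vec-aws", "text2vec-openai"]))  # → ['text2vec-aws']
```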