hi @Anna_Caroline_Symond !!
Welcome to our community
This message comes directly from the Hugging Face API. It indicates that the Hugging Face API token is the one being rejected, not a Weaviate key.
Also, there are some other issues with this code, such as not passing the vectors you encoded yourself. Without them, Weaviate will vectorize your objects itself through the Hugging Face API.
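As a quick sanity check before connecting, something like the sketch below can catch a missing or empty token early. The helper name `huggingface_headers` is made up for illustration, and it assumes the token lives in the `HUGGINGFACE_APIKEY` environment variable. It cannot tell whether the token is actually valid; only Hugging Face can confirm that.

```python
import os


def huggingface_headers(env=os.environ, var_name="HUGGINGFACE_APIKEY"):
    """Build the header dict for the Weaviate client (hypothetical helper).

    Raises RuntimeError if the token is missing or blank, so the problem
    shows up before any request reaches Hugging Face.
    """
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set or is empty")
    return {"X-HuggingFace-Api-Key": token}
```

The returned dict can then be passed as `headers=` when connecting.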
You also created a named vector that uses the title property as its source, but you were using text as the property receiving the data.
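To make that mismatch easier to spot, here is a tiny sketch of the kind of check I mean. The function `missing_source_properties` is only an illustration and is not part of the Weaviate client.

```python
def missing_source_properties(source_properties, obj_properties):
    """Return the source properties that the object does not populate."""
    return [name for name in source_properties if name not in obj_properties]


# The named vector reads from "title", but the object only fills "text"
print(missing_source_properties(["title"], {"text": "Something about dogs"}))   # prints ['title']
# After the fix, the object fills "title", so nothing is missing
print(missing_source_properties(["title"], {"title": "Something about dogs"}))  # prints []
```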
I also changed the code to use batch inserts, as they perform better when ingesting large amounts of data.
Here is my take on this:
import os

import weaviate
import weaviate.classes as wvc
from sentence_transformers import SentenceTransformer

client = weaviate.connect_to_local(
    headers={
        "X-HuggingFace-Api-Key": os.getenv("HUGGINGFACE_APIKEY")
        # "X-HuggingFace-Api-Key": "WRONG"
    }
)
print(client.is_ready())
# Start from a clean collection
client.collections.delete("Test")
try:
    collection = client.collections.create(
        name="Test",
        vectorizer_config=[
            wvc.config.Configure.NamedVectors.text2vec_huggingface(
                name="title_vector",
                source_properties=["title"],
                model="sentence-transformers/all-MiniLM-L6-v2",
            )
        ],
    )
    # Encode the texts locally with the same model
    model = SentenceTransformer("all-MiniLM-L6-v2")
    lines = ["Something about dogs", "Something about wolfs"]
    vectors = model.encode(lines)
    # Open the batch once and add every object to it
    with collection.batch.dynamic() as batch:
        for line, vector in zip(lines, vectors):
            batch.add_object(
                properties={"title": line},
                # Passing our own vector skips server-side vectorization
                vector={"title_vector": vector},
            )
    # Check for failures after the batch has been flushed
    if collection.batch.failed_objects:
        print("FAILED OBJECTS", collection.batch.failed_objects)
finally:
    client.close()
Now you can check if your objects were indeed ingested:
client.connect()
objects = collection.query.fetch_objects(include_vector=True).objects
print(objects[0].vector)
print(objects[0].properties)
client.close()
It should output something similar to:
{'title_vector': [-0.05317388474941254, 0.02120402827858925, 0.06296592950820923, …, 0.09222045540809631]}
{'title': 'Something about dogs'}
Now we can query this either through the Hugging Face API or by vectorizing the query ourselves:
client.connect()
collection = client.collections.get("Test")
objects = collection.query.near_text(
    "pet animals",
    target_vector="title_vector",
    return_metadata=wvc.query.MetadataQuery(distance=True),
    include_vector=True,
).objects
for obj in objects:
    print("#" * 10)
    print(obj.metadata.distance)
    print(obj.properties)
client.close()
Outputs:
##########
0.3560590147972107
{'title': 'Something about dogs'}
##########
0.5862653255462646
{'title': 'Something about wolfs'}
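A note on reading those numbers: the collection above does not set a distance metric, so I am assuming Weaviate falls back to cosine distance here, where smaller means more similar. The sketch below shows how that kind of distance is computed in plain Python. It is only an illustration of the formula, not Weaviate's own code.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Same direction gives 0, orthogonal gives 1, opposite gives 2
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))
```

Under that assumption, the dogs object at roughly 0.36 is closer to "pet animals" than the wolfs object at roughly 0.59.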
If you don’t want to use the Hugging Face API, you can vectorize your query yourself. Instead of near_text, use near_vector:
client.connect()
collection = client.collections.get("Test")
# Encode the query locally with the same model used for the objects
query_vector = model.encode(["pet animals"])
objects = collection.query.near_vector(
    near_vector=query_vector[0],
    target_vector="title_vector",
    return_metadata=wvc.query.MetadataQuery(distance=True),
    include_vector=True,
).objects
for obj in objects:
    print("#" * 10)
    print(obj.metadata.distance)
    print(obj.properties)
client.close()
Let me know if this helps