Authorization header is correct, but the token seems invalid

I am trying to use Weaviate with Hugging Face to create vectors to compare semantic similarity, but I keep getting this error: “Every object failed during insertion. Here is the set of all errors: failed with status: 400 error: Authorization header is correct, but the token seems invalid”

When I test to make sure my client is properly connected, it returns true, so I’m confused how the token can be invalid. I’m trying to convert each line in a txt file into a vector, but the error arises when I try to insert the objects themselves. All of my env variables match the keys I was given on WCD and Hugging Face. Also, on Hugging Face, I edited permissions like this:

Here’s my code. The error arises from the line: test.data.insert_many(objs).

import weaviate, os
import weaviate.classes as wvc
from sentence_transformers import SentenceTransformer


client = weaviate.connect_to_weaviate_cloud(
    cluster_url= os.getenv("WEAVIATE_INSTANCE_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("W_API_KEY")),
    headers={
         "X-HuggingFace-Api-Key": "H_F_API_KEY"  
    }
)
print(os.getenv("H_F_API_KEY"))
print(client.is_ready())
client.collections.delete("Test")
try:
    test = client.collections.create(
        name="Test",
        vectorizer_config=[
        wvc.config.Configure.NamedVectors.text2vec_huggingface(
            name="title_vector",
            source_properties=["title"],
            model="sentence-transformers/all-MiniLM-L6-v2",
        )
    ],
    )
    model = SentenceTransformer("all-MiniLM-L6-v2")
    with open('test1.txt', 'r') as file:
        lines = [line.strip() for line in file.readlines()]

    vectors = model.encode(lines)

    objs = []
    for line,vector in zip(lines, vectors):
        objs.append({
            "text": line,
            
    })
        
    test = client.collections.get("Test")
    test.data.insert_many(objs)

finally:
    client.close()

hi @Anna_Caroline_Symond !!

Welcome to our community :hugs:

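This message comes directly from the Hugging Face API. It means your Hugging Face API token is being rejected, not your Weaviate key. client.is_ready() only checks the connection to Weaviate, which is why it still returns True. Note that in your code the header value is the literal string "H_F_API_KEY" rather than os.getenv("H_F_API_KEY"), so the real token is never sent.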
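
As a minimal sketch of how to guard against this, the header can be built from the environment and checked before connecting. The helper name huggingface_headers is hypothetical, and the default variable name H_F_API_KEY is taken from the question's code:

```python
import os


def huggingface_headers(env_var: str = "H_F_API_KEY") -> dict:
    """Build the Hugging Face header from an environment variable.

    Hypothetical helper: fails early instead of sending a bad token.
    """
    token = os.getenv(env_var)
    if not token:
        # Nothing to send: the variable is unset or empty
        raise RuntimeError(f"Environment variable {env_var} is not set")
    if token == env_var:
        # The value is the variable's own name, so it is a placeholder
        raise RuntimeError(f"{env_var} holds its own name, not a token")
    return {"X-HuggingFace-Api-Key": token}
```

The returned dict can then be passed as the headers argument when connecting, for example headers=huggingface_headers().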

There are also some other issues with this code. You never pass the vectors you encoded yourself, so Weaviate will vectorize each object through the Hugging Face API instead.

You also created a named vector that uses the title property as its source, but the objects you insert store the data in a text property.
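
A small pre-insert check can catch this kind of mismatch. This is only a sketch in plain Python; the function name objects_missing_properties is hypothetical and it does not call Weaviate at all:

```python
def objects_missing_properties(objs, source_properties):
    """Map each object's index to the source properties it lacks."""
    missing = {}
    for index, obj in enumerate(objs):
        absent = [name for name in source_properties if name not in obj]
        if absent:
            missing[index] = absent
    return missing


# The question's objects use "text", while the named vector reads "title"
print(objects_missing_properties([{"text": "Something about dogs"}], ["title"]))
# → {0: ['title']}
```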

I also changed the code to use batch insert, which can perform better when ingesting large amounts of data.

Here is my take on this:

import weaviate
import os
import weaviate.classes as wvc
from sentence_transformers import SentenceTransformer

client = weaviate.connect_to_local(
    headers={
        "X-HuggingFace-Api-Key": os.getenv("HUGGINGFACE_APIKEY")
        #"X-HuggingFace-Api-Key": "WRONG"
    }
)
print(client.is_ready())
client.collections.delete("Test")
try:
    collection = client.collections.create(
        name="Test",
        vectorizer_config=[
            wvc.config.Configure.NamedVectors.text2vec_huggingface(
                name="title_vector",
                source_properties=["title"],
                model="sentence-transformers/all-MiniLM-L6-v2",
            )
        ],
    )
    model = SentenceTransformer("all-MiniLM-L6-v2")
    lines = ["Something about dogs", "Something about wolfs"]

    vectors = model.encode(lines)

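    # Open a single batch for all objects rather than one batch per object
    with collection.batch.dynamic() as batch:
        for line, vector in zip(lines, vectors):
            batch.add_object(
                properties={"title": line},
                # Supply our own embedding for the named vector
                vector={
                    "title_vector": vector
                }
            )
    # Report any objects that failed during ingestion
    if collection.batch.failed_objects:
        print("FAILED OBJECTS", collection.batch.failed_objects)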
finally:
    client.close()
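
The example above hardcodes two lines. To adapt it to the text file from the question, the lines list could be loaded along these lines. This is a sketch: load_lines is a hypothetical helper, and skipping blank lines is my assumption, on the basis that an empty string is not a meaningful object to embed:

```python
from pathlib import Path


def load_lines(path):
    """Return the stripped, non-empty lines of a text file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With this in place, lines = load_lines("test1.txt") would replace the hardcoded list.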

Now you can check if your objects were indeed ingested:

client.connect()
objects = collection.query.fetch_objects(include_vector=True).objects
print(objects[0].vector)
print(objects[0].properties)
client.close()

It should output something similar to:

{'title_vector': [-0.05317388474941254, 0.02120402827858925, 0.06296592950820923, …, 0.09222045540809631]}
{'title': 'Something about dogs'}

Now we can query this, either by letting the Hugging Face API vectorize the query or by vectorizing it ourselves. First, with the Hugging Face API:

client.connect()
collection = client.collections.get("Test")
objects = collection.query.near_text(
    "pet animals", 
    target_vector="title_vector", 
    return_metadata=wvc.query.MetadataQuery(distance=True),
    include_vector=True
).objects
for obj in objects:
    print("#" * 10)
    print(obj.metadata.distance)
    print(obj.properties)
client.close()

Outputs:

##########
0.3560590147972107
{'title': 'Something about dogs'}
##########
0.5862653255462646
{'title': 'Something about wolfs'}
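
To make the distance values easier to read: assuming the collection uses cosine distance (which I believe is Weaviate's default metric), a smaller number means a closer match. A rough sketch of that computation in plain Python, for illustration only and not Weaviate's own implementation:

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Under this definition, identical directions give 0, orthogonal vectors give 1, and opposite directions give 2.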

If you don’t want to use the Hugging Face API at query time, you can vectorize the query yourself and use near_vector instead of near_text:

client.connect()
collection = client.collections.get("Test")

# Encode the query with the same model used for the stored vectors
query_vector = model.encode(["pet animals"])

objects = collection.query.near_vector(
    near_vector=query_vector[0].tolist(),
    target_vector="title_vector",
    return_metadata=wvc.query.MetadataQuery(distance=True),
    include_vector=True
).objects
for obj in objects:
    print("#" * 10)
    print(obj.metadata.distance)
    print(obj.properties)
client.close()

Let me know if this helps :slight_smile: