I have been trying to integrate Weaviate with HuggingFace, but I have encountered an issue when building the embeddings.
This is the schema I have developed, with the fields and all the required configurations.
import weaviate

client = weaviate.Client(
    url=url,
    additional_headers={"X-HuggingFace-Api-Key": "XXX"},
)

class_obj = {
    "class": WEAVIATE_CLASS_NAME,
    "description": "Weaviate class for embeddings",
    "moduleConfig": {
        "text2vec-huggingface": {
            # "model": EMBED_MODEL_NAME,
            "endpointURL": "https://api-inference.huggingface.co/models/hackathon-pln-es/paraphrase-spanish-distilroberta",
            "options": {
                "waitForModel": True,
                "useGPU": False,
                "useCache": True,
            },
            "vectorizeClassName": False,
        }
    },
    "properties": [
        {"dataType": ["text"], "name": "subject_label"},
        {
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-huggingface": {
                    "skip": False,
                    "vectorizePropertyName": False,
                }
            },
            "name": "text",
        },
        {"dataType": ["text"], "name": "source"},
    ],
    # "vectorizer": "none",
    "vectorizer": "text2vec-huggingface",
}
client.schema.create_class(class_obj)
Additionally, this is the code snippet I use to populate the database.
with client.batch(batch_size=100):
    for idx, row in data.iterrows():
        data_obj = {
            "subject_label": row["subject_label"],
            "text": row["text"],
            "source": row["source"],
        }
        client.batch.add_data_object(data_obj, WEAVIATE_CLASS_NAME)
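Each object in a batch gets its own entry in the response, so failures can be gathered per object instead of only being printed. The following is a minimal sketch, assuming every result item has the shape `{"result": {"errors": {"error": [{"message": ...}]}}}` (inferred from the output in this post, not checked against every client version). `collect_batch_errors` is a hypothetical helper, not part of the Weaviate client.

```python
from typing import Any, Dict, List, Optional


def collect_batch_errors(results: Optional[List[Dict[str, Any]]]) -> List[str]:
    """Return every error message found in a batch response."""
    messages: List[str] = []
    for item in results or []:
        # Assumption: objects that were stored successfully carry no
        # "errors" entry under "result".
        errors = (item.get("result") or {}).get("errors") or {}
        for err in errors.get("error", []):
            messages.append(err.get("message", ""))
    return messages


# Hand-written sample in the assumed response shape (not real output).
sample = [
    {"result": {"errors": {"error": [{"message": "update vector: i/o timeout"}]}}},
    {"result": {}},
]
print(collect_batch_errors(sample))
```

With the v3 Python client, a function like this could probably be registered through `client.batch.configure(batch_size=100, callback=...)`, assuming the installed version exposes that parameter.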
However, this is the result I always get once the code is executed.
{'error': [{'message': 'update vector: send POST request: Post "https://api-inference.huggingface.co/models/hackathon-pln-es/paraphrase-spanish-distilroberta": dial tcp 54.209.188.203:443: i/o timeout'}]}
{'error': [{'message': 'update vector: send POST request: Post "https://api-inference.huggingface.co/models/hackathon-pln-es/paraphrase-spanish-distilroberta": dial tcp 54.209.188.203:443: i/o timeout'}]}
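The `dial tcp ... i/o timeout` part means the TCP connection to the HuggingFace endpoint timed out before it was established, and that request is made by the Weaviate server rather than by the Python client. One thing worth ruling out is whether the machine or container running Weaviate can reach that host at all. Below is a minimal sketch of such a check using only the standard library. `can_connect` is a hypothetical helper, and the default port and timeout are assumptions.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Running `can_connect("api-inference.huggingface.co")` from the same host or container as Weaviate would show whether outbound HTTPS is blocked there. A `False` result would point to networking (firewall, proxy, container egress) rather than to the schema.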