I am using a locally set-up Weaviate instance, and I have a class with the following schema:
class_obj = {
    "class": "Test3",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    }
}
I then ingested my data. It is in a CSV file, so I generated JSON objects from it and added them to the class in batches.
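For reference, the ingestion step might look roughly like the sketch below. It assumes the v3 Python client (the same style of client the query further down uses), a hypothetical file name data.csv, and CSV columns named title, abstract and number (taken from the properties requested in the query). The helper names are made up for illustration.

```python
import csv
import io


def rows_to_objects(csv_text):
    """Turn CSV text into a list of dicts, one per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def import_objects(client, objects, class_name="Test3", batch_size=100):
    """Add objects to a class in batches (client is assumed to be a v3 weaviate.Client)."""
    client.batch.configure(batch_size=batch_size)
    with client.batch as batch:
        for obj in objects:
            batch.add_data_object(obj, class_name)


# Usage (hypothetical file name; needs a running Weaviate instance):
# with open("data.csv", newline="", encoding="utf-8") as f:
#     import_objects(client, rows_to_objects(f.read()))
```

Note that csv.DictReader yields every value as a string, so a numeric column such as number would need converting if the schema expects a number type.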
Now I want to use .with_near_text() to get relevant results from the data, using this query:
result = (
    client.query
    .get("Test3", ["title", "abstract", "number", "_additional {distance}"])
    .with_near_text({"concepts": ["machine learning"]})
    .with_additional(['certainty'])
    .do()
)
But I am not satisfied with the results Weaviate returns. My data doesn’t contain any information related to ‘machine learning’, yet I still got results with distance < 0.25 and certainty > 0.8. I am using ‘cosine’ similarity here.
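As far as I understand, the two numbers describe the same thing: for the cosine metric, Weaviate derives certainty from distance as certainty = 1 - distance / 2, so a distance of 0.25 corresponds to a certainty of 0.875. A small sketch of that conversion (the function name is my own, not part of the client):

```python
def certainty_from_cosine_distance(distance):
    """Convert a cosine distance (range 0..2) to a certainty (range 0..1)."""
    return 1.0 - distance / 2.0


print(certainty_from_cosine_distance(0.25))  # 0.875
```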
I expected no results at all, but it returned every object from the file, all with approximately the same similarity.
Note: I also tried several keywords that do appear in my data, but the score for the more similar data versus the irrelevant data is 0.17 vs 0.24, so even the irrelevant data could be considered relevant.
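To make the problem concrete, here is a minimal sketch of a client-side cutoff on the response. It assumes the 0.17 and 0.24 scores are distances and that the response has the usual v3 shape (result["data"]["Get"]["Test3"], each item carrying _additional.distance). The function name, the sample response and the 0.2 cutoff are all hypothetical.

```python
def filter_by_distance(result, class_name, max_distance):
    """Keep only hits whose distance is at most max_distance."""
    hits = result.get("data", {}).get("Get", {}).get(class_name) or []
    return [h for h in hits if h["_additional"]["distance"] <= max_distance]


# Made-up response for illustration only, not real output.
sample = {
    "data": {"Get": {"Test3": [
        {"title": "relevant", "_additional": {"distance": 0.17}},
        {"title": "irrelevant", "_additional": {"distance": 0.24}},
    ]}}
}
kept = filter_by_distance(sample, "Test3", max_distance=0.2)
print([h["title"] for h in kept])  # ['relevant']
```

As far as I know, the same cutoff can also be applied on the server by adding a "distance" key to the dict passed to .with_near_text(). Either way, with 0.17 versus 0.24 the margin for choosing a cutoff is very narrow, which is exactly my concern.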
Please provide me some support in this regard.