Digging further into the documentation and classes, I found the following solution to add a list of stopwords to the collection.
To see the configuration of the collection use:
collection.config.get()
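To look at just the stopword settings rather than the whole configuration, something like the following sketch may help. It assumes the object returned by collection.config.get() exposes inverted_index_config.stopwords with preset, additions and removals attributes, as the v4 Python client appears to. stopword_settings is a hypothetical helper name, and the stand-in object only mimics that shape so the snippet runs without a Weaviate server.

```python
from types import SimpleNamespace


def stopword_settings(config):
    """Return the stopword part of a collection config as a plain dict.

    Assumes `config` has the shape config.inverted_index_config.stopwords
    with `preset`, `additions` and `removals` attributes (an assumption
    about what collection.config.get() returns).
    """
    stopwords = config.inverted_index_config.stopwords
    return {
        "preset": stopwords.preset,
        # Treat a missing list (None) as an empty list for easier comparison.
        "additions": list(stopwords.additions or []),
        "removals": list(stopwords.removals or []),
    }


# Stand-in for collection.config.get(), so the sketch runs offline.
fake_config = SimpleNamespace(
    inverted_index_config=SimpleNamespace(
        stopwords=SimpleNamespace(preset="en", additions=["le", "la"], removals=None)
    )
)

print(stopword_settings(fake_config))
```

With a real collection, the call would be stopword_settings(collection.config.get()).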
To set the stopwords with the wvc.Configure.inverted_index() function, first define its parameters (all default values except for stopwords_additions):
# list_stopwords is your own list of strings, e.g. ["le", "la", "il", "elle"]
params = {
    "bm25_b": 0.75,
    "bm25_k1": 1.2,
    "cleanup_interval_seconds": 60,
    "index_timestamps": False,
    "index_property_length": False,
    "index_null_state": False,
    "stopwords_preset": None,
    "stopwords_additions": list_stopwords,
    "stopwords_removals": None,
}
Then create the collection, passing params to the function:
collection = client.collections.create(
    name=<collection_name>,
    vectorizer_config=vectorizer,
    generative_config=wvc.Configure.Generative.openai(),
    inverted_index_config=wvc.Configure.inverted_index(**params),
    properties=[
        <list of properties>
    ],
)
However, this fails to set stopwords_preset to None, although None seems to be a valid value for that parameter.
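One possible explanation (not confirmed here) is that None is treated as "not specified", so the server falls back to its default preset. If so, it may be cleaner to pass only the options you actually want to set. The sketch below drops None-valued entries from the params dictionary before it is unpacked into the function. drop_unset is a hypothetical helper name, and the values are illustrative. The client also appears to provide a StopwordsPreset enum with a NONE member, which may be the intended way to request no preset, but treat that as an assumption to check against the client documentation.

```python
def drop_unset(params):
    """Return a copy of params without the entries whose value is None."""
    return {key: value for key, value in params.items() if value is not None}


# Illustrative values; list_stopwords stands for your own stopword list.
list_stopwords = ["le", "la", "il", "elle"]
params = {
    "bm25_b": 0.75,
    "bm25_k1": 1.2,
    "index_timestamps": False,
    "stopwords_preset": None,
    "stopwords_additions": list_stopwords,
    "stopwords_removals": None,
}

# Only the explicitly chosen options remain; False is kept, None is dropped.
explicit_params = drop_unset(params)
print(explicit_params)
```

The result could then be passed as wvc.Configure.inverted_index(**explicit_params).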
You can also create the collection without specifying the stopwords and update them later with:
collection.config.update(
    # Note, use Reconfigure here (not Configure)
    inverted_index_config=wvc.Reconfigure.inverted_index(
        stopwords_additions=["le", "la", "il", "elle"]
    )
)