How to specify stopwords with the Python V4 API

Hi,
I'm using the Python V4 API and I'd like to specify stopwords for French.
In V3, this comes down to setting up invertedIndexConfig.
I can't find the equivalent in V4.

So far I have:

client.collections.create(
    name=collection_name,
    vectorizer_config=vectorizer,
    generative_config=wvc.Configure.Generative.openai(),
    inverted_index_config=wvc.Configure.inverted_index( ??? ),
    properties=[ list of properties ],
)

What parameters should I pass to wvc.Configure.inverted_index( ... ) so that I can specify a list of stopwords?

Thanks

Digging further into the documentation and classes, I found the following solution to add a list of stopwords to the collection.

To see the configuration of the collection use:

collection.config.get()

To set the stopwords with the wvc.Configure.inverted_index() function, first define its parameters (all default values except for stopwords_additions):

params = {
    "bm25_b": 0.75,
    "bm25_k1": 1.2,
    "cleanup_interval_seconds": 60,
    "index_timestamps": False,
    "index_property_length": False,
    "index_null_state": False,
    "stopwords_preset": None,
    "stopwords_additions": list_stopwords,  # your list of French stopwords
    "stopwords_removals": None,
}
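The dict above relies on a list_stopwords variable that the post does not define. Below is a minimal sketch of one way to build it, assuming a small hand-picked French word list (the words are an illustrative sample, not a complete French stopword list) and a hypothetical helper named build_inverted_index_params. It normalizes the words (strip, lowercase, deduplicate, keep order) and returns the same keyword arguments as the post, with only stopwords_additions changed.

```python
# Illustrative French stopwords (hypothetical sample, not a complete list).
# Mixed case, stray spaces and duplicates are included on purpose.
FRENCH_STOPWORDS = ["le", "la", "les", "Le", " il ", "elle", "de", "du", "des", "la"]

# The default values listed in the post; only stopwords_additions changes.
DEFAULT_PARAMS = {
    "bm25_b": 0.75,
    "bm25_k1": 1.2,
    "cleanup_interval_seconds": 60,
    "index_timestamps": False,
    "index_property_length": False,
    "index_null_state": False,
    "stopwords_preset": None,
    "stopwords_additions": None,
    "stopwords_removals": None,
}


def normalize_stopwords(words):
    """Strip and lowercase each word, drop empty strings and duplicates,
    and keep the first-seen order."""
    seen = set()
    result = []
    for word in words:
        cleaned = word.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def build_inverted_index_params(stopwords):
    """Hypothetical helper: copy the defaults and fill in stopwords_additions."""
    params = dict(DEFAULT_PARAMS)  # shallow copy so DEFAULT_PARAMS stays untouched
    params["stopwords_additions"] = normalize_stopwords(stopwords)
    return params


params = build_inverted_index_params(FRENCH_STOPWORDS)
print(params["stopwords_additions"])
# ['le', 'la', 'les', 'il', 'elle', 'de', 'du', 'des']
```

The resulting dict can then be unpacked into wvc.Configure.inverted_index(**params) as in the post. Lowercasing assumes the properties use a tokenization that lowercases text (such as Weaviate's word tokenization); adjust this step if yours does not.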

Then create the collection, passing params to the function:

collection = client.collections.create(
    name=collection_name,
    vectorizer_config=vectorizer,
    generative_config=wvc.Configure.Generative.openai(),
    inverted_index_config=wvc.Configure.inverted_index(**params),
    properties=list_of_properties,  # placeholder: your property definitions
)

However, this does not set stopwords_preset to None, even though None appears to be a valid value for stopwords_preset.
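One plausible reading of this behavior, stated as an assumption rather than confirmed client behavior: in the Python client, None usually means "not specified", so the server keeps its default preset ("en") and the French additions are layered on top of the English list. The sketch below models that with plain sets. The preset contents are a tiny hypothetical excerpt (Weaviate's real "en" list is much longer), and effective_stopwords is a hypothetical helper, not part of the client. Weaviate's stopword preset also accepts "none"; passing the client's StopwordsPreset NONE member instead of Python None is likely what disables the English list (its import path differs between client versions, so check yours).

```python
# Hypothetical, heavily truncated stand-ins for the server-side presets.
# Weaviate's real "en" preset is much longer; these are illustrative only.
PRESETS = {
    "en": {"a", "the", "and", "of"},
    "none": set(),
}

# Assumption: the preset the server falls back to when none is specified.
SERVER_DEFAULT_PRESET = "en"


def effective_stopwords(preset=None, additions=None, removals=None):
    """Simplified model: start from the preset (the server default if None),
    add the additions, then drop the removals."""
    chosen = preset if preset is not None else SERVER_DEFAULT_PRESET
    words = set(PRESETS[chosen])
    words |= set(additions or [])
    words -= set(removals or [])
    return words


french = ["le", "la", "il", "elle"]

# preset=None: English words stay active alongside the French additions.
print(sorted(effective_stopwords(None, french)))
# ['a', 'and', 'elle', 'il', 'la', 'le', 'of', 'the']

# preset="none": only the French additions remain.
print(sorted(effective_stopwords("none", french)))
# ['elle', 'il', 'la', 'le']
```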

You can also create the collection without specifying the stopwords and update the stopwords setting later with:

collection.config.update(
    # Note, use Reconfigure here (not Configure)
    inverted_index_config=wvc.Reconfigure.inverted_index(
        stopwords_additions=["le", "la", "il", "elle"]
    )
)

I wrote a short blog post summarizing everything:

Adding French stopwords to a Weaviate collection with Python V4 API


Hi @alexisperrier !!

Wow! Thanks a lot for writing and sharing your blog post! Glad you figured it out!

We should be updating the docs soon.

Thanks!