We have a Weaviate collection that contains embeddings in multiple languages, with a `language` metadata property that we use to filter them.
The languages include "en", "it", "de", etc.
When creating the collection, we did not set anything special in `inverted_index_config`, so the default `en` stopwords preset is in place.
The problem is with stopwords. When I create a language filter for Italian:
```python
filters = wq.Filter.by_property("language").equal("it")
```
I always get an error from `query.near_text`:
```
Exception has occurred: WeaviateQueryError
Query call with protocol GRPC search failed with message explorer: get class: concurrentTargetVectorSearch): explorer: get class: vector search: object vector search at index knowledgebase_customer_dev: shard knowledgebase_customer_dev_acj1FiCmxIOF: build inverted filter allow list: invalid search term, only stopwords provided. Stopwords can be configured in class.invertedIndexConfig.stopwords.
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "explorer: get class: concurrentTargetVectorSearch): explorer: get class: vector search: object vector search at index knowledgebase_customer_dev: shard knowledgebase_customer_dev_acj1FiCmxIOF: build inverted filter allow list: invalid search term, only stopwords provided. Stopwords can be configured in class.invertedIndexConfig.stopwords"
    debug_error_string = "UNKNOWN:Error received from peer {created_time:"2025-06-19T14:24:02.414391+02:00", grpc_status:2, grpc_message:"explorer: get class: concurrentTargetVectorSearch): explorer: get class: vector search: object vector search at index knowledgebase_customer_dev: shard knowledgebase_customer_dev_acj1FiCmxIOF: build inverted filter allow list: invalid search term, only stopwords provided. Stopwords can be configured in class.invertedIndexConfig.stopwords"}"
```
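A simplified pure-Python model of what the error message implies: for a text property with `word` tokenization, the filter value is lowercased and split into words, words on the stopword list are dropped, and the filter is rejected when nothing is left. The tokenizer, the small `EN_STOPWORDS` subset, and the helper names are illustrative assumptions, not Weaviate's actual code or its full `en` list.

```python
import re

# Illustrative subset of an English stopword list (assumption, not Weaviate's full "en" preset).
EN_STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}


def word_tokenize(value: str) -> list[str]:
    """Approximate `word` tokenization: lowercase, keep runs of letters/digits."""
    return re.findall(r"[^\W_]+", value.lower())


def build_filter_tokens(value: str, stopwords: set[str]) -> list[str]:
    """Hypothetical helper: drop stopwords, fail if only stopwords were provided."""
    tokens = [t for t in word_tokenize(value) if t not in stopwords]
    if not tokens:
        raise ValueError("invalid search term, only stopwords provided")
    return tokens


print(build_filter_tokens("de", EN_STOPWORDS))  # ['de']
try:
    build_filter_tokens("it", EN_STOPWORDS)
except ValueError as err:
    print(err)  # invalid search term, only stopwords provided
```

In this model the language code "it" collides with the English pronoun "it", while "de" passes through untouched.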
After checking some forums, I added "it" to the stopword removals with this function:
```python
from typing import List, Optional

import weaviate
from weaviate.classes.config import Reconfigure


def update_stopwords_in_collection(
    collection: weaviate.collections.Collection,
    stopwords_additions: Optional[List[str]] = None,
    stopwords_removals: Optional[List[str]] = None,
) -> None:
    """
    Update stopwords in a Weaviate collection.

    Args:
        collection: The Weaviate collection object to update
        stopwords_additions: List of stopwords to add (remove from indexing)
        stopwords_removals: List of stopwords to remove (allow indexing)
    """
    print(f"Updating stopwords for collection: {collection.name}")
    # Normalize None to empty lists before sending the update
    if not stopwords_removals:
        stopwords_removals = []
    if not stopwords_additions:
        stopwords_additions = []
    try:
        config = collection.config.get()
        print(f"Collection config pre-update: {config.inverted_index_config.stopwords}")
        collection.config.update(
            inverted_index_config=Reconfigure.inverted_index(
                stopwords_additions=stopwords_additions,
                stopwords_removals=stopwords_removals,
            )
        )
        # Re-read the config to confirm the change was stored
        config = collection.config.get()
        print(
            f"Collection config post-update: {config.inverted_index_config.stopwords}"
        )
    except Exception as e:
        print(f"Error updating stopwords: {e}")
        raise
```
Even though the update appears to be successful (the post-update collection config shows the expected stopwords), filtering on language "it" still causes the same error. This is what the function prints after the update:
```
Collection config post-update: _StopwordsConfig(preset=<StopwordsPreset.EN: 'en'>, additions=, removals=['it'])
```
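Assuming `removals` simply subtracts words from the preset (our reading of the `additions`/`removals` options, modeled below in pure Python with an illustrative subset of the `en` list), the config printed above should no longer treat "it" as a stopword, which is why the unchanged error is surprising. `EN_PRESET` and `effective_stopwords` are hypothetical names used only for this sketch.

```python
# Illustrative subset of the "en" preset (assumption, not Weaviate's full list).
EN_PRESET = frozenset({"a", "an", "and", "is", "it", "of", "the", "to"})


def effective_stopwords(preset, additions=None, removals=None):
    """Model of the effective list: preset plus additions, minus removals."""
    return (set(preset) | set(additions or [])) - set(removals or [])


before = effective_stopwords(EN_PRESET)
after = effective_stopwords(EN_PRESET, additions=[], removals=["it"])
print("it" in before)  # True
print("it" in after)   # False
```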
How should I approach this? And why are stopwords applied at all when filtering on a metadata property?
Server Setup Information
Weaviate Database version: 1.27.27 (hosted on Weaviate Cloud)
Code we use for querying:
```python
from typing import Any, Dict, List, Optional

import weaviate
import weaviate.classes.query as wq
from weaviate.exceptions import UnexpectedStatusCodeException


def query_weaviate_chunks(
    collection: weaviate.collections.Collection,
    query: str,
    filters: Optional[wq.Filter],
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """
    Query the Weaviate collection with near_text and return matching chunks.

    Parameters:
    - collection: The Weaviate collection object.
    - query: A string to query against.
    - filters: A valid Weaviate filter that prefilters the data before the vector search, or None.
    - limit: The maximum number of results to return (default is 5).

    Returns:
    - List of matching objects with their properties, metadata distance, and UUID.
    """
    try:
        response = collection.query.near_text(
            query=query,
            limit=limit,
            return_metadata=wq.MetadataQuery(distance=True),
            filters=filters,
            target_vector=["content_vector"],
        )
        results = []
        for o in response.objects:
            results.append(
                {
                    "properties": o.properties,
                    "distance": o.metadata.distance,
                    "uuid": o.uuid,
                }
            )
    except UnexpectedStatusCodeException as err:
        print(err)
        return []
    return results


def get_data_from_knowledge_base_w_language_filter(
    collection: weaviate.collections.Collection,
    user_query: str,
    n: int = 5,
    language: Optional[str] = None,
) -> List[str]:
    """
    Returns the n chunks most related to the user query.

    Args:
        collection: The Weaviate collection object
        user_query: The search query
        n: Number of results to return
        language: Language filter for the search (no filter if None)

    Returns:
        List of content chunks
    """
    # Only build a filter when a language is given; equal(None) is not a valid filter
    filters = wq.Filter.by_property("language").equal(language) if language else None
    weaviate_result = query_weaviate_chunks(collection, user_query, filters, limit=n)
    chunks = [result["properties"]["content"] for result in weaviate_result]
    print(f"Found these text chunks: {chunks}")
    return chunks
```