Hello,
at the following link, there is a nice description of how the texts to be vectorized are constructed: Collection schema | Weaviate - Vector Database
The description says nothing about numbers, but the example suggests that numbers are removed before vectorization:
Article = {
  summary: "Cows lose their jobs as milk prices drop",
  text: "As his 100 diary cows lumbered over for their Monday..."
}
will be vectorized as:
article cows lose their jobs as milk prices drop as his diary cows lumbered over for their monday...
Is it correct that numbers are removed, or is this a mistake in the documentation's example? Or is the removal of numbers somehow a side effect of the lowercase conversion, since digits have no lowercase form?
Follow-up question: what about other characters that have no case, e.g. punctuation or Chinese, Japanese and Korean characters?
Thanks for any help and best regards!
Hi @AccessPointAI !!
Thanks for pointing this out. I believe the docs are not correct here.
I just ran a test:
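import weaviate
from weaviate import classes as wvc

# Assumption: a local Weaviate instance; the key is only a placeholder,
# since requests go to the capture URL below rather than to OpenAI.
client = weaviate.connect_to_local(headers={"X-OpenAI-Api-Key": "placeholder"})

collection = client.collections.create(
    "Article",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        # Send embedding requests to a request-capture URL instead of OpenAI
        base_url="https://webhook.site/e0fd3f7d-a00b-4f8b-979f-2c67ca57d8f8"
    ),
)

# The insert triggers vectorization, so the request body shows up at base_url;
# the insert itself may error, since the capture URL returns no embedding.
collection.data.insert(
    {
        "summary": "Cows lose their jobs as milk prices drop",
        "text": "As his 100 diary cows lumbered over for their Monday...",
    }
)

client.close()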
So I am basically pointing the vectorizer to a different endpoint than the default one (a webhook.site URL) in order to capture the payload Weaviate sends.
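If you would rather not send payloads to a third-party service like webhook.site, a minimal local capture endpoint could look like the sketch below. It uses only Python's standard library; the host, port, the names `captured`, `CaptureHandler` and `start_capture_server`, and the dummy OpenAI-style embeddings reply are my own assumptions, not anything Weaviate requires. Weaviate must be able to reach whatever address you pass as `base_url` (when Weaviate runs in Docker, that is often something like http://host.docker.internal:8000).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Every request received so far, as (request path, parsed JSON body)
captured = []


class CaptureHandler(BaseHTTPRequestHandler):
    """Records each POST body and answers with a dummy embeddings response."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        captured.append((self.path, body))
        # ensure_ascii=False keeps CJK text readable instead of \uXXXX escapes
        print(self.path, json.dumps(body, ensure_ascii=False, indent=2))

        # Assumption: reply in the OpenAI embeddings response shape with a
        # dummy 3-dimensional vector per input, so the caller gets valid JSON.
        inputs = body.get("input", [])
        if isinstance(inputs, str):
            inputs = [inputs]
        reply = {
            "object": "list",
            "data": [
                {"object": "embedding", "index": i, "embedding": [0.0, 0.0, 1.0]}
                for i in range(len(inputs))
            ],
            "model": body.get("model", ""),
            "usage": {"prompt_tokens": 0, "total_tokens": 0},
        }
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def start_capture_server(host="0.0.0.0", port=8000):
    """Serve CaptureHandler on a background daemon thread and return the server."""
    server = HTTPServer((host, port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_capture_server()` in a Python session, point `base_url` at it, insert an object, then inspect `captured`. The printed path also shows which endpoint the vectorizer actually requested.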
And this was the payload I got:
{
  "input": [
    "article cows lose their jobs as milk prices drop as his 100 diary cows lumbered over for their monday..."
  ],
  "model": "text-embedding-ada-002"
}
So no, numbers are not removed when your objects are vectorized.
I will fix that in the docs.
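The construction suggested by the captured payload can be sketched as follows. `build_vectorization_text` is a hypothetical helper, not Weaviate's implementation: it assumes the lowercased collection name comes first, followed by the text properties in alphabetical order of property name (which matches summary-before-text in this example), each lowercased and joined with single spaces. Nothing is stripped, so the digits survive.

```python
def build_vectorization_text(collection_name, properties, vectorize_collection_name=True):
    """Approximate the string a text2vec module receives for one object.

    Assumptions (not Weaviate's actual code): text properties are taken in
    alphabetical order of property name, each value is lowercased, and the
    pieces are joined with single spaces. Non-string values are skipped.
    """
    parts = []
    if vectorize_collection_name:
        parts.append(collection_name.lower())
    for name in sorted(properties):
        value = properties[name]
        if isinstance(value, str):
            parts.append(value.lower())
    return " ".join(parts)


article = {
    "summary": "Cows lose their jobs as milk prices drop",
    "text": "As his 100 diary cows lumbered over for their Monday...",
}
print(build_vectorization_text("Article", article))
# → article cows lose their jobs as milk prices drop as his 100 diary cows lumbered over for their monday...
```

The output matches the captured `input` string above, including the "100".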
Regarding your second question: using the same trick to capture the payload, this time with the summary and text translated into Chinese, this is what I got:
{
  "input": [
    "article 随着牛奶价格下跌,奶牛失去了工作 当他的 100 头奶牛在星期一缓慢地过来时……"
  ],
  "model": "text-embedding-ada-002"
}
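The payload is consistent with ordinary Unicode lowercasing, which only changes characters that have a case: the Chinese characters, the punctuation and the number all come through unchanged. Python's `str.lower()` illustrates this below; Weaviate itself is written in Go, so this is an analogy, not its actual code path.

```python
# Lowercasing only maps characters that have an uppercase/lowercase pair;
# digits, punctuation and CJK characters are returned unchanged.
samples = [
    "As his 100 diary cows lumbered over for their Monday...",
    "随着牛奶价格下跌,奶牛失去了工作",
    "当他的 100 头奶牛在星期一缓慢地过来时……",
    "Hello, World! 123",
]

for text in samples:
    lowered = text.lower()
    # Report whether lowercasing changed the string at all
    print(f"{text!r} -> {lowered!r} (changed: {lowered != text})")
```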
Let me know if this helps.
Thanks!
Great, thanks for clarifying this!