Numbers in text to be vectorized

Hello,

At the following link there is a nice description of how the texts to be vectorized are constructed: Collection schema | Weaviate - Vector Database

The description says nothing about numbers, but the example suggests that numbers are removed before vectorization:

Article = {
  summary: "Cows lose their jobs as milk prices drop",
  text: "As his 100 diary cows lumbered over for their Monday..."
}

will be vectorized as:

article cows lose their jobs as milk prices drop as his diary cows lumbered over for their monday...

Are numbers really removed, or is this a mistake in the documentation example? Or is the removal of numbers somehow a side effect of the lowercase conversion, since digits have no lowercase form?

Follow-up question: what about other characters that have no case, e.g. punctuation or Chinese, Japanese, and Korean characters?

Thanks for any help and best regards!

Hi @AccessPointAI !!

Thanks for pointing this out. I believe the doc is not correct :see_no_evil:

I have just run a test:

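import weaviate
from weaviate import classes as wvc

# Assumes a local Weaviate instance (client creation was not shown originally)
client = weaviate.connect_to_local()

try:
    collection = client.collections.create(
        "Article",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
            # Send vectorizer requests to a capture endpoint instead of OpenAI
            base_url="https://webhook.site/e0fd3f7d-a00b-4f8b-979f-2c67ca57d8f8"
        )
    )
    collection.data.insert(
        {
            "summary": "Cows lose their jobs as milk prices drop",
            "text": "As his 100 diary cows lumbered over for their Monday..."
        }
    )
finally:
    client.close()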

Basically, I point the vectorizer at a different endpoint than the default so I can capture the payload it sends.
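If you prefer not to send the payload to a third-party service like webhook.site, a small local HTTP server can capture it instead. This is a minimal sketch using only the Python standard library; the function name `start_capture_server` and the port are hypothetical choices, not part of Weaviate. It records and prints the JSON body of every POST and replies with an empty JSON object, so it returns no embeddings.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Every captured request body ends up in this list
captured = []


class CaptureHandler(BaseHTTPRequestHandler):
    """Records the JSON body of each POST request and answers with {}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        captured.append(payload)
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable
        print(json.dumps(payload, ensure_ascii=False, indent=2))

        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request log line


def start_capture_server(port=8000):
    """Serve CaptureHandler in a background thread and return the server."""
    server = HTTPServer(("0.0.0.0", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

You would then pass something like `base_url="http://host.docker.internal:8000"` to `text2vec_openai`, assuming Weaviate runs in Docker and the script runs on the host. Because the handler returns no embedding, the insert itself will most likely fail, but the payload is captured and printed before that.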

And this was the payload I got:

{
  "input": [
    "article cows lose their jobs as milk prices drop as his 100 diary cows lumbered over for their monday..."
  ],
  "model": "text-embedding-ada-002"
}

So no, numbers are not removed when your objects are vectorized.
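The captured payload is consistent with a simple construction: the class name followed by each property value, joined with spaces and lowercased. Below is a minimal Python sketch under that assumption. `build_vectorization_text` is a hypothetical helper, not Weaviate's actual implementation; it assumes the class name is vectorized, property names are not, and properties are taken in the order given.

```python
def build_vectorization_text(class_name, properties):
    """Hypothetical reconstruction of the text sent to the vectorizer."""
    parts = [class_name] + [str(value) for value in properties.values()]
    # lower() changes only cased letters; digits such as 100 are kept
    return " ".join(parts).lower()


article = {
    "summary": "Cows lose their jobs as milk prices drop",
    "text": "As his 100 diary cows lumbered over for their Monday...",
}
print(build_vectorization_text("Article", article))
# article cows lose their jobs as milk prices drop as his 100 diary cows lumbered over for their monday...
```

The printed string matches the `input` field of the captured payload, number included.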

I will fix that in the docs :slight_smile:

Regarding your second question: I used the same trick to capture the payload for a Chinese version of the same object and got this:

{
  "input": [
    "article 随着牛奶价格下跌,奶牛失去了工作 当他的 100 头奶牛在星期一缓慢地过来时……"
  ],
  "model": "text-embedding-ada-002"
}
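Lowercasing does not remove characters that have no case; it only maps cased letters. The sketch below uses Python's `str.lower()` as a stand-in for the vectorizer's lowercasing step. Assuming Weaviate's own lowercasing behaves the same way for these characters, which is what the captured payloads suggest, digits, punctuation and CJK characters all pass through unchanged.

```python
# Python's str.lower() as a stand-in for the vectorizer's lowercasing step
samples = {
    "digits": "As his 100 diary cows",
    "punctuation": "Hello, World! ...",
    "chinese": "随着牛奶价格下跌,奶牛失去了工作",
}

for name, text in samples.items():
    # Only cased letters change; caseless characters are left untouched
    print(f"{name}: {text.lower()}")

# digits: as his 100 diary cows
# punctuation: hello, world! ...
# chinese: 随着牛奶价格下跌,奶牛失去了工作
```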

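So the Chinese text (the same summary and text, translated) reaches the vectorizer unchanged: the full-width comma, the ellipsis and the number 100 are all kept, and only the lowercase class name "article" is prepended. Let me know if this helps.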

Thanks!

Thanks! PR created:


Great, thanks for clarifying this!