What is the best way to create schema and index for my business use case?

Hi,
I'm new to Weaviate and GenAI. I followed the QuickStart guide and am trying to build a proof of concept. My industry is manufacturer sourcing. I have a data file of 47 manufacturers. The data contains the business name, business description, products they manufacture, certifications, and countries they export to. I want to be able to query using a combination of the properties. Example queries could be:

  1. Manufacturers that produce men’s shoes and export to the USA.
  2. Agriculture companies that produce spices and have ISO certification.
  3. Leather bag manufacturers that have European certifications.

I am getting irrelevant results because, I think, Weaviate is embedding the list of countries as one unit and the list of certifications as one unit.

An example record from the data set is below.

{
  "Business Name": "Random Company Name",
  "Business Description": "Established in the year 2017, in New Delhi, India, Random Company Name is one of the distinguished manufacturers, suppliers, 
  and exporters of LEATHER GOODS & ACCESSORIES. These are appreciated by domestic and international clients owing to 
  features like perfect finish, shine, color, high tear strength, and glossy appearance. Our organization is supported by 
  a team of numerous adroit professionals who give their best efforts for taking us to new heights of success. 
  We use quality leather like cow, buff, sheep, goat, and NDM in the manufacturing unit to ensure that the end products are flawless in quality. 
  The products are available in various sizes, designs, and patterns. Further, these can be customized as per the demands of our valued customers.",
  "Products": "Leather - Wallets, Sling Bags, Fanny Packs, Back Packs, Coin Pouches, Hand Pouches, and Toiletry Bags.\nHandloom Cushions and Covers\nHandicraft Product's ",
  "Certifications": "ISO 9001:2015 Certified Company, ISO 14001 Certified Company",
  "What countries are you currently exporting?": "UK, USA, Canada, Australia and Europe."
},

Hi Mohan,

Sorry for the late reply.
I am sure you found a solution elsewhere, but I want to add a solution here, in case someone else has the same question.

Weaviate allows you to skip properties you don’t want to vectorise.
To do that, you need to define the property schema when you create a new collection. You can see an example of it in the docs.

Here is a preview of a Python example, using the latest Python Client (v4) – see the skip_vectorization=True part.

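import weaviate
import weaviate.classes.config as wc

# Assumes a Weaviate instance running locally; adjust the connection to your setup.
client = weaviate.connect_to_local()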

client.collections.create(
    "Article",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_huggingface(),

    properties=[
        wc.Property(
            name="title",
            data_type=wc.DataType.TEXT,
            vectorize_property_name=True,
        ),
        wc.Property(
            name="body",
            data_type=wc.DataType.TEXT,
            skip_vectorization=True,  # <=== here is how you skip the property
        ),
    ]
)
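For the manufacturer data in the question, skipping vectorisation is only half of the picture: the countries and certifications arrive as free-text strings, so it may help to split them into lists before import. Below is a minimal sketch of that preprocessing step using only the standard library. The helper names (split_list, to_weaviate_object) and the target property names (name, description, products, certifications, export_countries) are hypothetical, chosen for illustration, and the splitting rule is a simple assumption that will mis-split names containing a standalone "and" (e.g. "Trinidad and Tobago").

```python
import re


def split_list(raw: str) -> list[str]:
    """Split a free-text list like 'UK, USA, Canada, Australia and Europe.' into items."""
    # Treat commas and the standalone word "and" as separators.
    parts = re.split(r",|\band\b", raw)
    # Trim whitespace, newlines and trailing periods; drop empty fragments.
    cleaned = [p.strip(" .\n") for p in parts]
    return [p for p in cleaned if p]


def to_weaviate_object(record: dict) -> dict:
    """Map one raw record to a dict with hypothetical property names."""
    return {
        "name": record["Business Name"],
        "description": record["Business Description"],
        "products": record["Products"],
        # Lists, so each certification / country is a separate value.
        "certifications": split_list(record["Certifications"]),
        "export_countries": split_list(
            record["What countries are you currently exporting?"]
        ),
    }
```

The resulting lists could then be stored in text array properties with skip_vectorization=True and matched with filters at query time, leaving the description and products to drive the semantic part of the search. Whether that split suits your data is a judgment call; this is one possible layout, not the only one.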

Named Vectors

We are working on a new feature in Weaviate called Named Vectors.
It will allow you to use multiple vectors per object.
More relevant to this post, it will also let you select the properties you want to vectorise, instead of skipping those you don’t.

We plan to release the first instalment of Named Vectors with Weaviate 1.24
(planned for around 22 Feb 2024).

For anyone curious, here is a GitHub issue.