V4 client with custom vectorizer question

rjalex · February 12, 2024, 3:34pm

As V4 is very new, I am struggling a bit to understand the details from various videos and tutorials. I am learning with a class/collection that has three properties:

kicker: this holds a paragraph of text that I want to vectorize with my custom model.
author: a string of names; I wish to tokenize and index it so I can search for an author's name.
slug: this is an article id; I need neither to vectorize it nor to run text searches on it.

The following is the code I am using to create the collection/class:

import weaviate
import weaviate.classes.config as wc

with weaviate.connect_to_local(  # connects and implicitly closes the client at the end
    host = "localhost",
    port = 8077,
    headers = {
        "X-OpenAI-Api-Key": openai_key,  #  for generative queries
    }
    )  as client:
    client.collections.delete(schema_name) 
    client.collections.create(
        schema_name,
        description="A class to store articles with a semantic kicker and searchable author.",
        vectorizer_config=None,
        generative_config=wc.Configure.Generative.openai(),
        inverted_index_config=wc.Configure.inverted_index(
            index_property_length=True
        ),
        vector_index_config=wc.Configure.VectorIndex.hnsw(
            distance_metric=wc.VectorDistances.COSINE
        ),
        properties=[
            wc.Property(name="kicker", data_type=wc.DataType.TEXT, skip_vectorization=True),
            wc.Property(name="slug", data_type=wc.DataType.TEXT, skip_vectorization=True), 
            wc.Property(name="author", data_type=wc.DataType.TEXT,skip_vectorization=True)
        ]
    )
    print(f"Successfully created the {schema_name} schema.")
    articles = client.collections.get(schema_name)
    response = articles.aggregate.over_all(
        total_count=True
    )
    print(f"We have {response.total_count} in the {schema_name} collection")

Is it right that I declare skip_vectorization like this, since I am providing a vector manually at insertion time?

How do I declare that author is the only field on which I will do BM25 searches?

Thanks

DudaNogueira · February 12, 2024, 5:56pm

Hi!

As you are providing the vector yourself, and you don’t have a vectorizer for that class, the skip_vectorization parameter becomes useless.
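
To make that concrete: when the vector travels with the object, the server simply stores it, so there is nothing for a per-property vectorization flag to act on. Below is a minimal sketch of the object body as the REST endpoint for creating objects accepts it (the v4 Python client's `data.insert(properties=..., vector=...)` carries the same information). The collection name `Article`, the sample values and the helper `build_object_payload` are assumptions made up for illustration.

```python
import json


def build_object_payload(collection_name, properties, vector):
    """Build the JSON body for one object that carries its own vector.

    Hypothetical helper: it only assembles a dict, it does not talk to a server.
    """
    return {
        "class": collection_name,                  # target collection
        "properties": properties,                  # plain property values
        "vector": [float(x) for x in vector],      # embedding from your own model
    }


payload = build_object_payload(
    "Article",  # assumed collection name
    {"kicker": "A short teaser paragraph", "author": "Jane Doe", "slug": "a-123"},
    [0.12, -0.03, 0.27],  # toy vector; a real one has your model's dimension
)
print(json.dumps(payload, indent=2))
```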

When performing a BM25 search, you can not only specify which properties to search but also weight them:

For example:

    collection = client.collections.get(schema_name)
    response = collection.query.bm25(
        query="Pelé",
        query_properties=["kicker^2", "author"],
        limit=3
    )

    for o in response.objects:
        print(o.properties)
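
On the schema side, whether a text property is available to BM25 at all is governed by its inverted-index flags. The sketch below writes the three properties as plain dicts in the shape of the REST schema (`indexSearchable`, `indexFilterable`, `tokenization`), assumed here to correspond to `index_searchable`, `index_filterable` and `tokenization` on `Property` in the v4 Python client. The helper `text_property` is hypothetical, and making `slug` filterable is an assumption about how the id would be looked up.

```python
def text_property(name, searchable, filterable, tokenization="word"):
    """Describe one text property and its inverted-index settings as a dict."""
    return {
        "name": name,
        "dataType": ["text"],
        "indexSearchable": searchable,   # True: property takes part in BM25
        "indexFilterable": filterable,   # True: property can be used in filters
        "tokenization": tokenization,
    }


properties = [
    text_property("kicker", searchable=False, filterable=False),  # vector search only
    text_property("author", searchable=True, filterable=False),   # the only BM25 field
    text_property("slug", searchable=False, filterable=True),     # id lookups via filters
]

# Properties a BM25 query could name under this configuration.
bm25_fields = [p["name"] for p in properties if p["indexSearchable"]]
print(bm25_fields)
```

With a configuration like this, a BM25 query would list only `author` in `query_properties`.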

Let me know if that helps

rjalex · February 13, 2024, 3:01pm

Edson Arantes do Nascimento is a planetary myth and I do not need Weaviate to find him. He’s always in our hearts!!!
More seriously, I will have to find the right V4 way of declaring which fields I want indexed for BM25 searches.
If I understand your example correctly, the “^2” is a weight, right? Can you point me to the new V4 BM25 query syntax so I can grasp it better?
Muito obrigado.

DudaNogueira · February 14, 2024, 10:55am

Mestre Pelé

The syntax is in the doc:

Note that you have tabs for different languages.
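
As a rough illustration of the same search in the GraphQL form, here is a small sketch that builds the query string for a BM25 search restricted to chosen properties; a suffix such as `author^2` in the property list is the weight. The collection name `Article` and the helper `bm25_graphql` are assumptions for illustration, and the function only builds text, it does not send anything.

```python
import json


def bm25_graphql(collection, query, search_props, return_props, limit=3):
    """Return a GraphQL Get query that runs BM25 over the listed properties only."""
    args = (
        f"bm25: {{query: {json.dumps(query, ensure_ascii=False)}, "
        f"properties: {json.dumps(search_props)}}}, limit: {limit}"
    )
    fields = " ".join(return_props)  # properties to return for each hit
    return f"{{ Get {{ {collection}({args}) {{ {fields} }} }} }}"


# Search only the author property and return author and slug.
print(bm25_graphql("Article", "Pelé", ["author"], ["author", "slug"]))
```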

rjalex · February 14, 2024, 1:14pm

Thank you @DudaNogueira. The docs you link to are very clear, but my problems probably arise from mismatches between the way I am setting up the server, instantiating the client, declaring the V4 collection and writing the queries. It might be that each piece is correct on its own but that together they hide some problem.
