As the v4 Python client is very new, I am struggling to piece together the details from various videos and tutorials. I am experimenting with a class/collection that has three properties:
- kicker: this holds a paragraph of text and I want to vectorize it with my custom model
- author: a string of names; I want to tokenize and index it so that I can search for author names by term.
- slug: this is an article ID; I need neither to vectorize it nor to run text searches on it.
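To make the goal for author concrete, here is a minimal plain-Python sketch of the behavior I am after. It is not Weaviate code and not how Weaviate implements BM25 (there is no relevance scoring here, only a count of matching terms); the function names and the sample articles are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_author_index(articles):
    """Map each author token to the set of slugs whose author field contains it."""
    index = defaultdict(set)
    for article in articles:
        for token in tokenize(article["author"]):
            index[token].add(article["slug"])
    return index


def search_author(index, query):
    """Return slugs matching any query token, most matching tokens first."""
    hits = defaultdict(int)
    for token in tokenize(query):
        for slug in index.get(token, ()):
            hits[slug] += 1
    return sorted(hits, key=lambda slug: (-hits[slug], slug))


# Hypothetical sample data; only author is indexed, slug is just returned.
articles = [
    {"slug": "article-1", "author": "Jane Doe, John Smith"},
    {"slug": "article-2", "author": "Jane Roe"},
]
index = build_author_index(articles)
print(search_author(index, "Jane Doe"))  # ['article-1', 'article-2']
```

In other words, a search for one or more name terms should only ever look at author, never at kicker or slug.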
The following is the code I am using to create the collection/class:
# Assumed imports (the `wc` alias refers to the v4 config classes);
# openai_key and schema_name are defined elsewhere in my script.
import weaviate
import weaviate.classes.config as wc

with weaviate.connect_to_local(  # connects, then implicitly closes at the end of the block
    host="localhost",
    port=8077,
    headers={
        "X-OpenAI-Api-Key": openai_key,  # for generative queries
    },
) as client:
    # Start from a clean slate, then create the collection.
    client.collections.delete(schema_name)
    client.collections.create(
        schema_name,
        description="A class to store articles with a semantic kicker and searchable author.",
        vectorizer_config=None,  # vectors are supplied manually at insertion time
        generative_config=wc.Configure.Generative.openai(),
        inverted_index_config=wc.Configure.inverted_index(
            index_property_length=True
        ),
        vector_index_config=wc.Configure.VectorIndex.hnsw(
            distance_metric=wc.VectorDistances.COSINE
        ),
        properties=[
            wc.Property(name="kicker", data_type=wc.DataType.TEXT, skip_vectorization=True),
            wc.Property(name="slug", data_type=wc.DataType.TEXT, skip_vectorization=True),
            wc.Property(name="author", data_type=wc.DataType.TEXT, skip_vectorization=True),
        ],
    )
    print(f"Successfully created the {schema_name} schema.")

    # Sanity check: count the objects in the new collection.
    articles = client.collections.get(schema_name)
    response = articles.aggregate.over_all(total_count=True)
    print(f"We have {response.total_count} objects in the {schema_name} collection")
Is it right to declare skip_vectorization like this, given that I provide the vector manually at insertion time?
How do I declare that author is the only field on which I will do BM25 searches?
Thanks