Description
I want to filter a search by creation_time in Weaviate, which requires the timestamp index to be enabled.
jobss = jps.get_objects(
    filters=Filter.by_creation_time().greater_than(filter_time),
    metadata={'creation_time': True},
    limit=80,
)
This raised the following error:
Timestamps must be indexed to be filterable! Add
`IndexTimestamps: true` to the InvertedIndexConfig in JobProfile"
How can I update the collection definition so that it supports filters on creation_time? I’m using the Weaviate v4 Python client.
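The query above does not show how filter_time is built. A minimal sketch, assuming it is meant to be a timezone-aware datetime marking the earliest creation time of interest (the helper name creation_cutoff and the 7-day window are hypothetical, not from the original post):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def creation_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Return a timezone-aware datetime `days` days before `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


# Hypothetical value: objects created within the last 7 days.
filter_time = creation_cutoff(days=7)
```

Using an explicit UTC timezone avoids ambiguity about how a naive datetime would be interpreted.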
Server Setup Information
- Weaviate Server Version: 1.25.3
- Deployment Method: docker
- Client Language and Version: Python and version 4.6.5
Hi @abdimussa,
Have you enabled indexTimestamps?
Example:
import weaviate.classes.config as wcc

# Assumes `client` is an already-connected Weaviate v4 client.
client.collections.create(
    name="Test_time",
    vectorizer_config=wcc.Configure.Vectorizer.text2vec_openai(),
    # Index creation/update timestamps so they can be used in filters.
    inverted_index_config=wcc.Configure.inverted_index(
        index_timestamps=True
    ),
    properties=[
        wcc.Property(
            name="question",
            data_type=wcc.DataType.TEXT,
            tokenization=wcc.Tokenization.WORD,
        ),
        wcc.Property(
            name="answer",
            data_type=wcc.DataType.TEXT,
            tokenization=wcc.Tokenization.FIELD,
        ),
    ],
)
Hi @Mohamed_Shahin. I didn’t enable it when I created the collection. I’d like to update the existing collection. Is there a way to do that?
Good evening @abdimussa,
You will need to re-index the collection with:
inverted_index_config=wcc.Configure.inverted_index(
    index_timestamps=True
)
index_timestamps is an immutable parameter: it must be set when the collection is created.
Furthermore, I would like to share that we have a mutability list available here:
Some parameters are mutable after you create your collection.
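Before recreating anything, it may help to confirm whether a given collection already has timestamp indexing enabled. A minimal sketch, assuming a v4 collection object whose config.get() result exposes inverted_index_config.index_timestamps (the helper name timestamps_indexed is hypothetical):

```python
from typing import Any


def timestamps_indexed(collection: Any) -> bool:
    """Return True if the collection's inverted index covers timestamps."""
    # Read the collection's current configuration back from the server.
    config = collection.config.get()
    return bool(config.inverted_index_config.index_timestamps)


# Hypothetical usage with a connected v4 client:
# needs_recreate = not timestamps_indexed(client.collections.get("JobProfile"))
```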
Good evening @Mohamed_Shahin, so this means I’ll need to recreate the collection?
@abdimussa yes that’s correct.
@Mohamed_Shahin ok thank you. I’d appreciate it if you can share a resource on how to do that efficiently.
@abdimussa,
Absolutely, I am more than happy to ensure you are well supported. Before sharing some snippets with you to re-use, is there anything specific you’re using or would like to use in your cluster? If so, I can prepare some snippets that cover everything, so you won’t have to face that again.
Otherwise, please make sure you go through the mutability list to familiarize yourself with what can be re-configured after creation.
If you’d like, you can share your schema creation method with me, and I’d be happy to tweak it programmatically for you as well.
@Mohamed_Shahin let’s take the below as an example.
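import weaviate
import weaviate.classes.config as wc
from weaviate.classes.config import DataType, Property
from weaviate.classes.init import AdditionalConfig, Timeout

# valid_url, auth_config and headers are defined elsewhere.
client = weaviate.connect_to_custom(
    http_host=valid_url,
    http_port=443,
    http_secure=True,
    grpc_host=valid_url,
    grpc_port=50051,
    grpc_secure=True,
    auth_credentials=auth_config,
    headers=headers,
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=120),
    ),
    skip_init_checks=True,
)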
client.collections.create(
    "ArticleMetadata",
    properties=[
        Property(name="metadata", data_type=DataType.TEXT),
    ],
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wc.Configure.Generative.openai(),
)
client.collections.create(
    "Article",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    references=[
        wc.ReferenceProperty(
            name="hasMetaData",
            # Must match the name of the collection created above.
            target_collection="ArticleMetadata",
        )
    ],
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wc.Configure.Generative.openai(),
)
@Mohamed_Shahin Thank you for your response. However, I’d like to see more on the data migration part, from my old collection to the new one where index_timestamps is set to true. I’m having trouble migrating the references of each object.
Below is the sample code I have so far:
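from tqdm import tqdm
from weaviate.classes.query import QueryReference
from weaviate.collections import Collection

def migrate_data(collection_src: Collection, collection_tgt: Collection) -> bool:
    # Copy each object's properties, vector and UUID; references are fetched but not yet written.
    with collection_tgt.batch.fixed_size(batch_size=20) as batch:
        for q in tqdm(
            collection_src.iterator(
                include_vector=True,
                return_references=QueryReference(
                    link_on="hasMetaData",
                    return_properties=["metadata"],
                ),
            )
        ):
            batch.add_object(
                properties=q.properties,
                vector=q.vector["default"],
                uuid=q.uuid,
            )
    return True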
hi @abdimussa !!
You can run through all objects using the iterator API, and then migrate the cross-references using this:
questions = client.collections.get("JeopardyQuestion")
questions.data.reference_add(
    from_uuid=question_obj_id,
    from_property="hasCategory",
    to=category_obj_id
)
Let me know if that helps!
Thanks!
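Putting the two pieces from this thread together, the migration could be done in two passes: copy the objects first, then re-create the cross-references with reference_add. The following is only a sketch under several assumptions: the target collection already exists with index_timestamps enabled and the same reference property defined, the referenced objects keep their UUIDs, objects use the "default" vector, and all reference pairs fit in memory. The function names iter_reference_pairs and migrate_with_references are hypothetical, and the caller supplies return_references (for example the QueryReference(link_on="hasMetaData") from the snippet above).

```python
from typing import Any, Iterable, Iterator, Tuple


def iter_reference_pairs(objects: Iterable[Any], link_on: str) -> Iterator[Tuple[Any, Any]]:
    """Yield (from_uuid, to_uuid) for every cross-reference stored under `link_on`."""
    for obj in objects:
        # `references` may be None when the object has no references.
        refs = getattr(obj, "references", None) or {}
        linked = refs.get(link_on)
        if linked is None:
            continue
        for target in linked.objects:
            yield obj.uuid, target.uuid


def migrate_with_references(
    collection_src: Any,
    collection_tgt: Any,
    link_on: str,
    return_references: Any,
) -> int:
    """Copy objects to the target, then re-add their references. Returns the reference count."""
    pairs = []

    # Pass 1: copy properties, vectors and UUIDs, remembering each reference pair.
    with collection_tgt.batch.fixed_size(batch_size=20) as batch:
        for obj in collection_src.iterator(
            include_vector=True,
            return_references=return_references,
        ):
            batch.add_object(
                properties=obj.properties,
                vector=obj.vector["default"],
                uuid=obj.uuid,
            )
            pairs.extend(iter_reference_pairs([obj], link_on))

    # Pass 2: the batch has been flushed, so the source objects now exist in the target.
    for from_uuid, to_uuid in pairs:
        collection_tgt.data.reference_add(
            from_uuid=from_uuid,
            from_property=link_on,
            to=to_uuid,
        )
    return len(pairs)
```

Adding the references only after the batch context has exited means every from_uuid already exists in the target collection when reference_add is called.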
@DudaNogueira, thank you, this is helpful.