Filtering based on creation_time requires index_timestamps

abdimussa · December 4, 2024, 1:09pm

Description

I wanted to filter a search by creation_time in Weaviate, but that requires the timestamp index to be enabled.

jobss = jps.get_objects(
    filters=Filter.by_creation_time().greater_than(filter_time),
    metadata={'creation_time': True},
    limit=80,
)

This produced the following error:

Timestamps must be indexed to be filterable! Add `IndexTimestamps: true` to the InvertedIndexConfig in JobProfile

How can I update the collection definition so that it supports filters on creation_time? I’m using the Weaviate v4 Python client.

Server Setup Information

Weaviate Server Version: 1.25.3
Deployment Method: docker
Client Language and Version: Python and version 4.6.5

Mohamed_Shahin · December 4, 2024, 4:38pm

Hi @abdimussa,

Have you enabled the indexTimestamps?

Example:

import weaviate.classes.config as wcc
import weaviate.classes as wvc

client.collections.create(
    name="Test_time",
    vectorizer_config=wcc.Configure.Vectorizer.text2vec_openai(),
    inverted_index_config=wcc.Configure.inverted_index(
        index_timestamps=True
    ),
    properties=[
        wcc.Property(
            name="question",
            data_type=wcc.DataType.TEXT,
            tokenization=wcc.Tokenization.WORD,
        ),
        wcc.Property(
            name="answer",
            data_type=wcc.DataType.TEXT,
            tokenization=wcc.Tokenization.FIELD,
        ),
    ],
)

abdimussa · December 4, 2024, 5:05pm

Hi @Mohamed_Shahin. I didn’t enable it when I created the collection. I want to update the existing collection. Is there a way to do that?

Mohamed_Shahin · December 5, 2024, 4:04pm

Good evening @abdimussa,

You will need to re-index the collection with:

inverted_index_config=wcc.Configure.inverted_index(
    index_timestamps=True
)

index_timestamps is an immutable parameter: it must be set when the collection is created.

Furthermore, I would like to share that we have a mutability list in the documentation, which shows which parameters are mutable after you create your collection.
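
To check whether an existing collection already has the setting, you could read its configuration back. A small sketch, assuming the v4 client's `collection.config.get()` result exposes `inverted_index_config.index_timestamps` (the helper name `timestamps_indexed` is hypothetical):

```python
def timestamps_indexed(collection) -> bool:
    """Report whether the collection's inverted index covers timestamps."""
    config = collection.config.get()
    return bool(config.inverted_index_config.index_timestamps)
```

For example, `timestamps_indexed(client.collections.get("JobProfile"))` returning `False` would mean the collection has to be recreated before filtering on creation_time.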

abdimussa · December 5, 2024, 5:44pm

Good evening @Mohamed_Shahin, so this means I’ll need to recreate the collection?

Mohamed_Shahin · December 5, 2024, 6:25pm

@abdimussa yes that’s correct.

abdimussa · December 5, 2024, 7:00pm

@Mohamed_Shahin ok thank you. I’d appreciate it if you can share a resource on how to do that efficiently.

Mohamed_Shahin · December 5, 2024, 9:07pm

@abdimussa,

Absolutely, I am more than happy to ensure you are well supported. Before sharing some snippets with you to re-use, is there anything specific you’re using or would like to use in your cluster? If so, I can prepare some snippets that cover everything, so you won’t have to face that again.

Otherwise, please make sure you go through the mutability list to familiarize yourself with what can be re-configured after creation.

If you’d like, you can share your schema creation method with me, and I’d be happy to tweak it programmatically for you as well.

abdimussa · December 6, 2024, 8:55am

@Mohamed_Shahin let’s take the below as an example.

import weaviate
import weaviate.classes.config as wc
from weaviate.classes.config import DataType, Property
from weaviate.classes.init import AdditionalConfig, Timeout

# valid_url, auth_config, and headers are defined elsewhere
client = weaviate.connect_to_custom(
    http_host=valid_url,
    http_port=443,
    http_secure=True,
    grpc_host=valid_url,
    grpc_port=50051,
    grpc_secure=True,
    auth_credentials=auth_config,
    headers=headers,
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=120),
    ),
    skip_init_checks=True,
)

client.collections.create(
    "ArticleMetaData",  # spelled to match target_collection below
    properties=[
        Property(name="metadata", data_type=DataType.TEXT),
    ],
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wc.Configure.Generative.openai(),
)

client.collections.create(
    "Article",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    references=[
        wc.ReferenceProperty(
            name="hasMetaData",
            target_collection="ArticleMetaData",
        )
    ],
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wc.Configure.Generative.openai(),
)

Mohamed_Shahin · December 6, 2024, 5:30pm

@abdimussa

from weaviate.classes.config import Configure
from weaviate.classes.config import Property, DataType, ReferenceProperty, Tokenization

client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(model="text-embedding-3-small"),
    generative_config=Configure.Generative.openai(),
    inverted_index_config=Configure.inverted_index(
        index_timestamps=True
    ),
    replication_config=Configure.replication(factor=3, async_enabled=True),
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            tokenization=Tokenization.WORD,
        ),
        Property(
            name="body",
            data_type=DataType.TEXT,
            tokenization=Tokenization.FIELD,
        ),
    ],
    references=[
        ReferenceProperty(
            name="hasMetaData",
            target_collection="ArticleMetaData",
        )
    ],
)

abdimussa · December 12, 2024, 4:23pm

@Mohamed_Shahin Thank you for your response. However, I wanted to see more on the data migration part, from my old collection to the new one where index_timestamps is set to true. I’m having trouble migrating the references of each object.

Below is the sample code I’ve got so far:

from tqdm import tqdm
from weaviate.classes.query import QueryReference
from weaviate.collections import Collection


def migrate_data(collection_src: Collection, collection_tgt: Collection):
    with collection_tgt.batch.fixed_size(batch_size=20) as batch:
        for q in tqdm(collection_src.iterator(
            include_vector=True,
            return_references=QueryReference(
                link_on="hasMetaData",
                return_properties=["metadata"],
            ),
        )):
            # q.references is fetched above but not yet written to the target
            batch.add_object(
                properties=q.properties,
                vector=q.vector["default"],
                uuid=q.uuid,
            )

    return True

DudaNogueira · December 19, 2024, 3:42pm

hi @abdimussa !!

You can run through all objects using the iterator API, and then migrate the cross-references using this:

questions = client.collections.get("JeopardyQuestion")

questions.data.reference_add(
    from_uuid=question_obj_id,
    from_property="hasCategory",
    to=category_obj_id
)

Let me know if that helps!

Thanks!

abdimussa · December 20, 2024, 12:28pm

@DudaNogueira, thank you, this is helpful.
