Getting all items with filter?

I get this error:

search: invalid pagination params: query maximum results exceeded.

I am trying to find all objects belonging to a source url and I am doing it in a loop with pagination:

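from weaviate.classes.query import Filter

# Fragment of a method body: collect every object whose "url" property equals url.
collection = self.client.collections.get(class_name)
filters = Filter.by_property("url").equal(url)
returned = collection.query.fetch_objects(limit=limit, offset=offset, filters=filters)
if returned is None:
    return -1
result = returned  # renamed from `all`, which shadows the built-in
# A full page means more results may follow, so request the next page.
while len(returned.objects) >= limit:
    offset += limit
    returned = collection.query.fetch_objects(limit=limit, offset=offset, filters=filters)
    if returned is None or returned.objects is None:
        break  # stop instead of calling len() on a missing page
    result.objects += returned.objects
return result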

According to the documentation on the website, the cursor API doesn’t seem to allow filtering. How can I do this?

This is sadly not possible; you’d have to do the filtering on the client.

Oh gosh, so in theory there might be records that are unreachable via pagination… Is there a workaround people use? Like if I need the names of people in a specific city and they exceed the pagination limit?

Do I need to pair this with a different database for such issues?

There are two related but slightly different things:

a) pagination

If you do a search (hybrid, near_vector, filter, sort, etc.), Weaviate runs the search and then builds a list in memory of all objects that match it. To avoid crashing Weaviate with an out-of-memory (OOM) error, there is a limit on the size of this list. You can use pagination (offset + limit) to walk through this list.

b) Cursor API

The objects have a “natural” order on disk and can be accessed in this order without loading them all into memory. You can use the cursor API (the iterator in the Python client) to do this and access all objects in Weaviate this way. Then you can filter the objects that you need locally.
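
A minimal sketch of the client-side filtering described in (b), assuming the v4 Python client, where `collection.iterator()` yields objects that carry a `properties` dict. The helper name `filter_by_property` is made up for illustration; it accepts any iterable, so it can be tried without a running server.

```python
from typing import Any, Iterable, Iterator


def filter_by_property(objects: Iterable[Any], name: str, value: Any) -> Iterator[Any]:
    """Yield only the objects whose property `name` equals `value`.

    `objects` is assumed to yield items with a dict-like `properties`
    attribute, as the objects from `collection.iterator()` do.
    """
    for obj in objects:
        if obj.properties.get(name) == value:
            yield obj


# Assumed usage against a live collection (not executed here):
#   collection = client.collections.get(class_name)
#   matches = list(filter_by_property(collection.iterator(), "url", url))
```

Because this is a generator, only the matching objects are kept in client memory, but every object is still transferred from the server.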

Thanks - with the cursor api though - I would need to go through 100 million objects and then filter them on the client… If I make a feature request for this to be added to cursor is there a way to make this feasible to be done on the database?

In all honesty, even if the server just filtered the objects itself, this would be useful to reduce the amount of data transferred. So even an inefficient database protocol here would have benefits?

Or is this the type of thing that is so bad that it shouldn’t be offered at all?

The database would have to iterate through all objects and check them one by one. In principle it should be possible, and feel free to open a feature request, but I don’t think it would be added soon (although that is not my decision).

Not an ideal solution tbh.

Problem: How can I add data if it might exist already (python client throws error). So I can’t add without paginating first… I feel like throwing an error on existing data is new. Is this a database change or a client change?

Workaround: Delete all data matching filter and re-upload in batch.

I can’t paginate through it, but at least it can be searched…

How can I add data if it might exist already (python client throws error).

The Python client (or rather Weaviate) only throws an error if an object with the same UUID is already present. This applies to single-object inserts; batch inserts overwrite existing objects. You can easily check with collection.data.exists(UUID) whether an object with that UUID is already present.

You can add as many objects with identical data as you want as long as the UUID is different.
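
A rough sketch of that check, assuming the v4 Python client's `collection.data.exists(...)` and `collection.data.insert(properties=..., uuid=...)`. The helper names `make_uuid` and `insert_if_missing`, and the choice of `url` plus `name` as the identifying fields, are hypothetical; the UUID is derived with the standard library's `uuid.uuid5` so that the same input always yields the same ID.

```python
import uuid
from typing import Any, Dict


def make_uuid(url: str, name: str) -> str:
    """Derive a stable UUID from the fields that identify an object (assumed: url and name)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{url}|{name}"))


def insert_if_missing(collection: Any, properties: Dict[str, Any], object_uuid: str) -> bool:
    """Insert the object unless its UUID is already present. Returns True if inserted."""
    if collection.data.exists(object_uuid):
        return False
    collection.data.insert(properties=properties, uuid=object_uuid)
    return True


# Assumed usage against a live collection (not executed here):
#   collection = client.collections.get(class_name)
#   insert_if_missing(collection, {"url": url, "name": name}, make_uuid(url, name))
```

The check and the insert are two separate requests, so two concurrent writers could still collide; for a single writer this avoids the error on existing data.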

What exactly are you trying to do? Maybe there is an easier way to structure your data?

@Dirk Oh, that’s a good point. I am generating uuids based off of unique data for the object (so yes, I can check the uuid! Good point!) - and in fact I don’t want collisions - so that makes sense.

I hadn’t thought of looping through individuals by generating the uuid for doing search as well - thanks! Obvious, but didn’t occur to me - so this works well enough for now.
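
The lookup-by-generated-UUID idea could look roughly like this. It assumes the v4 Python client's `collection.query.fetch_object_by_id(...)`, which returns `None` when no object has that UUID. The names `fetch_by_keys`, `keys`, and `make_uuid` are hypothetical; `make_uuid` stands for whatever function already derives the UUID from an object's unique data.

```python
from typing import Any, Callable, Iterable, List


def fetch_by_keys(
    collection: Any,
    keys: Iterable[Any],
    make_uuid: Callable[[Any], str],
) -> List[Any]:
    """Fetch each object directly by the UUID derived from its key.

    Keys whose UUID is not present in the collection are skipped.
    """
    found = []
    for key in keys:
        obj = collection.query.fetch_object_by_id(make_uuid(key))
        if obj is not None:
            found.append(obj)
    return found
```

This sidesteps both pagination and the cursor, at the cost of one request per key, and it only works when the set of candidate keys is known up front.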