search: invalid pagination params: query maximum results exceeded.
I am trying to find all objects belonging to a source url and I am doing it in a loop with pagination:
from weaviate.classes.query import Filter

filters = Filter.by_property("url").equal(url)
returned = self.client.collections.get(class_name).query.fetch_objects(limit=limit, offset=offset, filters=filters)
if returned is None:
    return -1
all_results = returned
# Keep fetching the next page while the previous page was full.
while len(returned.objects) >= limit:
    offset = offset + limit
    returned = self.client.collections.get(class_name).query.fetch_objects(limit=limit, offset=offset, filters=filters)
    if returned is None or returned.objects is None:
        break
    all_results.objects += returned.objects
return all_results
The cursor API doesn't seem to allow filtering, according to the documentation on the website? How can I do this?
Oh gosh, so in theory there might be records that are unreachable via pagination… Is there a workaround people use? Like if I need the names of people in a specific city and they exceed the pagination limit?
Do I need to pair this with a different database for such issues?
There are two related but slightly different things:
a) Pagination
If you do a search (hybrid, near_vector, filter, sort, etc.), Weaviate does the search and then builds a list of all matching objects in memory. To avoid crashing Weaviate with an OOM there is a limit on this list. You can use pagination (offset + limit) to walk through it.
b) Cursor API
The objects have a "natural" order on disk and can be accessed in that order without loading them all into memory. You can use the cursor API (the iterator in the Python client) to access every object in Weaviate this way, and then filter the objects you need locally on the client.
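A minimal sketch of the iterator approach with local filtering (assuming the v4 Python client; the "Document" collection name, the "url" property, and the local connection are placeholders, not taken from your setup):

import weaviate

client = weaviate.connect_to_local()
try:
    collection = client.collections.get("Document")

    matches = []
    # The iterator uses the cursor API under the hood and walks every object
    # in its natural on-disk order; no server-side filter is applied.
    for obj in collection.iterator():
        # Filter locally on the client.
        if obj.properties.get("url") == "https://example.com/page":
            matches.append(obj)
finally:
    client.close()

Note that this reads every object once, so it scales with the total size of the collection rather than with the number of matches.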
Thanks - with the cursor API though, I would need to go through 100 million objects and then filter them on the client… If I make a feature request for filtering to be added to the cursor, is there a way to make this feasible on the database side?
In all honesty, even if the server filtered naively, this would still be useful just to lower the amount of data transferred. So even an inefficient database protocol here would have benefits?
Or is this the type of thing that is so bad that it shouldn’t be offered at all?
The database would have to iterate through all objects and do the check one by one. In principle it should be possible, and feel free to open a feature request, but I don't think it would be added soon (although that is not my decision).
Problem: How can I add data if it might already exist (the Python client throws an error)? So I can't add without paginating… I feel like this is new - is this a database or client change, i.e. throwing an error on existing data?
Workaround: Delete all data matching the filter and re-upload in a batch (sketched below).
I can't paginate, but at least it can be searched…
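Roughly what the workaround looks like (a sketch only; the "Document" collection, the "url" property, and the fresh_objects_for helper are hypothetical placeholders):

from weaviate.classes.query import Filter

collection = client.collections.get("Document")
url = "https://example.com/page"

# Delete every object that matches the filter...
collection.data.delete_many(where=Filter.by_property("url").equal(url))

# ...then re-upload the current objects for that url in a batch.
with collection.batch.dynamic() as batch:
    for props in fresh_objects_for(url):  # hypothetical helper returning property dicts
        batch.add_object(properties=props)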
How can I add data if it might already exist (the Python client throws an error)?
The Python client (or rather Weaviate) only throws an error if there is an object with the same UUID present (for single-object inserts; batch inserts overwrite existing objects). You can easily check with collection.data.exists(UUID) whether an object with that UUID is already present.
You can add as many objects with identical data as you want, as long as the UUID is different.
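For example, a minimal sketch (the "Person" collection and its properties are placeholders; the UUID is derived deterministically from the object's content with generate_uuid5):

from weaviate.util import generate_uuid5

collection = client.collections.get("Person")

properties = {"name": "Alice", "city": "Berlin"}
uuid = generate_uuid5(properties)  # same data -> same UUID

# Only insert if no object with this UUID exists yet.
if not collection.data.exists(uuid):
    collection.data.insert(properties=properties, uuid=uuid)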
What exactly are you trying to do? Maybe there is an easier way to structure your data?
@Dirk Oh, that's a good point. I am generating UUIDs based on unique data for the object (so yes, I can check the UUID! Good point!) - and in fact I don't want collisions - so that makes sense.
I hadn't thought of looking up individuals by generating the UUID for the search as well - thanks! Obvious, but it didn't occur to me - so this works well enough for now.
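Something along these lines should work for the lookup by deterministic UUID (a sketch; the property values and collection are placeholders, reusing generate_uuid5 as above):

from weaviate.util import generate_uuid5

# Re-derive the UUID from the object's unique data...
uuid = generate_uuid5({"name": "Alice", "city": "Berlin"})

# ...and fetch the object directly by ID instead of paginating a filtered search.
obj = collection.query.fetch_object_by_id(uuid)
if obj is not None:
    print(obj.properties)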
No, both are done by Weaviate, BUT the complete result list is kept in memory in Weaviate. E.g. if you do limit=1k, offset=9k it will load 10k objects into memory and then return the last 1k back to you. To not overload Weaviate this is capped, e.g. you cannot load more than 100k (I think, haven't checked - there is an env var, QUERY_MAXIMUM_RESULTS if I remember correctly, to change this number) into memory, i.e. you can never return more than 100k results this way.
This is correct. There is no other way to iterate through all objects with a filter.
It works if you have fewer results than the limit of objects Weaviate will load into memory, and if you don't mind the extra memory consumption. Just keep in mind that you need to make sure you don't hit that limit, otherwise you will silently not get results.