Query Across multiple classes

Ansh_Gaur · June 30, 2024, 2:16pm

I have a use case where the data I need might be distributed across multiple classes, and I want to use the data from all of these classes in a RAG chain to generate answers for a query.

Assuming I have a limit of 8k tokens for the LLM used in the RAG chain, what are some good ways to get the best-matching results combined from these classes within that budget (the top k across all classes, not the top k of each class individually)?

Here is one method I found:

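# Assumes `client` is a weaviate v3 `weaviate.Client` created elsewhere, and
# `Total_tokens(text)` is a user-defined helper that counts the tokens in a string.

def fetch_and_combine_results(query: str, classes: list, per_class_limit: int = 4096) -> list:
    combined_results = []

    for class_name in classes:
        # One hybrid query per class; with_limit caps the number of objects returned, not tokens.
        response = (
            client.query
            .get(class_name, ["link", "scraped_text"])
            .with_hybrid(query, alpha=0.8)
            .with_limit(per_class_limit)
            .with_additional(["distance", "id"])
            .do()
        )

        if response and response['data']['Get'][class_name]:
            class_results = response['data']['Get'][class_name]
            combined_results.extend(class_results)

    # Return only after every class has been queried.
    return combined_results


def sort_and_limit_results(results: list, token_limit: int = 4096) -> list:
    # Sort the results from all classes together, closest first.
    results.sort(key=lambda x: x['_additional']['distance'])

    limited_results = []
    total_tokens = 0

    # Greedily add results until the next one would exceed the token budget.
    for result in results:
        text = result['scraped_text']
        tokens = Total_tokens(text)

        if total_tokens + tokens > token_limit:
            break

        limited_results.append(result)
        total_tokens += tokens

    return limited_results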

Are there any better ways to do this?
Thank you for your help!!

DudaNogueira · July 1, 2024, 7:48pm

Hi @Ansh_Gaur!

Welcome to our community!

Unless you use cross-references (and maybe ref2vec?) plus some data modelling, you will not be able to query both collections at once and get a single score/distance.

You can, of course, run two or more separate queries, as you are doing.

Now, keep in mind that when you do a hybrid search, the fusion algorithm kicks in.

So you are not re-sorting the two queries' results by cosine vector distance, but by a normalized score that is relative to the first result of each query (the first object in each query has a score of 1), as explained in that blog post.
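To make that concrete, here is a simplified sketch of per-query min-max normalization. It is only similar in spirit to Weaviate's relative score fusion, not its actual implementation, and the raw scores are made up. Because each query is rescaled on its own, the top hit of every query lands at 1.0 however strong or weak it was in absolute terms, so merging and sorting normalized scores across classes can rank a weak match level with a strong one.

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale one query's raw scores to [0, 1]; the best result always maps to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All results tie, so treat them all as equally "best".
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw relevance scores from two separate per-class queries.
class_a_raw = [0.92, 0.90, 0.85]  # strong matches
class_b_raw = [0.41, 0.30, 0.20]  # much weaker matches

class_a_norm = min_max_normalize(class_a_raw)
class_b_norm = min_max_normalize(class_b_raw)

# Both top hits become 1.0, even though class B's best match is far weaker.
print(class_a_norm[0], class_b_norm[0])  # 1.0 1.0
```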

Maybe sorting by distance (for example, running a nearText query with autocut instead of hybrid) will give you a better ordering. Or not, hehehe.
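A rough sketch of the nearText-with-autocut route using the Python v4 client, assuming every class is vectorized with the same model and distance metric (otherwise the distances are not comparable across classes either). The function names, the `auto_limit`/`limit` defaults, and the whitespace token counter in the usage note are illustrative assumptions; only the `link`/`scraped_text` properties come from the question.

```python
from collections.abc import Callable


def search_collections(client, query: str, collection_names: list[str],
                       auto_limit: int = 1, limit: int = 20) -> list[dict]:
    """Run one nearText query (with autocut) per collection and flatten the hits."""
    # Imported here so the pure helper below works without weaviate installed.
    from weaviate.classes.query import MetadataQuery

    hits = []
    for name in collection_names:
        response = client.collections.get(name).query.near_text(
            query=query,
            limit=limit,
            auto_limit=auto_limit,  # autocut: stop at the first jump in distance
            return_metadata=MetadataQuery(distance=True),
            return_properties=["link", "scraped_text"],
        )
        for obj in response.objects:
            hits.append({
                "collection": name,
                "uuid": str(obj.uuid),
                "distance": obj.metadata.distance,
                **obj.properties,
            })
    return hits


def pack_by_distance(hits: list[dict], token_limit: int,
                     count_tokens: Callable[[str], int]) -> list[dict]:
    """Rank hits from all collections by distance (smaller is closer) and keep
    adding them until the next one would exceed the token budget."""
    selected, used = [], 0
    for hit in sorted(hits, key=lambda h: h["distance"]):
        cost = count_tokens(hit["scraped_text"])
        if used + cost > token_limit:
            break
        selected.append(hit)
        used += cost
    return selected
```

Usage would look like `pack_by_distance(search_collections(client, "my question", ["ClassA", "ClassB"]), 8000, count_tokens)`, where `count_tokens` should be the tokenizer of your LLM (`lambda t: len(t.split())` is only a crude stand-in). Using `break` keeps strict rank order; switching to `continue` would let smaller, lower-ranked chunks fill any leftover budget.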

So that is something to explore.

Also, I see you are using the v3 client. We strongly suggest using the Python v4 client.

That said, there is a way to get both results with only one HTTP request, by using raw GraphQL queries.
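As an illustration, a GraphQL `Get` can list several classes side by side, so one request returns each class's results under its own key. The sketch below only builds that query string; `build_multi_class_query` and the class names are hypothetical. As far as I know, the string can then be sent with `client.graphql_raw_query(...)` in the v4 client (or `client.query.raw(...)` in v3), but please check the client docs. The results still come back per class, so you would merge and sort them yourself.

```python
import json


def build_multi_class_query(query: str, class_names: list[str], limit: int = 10) -> str:
    """Build one GraphQL Get request that runs the same nearText search on several classes."""
    # json.dumps yields a double-quoted, escaped literal that is also a valid GraphQL string.
    concept = json.dumps(query)
    blocks = []
    for name in class_names:
        blocks.append(
            f"    {name}(nearText: {{concepts: [{concept}]}}, limit: {limit}) {{\n"
            f"      link\n"
            f"      scraped_text\n"
            f"      _additional {{ distance id }}\n"
            f"    }}"
        )
    return "{\n  Get {\n" + "\n".join(blocks) + "\n  }\n}"


gql = build_multi_class_query("what is RAG?", ["ClassA", "ClassB"], limit=5)
# Assumed v4 call (not executed here): result = client.graphql_raw_query(gql)
print(gql)
```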

Check here for more on that:

Let me know if that helps
