OpenAI LLM not able to understand a field's meaning

bhupendra_singh · February 19, 2024, 3:27pm

I have created a class with the following properties and ingested around 13K objects.

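import weaviate

# Assumes a local Weaviate instance and the v3 Python client; adjust the URL and key for your setup
client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_API_KEY"},
)

# ===== define collection =====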
class_obj = {
    "class": "ProductCatalogNumeric",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
            "description": "Name of product that we are selling in our marketplace"
        },
        {
            "name": "jpin",
            "dataType": ["text"],
            "description": "Jpin represent unique identifier for every product"
        },
        {
            "name": "price",
            "dataType": ["number"],
            "description": "The selling price of the product"
        },
        {
            "name": "margin",
            "dataType": ["number"],
            "description": "Percentage margin we earn after selling the product"
        },
    ],
}

client.schema.create_class(class_obj)

But when I ask the question below, the LLM says the data/field is not present.

This is my query:

import json

response = (
    client.query
    .get("ProductCatalogNumeric", ['title','jpin','price','margin'])
    .with_near_text({"concepts": ["basmati rice"]})
    .with_generate(grouped_task="Tell me average price of all basmati rice products. And show calculation ? ")
    .with_limit(100)
    .do()
)

print(json.dumps(response, indent=4))

And this is the OpenAI LLM response:

{
    "data": {
        "Get": {
            "ProductCatalogNumeric": [
                {
                    "_additional": {
                        "generate": {
                            "error": null,
                            "groupedResult": "To calculate the average price of all basmati rice products, we need to first gather the prices of all the products listed. Since the prices are not provided in the given data, we cannot calculate the average price."
                        }
                    },
                    "jpin": "JPIN-1304444084",
                    "margin": 0.064957,
                    "price": 121.03000000000002,
                    "title": "Daawat Devaaya Basmati Rice, 1Kg Pack"
                },
                {
                    "_additional": {
                        "generate": null
                    },
                    "jpin": "JPIN-1304511345",
                    "margin": 0.0241038235,
                    "price": 2580,
                    "title": "Gauri Rozana Steamed Basmati Rice, 30Kg Bag"
                },
                {
                    "_additional": {
                        "generate": null
                    },
                    "jpin": "JPIN-1304351614",
                    "margin": 0.0638388,
                    "price": 113.20000000000002,
                    "title": "Daawat Heritage Platinum Basmati Rice, Classic, 1Kg Pack"
                },
                {
                    "_additional": {
                        "generate": null
                    },
                    "jpin": "JPIN-1304472302",
                    "margin": 0.063287,
                    "price": 182.56,
                    "title": "Daawat Traditional Basmati Rice, 1Kg Pack"
                },
                ...
            ]
        }
    }
}
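Note that the grouped result is attached only to the first object in the list; the other objects have "generate": null. A minimal sketch for pulling the grouped text out of the parsed response dict, assuming the v3 client's .do() return shape shown above (the helper name get_grouped_result is hypothetical):

```python
from typing import Any, Dict, Optional


def get_grouped_result(response: Dict[str, Any], class_name: str) -> Optional[str]:
    """Return the groupedResult text from a v3-style GraphQL response, if present."""
    objects = response.get("data", {}).get("Get", {}).get(class_name) or []
    for obj in objects:
        generate = (obj.get("_additional") or {}).get("generate")
        # Only one object (normally the first) carries the grouped result;
        # the others have "generate": null, which json.loads turns into None.
        if generate and generate.get("groupedResult"):
            return generate["groupedResult"]
    return None


# Usage with a trimmed copy of the response above
sample = {
    "data": {
        "Get": {
            "ProductCatalogNumeric": [
                {"_additional": {"generate": {"error": None, "groupedResult": "To calculate ..."}},
                 "price": 121.03, "title": "Daawat Devaaya Basmati Rice, 1Kg Pack"},
                {"_additional": {"generate": None},
                 "price": 2580, "title": "Gauri Rozana Steamed Basmati Rice, 30Kg Bag"},
            ]
        }
    }
}
print(get_grouped_result(sample, "ProductCatalogNumeric"))
```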

sebawita · February 19, 2024, 7:14pm

Hi @bhupendra_singh, welcome to the Weaviate forum.
Have you tried using grouped_properties?
You can find an example in the docs: generative search

response = (
    client.query
    .get("ProductCatalogNumeric", ['title','jpin','price','margin'])
    .with_near_text({"concepts": ["basmati rice"]})
    .with_generate(
        grouped_task="Tell me average price of all basmati rice products. And show calculation ? ",
        grouped_properties=["title", "price"],  # <== the list of properties to pass to the LLM
    )
    .with_limit(100)
    .do()
)

By the way, I haven't tested whether grouped_task also accepts number properties.
I hope that is not causing the issue.

Side note - on limit

By the way, if you set the limit to 100, the query returns the 100 nearest results.

What do I mean by that, you ask?
If your database has 8 objects related to rice, vector search will first return those 8 rice objects and then keep going with the next-nearest objects. In other words, the 100th object could be pasta or something similar, which may be considered close because it is also a food/carb product.

You could use autocut (see an example of autocut in our docs), which returns a group of similar objects and cuts off the rest when there is a jump in the search distance, i.e. a drop in the quality of results. This gives you a better chance that Weaviate only passes the most relevant group of objects to your generative task.
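To make the difference concrete, here is a toy, pure-Python illustration. The titles and distances are made up, and cut_at_jumps is a hypothetical, simplified stand-in for autocut (Weaviate's actual implementation differs in detail): a fixed limit always returns k results, while a jump-based cut stops at the first large gap in distance.

```python
def top_k(results, k):
    """Plain limit: return the k nearest results, however far away they are."""
    return sorted(results, key=lambda r: r[1])[:k]


def cut_at_jumps(results, jumps=1, min_gap=0.05):
    """Simplified autocut-like cut: stop at the `jumps`-th distance gap >= min_gap."""
    ordered = sorted(results, key=lambda r: r[1])
    seen = 0
    for i in range(1, len(ordered)):
        if ordered[i][1] - ordered[i - 1][1] >= min_gap:
            seen += 1
            if seen == jumps:
                return ordered[:i]
    return ordered  # no qualifying jump: keep everything


# Hypothetical (title, distance) pairs for a "basmati rice" query
results = [
    ("Daawat Devaaya Basmati Rice, 1Kg Pack", 0.10),
    ("Daawat Traditional Basmati Rice, 1Kg Pack", 0.11),
    ("Gauri Rozana Steamed Basmati Rice, 30Kg Bag", 0.12),
    ("Sona Masoori Rice, 5Kg Bag", 0.20),
    ("Durum Wheat Penne Pasta, 500g", 0.31),
]

print([t for t, _ in top_k(results, 5)])       # includes the pasta
print([t for t, _ in cut_at_jumps(results)])   # only the three basmati items
```

Raising jumps to 2 would also keep the non-basmati rice item, which mirrors how a larger autocut value lets more groups through.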

You can swap

.with_limit(100)

For

.with_autocut(1) # returns the first group of similar objects

I hope this helps.

sebawita · February 20, 2024, 1:46pm

Hi @bhupendra_singh,
I've just checked with the team. Generative modules currently only use text properties for generative tasks (both single_prompt and grouped_task).
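This explains the LLM's answer: the prices never reached the prompt. The snippet below is a hypothetical model of that behavior (not Weaviate's actual code; text_only_context is an illustrative name), showing how a text-only filter drops the number properties from the prompt context:

```python
def text_only_context(objects, properties):
    """Hypothetical illustration: keep only string-valued properties,
    as a text-only generative module would, before building the prompt."""
    rows = []
    for obj in objects:
        kept = {p: obj[p] for p in properties if isinstance(obj.get(p), str)}
        rows.append(kept)
    return rows


objects = [
    {"title": "Daawat Devaaya Basmati Rice, 1Kg Pack", "jpin": "JPIN-1304444084",
     "price": 121.03, "margin": 0.064957},
]
print(text_only_context(objects, ["title", "jpin", "price", "margin"]))
# [{'title': 'Daawat Devaaya Basmati Rice, 1Kg Pack', 'jpin': 'JPIN-1304444084'}]
```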

A workaround would be to store the price property as text instead of a number:

        {
            "name": "price",
            "dataType": ["text"],
            "description": "The selling price of the product",
        },
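With price defined as text, the values also need to be converted before import. A minimal sketch, assuming each product arrives as a plain dict (the helper name price_as_text and the two-decimal format are assumptions, not part of Weaviate):

```python
def price_as_text(row, decimals=2):
    """Return a copy of `row` whose numeric "price" is stored as a fixed-point string."""
    converted = dict(row)  # don't mutate the caller's dict
    if isinstance(converted.get("price"), (int, float)):
        # Fixed-point formatting avoids float noise such as 121.03000000000002
        converted["price"] = f"{converted['price']:.{decimals}f}"
    return converted


row = {"title": "Daawat Devaaya Basmati Rice, 1Kg Pack", "price": 121.03000000000002}
print(price_as_text(row))
```

Each converted dict can then be passed as the data object when importing, for example through the v3 client's batch add_data_object. Since an existing property's data type generally can't be changed in place, this likely means creating the class with the new schema and re-importing the data.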

By the way, I would be careful about relying on LLMs for calculations, as they can hallucinate and give you inaccurate results.
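One way to avoid that is to let the query only fetch the objects and do the arithmetic in Python. A minimal sketch, assuming the response shape shown earlier in this thread; the keyword check is a hypothetical, crude guard against non-rice results slipping in, and float() also accepts prices stored as text:

```python
from statistics import mean


def average_price(response, class_name, keyword="basmati"):
    """Average the "price" of returned objects whose title contains `keyword`."""
    objects = response["data"]["Get"][class_name]
    prices = [
        float(obj["price"])  # works for numbers and for numeric strings
        for obj in objects
        if keyword.lower() in obj["title"].lower() and obj.get("price") is not None
    ]
    return mean(prices) if prices else None


sample_response = {
    "data": {"Get": {"ProductCatalogNumeric": [
        {"title": "Daawat Devaaya Basmati Rice, 1Kg Pack", "price": 121.03000000000002},
        {"title": "Gauri Rozana Steamed Basmati Rice, 30Kg Bag", "price": 2580},
        {"title": "Daawat Heritage Platinum Basmati Rice, Classic, 1Kg Pack", "price": 113.20000000000002},
        {"title": "Daawat Traditional Basmati Rice, 1Kg Pack", "price": 182.56},
    ]}}
}
print(average_price(sample_response, "ProductCatalogNumeric"))
```

For the four objects shown in the question this gives about 749.20; the real answer depends on which objects the query actually returns, which is why the limit/autocut choice above matters.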
