How to use local QnA transformers?

With ENABLE_MODULES set to `text2vec-transformers,qna-transformers,ner-transformers,sum-transformers,text-spellcheck,ref2vec-centroid,reranker-transformers`, how do I use the local qna-transformers module?

I tried the following, as shown in the QuickStart tutorial:

    coll_create = client.collections.create(
        name="forums_test_qna",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_transformers(),
        generative_config=wvc.config.Configure.Generative.qna_transformers(),
    )

Vectorizer.text2vec_transformers() worked fine, but Generative.qna_transformers() raised this error:

$ python  weaviate_data_ingest_qna.py 
Traceback (most recent call last):
  File "weaviate_data_ingest_qna.py", line 35, in <module>
    generative_config = wvc.config.Configure.Generative.qna_transformers(),
AttributeError: type object '_Generative' has no attribute 'qna_transformers'

The official doc (Question Answering - transformers | Weaviate) seems to be missing information on where and how to configure local qna-transformers.

Where am I going wrong? Please help.

Thanks.

hi @curious !

Indeed, the Python v4 client does not support this module yet, both for creating the collection and for running the query.

For now, you can create the collection using:

client.collections.create_from_dict()

And to run the query, use the approach described in:

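Since the v4 client has no helper for the `ask` operator yet, another option is to send the GraphQL query yourself. Below is a minimal sketch; the collection and property names come from this thread, and the exact shape of the `ask` operator and `_additional.answer` fields should be double-checked against the qna-transformers module docs:

```python
def build_ask_query(collection: str, question: str, properties: list) -> str:
    """Build a GraphQL query using the qna-transformers `ask` operator."""
    props = ", ".join(f'"{p}"' for p in properties)
    fields = " ".join(properties)
    return f"""
    {{
      Get {{
        {collection}(
          ask: {{ question: "{question}", properties: [{props}] }}
          limit: 1
        ) {{
          {fields}
          _additional {{ answer {{ hasAnswer result certainty }} }}
        }}
      }}
    }}"""

query = build_ask_query(
    "Forums_qna_1",
    "How do I enable modules?",
    ["post_title", "post_content"],
)
# Requires a running Weaviate instance with qna-transformers enabled:
# response = client.graphql_raw_query(query)
```

The actual call is commented out because it needs a live server; `graphql_raw_query` is the v4 client's escape hatch for queries the typed API does not cover yet.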
Hi, @DudaNogueira , Thank you for your reply.

I tried client.collections.create_from_dict() in different ways, but could not get it to work.

In every attempt, "generative_config" is shown as null, so I'm not sure whether qna-transformers is configured properly for the collection.

Kindly guide and help. Thanks.

Here are my tries:

collection_dict = {
    "class": "forums_qna_1",  
    "vectorizer": "text2vec-transformers",  
    "moduleConfig": {
        "text2vec-transformers": {
            "vectorizeClassName": True
        },
        "qna-transformers": {
            "model": "deepset-bert-large-uncased-whole-word-masking-squad2",
            "use_cuda": False
        }
    },
    "properties": [
        {"name": "post_title", "dataType": ["text"]},
        {"name": "post_content", "dataType": ["text"]},
    ]
}

coll_create = client.collections.create_from_dict(collection_dict)
pp.pprint(f"\nCollection Created: {coll_create}")

collection_dict = {
    "class": "forums_qna_2",  
    "vectorizerConfig": {
        "module": "text2vec-transformers",
        "config": {  
            "model": "text2vec-transformers"  
        }
    },
    "generativeConfig": {
        "module": "generative-qna",
        "config": { 
            "model": "qna-transformers"  
        }
    },
    "properties": [
         ..... 
    ]
}

coll_create = client.collections.create_from_dict(collection_dict)

collection_dict = {
    "class": "forums_qna_3",
    "vectorizerConfig": {
        "module": "text2vec-transformers",
        "moduleConfig": {
            "name": "text2vec-transformers"
        }
    },
    "generativeConfig": {
        "module": "generative-qna",
        "moduleConfig": {
            "name": "qna-transformers"
        }
    },
    "properties": [
         ..... 
    ]
}

coll_create = client.collections.create_from_dict(collection_dict)

All attempts gave the same result, with "generative_config": null in the output:

('\n'
 'Collection Created: <weaviate.Collection config={\n'
 '  "name": "Forums_qna_1",\n'
 '  "description": null,\n'
 '  "generative_config": null,\n'
 '  "inverted_index_config": {\n'
.
.

hi @curious !

The issue is that, since the module is not supported in the client yet, its config is probably not being parsed back correctly by the client.

I created the class below using your code, and when I accessed the endpoint:

http://localhost:8080/v1/schema

I got this content:

{
    "classes": [
        {
            "class": "Forums_qna_1",
            "invertedIndexConfig": {
                "bm25": {
                    "b": 0.75,
                    "k1": 1.2
                },
                "cleanupIntervalSeconds": 60,
                "stopwords": {
                    "additions": null,
                    "preset": "en",
                    "removals": null
                }
            },
            "moduleConfig": {
                "qna-transformers": {
                    "model": "deepset-bert-large-uncased-whole-word-masking-squad2",
                    "use_cuda": false
                },
                "text2vec-transformers": {
                    "poolingStrategy": "masked_mean",
                    "vectorizeClassName": true
                }
            },
            "multiTenancyConfig": {
                "enabled": false
            },
            "properties": [
                {
                    "dataType": [
                        "text"
                    ],
                    "indexFilterable": true,
                    "indexSearchable": true,
                    "moduleConfig": {
                        "text2vec-transformers": {
                            "skip": false,
                            "vectorizePropertyName": false
                        }
                    },
                    "name": "post_title",
                    "tokenization": "word"
                },
                {
                    "dataType": [
                        "text"
                    ],
                    "indexFilterable": true,
                    "indexSearchable": true,
                    "moduleConfig": {
                        "text2vec-transformers": {
                            "skip": false,
                            "vectorizePropertyName": false
                        }
                    },
                    "name": "post_content",
                    "tokenization": "word"
                }
            ],
            "replicationConfig": {
                "factor": 1
            },
            "shardingConfig": {
                "virtualPerPhysical": 128,
                "desiredCount": 1,
                "actualCount": 1,
                "desiredVirtualCount": 128,
                "actualVirtualCount": 128,
                "key": "_id",
                "strategy": "hash",
                "function": "murmur3"
            },
            "vectorIndexConfig": {
                "skip": false,
                "cleanupIntervalSeconds": 300,
                "maxConnections": 64,
                "efConstruction": 128,
                "ef": -1,
                "dynamicEfMin": 100,
                "dynamicEfMax": 500,
                "dynamicEfFactor": 8,
                "vectorCacheMaxObjects": 1000000000000,
                "flatSearchCutoff": 40000,
                "distance": "cosine",
                "pq": {
                    "enabled": false,
                    "bitCompression": false,
                    "segments": 0,
                    "centroids": 256,
                    "trainingLimit": 100000,
                    "encoder": {
                        "type": "kmeans",
                        "distribution": "log-normal"
                    }
                },
                "bq": {
                    "enabled": false
                }
            },
            "vectorIndexType": "hnsw",
            "vectorizer": "text2vec-transformers"
        }
    ]
}
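So even though the client prints "generative_config": null, the server-side schema does contain the qna-transformers entry under moduleConfig. To confirm this from Python rather than the browser, a small helper that inspects the /v1/schema response can be used. This is a sketch; it assumes the schema JSON shape shown above:

```python
import json
import urllib.request

def has_module(schema: dict, class_name: str, module: str) -> bool:
    """Return True if `module` appears in moduleConfig of the given class."""
    for cls in schema.get("classes", []):
        if cls.get("class") == class_name:
            return module in cls.get("moduleConfig", {})
    return False

# Fetch the live schema (requires a running Weaviate instance):
# with urllib.request.urlopen("http://localhost:8080/v1/schema") as resp:
#     schema = json.load(resp)
# print(has_module(schema, "Forums_qna_1", "qna-transformers"))
```

This checks the server's own view of the collection, which is the source of truth while the client cannot round-trip the config.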

Let me know if this helps. We should see better support for these less popular modules soon.

Thanks!

Thank you @DudaNogueira for your help.

That worked for us. We are able to ingest and query data using both the text2vec and the qna module. The qna module's output isn't generative like models such as ChatGPT :stuck_out_tongue: it simply returns a snippet of the answer if one is found in the content.

Our reason for choosing this local qna module is to avoid sending large amounts of data over an external API, which would incur additional costs.

Thank you once again for your help so far.
