nearText operation isn't working

chris31522 · December 12, 2024, 9:45am

I already have the text2vec and multi2vec modules, so why does the result say there is no nearText option when I use it in a query?

Mohamed_Shahin · December 12, 2024, 10:09am

Hello @chris31522

Welcome to our community! We’re glad to have you here.

The error you’re encountering suggests there may be an issue with the vectorizer configuration—either it isn’t being recognized, or the collection might have been created without specifying a vectorizer.

Could you please share the following details with me so I can help you further:

Your Weaviate version
The deployment details
The code you’re using to configure the vectorizer and create the schema

This will help me pinpoint where the issue might be coming from.

Best regards,
Mohamed Shahin
Weaviate Support Engineer

chris31522 · December 12, 2024, 10:46am

thanks for the help!

schema = {
    "classes": [
        {
            "class": "Article", # name of the class
            "description": "An Article class to store the article summary and its authors", # a description of what this class represents
            "properties": [ # class properties
                {
                    "name": "title",
                    "dataType": ["string"],
                    "description": "The title of the article", 
                },
                {
                    "name": "summary",
                    "dataType": ["text"],
                    "description": "The summary of the article",
                },
                {
                    "name": "wordCount",
                    "dataType": ["int"],
                    "description": "The number of words in the article's summary",
                },
                {
                    "name": "hasAuthors",
                    "dataType": ["Author"],
                    "description": "The authors this article has",
                },
                {
                    "name": "hasCategory",
                    "dataType": ["Category"],
                    "description": "The category of this article",
                }
            ]
        }, {
            # Write the Author class here
            "class": "Author", 
            "description": "An Author class to store the author's name and the articles who wrote", 
            "properties": [
                {
                    "name": "name",
                    "dataType": ["string"],
                    "description": "The name of the author", 
                },
                {
                    "name": "wroteArticles",
                    "dataType": ["Article"],
                    "description": "The articles this author has",
                }
            ]
        }, {
            # Write the Category class here
            "class":"Category",
            "description":"A Category class to store the category that article belongs to",
            "properties":[
                {
                    "name":"name",
                    "dataType":["string"],
                    "description":"the name of the category"
                }
            ]
        }

 ]
}

This is the schema, and I downloaded the news from cnn.com as my data:

import newspaper
import uuid
import json
from tqdm import tqdm

def get_articles_from_newspaper(
        news_url: str, 
        max_articles: int=100
    ) -> list:
    """
    Download newspaper articles and return them as a list of dicts.
    Parameters
    ----------
    news_url : str
        Newspaper URL.
    max_articles : int
        Maximum number of articles to collect.
    """
    
    objects = []
    
    # Build the actual newspaper    
    news_builder = newspaper.build(news_url, memoize_articles=False)
    
    if max_articles > news_builder.size():
        max_articles = news_builder.size()
    pbar = tqdm(total=max_articles)
    pbar.set_description(f"{news_url}")
    i = 0
    while len(objects) < max_articles and i < news_builder.size():
        article = news_builder.articles[i]
        try:
            article.download()
            article.parse()
            article.nlp()

            if (article.title != '' and \
                article.title is not None and \
                article.summary != '' and \
                article.summary is not None and\
                article.authors):

                # create an UUID for the article using its URL
                article_id = uuid.uuid3(uuid.NAMESPACE_DNS, article.url)

                # create the object
                objects.append({
                    'id': str(article_id),
                    'title': article.title,
                    'summary': article.summary,
                    'authors': article.authors,
                    'word_count': len(article.summary.split())
                })
                
                pbar.update(1)

        except Exception:
            # something went wrong with getting the article, ignore it
            pass
        i += 1
    pbar.close()
    return objects

data = []
data += get_articles_from_newspaper('http://cnn.com')

And then I upload my data:

from weaviate.batch import Batch # for the typing purposes
from weaviate.util import generate_uuid5


def add_article(batch: Batch, article_data: dict) -> str:
    
    article_object = {
        'title': article_data['title'],
        'wordCount': article_data['word_count'],
        'summary': article_data['summary'].replace('\n', '') # remove newline character
    }
    article_id = article_data['id']
    
    # add article to the batch
    batch.add_data_object( 
        data_object=article_object,
        class_name='Article',
        uuid=article_id
    )
    
    return article_id

def add_author(batch: Batch, author_name: str) -> str:
    
    author_object = {'name': author_name}

    # generate an UUID for the Author
    author_id = generate_uuid5(author_name)
    
    # add author to the batch
    # EXERCISE: call here the batch.add_data_object function to add the author to the batch
    batch.add_data_object( 
        data_object=author_object,
        class_name='Author',
        uuid=author_id
    )
    
    return author_id

def add_references(batch: Batch, article_id: str, author_id: str)-> None:
    # add references to the batch
    ## Author -> Article
    batch.add_reference(
        from_object_uuid=author_id,
        from_object_class_name='Author',
        from_property_name='wroteArticles',
        to_object_uuid=article_id
    )
    
    ## Article -> Author 
    # EXERCISE: call here the batch.add_reference function to add the article->author reference
    batch.add_reference(
        from_object_uuid=article_id,
        from_object_class_name='Article',
        from_property_name='hasAuthors',
        to_object_uuid=author_id
    )
client.batch.configure(batch_size=50, dynamic=True, callback=None)
with client.batch as batch:

    for i in data:

        # add article to the batch
        article_id = add_article(batch, i)

        for author in i['authors']:

            # add author to the batch
            author_id = add_author(batch, author)

            # add cross references to the batch
            add_references(batch, article_id=article_id, author_id=author_id)

Mohamed_Shahin · December 12, 2024, 12:10pm

Hi @chris31522,

Thank you so much for sharing this information, it’s really helpful! I’ve taken a look, and the issue is that no vectorizer is specified in your schema: none of the class definitions sets the ‘vectorizer’ key.
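One way to confirm this locally, as a minimal sketch: the helper below (a hypothetical name, not part of the Weaviate client) lists the classes in a v3-style schema dict that set no vectorizer. It assumes the server has no default vectorizer configured; if one is set on the server side, a class without the key may still be vectorized.

```python
def classes_without_vectorizer(schema: dict) -> list:
    """Return names of classes in a v3-style schema dict that set no vectorizer.

    Hypothetical helper for illustration; it only inspects the local dict
    and does not contact Weaviate.
    """
    missing = []
    for class_obj in schema.get("classes", []):
        # Treat a missing key like an explicit "none"
        # (assumption: no server-side default vectorizer).
        if class_obj.get("vectorizer", "none") == "none":
            missing.append(class_obj["class"])
    return missing


# Trimmed-down stand-in for the schema posted above.
example_schema = {
    "classes": [
        {"class": "Article", "properties": [{"name": "title", "dataType": ["text"]}]},
        {"class": "Author", "vectorizer": "text2vec-openai", "properties": []},
    ]
}

print(classes_without_vectorizer(example_schema))
```

Run against the full schema from the earlier post, this would report all three classes, since none of them names a vectorizer.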

Here’s an example of how you can add a vectorizer - in the deprecated client:

class_obj = {
    "class": "Article",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
    ],
    "vectorizer": "text2vec-openai",  # This can be any vectorizer of your choice
}
client.schema.create_class(class_obj)
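Applied to all three classes of the schema posted earlier, the same idea could look like the sketch below. `with_vectorizer` is a hypothetical helper, and "text2vec-openai" is only a placeholder for whichever module is enabled on your instance.

```python
import copy


def with_vectorizer(schema: dict, vectorizer: str) -> dict:
    """Return a copy of a v3-style schema with `vectorizer` set on every class.

    Hypothetical helper; classes that already name a vectorizer are left alone.
    """
    updated = copy.deepcopy(schema)  # do not mutate the caller's dict
    for class_obj in updated.get("classes", []):
        class_obj.setdefault("vectorizer", vectorizer)
    return updated


# Shortened stand-in for the Article/Author/Category schema.
schema = {
    "classes": [
        {"class": "Article", "properties": [{"name": "summary", "dataType": ["text"]}]},
        {"class": "Author", "properties": [{"name": "name", "dataType": ["text"]}]},
        {"class": "Category", "properties": [{"name": "name", "dataType": ["text"]}]},
    ]
}

schema_with_vectorizer = with_vectorizer(schema, "text2vec-openai")

# With the v3 client, the whole schema could then be created in one call
# (needs a running instance, so it is left commented out here):
# client.schema.create(schema_with_vectorizer)
```

Classes that already exist on the instance would have to be removed before recreating them this way.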

Additionally, you’re using an old, deprecated version of the client, which is making things a bit more complicated for you. I highly recommend upgrading to Python Client v4. It offers a friendlier syntax and better performance, making it much easier to work with.

Here is how easily you can specify a vectorizer in v4.

Here’s what I recommend for your use case:

Remove the current schema.
Upgrade your Weaviate Client locally by running the following: pip install -U weaviate-client
Recreate the schema using the updated syntax (including the vectorizer) - see below:

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[  # This part is optional, depending on your needs
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
)

As you can see, the new syntax is much more user-friendly and easier compared to the old version!

Ingest data using the batch method in v4 — you’ll find it much faster and easier.

One of the main benefits of the v4 client is that it uses a gRPC interface. gRPC is based on HTTP/2 and Protocol Buffers, and is therefore very fast and efficient.
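A rough sketch of what the ingestion step might look like with the v4 client. The property names (title, summary, wordCount) are taken from the schema posted earlier in the thread rather than from the two-property example, and `to_article_properties` and `ingest_articles` are hypothetical helpers. `ingest_articles` expects an already connected v4 client, so nothing here is tied to a particular deployment.

```python
def to_article_properties(article_data: dict) -> dict:
    """Map one scraped article dict to the Article collection's property names."""
    return {
        "title": article_data["title"],
        "summary": article_data["summary"].replace("\n", " "),  # newlines -> spaces
        "wordCount": article_data["word_count"],
    }


def ingest_articles(client, data: list) -> list:
    """Batch-import articles with a connected v4 client; return failed objects."""
    articles = client.collections.get("Article")
    # The context manager flushes any remaining objects on exit.
    with articles.batch.dynamic() as batch:
        for article_data in data:
            batch.add_object(
                properties=to_article_properties(article_data),
                uuid=article_data["id"],
            )
    # Batch errors are collected rather than raised, so return them for inspection.
    return articles.batch.failed_objects


# Made-up sample in the shape produced by the scraping code earlier in the thread.
sample = {
    "id": "00000000-0000-0000-0000-000000000001",
    "title": "Hello",
    "summary": "line one\nline two",
    "authors": ["Jane Doe"],
    "word_count": 4,
}

print(to_article_properties(sample))
```

If the list returned by `ingest_articles(client, data)` is non-empty, its entries indicate which objects were rejected, which is worth checking before running any query.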

I hope this helps! If you have any questions or need further assistance, don’t hesitate to reach out — I’m here to help.

Best regards,
Mohamed Shahin
Weaviate Support Engineer

Mohamed_Shahin · December 12, 2024, 12:52pm

Hi @chris31522,

I was curious to know why you opted for the v3 client, as it’s an older version. Were you following a specific tutorial or guide? If so, I’d be happy to assist in updating it to the latest version (v4).

Best regards,
Mohamed Shahin
Weaviate Support

chris31522 · December 13, 2024, 6:59am

I tried your solution, but there is a new problem :(
I can’t upload my data successfully, and the query result is empty! Before I changed my code, the upload worked…

Mohamed_Shahin · December 13, 2024, 7:39am

Hi @chris31522,

Could you please share the code you used to create the collection? I’ll help you with that.

Best regards,
Mohamed Shahin
Weaviate Support Engineer

chris31522 · December 13, 2024, 8:02am

It seems the collections are created automatically. Here are the details of one collection.

Mohamed_Shahin · December 13, 2024, 3:03pm

Hi @chris31522,

Could you please share the scripts you used to create the collection and ingest data? This will help me better understand the configuration and pinpoint the issue.

Additionally, could you provide the Sandbox URL you’re working with?

Best regards,
Mohamed Shahin
Weaviate Support
