How to ingest PDFs into Weaviate and perform RAG

I’m trying to ingest data into Weaviate: a mix of plain text and other formats such as PDF, which I convert to batches of text using “unstructured”.
I’m basically following what is described in the guide on ingesting PDFs, but I suppose I’m missing something and/or doing something wrong.
If I query the data as follows:

from weaviate.classes.query import MetadataQuery

response = coll.query.bm25(
    query="metal oxide",
    limit=2,
    return_metadata=MetadataQuery(score=True),  # BM25 returns a score, not a vector distance
)

I get a result, while using:

res = coll.generate.near_text(
    query="metal oxide",
    limit=2,
    # return_metadata=MetadataQuery(distance=True),
    single_prompt="Summarize {coll_name}, use a maximum of 20 words."
)

I get nothing.
I would like to perform semantic search + RAG on the property “files” (DataType.TEXT_ARRAY), which contains the text batches extracted using partition_pdf from unstructured.
The schema is the following:

Hi @SergioEanX!

Welcome to our community :hugs:
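
One thing that may be worth checking in your generate.near_text call, assuming the placeholders in single_prompt are meant to name properties of the returned objects: {coll_name} does not look like a property of your collection, whereas “files” is. Here is a minimal sketch of a sanity check in plain Python; the helper name unknown_placeholders is hypothetical and not part of the Weaviate client:

```python
from string import Formatter


def unknown_placeholders(prompt, property_names):
    """Return the {placeholders} in `prompt` that are not known property names."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so those entries are skipped.
    found = {field for _, field, _, _ in Formatter().parse(prompt) if field}
    return sorted(found - set(property_names))


# Assumption: "files" is the only text property defined on the collection.
properties = ["files"]

print(unknown_placeholders("Summarize {coll_name}, use a maximum of 20 words.", properties))
print(unknown_placeholders("Summarize {files}, use a maximum of 20 words.", properties))
```

If the first call reports a leftover placeholder, trying the prompt with {files} instead would be a cheap experiment. Also note that near_text needs a vectorizer configured on the collection, while bm25 does not.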

Check out this recipe, which shows how to use LangChain to ingest some PDFs:

https://github.com/weaviate/recipes/tree/main/integrations/langchain/loading-data

While you may not end up using LangChain at all, it should give you some hints on how to do the same with unstructured. That specific recipe doesn’t use unstructured, but there are plenty of docs covering it, like here:

Also, you can load not just a single PDF but an entire folder of files, like here:

Also, check out our Academy course on chunking. Chunking is not “one size fits all”, and tuning it for your use case can improve the overall quality of your results:
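
To give a rough idea of the kind of choice involved, here is a minimal sketch of one simple strategy: grouping consecutive pieces of text into chunks under a character budget. It assumes you already have the extracted pieces as plain strings (for example the .text of each element returned by partition_pdf); the name chunk_texts and the max_chars value are hypothetical and just for illustration:

```python
def chunk_texts(texts, max_chars=500):
    """Group consecutive text pieces into chunks of at most max_chars characters.

    A single piece longer than max_chars is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip empty pieces (blank lines, empty elements)
        # Start a new chunk when adding this piece would exceed the budget.
        if current and size + len(text) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text) + 1  # +1 accounts for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks


pieces = [
    "Metal oxides are compounds of a metal and oxygen.",
    "",
    "They are widely used as catalysts.",
]
for chunk in chunk_texts(pieces, max_chars=60):
    print(repr(chunk))
```

Smaller chunks tend to make retrieval more precise but give the generative step less context, so the budget is worth experimenting with on your own data.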

Let me know if this helps!

Thanks!