Passing nodes to VectorStoreIndex.from_documents raises AttributeError: 'TextNode' object has no attribute 'get_doc_id'

While using llama_index to load a PDF with its PDFReader(), I can't create an index with VectorStoreIndex.from_documents because I get this error:

AttributeError                            Traceback (most recent call last)
Input In [35], in <cell line: 15>()
     12 storage_context = StorageContext.from_defaults(vector_store = vector_store)
     14 # set up the index
---> 15 index = VectorStoreIndex.from_documents(nodes, storage_context = storage_context)

File ~\anaconda3\lib\site-packages\llama_index\indices\base.py:97, in BaseIndex.from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
     95 with service_context.callback_manager.as_trace("index_construction"):
     96     for doc in documents:
---> 97         docstore.set_document_hash(doc.get_doc_id(), doc.hash)
     98     nodes = service_context.node_parser.get_nodes_from_documents(
     99         documents, show_progress=show_progress
    100     )
    102     return cls(
    103         nodes=nodes,
    104         storage_context=storage_context,
   (...)
    107         **kwargs,
    108     )

AttributeError: 'TextNode' object has no attribute 'get_doc_id'

I think it has to do with the nodes generated from the PDFReader output: they may not have this info? Not sure. Any help would be great.

I think I worked it out. As I have only one Document, I didn't need to use the Node Parser at all. Instead, I just feed in the Document as is.


Hi @simondpalmer ! Welcome to our community :hugs:

Thanks for sharing :slight_smile:

I encountered the same issue.

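The issue is that you are passing nodes to the from_documents() method, which expects Document objects. As the traceback shows, it calls get_doc_id() on each item, and TextNode has no such method.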

There are two solutions for this issue:

  1. Pass documents to the from_documents() method, as @simondpalmer has already shown: index = VectorStoreIndex.from_documents(documents, storage_context = storage_context)
  2. Or pass the nodes directly to the VectorStoreIndex constructor: index = VectorStoreIndex(nodes, storage_context = storage_context)
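
To make the cause of the error concrete, here is a minimal sketch that reproduces the same failure without llama_index installed. The classes `ToyDocument` and `ToyNode` and the function `from_documents_like` are hypothetical stand-ins I made up for illustration; they are not the library's real classes. They only imitate the one behaviour visible in the traceback: the loop calls get_doc_id() on every item it receives.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: it exposes get_doc_id()."""
    text: str
    doc_id: str

    def get_doc_id(self) -> str:
        return self.doc_id


@dataclass
class ToyNode:
    """Hypothetical stand-in for a TextNode: it has no get_doc_id()."""
    text: str
    node_id: str


def from_documents_like(items):
    """Imitates the loop in the traceback: every item must offer get_doc_id()."""
    return [item.get_doc_id() for item in items]


docs = [ToyDocument(text="page one", doc_id="doc-1")]
nodes = [ToyNode(text="page one", node_id="node-1")]

# Passing document-like objects works.
doc_ids = from_documents_like(docs)

# Passing node-like objects fails in the same way as in the question.
try:
    from_documents_like(nodes)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(doc_ids)
print(error_message)
```

With the real library, the two fixes listed above are the equivalents: hand Document objects to from_documents(), or hand already-parsed nodes to the VectorStoreIndex constructor.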