When running VectorStoreIndex.from_documents and passing nodes, I get an error: AttributeError: 'TextNode' object has no attribute 'get_doc_id'

simondpalmer · October 2, 2023, 6:07am

While using llama_index to load a PDF with its PDFReader(), I can't create an index with VectorStoreIndex.from_documents because I get this error:

AttributeError                            Traceback (most recent call last)
Input In [35], in <cell line: 15>()
     12 storage_context = StorageContext.from_defaults(vector_store = vector_store)
     14 # set up the index
---> 15 index = VectorStoreIndex.from_documents(nodes, storage_context = storage_context)

File ~\anaconda3\lib\site-packages\llama_index\indices\base.py:97, in BaseIndex.from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
     95 with service_context.callback_manager.as_trace("index_construction"):
     96     for doc in documents:
---> 97         docstore.set_document_hash(doc.get_doc_id(), doc.hash)
     98     nodes = service_context.node_parser.get_nodes_from_documents(
     99         documents, show_progress=show_progress
    100     )
    102     return cls(
    103         nodes=nodes,
    104         storage_context=storage_context,
   (...)
    107         **kwargs,
    108     )

AttributeError: 'TextNode' object has no attribute 'get_doc_id'

I think it has to do with the nodes generated from the PDFReader output: they may not have this info? Not sure. Any help would be great.

simondpalmer · October 2, 2023, 6:04pm

I think I worked it out. As I have only one Document, I didn't need to use the Node Parser at all. Instead, I just feed in the Document as is.

DudaNogueira · October 3, 2023, 11:28am

Hi @simondpalmer! Welcome to our community.

Thanks for sharing.

iamleonie · February 14, 2024, 12:30pm

I encountered the same issue.

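The issue is that you are passing nodes to the from_documents() method. As the traceback shows, from_documents() calls get_doc_id() on every item it receives, and TextNode objects do not have that method.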

There are two solutions for this issue:

1. Pass documents to the from_documents() method, as @simondpalmer has already shown: index = VectorStoreIndex.from_documents(documents, storage_context = storage_context)
2. Pass the nodes directly into the VectorStoreIndex constructor: index = VectorStoreIndex(nodes, storage_context = storage_context)
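To make the difference between the two calls concrete, here is a minimal sketch that runs without llama_index installed. The classes TextNode, Document and ToyIndex below are hypothetical stand-ins written for this post, not the real llama_index classes; they only mimic the behaviour visible in the traceback (from_documents() asks each item for get_doc_id() before parsing it into nodes). The real library does much more, so treat this as an illustration of the mechanism only.

```python
from dataclasses import dataclass


# Stand-in for llama_index's TextNode (assumption: only the text matters here).
@dataclass
class TextNode:
    text: str


# Stand-in for llama_index's Document: unlike TextNode it has get_doc_id().
@dataclass
class Document:
    text: str
    doc_id: str

    def get_doc_id(self) -> str:
        return self.doc_id


class ToyIndex:
    """Hypothetical stand-in for VectorStoreIndex."""

    def __init__(self, nodes):
        # The constructor takes already-parsed nodes and stores them as-is.
        self.nodes = list(nodes)
        self.doc_ids = []

    @classmethod
    def from_documents(cls, documents):
        # Mirrors the traceback: each item is asked for its doc id first...
        doc_ids = [doc.get_doc_id() for doc in documents]
        # ...and only then are the documents turned into nodes.
        index = cls(TextNode(doc.text) for doc in documents)
        index.doc_ids = doc_ids
        return index


documents = [Document(text="page one of the PDF", doc_id="doc-0")]
nodes = [TextNode(text="page one of the PDF")]

# Reproduces the error: nodes passed where documents are expected.
error_message = None
try:
    ToyIndex.from_documents(nodes)
except AttributeError as err:
    error_message = str(err)
    print(error_message)

# Solution 1: pass documents to from_documents().
index_from_docs = ToyIndex.from_documents(documents)

# Solution 2: pass nodes straight to the constructor.
index_from_nodes = ToyIndex(nodes)
```

So the rule of thumb is: documents go through from_documents(), and nodes you have already parsed yourself go into the constructor.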
