I'm using llama_index to load a PDF with its PDFReader(), but I can't create an index with VectorStoreIndex.from_documents. I get this error:
AttributeError Traceback (most recent call last)
Input In [35], in <cell line: 15>()
12 storage_context = StorageContext.from_defaults(vector_store = vector_store)
14 # set up the index
---> 15 index = VectorStoreIndex.from_documents(nodes, storage_context = storage_context)
File ~\anaconda3\lib\site-packages\llama_index\indices\base.py:97, in BaseIndex.from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
95 with service_context.callback_manager.as_trace("index_construction"):
96 for doc in documents:
---> 97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(
99 documents, show_progress=show_progress
100 )
102 return cls(
103 nodes=nodes,
104 storage_context=storage_context,
(...)
107 **kwargs,
108 )
AttributeError: 'TextNode' object has no attribute 'get_doc_id'
I think it might be that the nodes generated by PDFReader don't carry this information, but I'm not sure. Any help would be great.
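
One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: from_documents calls get_doc_id() on every item it receives (base.py line 97), and the error says the items being passed are TextNode objects, which lack that method. The variable in the failing call is also named nodes, which suggests the documents were already split into nodes before the call. The sketch below reproduces that mismatch with stand-in classes. Document, TextNode, from_documents and build_index here are hypothetical stand-ins written for illustration, not the real llama_index types or API.

```python
class Document:
    """Stand-in for a loaded document: exposes get_doc_id()."""

    def __init__(self, text, doc_id):
        self.text = text
        self.doc_id = doc_id

    def get_doc_id(self):
        return self.doc_id


class TextNode:
    """Stand-in for an already-parsed chunk: has text but no get_doc_id()."""

    def __init__(self, text):
        self.text = text


def from_documents(documents):
    # Mirrors the loop shown at base.py lines 96-97 in the traceback:
    # every item is asked for its document id.
    return [doc.get_doc_id() for doc in documents]


def build_index(items):
    """Hypothetical helper: pick a construction path based on the item type."""
    if all(hasattr(item, "get_doc_id") for item in items):
        return ("from_documents", from_documents(items))
    # Items without get_doc_id() are treated as nodes and used directly.
    return ("constructor", [item.text for item in items])


docs = [Document("page one", "pdf-0")]
nodes = [TextNode("chunk one"), TextNode("chunk two")]

# Passing nodes where documents are expected fails like the traceback above.
try:
    from_documents(nodes)
except AttributeError as err:
    error_message = str(err)
    print(error_message)

print(build_index(docs)[0])
print(build_index(nodes)[0])
```

If that reading is right, the traceback itself hints at a way out: from_documents ends by calling cls(nodes=nodes, storage_context=storage_context, ...), so passing the nodes straight to the VectorStoreIndex constructor with the same storage_context, or passing the unsplit documents to from_documents, may avoid the error. I haven't verified this against the installed version.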