Poor retrieval quality while using CSV and XLSX files

When working with CSVs using LangChain’s CSV loader and recursive character text splitter, the retrieval quality is very poor.

A few records from the CSV:

111111,John,Doe,01/2000,Hispanic,M,FT,Fall 2008,2.71,Albuquerque,New Mexico,87112,jdoe@example.com,17.9,FALSE,FALSE,TRUE

111112,Jane,Smith,05/2001,Hispanic,F,TRANSFER,Fall 2006,3.73,New York,New York,10009,jsmith@example.com,18.1,FALSE,FALSE,TRUE

If I ask for John Doe's date of birth, the John Doe entry is retrieved near the end of the results (when sorted by certainty).

We tried decreasing the certainty threshold and increasing the number of retrieved results, but it did not help. What would be the right way to deal with this issue?


I’m also facing a similar issue with CSV data. Any help would be appreciated.


I believe the issue here is how you are parsing your objects.

Considering that this is an object:

111111,John,Doe,01/2000,Hispanic,M,FT,Fall 2008,2.71,Albuquerque,New Mexico,87112,jdoe@example.com,17.9,FALSE,FALSE,TRUE

There is no way for the model to know that 01/2000 is a birth date. You will need to include the dataset's header (the column names) in the content of each object so it has a chance to understand what each value means.
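
As a rough sketch of that idea, the snippet below turns each row into a "column: value" text block using only Python's standard csv module. The column names (student_id, date_of_birth, and so on) are my own guesses, since the original post does not show the header row, and only the first few columns are included to keep it short. The rows_to_documents helper is hypothetical, not part of LangChain.

```python
import csv
import io

# Hypothetical header row: the original post does not show the real column names.
# Only the first six columns of the sample records are used here.
SAMPLE_CSV = """student_id,first_name,last_name,date_of_birth,ethnicity,gender
111111,John,Doe,01/2000,Hispanic,M
111112,Jane,Smith,05/2001,Hispanic,F
"""


def rows_to_documents(csv_text):
    """Return one text block per CSV row, pairing every value with its column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        # "date_of_birth: 01/2000" carries far more meaning than a bare "01/2000".
        docs.append("\n".join(f"{column}: {value}" for column, value in row.items()))
    return docs


documents = rows_to_documents(SAMPLE_CSV)
print(documents[0])
```

Each resulting string can then be stored as a single object, so a question about a date of birth has the words "date_of_birth" next to the value to match against. It may also help to keep one row per object rather than letting a text splitter cut a row into pieces, though I have not verified that against your setup.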