When working with CSV files using LangChain's CSV loader and recursive character text splitter, the retrieval quality is very poor.
A few records from the CSV:
id,first_name,last_name,date_of_birth,ethnicity,gender,status,entry_academic_period,exclusion_type,act_composite,act_math,act_english,act_reading,sat_combined,sat_math,sat_verbal,sat_reading,hs_gpa,hs_city,hs_state,hs_zip,email,entry_age,ged,english_2nd_language,first_generation
111111,John,Doe,01/2000,Hispanic,M,FT,Fall 2008,2.71,Albuquerque,New Mexico,87112,jdoe@example.com,17.9,FALSE,FALSE,TRUE
111112,Jane,Smith,05/2001,Hispanic,F,TRANSFER,Fall 2006,3.73,New York,New York,10009,jsmith@example.com,18.1,FALSE,FALSE,TRUE
…
If I ask "What is the date of birth of John Doe?", the John Doe entry is ranked near the bottom of the retrieved results (when sorted by certainty).
We tried lowering the certainty threshold and increasing the number of retrieved results, but neither helped. What would be the right way to deal with this issue?
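
To make the question concrete, here is a minimal sketch of one direction that might be worth comparing against: keep each CSV row as a single self-contained "key: value" document (no splitting), and resolve exact-name questions with a plain metadata filter instead of similarity ranking. It uses only the Python standard library, not LangChain. The names `SAMPLE`, `rows_to_documents`, and `find_by_name` are hypothetical, and the sample is trimmed to the first four columns of the records above.

```python
import csv
import io

# Trimmed to the first four columns of the records shown above (illustrative only).
SAMPLE = """id,first_name,last_name,date_of_birth
111111,John,Doe,01/2000
111112,Jane,Smith,05/2001
"""


def rows_to_documents(csv_text):
    """Turn each CSV row into one document: 'key: value' text plus the row as metadata."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        # One line per column, so every value stays next to its header name.
        text = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append({"text": text, "metadata": dict(row)})
    return docs


def find_by_name(docs, first_name, last_name):
    """Case-insensitive exact match on the name fields; no embeddings involved."""
    return [
        doc
        for doc in docs
        if doc["metadata"]["first_name"].lower() == first_name.lower()
        and doc["metadata"]["last_name"].lower() == last_name.lower()
    ]


docs = rows_to_documents(SAMPLE)
matches = find_by_name(docs, "John", "Doe")
print(matches[0]["metadata"]["date_of_birth"])
```

Whether this fits depends on the questions being asked: exact lookups by name or id are structured queries, so a filter like this (or a metadata filter in the vector store) may rank them more reliably than embedding similarity over split chunks.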