I have an object cluster with a schema where the title is vectorized:
"name" => "title",
"dataType" => ["text"],
"description" => "The title of the document",
"moduleConfig" => [
"text2vec-openai" => [ "skip" => false, "vectorizePropertyName" => true ]
However, when I run nearText or even hybrid queries asking for only the document with a specific title, that document either doesn’t appear in the results returned from Weaviate or appears lower in the list than expected. I know I could simply use the title field in a filtered search, but I am building a Q&A application where I won’t know in advance when a question will be of this type. I need some strategies for addressing this, as it will look awfully strange to have a document with a specific title that users can NOT find by searching for that title. Attached are examples.
Others can comment on this as well, but the first modification I can think of is this: instead of querying with “show only content with title equal to ‘Drupal AI SolrAI - CSS’” or “Retrieve documents with title ‘Drupal AI SolrAI - CSS’”, try querying with just the title, e.g. “Drupal AI SolrAI - CSS”. This should yield a vector that is closer to the intended document and thus give better results.
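If the questions arrive as free text, one way to try this is to pull the quoted title out before sending it to Weaviate. The following is only a rough sketch: the helper name `extract_title` is made up for illustration, it assumes users put the title in quotes, and it will misfire on questions that contain apostrophes outside the title.

```python
import re

# Matches a phrase wrapped in straight or curly, single or double quotes.
# Assumption: the title itself contains no quote or apostrophe characters.
_QUOTED = re.compile(r"[\"'“‘]([^\"'“”‘’]+)[\"'”’]")


def extract_title(question: str) -> str:
    """Return the first quoted phrase, or the whole question if nothing is quoted."""
    match = _QUOTED.search(question)
    return match.group(1).strip() if match else question.strip()


if __name__ == "__main__":
    print(extract_title("Retrieve documents with title 'Drupal AI SolrAI - CSS'"))
```

The returned string can then be used as the query text in place of the full question.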
Let me know if this helps improve results; if not, we can try other strategies.
Thanks. Using just the title yields even worse results. Now I am thinking that it fails because most of the titles that do come up contain many of the same words, and a title is not really a “semantic” idea so much as a set of keywords signifying the content.
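If titles behave more like keywords than semantic ideas, one thing that may be worth trying is a hybrid query weighted toward the keyword (BM25) side and restricted to the title property. In Weaviate's hybrid search, `alpha` of 0 is pure keyword and 1 is pure vector. The sketch below only builds the GraphQL query string; the class name `Document`, the value `alpha = 0.25`, and the helper name are assumptions for illustration, and whether the `properties` argument is available depends on the Weaviate version.

```python
import json


def build_hybrid_query(title: str, class_name: str = "Document",
                       alpha: float = 0.25, limit: int = 5) -> str:
    """Build a GraphQL hybrid query that leans on keyword matching against `title`."""
    # json.dumps produces a double-quoted, escaped string literal,
    # which is also a valid GraphQL string for ordinary text.
    query_literal = json.dumps(title)
    return (
        "{ Get { %s(limit: %d, hybrid: { query: %s, alpha: %s, "
        "properties: [\"title\"] }) { title _additional { score } } } }"
        % (class_name, limit, query_literal, alpha)
    )


if __name__ == "__main__":
    print(build_hybrid_query("Drupal AI SolrAI - CSS"))
```

Lowering `alpha` further pushes the ranking closer to plain keyword search, which can help when many titles share most of their words.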
But, outside of filtering on the title itself, is there a way to do this? I mean, I kind of get it, but I also see the problem of trying to explain to an end user why, if they put in the exact title of the exact document they are looking for, every other document but that one comes up in the search results. How is that better than keyword search?
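One option that avoids a Weaviate filter is to reorder the results on the client: any returned document whose title appears verbatim in the user's question is moved to the top. This is a sketch only, not something Weaviate does for you. The function name and the result shape (a list of dicts with a `title` key) are assumptions, and it can only promote documents that were actually retrieved, so the query limit may need to be raised.

```python
from typing import Dict, List


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(text.lower().split())


def promote_title_matches(query: str, results: List[Dict]) -> List[Dict]:
    """Move documents whose full title occurs in the query to the front."""
    q = _normalize(query)
    matches, others = [], []
    for doc in results:
        title = _normalize(str(doc.get("title", "")))
        (matches if title and title in q else others).append(doc)
    # Prefer the longest matching title, so a specific title outranks a
    # shorter title that happens to be contained in it.
    matches.sort(key=lambda d: len(_normalize(str(d.get("title", "")))), reverse=True)
    return matches + others


if __name__ == "__main__":
    docs = [{"title": "Drupal AI SolrAI - JS"}, {"title": "Drupal AI SolrAI - CSS"}]
    print(promote_title_matches("show only 'Drupal AI SolrAI - CSS'", docs))
```

Documents that do not match keep the order Weaviate returned them in.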
This is the result after the document is actually found by Weaviate, but OpenAI still says it isn’t, even though the “title” is contained in the context document: