I came up with this proposal to create an AI module for the Drupal CMS: https://youtu.be/qgKiXYP8UBs?si=0o0pUdjDfA1Rsq1y
It was not funded, but since I had already been working on it for several months, I decided to go ahead and complete it myself.
The first implementation was a California real estate law knowledgebase consisting of all the laws, regulations, and publications available on the California Department of Real Estate website, plus hundreds of bulletins, articles, blog posts, and pieces of case law going back 20 years. The beta site is here: https://ca.RealEstateBooksAI.com
Initially, the plan was to use Pinecone. I switched to Weaviate because of the cost, and it turned out to be the best decision I could have made. In the months spent developing and working with RAG chatbots, I have discovered that many elements go into generating a good AI response: the prompt, temperature, token limits, source documents, and so on. None of these matters as much as the large language model that processes your queries. And yet, if your initial query does not retrieve the correct context documents, it doesn't matter how powerful your LLM is: it can't respond to what it can't see. So, in the world of AI embeddings, your vector store retrieval system is every bit as important as your large language model.
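To make that concrete, here is a minimal sketch of the retrieve-then-generate flow I'm describing, using the pre-1.0 `openai` Python client. The `retrieve()` helper is a hypothetical stand-in for the vector store lookup (a Weaviate version appears further down), and the model name, prompt, and parameters are illustrative, not the exact configuration used on the site.

```python
import openai

def retrieve(query: str) -> list[str]:
    """Hypothetical helper: return the top context documents for a
    query from the vector store (see the Weaviate sketch below)."""
    raise NotImplementedError

def answer(query: str) -> str:
    # 1. Retrieval happens first: if the wrong documents come back,
    #    no LLM can recover, because it never sees the right context.
    context = "\n\n".join(retrieve(query))

    # 2. Generation: prompt, temperature, and token limit all matter,
    #    but they only shape what the model does with the retrieved text.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the context provided.\n\n"
                        "Context:\n" + context},
            {"role": "user", "content": query},
        ],
        temperature=0.2,   # low temperature for factual legal answers
        max_tokens=1024,   # cap on the length of the response
    )
    return response["choices"][0]["message"]["content"]
```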
In my personal experience, I have had all sorts of problems getting OpenAI's gpt-3.5 to comprehend complex text, compared with gpt-4 and Anthropic's Claude-2. But regardless of which model I use, my Weaviate nearText queries using the text2vec-transformers module consistently bring back the best context documents, and adding hybrid queries has improved retrieval even further.
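For reference, here is roughly what those two query styles look like with the v3 Weaviate Python client. The instance URL, the `LawDocument` class, and its properties are hypothetical stand-ins for the actual schema; the sketch assumes a Weaviate instance with the text2vec-transformers module enabled.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# nearText: pure vector search over the transformer embeddings.
near_text = (
    client.query
    .get("LawDocument", ["title", "body"])
    .with_near_text({"concepts": ["broker license renewal requirements"]})
    .with_limit(5)
    .do()
)

# Hybrid: blends vector similarity with BM25 keyword scoring.
# alpha=0.5 weights the two equally; alpha=1 is pure vector search.
hybrid = (
    client.query
    .get("LawDocument", ["title", "body"])
    .with_hybrid(query="broker license renewal requirements", alpha=0.5)
    .with_limit(5)
    .do()
)
```

In practice, the `alpha` parameter is the main knob worth experimenting with, since the best balance between keyword and vector scoring depends on how your users phrase their questions.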
So, I'm a big Weaviate fan. I came for the cost, but I stay for the great results and an amazing feature set that seems to expand every day.