This is my scenario:
- The client has an Azure SQL database with a profiles table with demographic information.
- We created an Azure Cognitive Search service and indexed that database, concatenating all fields into a single field called content, because according to the LangChain documentation ("Azure Cognitive Search" page in the LangChain docs) everything needs to be in one field.
Now we are creating a chatbot with LangChain where we can ask questions like:
"Who is John Smith?", "How old is Jane Smith?", "Who likes gardening?"
The approach I found works like this: first Cognitive Search is queried and some documents are returned; those documents are then embedded and stored in ChromaDB; finally ChromaDB is queried and the results come back in plain English via LangChain and OpenAI.
However, ChromaDB is very slow: this step alone takes about 50 seconds.
So I wanted to try Weaviate instead, but then I get very strange errors like:
[ERROR] Batch ConnectionError Exception occurred! Retrying in 2s. [1/3]
{'error': [{'message': "'@search.score' is not a valid property name. Property names in Weaviate are restricted to valid GraphQL names, which must be “/[_A-Za-z][_0-9A-Za-z]*/”., no such prop with name '@search.score' found in class 'LangChain_df32d6b6d10c4bb895db75f88aaabd75' in the schema. Check your schema files for which properties in this class are available"}]}
My code is as follows:
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.vectorstores import Weaviate

@timer  # my own timing decorator
def from_documentsWeaviate(docs, embeddings):
    # Embed the retrieved documents and load them into a Weaviate vector store.
    return Weaviate.from_documents(docs, embeddings, weaviate_url=WEAVIATE_URL, by_text=False)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
embeddings = OpenAIEmbeddings(
    deployment=OPENAI_EMBEDDING_DEPLOYMENT_NAME,
    model=OPENAI_EMBEDDING_MODEL_NAME,
    chunk_size=1,
)
user_input = get_text()  # helper defined elsewhere; reads the user's question
retriever = AzureCognitiveSearchRetriever(content_key="content")
llm = AzureChatOpenAI(
    openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
    openai_api_version=OPENAI_API_VERSION,
    deployment_name=OPENAI_DEPLOYMENT_NAME,
    openai_api_key=OPENAI_API_KEY,
    openai_api_type=OPENAI_API_TYPE,
    model_name=OPENAI_MODEL_NAME,
    temperature=0,
)
docs = get_relevant_documents(retriever, user_input)  # helper defined elsewhere
# vectorstore = from_documentsChromaDb(docs=docs, embedding=embeddings)
vectorstore = from_documentsWeaviate(docs, embeddings)
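My guess is that AzureCognitiveSearchRetriever puts '@search.score' into each document's metadata, and Weaviate turns every metadata key into a schema property, whose names must be valid GraphQL names. A workaround I'm considering (the sanitize_metadata helper is my own, untested) is to drop the offending keys before inserting:

import re

def sanitize_metadata(docs):
    # Keep only metadata keys that are valid GraphQL/Weaviate property names,
    # dropping fields such as '@search.score' added by the retriever.
    for doc in docs:
        doc.metadata = {
            k: v for k, v in doc.metadata.items()
            if re.fullmatch(r"[_A-Za-z][_0-9A-Za-z]*", k)
        }
    return docs

docs = sanitize_metadata(docs)
vectorstore = from_documentsWeaviate(docs, embeddings)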
I wonder if I should instead index all rows from the table into the vector store up front and skip the Cognitive Search step entirely?
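Roughly what I have in mind for that (untested; the connection string and the column names are placeholders for whatever the real table uses):

import pyodbc
from langchain.schema import Document

# Placeholder connection string pointing at the Azure SQL database.
conn = pyodbc.connect(SQL_CONNECTION_STRING)
cursor = conn.cursor()
# Column names are placeholders; concatenate whatever demographic fields
# the real profiles table has, mirroring the 'content' field in the index.
cursor.execute(
    "SELECT id, CONCAT_WS(' ', first_name, last_name, age, hobbies) AS content "
    "FROM profiles"
)

all_docs = [
    Document(page_content=row.content, metadata={"id": str(row.id)})
    for row in cursor.fetchall()
]

# Build the vector store once at startup; each question then only needs a
# similarity search instead of re-embedding freshly retrieved documents.
vectorstore = Weaviate.from_documents(
    all_docs, embeddings, weaviate_url=WEAVIATE_URL, by_text=False
)

Would that be the better design, or is there a way to make the per-question vector store step fast enough?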