Weaviate with OpenAI

Hello,

I was wondering if anyone has ever done the following, and whether Weaviate is a good solution for it.

  1. Connect to OpenAI.
  2. Feed Weaviate with different documents (text only, no images).
  3. Have the server run on localhost via Docker or any other means.
  4. Have users ask questions via an HTTP client, with OpenAI working together with Weaviate to produce faster and more accurate responses.

If so, is there any tutorial/repo someone could point me to?

I was using GitHub - mayooear/gpt4-pdf-chatbot-langchain: GPT4 & LangChain Chatbot for large PDF docs; however, it has a Pinecone integration, and I would like to use Weaviate since it supports running on localhost.

Thanks in advance.

Hi @starskiin3d ! Welcome to our community :hugs:

Weaviate is a ~~great~~ the best solution for this. :wink:

Considering the GitHub repository you mentioned, since it's using LangChain, you can ingest the PDFs into Weaviate by changing the vector store, like so:

At this line

      const client = weaviate.client({
        scheme: process.env.WEAVIATE_SCHEME || "http",
        host: process.env.WEAVIATE_HOST || "localhost:8080"
        // apiKey: new (weaviate as any).ApiKey(
        //   process.env.WEAVIATE_API_KEY || "default"
        // ),
      });
      await WeaviateStore.fromDocuments(docs, embeddings, {client: client, indexName: "Document", textKey:"text"})

For that you will have to run Weaviate locally using Docker. Here is a nice Docker configurator tool that will guide you through creating the perfect docker-compose.yaml:
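
As a rough starting point (a sketch, not the output of that configurator), a docker-compose.yaml for a local Weaviate with the OpenAI modules enabled could look roughly like this; the image tag is just an example, and OPENAI_APIKEY is assumed to be set in your shell:

    version: '3.4'
    services:
      weaviate:
        image: semitechnologies/weaviate:1.19.6   # example tag, pick a current release
        ports:
          - "8080:8080"
        environment:
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          # OpenAI modules for vectorization and generative search
          DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
          ENABLE_MODULES: 'text2vec-openai,generative-openai'
          OPENAI_APIKEY: $OPENAI_APIKEY
          CLUSTER_HOSTNAME: 'node1'
        volumes:
          - weaviate_data:/var/lib/weaviate
    volumes:
      weaviate_data: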

That should be enough to store your embeddings in Weaviate. You will also need to change the querying part of the repo so it reads from Weaviate instead of Pinecone; see the sketch below.
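
As a rough sketch of that querying side (this is not the repo's exact code; the "Document" class name, "text" key, and "source" metadata key just mirror the ingestion call above, and the sample question is made up), LangChain JS can reconnect to the existing index like this:

    import weaviate from "weaviate-ts-client";
    import { OpenAIEmbeddings } from "langchain/embeddings/openai";
    import { WeaviateStore } from "langchain/vectorstores/weaviate";

    async function queryWeaviate(question: string) {
      const client = weaviate.client({
        scheme: process.env.WEAVIATE_SCHEME || "http",
        host: process.env.WEAVIATE_HOST || "localhost:8080",
      });

      // Reconnect to the class that fromDocuments() created during ingestion
      const store = await WeaviateStore.fromExistingIndex(new OpenAIEmbeddings(), {
        client,
        indexName: "Document",
        textKey: "text",
        metadataKeys: ["source"],
      });

      // Return the chunks most similar to the question
      return store.similaritySearch(question, 4);
    }

    queryWeaviate("What is this document about?").then((docs) =>
      console.log(docs.map((d) => d.pageContent))
    );

From there you can also pass store.asRetriever() into the chain the repo already builds, instead of the Pinecone store.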

Check out this recipe notebook on how to use generative search with OpenAI:
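
For a quick idea of what that looks like from the TS client, here is a sketch assuming a reasonably recent weaviate-ts-client (older versions may not have .withGenerate()) and the generative-openai module enabled as in the compose file above; the question and prompt below are placeholders:

    import weaviate from "weaviate-ts-client";

    const client = weaviate.client({
      scheme: process.env.WEAVIATE_SCHEME || "http",
      host: process.env.WEAVIATE_HOST || "localhost:8080",
    });

    async function ask(question: string) {
      // nearText retrieves the most relevant chunks, generate() asks OpenAI
      // to answer from those chunks (generative search)
      const result = await client.graphql
        .get()
        .withClassName("Document")
        .withFields("text source")
        .withNearText({ concepts: [question] })
        .withGenerate({ groupedTask: `Answer this question using only the documents above: ${question}` })
        .withLimit(3)
        .do();

      // With a groupedTask, the generated answer is attached to the first returned object
      return result.data.Get.Document[0]._additional.generate.groupedResult;
    }

    ask("What are these documents about?").then(console.log);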

Also, we have a set of great examples in:

Let me know if that helps. Thanks!

Hi,

First of all, thanks for the fast reply; it seems I have chosen the right software with you guys.

Following up on what you wrote:

  1. Managed to get Docker running locally, can't believe it was that easy.
  2. Changed the line you pointed out, but first I had to import weaviate:

import weaviate from 'weaviate-ts-client'

  3. After I added the suggested code I get the following error: Cannot find name 'WeaviateStore'
    const client = weaviate.client({
      scheme: process.env.WEAVIATE_SCHEME || "http",
      host: process.env.WEAVIATE_HOST || "localhost:8080"
      // apiKey: new (weaviate as any).ApiKey(
      //   process.env.WEAVIATE_API_KEY || "default"
      // ),
    });

    /* create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();

    await WeaviateStore.fromDocuments(docs, embeddings, { client: client, indexName: "Document", textKey: "text" });

Also, thanks for the provided links; I will go through them and try to find the info I need.

Best of luck.

Oh, sorry.

Forgot to mention that you will need to import the WeaviateStore.

Here are the LangChain JS docs, where you can find more info:

This is the import you need:

import { WeaviateStore } from "langchain/vectorstores/weaviate";

Glad to know you are enjoying Weaviate! We are here to help!

Thanks!

Now that is amazing!!!

Managed to read the documents and push them to my local Docker :).

Once I set the WCS query console to

{
  Get {
    Document {
      text
      source
    }
  }
}

I got back the source document and the text it read nicely; it's just that not the entire text is shown in the query. Since the documents are really long, I'm unsure whether it only shows me part and hides the rest, or whether there is a command to get the full text.

I've got two last questions before I give you 5/5 stars :slight_smile:

  1. Would this be a proper way to get the text from the ingested files? I had to use an async function since await gives an error when used at the top level, and I'm new to TypeScript.

async function Response(){
  const response = await client.graphql
    .get()
    .withClassName('Document')
    .withFields('text source')
    .do();

  console.log(response['data']['Get']['Document']);
}

Response()

  • await WeaviateStore.fromDocuments(docs, embeddings, {client: client, indexName: "Document", textKey:"text"}) - will ingest the text into my local Docker instance.
  • I will use the following code in the WCS query console to get the source file and the text:

{
  Get {
    Document {
      text
      source
    }
  }
}
  • The text is not fully displayed in the WCS query. I can see it has gone through the documents, but the docs are really long and I can only see about 10-13 lines of text per ingested document, while there is a lot more text in those docs.

**To sum things up**

Experience with the forum: immense help from your team; I am already more than happy with the way you assisted me.

Final questions:

  1. Is the query code good?
  2. Why does the query display only partial text from the files rather than the whole text?

Once more, thanks for the assistance and have a nice day.

I will answer this myself: most of the stuff I asked about is covered in the links you provided. Made it all work. Thanks again.

Oh, glad you were able to solve it!!

I missed this last message and only just got to it now.

Let's keep in touch! We are also very active on our Slack, so let us know here or on Slack if you need any further assistance!

Thanks!
