Search does not work with a very large number of documents

Hello everybody! I’m new to Weaviate and only recently started using it, beginning with Verba.

I installed it like this:

git clone https://github.com/weaviate/Verba
pip install -e .

I put my OpenAI key in a .env file, so Verba uses OpenAI.
I launched it successfully and started uploading documents on the topic I need for my work. Each document is a .txt file, but its content is JSON describing the objects we want to make searchable, like this:

{
  "serial-number": "123456",
  "company-name": "Some name",
  "owners": [
    "owner 1",
    "owner 2",
    "owner 3"
  ],
  "statements": [
    {
      "code": "123",
      "description": "some description"
    }
  ],
  "status": 123,
  "date": "2024-02-09"
}

After that, I tested Verba and saw that it easily finds records for my queries. For example, if I search for ‘a company “Some name”’, it shows me the document above.
The problem appeared after I uploaded a large number of documents (there are now ~700 thousand, and we plan to have ~4 million with the same structure).
With the same kind of query, Verba no longer finds every existing document. Even when I can see a document on the Documents page, my search query does not return it.
I tried increasing QUERY_MAXIMUM_RESULTS to 1 million, but it didn’t help. I changed the line in Verba/goldenverba/verba_manager.py at main · weaviate/Verba · GitHub like this:
embedded_options=EmbeddedOptions(additional_env_vars={'QUERY_MAXIMUM_RESULTS': '1000000'})

What can I do to overcome this problem?
Should I change the structure of my documents or any of the program code?
Thank you very much in advance!

Server Setup Information

  • Weaviate Server Version: 1.21.1
  • Deployment Method: embedded
  • Multi Node? no
  • Client Language and Version: Python 3.10

Any additional Information

Please tell me if any further info is needed. I just followed this Verba guide: Verba/README.md at main · weaviate/Verba · GitHub

verba_config.json was generated like this:
{
  "reader": "SimpleReader",
  "chunker": "TokenChunker",
  "embedder": "MiniLMEmbedder",
  "retriever": "WindowRetriever",
  "generator": "GPT4Generator"
}

Hi! Welcome to our community! :hugs:

Do you see the same results when you query Weaviate directly, bypassing Verba?
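
In case it helps, here is a minimal sketch of one way to run that check, using only the Python standard library against Weaviate's GraphQL endpoint (/v1/graphql). It counts the objects in a class and then runs a BM25 keyword search, which is not necessarily the same retrieval path Verba uses. The URL, class name, and property names below are assumptions, not Verba's actual schema; please replace them with the values from your own instance. Also note that an embedded instance is only reachable while the process that started it is running.

```python
import json
import urllib.request

# Assumptions: adjust all three to your setup. The class and property names
# are placeholders, not Verba's actual schema.
WEAVIATE_URL = "http://localhost:8080"
CLASS_NAME = "Chunk"
PROPERTIES = ["text", "doc_name"]


def build_bm25_query(class_name, properties, search_text, limit=10):
    """Build a GraphQL Get query that runs a BM25 keyword search."""
    # json.dumps produces a double-quoted, escaped string literal,
    # which is also a valid GraphQL string.
    quoted = json.dumps(search_text)
    fields = " ".join(properties)
    return (
        "{ Get { %s(bm25: {query: %s}, limit: %d) { %s } } }"
        % (class_name, quoted, limit, fields)
    )


def build_count_query(class_name):
    """Build a GraphQL Aggregate query that counts the objects in a class."""
    return "{ Aggregate { %s { meta { count } } } }" % class_name


def run_graphql(base_url, query):
    """POST a GraphQL query to Weaviate and return the decoded JSON response."""
    payload = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/graphql",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # How many objects does Weaviate itself report for the class?
    print(run_graphql(WEAVIATE_URL, build_count_query(CLASS_NAME)))
    # Does a plain keyword search find the record?
    print(run_graphql(
        WEAVIATE_URL,
        build_bm25_query(CLASS_NAME, PROPERTIES, "Some name"),
    ))
```

If the count matches what you uploaded and the keyword search finds the record, the data is in Weaviate and the difference is more likely in how Verba retrieves results.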

Bear in mind that Verba is a demo project, not a product. It was built to show off what Weaviate can do.

For example, it runs on Embedded Weaviate, which is experimental and certainly not suited to this number of objects.

Thanks!