Create persisted database

If I want to store persistent data, is that possible with Weaviate?

Hi @AbhinavKasubojula!

Welcome to our community! :hugs:

Sure thing!

How are you deploying Weaviate?

This information (along with the other details we ask for when you open a thread) will help me assist you better.

Thanks!

Thanks, @DudaNogueira, for reviewing my query.
Here is my docker-compose YAML:

version: '3.4'
 
services:
  weaviate:
    image: semitechnologies/weaviate:1.28.1
    ports:
        - "8080:8080"
        - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      ENABLE_MODULES: text2vec-ollama,generative-ollama  # Ensure both modules are enabled
      DEFAULT_VECTORIZER_MODULE: text2vec-ollama
      TEXT2VEC_OLLAMA_APIKEY: "http://ollama:11434"  # Use container name 'ollama' instead of localhost
      TEXT2VEC_OLLAMA_ENDPOINT: "http://ollama:11434"  # Use container name 'ollama' instead of localhost
      GENERATIVE_MODEL_APIKEY: ""  # Empty as you're not using this for now
      GENERATIVE_MODEL_ENDPOINT: "http://ollama:11434"  # Use container name 'ollama' instead of localhost
 
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
 
volumes:
  ollama_data:

We run Weaviate in a Docker container and use the Python client to load the data:

data = [{
"company_name" : "XXX",
"projects" : "1.Dam 42 rehabilitation Design, scope of work: Design, project value: 217000, 2.Dam44,45,48 and 50 rehabilitation design, scope of work: design, project value:962,000,3.Ajies & daguey rehab design, scope of work:design and assessments, project value:112,500,4.south Carolina dam assessment, scope of work: design and assessments, project value:87,500",
"engineers" : "Administrative:5, architect:1, CADD technician:4, civil engineer:4, construction inspector:10, electrical engineer:1, environmental engineer: 1, geotechnical engineer:3, gis specialist:1, hydrologist:1, mechanical engineer:2,project manager:4"
},
{
"company_name" : "YYY",
"projects" : "1.Cherokee nation roads department multiple task orders, scope of work:roadway, bridge,ROW,drainage,waterline, sewer, structural design, storm design, project value: 2,449794, 2.BIA A-E services, scope of work:field investigation, waterline, construction docs, SUE level B, project value:243931, 3.BIA A-E services, TO-3 Quinault detention, WA, scope of work:stormwater drainage, site design, waterline, sewer, project value:172,000" ,
"engineers" : "administrative:7, CADD technician:2, civil engineer:12, land surveyor:1, engineer intern:4, land survey intern:2, survey technician:5, right of way agent:3, field technician:3"
},
{
"company_name" : "AA Engineering, Inc.",
"projects" : "Automation; Controls; Instrumentation;  Educational Facilities; Classrooms;  Industrial; Manufacturing",
"engineers" : "Administravite:14, CADD Technician:14, construction inspector:1, cost engineer/estimator:1, electrical engineer:9, mechanical engineer:17"
},
{
"company_name" : "VV Consultants, Inc",
"projects" : "Airports; Terminals and Hangars; Freight; Bridges; commecal building; shopping; das(concrete arch); urba renewals; comunitydevelopment",
"engineers" : "Administravite:114, CADD Technician:31, construction inspector:1, cost engineer/estimator:1, electrical engineer:9, mechanical engineer:17,civil engineer:181,archaeologist:12,structual engineer:44"
}]

import weaviate
import weaviate.classes as wvc

def CreateCollectionAndLoad():
    client = weaviate.connect_to_local()
    try:
        print(f"Client: {weaviate.__version__}, Server: {client.get_meta().get('version')}")
        collection_name1 = "name"

        # Deletes the collection and all of its objects on every run
        client.collections.delete(collection_name1)
        client.collections.create(
            name=collection_name1,
            vectorizer_config=wvc.config.Configure.Vectorizer.none(),
            # vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(
            #     api_endpoint="http://host.docker.internal:11434",
            #     model="nomic-embed-text"
            # ),
            generative_config=wvc.config.Configure.Generative.ollama(
                api_endpoint="http://host.docker.internal:11434",
                model="llama3.2"
            )
        )
        collection = client.collections.get(collection_name1)
        with collection.batch.dynamic() as batch:
            for item in data:
                # compute_embeddings() is the poster's own helper (not shown)
                emb = compute_embeddings(item["projects"]).tolist()
                batch.add_object(
                    properties={
                        "company_name": item["company_name"],
                        "projects": item["projects"],
                        "engineers": item["engineers"],
                    },
                    vector=emb,
                )
    finally:
        client.close()


CreateCollectionAndLoad()
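Note that CreateCollectionAndLoad() calls client.collections.delete() on every run, so the collection and its objects are wiped each time, even when Weaviate's storage is persisted. A minimal sketch of a "create only if missing" pattern, assuming the v4 client's client.collections.exists() and client.collections.create(); ensure_collection and InMemoryCollections are hypothetical names used only for illustration:

```python
from typing import Any, Dict


def ensure_collection(collections: Any, name: str, create_kwargs: Dict[str, Any]) -> bool:
    """Create the collection only if it is missing (hypothetical helper).

    `collections` is expected to behave like the v4 client's `client.collections`,
    i.e. expose `exists(name)` and `create(name=..., **kwargs)`.
    Returns True if a new collection was created, False if one already existed.
    """
    if collections.exists(name):
        # Keep the existing (persisted) collection and its objects untouched.
        return False
    collections.create(name=name, **create_kwargs)
    return True


class InMemoryCollections:
    """Hypothetical stand-in for client.collections, used only to exercise the helper."""

    def __init__(self) -> None:
        self.created: Dict[str, Dict[str, Any]] = {}

    def exists(self, name: str) -> bool:
        return name in self.created

    def create(self, name: str, **kwargs: Any) -> None:
        if name in self.created:
            raise ValueError(f"collection {name!r} already exists")
        self.created[name] = kwargs


if __name__ == "__main__":
    fake = InMemoryCollections()
    print(ensure_collection(fake, "Notices", {}))  # first run: creates it
    print(ensure_collection(fake, "Notices", {}))  # second run: keeps it
```

With the real client this would be called as ensure_collection(client.collections, "Notices", {...}) with the same vectorizer_config and generative_config arguments, instead of deleting and recreating the collection.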

Hi!

Where did you get this environment variable? It doesn’t exist. :thinking:

Note that you are not defining any vectorizer:

vectorizer_config=wvc.config.Configure.Vectorizer.none(),

If you want Weaviate to vectorize the data with Ollama, it should be:

vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
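One thing to keep in mind with text2vec_ollama(): the Ollama endpoint is called by the Weaviate server, not by your Python script, so it must be reachable from inside the Weaviate container. With the compose file above, Ollama runs as the ollama service and is reachable as http://ollama:11434 on the Compose network, while localhost inside the Weaviate container is the Weaviate container itself. A minimal sketch, assuming the api_endpoint and model parameters (already used in the commented-out code above) and that the nomic-embed-text model has been pulled in Ollama; ollama_vectorizer_kwargs is a hypothetical helper:

```python
from typing import Dict


def ollama_vectorizer_kwargs(ollama_in_same_compose: bool,
                             model: str = "nomic-embed-text") -> Dict[str, str]:
    """Build keyword arguments for text2vec_ollama() (hypothetical helper).

    The URL is resolved by the Weaviate server inside its container:
    - Ollama defined as a service in the same docker-compose file: use the service name.
    - Ollama running on the host machine: host.docker.internal (Docker Desktop;
      on Linux this usually needs an extra_hosts entry).
    """
    host = "ollama" if ollama_in_same_compose else "host.docker.internal"
    return {"api_endpoint": f"http://{host}:11434", "model": model}


if __name__ == "__main__":
    print(ollama_vectorizer_kwargs(True))
    # With the real client (not run here):
    # wvc.config.Configure.Vectorizer.text2vec_ollama(**ollama_vectorizer_kwargs(True))
```

Passing api_endpoint explicitly avoids relying on whatever default the module falls back to.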

Apart from that, you have a mounted volume, so your data should persist.

Can you send the exact steps you are doing to spin it up and down?

Let me know if that helps.

Thanks!

Hi @DudaNogueira,
Thanks for pointing that out!
Here's what I'm doing to spin it up and down:

  • To start: docker-compose up -d
  • To stop: docker-compose down

Let me know if I should be doing anything differently or if you have any suggestions.

If I change

vectorizer_config=wvc.config.Configure.Vectorizer.none(),

to

vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),

I'm unable to load data into the collection:

import weaviate
import weaviate.classes as wvc
from weaviate.classes.init import AdditionalConfig, Timeout

client = weaviate.connect_to_local(
    port=8080,
    grpc_port=50051,
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=120)  # Values in seconds
    )
)
print(f"Client: {weaviate.__version__}, Server: {client.get_meta().get('version')}")
collection_name = "Notices"
client.collections.create(
    name=collection_name,
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
    generative_config=wvc.config.Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="llama3.2"
    )
)
collection = client.collections.get(collection_name)
with collection.batch.dynamic() as batch:
    for item in data:
        # compute_embeddings() is the poster's own helper (not shown)
        emb = compute_embeddings(item["engineers"]).tolist()
        batch.add_object({
            "company_name": item["company_name"],
            "engineers": item["engineers"]
        },
        vector=emb)
for i in collection.iterator():
    print(i)

collection = client.collections.get(collection_name)
print(len(list(collection.iterator())))
for i in collection.iterator():
    print(i.properties)

client.close()

Can you check this?

Basically, it adds a way to check whether there were any issues in the batch. Put this part outside of the with block:

failed_objects = collection.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
    print(f"First failed object: {failed_objects[0]}")

Could you please help me resolve the issue with data persistence?

Can you try this one?

version: '3.4'
 
services:
  weaviate:
    image: semitechnologies/weaviate:1.28.1
    volumes:
        - weaviate_data:/var/lib/weaviate
    ports:
        - "8080:8080"
        - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      ENABLE_MODULES: text2vec-ollama,generative-ollama  # Ensure both modules are enabled
      DEFAULT_VECTORIZER_MODULE: text2vec-ollama
      TEXT2VEC_OLLAMA_APIKEY: "http://ollama:11434"  # Use container name 'ollama' instead of localhost
      TEXT2VEC_OLLAMA_ENDPOINT: "http://ollama:11434"  # Use container name 'ollama' instead of localhost
      GENERATIVE_MODEL_APIKEY: ""  # Empty as you're not using this for now
      GENERATIVE_MODEL_ENDPOINT: "http://ollama:11434"  # Use container name 'ollama' instead of localhost
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
 
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
 
volumes:
  ollama_data:
  weaviate_data:
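The key change is that Weaviate's data directory (PERSISTENCE_DATA_PATH, here /var/lib/weaviate) is now backed by the named volume weaviate_data. In the original file only Ollama had a volume, so Weaviate's data lived inside the container, and docker-compose down removes containers. Named volumes survive docker-compose down, but docker-compose down -v deletes them. A minimal sketch that checks this condition on a compose file already loaded into a Python dict (for example with a YAML parser); the dicts are hand-trimmed copies of the two files in this thread, weaviate_data_in_named_volume is a hypothetical helper, only the dict form of environment and the short "source:target" volume syntax are handled, and a missing PERSISTENCE_DATA_PATH is treated as not persisted so the check does not rely on the server's default path:

```python
from typing import Any, Dict


def weaviate_data_in_named_volume(compose: Dict[str, Any], service: str = "weaviate") -> bool:
    """True if the service's PERSISTENCE_DATA_PATH is mounted from a declared named volume."""
    svc = compose.get("services", {}).get(service, {})
    data_path = svc.get("environment", {}).get("PERSISTENCE_DATA_PATH")
    if not data_path:
        return False
    named_volumes = compose.get("volumes") or {}
    for mount in svc.get("volumes", []):
        # Short syntax: "source:target[:mode]"
        parts = str(mount).split(":")
        if len(parts) >= 2 and parts[1] == data_path and parts[0] in named_volumes:
            return True
    return False


# Trimmed copies of the two compose files above (only the relevant keys).
ORIGINAL = {
    "services": {
        "weaviate": {"environment": {"QUERY_DEFAULTS_LIMIT": 25}},
        "ollama": {"volumes": ["ollama_data:/root/.ollama"]},
    },
    "volumes": {"ollama_data": None},
}
FIXED = {
    "services": {
        "weaviate": {
            "volumes": ["weaviate_data:/var/lib/weaviate"],
            "environment": {"PERSISTENCE_DATA_PATH": "/var/lib/weaviate"},
        },
        "ollama": {"volumes": ["ollama_data:/root/.ollama"]},
    },
    "volumes": {"ollama_data": None, "weaviate_data": None},
}

if __name__ == "__main__":
    print(weaviate_data_in_named_volume(ORIGINAL))  # False
    print(weaviate_data_in_named_volume(FIXED))     # True
```

A bind mount such as ./data:/var/lib/weaviate would also persist data; this sketch deliberately checks only named volumes.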

Check here for more information on using Weaviate with docker compose: