If I want my stored data to persist, is that possible with Weaviate?
hi @AbhinavKasubojula !!
Welcome to our community!
Sure thing!
How are you deploying Weaviate?
This information (along with the other details we ask for when you open a thread) will allow me to help you better.
Thanks!
Thanks @DudaNogueira, for reviewing my query.
Here is my docker-compose YAML:
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.28.1
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      ENABLE_MODULES: text2vec-ollama,generative-ollama # Ensure both modules are enabled
      DEFAULT_VECTORIZER_MODULE: text2vec-ollama
      TEXT2VEC_OLLAMA_APIKEY: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      TEXT2VEC_OLLAMA_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      GENERATIVE_MODEL_APIKEY: "" # Empty as you're not using this for now
      GENERATIVE_MODEL_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
We are running Weaviate in a Docker container and using the Python client:
data = [{
"company_name" : "XXX",
"projects" : "1.Dam 42 rehabilitation Design, scope of work: Design, project value: 217000, 2.Dam44,45,48 and 50 rehabilitation design, scope of work: design, project value:962,000,3.Ajies & daguey rehab design, scope of work:design and assessments, project value:112,500,4.south Carolina dam assessment, scope of work: design and assessments, project value:87,500",
"engineers" : "Administrative:5, architect:1, CADD technician:4, civil engineer:4, construction inspector:10, electrical engineer:1, environmental engineer: 1, geotechnical engineer:3, gis specialist:1, hydrologist:1, mechanical engineer:2,project manager:4"
},
{
"company_name" : "YYY",
"projects" : "1.Cherokee nation roads department multiple task orders, scope of work:roadway, bridge,ROW,drainage,waterline, sewer, structural design, storm design, project value: 2,449794, 2.BIA A-E services, scope of work:field investigation, waterline, construction docs, SUE level B, project value:243931, 3.BIA A-E services, TO-3 Quinault detention, WA, scope of work:stormwater drainage, site design, waterline, sewer, project value:172,000" ,
"engineers" : "administrative:7, CADD technician:2, civil engineer:12, land surveyor:1, engineer intern:4, land survey intern:2, survey technician:5, right of way agent:3, field technician:3"
},
{
"company_name" : "AA Engineering, Inc.",
"projects" : "Automation; Controls; Instrumentation; Educational Facilities; Classrooms; Industrial; Manufacturing",
"engineers" : "Administravite:14, CADD Technician:14, construction inspector:1, cost engineer/estimator:1, electrical engineer:9, mechanical engineer:17"
},
{
"company_name" : "VV Consultants, Inc",
"projects" : "Airports; Terminals and Hangars; Freight; Bridges; commecal building; shopping; das(concrete arch); urba renewals; comunitydevelopment",
"engineers" : "Administravite:114, CADD Technician:31, construction inspector:1, cost engineer/estimator:1, electrical engineer:9, mechanical engineer:17,civil engineer:181,archaeologist:12,structual engineer:44"
}]
import weaviate
import weaviate.classes as wvc

def CreateCollectionAndLoad():
    client = weaviate.connect_to_local()
    print(f"Client: {weaviate.__version__}, Server: {client.get_meta().get('version')}")
    collection_name1 = "name"
    # Note: this drops the collection (and its data) on every run
    client.collections.delete(collection_name1)
    client.collections.create(
        name=collection_name1,
        vectorizer_config=wvc.config.Configure.Vectorizer.none(),
        #vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(
        #    api_endpoint="http://host.docker.internal:11434",
        #    model="nomic-embed-text"
        #),
        generative_config=wvc.config.Configure.Generative.ollama(
            api_endpoint="http://host.docker.internal:11434",
            model="llama3.2"
        )
    )
    collection = client.collections.get(collection_name1)
    with collection.batch.dynamic() as batch:
        for item in data:
            # compute_embeddings is my own helper (not shown here)
            emb = compute_embeddings(item["projects"]).tolist()
            batch.add_object({
                "company_name": item["company_name"],
                "projects": item["projects"],
                "engineers": item["engineers"]
            },
            vector=emb)

CreateCollectionAndLoad()
Hi!
Where did you get this environment variable? It doesn’t exist.
Note that you are not defining any vectorizer:
vectorizer_config=wvc.config.Configure.Vectorizer.none(),
It must be:
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
Apart from that, you have a mounted volume, so your data should persist.
Can you send the exact steps you are doing to spin it up and down?
Let me know if that helps.
Thanks!
Hi @DudaNogueira ,
Thanks for pointing that out!
Here’s what I’m doing to spin it up and down:
- To start: docker-compose up -d
- To stop: docker-compose down
Let me know if I should be doing anything differently or if you have any suggestions.
If I change
vectorizer_config=wvc.config.Configure.Vectorizer.none(),
to:
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
I’m unable to load data into the collection:
import weaviate
import weaviate.classes as wvc
from weaviate.classes.init import AdditionalConfig, Timeout

client = weaviate.connect_to_local(
    port=8080,
    grpc_port=50051,
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=120)  # Values in seconds
    )
)
print(f"Client: {weaviate.__version__}, Server: {client.get_meta().get('version')}")
collection_name = "Notices"
client.collections.create(
    name=collection_name,
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
    generative_config=wvc.config.Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="llama3.2"
    )
)
collection = client.collections.get(collection_name)
with collection.batch.dynamic() as batch:
    for item in data:
        # compute_embeddings is my own helper (not shown here)
        emb = compute_embeddings(item["engineers"]).tolist()
        batch.add_object({
            "company_name": item["company_name"],
            "engineers": item["engineers"]
        },
        vector=emb)
# Print every stored object, then the object count and each object's properties
for i in collection.iterator():
    print(i)
collection = client.collections.get(collection_name)
print(len(list(collection.iterator())))
for i in collection.iterator():
    print(i.properties)
client.close()
Can you check this?
Basically, add a way to check whether there were any issues in the batch.
This part goes outside of the with context:
failed_objects = collection.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
    print(f"First failed object: {failed_objects[0]}")
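If printing only the first failure is not enough, a small helper along these lines could list several of them. This is just a sketch: `summarize_failures` and its `limit` parameter are made-up names, and it assumes each failed entry exposes a `message` attribute (anything without one is simply converted with `str()`). The stand-in data at the bottom only exists so the snippet runs without a server.

```python
from types import SimpleNamespace


def summarize_failures(failed_objects, limit=5):
    """Return readable lines describing failed batch objects.

    Assumption: each entry has a `message` attribute; entries without
    one fall back to their str() representation.
    """
    lines = [f"Number of failed imports: {len(failed_objects)}"]
    for index, failed in enumerate(failed_objects[:limit]):
        message = getattr(failed, "message", str(failed))
        lines.append(f"  [{index}] {message}")
    if len(failed_objects) > limit:
        lines.append(f"  ... and {len(failed_objects) - limit} more")
    return lines


# Stand-in data (hypothetical message) so the sketch runs on its own.
# In the real script, pass collection.batch.failed_objects instead.
example = [SimpleNamespace(message="vectorizer request failed")]
for line in summarize_failures(example):
    print(line)
```

Call it right after the with block, the same place as the check above.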
Could you please help me resolve the issue with data persistence?
Can you try this one?
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.28.1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      ENABLE_MODULES: text2vec-ollama,generative-ollama # Ensure both modules are enabled
      DEFAULT_VECTORIZER_MODULE: text2vec-ollama
      TEXT2VEC_OLLAMA_APIKEY: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      TEXT2VEC_OLLAMA_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      GENERATIVE_MODEL_APIKEY: "" # Empty as you're not using this for now
      GENERATIVE_MODEL_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
  weaviate_data:
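The important parts are the volume on the weaviate service and PERSISTENCE_DATA_PATH pointing at the same directory. As a rough sanity check, that condition could be expressed in Python roughly like this. It is only a sketch, not an official tool: `weaviate_data_is_persisted` is a hypothetical helper, it expects the compose file already parsed into a dict, and it assumes the mapping form of `environment` and the short `source:target` volume syntax used in this thread.

```python
def weaviate_data_is_persisted(compose, service="weaviate"):
    """Return True if the service mounts a volume at its data path."""
    svc = compose.get("services", {}).get(service, {})
    data_path = svc.get("environment", {}).get("PERSISTENCE_DATA_PATH")
    if not data_path:
        # Without an explicit path this sketch cannot tell where data goes.
        return False
    for volume in svc.get("volumes", []):
        # Short syntax is "source:target" or "source:target:mode".
        parts = volume.split(":")
        if len(parts) >= 2 and parts[1].rstrip("/") == data_path.rstrip("/"):
            return True
    return False


# Trimmed-down versions of the two compose files in this thread.
original = {"services": {"weaviate": {"environment": {}}}}
revised = {
    "services": {
        "weaviate": {
            "volumes": ["weaviate_data:/var/lib/weaviate"],
            "environment": {"PERSISTENCE_DATA_PATH": "/var/lib/weaviate"},
        }
    }
}
print(weaviate_data_is_persisted(original))
print(weaviate_data_is_persisted(revised))
```

The first file has no volume on the weaviate service, so the check fails for it, while the revised one passes.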
Check here for more information on using Weaviate with docker compose: