If I want to persist data, is that possible with Weaviate?
hi @AbhinavKasubojula !!
Welcome to our community!
Sure thing!
How are you deploying Weaviate?
This information (along with the other details we ask for when you open a thread) will allow me to help you better.
Thanks!
Thanks @DudaNogueira, for reviewing my query.
Here is my docker-compose YAML:
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.28.1
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      ENABLE_MODULES: text2vec-ollama,generative-ollama # Ensure both modules are enabled
      DEFAULT_VECTORIZER_MODULE: text2vec-ollama
      TEXT2VEC_OLLAMA_APIKEY: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      TEXT2VEC_OLLAMA_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      GENERATIVE_MODEL_APIKEY: "" # Empty as you're not using this for now
      GENERATIVE_MODEL_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
We are running Weaviate in a Docker container and loading data with the Python weaviate-client:
data = [
    {
        "company_name": "XXX",
        "projects": "1.Dam 42 rehabilitation Design, scope of work: Design, project value: 217000, 2.Dam44,45,48 and 50 rehabilitation design, scope of work: design, project value:962,000,3.Ajies & daguey rehab design, scope of work:design and assessments, project value:112,500,4.south Carolina dam assessment, scope of work: design and assessments, project value:87,500",
        "engineers": "Administrative:5, architect:1, CADD technician:4, civil engineer:4, construction inspector:10, electrical engineer:1, environmental engineer: 1, geotechnical engineer:3, gis specialist:1, hydrologist:1, mechanical engineer:2,project manager:4"
    },
    {
        "company_name": "YYY",
        "projects": "1.Cherokee nation roads department multiple task orders, scope of work:roadway, bridge,ROW,drainage,waterline, sewer, structural design, storm design, project value: 2,449794, 2.BIA A-E services, scope of work:field investigation, waterline, construction docs, SUE level B, project value:243931, 3.BIA A-E services, TO-3 Quinault detention, WA, scope of work:stormwater drainage, site design, waterline, sewer, project value:172,000",
        "engineers": "administrative:7, CADD technician:2, civil engineer:12, land surveyor:1, engineer intern:4, land survey intern:2, survey technician:5, right of way agent:3, field technician:3"
    },
    {
        "company_name": "AA Engineering, Inc.",
        "projects": "Automation; Controls; Instrumentation; Educational Facilities; Classrooms; Industrial; Manufacturing",
        "engineers": "Administravite:14, CADD Technician:14, construction inspector:1, cost engineer/estimator:1, electrical engineer:9, mechanical engineer:17"
    },
    {
        "company_name": "VV Consultants, Inc",
        "projects": "Airports; Terminals and Hangars; Freight; Bridges; commecal building; shopping; das(concrete arch); urba renewals; comunitydevelopment",
        "engineers": "Administravite:114, CADD Technician:31, construction inspector:1, cost engineer/estimator:1, electrical engineer:9, mechanical engineer:17,civil engineer:181,archaeologist:12,structual engineer:44"
    }
]
import weaviate
import weaviate.classes as wvc

# compute_embeddings() is defined elsewhere in my code (not shown here).

def CreateCollectionAndLoad():
    client = weaviate.connect_to_local()
    print(f"Client: {weaviate.__version__}, Server: {client.get_meta().get('version')}")
    collection_name1 = "name"
    # Note: this drops the collection (and its objects) on every run.
    client.collections.delete(collection_name1)
    client.collections.create(
        name=collection_name1,
        vectorizer_config=wvc.config.Configure.Vectorizer.none(),
        #vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(
        #    api_endpoint="http://host.docker.internal:11434",
        #    model="nomic-embed-text"
        #),
        generative_config=wvc.config.Configure.Generative.ollama(
            api_endpoint="http://host.docker.internal:11434",
            model="llama3.2"
        )
    )
    collection = client.collections.get(collection_name1)
    with collection.batch.dynamic() as batch:
        for item in data:
            # Vectors are computed client-side and passed in explicitly.
            emb = compute_embeddings(item["projects"]).tolist()
            batch.add_object({
                "company_name": item["company_name"],
                "projects": item["projects"],
                "engineers": item["engineers"]
            },
            vector=emb)
    client.close()

CreateCollectionAndLoad()
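One thing worth noting about the script above: it calls client.collections.delete(...) on every run, so the collection is emptied each time the script executes, independently of how the container stores its data. Below is a minimal sketch of a get-or-create pattern instead. It assumes the v4 Python client's client.collections.exists, get and create methods; the helper name get_or_create_collection is hypothetical and not part of the weaviate API.

```python
def get_or_create_collection(client, name, **create_kwargs):
    """Return the named collection, creating it only when it is missing.

    `client` is expected to be a connected weaviate v4 client. The helper
    name and signature are illustrative, not part of the weaviate API.
    """
    if client.collections.exists(name):
        # Reuse the existing collection so previously imported objects survive.
        return client.collections.get(name)
    # First run: create it, passing through whatever config the caller gives,
    # e.g. vectorizer_config=... and generative_config=...
    return client.collections.create(name=name, **create_kwargs)
```

With this, the delete call would only be needed when you deliberately want to start from an empty collection.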
Hi!
Where did you get this environment variable? It doesn’t exist.
Note that you are not defining any vectorizer:
vectorizer_config=wvc.config.Configure.Vectorizer.none(),
it must be:
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
Apart from that, you have a mounted volume, so your data should persist.
Can you send the exact steps you are doing to spin it up and down?
Let me know if that helps.
Thanks!
Hi @DudaNogueira ,
Thanks for pointing that out!
Here’s what I’m doing to spin it up and down:
- To start:
docker-compose up -d
- To stop:
docker-compose down
Let me know if I should be doing anything differently or if you have any suggestions.
If I change
vectorizer_config=wvc.config.Configure.Vectorizer.none(),
to
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
I'm unable to load data into the collection:
import weaviate
import weaviate.classes as wvc
from weaviate.classes.init import AdditionalConfig, Timeout

client = weaviate.connect_to_local(
    port=8080,
    grpc_port=50051,
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=120)  # Values in seconds
    )
)
print(f"Client: {weaviate.__version__}, Server: {client.get_meta().get('version')}")
collection_name = "Notices"
client.collections.create(
    name=collection_name,
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(),
    generative_config=wvc.config.Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="llama3.2"
    )
)
collection = client.collections.get(collection_name)
with collection.batch.dynamic() as batch:
    for item in data:
        emb = compute_embeddings(item["engineers"]).tolist()
        batch.add_object({
            "company_name": item["company_name"],
            "engineers": item["engineers"]
        },
        vector=emb)
for i in collection.iterator():
    print(i)
collection = client.collections.get(collection_name)
print(len(list(collection.iterator())))
for i in collection.iterator():
    print(i.properties)
client.close()
Can you check this?
Basically, add a way to check whether there were any issues in the batch. This part goes outside of the with context:
failed_objects = collection.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
    print(f"First failed object: {failed_objects[0]}")
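If many objects fail, printing only the first one can hide the fact that there are several distinct causes. Here is a small sketch that groups failures by error message. It assumes each failed object exposes a message attribute, as the v4 client's batch error objects do, and falls back to str() otherwise; the helper name summarize_failures is hypothetical.

```python
from collections import Counter

def summarize_failures(failed_objects):
    """Count failed batch objects per error message.

    Each item is assumed to expose a `message` attribute; anything
    without one is counted under its str() representation instead.
    """
    return Counter(str(getattr(obj, "message", obj)) for obj in failed_objects)

# Example use, after the `with collection.batch.dynamic()` block has exited:
# for message, count in summarize_failures(collection.batch.failed_objects).most_common():
#     print(count, message)
```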
Could you please help me resolve the issue with data persistence?
Can you try this one?
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.28.1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      ENABLE_MODULES: text2vec-ollama,generative-ollama # Ensure both modules are enabled
      DEFAULT_VECTORIZER_MODULE: text2vec-ollama
      TEXT2VEC_OLLAMA_APIKEY: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      TEXT2VEC_OLLAMA_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      GENERATIVE_MODEL_APIKEY: "" # Empty as you're not using this for now
      GENERATIVE_MODEL_ENDPOINT: "http://ollama:11434" # Use container name 'ollama' instead of localhost
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
  weaviate_data:
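The difference from the first compose file is that the weaviate service now mounts a named volume at the same path that PERSISTENCE_DATA_PATH points to. In the original file only the ollama service had a volume, so docker-compose down discarded Weaviate's data together with the container. A minimal sketch of that check in Python follows, assuming the compose file has already been parsed into a dict, that environment is written in mapping form, and that volumes use the short source:target syntax; the function name weaviate_data_is_mounted is hypothetical.

```python
def weaviate_data_is_mounted(compose, service="weaviate"):
    """Return True if `service` mounts a volume at its PERSISTENCE_DATA_PATH."""
    svc = compose["services"][service]
    env = svc.get("environment") or {}
    data_path = env.get("PERSISTENCE_DATA_PATH")
    if not data_path:
        # Path not set explicitly, so this check cannot confirm the mount.
        return False
    for mount in svc.get("volumes") or []:
        # Short syntax is "source:target[:mode]"; the target is the second field.
        parts = mount.split(":")
        if len(parts) >= 2 and parts[1].rstrip("/") == data_path.rstrip("/"):
            return True
    return False

# Trimmed-down versions of the two compose files in this thread.
original = {"services": {"weaviate": {"environment": {"QUERY_DEFAULTS_LIMIT": 25}}}}
fixed = {"services": {"weaviate": {
    "volumes": ["weaviate_data:/var/lib/weaviate"],
    "environment": {"PERSISTENCE_DATA_PATH": "/var/lib/weaviate"},
}}}

print(weaviate_data_is_mounted(original))  # False
print(weaviate_data_is_mounted(fixed))     # True
```

Note that a named volume survives docker-compose down, but docker-compose down -v removes it.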
See the Weaviate documentation for more information on using Weaviate with Docker Compose.