Embedded Weaviate object count stuck at 20; connect_to_local shows thousands of objects

When I use connect_to_local, I can tell the chunks are inserted properly: one file has 10 and the other PDF has 15. When I switch to embedded, the count is stuck at 20; it seems the last file has overwritten the first file. How can I avoid this?

2024-11-10 00:57:04,305 - INFO - === utils.py counts per file
{
"/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf": 10,
"/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what-is-a-constitution-primer.pdf": 15
}

hi @cw257900 !

Are you sure you are running the same version in both scenarios?

What version are you running?

Are the code and data shareable?

Thanks!

It's all from the same code base. In the vector_store.py file I can switch between embedded and local while the rest of the code stays the same.

Code is open in git

The local client works perfectly. When I use connect_to_embedded, the length of the response is always 20 after a new file is uploaded.

This is the collection config I printed out:

{'classes': [{'class': 'PDF_COLLECTION', 'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2}, 'cleanupIntervalSeconds': 60, 'indexNullState': True, 'indexPropertyLength': True, 'indexTimestamps': True, 'stopwords': {'additions': None, 'preset': 'en', 'removals': None}}, 'moduleConfig': {'generative-cohere': {}, 'text2vec-openai': {'baseURL': 'https://api.openai.com', 'model': 'ada', 'vectorizeClassName': True}}, 'multiTenancyConfig': {'autoTenantActivation': False, 'autoTenantCreation': False, 'enabled': False}, 'properties': [{'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_content', 'tokenization': 'word'}, {'dataType': ['int'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': False, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_number'}, {'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'source', 'tokenization': 'word'}, {'dataType': ['date'], 'description': "This property was generated by Weaviate's auto-schema feature on Sun Nov 10 23:44:25 2024", 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': False, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': False}}, 'name': 'uploadDate'}], 'replicationConfig': {'asyncEnabled': False, 'deletionStrategy': 'DeleteOnConflict', 'factor': 1}, 'shardingConfig': {'actualCount': 1, 'actualVirtualCount': 128, 'desiredCount': 1, 'desiredVirtualCount': 128, 'function': 'murmur3', 'key': '_id', 'strategy': 'hash', 'virtualPerPhysical': 128}, 'vectorIndexConfig': {'bq': {'enabled': True}, 'cleanupIntervalSeconds': 300, 'distance': 'cosine', 'dynamicEfFactor': 8, 'dynamicEfMax': 500, 'dynamicEfMin': 100, 'ef': -1, 'efConstruction': 128, 'filterStrategy': 'sweeping', 'flatSearchCutoff': 40000, 'maxConnections': 32, 'pq': {'bitCompression': False, 'centroids': 256, 'enabled': False, 'encoder': {'distribution': 'log-normal', 'type': 'kmeans'}, 'segments': 0, 'trainingLimit': 100000}, 'skip': False, 'sq': {'enabled': False, 'rescoreLimit': 20, 'trainingLimit': 100000}, 'vectorCacheMaxObjects': 1000000000000}, 'vectorIndexType': 'hnsw', 'vectorizer': 'text2vec-openai'}]}

Embedded spawns a Weaviate instance from the Python client code, so a lot can happen there.

What do you mean by length of response?

collection = client.collections.get(class_name)
response = collection.query.fetch_objects()
object_cnts = len(response.objects)

It shows 20 when I load the first file and 15 when I upload the second file. After both files are done, the count is always 20, even after I tried to upload a third file.
Weaviate version is 3.4

I can hardly find any documentation on how to troubleshoot an issue like this.
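One thing worth ruling out, which this thread does not confirm as the cause: `fetch_objects()` called without a `limit` returns a single page capped by the server's default query limit (the `QUERY_DEFAULTS_LIMIT` setting), so `len(response.objects)` is not necessarily the number of stored objects. A minimal sketch, assuming the v4 Python client, compares that length with the server-side total from `aggregate.over_all(total_count=True)`. The helper name `count_objects` is made up for illustration.

```python
def count_objects(collection):
    """Return (server_side_total, objects_in_one_default_fetch).

    `collection` is assumed to be a v4 client collection, e.g. the result of
    client.collections.get(class_name).
    """
    # Server-side count of every object stored in the collection.
    total = collection.aggregate.over_all(total_count=True).total_count
    # Number of objects returned by a single fetch with no explicit limit.
    fetched = len(collection.query.fetch_objects().objects)
    return total, fetched
```

If the two numbers differ, the objects are stored and only the fetch is truncated; passing an explicit `limit` to `fetch_objects()` or counting with the aggregate call would then give the real figure.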

I dumped all objects to a local file and found that some chunks from the first file are missing:

from collections import Counter
import json
import logging

file_counts = Counter()
all_files = get_all_filenames(pdf_file_path)
for filename in all_files:
    # Count the fetched objects whose "source" property matches this file
    count = sum(1 for o in response.objects if o.properties.get("source") == filename)
    file_counts[filename] = count

logging.info(f" === utils.py counts per file \n {json.dumps(file_counts, indent=2)}")

# Define the path to save the text dump
output_file_path = "temp.txt"  # Update with your desired path

with open(output_file_path, "w") as f:
    for i, o in enumerate(response.objects, start=1):
        f.write(f"Object {i} properties:\n")
        # Access only the properties dictionary of each object
        for key, value in o.properties.items():
            f.write(f"  {key}: {value}\n")
        f.write("\n")  # Separate objects by a newline


return object_cnts
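The per-file tally above only sees whatever `response.objects` contains. A sketch that tallies over any iterable of objects is shown here, on the assumption that it is fed from `collection.iterator()` in the v4 Python client, which pages through the whole collection. The function name `counts_per_source` is hypothetical.

```python
from collections import Counter


def counts_per_source(objects, key="source"):
    """Count objects by the value of one property (default: "source")."""
    counts = Counter()
    for obj in objects:
        # Each object is expected to expose a `properties` dict,
        # as the objects in the snippet above do.
        counts[obj.properties.get(key)] += 1
    return counts


# Possible usage against a live collection (not executed here):
# file_counts = counts_per_source(collection.iterator())
```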

OK, can you share some code I can use to reproduce this?

If possible, separate it from the project.

The current Weaviate version is 1.27.2. If you are running Embedded, you can specify a version there.

Let me know if this helps!

I shared the entire project. The code that calls create is the same for embedded and for the local connection.

import os
import sys
import logging

import weaviate
from weaviate import WeaviateClient
from weaviate.classes.init import Auth
from weaviate.connect import ConnectionParams
from weaviate.embedded import EmbeddedOptions

# Add the parent directory (or wherever "with_pinecone" is located) to the Python path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from configs import configs

# Configure logging for development
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s',
    level=logging.INFO,  # Changed from WARNING to INFO
    handlers=[
        logging.StreamHandler()  # This ensures output to console
    ]
)

os.environ['OPENAI_API_KEY'] = configs.OPENAI_API_KEY


# Function to create and return a Weaviate client object
def create_client():

    headers = {"X-OpenAI-Api-Key": configs.OPENAI_API_KEY}

    # Initialize connection params (disabled)
    """
    connection_params = ConnectionParams(
        http={"host": WEAVIATE_HOST, "port": WEAVIATE_HTTP_PORT, "secure": False, "additional_headers": headers},
        grpc={"host": WEAVIATE_HOST, "port": WEAVIATE_GRPC_PORT, "secure": False}
    )
    """
    # client = weaviate.connect_to_local(headers={"X-OpenAI-Api-Key": configs.OPENAI_API_KEY})
    """
    client = weaviate.use_async_with_embedded(
        version="1.26.1",
        headers={"X-OpenAI-Api-Key": OPENAI_API_KEY},
        port=8079,
        grpc_port=50051,
    )
    """

    client = weaviate.connect_to_embedded(
        version="latest",
        persistence_data_path=configs.WEAVIATE_PERSISTENCE_PATH,
        headers=headers,
        environment_variables={
            "ENABLE_MODULES": "text2vec-openai,text2vec-cohere,text2vec-huggingface,ref2vec-centroid,generative-openai,qna-openai",
        }
    )

    logging.info(" === vectore_stores.py - embedded client initiated {}".format(client))

    return client


def close_client(client):
    if client:
        client.close()
        print("Weaviate client closed.")


if __name__ == "__main__":

    client = create_client()
    print(client)
    if not client.collections.exists("Test"):
        collection = client.collections.create("Test")
    else:
        collection = client.collections.get("Test")
    collection.data.insert({"text": "this is a test "})
    print(collection)
    client.close()

The code is not formatted. I was not able to run it.

Could you, for example, create a Google Colab?

One thing I noticed is that you seem to be instantiating two embedded clients, use_async_with_embedded and connect_to_embedded. Also, try using the latest version (not the word "latest", but for now 1.27.3).
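Pinning an explicit version instead of "latest" could look roughly like this. It is a sketch only: the helper `embedded_kwargs` and its default values are made up for illustration, and the keyword names are the ones already used in the `connect_to_embedded` call earlier in the thread.

```python
def embedded_kwargs(data_path, api_key, version="1.27.3"):
    """Build keyword arguments for weaviate.connect_to_embedded.

    The version is pinned explicitly rather than set to "latest".
    """
    return {
        "version": version,
        "persistence_data_path": data_path,
        "headers": {"X-OpenAI-Api-Key": api_key},
        "environment_variables": {
            # Module list shortened here; adjust to the modules you need.
            "ENABLE_MODULES": "text2vec-openai,generative-openai",
        },
    }


# Possible usage (requires the weaviate client package; not executed here):
# import weaviate
# client = weaviate.connect_to_embedded(**embedded_kwargs("./weaviate_data", "<your key>"))
```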

The code for async is commented out. I'll try to create a Google Colab. Thanks.

I will try 1.27.3 as well. Thanks.

Here is the log. It shows the first file has 19 chunks loaded (Chunk 0 through 18) and the second has 12 (Chunk 0 through 11). When the final step counts the objects, it shows 20.

2024-11-13 16:32:09,935 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf

chunking_recursiveCharacterTextSplitter.py: file is being chunked: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“warning”,“msg”:“prop len tracker file weaviate_data/pdf_collection/Ib2JkZKYAJpm/proplengths does not exist, creating new tracker”,“time”:“2024-11-13T16:32:10-06:00”}
{“action”:“hnsw_prefill_cache_async”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“not waiting for vector cache prefill, running in background”,“time”:“2024-11-13T16:32:10-06:00”,“wait_for_cache_prefill”:false}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Created shard pdf_collection_Ib2JkZKYAJpm in 3.017834ms”,“time”:“2024-11-13T16:32:10-06:00”}
2024-11-13 16:32:10,833 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 0 - Chunk 0
2024-11-13 16:32:11,310 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 1 - Chunk 1
2024-11-13 16:32:11,640 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 2 - Chunk 2
2024-11-13 16:32:12,396 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 3 - Chunk 3
2024-11-13 16:32:12,842 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 4 - Chunk 4
2024-11-13 16:32:13,188 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 5 - Chunk 5
2024-11-13 16:32:13,701 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 6 - Chunk 6
2024-11-13 16:32:14,206 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 7 - Chunk 7
2024-11-13 16:32:14,969 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 8 - Chunk 8
{“action”:“hnsw_compressed_vector_cache_prefill_progress”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“elapsed_total”:5000034500,“level”:“info”,“loaded”:0,“msg”:“loaded 0 vectors in 5s, current rate is 0 vectors/s, total rate is 0 vectors/s”,“rate_per_second”:0,“time”:“2024-11-13T16:32:15-06:00”,“total_rate_per_second”:0}
2024-11-13 16:32:15,531 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 9 - Chunk 9
2024-11-13 16:32:15,951 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 10 - Chunk 10
2024-11-13 16:32:16,424 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 11 - Chunk 11
2024-11-13 16:32:16,754 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 12 - Chunk 12
2024-11-13 16:32:16,976 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 13 - Chunk 13
2024-11-13 16:32:17,388 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 14 - Chunk 14
2024-11-13 16:32:17,832 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 15 - Chunk 15
2024-11-13 16:32:18,164 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 16 - Chunk 16
2024-11-13 16:32:19,974 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 17 - Chunk 17
{“action”:“hnsw_compressed_vector_cache_prefill_progress”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“elapsed_total”:10000003084,“level”:“info”,“loaded”:0,“msg”:“loaded 0 vectors in 10s, current rate is 0 vectors/s, total rate is 0 vectors/s”,“rate_per_second”:0,“time”:“2024-11-13T16:32:20-06:00”,“total_rate_per_second”:0}
2024-11-13 16:32:20,372 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 18 - Chunk 18
2024-11-13 16:32:20 - All chunks inserted for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
2024-11-13 16:32:20,373 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf Processing Status:
{
“status”: true,
“message”: ,
“error”:
}
2024-11-13 16:32:20,373 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf

chunking_recursiveCharacterTextSplitter.py: file is being chunked: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
2024-11-13 16:32:20,745 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 0 - Chunk 0
2024-11-13 16:32:21,027 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 1 - Chunk 1
2024-11-13 16:32:21,489 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 2 - Chunk 2
2024-11-13 16:32:22,146 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 3 - Chunk 3
2024-11-13 16:32:22,590 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 4 - Chunk 4
2024-11-13 16:32:22,965 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 5 - Chunk 5
2024-11-13 16:32:23,338 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 6 - Chunk 6
2024-11-13 16:32:23,568 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 7 - Chunk 7
2024-11-13 16:32:23,940 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 8 - Chunk 8
2024-11-13 16:32:24,390 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 9 - Chunk 9
2024-11-13 16:32:24,672 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 10 - Chunk 10
2024-11-13 16:32:24,971 - INFO - HTTP Request: POST http://localhost:8079/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 11 - Chunk 11
2024-11-13 16:32:24 - All chunks inserted for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
2024-11-13 16:32:24,972 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf Processing Status:
{
“status”: true,
“message”: ,
“error”:
}
2024-11-13 16:32:24,972 - INFO - === utils.py url: http://localhost:8079/v1/objects/
2024-11-13 16:32:24,978 - INFO - === utils.py
{'classes': [{'class': 'PDF_COLLECTION', 'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2}, 'cleanupIntervalSeconds': 60, 'indexNullState': True, 'indexPropertyLength': True, 'indexTimestamps': True, 'stopwords': {'additions': None, 'preset': 'en', 'removals': None}}, 'moduleConfig': {'generative-cohere': {}, 'text2vec-openai': {'baseURL': 'https://api.openai.com', 'model': 'text-embedding-3-small', 'vectorizeClassName': True}}, 'multiTenancyConfig': {'autoTenantActivation': False, 'autoTenantCreation': False, 'enabled': False}, 'properties': [{'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_content', 'tokenization': 'word'}, {'dataType': ['int'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': False, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_number'}, {'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'source', 'tokenization': 'word'}], 'replicationConfig': {'asyncEnabled': False, 'deletionStrategy': 'DeleteOnConflict', 'factor': 1}, 'shardingConfig': {'actualCount': 1, 'actualVirtualCount': 128, 'desiredCount': 1, 'desiredVirtualCount': 128, 'function': 'murmur3', 'key': '_id', 'strategy': 'hash', 'virtualPerPhysical': 128}, 'vectorIndexConfig': {'bq': {'enabled': True}, 'cleanupIntervalSeconds': 300, 'distance': 'cosine', 'dynamicEfFactor': 8, 'dynamicEfMax': 500, 'dynamicEfMin': 100, 'ef': -1, 'efConstruction': 128, 'filterStrategy': 'sweeping', 'flatSearchCutoff': 40000, 'maxConnections': 32, 'pq': {'bitCompression': False, 'centroids': 256, 'enabled': False, 'encoder': {'distribution': 'log-normal', 'type': 'kmeans'}, 'segments': 0, 'trainingLimit': 100000}, 'skip': False, 'sq': {'enabled': False, 'rescoreLimit': 20, 'trainingLimit': 100000}, 'vectorCacheMaxObjects': 1000000000000}, 'vectorIndexType': 'hnsw', 'vectorizer': 'text2vec-openai'}]}

2024-11-13 16:32:24,984 - INFO -
=== utils.py total objects 20 in PDF_COLLECTION

2024-11-13 16:32:24,984 - INFO - === utils.py counts per file
{
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf”: 11,
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf”: 9
}
{‘status’: True, ‘message’: [‘20 already in http://localhost:8079/v1/objects/’], ‘error’: }
2024-11-13 16:32:24,985 - INFO - === *created.py - url: http://localhost:8079/v1/objects/
2024-11-13 16:32:24,985 - INFO - === *created.py - object_count: 20
2024-11-13 16:32:24,985 - INFO -
Document Processing Status: for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
{
“status”: true,
“message”: [
“20 already in http://localhost:8079/v1/objects/”
],
“error”:
}
{“action”:“restapi_management”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Shutting down… “,“time”:“2024-11-13T16:32:24-06:00”,“version”:“1.27.3”}
{“action”:“restapi_management”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Stopped serving weaviate at http://127.0.0.1:8079”,“time”:“2024-11-13T16:32:24-06:00”,“version”:“1.27.3”}
{“action”:“hnsw_compressed_vector_cache_prefill_progress”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“elapsed_total”:15000027792,“level”:“info”,“loaded”:0,“msg”:“loaded 0 vectors in 15s, current rate is 0 vectors/s, total rate is 0 vectors/s”,“rate_per_second”:0,“time”:“2024-11-13T16:32:25-06:00”,“total_rate_per_second”:0}
{“action”:“telemetry_push”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“telemetry terminated”,“payload”:”\u0026{MachineID:857f2ecf-d343-4a0c-8023-8363be41eafc Type:TERMINATE Version:1.27.3 NumObjects:0 OS:darwin Arch:arm64 UsedModules:[generative-cohere text2vec-openai]}”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing raft FSM store …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“shutting down raft sub-system …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“transferring leadership to another server”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“error”:“cannot find peer”,“level”:“error”,“msg”:“transferring leadership”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing raft-net …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing log store …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing data store …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing loaded database …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing raft-rpc client …”,“time”:“2024-11-13T16:32:25-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“closing raft-rpc server …”,“time”:“2024-11-13T16:32:25-06:00”}
(.venv) connie.wang@Connies-MacBook-Pro-M3 fastapi_onazure % python app/rag/with_weaviate/utils/utils.py
2024-11-13 16:32:40,122 - INFO - === configs.py - blob_name for azure: rag/data/constitution.pdf
2024-11-13 16:32:40,122 - INFO - === configs.py - pdf_file_path : /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
2024-11-13 16:32:40,127 - INFO - Started /Users/connie.wang/.cache/weaviate-embedded: process ID 17425
{“action”:“startup”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“default_vectorizer_module”:“none”,“level”:“info”,“msg”:“the default vectorizer modules is set to "none", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer”,“time”:“2024-11-13T16:32:40-06:00”}
{“action”:“startup”,“auto_schema_enabled”:true,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“auto schema enabled setting is set to "true"”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“module offload-s3 is enabled”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“warning”,“msg”:“Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“open cluster service”,“servers”:{“Embedded_at_8079”:51416},“time”:“2024-11-13T16:32:40-06:00”}
{“address”:“192.168.1.44:51417”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“starting cloud rpc server …”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“starting raft sub-system …”,“time”:“2024-11-13T16:32:40-06:00”}
{“address”:“192.168.1.44:51416”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“tcp transport”,“tcpMaxPool”:3,“tcpTimeout”:10000000000,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“loading local db”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“local DB successfully loaded”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“schema manager loaded”,“n”:0,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“metadata_only_voters”:false,“msg”:“construct a new raft node”,“name”:“Embedded_at_8079”,“time”:“2024-11-13T16:32:40-06:00”}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“index”:61,“level”:“info”,“msg”:“initial configuration”,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“last_snapshot_index”:0,“last_store_applied_index_on_start”:62,“level”:“info”,“msg”:“raft node constructed”,“raft_applied_index”:0,“raft_last_index”:62,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“hasState”:true,“level”:“info”,“msg”:“raft init”,“time”:“2024-11-13T16:32:40-06:00”}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“follower”:{},“index”:61,“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering follower state”,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“attempting to join”,“remoteNodes”:[“192.168.1.44:51416”],“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“attempted to join and failed”,“remoteNode”:“192.168.1.44:51416”,“status”:8,“time”:“2024-11-13T16:32:40-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“attempting to join”,“remoteNodes”:[“192.168.1.44:51416”],“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“attempted to join and failed”,“remoteNode”:“192.168.1.44:51416”,“status”:8,“time”:“2024-11-13T16:32:41-06:00”}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“follower”:{},“index”:61,“last-leader-addr”:“”,“last-leader-id”:“”,“leader-address”:“”,“leader-id”:“”,“level”:“warning”,“msg”:“heartbeat timeout reached, starting election”,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“time”:“2024-11-13T16:32:41-06:00”}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“follower”:{},“index”:61,“last-leader-addr”:“”,“last-leader-id”:“”,“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering candidate state”,“node”:{},“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“term”:29,“time”:“2024-11-13T16:32:41-06:00”}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“follower”:{},“from”:“Embedded_at_8079”,“id”:“Embedded_at_8079”,“index”:61,“last-leader-addr”:“”,“last-leader-id”:“”,“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“pre-vote successful, starting election”,“needed”:1,“node”:{},“refused”:0,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“tally”:1,“term”:29,“time”:“2024-11-13T16:32:41-06:00”,“votesNeeded”:1}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“follower”:{},“from”:“Embedded_at_8079”,“id”:“Embedded_at_8079”,“index”:61,“last-leader-addr”:“”,“last-leader-id”:“”,“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“election won”,“needed”:1,“node”:{},“refused”:0,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“tally”:1,“term”:29,“time”:“2024-11-13T16:32:41-06:00”,“votesNeeded”:1}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“follower”:{},“from”:“Embedded_at_8079”,“id”:“Embedded_at_8079”,“index”:61,“last-leader-addr”:“”,“last-leader-id”:“”,“leader”:{},“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering leader state”,“needed”:1,“node”:{},“refused”:0,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51277}]]”,“tally”:1,“term”:29,“time”:“2024-11-13T16:32:41-06:00”,“votesNeeded”:1}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [7/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [8/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [19/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [22/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [23/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [24/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [57/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Schema catching up: applying log entry: [62/62]”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“last_store_log_applied_index”:62,“level”:“info”,“log_index”:62,“log_name”:“LogCommand”,“log_type”:0,“msg”:“reloading local DB as RAFT and local DB are now caught up”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“reload local db: update schema …”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“index”:“PDF_COLLECTION”,“level”:“info”,“msg”:“reload local index”,“time”:“2024-11-13T16:32:41-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“configured versions”,“server_version”:“1.27.3”,“time”:“2024-11-13T16:32:42-06:00”,“version”:“1.27.3”}
{“action”:“grpc_startup”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“grpc server listening at [::]:50050”,“time”:“2024-11-13T16:32:42-06:00”}
{“address”:“192.168.1.44:51416”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“current Leader”,“time”:“2024-11-13T16:32:42-06:00”}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“attempting to join”,“remoteNodes”:[“192.168.1.44:51416”],“time”:“2024-11-13T16:32:42-06:00”}
{“action”:“raft”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“command”:0,“follower”:{},“from”:“Embedded_at_8079”,“id”:“Embedded_at_8079”,“index”:61,“last-leader-addr”:“”,“last-leader-id”:“”,“leader”:{},“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“updating configuration”,“needed”:1,“node”:{},“refused”:0,“server-addr”:“192.168.1.44:51416”,“server-id”:“Embedded_at_8079”,“servers”:“[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.1.44:51416}]]”,“tally”:1,“term”:29,“time”:“2024-11-13T16:32:42-06:00”,“votesNeeded”:1}
{“action”:“restapi_management”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Serving weaviate at http://127.0.0.1:8079”,“time”:“2024-11-13T16:32:42-06:00”,“version”:“1.27.3”}
2024-11-13 16:32:42,325 - INFO - HTTP Request: GET http://localhost:8079/v1/.well-known/openid-configuration “HTTP/1.1 404 Not Found”
2024-11-13 16:32:42,336 - INFO - HTTP Request: GET http://localhost:8079/v1/meta “HTTP/1.1 200 OK”
2024-11-13 16:32:42,337 - INFO - HTTP Request: GET http://localhost:8079/v1/.well-known/ready “HTTP/1.1 200 OK”
2024-11-13 16:32:42,440 - INFO - HTTP Request: GET https://pypi.org/pypi/weaviate-client/json “HTTP/1.1 200 OK”
2024-11-13 16:32:42,468 - INFO - === vectore_stores.py - embeded client initated <weaviate.client.WeaviateClient object at 0x1040461e0>
2024-11-13 16:32:42,468 - INFO - === utils.py url: http://localhost:8079/v1/objects/
2024-11-13 16:32:42,472 - INFO - === utils.py
{‘classes’: [{‘class’: ‘PDF_COLLECTION’, ‘invertedIndexConfig’: {‘bm25’: {‘b’: 0.75, ‘k1’: 1.2}, ‘cleanupIntervalSeconds’: 60, ‘indexNullState’: True, ‘indexPropertyLength’: True, ‘indexTimestamps’: True, ‘stopwords’: {‘additions’: None, ‘preset’: ‘en’, ‘removals’: None}}, ‘moduleConfig’: {‘generative-cohere’: {}, ‘text2vec-openai’: {‘baseURL’: ‘https://api.openai.com’, ‘model’: ‘text-embedding-3-small’, ‘vectorizeClassName’: True}}, ‘multiTenancyConfig’: {‘autoTenantActivation’: False, ‘autoTenantCreation’: False, ‘enabled’: False}, ‘properties’: [{‘dataType’: [‘text’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: True, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘page_content’, ‘tokenization’: ‘word’}, {‘dataType’: [‘int’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: False, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘page_number’}, {‘dataType’: [‘text’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: True, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘source’, ‘tokenization’: ‘word’}], ‘replicationConfig’: {‘asyncEnabled’: False, ‘deletionStrategy’: ‘DeleteOnConflict’, ‘factor’: 1}, ‘shardingConfig’: {‘actualCount’: 1, ‘actualVirtualCount’: 128, ‘desiredCount’: 1, ‘desiredVirtualCount’: 128, ‘function’: ‘murmur3’, ‘key’: ‘_id’, ‘strategy’: ‘hash’, ‘virtualPerPhysical’: 128}, ‘vectorIndexConfig’: {‘bq’: {‘enabled’: True}, ‘cleanupIntervalSeconds’: 300, ‘distance’: ‘cosine’, ‘dynamicEfFactor’: 8, ‘dynamicEfMax’: 500, ‘dynamicEfMin’: 100, ‘ef’: -1, ‘efConstruction’: 128, ‘filterStrategy’: ‘sweeping’, ‘flatSearchCutoff’: 40000, ‘maxConnections’: 32, ‘pq’: {‘bitCompression’: False, ‘centroids’: 256, ‘enabled’: False, ‘encoder’: {‘distribution’: ‘log-normal’, ‘type’: ‘kmeans’}, ‘segments’: 0, ‘trainingLimit’: 100000}, ‘skip’: False, ‘sq’: 
{‘enabled’: False, ‘rescoreLimit’: 20, ‘trainingLimit’: 100000}, ‘vectorCacheMaxObjects’: 1000000000000}, ‘vectorIndexType’: ‘hnsw’, ‘vectorizer’: ‘text2vec-openai’}]}

{“action”:“hnsw_prefill_cache_async”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“not waiting for vector cache prefill, running in background”,“time”:“2024-11-13T16:32:42-06:00”,“wait_for_cache_prefill”:false}
{“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“level”:“info”,“msg”:“Completed loading shard pdf_collection_Ib2JkZKYAJpm in 5.526791ms”,“time”:“2024-11-13T16:32:42-06:00”}
{“action”:“hnsw_compressed_vector_cache_prefill”,“build_git_commit”:“4258bdfc2”,“build_go_version”:“go1.23.3”,“build_image_tag”:“HEAD”,“build_wv_version”:“1.27.3”,“count”:31,“level”:“info”,“maxID”:30,“msg”:“prefilled compressed vector cache”,“time”:“2024-11-13T16:32:42-06:00”,“took”:344667}
2024-11-13 16:32:42,480 - INFO -
=== utils.py total objects 20 in PDF_COLLECTION

2024-11-13 16:32:42,481 - INFO - === utils.py counts per file
{
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf”: 11,
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf”: 9
}

Here is the same code after switching to the local connection; the object count is 25:
:heavy_check_mark: Container fastapi_onazure-t2v-transformers-1 Started 0.2s
:heavy_check_mark: Container fastapi_onazure-contextionary-1 Started 0.2s
:heavy_check_mark: Container fastapi_onazure-weaviate-1 Started 0.3s
(.venv) connie.wang@Connies-MacBook-Pro-M3 fastapi_onazure % python app/rag/with_weaviate/*create.py
2024-11-13 16:43:35,687 - INFO - === configs.py - blob_name for azure: rag/data/constitution.pdf
2024-11-13 16:43:35,688 - INFO - === configs.py - pdf_file_path : /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
2024-11-13 16:43:36,087 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings “HTTP/1.1 200 OK”
== 0.1. embeddings initiated from embedding_openai.py: text-embedding-ada-002 and dimension: 1536

2024-11-13 16:43:36,125 - INFO - HTTP Request: GET http://localhost:8080/v1/.well-known/openid-configuration “HTTP/1.1 404 Not Found”
2024-11-13 16:43:36,148 - INFO - HTTP Request: GET http://localhost:8080/v1/meta “HTTP/1.1 200 OK”
2024-11-13 16:43:36,243 - INFO - HTTP Request: GET https://pypi.org/pypi/weaviate-client/json “HTTP/1.1 200 OK”
2024-11-13 16:43:36,273 - INFO - === vectore_stores.py - embeded client initated <weaviate.client.WeaviateClient object at 0x30057cb90>
2024-11-13 16:43:36,276 - INFO - HTTP Request: GET http://localhost:8080/v1/schema/PDF_COLLECTION “HTTP/1.1 404 Not Found”
2024-11-13 16:43:36,279 - INFO - HTTP Request: GET http://localhost:8080/v1/schema/PDF_COLLECTION “HTTP/1.1 404 Not Found”
2024-11-13 16:43:36,452 - INFO - HTTP Request: POST http://localhost:8080/v1/schema “HTTP/1.1 200 OK”
2024-11-13 16:43:36,456 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/.DS_Store

2024-11-13 16:43:36,456 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/.DS_Store Processing Status:
{
“status”: true,
“message”: ,
“error”:
}
2024-11-13 16:43:36,456 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf

chunking_recursiveCharacterTextSplitter.py: file is being chunked: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
2024-11-13 16:43:37,267 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 0 - Chunk 0
2024-11-13 16:43:37,523 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 1 - Chunk 1
2024-11-13 16:43:37,909 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 2 - Chunk 2
2024-11-13 16:43:38,386 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 3 - Chunk 3
2024-11-13 16:43:38,785 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 4 - Chunk 4
2024-11-13 16:43:39,070 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 5 - Chunk 5
2024-11-13 16:43:39,445 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 6 - Chunk 6
2024-11-13 16:43:40,080 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 7 - Chunk 7
2024-11-13 16:43:40,446 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 8 - Chunk 8
2024-11-13 16:43:40,799 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 9 - Chunk 9
2024-11-13 16:43:41,220 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 10 - Chunk 10
2024-11-13 16:43:41,776 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 11 - Chunk 11
2024-11-13 16:43:42,107 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 12 - Chunk 12
2024-11-13 16:43:42,344 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 13 - Chunk 13
2024-11-13 16:43:42,607 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 14 - Chunk 14
2024-11-13 16:43:43,177 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 15 - Chunk 15
2024-11-13 16:43:43,409 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 16 - Chunk 16
2024-11-13 16:43:43,935 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 17 - Chunk 17
2024-11-13 16:43:44,466 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 18 - Chunk 18
2024-11-13 16:43:44 - All chunks inserted for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
2024-11-13 16:43:44,468 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf Processing Status:
{
“status”: true,
“message”: ,
“error”:
}
2024-11-13 16:43:44,468 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf

chunking_recursiveCharacterTextSplitter.py: file is being chunked: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
2024-11-13 16:43:44,761 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 0 - Chunk 0
2024-11-13 16:43:45,217 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 1 - Chunk 1
2024-11-13 16:43:45,741 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 2 - Chunk 2
2024-11-13 16:43:46,267 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 3 - Chunk 3
2024-11-13 16:43:46,606 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 4 - Chunk 4
2024-11-13 16:43:47,222 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 5 - Chunk 5
2024-11-13 16:43:47,746 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 6 - Chunk 6
2024-11-13 16:43:48,216 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 7 - Chunk 7
2024-11-13 16:43:48,458 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 8 - Chunk 8
2024-11-13 16:43:48,934 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 9 - Chunk 9
2024-11-13 16:43:49,185 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 10 - Chunk 10
2024-11-13 16:43:49,495 - INFO - HTTP Request: POST http://localhost:8080/v1/objects “HTTP/1.1 200 OK”
Inserted: Page 11 - Chunk 11
2024-11-13 16:43:49 - All chunks inserted for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
2024-11-13 16:43:49,497 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf Processing Status:
{
“status”: true,
“message”: ,
“error”:
}
2024-11-13 16:43:49,498 - INFO - === utils.py url: http://localhost:8080/v1/objects/
2024-11-13 16:43:49,509 - INFO - === utils.py
{‘classes’: [{‘class’: ‘PDF_COLLECTION’, ‘invertedIndexConfig’: {‘bm25’: {‘b’: 0.75, ‘k1’: 1.2}, ‘cleanupIntervalSeconds’: 60, ‘indexNullState’: True, ‘indexPropertyLength’: True, ‘indexTimestamps’: True, ‘stopwords’: {‘additions’: None, ‘preset’: ‘en’, ‘removals’: None}}, ‘moduleConfig’: {‘generative-cohere’: {}, ‘text2vec-openai’: {‘baseURL’: ‘https://api.openai.com’, ‘model’: ‘ada’, ‘vectorizeClassName’: True}}, ‘multiTenancyConfig’: {‘autoTenantActivation’: False, ‘autoTenantCreation’: False, ‘enabled’: False}, ‘properties’: [{‘dataType’: [‘text’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: True, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘page_content’, ‘tokenization’: ‘word’}, {‘dataType’: [‘int’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: False, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘page_number’}, {‘dataType’: [‘text’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: True, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘source’, ‘tokenization’: ‘word’}], ‘replicationConfig’: {‘asyncEnabled’: False, ‘deletionStrategy’: ‘DeleteOnConflict’, ‘factor’: 1}, ‘shardingConfig’: {‘actualCount’: 1, ‘actualVirtualCount’: 128, ‘desiredCount’: 1, ‘desiredVirtualCount’: 128, ‘function’: ‘murmur3’, ‘key’: ‘_id’, ‘strategy’: ‘hash’, ‘virtualPerPhysical’: 128}, ‘vectorIndexConfig’: {‘bq’: {‘enabled’: True}, ‘cleanupIntervalSeconds’: 300, ‘distance’: ‘cosine’, ‘dynamicEfFactor’: 8, ‘dynamicEfMax’: 500, ‘dynamicEfMin’: 100, ‘ef’: -1, ‘efConstruction’: 128, ‘filterStrategy’: ‘sweeping’, ‘flatSearchCutoff’: 40000, ‘maxConnections’: 32, ‘pq’: {‘bitCompression’: False, ‘centroids’: 256, ‘enabled’: False, ‘encoder’: {‘distribution’: ‘log-normal’, ‘type’: ‘kmeans’}, ‘segments’: 0, ‘trainingLimit’: 100000}, ‘skip’: False, ‘sq’: {‘enabled’: False, 
‘rescoreLimit’: 20, ‘trainingLimit’: 100000}, ‘vectorCacheMaxObjects’: 1000000000000}, ‘vectorIndexType’: ‘hnsw’, ‘vectorizer’: ‘text2vec-openai’}]}

2024-11-13 16:43:49,520 - INFO -
=== utils.py total objects 25 in PDF_COLLECTION

2024-11-13 16:43:49,520 - INFO - === utils.py counts per file
{
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf”: 14,
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf”: 11
}
{‘status’: True, ‘message’: [‘25 already in http://localhost:8080/v1/objects/’], ‘error’: }
2024-11-13 16:43:49,521 - INFO - === *created.py - url: http://localhost:8080/v1/objects/
2024-11-13 16:43:49,521 - INFO - === *created.py - object_count: 25
2024-11-13 16:43:49,521 - INFO -
Document Processing Status: for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
{
“status”: true,
“message”: [
“25 already in http://localhost:8080/v1/objects/
],
“error”:
}
(.venv) connie.wang@Connies-MacBook-Pro-M3 fastapi_onazure % python app/rag/with_weaviate/utils/utils.py
2024-11-13 16:44:09,454 - INFO - === configs.py - blob_name for azure: rag/data/constitution.pdf
2024-11-13 16:44:09,454 - INFO - === configs.py - pdf_file_path : /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
2024-11-13 16:44:09,465 - INFO - HTTP Request: GET http://localhost:8080/v1/.well-known/openid-configuration “HTTP/1.1 404 Not Found”
2024-11-13 16:44:09,479 - INFO - HTTP Request: GET http://localhost:8080/v1/meta “HTTP/1.1 200 OK”
2024-11-13 16:44:09,571 - INFO - HTTP Request: GET https://pypi.org/pypi/weaviate-client/json “HTTP/1.1 200 OK”
2024-11-13 16:44:09,597 - INFO - === vectore_stores.py - embeded client initated <weaviate.client.WeaviateClient object at 0x107d5a0c0>
2024-11-13 16:44:09,597 - INFO - === utils.py url: http://localhost:8080/v1/objects/
2024-11-13 16:44:09,603 - INFO - === utils.py
{‘classes’: [{‘class’: ‘PDF_COLLECTION’, ‘invertedIndexConfig’: {‘bm25’: {‘b’: 0.75, ‘k1’: 1.2}, ‘cleanupIntervalSeconds’: 60, ‘indexNullState’: True, ‘indexPropertyLength’: True, ‘indexTimestamps’: True, ‘stopwords’: {‘additions’: None, ‘preset’: ‘en’, ‘removals’: None}}, ‘moduleConfig’: {‘generative-cohere’: {}, ‘text2vec-openai’: {‘baseURL’: ‘https://api.openai.com’, ‘model’: ‘ada’, ‘vectorizeClassName’: True}}, ‘multiTenancyConfig’: {‘autoTenantActivation’: False, ‘autoTenantCreation’: False, ‘enabled’: False}, ‘properties’: [{‘dataType’: [‘text’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: True, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘page_content’, ‘tokenization’: ‘word’}, {‘dataType’: [‘int’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: False, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘page_number’}, {‘dataType’: [‘text’], ‘indexFilterable’: True, ‘indexRangeFilters’: False, ‘indexSearchable’: True, ‘moduleConfig’: {‘text2vec-openai’: {‘skip’: False, ‘vectorizePropertyName’: True}}, ‘name’: ‘source’, ‘tokenization’: ‘word’}], ‘replicationConfig’: {‘asyncEnabled’: False, ‘deletionStrategy’: ‘DeleteOnConflict’, ‘factor’: 1}, ‘shardingConfig’: {‘actualCount’: 1, ‘actualVirtualCount’: 128, ‘desiredCount’: 1, ‘desiredVirtualCount’: 128, ‘function’: ‘murmur3’, ‘key’: ‘_id’, ‘strategy’: ‘hash’, ‘virtualPerPhysical’: 128}, ‘vectorIndexConfig’: {‘bq’: {‘enabled’: True}, ‘cleanupIntervalSeconds’: 300, ‘distance’: ‘cosine’, ‘dynamicEfFactor’: 8, ‘dynamicEfMax’: 500, ‘dynamicEfMin’: 100, ‘ef’: -1, ‘efConstruction’: 128, ‘filterStrategy’: ‘sweeping’, ‘flatSearchCutoff’: 40000, ‘maxConnections’: 32, ‘pq’: {‘bitCompression’: False, ‘centroids’: 256, ‘enabled’: False, ‘encoder’: {‘distribution’: ‘log-normal’, ‘type’: ‘kmeans’}, ‘segments’: 0, ‘trainingLimit’: 100000}, ‘skip’: False, ‘sq’: {‘enabled’: False, 
‘rescoreLimit’: 20, ‘trainingLimit’: 100000}, ‘vectorCacheMaxObjects’: 1000000000000}, ‘vectorIndexType’: ‘hnsw’, ‘vectorizer’: ‘text2vec-openai’}]}

2024-11-13 16:44:09,609 - INFO -
=== utils.py total objects 25 in PDF_COLLECTION

2024-11-13 16:44:09,609 - INFO - === utils.py counts per file
{
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf”: 14,
“/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf”: 11
}

Those are the only things close to an error that I could see.

But this is probably the first run, checking if the collection exists.

How can I run this myself? Can you share the dataset along with step-by-step instructions?
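
In the meantime, one thing that may be worth ruling out: the logs show 19 + 12 = 31 successful inserts, and the embedded instance reports `"count":31` when prefilling the vector cache, yet the object listing reports 20 (embedded) and 25 (local). That pattern would be consistent with a single list request returning only one default-sized page rather than objects being overwritten, though I have not confirmed this against your code. Below is a minimal sketch of counting per `source` by paging until the listing is exhausted. `fetch_page`, `count_by_source` and `make_fake_fetcher` are hypothetical names for illustration, and the in-memory fetcher only stands in for whatever call your `utils.py` makes (for example a list request with `limit` and `after` parameters).

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical page fetcher: takes (limit, after_id) and returns a list of
# objects shaped like {"id": str, "properties": {"source": str}}.
FetchPage = Callable[[int, Optional[str]], List[dict]]


def count_by_source(fetch_page: FetchPage, page_size: int = 100) -> Counter:
    """Count objects per source by following a cursor until no page is left."""
    counts: Counter = Counter()
    after: Optional[str] = None
    while True:
        page = fetch_page(page_size, after)
        if not page:
            break
        for obj in page:
            counts[obj["properties"]["source"]] += 1
        # Use the last id of this page as the cursor for the next request.
        after = page[-1]["id"]
    return counts


def make_fake_fetcher(objects: List[dict]) -> FetchPage:
    """In-memory stand-in for a real list call, used only for this demo."""
    def fetch(limit: int, after: Optional[str]) -> List[dict]:
        start = 0
        if after is not None:
            start = next(i for i, o in enumerate(objects) if o["id"] == after) + 1
        return objects[start:start + limit]
    return fetch


# Demo data mirroring the insert log: 19 chunks and 12 chunks.
objects = (
    [{"id": f"a{i}", "properties": {"source": "constitution.pdf"}} for i in range(19)]
    + [{"id": f"b{i}", "properties": {"source": "what_is_a_constitution.pdf"}} for i in range(12)]
)
fetcher = make_fake_fetcher(objects)

single_page = fetcher(20, None)          # one request, capped at the page size
counts = count_by_source(fetcher, page_size=20)

print(len(single_page))                  # size of one capped page
print(sum(counts.values()), dict(counts))
```

If the paged total comes out at 31 for both embedded and local, the difference between 20 and 25 would just be the default page size of each setup. If it still differs, that points back at the ingestion path, and a shareable reproduction would help.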