Prometheus metrics showing n/a for class name

Description

I have Weaviate running successfully and am already writing data to it. Now I am trying to use Prometheus for monitoring.
However, when I port-forward port 2112 directly, every metric that has a class_name label shows it as "n/a".

for example:

batch_durations_ms_count{class_name="n/a",operation="total_persistence_level",shard_name="n/a"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="10"} 7971
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="50"} 8915
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="100"} 10079
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="500"} 10498
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="1000"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="5000"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="+Inf"} 10499
batch_durations_ms_sum{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 214940.27587399905
batch_durations_ms_count{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="10"} 217
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="50"} 5912
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="100"} 6874
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="500"} 8507
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="1000"} 9151
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="5000"} 10496

Server Setup Information

  • Weaviate Server Version: 1.25.0
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: 3
  • Client Language and Version: Python weaviate-client==4.5.5
  • Multitenancy?: no

Any additional Information

Hi @Alan_Sun !!

Have you deployed using our helm charts?

I was not able to reproduce this on a single deployment in docker.

I will need to follow up on this to try replicating the same environment.

Can you see any outstanding logs?

Thanks!

Hi @DudaNogueira ,
Yes, I am using your official Helm chart, as follows:

|NAME               |NAMESPACE    |REVISION|UPDATED                             |STATUS  |CHART                    |APP VERSION|
|---|---|---|---|---|---|---|
|ssdl-weaviate      |ssdl-weaviate|34      |2024-06-03 14:25:44.220168 +0800 CST|deployed|weaviate-17.0.0          |1.25.0|

We created our collections and inserted data into them using the following Python code:

!pip install "weaviate-client==4.*"
!pip install -U weaviate-client

Initialize the client, then:

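import weaviate.classes.config as wvcc

# Create a collection with one numeric property and a replication factor of 3
client.collections.create(
    name="EmilyTest1",
    properties=[
        wvcc.Property(
            name="solution_number",
            data_type=wvcc.DataType.NUMBER,
        )
    ],
    # Configure is referenced through the wvcc alias so the snippet needs no extra import
    replication_config=wvcc.Configure.replication(
        factor=3
    ),
)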

Then batch import

import datetime
import pickle

counter = 0
interval = 1000  # progress print interval; value assumed, not given in the original post

start_time = datetime.datetime.now()
with client.batch.fixed_size(batch_size=200) as batch:
    # The pickle file maps solution numbers to their embedding vectors
    with open("embedding_3m.pkl", "rb") as f:
        loaded_data = pickle.load(f)
        # objects = ijson.items(f, "item")
        for obj_soln, obj_vector in loaded_data.items():
            properties = {
                "solution_number": obj_soln,
            }
            batch.add_object(
                collection="EmilyTest1",
                properties=properties,
                vector=obj_vector
            )

            # Calculate and display progress
            counter += 1
            if counter % interval == 0:
                print(f"Imported {counter} solutions...")

end_time = datetime.datetime.now()
delta_time = end_time - start_time
print("Time taken:", delta_time)
print(f"Finished importing {counter} solutions.")

Hi!

I believe this is only the case for the totals.

In my environment I get:

batch_durations_ms_count{class_name="Test_Batch",operation="object_storage",shard_name="2uApOMYRXmM7"} 247
....
batch_durations_ms_bucket{class_name="n/a",operation="total_persistence_level",shard_name="n/a",le="10"} 0
.....

So all entries that have class_name set to "n/a" refer to the overall totals.
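To check which operations are reported per collection and which only as totals, something like the sketch below can help. It is a rough illustration, not an official tool: the function name `operations_by_class` and the regular expressions are made up for this example, and it only handles sample lines shaped like the ones pasted in this thread.

```python
import re
from collections import defaultdict

# Matches one sample line: metric name, label block in braces, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Matches a single key="value" pair inside the label block.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def operations_by_class(metrics_text):
    """Map each class_name label value to the set of operation labels seen with it."""
    result = defaultdict(set)
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "class_name" in labels:
            result[labels["class_name"]].add(labels.get("operation", ""))
    return dict(result)


# Two lines copied from the output above.
sample = '''
batch_durations_ms_count{class_name="Test_Batch",operation="object_storage",shard_name="2uApOMYRXmM7"} 247
batch_durations_ms_bucket{class_name="n/a",operation="total_persistence_level",shard_name="n/a",le="10"} 0
'''
print(operations_by_class(sample))
```

If the only key in the result is "n/a", no per-collection series are being exposed at all.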

These were my two settings for the exposed metrics:

# Expose metrics on port 2112 for Prometheus to scrape

PROMETHEUS_MONITORING_ENABLED: true
PROMETHEUS_MONITORING_GROUP: false

Let me know if this helps.

Thanks!

Hi,

Yes, I have enabled Prometheus monitoring; that's why I am able to see the metrics on port 2112.
But I am still not seeing class_name, even for ms_count.
Are you also testing with batch upload using weaviate-client 4.*?

batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="+Inf"} 10509
batch_durations_ms_sum{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 47322.00007900016
batch_durations_ms_count{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 10509

Can you check your values.yaml for these variables:

PROMETHEUS_MONITORING_ENABLED: true
PROMETHEUS_MONITORING_GROUP: false

If PROMETHEUS_MONITORING_GROUP is set to true, Weaviate will not expose per-collection metrics.
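A quick way to tell from the outside whether grouping is in effect is to scrape the endpoint and look at the class_name labels. The sketch below is only a heuristic based on the behavior described above; the helper names `grouping_seems_enabled` and `fetch_metrics` are hypothetical, and the URL assumes you have port-forwarded 2112 to localhost and that the metrics are served under /metrics.

```python
import re
import urllib.request


def grouping_seems_enabled(metrics_text):
    """Return True when class_name labels exist and every one of them is "n/a".

    Heuristic only: it assumes grouped metrics report "n/a" for class_name,
    which is what this thread describes.
    """
    values = re.findall(r'class_name="([^"]*)"', metrics_text)
    return bool(values) and all(value == "n/a" for value in values)


def fetch_metrics(url="http://localhost:2112/metrics"):
    """Download the raw metrics text (assumes a local port-forward is active)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


# Usage, once the port-forward is running:
#   print(grouping_seems_enabled(fetch_metrics()))
```

A True result suggests checking the PROMETHEUS_MONITORING_GROUP value in values.yaml; a False result means at least one series carries a real collection name.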

Let me know if this helps.

Thanks!

Oh thanks for your tips. Looks good now.

Are you planning to change grouping so that it would still expose class data? It makes sense to group shards if multi-tenancy is enabled, but it would still be good to see per-class metrics.

hi @SStalciuss !! Welcome to our community :hugs:

What metrics are you looking for?

We recently had a PR that touches on this:

There are probably some more metrics that could be interesting to expose.

I suggest opening a new thread so we can discuss this further :slight_smile:

Thanks!