Prometheus metrics showing n/a for class name

Description

I have Weaviate running successfully and am already writing data to it. Now I am trying to use Prometheus for monitoring.
However, when I port-forward port 2112 directly, every metric that carries a class_name label shows it as "n/a".

for example:

batch_durations_ms_count{class_name="n/a",operation="total_persistence_level",shard_name="n/a"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="10"} 7971
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="50"} 8915
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="100"} 10079
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="500"} 10498
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="1000"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="5000"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="+Inf"} 10499
batch_durations_ms_sum{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 214940.27587399905
batch_durations_ms_count{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 10499
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="10"} 217
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="50"} 5912
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="100"} 6874
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="500"} 8507
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="1000"} 9151
batch_durations_ms_bucket{class_name="n/a",operation="total_uc_level",shard_name="n/a",le="5000"} 10496

Server Setup Information

  • Weaviate Server Version: 1.25.0
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: 3
  • Client Language and Version: Python weaviate-client==4.5.5
  • Multitenancy?: no

Any additional Information

Hi @Alan_Sun !!

Have you deployed using our helm charts?

I was not able to reproduce this on a single deployment in docker.

I will need to follow up on this to try replicating the same environment.

Can you see any outstanding logs?

Thanks!

Hi @DudaNogueira ,
Yes, I am using your official Helm chart, as follows:

|NAME               |NAMESPACE    |REVISION|UPDATED                             |STATUS  |CHART                    |APP VERSION|
|---|---|---|---|---|---|---|
|ssdl-weaviate      |ssdl-weaviate|34      |2024-06-03 14:25:44.220168 +0800 CST|deployed|weaviate-17.0.0          |1.25.0|

We created our collections and inserted data into them using the following Python code:

!pip install "weaviate-client==4.*"
!pip install -U weaviate-client

Initialize the client, then:

import weaviate.classes.config as wvcc

client.collections.create(
    name="EmilyTest1",
    properties=[
        wvcc.Property(
            name="solution_number",
            data_type=wvcc.DataType.NUMBER,
        )
    ],
    # Configure was not imported on its own, so reach it through the wvcc alias
    replication_config=wvcc.Configure.replication(
        factor=3
    ),
)

Then batch import

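import datetime
import pickle

counter = 0
interval = 10000  # progress-report interval; value assumed, not given in the original post

start_time = datetime.datetime.now()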
with client.batch.fixed_size(batch_size=200) as batch:
    with open("embedding_3m.pkl", "rb") as f:
        loaded_data = pickle.load(f)
        # objects = ijson.items(f, "item")
        for obj_soln, obj_vector in loaded_data.items():
            properties = {
                "solution_number": obj_soln,
            }
            batch.add_object(
                collection="EmilyTest1",
                properties=properties,
                vector=obj_vector
            )

            # Calculate and display progress
            counter += 1
            if counter % interval == 0:
                print(f"Imported {counter} solutions...")

end_time = datetime.datetime.now()
delta_time = end_time - start_time
print("Time taken:", delta_time)
print(f"Finished importing {counter} solutions.")

Hi!

I believe this is only the case for the totals.

In my environment I get:

batch_durations_ms_count{class_name="Test_Batch",operation="object_storage",shard_name="2uApOMYRXmM7"} 247
....
batch_durations_ms_bucket{class_name="n/a",operation="total_persistence_level",shard_name="n/a",le="10"} 0
.....

So all the entries that have class_name set to "n/a" refer to the overall totals.
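To make that split easier to see, here is a minimal Python sketch that groups the operations found in the port 2112 output by their class_name label. It relies on plain regular expressions rather than a Prometheus client library, the helper name `split_by_class` is hypothetical, and the sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Label extractors for a Prometheus text exposition line.
CLASS_LABEL = re.compile(r'class_name="([^"]*)"')
OPERATION_LABEL = re.compile(r'operation="([^"]*)"')


def split_by_class(metrics_text):
    """Map each class_name value to the set of operations reported under it."""
    operations = defaultdict(set)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        class_match = CLASS_LABEL.search(line)
        op_match = OPERATION_LABEL.search(line)
        if class_match and op_match:
            operations[class_match.group(1)].add(op_match.group(1))
    return dict(operations)


# Two lines taken from the metrics output quoted in this thread.
sample = '''
batch_durations_ms_count{class_name="Test_Batch",operation="object_storage",shard_name="2uApOMYRXmM7"} 247
batch_durations_ms_bucket{class_name="n/a",operation="total_persistence_level",shard_name="n/a",le="10"} 0
'''

for class_name, ops in split_by_class(sample).items():
    print(class_name, sorted(ops))
```

If the only key that comes back is "n/a", the instance is reporting aggregate series only.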

These were my two settings for the exposed metrics:

# Expose metrics on port 2112 for Prometheus to scrape

PROMETHEUS_MONITORING_ENABLED: true
PROMETHEUS_MONITORING_GROUP: false

Let me know if this helps.

Thanks!

Hi,

Yes, I have enabled Prometheus monitoring, which is why I am able to see the metrics on port 2112.
But I am still not seeing class_name, even for ms_count.
Are you also testing batch upload with weaviate-client 4.*?

batch_durations_ms_bucket{class_name="n/a",operation="total_preprocessing",shard_name="n/a",le="+Inf"} 10509
batch_durations_ms_sum{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 47322.00007900016
batch_durations_ms_count{class_name="n/a",operation="total_preprocessing",shard_name="n/a"} 10509

Can you check your values.yaml for these variables:

PROMETHEUS_MONITORING_ENABLED: true
PROMETHEUS_MONITORING_GROUP: false

If PROMETHEUS_MONITORING_GROUP is set to true, Weaviate will not expose per-collection metrics.
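A quick way to confirm which mode an instance is in is to check whether any scraped series carries a real class name. The following is a rough sketch rather than an official tool: it assumes the metrics are reachable at http://localhost:2112/metrics once the port-forward is active (the /metrics path and all helper names are assumptions), and it treats "n/a" as the placeholder value seen in the output earlier in this thread.

```python
import re
import urllib.request

CLASS_LABEL = re.compile(r'class_name="([^"]*)"')


def class_names(metrics_text):
    """Return distinct class_name label values, excluding the "n/a" placeholder."""
    names = set(CLASS_LABEL.findall(metrics_text))
    names.discard("n/a")
    return names


def fetch_metrics(url="http://localhost:2112/metrics"):
    """Fetch raw exposition text; the URL is an assumption for a local port-forward."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def report(url="http://localhost:2112/metrics"):
    """Print whether per-collection series are present at the given endpoint."""
    found = class_names(fetch_metrics(url))
    if found:
        print("Per-collection metrics present for:", sorted(found))
    else:
        print("Only n/a series found; check PROMETHEUS_MONITORING_GROUP")


# Offline demonstration on a line quoted in this thread (no network needed).
grouped_sample = (
    'batch_durations_ms_count{class_name="n/a",'
    'operation="total_preprocessing",shard_name="n/a"} 10509'
)
print(class_names(grouped_sample))
```

With the port-forward running, calling `report()` performs the same check against the live endpoint.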

Let me know if this helps.

Thanks!

Oh thanks for your tips. Looks good now.

Are you planning to change grouping so that it still exposes class data? It makes sense to group shards when multi-tenancy is enabled, but it would still be good to see per-class metrics.

hi @SStalciuss !! Welcome to our community :hugs:

What metrics are you looking for?

We recently had a PR that touches this:

There are probably some more metrics that could be interesting to expose.

I suggest opening a new thread so we can discuss this further :slight_smile:

Thanks!