Failure in retrieving content from a backup restored collection created on a different machine

kaushik_acharya · October 5, 2024, 7:25am

Description

I am running the assignments from the DeepLearning.AI course Building Multimodal Search and RAG locally on my laptop.

I am facing an issue in assignment L5 (Building Multimodal Search and RAG - DeepLearning.AI).
In this assignment, we restore the backup collection (containing images and videos) provided in the course:

client.backup.restore(
    backup_id="resources-img-and-vid",
    include_collections=collection_name,
    backend="filesystem"
)

I am able to get the count of images and videos.

The error occurs when retrieving similar content based on a text query (this is a multimodal assignment in which we retrieve images and videos using a text query).

from weaviate.classes.query import Filter

resources = client.collections.get(collection_name)
response = resources.query.near_text(
    query=query,
    filters=Filter.by_property("mediaType").equal("image"),  # return only image objects
    return_properties=["path"],
    limit=1
)

Error stack:

python3.11/site-packages/weaviate/collections/grpc/query.py:618) raise WeaviateQueryError(e.details(), "GRPC search")
WeaviateQueryError: Query call with protocol GRPC search failed with message explorer: get class: vectorize params: vectorize params: vectorize params: vectorize keywords: remote client vectorize: connection to Google PaLM failed with status: 403 error: Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/semi-random-dev/locations/us-central1/publishers/google/models/multimodalembedding@001' (or it may not exist)..
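The log itself points at the cause: the failure happens at "vectorize keywords: remote client vectorize", i.e. when the query text is sent to Google to be embedded. That would also explain why counting objects works while near_text fails, since a count does not need the vectorizer. A small helper can pull the project ID out of such a message so it can be compared with your own. This is a hypothetical sketch: the function names and "my-own-project" are placeholders, not part of Weaviate's API.

```python
import re
from typing import Optional

# Matches the ".../projects/<id>/locations/..." part of a Vertex AI resource path.
_PROJECT_RE = re.compile(r"/projects/([^/]+)/locations/")


def project_id_from_error(message: str) -> Optional[str]:
    """Return the Google Cloud project ID named in a Vertex AI error message, if any."""
    match = _PROJECT_RE.search(message)
    return match.group(1) if match else None


def mentions_other_project(message: str, my_project_id: str) -> bool:
    """True if the error refers to a project other than `my_project_id`."""
    found = project_id_from_error(message)
    return found is not None and found != my_project_id
```

For example, passing str(e) from the WeaviateQueryError above together with your own project ID (placeholder "my-own-project") to mentions_other_project would return True, because the message names semi-random-dev.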

My understanding of the reason for this failure:

The collection was likely created with the following command, as shown in assignment L2 (Building Multimodal Search and RAG - DeepLearning.AI):

from weaviate.classes.config import Configure

client.collections.create(
    name=collection_name,
    vectorizer_config=Configure.Vectorizer.multi2vec_palm(
        image_fields=["image"],
        video_fields=["video"],
        project_id="semi-random-dev",
        location="us-central1",
        model_id="multimodalembedding@001",
        dimensions=1408
    )
)

The project_id used there is semi-random-dev, which matches the resource path in the error above (projects/semi-random-dev/...), whereas the project_id of my own Google Cloud project is different.
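To confirm which project the restored collection will call, its stored vectorizer settings can be read back. A minimal sketch, assuming weaviate-client v4, where collection.config.get() returns the collection configuration and vectorizer_config.model holds the module settings (for multi2vec-palm, under a "projectId" key). The helper name is hypothetical.

```python
from typing import Any, Optional


def configured_project_id(collection: Any) -> Optional[str]:
    """Return the Google Cloud project ID stored in a collection's vectorizer settings.

    Assumes a weaviate-client v4 collection handle whose config.get() returns an
    object with vectorizer_config.model, a dict of the module's settings.
    """
    config = collection.config.get()
    vectorizer = config.vectorizer_config
    if vectorizer is None:
        # No module vectorizer configured on this collection.
        return None
    # Assumption: multi2vec-palm stores its settings under camelCase keys.
    return vectorizer.model.get("projectId")
```

Called as configured_project_id(client.collections.get(collection_name)), it should return "semi-random-dev" if the collection was restored with the settings above.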

Question: Is it possible to change the project_id in the vectorizer_config of the restored collection?

Server Setup Information

Weaviate Server Version: weaviate-client==4.5.4
Deployment Method: embedded
Multi Node? Number of Running Nodes:
Client Language and Version: Python 3.11
Multitenancy?:

Any additional Information

DudaNogueira · October 6, 2024, 5:49pm

hi @kaushik_acharya !!

Welcome to our community

It looks like the cause is indeed what you described.

You may need to create the collection with a Google Cloud project ID that you have access to.

Unfortunately, those vectorizer settings are immutable (check here for the list of which collection settings can be changed).

To change immutable settings of a collection, you can reindex your data into a new collection by following our migration guide.
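One way to follow that advice, sketched under assumptions: first create a new collection with the same Configure.Vectorizer.multi2vec_palm(...) settings shown earlier in the thread but with your own project_id (defining "image" and "video" explicitly as DataType.BLOB properties), then copy the objects across together with their stored vectors. Both collections use multimodalembedding@001 with dimensions=1408, so reusing the stored vectors should avoid re-embedding every image and video; as far as we know, Weaviate does not call the vectorizer for objects imported with an explicit vector. The function name and property list are hypothetical, and the code assumes weaviate-client 4.5 or later, where obj.vector is a dict keyed by "default" and collections expose batch.dynamic().

```python
from typing import Any, Sequence


def copy_with_vectors(old: Any, new: Any, properties: Sequence[str]) -> int:
    """Copy every object, with its stored vector, from collection `old` into `new`.

    Assumes weaviate-client v4 collection handles: old.iterator(...) pages through
    all objects, and new.batch.dynamic() yields a batcher with add_object(...).
    Returns the number of objects queued for import.
    """
    copied = 0
    with new.batch.dynamic() as batch:
        # Blob properties (e.g. "image", "video") are, as far as we know, not
        # returned by default, so every property to carry over is listed explicitly.
        for obj in old.iterator(include_vector=True, return_properties=list(properties)):
            batch.add_object(
                properties=obj.properties,
                # Collections without named vectors expose theirs under "default".
                vector=obj.vector["default"],
                # Keeping the UUID preserves object identities in the new collection.
                uuid=obj.uuid,
            )
            copied += 1
    return copied
```

For the course data, the property list would presumably be ["image", "video", "mediaType", "path"] (adjust it to the actual schema), with old = client.collections.get(collection_name) and new being the collection you just created. Queries such as near_text on the new collection would then be vectorized through your own project.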

Let me know if this helps!

kaushik_acharya · October 13, 2024, 1:52pm

Hi @DudaNogueira,
First of all, thanks for the reply.

I am facing an issue similar to the one described in the thread Embedded Weaviate Port 6060 - Support - Weaviate Community Forum.

As suggested in that thread, I added the environment variable
"GO_PROFILING_DISABLE": "true"
to weaviate.connect_to_embedded. Startup then prints the message
listen tcp :6060: bind: address already in use
and throws the error

WeaviateStartUpError: Embedded DB did not start listening on port 8090 within 30 seconds

I passed a different port (8090) and grpc_port for the target Weaviate instance.

DudaNogueira · October 14, 2024, 11:10pm

Hi! Something else is probably also running on port 6060, so Embedded Weaviate was not able to start.

Also, try using a newer client version.
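Whether port 6060 (or the port and grpc_port passed to Embedded Weaviate) is already taken can be checked with the standard library before starting it. A minimal sketch with a hypothetical helper name: it only tells you that something accepts TCP connections on the port, not which process it is, and it checks IPv4 localhost by default.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, checking is_port_in_use(6060) and is_port_in_use(8090) before calling weaviate.connect_to_embedded would show whether a leftover process still holds either port.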

kaushik_acharya · October 15, 2024, 7:57pm

The command lsof -i:6060 shows

COMMAND     PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
weaviate- 12670 kaushik    8u  IPv6 3175934      0t0  TCP *:6060 (LISTEN)
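The lsof output shows that the listener on 6060 is itself a Weaviate process (weaviate-, PID 12670). That suggests, though the thread does not confirm it, an embedded instance left over from an earlier session. A hypothetical helper to parse that table, for example from subprocess.run(["lsof", "-i:6060"], capture_output=True, text=True).stdout:

```python
from typing import NamedTuple


class Listener(NamedTuple):
    command: str
    pid: int
    user: str


def parse_lsof(output: str) -> list[Listener]:
    """Parse the default table printed by `lsof -i:<port>` into (command, pid, user) rows."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the COMMAND/PID/USER header
        fields = line.split()
        if len(fields) >= 3:
            rows.append(Listener(command=fields[0], pid=int(fields[1]), user=fields[2]))
    return rows
```

If the listener turns out to be a stale Weaviate process, stopping it (for example with kill on the PID shown) should free the port; treat this as an assumption to verify rather than a confirmed fix.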

DudaNogueira · October 16, 2024, 2:30am

Yes, this is probably a conflict with Embedded Weaviate, which runs its profiler on port 6060.

Can you try moving away from Embedded Weaviate? It is only meant for quickly running Weaviate directly from the client.

For example, you can copy your data from the Embedded Weaviate data path into the directory used as PERSISTENCE_DATA_PATH by a Docker instance.
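Copying the data can be done with the standard library. A minimal sketch, assuming the embedded instance has been stopped first and that its data lives in the client's default embedded path (~/.local/share/weaviate, unless persistence_data_path was passed or XDG_DATA_HOME is set). The helper name and the target directory are hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: default data directory of Weaviate Embedded in weaviate-client v4.
DEFAULT_EMBEDDED_DATA = Path.home() / ".local" / "share" / "weaviate"


def copy_embedded_data(src: Path, dst: Path) -> list[str]:
    """Copy an Embedded Weaviate data directory to a host directory for Docker.

    Stop the embedded instance first so no files change during the copy.
    Returns the sorted top-level entry names now present in `dst`.
    """
    if not src.is_dir():
        raise FileNotFoundError(f"no Weaviate data directory at {src}")
    # dirs_exist_ok lets the copy merge into an existing target directory.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.iterdir())
```

After copy_embedded_data(DEFAULT_EMBEDDED_DATA, Path("./weaviate_data")), the target directory would be mounted at the container's PERSISTENCE_DATA_PATH (/var/lib/weaviate in Weaviate's Docker Compose examples) before starting the container, and the data reached with weaviate.connect_to_local().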

Also, have you tried running on a newer client?

Thanks!

kaushik_acharya · October 26, 2024, 10:20am

Using weaviate-client 4.9.0 with Embedded Weaviate also throws the same error:

WeaviateStartUpError: Embedded DB did not start listening on port 8090 within 30 seconds

The only difference is that the newer client version does not show the message

listen tcp :6060: bind: address already in use

I have not tried the Docker instance you suggested.

Related topics

Topic | Category | Replies | Views | Activity
Error in adding a video object to collection | Support | 1 | 42 | July 25, 2024
Issue with collections when using Google's text2vec-palm | Support, bug | 2 | 213 | June 24, 2024
GRPC Query failed AioRpcError of RPC terminated status UNAVAILABLE | Support, python | 5 | 1069 | December 20, 2024
Text and multimodal embedding configure | Support | 1 | 191 | June 10, 2024
GRPC Resource Exhausted Error | Support | 2 | 647 | January 20, 2025
