Failure retrieving content from a backup-restored collection created on a different machine

Description

I am running the assignments from the DeepLearning.AI course "Building Multimodal Search and RAG" locally on my laptop.

I am facing an issue in assignment L5 of Building Multimodal Search and RAG - DeepLearning.AI.
In this assignment we restore the backup collection (containing images and videos) provided in the course:

client.backup.restore(
    backup_id="resources-img-and-vid",
    include_collections=collection_name,
    backend="filesystem"
)

I am able to get the count of images and videos.

The error occurs when retrieving similar content based on a text query. (This is a multimodal assignment in which we retrieve images and videos based on a text query.)

from weaviate.classes.query import Filter

resources = client.collections.get(collection_name)
response = resources.query.near_text(
    query=query,
    filters=Filter.by_property("mediaType").equal("image"),  # return only image objects
    return_properties=["path"],
    limit=1
)

Error stack:

python3.11/site-packages/weaviate/collections/grpc/query.py:618)
    raise WeaviateQueryError(e.details(), "GRPC search")
WeaviateQueryError: Query call with protocol GRPC search failed with message explorer: get class: vectorize params: vectorize params: vectorize params: vectorize keywords: remote client vectorize: connection to Google PaLM failed with status: 403 error: Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/semi-random-dev/locations/us-central1/publishers/google/models/multimodalembedding@001' (or it may not exist)..

My understanding of the reason for this failure:

The collection was likely created using the command below, as shown in the L2 assignment (Building Multimodal Search and RAG - DeepLearning.AI):

client.collections.create(
    name=collection_name,
    vectorizer_config=Configure.Vectorizer.multi2vec_palm(
        image_fields=["image"],
        video_fields=["video"],
        project_id="semi-random-dev",
        location="us-central1",
        model_id="multimodalembedding@001",
        dimensions=1408
    )
)

The project_id in that configuration is semi-random-dev, whereas the project_id of my own Google Cloud project is different.
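The mismatch can be confirmed directly from the error text. The following is a minimal sketch, assuming the error message has the shape shown in the stack trace; `parse_vertex_resource` and `my_project_id` are hypothetical names introduced here for illustration and are not part of the Weaviate client.

```python
import re
from typing import Optional

# Relevant part of the error text from the stack trace.
ERROR_MESSAGE = (
    "connection to Google PaLM failed with status: 403 error: "
    "Permission 'aiplatform.endpoints.predict' denied on resource "
    "'//aiplatform.googleapis.com/projects/semi-random-dev/locations/"
    "us-central1/publishers/google/models/multimodalembedding@001' "
    "(or it may not exist).."
)

# Matches the Google Cloud resource path embedded in the message.
RESOURCE_PATTERN = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/publishers/google/models/(?P<model>[^'/\s]+)"
)


def parse_vertex_resource(message: str) -> Optional[dict]:
    """Return the project, location and model named in the error, or None."""
    match = RESOURCE_PATTERN.search(message)
    return match.groupdict() if match else None


resource = parse_vertex_resource(ERROR_MESSAGE)
my_project_id = "my-gcp-project"  # hypothetical: replace with your own project ID
if resource and resource["project"] != my_project_id:
    print(f"Collection is bound to {resource['project']!r}, not {my_project_id!r}")
```

If the printed project differs from your own, the 403 is expected: the query-time vectorizer call goes to a project your credentials cannot access.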

Question: Is it possible to change the project_id in the vectorizer_config of the restored collection?

Server Setup Information

  • Weaviate Server Version: weaviate-client==4.5.4
  • Deployment Method: embedded
  • Multi Node? Number of Running Nodes:
  • Client Language and Version: Python 3.11
  • Multitenancy?:

Any additional Information

Hi @kaushik_acharya!

Welcome to our community :hugs:

It looks like the cause is indeed what you described.

You may need to create the collection with the project name you have access to.

Unfortunately, those settings are not mutable (see the documentation for the list of which collection settings are mutable).

To change immutable settings of a collection, you can reindex your data by following our migration guide.
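As a rough illustration of the reindexing idea, here is a minimal sketch that copies objects from the restored collection into a new one. It assumes both arguments are v4 Python client collection handles (for example from `client.collections.get(...)`), that the target collection was already created with your own `project_id`, and that the stored vectors live under the `"default"` name. `copy_objects` is a hypothetical helper, not a client function. Because the existing vectors are passed along, the objects should not need to be re-embedded during the copy.

```python
def copy_objects(source, target, vector_name="default"):
    """Copy every object from `source` to `target`, reusing stored vectors.

    `source` and `target` are assumed to be v4 collection handles.
    Returns the number of objects submitted to the batch.
    """
    copied = 0
    with target.batch.dynamic() as batch:
        # include_vector=True asks the iterator to return stored embeddings too.
        for obj in source.iterator(include_vector=True):
            batch.add_object(
                properties=obj.properties,
                vector=obj.vector[vector_name],  # reuse the existing embedding
                uuid=obj.uuid,                   # keep the same object ID
            )
            copied += 1
    return copied


# Hypothetical usage; the collection names are placeholders:
# source = client.collections.get("Resources")
# target = client.collections.get("ResourcesMyProject")
# print(copy_objects(source, target), "objects submitted")
```

After the copy it is worth comparing the object counts of both collections and inspecting `target.batch.failed_objects` for any objects that were rejected.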

Let me know if this helps!

Hi @DudaNogueira
First of all, thanks for the reply.

I am facing an issue similar to the one described in Embedded Weaviate Port 6060 - Support - Weaviate Community Forum.

As suggested in that thread, I added the environment variable
"GO_PROFILING_DISABLE": "true"
in weaviate.connect_to_embedded,
which gives the message
listen tcp :6060: bind: address already in use
and throws the error

WeaviateStartUpError: Embedded DB did not start listening on port 8090 within 30 seconds

I have passed a different port and grpc_port for the target Weaviate instance.

Hi! There is probably something else already running on port 6060, so embedded Weaviate was not able to start.

Also, try using a newer client version.

The command lsof -i:6060 shows:

COMMAND     PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
weaviate- 12670 kaushik    8u  IPv6 3175934      0t0  TCP *:6060 (LISTEN)
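The same check can be made from Python before starting embedded Weaviate. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper, and it only tests whether something accepts TCP connections on the IPv4 loopback address, so it is an approximation of what lsof reports.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# 6060 is the profiling port and 8090 the port from the startup error.
for port in (6060, 8090):
    state = "in use" if port_in_use(port) else "free"
    print(f"port {port}: {state}")
```

If 6060 is reported as in use by a leftover weaviate process, stopping that process before reconnecting is one thing to try.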

Yes, this is probably a conflict with embedded Weaviate running on port 6060 for profiling.

Can you try moving away from embedded? It is only meant to allow a quick run of Weaviate directly from the client.

You can copy your data from the embedded Weaviate data path to the path set as PERSISTENCE_DATA_PATH in a Docker instance, for example.

Also, have you tried running a newer client?

Thanks!

Using weaviate-client version 4.9.0 with embedded also throws the same error:

WeaviateStartUpError: Embedded DB did not start listening on port 8090 within 30 seconds

The only difference is that the newer client version does not show the message

listen tcp :6060: bind: address already in use

I haven't tried the Docker instance you suggested.