Python `v4` client - feedback megathread!

jphwang · January 19, 2024, 1:11am

Hey @Alan_S - would you mind posting your code snippet for the collection creation here, or even on a separate thread?

Broadly speaking, the syntax should look something like this:

import os
import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

client.collections.create(
    name="MyCollectionName",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_azure_openai(
        resource_name="<YOUR-RESOURCE-NAME>",
        deployment_id="<YOUR-MODEL-NAME>",
        base_url="<YOUR_BASE_URL>"
    ),
    # Other parameters
)

asido · January 25, 2024, 4:08pm

Hi,

Since updating to 4.4b7 I’m seeing a lot of these warnings and exceptions particularly when batch uploading:

Exception in callback PollerCompletionQueue._handle_events(<_WindowsSele...e debug=False>)()
handle: <Handle PollerCompletionQueue._handle_events(<_WindowsSele...e debug=False>)()>
Traceback (most recent call last):
  File "<redacted env path>\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src\python\grpcio\grpc\_cython\_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [WinError 10035] A non-blocking socket operation could not be completed immediately

<redacted env path>\Lib\threading.py:986: ResourceWarning: unclosed <socket.socket fd=4548, family=23, type=1, proto=0, laddr=('::1', 53391, 0, 0), raddr=('::1', 8080, 0, 0)>
  del self._target, self._args, self._kwargs
ResourceWarning: Enable tracemalloc to get the object allocation traceback

<redacted env path>\Lib\site-packages\h11\_headers.py:201: ResourceWarning: unclosed <socket.socket fd=4380, family=23, type=1, proto=0, laddr=('::1', 53368, 0, 0), raddr=('::1', 8080, 0, 0)>
  new_headers.append((raw_name, name, value))
ResourceWarning: Enable tracemalloc to get the object allocation traceback

I saw that since 4.4b7 you have to explicitly close the client connection which I now do, but these exceptions still remain. Not sure if there’s something I’m missing?

DudaNogueira · January 26, 2024, 12:58am

Hi @asido !

Do you see any notable logs on the Weaviate server when you get one of those?

Also, is there any code that can reproduce this? Does it also happen in 4.4b8?

Dirk · January 26, 2024, 4:11am

Hi, this should be fixed in b9. Please let us know if not!

asido · January 26, 2024, 12:43pm

4.4b9 seems to have fixed it, thanks for the quick response!

oss · January 26, 2024, 7:25pm

Hey there,

I am facing configuration issues with the collection generative azure_openai module. I do have my deployment id, resource_name and base url and I can use it to connect using the openai package AzureOpenAI so I know it works.
I am starting to believe the base_url parameter is not being used when I use wvc.Configure.Generative.azure_openai in the collection configuration.

I can see the documentation mentions base_url as an optional parameter but when I try to run the code with a fake resource_name for testing reasons I am seeing that it is not using my base_url but only the resource name to create the connection url.

When using the following configuration:

generative_config=wvc.config.Configure.Generative.azure_openai(
        base_url=azure_base_url,
        resource_name = "fake-resource-name",
        deployment_id="gpt-4-32k",
        temperature=0.1,
        top_p=0.5),

I get the following error:

Query call with protocol GRPC search failed with message send POST request:
 Post "https://fake-resource-name.openai.azure.com/openai/deployments/gpt-4-32k/chat/completions?api-version=2023-03-15-preview":
 dial tcp: lookup fake-resource-name.openai.azure.com on <some_ip>:53: no such host.

As you can see, it is not sending the POST request to the base_url; it uses the resource_name to build the URL instead. I am on weaviate:1.23.6.

EDIT: I upgraded to the latest version. Experiencing the same behavior.

I’d appreciate any help.

Thank you.

Alan_S · January 26, 2024, 8:12pm

Hey, thank you for the response. We were able to resolve the issue.

oss · January 26, 2024, 9:16pm

Hi again, after some further inspection I looked into the generative-openai module code and I can see that you guys are not checking for the baseUrl in the buildUrlFn before all else.

(Screenshot from modules/generative-openai/clients/openai.go)

As you can see in the code, because you require the user to set both resourceName and deploymentID in the collection’s generative-openai configuration, and you check first whether those variables are non-empty, you never reach the last line that uses the baseURL. Ideally, the baseUrl check should be in the if-statement and take priority over the resourceName, because Azure needs the path openai/deployments/<deploymentID>/chat/completions and not /v1/chat/completions, which is used for OpenAI’s API.
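To make the described priority problem concrete, here is a minimal Python sketch of the two behaviours. It is not the module's actual Go code: the function names `build_url_current` and `build_url_proposed` are hypothetical, the API version string is copied from the error message earlier in this thread, and the fallback host `https://api.openai.com` is an assumption for the non-Azure case.

```python
# Hypothetical illustration of the URL-building logic discussed above.
# This is NOT the Weaviate module code, only a sketch of the described behaviour.

API_VERSION = "2023-03-15-preview"  # taken from the error message in this thread
DEFAULT_OPENAI_HOST = "https://api.openai.com"  # assumed default for plain OpenAI


def build_url_current(base_url, resource_name, deployment_id):
    """Behaviour as described: Azure settings are checked first, so base_url is ignored."""
    if resource_name and deployment_id:
        return (
            f"https://{resource_name}.openai.azure.com/openai/deployments/"
            f"{deployment_id}/chat/completions?api-version={API_VERSION}"
        )
    host = (base_url or DEFAULT_OPENAI_HOST).rstrip("/")
    return f"{host}/v1/chat/completions"


def build_url_proposed(base_url, resource_name, deployment_id):
    """Suggested behaviour: base_url, when set, replaces the host but keeps the Azure path."""
    if resource_name and deployment_id:
        host = (base_url or f"https://{resource_name}.openai.azure.com").rstrip("/")
        return (
            f"{host}/openai/deployments/{deployment_id}"
            f"/chat/completions?api-version={API_VERSION}"
        )
    host = (base_url or DEFAULT_OPENAI_HOST).rstrip("/")
    return f"{host}/v1/chat/completions"


if __name__ == "__main__":
    args = ("https://my-proxy.example.com", "fake-resource-name", "gpt-4-32k")
    print(build_url_current(*args))   # host is built from resource_name
    print(build_url_proposed(*args))  # host is taken from base_url
```

With the same three arguments, the first function reproduces the URL from the error message, while the second sends the request to the configured base URL and keeps the Azure deployment path.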

@antas-marcin I saw you last made a change to that file so maybe you would be able to take a look at this and see if I am correct.

Dirk · January 27, 2024, 5:29am

Could you open an issue here: Issues · weaviate/weaviate · GitHub, and then tag Marcin in there? I’m not sure how often he looks at the forum.

oss · January 27, 2024, 5:34pm

Okay, thank you for the response Dirk, I will submit the issue there.

shadowlin · January 28, 2024, 2:55am

A strange bug (or my mistake) in 4.4b8 with Weaviate 1.23.3:

uuid = "b4c4b93e-10ef-5098-b525-cd2bd29f1870"
print("-------------")
result1 = client.collections.get("FileData").query.fetch_objects(
    filters=Filter.by_id().equal(uuid)
)
print("fetch_objects:", result1)

result2 = client.collections.get("FileData").query.fetch_object_by_id(
    uuid=uuid
)
print("fetch_object_by_id:", result2)

I think the two approaches should return the same result, but it turns out they do not.

The result is:

fetch_objects: QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('b4c4b93e-10ef-5098-b525-cd2bd29f1870'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'hash': 'file_hash_0', 'status': 0}, references=None, vector=None, collection='FileData')])

fetch_object_by_id: None

I don’t know why fetch_object_by_id can’t get the object.

EDIT:
After some digging, I found that under the hood fetch_object_by_id also uses filters=Filter.by_id().equal(uuid) to filter the object; the only difference is that it adds a limit=1 parameter. It turns out that limit=1 somehow prevents it from getting the object.
I tried

result1 = client.collections.get("FileData").query.fetch_objects(
    filters=Filter.by_id().equal(uuid),
    limit=1
)

and this also can’t get the result.

jphwang · February 1, 2024, 4:48pm

Hi @shadowlin - sorry for the late reply.

Are you still seeing this issue (or similar) with the latest version of the client and the server?

For me, this example works fine: Read objects | Weaviate - Vector Database. If you are still having issues, please let us know.

rjalex · February 13, 2024, 5:23pm

Little benign rant
I fell in love with Weaviate and V3. In just a few days I was obtaining very interesting results.
With V4 I need to rewrite more or less everything, and I get stuck again and again.
I find the navigation of classes in the IDE very confusing, the import system too.
Well going to give it a pause and maybe tomorrow the sun will shine brighter.
Take care guys.

jphwang · February 14, 2024, 10:29am

Hi @rjalex - sorry to hear that.

It is a major rewrite, so a learning curve and difficulty is to be expected for sure.

We have an updated guide here: Python | Weaviate - Vector Database, and a migration guide here: v3 to v4 migration guide | Weaviate - Vector Database.

And the documentation includes examples, in sections like these: Manage data | Weaviate - Vector Database and these: Search | Weaviate - Vector Database.

We’ve also got these videos for moving to the v4 client for v3 users.

I hope they help, and if you have any specific questions please don’t hesitate to post them here or on a new thread!

rjalex · February 14, 2024, 1:10pm

No pain, no gain. I’m sure I’ll eventually pick up the passion.
Maybe my use case is a tad strange: because of privacy issues we cannot use any external vectorization service, so we need to compute embeddings on our own for both the objects and the queries.
It would be so nice to have a good example of the best practices/patterns for such a situation all in one place, in one notebook or video: a few properties you want to vectorize, a few you will only use for keyword search (e.g. category names) and some others for boolean queries (e.g. items from 2023 only), all in a V4 fashion, from the declaration of the collection/class to the queries.
Thanks a lot for your excellent work.

jphwang · February 14, 2024, 2:55pm

Thanks @rjalex - Hey that’s really interesting.

Would you be open to sharing your workflow? I’m really curious to learn how you’re using Weaviate in your (custom vectorizer) scenario.

If it’s possible for you to share some of that code, I would also be happy to convert some of it to the v4 syntax for you, to the best of my ability.

(You can feel free to leave out the actual vectors and data as needed - I can substitute those of course.)

(Feel free to DM me here)

rjalex · February 16, 2024, 7:52pm

Sorry for the late reply. Been traveling. Will DM you, thanks.

Alan_S · May 20, 2024, 8:00pm

Hi Weaviate community. Does anyone know of a way to calculate the cost of generating embeddings while inserting? I haven’t seen anything online for this functionality. My team was thinking of calculating tokens before inserting, getting the token count with the tiktoken Python package, but if there is a better way it would be great to hear about it. Thanks in advance for any suggestions.

jphwang · May 21, 2024, 8:03am

Hi Alan, unfortunately it’s not possible to do this, at least with Weaviate directly.

You could do it by calling the tokenizer separately, e.g. from the OpenAI library.

You may be able to estimate it from the input length and the typical number of tokens per word, as well as the embedding pricing. But precise calculations are quite difficult, as they rely on calling the exact tokenizer, which varies for each model.

But if someone else has a solution I’d love to hear it!
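The estimate described above (word count times a tokens-per-word ratio, times the embedding price) could be sketched roughly as follows. Everything here is an assumption for illustration: the helper names, the default ratio of 1.3 tokens per word, and the price passed in are placeholders, not values from Weaviate or any provider, and the result is only an approximation.

```python
from typing import Callable, Iterable, Tuple


def approx_token_count(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough estimate: whitespace-separated words times an ASSUMED ratio."""
    return round(len(text.split()) * tokens_per_word)


def estimate_embedding_cost(
    texts: Iterable[str],
    price_per_1k_tokens: float,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> Tuple[int, float]:
    """Return (total_tokens, estimated_cost) for the texts to be vectorized.

    price_per_1k_tokens is a placeholder; look up the real price for your model.
    count_tokens can be swapped for an exact tokenizer when one is available.
    """
    total_tokens = sum(count_tokens(t) for t in texts)
    return total_tokens, total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    docs = ["hello world", "a b c d"]
    tokens, cost = estimate_embedding_cost(docs, price_per_1k_tokens=0.5)
    print(tokens, cost)
```

For a more precise count, the `count_tokens` argument can be replaced with a function based on the model's own tokenizer, for example one that returns `len(enc.encode(text))` for an encoding obtained from tiktoken, run on the same text that Weaviate will send to the vectorizer.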

zhou_yangbo · May 22, 2024, 6:21am

But I still get AttributeError: 'Client' object has no attribute 'collections'. Any idea?
