Hi all, we are working on updating our language clients – starting with the Python Client – to make it easier to:
- create and manage collections,
- configure what vectors are used,
- perform CRUD data operations,
- search.
We also want to enable modern IDEs to help you build with Weaviate. For example, wouldn't you love to get IntelliSense support and suggestions on what params are available?
We would love to hear your feedback about what we are about to propose below.
Note: the examples below are a subset of the use cases we are looking at, but you should get the general gist and the direction we are heading in.
Collections
A quick note, but not the main topic for this post:
There is a little confusion around Schemas and Classes in Weaviate, so we thought that we could make it easier for everyone to understand what is what by introducing the concept of Collections.
A collection (currently called a class) is where you store your data with vector embeddings.
Create a collection
Creating a new collection should be as easy as:
client.collection.create(name="Articles")
Create a collection - select vectorizer
You could also create a new collection with a vectorizer module:
client.collection.create(
    name="Articles",
    vectorizerConfig=VectorizerConfig(vectorizer="text2vec-openai")
)
Note: VectorizerConfig would be defined as a class with a named set of parameters, so your IDE could help you pick the parameters you need to pass. Like this:
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorizerConfig:
    alias: str
    vectorizer: str
    model: str
    vectorProperties: list[str]
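One detail worth noting: as written above, every field is required, so the earlier `VectorizerConfig(vectorizer="text2vec-openai")` call would raise a `TypeError`. Here is a minimal sketch of how it could work if the other fields were optional. The default values are assumptions for illustration, not part of the proposal.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class VectorizerConfig:
    # Only the vectorizer name is required in this sketch.
    vectorizer: str
    # Assumed defaults; the proposal does not specify them.
    alias: str | None = None
    model: str | None = None
    vectorProperties: list[str] = field(default_factory=list)


# Works with a single argument because the other fields have defaults.
vc = VectorizerConfig(vectorizer="text2vec-openai")

try:
    vc.model = "other-model"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the class is frozen, a config cannot be changed by accident after it has been passed to `create`; a changed config would be a new object.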
Create a collection - select model and properties to vectorize
vc = VectorizerConfig(
    vectorizer="text2vec-cohere",
    model="multilingual-22-12",
    vectorProperties=["title", "description"]
)
client.collection.create(
    name="Articles",
    vectorizerConfig=vc
)
Note: vectorProperties indicates which properties should be used for vectorization. Currently, this is done as part of the schema property definition, where we exclude properties from vectorization. That is a bit of a problem when your data objects are made of several properties, but you only want to vectorize one or two of them.
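To illustrate the include-list idea, here is a small sketch of how only the listed properties might feed into the text that gets vectorized. The helper `text_to_vectorize` is hypothetical, and how Weaviate actually assembles the vectorizer input is not covered in this post.

```python
from __future__ import annotations


def text_to_vectorize(obj: dict, vector_properties: list[str]) -> str:
    """Join only the listed properties, in the order given, skipping missing ones."""
    parts = [str(obj[name]) for name in vector_properties if name in obj]
    return " ".join(parts)


article = {
    "title": "Vector search",
    "description": "An introduction",
    "url": "https://example.com/a",
    "img": "<base64>",
}

# Only title and description contribute; url and img are ignored.
text = text_to_vectorize(article, ["title", "description"])
```

With an include list, adding a new property to the collection does not change what gets vectorized unless you opt in.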
Create a collection – with property definition
p = [
    Property(name="title", description="The title of the article", dataType="string"),
    Property(name="content", dataType="string"),
    Property(name="url", dataType="string"),
    Property(name="img", dataType="blob")
]
client.collection.create(
    name="Articles",
    vectorizerConfig=vc,
    properties=p
)
The property definition is what most databases out there consider a data schema. This is where we can define properties (and their types) for our data collections. For example, Articles are made of title, content, url, etc.
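As a rough sketch of what a typed `Property` could enable, the snippet below defines it as a dataclass and checks an object for fields that the collection does not declare. The field layout mirrors the calls above; the optional `description` and the `unknown_fields` helper are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    dataType: str
    # Assumed optional, since some of the examples above omit it.
    description: str | None = None


def unknown_fields(obj: dict, properties: list[Property]) -> list[str]:
    """Return the keys of obj that are not declared as properties, sorted."""
    known = {p.name for p in properties}
    return sorted(key for key in obj if key not in known)


props = [
    Property(name="title", dataType="string", description="The title of the article"),
    Property(name="url", dataType="string"),
]

# "author" is not declared, so it is reported.
extra = unknown_fields({"title": "a", "author": "b"}, props)
```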
Get collection configuration
Getting a collection configuration should be as simple as calling getConfiguration:
configuration = client.collection.getConfiguration(name="Articles")
print(configuration.vectorizerConfig)
print(configuration.properties)
Alternatively, we could use the configuration namespace like this:
configuration = client.collection.configuration.get(name="Articles")
What do you think about these two options?
Update collection configuration
Updating a collection configuration should be done with a call to updateConfiguration:
# define properties
p = {...}
# define new vector configuration
v = VectorizerConfig(...)
client.collection.updateConfiguration(
    name="Articles",
    vectorizerConfig=v,
    properties=p
)
Alternatively, it could be done with configuration.update:
client.collection.configuration.update(...)
Delete Collection
To delete a collection, you can call:
client.collection.delete(name="Articles")
Data Operations
Following the concept of collections, we propose to introduce collection.data, which can be used for data operations and search.
Data Insert
For example, to insert a new object, we can first get a data object for the Articles collection. Then we can use that data object (called "data" here) to insert a new object, like this:
data = client.collection.data("Articles")
data.insert({"name": "foo", "description": "bar"})
Insert multiple objects
data.insert([
    {"name": "foo", "description": "bar"},
    {"name": "ping", "description": "pong"},
    {"name": "cat", "description": "kitten"},
    {"name": "dog", "description": "puppy"}
])
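Since `insert` is shown accepting either one object or a list, here is a minimal in-memory sketch of how a single method could handle both. `InMemoryData` is a stand-in, not the proposed client, and returning the generated IDs is an assumption.

```python
from __future__ import annotations

import uuid as uuid_lib
from typing import Union


class InMemoryData:
    """Hypothetical stand-in for the proposed data object."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}

    def insert(self, objects: Union[dict, list[dict]]) -> list[str]:
        # Normalize a single object to a one-element batch.
        batch = [objects] if isinstance(objects, dict) else list(objects)
        ids = []
        for obj in batch:
            new_id = str(uuid_lib.uuid4())
            self.objects[new_id] = dict(obj)  # store a copy
            ids.append(new_id)
        return ids


data = InMemoryData()
ids_one = data.insert({"name": "foo", "description": "bar"})
ids_many = data.insert([
    {"name": "ping", "description": "pong"},
    {"name": "cat", "description": "kitten"},
])
```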
Data Get
To get a number of objects, you could call get:
items = data.get(limit=5)
print(items)
Get with a filter
items = data.get(
    where=Filter(
        property="price",
        operator=Operator.GreaterThan,
        value=100
    ),
    limit=5
)
Note the use of the enum for the operator: Operator.GreaterThan. This will help you see what filter operators are available and get code predictions in your IDE.
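For a feel of how an enum-based filter could be typed, here is a self-contained sketch that evaluates a `Filter` against plain dictionaries. Only `Operator.GreaterThan` appears in this post; the other enum members and the `matches` method are assumptions for illustration.

```python
import operator as op
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Operator(Enum):
    Equal = "Equal"  # assumed member
    GreaterThan = "GreaterThan"
    LessThan = "LessThan"  # assumed member


# Map each enum member to the comparison it stands for.
_COMPARE = {
    Operator.Equal: op.eq,
    Operator.GreaterThan: op.gt,
    Operator.LessThan: op.lt,
}


@dataclass(frozen=True)
class Filter:
    property: str
    operator: Operator
    value: Any

    def matches(self, obj: dict) -> bool:
        # Objects without the property never match.
        if self.property not in obj:
            return False
        return _COMPARE[self.operator](obj[self.property], self.value)


items = [{"price": 50}, {"price": 150}, {"name": "no price"}]
expensive = Filter(property="price", operator=Operator.GreaterThan, value=100)
hits = [item for item in items if expensive.matches(item)]
```

A misspelled operator such as `Operator.GreaterThen` fails immediately with an `AttributeError`, whereas a misspelled string would only surface when the query runs.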
Loop through data in a collection
data = client.collection.data("Articles")
for item in data.iterate(20):
    print(item)
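One way `iterate` could work under the hood is to fetch objects page by page and yield them one at a time. The sketch below assumes the argument is a page size, which this post does not state, and `fetch_page` is a hypothetical stand-in for a request to the server.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iterate(fetch_page: Callable[[int, int], list[dict]], batch_size: int) -> Iterator[dict]:
    """Yield objects one by one, fetching them in pages of batch_size."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:  # an empty page means there is nothing left
            return
        yield from page
        offset += len(page)


# In-memory stand-in for the collection, recording each page request.
store = [{"n": i} for i in range(45)]
calls = []


def fetch_page(offset: int, limit: int) -> list[dict]:
    calls.append(offset)
    return store[offset:offset + limit]


seen = [item["n"] for item in iterate(fetch_page, 20)]
```

The caller only ever sees a flat stream of objects, so the page size can be tuned without touching the loop body.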
Update
To update an object by ID, you could call:
article = {"name": "foo", "description": "bar"}
data.update_by_id(uuid="1234-1234-1234", object=article)
Delete
To delete objects, we can call data.delete().
Delete by ID
To delete an object by ID, you could call:
data.delete(uuid="1234-1234-1234")
Delete where
To delete based on a where filter:
data.delete(
    where=Filter(
        property="price",
        operator=Operator.GreaterThan,
        value=100
    )
)
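Because `delete` takes either a `uuid` or a `where` filter, the method needs a rule for what happens when both or neither are passed. Here is a minimal in-memory sketch that requires exactly one. `InMemoryData` is a stand-in, the plain callable takes the place of a `Filter`, and returning the number of deleted objects is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional


class InMemoryData:
    """Hypothetical stand-in for the proposed data object."""

    def __init__(self, objects: dict[str, dict]) -> None:
        self.objects = dict(objects)

    def delete(
        self,
        uuid: Optional[str] = None,
        where: Optional[Callable[[dict], bool]] = None,
    ) -> int:
        # Exactly one of the two selectors must be given.
        if (uuid is None) == (where is None):
            raise ValueError("pass exactly one of uuid or where")
        if uuid is not None:
            return 1 if self.objects.pop(uuid, None) is not None else 0
        # Collect the matching keys first, then delete them.
        doomed = [key for key, obj in self.objects.items() if where(obj)]
        for key in doomed:
            del self.objects[key]
        return len(doomed)


data = InMemoryData({"a": {"price": 50}, "b": {"price": 150}, "c": {"price": 200}})
removed_by_id = data.delete(uuid="a")
removed_by_filter = data.delete(where=lambda obj: obj["price"] > 100)
```

Rejecting a bare `data.delete()` guards against wiping a whole collection by accident.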
Search
Here are a couple of examples of what the new syntax for search might look like:
new – nearText
result = data.textSearch(
    concept="marvel avengers",
    properties=["title", "description"],
    limit=10
)
new - nearImage
result = data.imageSearch(
    base64=img,
    properties=["title", "description", "url"],
    where=Filter(property="price", operator=Operator.GreaterThan, value=100),
    distance=123,
    limit=10
)
We will share the examples for search in a separate thread, as that is a whole different discussion.