I’m experiencing slow upload rates when migrating a large number of objects (24,500) into a single class in my Weaviate Docker instance (version 1.23.7), with each object taking over 30 seconds. I suspect this is related to my class definition and indexing settings.
I upload in batches of 200 objects (which takes ~10 min), with async indexing enabled.
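For reference, a minimal sketch of the batching arithmetic and of timing each batch separately, which may help show whether the slowdown is constant or grows as the class fills up. At 200 objects per batch, 24,500 objects come to 123 batches, the last one holding 100 objects. The helper names and the `sendBatch` callback are hypothetical; `sendBatch` stands in for whatever client call actually submits a batch.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Upload batches one at a time and record how long each one takes.
// `sendBatch` is a hypothetical async callback wrapping the real client call.
async function uploadInBatches(objects, size, sendBatch) {
  const timings = [];
  for (const batch of chunk(objects, size)) {
    const start = Date.now();
    await sendBatch(batch);
    timings.push({ count: batch.length, ms: Date.now() - start });
  }
  return timings;
}
```

Comparing the `ms` values of early and late batches would show whether import time degrades as more objects are indexed.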
I perform two types of searches:
- Hybrid search to retrieve multiple fields
- Vector search to retrieve content and vector only
Given these searches, I need some guidance:
- Which fields should have indexFilterable and indexSearchable configured as true or false?
- For hybrid searches, is there a way to restrict the keyword search to only the “content” field?
- Does the withFields parameter in hybrid search influence keyword search?
- If indexing is disabled for certain fields, will hybrid search ignore those fields or perform slower searches without indices?
Here’s a sanitized version of the relevant code:
Search for multiple fields using a hybrid query:
const hybridResults = await client.graphql
  .get()
  .withClassName('DataCollection')
  .withFields('detail dataId metadata {properties}')
  .withWhere({
    path: ["dataId"],
    operator: "ContainsAny",
    valueTextArray: idArray,
  })
  .withHybrid({
    query: searchQuery,
    vector: searchVector,
    alpha: semanticWeight,
  })
  .withLimit(limitResults ?? defaultLimit)
  .do();
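On restricting the keyword part of the hybrid search: the hybrid arguments accept an optional `properties` list that limits which properties the BM25 half searches, whereas `withFields` only selects which fields are returned. A minimal sketch of building such an argument object follows; `buildHybridArgs` is a hypothetical helper, and `'detail'` is assumed to be the property the keyword search should target.

```javascript
// Build the argument object for `.withHybrid(...)`.
// `keywordProperties` limits the BM25 (keyword) part of the hybrid search.
function buildHybridArgs(query, vector, alpha, keywordProperties) {
  const args = { query, vector, alpha };
  if (keywordProperties && keywordProperties.length > 0) {
    args.properties = keywordProperties;
  }
  return args;
}

// Example values only; 'detail' is the text property from the class definition.
const hybridArgs = buildHybridArgs('example query', [0.1, 0.2, 0.3], 0.5, ['detail']);
// Intended use in the query above: .withHybrid(hybridArgs)
```

Leaving `properties` out keeps the default behaviour, where the keyword search runs over every searchable text property.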
Vector search to retrieve content and vector:
const vectorResults = await client.graphql
  .get()
  .withClassName('DataCollection')
  .withNearVector({ vector: searchVector })
  .withWhere({
    path: ["dataId"],
    operator: "ContainsAny",
    valueTextArray: idArray,
  })
  .withFields("detail _additional {vector}")
  .withLimit(chunkSize)
  .do();
Class definition:
const classDefinition = {
  class: 'DataCollection',
  properties: [
    { name: "dataId", dataType: ["text"] },
    { name: "detail", dataType: ["text"] },
    {
      name: "metadata",
      dataType: ["object"],
      nestedProperties: [
        { name: "pageNo", dataType: ["text"] },
        { name: "webAddress", dataType: ["text"] },
        { name: "heading", dataType: ["text"] },
        { name: "creator", dataType: ["text"] },
        { name: "pageNumber", dataType: ["int"] },
        { name: "fileExtension", dataType: ["text"] },
        { name: "originType", dataType: ["text"] },
      ],
    },
  ],
  vectorIndexConfig: { distance: "cosine" },
};
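To make the indexing question concrete, here is a minimal sketch of what explicit per-property index flags could look like. The true/false values are assumptions derived only from the two queries in this post (`dataId` appears in the `where` filter, `detail` is the keyword-search target), not a confirmed recommendation, and the `metadata` object is left out for brevity. `searchableNames` is a hypothetical helper for checking the result.

```javascript
// Property definitions with explicit inverted-index flags.
// The flag values are assumptions based on the queries shown in this post.
const indexedProperties = [
  // Used only in the `where` filter: filterable index, no keyword (BM25) index.
  { name: 'dataId', dataType: ['text'], indexFilterable: true, indexSearchable: false },
  // Target of the keyword half of hybrid search and never filtered on.
  { name: 'detail', dataType: ['text'], indexFilterable: false, indexSearchable: true },
];

// List the names of properties that keyword (BM25) search could use.
function searchableNames(properties) {
  return properties.filter((p) => p.indexSearchable).map((p) => p.name);
}
```

With these flags, `searchableNames(indexedProperties)` contains only `detail`, which matches the goal of keeping keyword search on a single field.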
I would greatly appreciate any insights on optimizing my Weaviate setup for better performance.