ACORN feedback mega thread

jphwang · October 29, 2024, 10:08am

Weaviate 1.27 introduced the new filtering strategy based on the ACORN paper.

According to our internal tests, the ACORN algorithm generally improves filtered vector search performance, with the most significant improvements in negatively correlated filtered searches.
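To illustrate why negatively correlated filters are hard, here is a toy, hypothetical sketch of the two-hop expansion idea described in the ACORN paper. It is not Weaviate's implementation. When the objects that match a filter are poorly connected near the query, a graph search that only steps through matching nodes can get stuck. Skipping past a non-matching neighbor to its matching neighbors keeps the search moving. The graph, vectors and function name below are made up for illustration.

```python
import heapq
import math


def filtered_search(graph, vectors, entry, query, allowed, k, two_hop=True, ef=10):
    """Toy best-first search over an adjacency-list graph that only
    scores and traverses nodes in `allowed` (the filter).

    With two_hop=True, a neighbor that fails the filter is not scored,
    but its own neighbors that pass the filter are considered instead.
    """
    visited = {entry}
    d0 = math.dist(vectors[entry], query)
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results = [(-d0, entry)] if entry in allowed else []  # max-heap of best matches
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # no remaining candidate can improve the result set
        expanded = []
        for nb in graph[node]:
            if nb in allowed:
                expanded.append(nb)
            elif two_hop:
                # Hop over the filtered-out neighbor to its matching neighbors
                expanded.extend(n2 for n2 in graph[nb] if n2 in allowed)
        for nb in expanded:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst match
    # Sort matches by ascending distance and keep the top k node ids
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)][:k]


# Path graph 0-1-2-3-4 where only the even nodes pass the filter
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
vectors = {i: [float(i)] for i in range(5)}
allowed = {0, 2, 4}
print(filtered_search(graph, vectors, 0, [4.0], allowed, k=1, two_hop=False))  # [0]
print(filtered_search(graph, vectors, 0, [4.0], allowed, k=1, two_hop=True))   # [4]
```

Without the two-hop step the search never leaves the entry point, because its only neighbor fails the filter; with it, the search reaches node 4, the match closest to the query.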

We are excited for you to try it out! If you use the Python client, you can activate it like so:

import weaviate
from weaviate.classes.config import Configure, Property, DataType, VectorDistances, VectorFilterStrategy

client = weaviate.connect_to_local()  # adjust the connection to your deployment

client.collections.create(
    "Article",
    # Additional configuration not shown
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.bq(),
        ef_construction=300,
        distance_metric=VectorDistances.COSINE,
        filter_strategy=VectorFilterStrategy.ACORN,  # (Available from Weaviate v1.27.0)
    ),
)

client.close()

Read more in our 1.27 release blog post (Blog | Weaviate), which will be published later today.

Try it out and let us know what you think!

SomebodySysop · November 21, 2024, 10:43am

So, I just read through this documentation: Filtering | Weaviate

I’m trying to figure out how to use ACORN filtering. Is it an embedding option or a query option? I don’t use Python or Java, so how can I use it with curl?

DudaNogueira · November 21, 2024, 12:29pm

hi @SomebodySysop !!

This is a collection-level configuration.

You can create a collection with this configuration, or you can change it at any time, as it is a mutable configuration.
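If you create the collection through the REST API instead, the same setting goes into the vectorIndexConfig of the class definition sent to POST /v1/schema. Below is a minimal sketch using only the Python standard library that builds a definition roughly mirroring the Python client example above (HNSW, BQ, efConstruction 300, cosine). It assumes a Weaviate v1.27+ instance at http://localhost:8080. The function names are hypothetical, and properties and vectorizer settings are left out.

```python
import json
import urllib.request


def build_class_definition(class_name, filter_strategy="acorn"):
    """Build a REST class definition with the filter strategy set
    at creation time (properties and vectorizer omitted)."""
    return {
        "class": class_name,
        "vectorIndexType": "hnsw",
        "vectorIndexConfig": {
            "bq": {"enabled": True},  # binary quantization, as in the Python example
            "efConstruction": 300,
            "distance": "cosine",
            "filterStrategy": filter_strategy,  # "acorn" or "sweeping"
        },
    }


def create_collection(base_url, definition):
    """POST the definition to /v1/schema (needs a running Weaviate instance)."""
    req = urllib.request.Request(
        f"{base_url}/v1/schema",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None
```

For example, create_collection("http://localhost:8080", build_class_definition("Article")) would create the collection. Printing json.dumps(build_class_definition("Article")) gives a body you could send with curl --request POST instead.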

This is how you would change it using curl:

First, let’s get the collection definition:

curl --request GET \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test

In my case, I got this:

{"class":"Test","invertedIndexConfig":{"bm25":{"b":0.75,"k1":1.2},"cleanupIntervalSeconds":60,"stopwords":{"additions":null,"preset":"en","removals":null}},"moduleConfig":{"text2vec-openai":{"baseURL":"https://api.openai.com","model":"text-embedding-3-large","vectorizeClassName":true}},"multiTenancyConfig":{"autoTenantActivation":false,"autoTenantCreation":false,"enabled":false},"properties":[{"dataType":["text"],"indexFilterable":true,"indexRangeFilters":false,"indexSearchable":true,"moduleConfig":{"text2vec-openai":{"skip":false,"vectorizePropertyName":true}},"name":"text","tokenization":"word"}],"replicationConfig":{"asyncEnabled":false,"deletionStrategy":"DeleteOnConflict","factor":1},"shardingConfig":{"actualCount":1,"actualVirtualCount":128,"desiredCount":1,"desiredVirtualCount":128,"function":"murmur3","key":"_id","strategy":"hash","virtualPerPhysical":128},"vectorIndexConfig":{"bq":{"enabled":false},"cleanupIntervalSeconds":300,"distance":"cosine","dynamicEfFactor":8,"dynamicEfMax":500,"dynamicEfMin":100,"ef":-1,"efConstruction":128,"filterStrategy":"sweeping","flatSearchCutoff":40000,"maxConnections":32,"pq":{"bitCompression":false,"centroids":256,"enabled":false,"encoder":{"distribution":"log-normal","type":"kmeans"},"segments":0,"trainingLimit":100000},"skip":false,"sq":{"enabled":false,"rescoreLimit":20,"trainingLimit":100000},"vectorCacheMaxObjects":1000000000000},"vectorIndexType":"hnsw","vectorizer":"text2vec-openai"}

Now, we want to change the filter strategy, from:

"filterStrategy":"sweeping"

to

"filterStrategy":"acorn"

So our curl command will be:

curl \
  --request PUT \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test \
  --data '{
  "class":"Test",
  "invertedIndexConfig":{
    "bm25":{
      "b":0.75,
      "k1":1.2
    },
    "cleanupIntervalSeconds":60,
    "stopwords":{
      "additions":null,
      "preset":"en",
      "removals":null
    }
  },
  "moduleConfig":{
    "text2vec-openai":{
      "baseURL":"https://api.openai.com",
      "model":"text-embedding-3-large",
      "vectorizeClassName":true
    }
  },
  "multiTenancyConfig":{
    "autoTenantActivation":false,
    "autoTenantCreation":false,
    "enabled":false
  },
  "properties":[
    {
      "dataType":["text"],
      "indexFilterable":true,
      "indexRangeFilters":false,
      "indexSearchable":true,
      "moduleConfig":{
        "text2vec-openai":{
          "skip":false,
          "vectorizePropertyName":true
        }
      },
      "name":"text",
      "tokenization":"word"
    }
  ],
  "replicationConfig":{
    "asyncEnabled":false,
    "deletionStrategy":"DeleteOnConflict",
    "factor":1
  },
  "shardingConfig":{
    "actualCount":1,
    "actualVirtualCount":128,
    "desiredCount":1,
    "desiredVirtualCount":128,
    "function":"murmur3",
    "key":"_id",
    "strategy":"hash",
    "virtualPerPhysical":128
  },
  "vectorIndexConfig":{
    "bq":{
      "enabled":false
    },
    "cleanupIntervalSeconds":300,
    "distance":"cosine",
    "dynamicEfFactor":8,
    "dynamicEfMax":500,
    "dynamicEfMin":100,
    "ef":-1,
    "efConstruction":128,
    "filterStrategy":"acorn",
    "flatSearchCutoff":40000,
    "maxConnections":32,
    "pq":{
      "bitCompression":false,
      "centroids":256,
      "enabled":false,
      "encoder":{
        "distribution":"log-normal",
        "type":"kmeans"
      },
      "segments":0,
      "trainingLimit":100000
    },
    "sq":{
      "enabled":false,
      "rescoreLimit":20,
      "trainingLimit":100000
    },
    "vectorCacheMaxObjects":1000000000000,
    "skip":false
  },
  "vectorIndexType":"hnsw",
  "vectorizer":"text2vec-openai"
}'
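Because this PUT sends the full class definition, the GET, edit and PUT steps can also be scripted so that you never edit the JSON by hand. Below is a minimal sketch using only the Python standard library. It assumes a Weaviate v1.27+ instance at http://localhost:8080. The helper names are hypothetical and not part of any Weaviate client.

```python
import copy
import json
import urllib.request


def with_filter_strategy(definition, strategy="acorn"):
    """Return a copy of a class definition with only
    vectorIndexConfig.filterStrategy replaced."""
    updated = copy.deepcopy(definition)  # leave the fetched definition untouched
    updated.setdefault("vectorIndexConfig", {})["filterStrategy"] = strategy
    return updated


def _request(url, method, body=None):
    """Send a JSON request and decode the JSON response, if any."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None


def switch_filter_strategy(base_url, class_name, strategy="acorn"):
    """GET the current definition, change filterStrategy, and PUT it back."""
    url = f"{base_url}/v1/schema/{class_name}"
    current = _request(url, "GET")
    return _request(url, "PUT", with_filter_strategy(current, strategy))
```

For example, switch_filter_strategy("http://localhost:8080", "Test") should make the same change as the curl command above. Passing "sweeping" switches back.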

Let me know if this helps!

Thanks!

SomebodySysop · November 21, 2024, 7:45pm

Yes, thank you! Exactly what I needed to know. I was hoping that I could just send “filterStrategy” by itself, but this will work!
