ACORN feedback mega thread

Weaviate 1.27 introduced a new filtering strategy based on the ACORN paper.

According to our internal tests, the ACORN algorithm generally improves filtered vector search performance, with the most significant improvements in negatively correlated filtered searches.

We are excited for you to try it out! If you have the Python client, you can activate it like so:

from weaviate.classes.config import Configure, Property, DataType, VectorDistances, VectorFilterStrategy

client.collections.create(
    "Article",
    # Additional configuration not shown
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.bq(),
        ef_construction=300,
        distance_metric=VectorDistances.COSINE,
        filter_strategy=VectorFilterStrategy.ACORN # (Available from Weaviate v1.27.0)
    ),
)

Read more in our 1.27 release blog (Blog | Weaviate) that will be posted later today :wink:

Try it out and let us know what you think!

So, I just read through this documentation: Filtering | Weaviate

I’m trying to figure out how to use ACORN filtering. Is it an embed option or a query option? I don’t use Python or Java, so how can I use it with curl?

Hi @SomebodySysop!

This is a collection-level configuration.

You can create a collection with this configuration, or you can change it at any time, as it is a mutable configuration.

This is how you would change it using curl:

First, let’s get the collection definition:

curl --request GET \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test

In my case, I got this:

{"class":"Test","invertedIndexConfig":{"bm25":{"b":0.75,"k1":1.2},"cleanupIntervalSeconds":60,"stopwords":{"additions":null,"preset":"en","removals":null}},"moduleConfig":{"text2vec-openai":{"baseURL":"https://api.openai.com","model":"text-embedding-3-large","vectorizeClassName":true}},"multiTenancyConfig":{"autoTenantActivation":false,"autoTenantCreation":false,"enabled":false},"properties":[{"dataType":["text"],"indexFilterable":true,"indexRangeFilters":false,"indexSearchable":true,"moduleConfig":{"text2vec-openai":{"skip":false,"vectorizePropertyName":true}},"name":"text","tokenization":"word"}],"replicationConfig":{"asyncEnabled":false,"deletionStrategy":"DeleteOnConflict","factor":1},"shardingConfig":{"actualCount":1,"actualVirtualCount":128,"desiredCount":1,"desiredVirtualCount":128,"function":"murmur3","key":"_id","strategy":"hash","virtualPerPhysical":128},"vectorIndexConfig":{"bq":{"enabled":false},"cleanupIntervalSeconds":300,"distance":"cosine","dynamicEfFactor":8,"dynamicEfMax":500,"dynamicEfMin":100,"ef":-1,"efConstruction":128,"filterStrategy":"sweeping","flatSearchCutoff":40000,"maxConnections":32,"pq":{"bitCompression":false,"centroids":256,"enabled":false,"encoder":{"distribution":"log-normal","type":"kmeans"},"segments":0,"trainingLimit":100000},"skip":false,"sq":{"enabled":false,"rescoreLimit":20,"trainingLimit":100000},"vectorCacheMaxObjects":1000000000000},"vectorIndexType":"hnsw","vectorizer":"text2vec-openai"}

Now, we want to change the filter strategy, from:

"filterStrategy":"sweeping"

to

"filterStrategy":"acorn"

so our curl will be:

curl \
  --request PUT \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test \
  --data '{
  "class":"Test",
  "invertedIndexConfig":{
    "bm25":{
      "b":0.75,
      "k1":1.2
    },
    "cleanupIntervalSeconds":60,
    "stopwords":{
      "additions":null,
      "preset":"en",
      "removals":null
    }
  },
  "moduleConfig":{
    "text2vec-openai":{
      "baseURL":"https://api.openai.com",
      "model":"text-embedding-3-large",
      "vectorizeClassName":true
    }
  },
  "multiTenancyConfig":{
    "autoTenantActivation":false,
    "autoTenantCreation":false,
    "enabled":false
  },
  "properties":[
    {
      "dataType":["text"],
      "indexFilterable":true,
      "indexRangeFilters":false,
      "indexSearchable":true,
      "moduleConfig":{
        "text2vec-openai":{
          "skip":false,
          "vectorizePropertyName":true
        }
      },
      "name":"text",
      "tokenization":"word"
    }
  ],
  "replicationConfig":{
    "asyncEnabled":false,
    "deletionStrategy":"DeleteOnConflict",
    "factor":1
  },
  "shardingConfig":{
    "actualCount":1,
    "actualVirtualCount":128,
    "desiredCount":1,
    "desiredVirtualCount":128,
    "function":"murmur3",
    "key":"_id",
    "strategy":"hash",
    "virtualPerPhysical":128
  },
  "vectorIndexConfig":{
    "bq":{
      "enabled":false
    },
    "cleanupIntervalSeconds":300,
    "distance":"cosine",
    "dynamicEfFactor":8,
    "dynamicEfMax":500,
    "dynamicEfMin":100,
    "ef":-1,
    "efConstruction":128,
    "filterStrategy":"acorn",
    "flatSearchCutoff":40000,
    "maxConnections":32,
    "pq":{
      "bitCompression":false,
      "centroids":256,
      "enabled":false,
      "encoder":{
        "distribution":"log-normal",
        "type":"kmeans"
      },
      "segments":0,
      "trainingLimit":100000
    },
    "sq":{
      "enabled":false,
      "rescoreLimit":20,
      "trainingLimit":100000
    },
    "vectorCacheMaxObjects":1000000000000,
    "skip":false
  },
  "vectorIndexType":"hnsw",
  "vectorizer":"text2vec-openai"
}'
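
If you would rather not paste the whole definition by hand, the same GET, edit, PUT round trip can be scripted. Below is a minimal sketch using only the Python standard library. It assumes the same unauthenticated local instance at http://localhost:8080 as the curl calls above; `BASE_URL`, `fetch_definition`, `with_filter_strategy`, and `set_filter_strategy` are names made up for this example, not part of any Weaviate client.

```python
import copy
import json
import urllib.request

# Assumption: the same local, unauthenticated instance used in the curl calls.
BASE_URL = "http://localhost:8080"


def fetch_definition(collection: str) -> dict:
    """GET the full collection definition from /v1/schema/<collection>."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/schema/{collection}") as resp:
        return json.load(resp)


def with_filter_strategy(class_def: dict, strategy: str) -> dict:
    """Return a copy of the definition with only filterStrategy changed."""
    updated = copy.deepcopy(class_def)  # leave the caller's dict untouched
    updated.setdefault("vectorIndexConfig", {})["filterStrategy"] = strategy
    return updated


def set_filter_strategy(collection: str, strategy: str = "acorn") -> dict:
    """Fetch the definition, swap the strategy, PUT it back, and re-fetch it."""
    body = with_filter_strategy(fetch_definition(collection), strategy)
    request = urllib.request.Request(
        f"{BASE_URL}/v1/schema/{collection}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request):
        pass  # a non-2xx status raises urllib.error.HTTPError
    # Read the definition again so the caller can confirm the stored value.
    return fetch_definition(collection)
```

Calling `set_filter_strategy("Test")` should then be equivalent to the two curl commands above, and `["vectorIndexConfig"]["filterStrategy"]` on the returned dict lets you check that the change was stored.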

Let me know if this helps!

Thanks!

Yes, thank you! Exactly what I needed to know. I was hoping that I could just send “filterStrategy” by itself, but this will work!