Hi @sebawita - thank you for taking the time to shed some light on this.
Agreed, there were a lot of questions, but you managed to give a path forward on each one - so thank you so much!
Please see my replies below (with some follow-up questions).
Full Name - importance
Thank you so much for this - I wonder how I missed the weight-boost capability in BM25 when I first implemented the hybrid search.
Just a quick question though: if I do this, do I now need to declare all of my targeted properties for the keyword search?
properties=["full_name^2", "marketing_pitch", "xxx", "yyy", "zzz", ...]
Is there no way to just say: boost full_name in the BM25 search but still include the rest? If not, I am still OK with this and will start trying it out in a couple of days.
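To make the question concrete, this is a minimal sketch of the workaround I have in mind if every property must be listed: a small helper that takes the full property list and appends the boost only where I ask for it. The helper name `build_bm25_properties` is my own (hypothetical, not part of the Weaviate client), and the property names are just the ones from my example.

```python
def build_bm25_properties(all_props, boosts):
    """Return the property list with "^weight" appended to boosted names.

    all_props: every property the keyword search should cover.
    boosts:    mapping of property name -> boost weight (hypothetical helper,
               not a Weaviate API; it only builds the list of strings).
    """
    unknown = set(boosts) - set(all_props)
    if unknown:
        # Catch typos early instead of silently dropping a boost.
        raise ValueError(f"boosted properties not in list: {sorted(unknown)}")
    return [f"{p}^{boosts[p]}" if p in boosts else p for p in all_props]


props = build_bm25_properties(
    ["full_name", "marketing_pitch", "languages"],
    {"full_name": 2},
)
print(props)  # ['full_name^2', 'marketing_pitch', 'languages']
```

The resulting list could then be passed as `properties=props`, so the boost stays in one place while the rest of the list is maintained separately.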
Also, what if I do something like the below, where I declare only the non-vectorized props and boost only the full name:
response = (
    client.query
    .get("MyCollection", ["full_name", "marketing_pitch", "some_other_prop"])
    .with_hybrid(
        query="Jon Doe doing something",
        properties=["full_name^2", "languages", "...other non-vectorized props"],
        alpha=0.5
    )
    .do()
)
Will the vector search part of the hybrid query still work on my vectorized props (e.g., marketing_pitch, exp story) even if they are not declared under properties?
Running Hybrid search on not vectorized properties
Thanks for confirming this.
Regarding the tokenization method, we are using the default “word”, so it should work with search strings as shown in Tokenization and Search Filtering. I think that because I am using hybrid with alpha=0.5, the vector scoring is somehow affecting the result, but I will test this once I have played around with the boosted props.
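Here is a rough sketch of why I suspect the vector side, assuming the hybrid score behaves like a simple weighted blend where alpha=0 is keyword only and alpha=1 is vector only. This is only my mental model and not Weaviate's actual fusion code; the function `blend_scores` and all the scores below are made up for illustration.

```python
def blend_scores(keyword_score, vector_score, alpha):
    """Weighted blend of two normalized scores (illustrative only).

    alpha=0 uses the keyword score alone, alpha=1 the vector score alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1 - alpha) * keyword_score + alpha * vector_score


# Hypothetical normalized scores for two profiles (invented numbers).
exact_name_match = {"keyword": 1.0, "vector": 0.3}   # name matches the query
semantic_match = {"keyword": 0.4, "vector": 0.95}    # pitch resembles the query

for alpha in (0.25, 0.5):
    exact = blend_scores(exact_name_match["keyword"], exact_name_match["vector"], alpha)
    semantic = blend_scores(semantic_match["keyword"], semantic_match["vector"], alpha)
    winner = "exact name match" if exact > semantic else "semantic match"
    print(f"alpha={alpha}: top result is the {winner}")
```

With these invented numbers the exact name match ranks first at alpha=0.25 but drops behind the semantic match at alpha=0.5, which is the kind of behaviour I think I am seeing.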
What vectorizer do you use?
We are using the Weaviate text2vec-transformers module with the pre-built image:
semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
To your knowledge, is there a recommended model among the Weaviate pre-built images for longer inputs (>250 tokens)?
What do you mean by “through languages”?
Sorry for the confusion - this is just a field in our Profile model: a collection of languages that our users can speak. It has nothing to do with multi-language searching; we only use English on our platform for now.
I just gave this as an example: the languages[] field is not vectorized, but will it still be searched as part of the hybrid search? It is of type text[] rather than text - will it still be included?
Thank you!