How to best handle domain-specific acronyms/abbreviations?

Description

Hi,

I’ve noticed that when using the text2vec-contextionary, one can add custom words or abbreviations (referred to as “concepts”) through the v1/modules/text2vec-contextionary/extensions/ endpoint. However, I am not utilizing the Contextionary model in my implementation.

I am already using Hybrid Search, so the only remaining solution I’ve identified is to fine-tune the embedding model I’m using to better capture the context and meaning of these abbreviations. While this approach seems promising, I’m wondering if there are any alternative methods or best practices within Weaviate that could help with this issue.

Has anyone faced a similar challenge or found effective ways to handle abbreviations in their embeddings without relying on Contextionary? Any insights or suggestions would be greatly appreciated!

Hi @srbk95!

Welcome to our community :hugs:
Sorry for the delay here.

That’s an interesting question.

According to some usage stats we have, text2vec-contextionary is barely being used, so I don’t think that’s a path worth following.

Fine-tuning your model may be a long and expensive path.

One alternative I can think of is to create your own collection of abbreviations. Before performing the hybrid search, you search that abbreviation collection and add the most relevant matches (the abbreviations and their meanings) to the query. :thinking:
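Here is a rough sketch of that lookup-then-expand idea in Python. It is only an illustration: the `ABBREVIATIONS` dict is a hypothetical stand-in for the abbreviation collection (in practice you would query your own collection instead), and `expand_query` is a made-up helper name, not part of Weaviate.

```python
import re

# Hypothetical glossary standing in for a dedicated abbreviation collection.
ABBREVIATIONS = {
    "SLA": "service level agreement",
    "RCA": "root cause analysis",
}


def expand_query(query: str, glossary: dict[str, str]) -> str:
    """Append the long form of every known abbreviation found in the query."""
    expansions = []
    # Exact, case-sensitive token match so ordinary lowercase words are not
    # mistaken for acronyms.
    for token in re.findall(r"[A-Za-z0-9]+", query):
        long_form = glossary.get(token)
        if long_form and long_form not in expansions:
            expansions.append(long_form)
    if not expansions:
        return query
    return f"{query} {' '.join(expansions)}"


expanded = expand_query("SLA breach last week", ABBREVIATIONS)
print(expanded)
```

The expanded string is what you would then send as the query text of your hybrid search, so both the keyword side and the vector side see the spelled-out term.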

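Another alternative is to add the meaning of those abbreviations to the content before ingesting it, so the spelled-out terms get indexed and vectorized along with the abbreviations.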
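For the ingestion-time variant, a minimal sketch could look like this. Again, `GLOSSARY` and `annotate_abbreviations` are hypothetical names used only for illustration; the idea is simply to run each document through a step like this before importing it.

```python
import re

# Hypothetical glossary of domain abbreviations and their long forms.
GLOSSARY = {
    "SLA": "service level agreement",
    "RCA": "root cause analysis",
}


def annotate_abbreviations(text: str, glossary: dict[str, str]) -> str:
    """Add the long form in parentheses after the first use of each abbreviation."""
    for short, long_form in glossary.items():
        pattern = re.compile(rf"\b{re.escape(short)}\b")
        # A function replacement avoids backslash-escape surprises in long forms.
        text = pattern.sub(
            lambda m, lf=long_form: f"{m.group(0)} ({lf})", text, count=1
        )
    return text


doc = "The RCA showed an SLA breach. SLA terms apply."
print(annotate_abbreviations(doc, GLOSSARY))
```

Only the first occurrence of each abbreviation is annotated here to keep the stored text readable; annotating every occurrence is a one-line change (drop `count=1`) if that works better for your retrieval.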

Anyway, let me know if you were able to come up with some alternatives.

Thanks!