(Architecture / Design Question - No specific error logs)
I love Weaviate’s hybrid search, but I’m struggling with the architecture around context rot.
If I have a highly relevant 3-year-old chunk and a moderately relevant chunk from yesterday, standard hybrid search still heavily favors the old one because the vector similarity is higher.
Is anyone doing mathematical freshness scoring (like a temporal decay curve) before embedding, or are we all just relying on post-retrieval metadata filters to drop old data? Looking for best practices on how to actually weight recency without destroying relevance.
hi!!
That’s an interesting topic. I don’t think I have seen temporal decay being accounted for in the embeddings themselves; what I have seen is people using the metadata to rescore the results accordingly.
It’s worth noting that we recently released TTL. It isn’t directly related to decaying the score based on age, but it may be something “around” it.
Hi Duda, thanks for the reply!
Exactly—TTL is great for hard expiries, but it acts like a sledgehammer. A 3-year-old JavaScript tutorial might be useless, but a 3-year-old paper on quantum error correction is still highly relevant. You can’t just delete both with a blanket TTL.
I actually ended up building a dedicated infrastructure layer to fill this exact gap because I couldn’t find it anywhere. It runs a deterministic exponential decay function, `final_score = result.metadata.score * exp(-lambda_val * result.properties['ageDays'])`, over the retrieved results’ metadata before they are passed to the LLM context window. It calculates a platform-specific half-life (GitHub code decays faster than arXiv papers) and applies the resulting penalty multiplier to the base semantic score.
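A minimal sketch of that rescoring step, assuming each result exposes `metadata.score` and `properties['ageDays']` as in the formula above, and deriving `lambda_val` from a half-life via λ = ln 2 / half-life. In the Weaviate v4 Python client, objects in a query response carry `metadata.score` (when score metadata is requested) and `properties`; here they are mocked with small dataclasses so the sketch runs on its own. The `source` property and the half-life values in `HALF_LIFE_DAYS` are hypothetical placeholders, not tuned numbers.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Metadata:
    score: float  # base hybrid/semantic score returned by the search


@dataclass
class Result:
    # Mock of a retrieved object (stand-in for a real client result).
    metadata: Metadata
    properties: dict = field(default_factory=dict)


# Hypothetical per-platform half-lives in days (placeholders, not tuned values).
HALF_LIFE_DAYS = {"github": 180.0, "arxiv": 1825.0}
DEFAULT_HALF_LIFE_DAYS = 365.0


def decay_lambda(half_life_days: float) -> float:
    """Convert a half-life into an exponential decay rate: lambda = ln(2) / t_half."""
    return math.log(2) / half_life_days


def decayed_score(result: Result) -> float:
    """final_score = score * exp(-lambda * ageDays), with lambda chosen per source."""
    half_life = HALF_LIFE_DAYS.get(result.properties.get("source"), DEFAULT_HALF_LIFE_DAYS)
    lambda_val = decay_lambda(half_life)
    return result.metadata.score * math.exp(-lambda_val * result.properties["ageDays"])


def rerank(results: list[Result]) -> list[tuple[float, Result]]:
    """Rescore retrieved results and sort them by decayed score, highest first."""
    scored = [(decayed_score(r), r) for r in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    old = Result(Metadata(score=0.92), {"source": "github", "ageDays": 3 * 365})
    new = Result(Metadata(score=0.71), {"source": "github", "ageDays": 1})
    for score, r in rerank([old, new]):
        print(f"{r.properties['ageDays']:>5} days  raw={r.metadata.score:.2f}  decayed={score:.3f}")
```

With these placeholder numbers the day-old chunk ranks first even though its raw score is lower, because the 3-year-old GitHub chunk has gone through roughly six half-lives (1095 / 180). One design caveat: a stored `ageDays` value is only correct on the day it was written, so storing a creation timestamp and computing the age at query time avoids having to rewrite it.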
It effectively gives RAG pipelines a `days_until_stale` integer, acting as a scalpel instead of a sledgehammer. I would love to get your thoughts on the math behind it if the Weaviate team is exploring this space!
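The thread doesn’t say how `days_until_stale` is derived, so the following is only one plausible, hypothetical definition: the number of whole days until the decay multiplier exp(-λ·age) falls below a chosen staleness threshold. The default threshold of 0.25 (two half-lives) is an arbitrary placeholder. Since λ = ln 2 / half-life, solving exp(-λ·(age + d)) = threshold gives d = half-life · log2(1 / threshold) - age, which the sketch uses directly.

```python
import math


def days_until_stale(age_days: float, half_life_days: float, threshold: float = 0.25) -> int:
    """Hypothetical: whole days left until exp(-lambda * age) drops below `threshold`.

    With lambda = ln(2) / half_life, the crossing point is
    age_stale = half_life * log2(1 / threshold), so the remaining days are
    age_stale - age_days, clamped at zero and rounded down.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    stale_at = half_life_days * math.log2(1.0 / threshold)
    return max(0, math.floor(stale_at - age_days))
```

For example, with a 180-day half-life and the default threshold, a brand-new chunk would have 360 days left and a 100-day-old chunk 260; anything older than 360 days returns 0.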
That’s nice! I will raise this with our research team, as it is likely on our radar.
Thanks!!
Awesome, thanks Duda. Happy to share the raw decay math or run a benchmark over a Weaviate cluster if the research team wants to dig into the mechanics. Cheers!