[Architecture] Preventing RAG "Context Rot" – A Deterministic Temporal Decay Layer for Vector Payloads

**The Problem: Semantic Similarity vs. Temporal Reality**

We are all building highly optimized RAG pipelines, but I’ve been running into a persistent bottleneck when deploying agents in regulated or high-velocity domains (like clinical NLP or fintech): Context Rot.

Standard vector retrieval is purely semantic. A vector database will happily return a 3-year-old superseded clinical guideline or a retracted arXiv paper with a 0.95 similarity score. If we feed that directly into the LLM, the agent hallucinates with extreme confidence because the context itself is stale or conflicting.

**The Proposed Architecture: A Temporal Governance Layer**

To fix this, I architected a deterministic routing engine (the Knowledge Universe API) that sits exactly between the vector DB (like Weaviate) and the LLM generation step.

Instead of relying on the LLM to figure out what is outdated, the API intercepts the vector payloads and applies an F1 Context Optimizer with mathematical decay scoring.

Here is how the pipeline operates:

  1. Retrieve: Weaviate pulls the semantically relevant chunks.
  2. Score & Filter: The API evaluates the payload, computing a decay_score based on source age, domain velocity (e.g., “hypersonic” vs. “frozen”), and cross-references for conflict_detection (e.g., superseded regulatory documents); see the sketch after this list.
  3. Cache: Safe paths are heavily cached via a dedicated Redis layer (currently clocking a 26.3x latency speedup on repeated citation graph traversals). For mid-chain blocking decisions, the engine supports a cache_bypass=true hard gate.
  4. Generate: The LLM only receives mathematically validated, temporally fresh context.
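
For concreteness, here is a minimal sketch of the Score & Filter step, assuming a simple exponential half-life model. The half-life values mirror the 7-day “hypersonic” and 5-year “frozen” figures discussed later in this thread; the helper names are hypothetical, not the actual KU API:

```python
from datetime import datetime, timezone

# Illustrative per-velocity half-lives: 7 days (hypersonic), ~5 years (frozen).
HALF_LIFE_DAYS = {"hypersonic": 7, "frozen": 1825}

def decay_score(published_at: datetime, velocity: str) -> float:
    """0.0 = brand new; approaches 1.0 as the source fully decays.
    published_at must be timezone-aware."""
    age_days = (datetime.now(timezone.utc) - published_at).days
    return 1.0 - 0.5 ** (age_days / HALF_LIFE_DAYS[velocity])

def score_and_filter(chunks: list[dict], velocity: str, cutoff: float = 0.8) -> list[dict]:
    """Stamp a decay_score on each retrieved chunk, then hard-gate stale ones."""
    for chunk in chunks:
        chunk["decay_score"] = decay_score(chunk["published_at"], velocity)
    return [c for c in chunks if c["decay_score"] < cutoff]
```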

Live Trace Example (Clinical NLP Domain): Here is a live snippet of what the API stamps onto the payload when an agent traverses from a stable domain into a rapidly shifting one. Notice how it dynamically tightens the cache TTL to 7 days because it detects a “hypersonic” velocity shift.

```json

{
  "query": "LLM output validation clinical decision support",
  "temporal_context": {
    "avg_decay_score": 0.08,
    "knowledge_velocity": "hypersonic",
    "half_life_days": 7,
    "stamped_at": "2026-04-29T04:23:39Z"
  },
  "conflict_detection": {
    "conflicts_found": 0,
    "conflict_pairs": []
  },
  "velocity_warning": "80% of sources published in last 90 days. Domain is evolving rapidly. Tightening cache TTL to 7 days."
}
```
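
To make the TTL tightening concrete, here is roughly how the cache step could derive its expiry from the stamped half-life. This is a sketch assuming redis-py, with the key naming purely hypothetical:

```python
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def cache_trace(query: str, trace: dict, cache_bypass: bool = False) -> None:
    """Cache the stamped payload; the TTL follows the domain half-life."""
    if cache_bypass:  # hard gate: mid-chain blocking decisions skip the cache
        return
    ttl = trace["temporal_context"]["half_life_days"] * 86_400  # days -> seconds
    r.setex(f"ku:trace:{query}", ttl, json.dumps(trace))
```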

**The Sandbox**

I put together an interactive Colab notebook to stress-test the decay math, the cache routing, and the context cutoff logic.

Google Colab Link Here

Quick update — rather than making everyone run raw curl commands to test the decay math, I just spun up a live sandbox UI. You can type in the topic your RAG agent is currently retrieving (e.g., ‘clinical LLM validation’ or ‘fintech regulatory compliance’) and it will generate the JSON trace, Domain Velocity, and the live Decay Score gauge. Test your payloads here: https://ku-freshness-engine-fwsxfw7up2x9txshqcydf9.streamlit.app/

Let me know what scores your current edge-cases are hitting!

I know the Weaviate community is pushing the boundaries of what’s possible with agentic retrieval. I would highly value any brutal engineering feedback on this routing logic, especially regarding how you handle edge-weighting for stale vectors in your own production graphs.

Hi @VLSiddarth,

Thanks for this proposal.

I agree that temporal decay is very important for RAG use cases. We are adding a new boosting / soft-weighting feature to Weaviate in 1.38 that could help with this.

Draft PR: [Draft] Add rank soft-ranking query parameter by trengrj · Pull Request #11103 · weaviate/weaviate · GitHub

It will allow for queries like this, where a boost parameter is provided to interact with the hybrid or vector score. You can boost with a decay curve, via a property (e.g., likes or popularity), and also boost a particular filter (e.g., price > $20).


```python
results = collection.query.near_vector(
    near_vector=vector,
    limit=limit,
    boost=Boost.decay(
        "date_first_available",
        origin=origin or "now",
        scale="200d",
        curve=Boost.Curve.EXPONENTIAL,
        weight=weight,
        depth=depth,
    ),
    return_metadata=MetadataQuery(distance=True),
    return_properties=[
        "title", "price", "average_rating", "rating_number",
        "main_category", "image", "date_first_available",
    ],
)
```

Would be great to get any feedback. Will update this comment with the Python client PR when available.

Hi @trengrj, this is fantastic. Native soft-weighting at the DB layer is exactly what the enterprise RAG ecosystem has been missing to combat context rot.

Looking at the syntax, Boost.decay using an EXPONENTIAL curve perfectly aligns with the math we’re running in our temporal governance layer.

Here is where I think the two approaches (Weaviate’s retrieval weighting vs. Knowledge Universe’s payload governance) are complementary, and where I have some engineering feedback for the 1.38 design (#11103), specifically around regulated use cases:

1. RETRIEVAL WEIGHTING VS. PAYLOAD GOVERNANCE

Boost.decay() re-ranks what Weaviate returns. KU intercepts what the LLM receives.

  • Retrieval weighting (Weaviate): A stale document with a strong vector score gets down-ranked. Good. But if no fresh document exists for that query, the stale one still wins by default and enters the LLM context.
  • Payload governance (KU): After retrieval, every document gets an explicit decay score stamped onto it. The LLM prompt or agent gate can then decide: “This source is 0.81 decayed in a hypersonic domain — block it entirely.” For regulated pipelines (clinical NLP, financial disclosure), you need hard gates — a stale FDA guideline should never reach the LLM, not just be weighted lower.

2. THE DOMAIN VELOCITY PROBLEM & PR FEEDBACK

Having read the draft, here are four things I’d push for before merge:

  • Per-object scale: The current scale="200d" is a collection-level constant. For heterogeneous collections (arXiv papers mixed with GitHub repos mixed with Wikipedia) the correct half-life differs by 10x across source types. Allowing scale to reference a per-object float property would unlock platform-calibrated decay on mixed collections (sketched after this list).
  • Depth parameter visibility: The depth parameter added in commit 680b752 is exactly the right control for domain-velocity-aware decay — shallow depth for hypersonic domains (LLM releases, 7-day half-life), deep depth for frozen domains (HTTP spec, 5-year half-life). Is this documented in the Python client example yet? That distinction is non-obvious but critical for regulated use cases.
  • gRPC-only scope: Is REST support planned for a follow-on PR? Most teams prototyping RAG pipelines hit the REST API first. Keeping Boost.decay() gRPC-only means the majority of new Weaviate users won’t encounter it organically.
  • Metadata visibility: Does the current implementation expose the raw boost multiplier in return_metadata alongside the final distance? If not, that’s worth adding — downstream audit systems in regulated pipelines need to log the decay contribution per result, not just the final rank.
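
For concreteness, here is what per-object scale could look like on the client side. This is purely a hypothetical extension of the draft API; Boost.Property does not exist in #11103:

```python
# Hypothetical extension of the draft Boost API -- NOT in PR #11103:
# scale resolves from a per-object property instead of a collection constant.
boost = Boost.decay(
    "date_first_available",
    origin="now",
    scale=Boost.Property("half_life_days"),  # hypothetical per-object scale
    curve=Boost.Curve.EXPONENTIAL,
)
```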

The SonarCloud duplication flag (11.9% vs ≤3% threshold) suggests there’s still refactoring to do before merge — good timing to incorporate these if they fit the design.

Overall: Boost.decay() is the right primitive at the right layer. I’d love to build a reference integration showing both layers working together — Weaviate handling decay-weighted retrieval natively, and KU feeding the dynamic domain scale and handling post-retrieval governance before LLM generation.
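
As a starting point, that integration could look something like the sketch below, using the draft 1.38 Boost API from the snippet above; domain_half_life and ku_gate are hypothetical KU-side helpers, not existing functions:

```python
from weaviate.classes.query import MetadataQuery

def retrieve_fresh(collection, vector, domain: str, limit: int = 10):
    # Layer 1: decay-weighted retrieval (draft Boost API from PR #11103).
    results = collection.query.near_vector(
        near_vector=vector,
        limit=limit,
        boost=Boost.decay(
            "date_first_available",
            origin="now",
            scale=domain_half_life(domain),  # hypothetical: KU supplies "7d", "5y", ...
            curve=Boost.Curve.EXPONENTIAL,
        ),
        return_metadata=MetadataQuery(distance=True),
    )
    # Layer 2: payload governance -- hard-gate what boosting only down-ranked.
    return [obj for obj in results.objects if ku_gate(obj)]  # hypothetical KU check
```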

— V.L. Siddarth