How does Elysia handle collections that have already been pre-chunked?

I’m using Elysia with a Weaviate collection where documents are already pre-chunked before ingestion.

Since Elysia typically uses a chunk-on-demand (post-chunking) approach—storing full documents and creating chunks at query time—I’d like to know:

  • How does Elysia handle collections that are already pre-chunked?
    • Does it use the existing chunks, ignore them, or re-chunk/merge them?
  • Is there a recommended setup for working with pre-chunked data?
  • Can I avoid the latency from post-chunking by leveraging my pre-chunked data or adjusting Elysia’s configuration?

Any guidance on the best way to handle this would be appreciated.

Hi @Orly_Mugwaneza!

Welcome to our community :hugs: !!

Elysia does not currently detect or use pre-chunked data. It implements a chunk-on-demand approach where it creates its own chunked collection at query time, regardless of whether your documents are already chunked.

When you query a collection, Elysia evaluates whether chunking is needed based on the content field size. If the mean token count exceeds 400 tokens and the display type is “document”, it triggers chunking.
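To make that rule concrete, here is a minimal sketch of the check as described above. This is not Elysia's actual code: the function name `needs_chunking` and its parameters are made up for illustration, and only the 400-token threshold and the "document" display type come from the description.

```python
def needs_chunking(mean_tokens: float, display_type: str, threshold: int = 400) -> bool:
    """Return True when the described rule would trigger chunking.

    Hypothetical helper, not part of Elysia's API: it only mirrors the
    rule "mean token count exceeds 400 AND display type is 'document'".
    """
    return display_type == "document" and mean_tokens > threshold


# Long documents displayed as "document" would be chunked.
print(needs_chunking(650, "document"))
# Short, pre-chunked objects would fall under the threshold.
print(needs_chunking(180, "document"))
```

If this rule holds, a pre-chunked collection whose objects average well under 400 tokens would not trigger the chunking step at all, which is worth checking against your own data.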

The chunking process:

  1. Creates a separate collection named ELYSIA_CHUNKED_{collection_name.lower()}__
  2. Chunks full documents using sentence-based chunking (5 sentences per chunk by default)
  3. Stores chunks with references back to the original full documents
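The three steps above can be sketched roughly like this. It is only an illustration under assumptions: the regex sentence splitter stands in for whatever splitter Elysia really uses, and the helper names and record fields (`source_doc_id`, `chunk_index`, `content`) are hypothetical. Only the collection naming pattern and the 5-sentence default come from the steps above.

```python
import re


def chunked_collection_name(collection_name: str) -> str:
    # Naming pattern as described in step 1.
    return f"ELYSIA_CHUNKED_{collection_name.lower()}__"


def chunk_by_sentences(text: str, sentences_per_chunk: int = 5) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def build_chunk_records(doc_id: str, text: str, sentences_per_chunk: int = 5) -> list[dict]:
    # Each chunk keeps a pointer back to its full source document (step 3).
    # Field names here are illustrative, not Elysia's schema.
    return [
        {"source_doc_id": doc_id, "chunk_index": i, "content": chunk}
        for i, chunk in enumerate(chunk_by_sentences(text, sentences_per_chunk))
    ]


text = " ".join(f"Sentence number {i}." for i in range(12))
records = build_chunk_records("doc-1", text)
print(chunked_collection_name("MyDocs"), len(records))
```

With 12 sentences and the default of 5 per chunk, this produces three chunks (5, 5, and 2 sentences), each carrying the ID of the document it came from.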

Also keep in mind that Elysia is in its early stages :smiling_face_with_sunglasses: so this can change later or gain new features.

Feel free to add your feature request at Elysia’s repo: weaviate/elysia on GitHub.

Let me know if this helps!

Thanks!