How to define a specific Chunking method? What

3Dream · January 22, 2025, 10:32am

Hi there, I am starting to use this amazing tool to implement a robust RAG, but after uploading some documents to a new corpus and examining how these documents were split, I realized that I need to change the chunking method. For this project, I need to define a specific character to serve as a marker for where the documents should be divided (text splitter).

Is this possible in Vectara? How should I proceed?
Thank you.

tallat · January 22, 2025, 2:00pm

We don’t support chunking by a specific character. The chunking strategies we currently support are listed under the chunking_strategy field in the API (see "Upload a file to the corpus" in the Vectara Docs).

One option you have is to chunk the documents yourself, and then use our low-level API (aka Core Document) to upload the chunks directly. For details, see CoreDocument under "Add a document to a corpus" in the Vectara Docs.
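
As a rough illustration of the "chunk it yourself" route, here is a minimal Python sketch that splits text on a marker character and builds a request body for a core document. The helper names (`split_on_marker`, `build_core_document`), the `§` marker, and the document ID are made up for this example, and the field names (`id`, `type`, `document_parts`, `text`, `metadata`) are an assumption about the CoreDocument schema, so check them against the docs before sending anything.

```python
import json


def split_on_marker(text: str, marker: str) -> list[str]:
    """Split text at every occurrence of marker, dropping empty chunks."""
    if not marker:
        raise ValueError("marker must be a non-empty string")
    return [piece.strip() for piece in text.split(marker) if piece.strip()]


def build_core_document(doc_id: str, chunks: list[str]) -> dict:
    """Build a request body with one document part per chunk.

    The field names below are assumed, not taken from this thread;
    verify them against the CoreDocument reference.
    """
    return {
        "id": doc_id,
        "type": "core",
        "document_parts": [
            # chunk_index is a hypothetical metadata key that records chunk order
            {"text": chunk, "metadata": {"chunk_index": i}}
            for i, chunk in enumerate(chunks)
        ],
    }


# Hypothetical input: "§" marks where the document should be divided.
text = "First section.§Second section.§§Third section."
chunks = split_on_marker(text, "§")
payload = build_core_document("my-doc-1", chunks)

# Print the JSON body that would be sent to the document-upload endpoint.
print(json.dumps(payload, indent=2, ensure_ascii=False))
```

The sketch stops at building the payload; sending it (endpoint URL, API key header) is left out on purpose, since those details should come from the docs for your account and API version.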

I hope this helps.
