What is the actual file size limit when uploading to a corpus?

In the file upload API documentation, I see that “The maximum file size supported by the server is 10 MB.” My use case involves hundreds of PDFs, most of which exceed 10 MB. In testing, I’ve been able to upload large PDFs, and I only get an error for files larger than 50 MB.

What is the actual limit? And can it be exceeded in any way?

I believe the 10 MB figure refers to the maximum size of a single gRPC request. The file upload API, which is an HTTP endpoint, accepts files up to 50 MB.
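Given those two limits, a client can decide per file whether to send it to the HTTP file upload endpoint or to set it aside for local handling. The following is a minimal client-side sketch that calls no API. It assumes "50 MB" means 50 × 1024 × 1024 bytes, which the thread does not confirm. The names `choose_ingest_path` and `plan_ingest` are hypothetical.

```python
from pathlib import Path

# Assumption: "50MB" is treated as 50 * 1024 * 1024 bytes; the thread does not
# say whether the server counts a megabyte as 10**6 or 2**20 bytes.
HTTP_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def choose_ingest_path(size_bytes: int, limit: int = HTTP_UPLOAD_LIMIT_BYTES) -> str:
    """Hypothetical helper: route a file by size against the upload limit."""
    # Files at or under the limit go to the HTTP file upload endpoint;
    # anything larger needs the local text-extraction workaround instead.
    return "file_upload" if size_bytes <= limit else "extract_locally"


def plan_ingest(folder: str) -> dict[str, list[Path]]:
    """Hypothetical helper: group every PDF in a folder by ingest route."""
    plan: dict[str, list[Path]] = {"file_upload": [], "extract_locally": []}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        plan[choose_ingest_path(pdf.stat().st_size)].append(pdf)
    return plan
```

For example, `plan_ingest("pdfs/")["extract_locally"]` would list the PDFs that the upload endpoint is expected to reject. If the server turns out to use decimal megabytes, pass `limit=50 * 10**6` instead.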

Right now, that’s a hard limit, but we could consider adjusting it. Are you able to share the distribution of file sizes that you’re dealing with? One workaround for very large files is to handle the text extraction on your end, and use the gRPC endpoint to index the extracted documents.
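To answer the question about the distribution of file sizes, one option is to bucket the collection around the two figures mentioned in this thread. This sketch is only an illustration. The bucket edges and labels are my own choice, megabytes are assumed to be binary (2**20 bytes), and `size_distribution` and `pdf_sizes` are hypothetical helpers.

```python
from bisect import bisect_left
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes
# Bucket edges chosen around the 10 MB (gRPC) and 50 MB (HTTP upload) figures.
EDGES = [10 * MB, 50 * MB]
LABELS = ["<= 10 MB", "10-50 MB", "> 50 MB"]


def size_distribution(sizes: list[int]) -> dict[str, int]:
    """Count how many sizes fall into each bucket (upper edges inclusive)."""
    counts = {label: 0 for label in LABELS}
    for size in sizes:
        # bisect_left places a size equal to an edge in the lower bucket,
        # so exactly 10 MB counts as "<= 10 MB" and exactly 50 MB as "10-50 MB".
        counts[LABELS[bisect_left(EDGES, size)]] += 1
    return counts


def pdf_sizes(folder: str) -> list[int]:
    """Collect the byte size of every PDF directly inside a folder."""
    return [p.stat().st_size for p in Path(folder).glob("*.pdf")]
```

Running `size_distribution(pdf_sizes("pdfs/"))` gives a three-line summary that could be shared in the thread without listing individual files.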


Thank you, @aahmad. I can work with this limit for the moment as I’m currently in development and testing mode. I’ll come back if it proves to be a real impediment.