Failing to upload docx file using file upload API

I’m trying to upload .docx files using Python but the request fails with the following reason: Content-Type 'multipart/form-data; boundary=faf8b18ba37a8f5539d693a894d10cfb' is not supported

The docs mentioned I should be using multipart/form-data, so that error is confusing to me.

From: File Upload API Definition | Vectara Docs

The endpoint expects a multipart/form-data POST request that includes the following HTTP parameters:

Here’s the code:

def upload(fh: io.BytesIO, title: str, extension: str, mimetype: str, corpus_id=2):
    token = _get_jwt_token()
    post_headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(
        f"https://indexing.vectara.io/v1/upload/?c={CUSTOMER_ID}&o={corpus_id}",
        files={"file": (f"{title}{extension}", fh, mimetype)},
        verify=True,
        headers=post_headers,
    )

    if response.status_code != 200:
        logging.error(
            "REST upload failed with code %d, reason %s, text %s",
            response.status_code,
            response.reason,
            response.text,
        )
        return response, False
    return response, True

Hi Daniel,

The problem seems to be the wrong indexing address. It should be api.vectara.io instead of indexing.vectara.io.
This indexing address is mentioned here in the GitHub repo.

The full post url should be:
f"https://api.vectara.io/v1/upload?c={CUSTOMER_ID}&o={corpus_id}"
(Notice that I have removed an extra slash as well after upload)

Also, two small things.

  1. There is no dot between title and extension in your example code. That might be irrelevant, though, if extension already includes the dot.
  2. Make sure that you are passing the correct MIME type.
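
The URL and filename points above can be sketched as two small helpers. This is only a minimal sketch, not Vectara's client code: `build_upload_url`, `build_filename`, and `DOCX_MIMETYPE` are hypothetical names introduced here, and the host and query parameters are taken from the URL quoted in this reply.

```python
from urllib.parse import urlencode

# Standard MIME type for .docx files.
DOCX_MIMETYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def build_upload_url(customer_id, corpus_id, host="api.vectara.io"):
    """Build the upload URL with no slash between 'upload' and '?'."""
    query = urlencode({"c": customer_id, "o": corpus_id})
    return f"https://{host}/v1/upload?{query}"


def build_filename(title, extension):
    """Join title and extension, adding the dot only when it is missing."""
    if extension and not extension.startswith("."):
        extension = f".{extension}"
    return f"{title}{extension}"
```

The results would then be passed to requests.post as in the original function, for example files={"file": (build_filename(title, extension), fh, DOCX_MIMETYPE)}.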

Hi! Thanks for the suggestion. It kinda works now. Now I’m getting another error about the request entity being too large:

{"httpCode":400,"internalCode":3,"details":"Request entity too large."}

The size is 436179 bytes, or about 0.44 MB (roughly 3.5 megabits):

(Pdb) method_len = len(response.request.method)
(Pdb) url_len = len(response.request.url)
(Pdb) headers_len = len('\r\n'.join('{}{}'.format(k, v) for k, v in response.request.headers.items()))
(Pdb) body_len = len(response.request.body if response.request.body else [])
(Pdb) print(f'Request size {method_len + url_len + headers_len + body_len}')
Request size 436179
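
As a quick unit check on that number, here is a small sketch that converts the byte count to decimal megabytes and megabits. `describe_size` is a hypothetical helper, and reading the documented 10 MB limit as 10,000,000 bytes is an assumption about how that limit is counted.

```python
MAX_UPLOAD_BYTES = 10 * 1_000_000  # assumed reading of the 10 MB limit


def describe_size(num_bytes):
    """Return (megabytes, megabits, under_limit) for a byte count."""
    megabytes = num_bytes / 1_000_000
    megabits = num_bytes * 8 / 1_000_000
    return round(megabytes, 2), round(megabits, 1), num_bytes <= MAX_UPLOAD_BYTES


print(describe_size(436179))  # prints (0.44, 3.5, True)
```

So a request of this size is well under the limit, which suggests the "too large" message is not really about the file size.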

The docs say the max file size is 10MB though. Am I doing something wrong? Here’s the new code:

def upload(fh: io.BytesIO, title: str, extension: str, mimetype: str, corpus_id=2):
    token = _get_jwt_token()
    post_headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "multipart/form-data",
    }
    response = requests.post(
        f"https://api.vectara.io/v1/upload?c={CUSTOMER_ID}&o={corpus_id}&d=true",
        files={"file": (f"{title}{extension}", fh, mimetype)},
        headers=post_headers,
        data={"c": CUSTOMER_ID, "o": corpus_id, "d": True},
    )

    if response.status_code != 200:
        logging.error(
            "REST upload failed with code %d, reason %s, text %s",
            response.status_code,
            response.reason,
            response.text,
        )
        return response, False
    return response, True

Never mind, I was passing the JWT wrong :sweat_smile: Thanks for the help!
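
For anyone landing here with the same symptom, the following is a minimal sketch of building the request headers. The thread does not say exactly how the JWT was passed wrong, so the checks below are only guesses at common mistakes, and `build_auth_headers` is a hypothetical helper. It also leaves Content-Type unset on purpose: when files= is passed, requests generates the multipart/form-data header itself, including the boundary, and a hand-written value without a boundary may be rejected by the server.

```python
def build_auth_headers(token):
    """Return headers for a multipart upload authenticated with a JWT."""
    if not isinstance(token, str):
        raise TypeError("token must be a str, not bytes or a response object")
    token = token.strip()
    # Avoid sending "Bearer Bearer <token>" if the prefix is already there.
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    # A signed JWT in compact form has three non-empty dot-separated parts.
    if token.count(".") != 2 or not all(token.split(".")):
        raise ValueError("token does not look like a compact JWT")
    # No Content-Type here: requests adds it, with the boundary, for files=.
    return {"Authorization": f"Bearer {token}"}
```

In the upload function this would replace the hand-built post_headers dict, for example headers=build_auth_headers(_get_jwt_token()).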

Whew! We were digging in trying to find something that could have gone wrong and hadn’t uncovered anything. Thanks for letting us know and thanks for the feedback!

Also, I’d be curious whether you found indexing.vectara.io somewhere recent. A few weeks ago we switched the docs from h.indexing.vectara.io (which does not have the /v1/ path part) to api.vectara.io (which does). I want to check whether we missed something in the docs or examples that still says indexing.vectara.io instead of api.vectara.io and caused your first problem.

Sorry, I didn’t see this before. I can’t remember, but I guess I mixed up the gRPC endpoint with the REST endpoint.