fix: audio api endpoint filetype check

RFC2046 allows the Content-Type field to have additional parameters
after the main type/subtype information (Section 1).

Following RFC4281, many applications put codec information inside
parameters in the Content-Type. This is especially common for formats
that support many codecs, such as Ogg (RFC5334, Section 4).

The `/api/audio/transcriptions` endpoint currently rejects, with a
400 Bad Request error, any file whose Content-Type field contains
parameters.

This commit changes the current check in order to accept any
Content-Type field that begins with a supported type/subtype as listed
in the `supported_filetypes` tuple.
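
As a minimal sketch of the new check in isolation (not the endpoint code
itself; the helper name `is_supported` is hypothetical), `str.startswith`
accepts a tuple of prefixes, so a value that carries parameters still
matches:

```python
# Tuple of accepted type/subtype prefixes, as in this commit.
supported_filetypes = ("audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a")


def is_supported(content_type: str) -> bool:
    # Hypothetical helper: str.startswith accepts a tuple and returns True
    # if the string starts with any of its elements.
    return content_type.startswith(supported_filetypes)


print(is_supported("audio/ogg"))               # True
print(is_supported("audio/ogg; codecs=opus"))  # True
print(is_supported("application/pdf"))         # False
```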

Since Content-Type here is provided by the user, I believe this check
is meant to prevent honest mistakes, like posting a PDF to an audio
processing endpoint, not as a security measure against possibly
malicious use. Therefore, I think it's OK not to validate the rest of
the field.
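
For comparison only, and not part of this commit: if stricter matching
were ever wanted, one possible approach would be to compare just the
type/subtype and ignore the parameters. The helper names below are
hypothetical, and this is a sketch rather than a full RFC-compliant
parser:

```python
supported_filetypes = ("audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a")


def base_media_type(content_type: str) -> str:
    # Hypothetical helper: drop everything after the first ";" and
    # normalize whitespace and case, leaving only type/subtype.
    return content_type.split(";", 1)[0].strip().lower()


def is_supported_strict(content_type: str) -> bool:
    # Exact match on type/subtype instead of a prefix match.
    return base_media_type(content_type) in supported_filetypes


print(is_supported_strict("audio/ogg; codecs=opus"))  # True
print(is_supported_strict("audio/oggx"))              # False
```

The prefix check in this commit would accept the second value, which is
consistent with treating the check as a guard against honest mistakes.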
Hermógenes Oliveira 2025-03-08 17:29:59 -03:00
parent 3b70cd64d7
commit e936d7b53d

@@ -625,7 +625,9 @@ def transcription(
 ):
     log.info(f"file.content_type: {file.content_type}")
-    if file.content_type not in ["audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a"]:
+    supported_filetypes = ("audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a")
+    if not file.content_type.startswith(supported_filetypes):
         raise HTTPException(
             status_code=status.HTTP_400_BAD_REQUEST,
             detail=ERROR_MESSAGES.FILE_NOT_SUPPORTED,