The maximum file size is 20 MB by default and can be increased upon request. Note that third-party connectors may have their own file size limits, which you can check here.

We use next-gen speech-to-text models from AssemblyAI and Deepgram to transcribe audio to text. The transcript is then embedded with the embedding model of your choice.

The audio file is stored as the raw file (raw_file) and the text transcript is stored as the parsed file (parsed_text_file).

Carbon supports transcriptions for the following audio and video file formats:

File Formats
MP3
MP2
AAC
WAV
FLAC
PCM
M4A
OGG
OPUS
MPEG
MPG
MP4
WMV
AVI
MOV
MKV
FLV
WEBM
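
If you want to catch unsupported files before uploading, the format list and size limit above can be checked client-side. The following is a minimal sketch, not part of Carbon's SDK: the function name check_media_file and the constants are hypothetical, and it assumes "20 MB" means 20 * 1024 * 1024 bytes and that the format is identified by the file extension alone.

```python
from pathlib import Path
from typing import List

# Extensions copied from the supported-format list above, lowercased.
SUPPORTED_EXTENSIONS = {
    "mp3", "mp2", "aac", "wav", "flac", "pcm", "m4a", "ogg", "opus",
    "mpeg", "mpg", "mp4", "wmv", "avi", "mov", "mkv", "flv", "webm",
}

# Assumption: the documented 20 MB default is treated as 20 * 1024 * 1024 bytes.
DEFAULT_MAX_BYTES = 20 * 1024 * 1024


def check_media_file(filename: str, size_bytes: int,
                     max_bytes: int = DEFAULT_MAX_BYTES) -> List[str]:
    """Return a list of problems found; an empty list means the file passes.

    Hypothetical helper for illustration only. It looks at the extension
    and the byte size, not at the actual contents of the file.
    """
    problems = []

    # Compare the extension case-insensitively and without the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes; limit is {max_bytes}")

    return problems
```

Pass a larger max_bytes if your limit has been raised, or a smaller one if a third-party connector enforces a stricter limit. A file that passes this check can still be rejected by the service, since the check does not inspect the audio or video content itself.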