POST /integrations/s3/files
curl --request POST \
  --url https://api.carbon.ai/integrations/s3/files \
  --header 'Content-Type: application/json' \
  --header 'authorization: <api-key>' \
  --data '{
  "ids": [
    {
      "id": "<string>",
      "bucket": "<string>",
      "prefix": "<string>"
    }
  ],
  "tags": {},
  "chunk_size": 123,
  "chunk_overlap": 123,
  "skip_embedding_generation": true,
  "embedding_model": "OPENAI",
  "generate_sparse_vectors": true,
  "prepend_filename_to_chunks": true,
  "max_items_per_chunk": 1,
  "set_page_as_boundary": false,
  "data_source_id": 123,
  "request_id": "<string>",
  "use_ocr": true,
  "parse_pdf_tables_with_ocr": true,
  "file_sync_config": {
    "auto_synced_source_types": [
      "ARTICLE"
    ],
    "sync_attachments": false,
    "detect_audio_language": false,
    "transcription_service": "assemblyai",
    "include_speaker_labels": false,
    "split_rows": false,
    "generate_chunks_only": false,
    "store_file_only": false,
    "skip_file_processing": false
  }
}'
{
  "success": true
}

Authorizations

authorization
string
header
required

Pass the value as token <token>, where <token> is a temporary access token.
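
As a minimal sketch, the header described above could be assembled in Python as follows. The helper name build_headers is hypothetical, and the "token " prefix is taken from the description above. The curl sample passes an API key directly, so which form applies to a given credential is an assumption to verify.

```python
def build_headers(access_token):
    """Return request headers using the `token <token>` scheme described above.

    Assumption: the credential is a temporary access token. The header name
    and Content-Type match the curl sample for this endpoint.
    """
    if not access_token:
        raise ValueError("access_token must be a non-empty string")
    return {
        "Content-Type": "application/json",
        "authorization": f"token {access_token}",
    }
```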

Body

application/json
ids
object[]
required

Each entry must be one of the following: a bucket name, a bucket name and a prefix, or a bucket name and an object key. A prefix is the common path shared by all objects you want to sync. Paths should end with a forward slash.
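
The three accepted shapes can be illustrated with a small Python helper. This is a sketch under stated assumptions: the helper name s3_item, the bucket name, and the paths are hypothetical, and placing the object key in the id field is a guess based on the field names in the request sample (id, bucket, prefix), not something the reference states.

```python
def s3_item(bucket, prefix=None, object_key=None):
    """Build one entry for the `ids` array.

    Assumption: an object key is sent in the `id` field. The request sample
    only shows `id`, `bucket`, and `prefix` as field names.
    """
    if not bucket:
        raise ValueError("bucket is required")
    if prefix is not None and object_key is not None:
        raise ValueError("pass a prefix or an object key, not both")

    item = {"bucket": bucket}
    if prefix is not None:
        # The reference says paths should end with a forward slash.
        if not prefix.endswith("/"):
            prefix += "/"
        item["prefix"] = prefix
    if object_key is not None:
        item["id"] = object_key
    return item


ids = [
    s3_item("my-bucket"),                                         # whole bucket
    s3_item("my-bucket", prefix="reports/2024"),                  # bucket + prefix
    s3_item("my-bucket", object_key="reports/2024/summary.pdf"),  # single object
]
```

The helper appends the trailing slash to a prefix rather than rejecting it, which keeps callers from having to remember the rule.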

tags
object | null
chunk_size
integer | null
default: 1500
chunk_overlap
integer | null
default: 20
skip_embedding_generation
boolean | null
default: false
embedding_model
enum<string>
Available options: OPENAI, AZURE_OPENAI, AZURE_ADA_LARGE_256, AZURE_ADA_LARGE_1024, AZURE_ADA_LARGE_3072, AZURE_ADA_SMALL_512, AZURE_ADA_SMALL_1536, COHERE_MULTILINGUAL_V3, VERTEX_MULTIMODAL, OPENAI_ADA_LARGE_256, OPENAI_ADA_LARGE_1024, OPENAI_ADA_LARGE_3072, OPENAI_ADA_SMALL_512, OPENAI_ADA_SMALL_1536, SOLAR_1_MINI
generate_sparse_vectors
boolean | null
default: false
prepend_filename_to_chunks
boolean | null
default: false
max_items_per_chunk
integer | null

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

Required range: x > 0
set_page_as_boundary
boolean
default: false
data_source_id
integer | null
request_id
string | null
use_ocr
boolean | null
default: false
parse_pdf_tables_with_ocr
boolean | null
default: false
file_sync_config
object | null
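
The body fields above can be checked on the client before a request is sent. The following Python sketch uses only the standard library. The function names build_body and build_request are hypothetical, the checks cover only constraints stated in this reference (ids is required, embedding_model must be a listed option, max_items_per_chunk must be greater than 0), and the server may enforce more.

```python
import json
import urllib.request

ENDPOINT = "https://api.carbon.ai/integrations/s3/files"

# Options listed for embedding_model in this reference.
EMBEDDING_MODELS = {
    "OPENAI", "AZURE_OPENAI", "AZURE_ADA_LARGE_256", "AZURE_ADA_LARGE_1024",
    "AZURE_ADA_LARGE_3072", "AZURE_ADA_SMALL_512", "AZURE_ADA_SMALL_1536",
    "COHERE_MULTILINGUAL_V3", "VERTEX_MULTIMODAL", "OPENAI_ADA_LARGE_256",
    "OPENAI_ADA_LARGE_1024", "OPENAI_ADA_LARGE_3072", "OPENAI_ADA_SMALL_512",
    "OPENAI_ADA_SMALL_1536", "SOLAR_1_MINI",
}


def build_body(ids, **options):
    """Assemble the JSON body, checking only the constraints documented here."""
    if not ids:
        raise ValueError("ids is required and must not be empty")
    model = options.get("embedding_model")
    if model is not None and model not in EMBEDDING_MODELS:
        raise ValueError(f"unknown embedding_model: {model}")
    max_items = options.get("max_items_per_chunk")
    if max_items is not None and max_items <= 0:
        raise ValueError("max_items_per_chunk must be greater than 0")
    # Omitted options fall back to the server-side defaults listed above.
    return {"ids": ids, **options}


def build_request(body, authorization):
    """Wrap the body in a POST request object without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "authorization": authorization,
        },
        method="POST",
    )


body = build_body(
    [{"bucket": "my-bucket", "prefix": "reports/"}],  # placeholder values
    chunk_size=1500,
    chunk_overlap=20,
    embedding_model="OPENAI",
)
request = build_request(body, "<api-key>")
# To send it: urllib.request.urlopen(request)
```

Leaving optional fields out of the body, rather than sending null, relies on the documented defaults and keeps the payload short.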

Response

200
application/json
Successful Response
success
boolean
required
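
A caller will usually want to confirm the success flag rather than rely on the status code alone. A minimal Python sketch follows; the function name sync_succeeded is hypothetical, and the only response field assumed is success, as documented above.

```python
import json


def sync_succeeded(response_text):
    """Return True only when the response body contains "success": true."""
    payload = json.loads(response_text)
    return payload.get("success") is True
```

Applied to the sample response shown earlier, sync_succeeded('{"success": true}') returns True.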