Carbon supports all text, audio, and image file formats via S3.

Authorization Type

Carbon uses Access Key based authentication to connect to S3.

Functionality

Carbon allows users to sync text, audio, and image files directly from S3.

Synchronization

Step 1: Generate IAM User and Access Key

Create an IAM user in AWS with the following permissions:

  • List all buckets: Grant permissions to list all S3 buckets.
  • Read access to specific buckets and objects: Allow read access to the buckets and objects you intend to sync into Carbon. If you add buckets later, grant this IAM user read access to them as well so that Carbon can retrieve their objects.
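The permissions above can be expressed as an IAM policy document. The following is a minimal sketch, not an official Carbon policy: the helper name build_read_only_policy is hypothetical, and the choice of the s3:ListAllMyBuckets, s3:ListBucket, and s3:GetObject actions is an assumption about what "list all buckets" and "read access" require.

```python
import json


def build_read_only_policy(bucket_names):
    """Return an IAM policy document (as a dict) that lists all buckets
    and grants read access to the given buckets and their objects.

    The action names are standard AWS S3 actions; which ones Carbon
    actually needs is an assumption.
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"arn:aws:s3:::{name}/*" for name in bucket_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # List all buckets in the account.
            {"Effect": "Allow", "Action": ["s3:ListAllMyBuckets"], "Resource": "*"},
            # List the objects inside the selected buckets.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arns},
            # Read the objects themselves.
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": object_arns},
        ],
    }


def policy_as_json(bucket_names):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(build_read_only_policy(bucket_names), indent=2)
```

When a new bucket is added later, regenerate the policy with the extended bucket list and attach the updated version to the same IAM user.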

Step 2: Provide Credentials to Carbon

POST the generated credentials to the /integrations/s3 endpoint in this format:

{
    "access_key": "<<your-access-key>>",
    "access_key_secret": "<<access-key-secret>>"
}
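As an illustration, the request could be assembled with Python's standard library as sketched below. The base URL and the authentication headers are not specified in this section, so both are passed in as parameters and should be treated as assumptions; the function name build_s3_credentials_request is hypothetical.

```python
import json
import urllib.request


def build_s3_credentials_request(base_url, access_key, access_key_secret, auth_headers):
    """Build (but do not send) the POST request for /integrations/s3.

    base_url and auth_headers are assumptions: supply the API host and
    whatever authentication headers your Carbon account requires.
    """
    body = json.dumps(
        {"access_key": access_key, "access_key_secret": access_key_secret}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/integrations/s3",
        data=body,
        headers=headers,
        method="POST",
    )


# To send the request:
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL and body before any credentials leave your machine.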

Step 3: List Files and Buckets

Optionally load all available S3 buckets and objects by using POST /integrations/items/sync and list them via POST /integrations/items/list.

Step 4: Sync Files or Buckets

Use the /integrations/s3/files endpoint to sync specific file objects or buckets.

For syncing specific files (multiple files allowed):

{
    "ids": [{"bucket": "bucket-name", "id": "object-key"}],
    "tags": "...",
    "chunk_size": "...",
    "chunk_overlap": "...",
    "skip_embedding_generation": "..."
}

For syncing all files from buckets (multiple buckets allowed):

{
    "ids": [{"bucket": "bucket-name"}],
    "tags": "...",
    "chunk_size": "...",
    "chunk_overlap": "...",
    "skip_embedding_generation": "..."
}
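The two request shapes differ only in whether each entry in "ids" carries an object key. A small sketch of helpers that build either body follows; the function names are hypothetical, and the optional fields (tags, chunk_size, chunk_overlap, skip_embedding_generation) are passed through untouched because their value types are not specified here.

```python
def file_sync_payload(files, **options):
    """Build the body for syncing specific files.

    files: iterable of (bucket, object_key) pairs.
    options: optional fields such as tags or chunk_size, passed through as-is.
    """
    ids = [{"bucket": bucket, "id": key} for bucket, key in files]
    return {"ids": ids, **options}


def bucket_sync_payload(buckets, **options):
    """Build the body for syncing every file in the given buckets.

    buckets: iterable of bucket names.
    options: optional fields, passed through as-is.
    """
    ids = [{"bucket": bucket} for bucket in buckets]
    return {"ids": ids, **options}
```

For example, file_sync_payload([("bucket-name", "object-key")]) returns {"ids": [{"bucket": "bucket-name", "id": "object-key"}]}, which matches the first request body above.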

You can also use the resync_file API endpoint to programmatically resync specific S3 files. To delete S3 files from Carbon, you can use the delete_file endpoint directly.

To sync S3 files on a 24-hour schedule (more frequent schedules are available upon request), use the /update_users endpoint. This endpoint lets organizations customize sync settings to their requirements, and the string ‘ALL’ enables syncing for all data sources. Each request supports up to 100 customer IDs.

Here’s an example illustrating how to automatically enable syncing for updated S3 content for specified users:

{
    "customer_ids": ["team@carbon.ai", "sam@openai.com"],
    "auto_sync_enabled_sources": ["S3"]
}
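Because each /update_users request accepts at most 100 customer IDs, larger user lists have to be split across several requests. A minimal sketch of that batching, with the hypothetical helper name update_users_payloads, might look like this; the sources value is passed through unchanged, so the caller decides whether to send a list of sources or the string ‘ALL’.

```python
def update_users_payloads(customer_ids, sources, batch_size=100):
    """Split customer IDs into request bodies of at most batch_size IDs each.

    sources is copied into every body as "auto_sync_enabled_sources"
    without modification.
    """
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    ids = list(customer_ids)
    return [
        {
            "customer_ids": ids[start:start + batch_size],
            "auto_sync_enabled_sources": sources,
        }
        for start in range(0, len(ids), batch_size)
    ]
```

Each returned dictionary can then be sent as the body of one /update_users request; 250 customer IDs, for instance, produce three bodies of 100, 100, and 50 IDs.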