S3
The Carbon Connect enabledIntegrations
value for AWS S3 is S3
.
Authorization Type
Carbon uses Access Key based authentication to connect to S3.
Functionality
Carbon allows users to upload text, image, video. and audio file formats directly from Spaces Object Storage.
Synchronization
Step 1: Generate IAM User and Access Key
Create an IAM user in AWS with the following permissions:
- List all buckets: Grant permissions to list all S3 buckets.
- Read access to specific buckets and objects: Allow read access to the buckets and objects intended for synchronization in Carbon. Ensure that new buckets added in the future grant access to this IAM user for object retrieval.
Step 2: Provide Credentials to Carbon
Via API: POST
the generated credentials to the /integrations/s3
endpoint in this format:
{
"access_key": "<<your-access-key>>",
"access_key_secret": "<<access-key-secret>>"
}
Via Carbon Connect: Optionally you can input the credentials directly via the Carbon Connect component.
Step 3: List Files and Buckets
Optionally load all available S3 buckets and objects by using POST /integrations/items/sync
and list them via POST /integrations/items/list
.
Step 4: Sync Files or Buckets
Use the /integrations/s3/files
endpoint to sync specific file objects or buckets.
For syncing specific files (multiple files allowed):
{
"ids": [{"bucket": "bucket-name", "id": "object-key"}],
"tags": "...",
"chunk_size": "...",
"chunk_overlap": "...",
"skip_embedding_generation": "..."
}
For syncing all files from buckets (multiple buckets allowed):
{
"ids": [{"bucket": "bucket-name"}],
"tags": "...",
"chunk_size": "...",
"chunk_overlap": "...",
"skip_embedding_generation": "..."
}
You can also use the resync_file
API endpoint to programmatically resync specific S3 files. To delete S3 files from Carbon, you can use the delete_file
endpoint directly.
To sync S3 files on a 24-hour schedule (more frequent schedules available upon request), you can use the /update_users
endpoint. This endpoint allows organizations to customize syncing settings according to their requirements, with the option to enable syncing for all data sources using the string ‘ALL’. It’s important to note that each request supports up to 100 customer IDs.
Here’s an example illustrating how to automatically enable syncing for updated S3 content for specified users:
{
"customer_ids": ["team@carbon.ai", "sam@openai.com"],
"auto_sync_enabled_sources": ["S3"]
}