File Processing

We currently support the following webhook events for file processing:

EventDescription
FILES_CREATEDSent when a user queues up a file to be synced for the first time. The body of the webhook will contain a list of file_ids for files that were created in the same upload, and multiple events could fire for the same upload if a lot of files were queued. The body will contain the upload’s request_id and upload_id.
ALL_UPLOADED_FILES_QUEUEDSent when every single item in an upload has been queued for sync, including all children of folders in an upload. The body will contain the upload’s request_id and upload_id.
FILE_READYSent after a file has been fully processed and added to Carbon.
FILE_SYNCINGSent after a file transitions to the SYNCING sync status. The pre-signed URLs for both the plain text and original file will be available when this webhook is triggered.
FILE_ERRORSent if there was an error during the file processing stage. If an error occurred, the file is requeued for reprocessing up to three times.
FILE_DELETEDSent if a file has been successfully deleted.
RATE_LIMIT_ERRORSent if the organization has reached the rate limit for file processing. In this case, the file is requeued for reprocessing up to three times.
FILE_SYNC_LIMIT_REACHEDSent if a organization user attempts to upload files that would cause them to exceed the organization daily upload limit (2.5GB per day), max file limit, max file per upload limit.
SPARSE_VECTOR_GENERATIONSent when the status of sparse vector generation for a document changes. The webhook payload includes an embedded field in the additional_information object called sparse_vector_queue_status. This field can have one of three possible values: queued, aborted, failed.
FILES_SKIPPEDSent when file processing is skipped. The body of the webhook will contain a list of file_ids for files and a reason in additional_information.
ALL_FILES_PROCESSEDSent when all files in an upload have moved into the READY, SYNC_ERROR, READY_TO_SYNC, or RATE_LIMITED status. The body will contain the upload’s request_id and upload_id.

Data Source Connections

We currently support the following webhook events for managing user connections to data sources:

EventDescription
ADDThis event is fired when a user connects a new account, when an access token is revoked and the user reconnects, and when a user initiates the “connect a new account” flow but connects to an already connected account.
UPDATEThis event is fired when a user selects one or more files for a particular data source. This event is fired once per file selection confirmation, regardless of how many files are selected.
CANCELThis event is fired when a user cancels the authentication flow. It’s important to note that this event cannot capture tab-close events, so directly closing a tab is not within our tracking capabilities.
REVOKEThis event is fired when a user’s access token is revoked via the /revoke_access_token endpoint or when Carbon receives a 401 Unauthorized from the data source using an expired or revoked access token.

Organization

We currently support the following webhook events for organizations:

EventDescription
FILE_STATISTICS_AGGREGATEDThis event is fired when the organization’s request to refresh usage metrics is successfully processed.
CHECKUPWhen Carbon sends a request to a webhook URL, it tracks the number of successful requests per minute (RPM). A request is considered successful if it satisfies the following criteria: 1. Carbon is able to connect to the specified webhook URL. 2. The request doesn’t timeout (our current timeout is 8 seconds). If the number of unsuccessful RPMs exceeds a certain threshold, the webhook URL is flagged, and no further webhooks will be sent to that URL. The owner of the URL will also receive an email notification. Currently, the threshold for flagging a URL is set at 40 unsuccessful RPMs. To verify the health of a flagged URL, Carbon will periodically send health check events every 10 seconds. These events are of type CHECKUP and have an empty payload. If Carbon receives a successful response that meets the above criteria, the URL will be reactivated.

Websites

EventDescription
WEBSCRAPE_URLS_READYThis event is fired when a specific web page from a web scrape request is finished processing.
WEBPAGE_ERRORThis event is fired when a fetch_webpage request failed. The webhook payload includes the corresponding webpage ID.
WEBPAGE_READYThis event is fired when a fetch_webpage request succeeded. The webhook payload includes the corresponding webpage ID.

User

EventDescription
ORGANIZATION_USER_DELETEDThis event is fired after we’ve deleted all the user’s data (files, chunks and embeddings, data sources) and the user (customer-id) itself.

Cold Storage

EventDescription
MOVED_TO_COLD_STORAGEThis event is fired when a file is moved to cold storage.
MOVED_TO_HOT_STORAGEThis event is fired when a file is moved to hot storage.

Example Payload

Our webhook payload returns stringified JSON.

{
    "payload": "{\"webhook_type\": \"FILE_READY\", \"obj\": {\"object_type\": \"FILE\", \"object_id\": \"78687\", \"additional_information\": \"null\"}, \"customer_id\": \"FAKE_ID\", \"timestamp\": \"1700508942\"}"
}