Web Scrape

POST /web_scrape

curl --request POST \
  --url https://api.carbon.ai/web_scrape \
  --header 'Content-Type: application/json' \
  --header 'authorization: <api-key>' \
  --data '[
  {
    "url": "<string>",
    "tags": {},
    "recursion_depth": 1,
    "max_pages_to_scrape": 2,
    "chunk_size": 123,
    "chunk_overlap": 123,
    "skip_embedding_generation": true,
    "enable_auto_sync": true,
    "generate_sparse_vectors": true,
    "prepend_filename_to_chunks": true,
    "html_tags_to_skip": [
      "<string>"
    ],
    "css_classes_to_skip": [
      "<string>"
    ],
    "css_selectors_to_skip": [
      "<string>"
    ],
    "embedding_model": "OPENAI",
    "url_paths_to_include": [
      "<string>"
    ],
    "download_css_and_media": true,
    "generate_chunks_only": false,
    "store_file_only": false,
    "use_premium_proxies": false
  }
]'

"<any>"

Authorizations

authorization · string · header · required

Formatted as token <token>, where <token> is a temporary access token.
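As an illustration of the header format described above, here is a minimal Python sketch that builds the request headers. The helper name build_headers is hypothetical, and the sketch assumes the "token <token>" form from the description rather than the bare <api-key> placeholder shown in the curl sample.

```python
def build_headers(access_token: str) -> dict[str, str]:
    """Build headers for the web_scrape request.

    Assumption: the authorization value uses the "token <token>" form
    given in the Authorizations description.
    """
    token = access_token.strip()
    if not token:
        raise ValueError("access token must not be empty")
    return {
        "Content-Type": "application/json",
        "authorization": f"token {token}",
    }
```

Rejecting an empty token up front is a design choice of this sketch, so that a missing credential fails locally instead of at the API.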

Body

application/json · object[]

url · string · required
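Because the body is an array of objects, several URLs can presumably be submitted in one request. The sketch below builds such a body in Python. The helper name build_scrape_body is hypothetical; the optional field names (recursion_depth, max_pages_to_scrape) are taken from the sample request above, and their exact semantics are not documented in this section.

```python
import json
from typing import Any


def build_scrape_body(urls: list[str], **options: Any) -> list[dict[str, Any]]:
    """Return one request object per URL, each sharing the same options.

    Only "url" is documented as required; every option passed here is an
    assumption based on the field names in the sample request.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    return [{"url": url, **options} for url in urls]


body = build_scrape_body(
    ["https://example.com/docs"],  # placeholder URL for illustration
    recursion_depth=1,
    max_pages_to_scrape=2,
)
payload = json.dumps(body)  # JSON array, matching the object[] body type
```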

Response

200

application/json

Successful Response

The response is of type any.
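Since the response type is unspecified, a client may want to decode it defensively. The following Python sketch, using only the standard library, posts a body to the endpoint and returns whatever JSON value comes back, falling back to raw text if the payload is not valid JSON. The function names submit and decode_any are hypothetical, and the fallback behavior is an assumption of this sketch, not documented API behavior.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.carbon.ai/web_scrape"  # URL from the sample request


def decode_any(raw: bytes) -> Any:
    """Decode a response body whose JSON shape is not specified."""
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumption: fall back to plain text when the body is not JSON.
        return text


def submit(body: list[dict[str, Any]], headers: dict[str, str]) -> Any:
    """POST the scrape request and return the decoded response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return decode_any(response.read())
```

Callers should inspect the returned value's type before using it, since it may be a string, list, dict, or other JSON value.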
