POST /web_scrape

Authorizations

authorization
string
header
required

Format: token <token>, where <token> is a temporary access token.

Body

application/json · object[]
url
string
required
tags
object | null
recursion_depth
integer | null
default: 3
max_pages_to_scrape
integer | null
default: 100
chunk_size
integer | null
default: 1500
chunk_overlap
integer | null
default: 20
skip_embedding_generation
boolean | null
default: false
enable_auto_sync
boolean | null
default: false
generate_sparse_vectors
boolean | null
default: false
prepend_filename_to_chunks
boolean | null
default: false
html_tags_to_skip
string[] | null
css_classes_to_skip
string[] | null
css_selectors_to_skip
string[] | null
embedding_model
enum<string>
default: OPENAI
Available options:
OPENAI,
AZURE_OPENAI,
AZURE_ADA_LARGE_256,
AZURE_ADA_LARGE_1024,
AZURE_ADA_LARGE_3072,
AZURE_ADA_SMALL_512,
AZURE_ADA_SMALL_1536,
COHERE_MULTILINGUAL_V3,
VERTEX_MULTIMODAL,
OPENAI_ADA_LARGE_256,
OPENAI_ADA_LARGE_1024,
OPENAI_ADA_LARGE_3072,
OPENAI_ADA_SMALL_512,
OPENAI_ADA_SMALL_1536,
SOLAR_1_MINI
url_paths_to_include
string[] | null

URL subpaths or directories to include. For example, to include only stackoverflow.com URLs that start with /questions, add /questions/ to this list.
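
As a rough illustration of the filtering described for url_paths_to_include, the sketch below keeps only URLs whose path begins with one of the listed subpaths. It assumes simple prefix matching and assumes a null or empty list disables filtering; the service's actual matching rules are not documented here, and path_is_included is a hypothetical helper, not part of the API.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse


def path_is_included(url: str, url_paths_to_include: Optional[Sequence[str]]) -> bool:
    """Return True if the URL's path starts with any listed subpath."""
    if not url_paths_to_include:
        # Assumption: null or an empty list means "no filtering".
        return True
    path = urlparse(url).path
    # Assumption: plain prefix matching on the path component.
    return any(path.startswith(prefix) for prefix in url_paths_to_include)


urls = [
    "https://stackoverflow.com/questions/123/example",
    "https://stackoverflow.com/users/456/someone",
]
included = [u for u in urls if path_is_included(u, ["/questions/"])]
```

Here included holds only the /questions/ URL; the /users/ URL is dropped.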

download_css_and_media
boolean | null
default: false

Whether the scraper should download CSS and media (images, fonts, etc.) from the page. Scrapes may take longer to finish with this flag enabled, but the success rate improves.

Response

200 - application/json

The response is of type any.
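
A minimal sketch of how a request to this endpoint might be assembled, using only the Python standard library. The base URL and token value are placeholders (the host is not given on this page), and build_web_scrape_request is a hypothetical helper. The body is a JSON array because the schema above is object[]; fields left out fall back to the defaults listed above.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: the real host is not stated here
ACCESS_TOKEN = "YOUR_TEMPORARY_ACCESS_TOKEN"  # placeholder token


def build_web_scrape_request(base_url, access_token, items):
    """Build (but do not send) a POST /web_scrape request."""
    body = json.dumps(items).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/web_scrape",
        data=body,
        method="POST",
        headers={
            # Header format from the Authorizations section: token <token>
            "authorization": f"token {access_token}",
            "Content-Type": "application/json",
        },
    )


# One object per site to scrape; only url is required.
items = [
    {
        "url": "https://stackoverflow.com",
        "recursion_depth": 3,
        "max_pages_to_scrape": 100,
        "url_paths_to_include": ["/questions/"],
        "embedding_model": "OPENAI",
    }
]

request = build_web_scrape_request(BASE_URL, ACCESS_TOKEN, items)

# To actually send it (requires a real host and a valid token):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Since the response is documented only as type any, the caller should not assume a particular shape for result.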