The WebScraper module provides a lightweight web scraping service that allows scripts to reliably load web pages and retrieve their HTML content or extract structured data from them.
The module loads web pages in a controlled environment and supports multiple waiting strategies (such as DOM completion, network idle, or waiting for specific elements). These strategies help ensure stability when scraping modern websites that rely on dynamic rendering.
Typical use cases include:
- Retrieving the final rendered HTML of a page
- Extracting structured data by running JavaScript in the page context
- Waiting for dynamic content to load before scraping
- Writing stable automation scripts for web data extraction
All operations are executed as asynchronous tasks. Each task has a unique taskId, which can be used to track or cancel the task.
Type Definitions PRO
WaitOptions
Defines the strategy used to determine when a page has finished loading.
Available Modes
"domComplete"
Waits until document.readyState === "complete".
Suitable for:
- Static websites
- Simple pages
- Websites that do not rely heavily on asynchronous loading
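The exact TypeScript shape of WaitOptions is not shown here. The sketch below assumes a union discriminated by a hypothetical mode field, with hypothetical idleMs and selector fields for the other two modes. It also expresses the domComplete condition as a simple predicate on document.readyState.

```typescript
// Hypothetical shape of WaitOptions: a union discriminated by `mode`.
// The field names mode, idleMs and selector are assumptions.
type WaitOptions =
  | { mode: "domComplete" }
  | { mode: "networkIdle"; idleMs?: number }
  | { mode: "selector"; selector: string };

// A static page only needs the domComplete strategy.
const staticPageWait: WaitOptions = { mode: "domComplete" };

// The domComplete condition: the page counts as loaded once
// document.readyState has reached "complete".
function isDomComplete(readyState: "loading" | "interactive" | "complete"): boolean {
  return readyState === "complete";
}
```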
"networkIdle"
Waits until the page's network activity becomes idle.
The page is considered fully loaded once no new network requests have been issued for a continuous period of time (the idle duration).
Suitable for:
- SPA applications
- Pages that fetch data via APIs
- Websites with dynamic content loading
You may also specify the idle duration.
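The networkIdle condition can be modeled as a search for the first quiet window. This simplified sketch is an assumption, not the module's algorithm: it only looks at the times at which requests are issued (ignoring how long each request stays in flight), and its hypothetical idleMs argument plays the role of the idle duration.

```typescript
// Simplified model of the networkIdle strategy (an assumption, not the
// module's actual algorithm). Given the times, in ms since navigation start,
// at which requests were issued, return the moment a quiet window of
// `idleMs` is first observed.
function networkIdleAt(requestTimes: number[], idleMs: number): number {
  const sorted = [...requestTimes].sort((a, b) => a - b);
  let lastActivity = 0; // navigation start counts as activity
  for (const t of sorted) {
    if (t - lastActivity >= idleMs) {
      // A full idle window elapsed before this request was issued.
      return lastActivity + idleMs;
    }
    lastActivity = t;
  }
  // No gap was long enough: idle begins after the last request.
  return lastActivity + idleMs;
}
```

With requests at 100, 200 and 900 ms and a 500 ms idle duration, the quiet period between 200 and 700 ms satisfies the condition, so the function returns 700.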
"selector"
Waits until a specific DOM element appears.
When an element matching the selector exists in the document, the task continues.
Suitable for:
- Dynamically rendered content
- Waiting for specific UI components
- Precisely controlling when scraping begins
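A selector wait amounts to checking for the element repeatedly until it exists or time runs out. The generic polling helper below is a hypothetical illustration of that idea, not the module's implementation. Inside a page, query would be something like () => document.querySelector(".price"), where ".price" is a placeholder selector.

```typescript
// Hypothetical helper: polls `query` until it returns a non-null value,
// mimicking what a "selector" wait does inside the page.
async function waitForElement<T>(
  query: () => T | null,
  timeoutMs: number,
  intervalMs = 50,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = query();
    if (found !== null) return found;
    if (Date.now() >= deadline) {
      throw new Error(`Element not found within ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```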
Error PRO
Represents error information when a scraping task fails.
Properties
code
An error code identifying the type of failure.
message
A human-readable error message describing the failure.
Timing PRO
Represents execution timing information for a task.
Properties
totalMs
The total execution time of the task in milliseconds.
Result PRO
All WebScraper APIs return results using the Result type.
Properties
ok
Indicates whether the task completed successfully.
taskId
The unique identifier of the task.
If no taskId is specified when the task is started, the system generates one automatically.
Pass it to cancel to stop the task while it is running.
url
The final loaded URL.
If redirects occurred during navigation, this field contains the final destination URL.
html
The final HTML of the page.
This is typically the HTML serialized from the rendered DOM, not the raw HTML response returned by the server.
data
The value returned by extractScript (in scrape) or by the script passed to eval.
Its type is given by the generic parameter T.
error
Present when the task fails.
Contains an Error object describing the failure (see Error above).
timing
Timing information about the task execution.
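Putting the Error, Timing and Result descriptions together, one possible TypeScript rendering is sketched below. The property names come from the text above; which fields are optional, the string type of code, and the name ScraperError (chosen to avoid shadowing the built-in Error) are assumptions. The unwrap helper is a hypothetical convenience that turns a failed Result into a thrown exception.

```typescript
// Hypothetical TypeScript shapes for the documented types.
// Property names follow the text; optionality is an assumption.
interface ScraperError {
  code: string; // assumed to be a string
  message: string;
}

interface Timing {
  totalMs: number;
}

interface Result<T> {
  ok: boolean;
  taskId: string;
  url?: string;
  html?: string;
  data?: T;
  error?: ScraperError;
  timing?: Timing;
}

// Returns `data` on success (which may be undefined when no script ran)
// and throws an Error built from the error code and message otherwise.
function unwrap<T>(result: Result<T>): T {
  if (!result.ok) {
    const err = result.error;
    throw new Error(err ? `[${err.code}] ${err.message}` : `Task ${result.taskId} failed`);
  }
  return result.data as T;
}
```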
API PRO
load
Loads a web page and returns the final HTML.
Parameters
url
The URL of the page to load.
wait
The waiting strategy used to determine when the page is ready.
Default:
timeout
The maximum allowed time for the task in seconds.
If the task does not finish within the specified time, it fails with a timeout error.
taskId
An optional task identifier.
If not provided, the system automatically generates one.
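The sketch below collects the load parameters into a single options object and validates them before a task is started. Whether load actually accepts one object, and the WaitOptions field names, are assumptions. Note that timeout is expressed in seconds.

```typescript
// Hypothetical WaitOptions shape (field names are assumptions).
type WaitOptions =
  | { mode: "domComplete" }
  | { mode: "networkIdle"; idleMs?: number }
  | { mode: "selector"; selector: string };

// Hypothetical options object mirroring the documented load parameters.
interface LoadOptions {
  url: string;
  wait?: WaitOptions;
  timeout?: number; // seconds, not milliseconds
  taskId?: string;
}

// Validates the parameters before the task is started.
function buildLoadOptions(options: LoadOptions): LoadOptions {
  new URL(options.url); // throws a TypeError on a malformed URL
  if (options.timeout !== undefined && !(options.timeout > 0)) {
    throw new RangeError("timeout must be a positive number of seconds");
  }
  return options;
}

const newsPage = buildLoadOptions({
  url: "https://example.com/news",
  wait: { mode: "networkIdle" },
  timeout: 30,
  taskId: "news-home",
});
```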
scrape PRO
Loads a page and optionally executes an extraction script in the page context.
This method performs the following steps:
- Load the page
- Wait until the specified condition is satisfied
- Execute extractScript in the page context
- Return both the HTML and the extracted data
Parameters
extractScript
JavaScript code executed in the page context.
The script should return the data to extract.
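As an illustration of extraction logic, the function below collects link titles and URLs. It is written against a minimal root object so it can be exercised outside a browser; in the page, root would be document. Whether extractScript expects source text or a function is not stated above, so how this body is passed is left to the actual API. The Link shape and the selector are hypothetical.

```typescript
// Minimal structural types so the sketch compiles without the DOM library.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// Hypothetical shape of the extracted data (this would be T in Result<T>).
interface Link {
  title: string;
  href: string | null;
}

// Extraction logic in the spirit of extractScript; inside the page,
// `root` would be `document`.
function extractLinks(root: QueryRoot, selector: string): Link[] {
  return Array.from(root.querySelectorAll(selector)).map((el) => ({
    title: (el.textContent ?? "").trim(),
    href: el.getAttribute("href"),
  }));
}
```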
eval PRO
Evaluates JavaScript in the page context and returns the result.
This method is intended for executing custom JavaScript logic inside the page.
Compared to scrape:
- eval focuses on executing arbitrary JavaScript
- scrape is designed specifically for extracting data
Parameters
script
JavaScript code executed in the page context.
The script must return a value.
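Because the script passed to eval must return a value, it can help to check a script locally before sending it. This sketch assumes the script is treated as a function body that produces its result with return; that is an assumption about eval, so confirm it against the module's actual behavior. Scripts that rely on page APIs such as document cannot be checked this way.

```typescript
// Local check for a script intended for eval. Assumes (hypothetically)
// that the script is a function body that produces its result with `return`.
function runScriptLocally(source: string): unknown {
  const value: unknown = new Function(source)();
  if (value === undefined) {
    throw new Error("Script did not return a value");
  }
  return value;
}
```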
cancel PRO
Cancels a running scraping task.
Parameters
taskId
The identifier of the task to cancel.
Return Value
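Cancellation by taskId comes down to keeping a handle for each running task. The hypothetical registry below uses AbortController for that bookkeeping and generates an id when none is supplied, mirroring the behavior described for taskId. Returning a boolean from cancel is this sketch's own choice; it does not describe the module's return value.

```typescript
// Hypothetical task registry illustrating cancellation by taskId.
// Not the module's implementation; it only shows the bookkeeping idea.
class TaskRegistry {
  private controllers = new Map<string, AbortController>();
  private nextId = 1;

  // Registers a task, generating an id when none is supplied.
  start(taskId?: string): { taskId: string; signal: AbortSignal } {
    const id = taskId ?? `task-${this.nextId++}`;
    const controller = new AbortController();
    this.controllers.set(id, controller);
    return { taskId: id, signal: controller.signal };
  }

  // Aborts the task and reports whether it was still registered.
  cancel(taskId: string): boolean {
    const controller = this.controllers.get(taskId);
    if (!controller) return false;
    controller.abort();
    this.controllers.delete(taskId);
    return true;
  }
}
```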
Usage Recommendations PRO
Choose an Appropriate Waiting Strategy
Different types of websites benefit from different strategies.
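The "Suitable for" lists under WaitOptions can be condensed into a small lookup. The site categories and the WaitOptions field names in this sketch are hypothetical.

```typescript
// Hypothetical WaitOptions shape (field names are assumptions).
type WaitOptions =
  | { mode: "domComplete" }
  | { mode: "networkIdle"; idleMs?: number }
  | { mode: "selector"; selector: string };

// Hypothetical site categories matching the "Suitable for" guidance.
type SiteKind = "static" | "spa" | "dynamicElement";

function chooseWait(kind: SiteKind, selector?: string): WaitOptions {
  // Static or simple pages: the document finishing is enough.
  if (kind === "static") return { mode: "domComplete" };
  // SPAs and API-driven pages: wait for network activity to settle.
  if (kind === "spa") return { mode: "networkIdle" };
  // Dynamically rendered content: wait for the element that holds the data.
  if (!selector) throw new Error("a selector is required for dynamic elements");
  return { mode: "selector", selector };
}
```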
Prefer selector Waiting for Dynamic Content
For dynamic pages, waiting for a specific element improves reliability, because scraping starts only once the content you need actually exists in the DOM.
Adjust Timeout for Complex Pages
Complex sites may require longer timeouts.
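One way to choose a longer timeout is to base it on timing.totalMs from earlier runs. The helper below is a hypothetical heuristic: the headroom factor of 1.5 and the 10-second floor are arbitrary defaults, and the result is converted to seconds because that is the unit timeout uses.

```typescript
// Hypothetical heuristic: suggests a timeout in seconds (the unit `timeout`
// uses) from previously observed timing.totalMs values, with a margin.
function suggestTimeoutSeconds(
  observedTotalMs: number[],
  headroom = 1.5,
  minSeconds = 10,
): number {
  if (observedTotalMs.length === 0) return minSeconds;
  const slowest = Math.max(...observedTotalMs);
  // Scale the slowest run, convert ms to s, and round up.
  return Math.max(minSeconds, Math.ceil((slowest * headroom) / 1000));
}
```

For observed runs of 8, 12 and 10 seconds, the suggestion is 18 seconds.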
Prefer extractScript Instead of Parsing HTML
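Instead of retrieving the HTML with load and parsing it manually in your script, it is often more efficient to extract the data directly in the page with scrape and an extractScript.
This approach reduces script complexity and improves performance, because the script queries the page's DOM directly instead of parsing the HTML a second time.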
