WebScrapeAI Documentation

WebScrapeAI offers a streamlined, efficient approach to web scraping, allowing users to easily extract data from websites with minimal setup. By providing detailed inputs and selecting your preferred output format, you can tailor the scraping process to your specific needs. Our platform is designed to simplify and enhance your data scraping experience using advanced AI technology. The following documentation outlines the process and capabilities of all the plans.

Plans Overview:

1. WebScrapeAI

The WebScrapeAI plan is crafted for users who need to extract data from a single URL at a time. This plan is perfect for straightforward, efficient web scraping tasks, providing a user-friendly interface and output flexibility.

To initiate a scraping task, you will need to provide the following information:

  1. URL of the Website: The exact web address of the site you wish to scrape.

  2. Data Requirements: Specify the data you want to extract, listed in a comma-separated format. This could include elements like product names, prices, descriptions, etc.

  3. CSS Selectors (Optional): For enhanced accuracy, you can provide CSS selectors corresponding to the specific elements you're targeting. While optional, using CSS selectors is highly recommended for obtaining precise results.
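
For example, a request against a hypothetical online store might look like the following (the URL, field names, and selectors are illustrative, not a real target):

  URL of the Website: https://example.com/products
  Data Requirements: product name, price, description
  CSS Selectors (Optional): .product-title, .product-price, .product-description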

Once you have provided the necessary information, click the Submit button to activate our AI model. The AI will then proceed to scrape the specified data from the provided URL.

Output Formats

Upon completion of the scraping process, you can retrieve your data in one of the following formats, based on your preference and project requirements:

  1. CSV (Comma-Separated Values): Ideal for spreadsheet applications and data analysis tools. Can be directly exported into a file, facilitating easy download and storage.

  2. JSON (JavaScript Object Notation): Best for applications that require data interchange or further processing with scripting languages.

  3. Text: Simple, unformatted text output for a wide range of uses.

Below are sample snippets of extracted data in CSV and JSON format:
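
(The values shown continue the hypothetical product example above and are illustrative only, not output from a real site.)

CSV:

product_name,price,description
"Wireless Mouse","$24.99","Ergonomic 2.4 GHz wireless mouse"
"Mechanical Keyboard","$89.00","Backlit mechanical keyboard with blue switches"

JSON:

[
  {
    "product_name": "Wireless Mouse",
    "price": "$24.99",
    "description": "Ergonomic 2.4 GHz wireless mouse"
  },
  {
    "product_name": "Mechanical Keyboard",
    "price": "$89.00",
    "description": "Backlit mechanical keyboard with blue switches"
  }
]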

2. WebScrapeAI Pro

WebScrapeAI Pro is the advanced plan designed for users seeking a comprehensive web scraping solution with additional capabilities. This plan extends the features of WebScrapeAI with the integration of proxies, custom headers, and advanced JavaScript tools for a more powerful and tailored scraping experience.

Features

  1. Proxies: Use your own proxies for scraping to manage your IP footprint and bypass geo-restrictions.

  2. Headers: Forward custom headers to target websites for enhanced access control and personalization.

  3. JavaScript Tools: Execute advanced browsing instructions and JavaScript for dynamic websites and complex scraping tasks.

For advanced users who need to interact with web pages that require dynamic interaction, WebScrapeAI Pro offers JavaScript Tools. This feature allows users to input browsing instructions that the AI will execute before data collection begins.

How to Use

  1. Enter JavaScript Instructions: Provide instructions for page interactions, such as clicks, waits, and scrolls.

  2. Custom Headers: Include any required headers to navigate the website.

  3. Proxies: Enter any proxies you would like to use in the specified format.

JavaScript Instructions

  • {"click": "#button_id"}: Click on a specified element.

  • {"wait": 1000}: Pause for a specified duration in milliseconds.

  • {"wait_for": "#element_id"}: Wait for a specified element to become available.

  • {"scroll_y": 1000}: Scroll vertically by the specified pixel amount.

  • {"fill": {"#input_id", "value_1"}}: Fill in a specified input field with a value.

  • {"evaluate": "console.log('action')}: Execute custom JavaScript code.

Headers

Custom headers can be forwarded to the target website. This is particularly useful for setting request headers like User-Agent, Accept-Language, or custom headers required by the website. Enter any headers in key-value format that you would like the scraper to use when making requests to the website.
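
For example, headers might be entered like this (the values are illustrative, and X-Custom-Token is a hypothetical header name):

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept-Language: en-US,en;q=0.9
X-Custom-Token: your_token_here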

Proxies

WebScrapeAI Pro users have the ability to use their own proxies. This feature is crucial for users who need to manage their scraping operations discreetly or access content from different geographical locations.

Enter the proxy details in the provided format:

<protocol>://<username>:<password>@<host>:<port>

Make sure to replace <protocol>, <username>, <password>, <host>, and <port> with your actual proxy details.
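
For example, a hypothetical HTTP proxy entry would look like:

http://proxy_user:proxy_pass@203.0.113.10:8080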

After configuring your JavaScript instructions, headers, and proxies, proceed with submitting your request just like in the basic and bulk plans. Click the Submit button to start the scraping process with all your specified parameters.

3. WebScrapeAI Bulk

WebScrapeAI Bulk is an advanced offering designed for users who require data extraction from multiple URLs simultaneously. This plan includes all the features of the previous plans and can handle bulk scraping tasks with ease and efficiency.

To utilize the bulk scraping functionality, users must provide the following inputs:

  1. URLs of the Websites: For scraping URLs in bulk, you can enter the URLs manually or upload them through a CSV file (see the illustrative example after this list). Ensure that all URLs are valid and accessible for scraping.

  2. Data Requirements: Specify the data you wish to extract, formatted as a comma-separated list. This should be consistent across all URLs for optimal results.

  3. CSS Selectors (Optional): Although optional, providing CSS selectors for the specific data elements you're targeting is recommended to achieve the highest accuracy.

  4. Number of pages to scrape (Optional): If the webpage URL you provided is paginated, you can specify the number of pages you want to scrape. By default, it will only scrape a single page.
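
For illustration, an uploaded CSV file could simply list one URL per row (the addresses below are placeholders; the exact file layout should match what the upload form expects):

https://example.com/products?page=1
https://example.com/category/keyboards
https://example.com/category/mice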

After inputting the necessary information, click the Submit button. Our AI model will then begin the process of scraping the specified data from all provided URLs.

It's crucial that all pages from which data is being extracted contain the same type of information as specified in your data requirements. If a particular field is not found on a webpage, the AI model will return a null value for that field to maintain the consistency of the output format.
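
For instance, continuing the hypothetical product example, a page that is missing the description field would produce a record like:

{
  "product_name": "USB Hub",
  "price": "$15.50",
  "description": null
}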



For further assistance, you can contact our support team.

WebScrapeAI API Documentation

Introduction

Welcome to the Web Scraping API Documentation. This API allows you to scrape websites using AI to extract specific data as per your requirements. The API is designed to be simple and easy to use, requiring only an HTTP request to interact with it.

Authentication

You can make authorized requests to our API by passing your API key as a query parameter. You can obtain an API key by signing up and creating an account.

API Endpoints

http://api.webscrapeai.com/scrapeWebSite?url=website_url_you_want_to_scrape&command=data_you_want_to_scrape&pages=1&apiKey=your_api_key

Required Parameters

  • apiKey: Your API key.
  • url: The URL of the website you want to scrape.
  • command: The data you want to extract, listed in a comma-separated format.

Optional Parameters

  • pages: Number of pages to scrape. Default value is 1.
  • selectors: For enhanced accuracy, you can provide CSS selectors corresponding to the specific elements you're targeting. While optional, using CSS selectors is highly recommended for obtaining precise results.
  • headers: List of headers as key-value pairs, e.g. Accept: application/json.
  • instructions: List of JavaScript instructions to execute before scraping, such as clicking a specific button or waiting for a specific element to appear.

Request and Response Examples

http://api.webscrapeai.com/scrapeWebSite?url=https://news.ycombinator.com/&command=news_title,news_external_url,news_comments&apiKey=your_api_key

Output:

[
  {
    "news_external_url": "https://github.com/naklecha/llama3-from-scratch",
    "news_title": "Llama3 implemented from scratch",
    "number_of_news_comments": 15
  },
  {
    "news_external_url": "https://www.amygoodchild.com/blog/cursive-handwriting-in-javascript",
    "news_title": "Coding My Handwriting",
    "number_of_news_comments": 1
  },
  {
    "news_external_url": "https://julienposture.substack.com/p/the-ai-doppelganger-experiment-part",
    "news_title": "AI doppelgänger experiment – Part 1: The training",
    "number_of_news_comments": 47
  }
]
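
The same request can also be made from code. Here is a minimal sketch in Python, assuming the requests library; the endpoint and parameter names mirror the documentation above, and your_api_key is a placeholder:

import requests

API_URL = "http://api.webscrapeai.com/scrapeWebSite"

params = {
    "url": "https://news.ycombinator.com/",
    "command": "news_title, news_external_url, news_comments",
    "pages": 1,                 # optional, defaults to 1
    "apiKey": "your_api_key",   # replace with your actual API key
}

# Send the request; requests handles URL encoding of the query parameters.
response = requests.get(API_URL, params=params)
response.raise_for_status()

# The API returns the scraped records as JSON.
data = response.json()
print(data)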


For further assistance, you can contact our support team.
