WebScrapeAI Documentation

WebScrapeAI offers a streamlined, efficient approach to web scraping, allowing users to easily extract data from websites with minimal setup. By providing detailed inputs and selecting your preferred output format, you can tailor the scraping process to your specific needs. Our platform is designed to simplify and enhance your data scraping experience using advanced AI technology. The following documentation outlines the process and capabilities of all the plans.

Plans Overview:

1. WebScrapeAI

The WebScrapeAI plan is crafted for users who need to extract data from a single URL at a time. This plan is perfect for straightforward, efficient web scraping tasks, providing a user-friendly interface and output flexibility.

To initiate a scraping task, you will need to provide the following information:

  1. URL of the Website: The exact web address of the site you wish to scrape.

  2. Data Requirements: Specify the data you want to extract, listed in a comma-separated format. This could include elements like product names, prices, descriptions, etc.

  3. CSS Selectors (Optional): For enhanced accuracy, you can provide CSS selectors corresponding to the specific elements you're targeting. While optional, using CSS selectors is highly recommended for obtaining precise results.
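
For example, a request against a hypothetical online store might look like the following (the URL, field names, and selectors are illustrative, not a real target):

  URL of the Website: https://example.com/products
  Data Requirements: product name, price, description
  CSS Selectors (Optional): .product-title, .product-price, .product-description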

Once you have provided the necessary information, click the Submit button to activate our AI model. The AI will then proceed to scrape the specified data from the provided URL.

Output Formats

Upon completion of the scraping process, you can retrieve your data in one of the following formats, based on your preference and project requirements:

  1. CSV (Comma-Separated Values): Ideal for spreadsheet applications and data analysis tools. Can be directly exported into a file, facilitating easy download and storage.

  2. JSON (JavaScript Object Notation): Best for applications that require data interchange or further processing with scripting languages.

  3. Text: Simple, unformatted text output for a wide range of uses.

Below are sample snippets of extracted data in CSV and JSON format:
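
(The values shown continue the hypothetical product example above and are illustrative only, not output from a real site.)

CSV:

product_name,price,description
"Wireless Mouse","$24.99","Ergonomic 2.4 GHz wireless mouse"
"Mechanical Keyboard","$89.00","Backlit mechanical keyboard with blue switches"

JSON:

[
  {
    "product_name": "Wireless Mouse",
    "price": "$24.99",
    "description": "Ergonomic 2.4 GHz wireless mouse"
  },
  {
    "product_name": "Mechanical Keyboard",
    "price": "$89.00",
    "description": "Backlit mechanical keyboard with blue switches"
  }
]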

2. WebScrapeAI Pro

WebScrapeAI Pro is the advanced plan designed for users seeking a comprehensive web scraping solution with additional capabilities. This plan extends the features of WebScrapeAI with the integration of proxies, custom headers, and advanced JavaScript tools for a more powerful and tailored scraping experience.

Features

  1. Proxies: Use your own proxies for scraping to manage your IP footprint and bypass geo-restrictions.

  2. Headers: Forward custom headers to target websites for enhanced access control and personalization.

  3. JavaScript Tools: Execute advanced browsing instructions and JavaScript for dynamic websites and complex scraping tasks.

For advanced users who need to interact with web pages that require dynamic interaction, WebScrapeAI Pro offers JavaScript Tools. This feature allows users to input browsing instructions that the AI will execute before data collection begins.

How to Use

  1. Enter JavaScript Instructions: Provide instructions for page interactions, such as clicks, waits, and scrolls.

  2. Custom Headers: Include any required headers to navigate the website.

  3. Proxies: Enter any proxies you would like to use in the specified format.

JavaScript Instructions

  • {"click": "#button_id"}: Click on a specified element.

  • {"wait": 1000}: Pause for a specified duration in milliseconds.

  • {"wait_for": "#element_id"}: Wait for a specified element to become available.

  • {"scroll_y": 1000}: Scroll vertically by the specified pixel amount.

  • {"fill": {"#input_id", "value_1"}}: Fill in a specified input field with a value.

  • {"evaluate": "console.log('action')}: Execute custom JavaScript code.

Headers

Custom headers can be forwarded to the target website. This is particularly useful for setting request headers like User-Agent, Accept-Language, or custom headers required by the website. Enter any headers in key-value format that you would like the scraper to use when making requests to the website.
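
For example, headers might be entered like this (the values are illustrative, and X-Custom-Token is a hypothetical header name):

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept-Language: en-US,en;q=0.9
X-Custom-Token: your_token_here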

Proxies

WebScrapeAI Pro users have the ability to use their own proxies. This feature is crucial for users who need to manage their scraping operations discreetly or access content from different geographical locations.

Enter the proxy details in the provided format:

<protocol>://<username>:<password>@<host>:<port>

Make sure to replace <protocol>, <username>, <password>, <host>, and <port> with your actual proxy details.
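
For example, a hypothetical HTTP proxy entry would look like:

http://proxy_user:proxy_pass@203.0.113.10:8080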

After configuring your JavaScript instructions, headers, and proxies, proceed with submitting your request just like in the basic and bulk plans. Click the Submit button to start the scraping process with all your specified parameters.

3. WebScrapeAI Bulk

WebScrapeAI Bulk is an advanced offering designed for users who require data extraction from multiple URLs simultaneously. This plan includes all the features of the previous plans and can handle bulk scraping tasks with ease and efficiency.

To utilize the bulk scraping functionality, users must provide the following inputs:

  1. URLs of the Websites: For scraping URLs in bulk, you can enter the URLs manually or upload them through a CSV file (see the illustrative example after this list). Ensure that all URLs are valid and accessible for scraping.

  2. Data Requirements: Specify the data you wish to extract, formatted as a comma-separated list. This should be consistent across all URLs for optimal results.

  3. CSS Selectors (Optional): Although optional, providing CSS selectors for the specific data elements you're targeting is recommended to achieve the highest accuracy.

  4. Number of pages to scrape (Optional): If the webpage URL you provided is paginated, you can specify the number of pages you want to scrape. By default, it will only scrape a single page.
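
For illustration, an uploaded CSV file could simply list one URL per row (the addresses below are placeholders; the exact file layout should match what the upload form expects):

https://example.com/products?page=1
https://example.com/category/keyboards
https://example.com/category/mice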

After inputting the necessary information, click the Submit button. Our AI model will then begin the process of scraping the specified data from all provided URLs.

It's crucial that all pages from which data is being extracted contain the same type of information as specified in your data requirements. If a particular field is not found on a webpage, the AI model will return a null value for that field to maintain the consistency of the output format.
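
For instance, continuing the hypothetical product example, a page that is missing the description field would produce a record like:

{
  "product_name": "USB Hub",
  "price": "$15.50",
  "description": null
}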



For further assistance, you can contact our support team.

WebScrapeAI API Documentation

Introduction

Welcome to the Web Scraping API Documentation. This API allows you to scrape websites using AI to extract specific data as per your requirements. The API is designed to be simple and easy to use, requiring only an HTTP request to interact with it.

Authentication

You can make authorized requests to our API by passing your API key as a query parameter. You can obtain an API key by signing up and creating an account.

API Endpoints

http://api.webscrapeai.com/scrapeWebSite?url=website_url_you_want_to_scrape&command=data_you_want_to_scrape&pages=1&apiKey=your_api_key

Required Parameters

  • apiKey: Your API key.
  • url: The URL of the website you want to scrape.
  • command: The data you want to extract, listed in a comma-separated format.

Optional Parameters

  • pages: Number of pages to scrape. Default value is 1.
  • selectors: For enhanced accuracy, you can provide CSS selectors corresponding to the specific elements you're targeting. While optional, using CSS selectors is highly recommended for obtaining precise results.
  • headers: List of headers as key-value pairs, e.g. Accept: application/json.
  • instructions: List of JavaScript instructions to execute before scraping, such as clicking a specific button or waiting for a specific element to appear.

Request and Response Examples

http://api.webscrapeai.com/scrapeWebSite?url=https://news.ycombinator.com/&command=news_title,news_external_url,news_comments&apiKey=your_api_key

Output:

[
  {
    "news_external_url": "https://github.com/naklecha/llama3-from-scratch",
    "news_title": "Llama3 implemented from scratch",
    "number_of_news_comments": 15
  },
  {
    "news_external_url": "https://www.amygoodchild.com/blog/cursive-handwriting-in-javascript",
    "news_title": "Coding My Handwriting",
    "number_of_news_comments": 1
  },
  {
    "news_external_url": "https://julienposture.substack.com/p/the-ai-doppelganger-experiment-part",
    "news_title": "AI doppelgänger experiment – Part 1: The training",
    "number_of_news_comments": 47
  }
]
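
The same request can also be made from code. Here is a minimal sketch in Python, assuming the requests library; the endpoint and parameter names mirror the documentation above, and your_api_key is a placeholder:

import requests

API_URL = "http://api.webscrapeai.com/scrapeWebSite"

params = {
    "url": "https://news.ycombinator.com/",
    "command": "news_title, news_external_url, news_comments",
    "pages": 1,                 # optional, defaults to 1
    "apiKey": "your_api_key",   # replace with your actual API key
}

# Send the request; requests handles URL encoding of the query parameters.
response = requests.get(API_URL, params=params)
response.raise_for_status()

# The API returns the scraped records as JSON.
data = response.json()
print(data)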


For further assistance, you can contact our support team.
