Testing FireCrawl – A Modern Web Scraper Integrated with LLMs

Development

Karol Gruszka | 31/07/2025

I recently had the chance to test FireCrawl, an advanced web scraping tool with native language model integration. I looked at three things: its features, its configuration options, and its limitations.

What Is FireCrawl and What Makes It Unique?

FireCrawl is a tool for automatic data extraction from websites, distinguished by its built-in AI integration. Unlike traditional crawlers, it not only downloads content but also converts it into formats that LLMs can process more easily. This makes it possible to interpret, filter, and transform web content as part of the same pipeline.

Users can choose the output format, such as markdown, HTML, rawHtml, screenshots, links, or JSON.
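To make the format choice concrete, here is a minimal sketch of how a scrape request body could be assembled. The field names (`url`, `formats`) and the exact format strings are my assumptions based on the list above, and `build_scrape_payload` is a hypothetical helper, so check the names against the official documentation before relying on them.

```python
import json

# Format identifiers taken from the list above. The exact strings the API
# accepts are an assumption and should be verified against the docs.
KNOWN_FORMATS = {"markdown", "html", "rawHtml", "screenshot", "links", "json"}


def build_scrape_payload(url: str, formats: list[str]) -> dict:
    """Return a request body asking for `url` in the given output formats."""
    unknown = [f for f in formats if f not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unsupported format(s): {unknown}")
    return {"url": url, "formats": list(formats)}


payload = build_scrape_payload("https://example.com", ["markdown", "links"])
print(json.dumps(payload, indent=2))
```

Validating the format names locally is a small design choice: a typo fails immediately instead of costing a request.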

Core Features of FireCrawl

Crawl

Recursively follows a site's internal links and subpages to get a full picture of the website.

Extract

Extracts content from single pages, multiple pages, or entire domains. You can define both user and system prompts to retrieve specific information. For instance, asking "Who is the CTO?" after scanning a website may provide the correct answer even if it's not explicitly stated, by interpreting contextual clues.

Scrape

Converts web pages into defined formats (e.g., markdown, JSON) or generates screenshots. You can also extract specific data using prompts and track changes over time.

Search

Acts as a search engine. Type in a query (e.g., "primotly company services") to get a list of matching pages ready for scraping or transformation.

Map

Quickly collects all available links on a given page.

Actions

Allows user-like interactions (e.g., clicking buttons, expanding sections) before scraping. This is essential for dynamic websites.
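As an illustration, the interactions can be described as an ordered list that is sent along with the scrape request. This is only a sketch: the action types (`click`, `wait`), their fields, and the `actions` key are assumptions on my part, and the helper functions and the CSS selector are hypothetical.

```python
def click(selector: str) -> dict:
    """Describe a click on the element matching a CSS selector."""
    return {"type": "click", "selector": selector}


def wait(milliseconds: int) -> dict:
    """Describe a pause, e.g. to let dynamic content finish loading."""
    if milliseconds < 0:
        raise ValueError("milliseconds must be non-negative")
    return {"type": "wait", "milliseconds": milliseconds}


def scrape_with_actions(url: str, actions: list[dict]) -> dict:
    """Build a request body that runs `actions` in order before scraping."""
    return {"url": url, "formats": ["markdown"], "actions": list(actions)}


# Expand a collapsed section, give it a second to render, then scrape.
payload = scrape_with_actions(
    "https://example.com/pricing",
    [click("#show-all-plans"), wait(1000)],
)
print(payload["actions"])
```

The order of the list matters: the wait comes after the click so that the expanded section has time to appear before the page is captured.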

AI Integration and Configuration Options

Each feature comes with extended configuration settings, for example excluding specific HTML tags. FireCrawl integrates with external tools such as Make.com and n8n, and offers SDKs for Python, Node.js, Go, and Rust.
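The tag exclusion mentioned above could look like this when calling the hosted API directly rather than through an SDK. Treat it as a sketch under assumptions: the endpoint URL, the `excludeTags` field, and the Bearer-token header reflect my understanding of the REST API and should be checked against the documentation. `make_scrape_request` is a hypothetical helper and the API key is a placeholder.

```python
import json
import urllib.request

# Assumed endpoint of the hosted version; verify before use.
API_URL = "https://api.firecrawl.dev/v1/scrape"


def make_scrape_request(api_key: str, url: str, exclude_tags: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) a scrape request that skips the given HTML tags."""
    body = {
        "url": url,
        "formats": ["markdown"],
        "excludeTags": list(exclude_tags),  # assumed field name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_scrape_request("fc-YOUR-KEY", "https://example.com", ["nav", "footer"])
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Dropping navigation and footer elements is a typical use: they repeat on every page and add noise to the text handed to the model.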

Note: FireCrawl uses a single, predefined LLM. You cannot choose or change the underlying model.

Two versions are available:

Open source (AGPL-3.0 license)
Hosted version (includes additional premium features)

Figure: Comparison of the open-source and cloud versions of FireCrawl (source of illustration: docs.firecrawl.dev)

Limits and Pricing

Free plan: up to 500 pages/month
Paid tiers: available via subscription
Extract feature: billed separately
Webhook support: enables asynchronous task execution
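Asynchronous execution pays off as soon as more than a handful of pages are involved. The sketch below shows the general client-side idea with Python's `asyncio`: several pages are processed concurrently, with a cap on how many run at once. It does not call FireCrawl; `fake_scrape` is a hypothetical stand-in for a real scrape call, and `scrape_all` and the concurrency limit of 3 are my own choices.

```python
import asyncio


async def fake_scrape(url: str) -> str:
    """Stand-in for a real scrape call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"# content of {url}"


async def scrape_all(urls, scrape, max_concurrent: int = 3) -> dict:
    """Run `scrape` for every URL, with at most `max_concurrent` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str):
        async with semaphore:
            return url, await scrape(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


urls = [f"https://example.com/page-{i}" for i in range(5)]
results = asyncio.run(scrape_all(urls, fake_scrape))
print(len(results), "pages processed")
```

The semaphore keeps the number of parallel requests bounded, which helps stay within rate limits while still avoiding one-page-at-a-time waiting.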

Practical Use Cases

FireCrawl shines in scenarios requiring fast, automated data collection for further analysis or use:

Gather structured data for CMS, BI dashboards, or chatbots
Summarize news or industry reports automatically
Build dynamic content feeds from competitor or industry sites
Extract and convert client or product information from websites

Prompt customization and format configuration unlock more advanced workflows and automation potential.

Challenges and Limitations

Exported markdown contained excessive line breaks, which reduced readability, especially for human readers
There is no option to switch the LLM engine
Processing time depends heavily on data volume and page complexity
Async workflows are recommended to keep processing time manageable
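The line-break issue is easy to work around in post-processing. Below is a minimal sketch of a cleanup step that I would run on the exported markdown; `collapse_blank_lines` is a hypothetical helper, not part of FireCrawl, and the sample input is made up.

```python
import re


def collapse_blank_lines(markdown: str) -> str:
    """Collapse runs of blank lines into one and strip trailing whitespace."""
    # Note: stripping trailing spaces also removes markdown's two-space hard
    # line breaks, and blank lines inside fenced code blocks are collapsed too.
    lines = [line.rstrip() for line in markdown.splitlines()]
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


raw = "# Title\n\n\n\nFirst paragraph.   \n\n\n- item one\n- item two\n\n\n"
cleaned = collapse_blank_lines(raw)
print(cleaned)
```

This keeps paragraph boundaries intact, since a single blank line still separates blocks, while removing the extra vertical space.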