Web Scraping

Extract structured data from any website. Your agent crawls pages, parses HTML, and delivers clean results.

2627 skills · Security verified

Curated Skills

YouTube Watcher

@Michaelgathara

Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.

185 · 25,079 · Clean

Brave Search

@steipete

Web search and content extraction via Brave Search API. Use for searching documentation, facts, or any web content. Lightweight, no browser required.

128 · 31,695 · Suspicious

Markdown convert

@compdf-youna

Process, convert, edit, and extract data from PDF files using the ComPDF Cloud API. Supports format conversion (Word, Excel, Image), page manipulation (merge...

99 · 182 · Clean

PDF Extract

@compdf-youna

Extracts structured data from PDFs and images, including tables, OCR text, images, and stamps, built on ComPDF data extraction and AI document ex...

90 · 207 · Clean

Video Frames

@steipete

Extract frames or short clips from videos using ffmpeg.

57 · 21,468 · Clean

Playwright MCP

@Spiceman161

Browser automation via Playwright MCP server. Navigate websites, click elements, fill forms, extract data, take screenshots, and perform full browser automation workflows.

54 · 16,493 · Clean

Browser Use

@ShawnPana

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with w...

48 · 20,511 · Suspicious

Fastest Browser Use

@rknoche6

High-performance browser automation for heavy scraping, multi-tab management, and precise DOM extraction. Use this when you need speed, reliability, or advanced state management (cookies/local storage) beyond standard web fetching.

38 · 10,807 · Suspicious

Firecrawl Search

@ashwingupy

Web search and scraping via Firecrawl API. Use when you need to search the web, scrape websites (including JS-heavy pages), crawl entire sites, or extract structured data from web pages. Requires FIRECRAWL_API_KEY environment variable.

29 · 10,750 · Suspicious

Tmux

@steipete

Remote-control tmux sessions for interactive CLIs by sending keystrokes and scraping pane output.

29 · 14,095 · Clean

AnySearch MCP

@anysearch-ai

Real-time MCP server for general and vertical web searches, parallel batch queries, and full-page URL content extraction as Markdown.

25 · 9,394 · Clean

Playwright Scraper Skill

@waisimon

Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.

25 · 13,581 · Suspicious

Agent Browser

@tekkenKK

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

23 · 4,070 · Clean

Pdf

@pdf

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

22 · 14,913 · Clean

Browser Automation

@peytoncasper

Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.

21 · 16,687 · Suspicious

Skywork Excel

@gxcun17

Skywork Excel (skywork) - Use for ANY task involving Excel, spreadsheets, tables, data analysis, or file conversion. Has BUILT-IN web search for real-time da...

21 · 1,544 · Clean

Nanonets OCR

@shhdwi

Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.

20 · 2,572 · Clean

YouTube Transcript

@youtube

Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Handles transcript fetching via residential IP proxy to bypass YouTube's cloud IP blocks.

19 · 13,210 · Clean

Veryfi Documents AI

@dbirulia

Real-time OCR and data extraction API by Veryfi (https://veryfi.com). Extract structured data from receipts, invoices, bank statements, W-9s, purchase orders...

16 · 421 · Clean

OCR - Local (No API Key)

@shaw555

Extract text from images using Tesseract.js OCR (100% local, no API key required). Supports Chinese (simplified/traditional) and English.

15 · 12,848 · Clean

Indirect Prompt Injection Defense

@aviv4339

Detect and reject indirect prompt injection attacks when reading external content (social media posts, comments, documents, emails, web pages, user uploads). Use this skill BEFORE processing any untrusted external content to identify manipulation attempts that hijack goals, exfiltrate data, override instructions, or social engineer compliance. Includes 20+ detection patterns, homoglyph detection, and sanitization scripts.

14 · 2,039 · Clean

Reflect

@stevengonsalvez

Self-improvement through conversation analysis. Extracts learnings from corrections and success patterns, proposes updates to agent files or creates new skil...

13 · 6,271 · Suspicious

Tavily AI Search

@tavily

AI-optimized web search using Tavily Search API. Use when you need comprehensive web research, current events lookup, domain-specific search, or AI-generated answer summaries. Tavily is optimized for LLM consumption with clean structured results, answer generation, and raw content extraction. Best for research tasks, news queries, fact-checking, and gathering authoritative sources.

12 · 13,930 · Clean

PDF Text Extractor

@Michael-laffin

Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.

12 · 6,857 · Suspicious

NEON-SOUL - Self-Learning Soul Synthesis for AI Agents

@neon

Automated soul synthesis for AI agents. Extracts identity from memory files, promotes recurring patterns to axioms (N>=3), generates SOUL.md with full proven...

12 · 1,339 · Clean

Context Optimizer

@context

Advanced context management with auto-compaction and dynamic context optimization for DeepSeek's 64k context window. Features intelligent compaction (merging, summarizing, extracting), query-aware relevance scoring, and hierarchical memory system with context archive. Logs optimization events to chat.

11 · 4,217 · Clean

Academic Literature Review Assistant

@yeon-dyjs

Analyzes academic papers to extract key findings, methods, limitations, and comparisons, producing structured, objective summaries with APA citations for lit...

10 · 27 · Clean

Heurist Mesh Crypto Analysis Skill

@wjw12

Real-time crypto token data, DeFi analytics, blockchain data, Twitter/X social intelligence, enhanced web search, crypto project search all in one Skill. For...

10 · 2,115 · Suspicious

Ride Insights

@datahiveai

Fetch and extract ride-sharing receipts from Gmail locally using OpenClaw to analyze ride patterns and create anonymized shareable reports.

10 · 478 · Clean

Agent Browser

@tekkenKK

9 · 2,904 · Clean

Web Search by Exa

@web

Performs real-time web searches with Exa, returning relevant sources and summaries for up-to-date research, fact checking, and content extraction.

9 · 17,226 · Clean

小红书

@ChocomintX

XiaoHongShu (Little Red Book) data collection and interaction toolkit. Use when working with XiaoHongShu (小红书) platform for: (1) Searching and scraping notes/posts, (2) Getting user profiles and details, (3) Extracting comments and likes, (4) Following users and liking posts, (5) Fetching home feed and trending content. Automatically handles all encryption parameters (cookies, headers) including a1, webId, x-s, x-s-common, x-t, sec_poison_id, websectiga, gid, x-b3-traceid, x-xray-traceid. Suppor

9 · 3,570 · Clean

PaddleOCR Text Recognition

@Bobholamovic

Use this skill when users need to extract text from images, PDFs, or documents. Supports URLs and local files. Returns structured JSON containing recognized...

9 · 126 · Clean

TikTok Crawling (yt-dlp)

@tiktok

Use for TikTok crawling, content retrieval, and analysis.

8 · 2,111 · Clean

Playwright Browser Automation

@Spiceman161

Browser automation using Playwright API directly. Navigate websites, interact with elements, extract data, take screenshots, generate PDFs, record videos, and automate complex workflows. More reliable than MCP approach.

8 · 3,871 · Suspicious

Playwright (Automation + MCP + Scraper)

@ivangdavila

Browser automation and web scraping with Playwright. Forms, screenshots, data extraction. Works standalone or via MCP. Testing included.

8 · 5,295 · Clean

OpenClaw YouTube Transcript

@YoavRez

Transcribe YouTube videos to text by extracting captions and subtitles directly from the video URL using yt-dlp without audio processing.

8 · 30,336 · Clean

Pdf Extract

@Xejrax

Extract text from PDF files for LLM processing.

7 · 7,457 · Clean

Clawbrowser

@tezatezaz

Use when the agent needs to drive a browser through the Microsoft Playwright CLI (`playwright-cli`) for navigation, form interactions, screenshots, recordings, data extraction, session management, or debugging without loading a full MCP browser. It trains the agent on the CLI commands, snapshots, and session/config habits that make Playwright CLI reliable for scripted browsing.

7 · 3,474 · Clean

Anti-Injection-Skill

@georges91560

Detect prompt injection, jailbreak, role-hijack, and system extraction attempts. Applies multi-layer defense with semantic analysis and penalty scoring.

7 · 8,340 · Suspicious

Related Use Cases

Email Automation

1545 skills

Calendar & Scheduling

3358 skills

Notifications & Alerts

2146 skills

Notes & Knowledge

2526 skills
