Directories
| Path | Synopsis |
|---|---|
| | AI Web Crawler: crawls sites and uses an LLM to summarize, extract entities, and analyze sentiment. Requires Ollama running locally: `ollama serve && ollama pull llama3.2`. |
| | E-Commerce Product Scraper: scrapes books.toscrape.com. Extracts: title, price, rating, availability, description, image. |
| | GitHub Trending Scraper: scrapes today's trending repos. Extracts: repo name, description, language, stars, forks, stars gained today. |
| | Hacker News Scraper: scrapes front page stories. Extracts: rank, title, URL, points, author. |
| | LinkedIn Public Profile Scraper Example |
| | Multi-Site News Aggregator: crawls multiple news sources. Extracts structured data (JSON-LD, OpenGraph, meta) from any news site. |
| | Product Scraper Example |
| | Search Engine Crawler: indexes pages with full text, metadata, and link graph. Produces a search-engine-style index: title, URL, body text, meta, headings, links. |
| | Example: Using the Spider interface for structured, Scrapy-style crawling. |
| | Wikipedia Knowledge Extractor: deep crawl with a 1000-article limit. Extracts: title, summary, categories, reference count, external links. |