CeWL - Command Line Web Crawler
Understanding CeWL for Web Content Discovery
CeWL (Custom Word List generator) is a command-line tool that spiders a website and builds a custom wordlist from the words it finds. Security professionals and penetration testers use it to gather candidate usernames, passwords, and other target-specific strings from an organization's web presence. Starting from a given URL, it follows links to a configurable depth and extracts words from the pages it visits.
Key CeWL Usage Examples and Options
The commands below cover the most common CeWL options:
Basic Wordlist Generation
To spider a site and write all found words to a file:
cewl -w <file> <url>
Following External Links
To spider a site and also follow links to external sites:
cewl -o <url>
Custom User-Agent
To spider a site using a given user-agent string:
cewl -u <user-agent> <url>
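User-agent strings usually contain spaces and parentheses, so the value has to be quoted to reach CeWL as a single argument. The sketch below illustrates this with a wrapper function; `crawl_as` is a hypothetical name invented here, and the user-agent string and URL in the usage comment are placeholders. Only the `-u` flag comes from the command above.

```shell
# crawl_as is a hypothetical helper for illustration, not part of CeWL.
# Usage: crawl_as "<user-agent>" <url>
# Example (placeholder values):
#   crawl_as "Mozilla/5.0 (X11; Linux x86_64)" https://example.com
crawl_as() {
  # Double quotes keep the whole user-agent string as one argument,
  # even when it contains spaces or parentheses.
  cewl -u "$1" "$2"
}
```

Without the quotes, the shell would split the string at each space and CeWL would treat the leftover pieces as extra arguments.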
Depth and Minimum Word Length
To spider a site to a given depth and keep only words of at least a minimum length:
cewl -d <depth> -m <min-word-length> <url>
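Depth controls how many links away from the start page the spider will go, and the minimum length drops short tokens from the output, so the two are often set together with an output file. A minimal sketch, assuming a hypothetical wrapper named `build_wordlist`; the fallback values 2 and 3 are choices made for this example and should be checked against `cewl --help` on your installation.

```shell
# build_wordlist is a hypothetical helper for illustration, not part of CeWL.
# Usage: build_wordlist <url> <outfile> [depth] [min-word-length]
# Example (placeholder values):
#   build_wordlist https://example.com words.txt 1 8
build_wordlist() {
  url="$1"
  out="$2"
  # Fallbacks used when depth or length are omitted; these values are
  # assumptions of this sketch, not taken from the text above.
  depth="${3:-2}"
  minlen="${4:-3}"
  cewl -d "$depth" -m "$minlen" -w "$out" "$url"
}
```

A shallow depth with a higher minimum length tends to give a smaller, more focused list; raising the depth makes the crawl visit more pages and take longer.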
Word Count
To spider a site and include a count for each word found:
cewl -c <url>
Including Meta Data
To spider a site, include meta data, and write the meta data words to a separate file:
cewl -a --meta_file <file> <url>
Extracting Email Addresses
To spider a site and store discovered email addresses in a separate file:
cewl -e --email_file <file> <url>
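The options above can be combined in a single run so that words, email addresses, and meta data each land in their own file. The sketch below assumes a hypothetical wrapper named `harvest_site` and an output naming scheme (`<prefix>_words.txt` and so on) invented for this example; note that the long options take two leading dashes.

```shell
# harvest_site is a hypothetical helper for illustration, not part of CeWL.
# Usage: harvest_site <url> <output-prefix>
# Example (placeholder values):
#   harvest_site https://example.com acme
harvest_site() {
  url="$1"
  prefix="$2"
  # One crawl, three output files: words, emails, and meta data.
  cewl -w "${prefix}_words.txt" \
       -e --email_file "${prefix}_emails.txt" \
       -a --meta_file "${prefix}_meta.txt" \
       "$url"
}
```

Running one combined crawl avoids requesting the same pages three times, which is kinder to the target and faster than separate runs.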