goodbots - trust but verify
goodbots verifies the IP addresses of respectful crawlers like Googlebot by performing reverse DNS and forward DNS lookups.
- Given an IP address
- It performs a reverse DNS lookup to get a hostname
- Then it does a forward DNS lookup on that hostname to get an IP
- It compares the original IP to the IP returned by the forward lookup
- If they match, goodbots outputs the IP and hostname
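The steps above can be sketched in Go with the standard library's `net.LookupAddr` and `net.LookupHost`. This is a minimal illustration rather than the code in goodbots' own main.go: the `lookups` type and the `verify` function are hypothetical names, and the sketch uses the system resolver.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// lookups bundles the two DNS calls so they can be swapped for fakes in tests.
// (Hypothetical helper type, not part of goodbots.)
type lookups struct {
	reverse func(ip string) ([]string, error)   // IP -> hostnames (PTR records)
	forward func(host string) ([]string, error) // hostname -> IPs (A/AAAA records)
}

// verify returns the first hostname whose forward lookup includes ip.
// ok is false when ip is invalid or no hostname resolves back to it.
func verify(l lookups, ip string) (host string, ok bool) {
	want := net.ParseIP(ip)
	if want == nil {
		return "", false
	}
	names, err := l.reverse(ip)
	if err != nil {
		return "", false
	}
	for _, name := range names {
		addrs, err := l.forward(name)
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if net.ParseIP(a).Equal(want) {
				// Reverse lookups return fully qualified names; drop the final dot.
				return strings.TrimSuffix(name, "."), true
			}
		}
	}
	return "", false
}

func main() {
	dns := lookups{reverse: net.LookupAddr, forward: net.LookupHost}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		ip := strings.TrimSpace(sc.Text())
		if host, ok := verify(dns, ip); ok {
			fmt.Printf("%s\t%s\n", ip, host)
		}
	}
}
```

Passing the lookup functions in as values keeps the comparison logic testable without network access.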
The Job-to-be-Done (#jtbd)
In search engine optimization (SEO), it is common to analyze a site's access logs (aka bot logs). Often there are various requests by spoofed user-agents pretending to be official search engine crawlers like Googlebot. In order to have an accurate understanding of the site's crawl rate, we want to verify the IP address of the various crawlers.
How to install/build goodbots
Clone the repo:
git clone git@github.com:eywu/goodbots.git
Change to the goodbots directory
Build the binary/executable
How to use goodbots
If you've built the main.go file that comes with goodbots as described above, you can feed goodbots IPs via standard input.
Test a single IP
echo "184.108.40.206" | ./goodbots
Test a range of IPs with the prips command-line tool
prips 220.127.116.11 18.104.22.168 | ./goodbots
Test a list of IPs from a text or csv file
./goodbots < ip-list.txt
Note: the text or CSV file must contain exactly one IP per line, with nothing else on the line.
22.214.171.124
126.96.36.199
188.8.131.52
184.108.40.206
Saving the results
goodbots prints to standard output with tab (\t) delimiters, so you can capture the results with an output redirect.
220.127.116.11	crawl-203-208-60-1.googlebot.com
18.104.22.168	google-proxy-66-249-85-123.google.com
22.214.171.124	rate-limited-proxy-66-249-87-12.google.com
126.96.36.199	google-proxy-66-249-85-224.google.com
Save the verified bot IPs from a file named ip-list.txt to a file named saved-results.tsv
./goodbots < ip-list.txt > saved-results.tsv
goodbots randomly selects a different public DNS resolver for each DNS lookup to reduce the chances of being blocked or throttled by your DNS provider if you have lots of IPs to verify.
It uses these DNS providers:
- Cloudflare Public DNS
- Google Public DNS
- OpenDNS
- Quad9 DNS (⛔ not supported yet) 188.8.131.52 184.108.40.206
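One way to pick a random public resolver per lookup in Go is to build a `net.Resolver` whose `Dial` function always connects to the chosen server. The sketch below is an assumption about how this could be done, not goodbots' actual code: `publicResolvers`, `pickResolver`, and `newResolver` are hypothetical names, and the server addresses are the well-known public ones for each provider, which may differ from what goodbots uses.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"net"
	"time"
)

// publicResolvers lists well-known public DNS servers (assumed list).
var publicResolvers = []string{
	"1.1.1.1:53", "1.0.0.1:53", // Cloudflare
	"8.8.8.8:53", "8.8.4.4:53", // Google Public DNS
	"208.67.222.222:53", "208.67.220.220:53", // OpenDNS
}

// pickResolver returns one server address chosen at random.
func pickResolver(servers []string) string {
	return servers[rand.Intn(len(servers))]
}

// newResolver builds a resolver that sends its queries to server
// instead of the system's default DNS server.
func newResolver(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in DNS client so Dial is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	server := pickResolver(publicResolvers)
	r := newResolver(server)
	fmt.Println("resolver chosen for the next lookup:", server)
	_ = r // call r.LookupAddr(ctx, ip) or r.LookupHost(ctx, host) here
}
```

Creating a fresh resolver for each lookup spreads queries across providers, at the cost of a small setup overhead per call.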
Domain verification is currently imprecise: goodbots matches only the domain name and does NOT check the TLD.
Future improvements will test for more precise domains based on each crawler's specifications.
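A stricter check could compare the full hostname suffix, TLD included, against an allow-list. The following Go sketch shows the idea under stated assumptions: `hostAllowed` and `allowedSuffixes` are hypothetical names, and the two suffixes are simply the ones that appear in the sample output earlier in this README, not a complete crawler specification.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedSuffixes is an illustrative allow-list. Each entry starts with a dot
// so that "evilgooglebot.com" does not match ".googlebot.com".
var allowedSuffixes = []string{".googlebot.com", ".google.com"}

// hostAllowed reports whether host ends in one of the suffixes, compared
// case-insensitively and ignoring the trailing dot of a fully qualified name.
func hostAllowed(host string, suffixes []string) bool {
	h := strings.ToLower(strings.TrimSuffix(host, "."))
	for _, s := range suffixes {
		if strings.HasSuffix(h, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, h := range []string{
		"crawl-203-208-60-1.googlebot.com.",
		"googlebot.com.evil.example",
		"crawl-1.googlebot.net",
	} {
		fmt.Printf("%s\t%v\n", h, hostAllowed(h, allowedSuffixes))
	}
}
```

Only the first hostname passes; the other two contain the domain name but fail on the full suffix.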
Make it go faster!
By default, goodbots runs only 10 concurrent requests. To speed up the work, increase that number in the main.go file before building the binary/executable.
Other usage of goodbots
While building goodbots, we also created a general-purpose function that simply resolves the hostname of any IP address.
In main.go, you can uncomment the line that calls ResolveNames() and comment out the GoodBots() function call.
This mode does not perform a forward DNS lookup to verify that the hostname resolves back to the same IP address. It also writes an error into the TSV output for any IP whose hostname lookup fails.
➜ goodbots git:(main) ✗ prips -i 50 220.127.116.11 18.104.22.168 | ./goodbots
22.214.171.124	(error) lookup 126.96.36.199.in-addr.arpa. on 192.168.1.1:53: no such host
...
188.8.131.52	(error) lookup 184.108.40.206.in-addr.arpa. on 192.168.1.1:53: no such host
220.127.116.11	WebGods
18.104.22.168	(error) lookup 250.0.100.66.in-addr.arpa. on 192.168.1.1:53: no such host
...
22.214.171.124	(error) lookup 126.96.36.199.in-addr.arpa. on 192.168.1.1:53: no such host
188.8.131.52	mail.esai.com
Suggestions from others
via John Murch
- Generate bad bot list for blacklist usage
- Track search engine bot list over time to see changes
- Google published IP ranges h/t Michael Stapelberg
- DuckDuckGo published IPs
- Facebook published IP ranges