| Well, that wouldn't be difficult. Just write an IP address and a timestamp to a database on each page load, then every so often run back through the records and pull out the IP addresses that appear very frequently with small gaps between records. If you have 100 records for one IP address within a minute of each other, you can be pretty sure they're up to something. Then block those IP addresses from loading any pages. It could all be done automatically if you wished.
It won't do you any favours as far as load goes, since that could be a lot of database writes, but presumably the person harvesting would give up pretty quickly once they couldn't do it any more, so it wouldn't be something you'd have to keep doing long term. |
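To make the idea above concrete, here is a rough sketch in Python using the standard-library sqlite3 module. Everything named here (the `hits` and `blocked` tables, the function names, the 60 second window and the 100 hit threshold) is made up for illustration, and it simplifies "small gaps between records" to a plain count of page loads per IP inside a recent window. Treat it as a starting point under those assumptions, not a finished solution.

```python
import sqlite3
import time

WINDOW_SECONDS = 60   # assumed look-back window
MAX_HITS = 100        # assumed threshold; tune it for your own traffic


def init_db(conn):
    """Create the (hypothetical) tables used by this sketch."""
    conn.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT NOT NULL, ts REAL NOT NULL)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_ip_ts ON hits (ip, ts)")
    conn.execute("CREATE TABLE IF NOT EXISTS blocked (ip TEXT PRIMARY KEY)")


def log_hit(conn, ip, ts=None):
    """Record one page load. Call this on every request."""
    ts = time.time() if ts is None else ts
    conn.execute("INSERT INTO hits (ip, ts) VALUES (?, ?)", (ip, ts))


def block_heavy_hitters(conn, now=None):
    """Periodic job: block every IP with MAX_HITS or more loads in the window."""
    now = time.time() if now is None else now
    cutoff = now - WINDOW_SECONDS
    rows = conn.execute(
        "SELECT ip FROM hits WHERE ts > ? GROUP BY ip HAVING COUNT(*) >= ?",
        (cutoff, MAX_HITS),
    ).fetchall()
    # rows is a list of 1-tuples, which is what executemany expects.
    conn.executemany("INSERT OR IGNORE INTO blocked (ip) VALUES (?)", rows)
    # Records older than the window are no longer needed; pruning keeps the table small.
    conn.execute("DELETE FROM hits WHERE ts <= ?", (cutoff,))
    conn.commit()
    return [row[0] for row in rows]


def is_blocked(conn, ip):
    """Check this before serving a page and refuse the request if it returns True."""
    row = conn.execute("SELECT 1 FROM blocked WHERE ip = ?", (ip,)).fetchone()
    return row is not None
```

You would call `log_hit` on each page load, check `is_blocked` before serving anything, and run `block_heavy_hitters` from a scheduled job at least once per window so that no burst slips between runs. Pruning old rows in the same job goes some way towards keeping the extra database load in check.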