UnitedForums - UK Web Hosting Forum UnitedHosting Community Hosting Forums
3rd August 2008, 03:53 PM   #1
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
htaccess referer empty?

I know how to check for %{HTTP_REFERER} being equal to something, but how do you check for it being empty in .htaccess?

The logs show it as "-", but what about in .htaccess? Is it "-" or empty?

I've got an unidentified robot accessing one HTML page and one JavaScript file. It uses a standard browser user-agent with an empty referer and a different IP address each time, so it's screwing up my stats.
__________________
An old dog learning new tricks
3rd August 2008, 04:32 PM   #2
MrBen
Munky!
Join Date: Sep 2003
Location: nr Woking, England
Posts: 2,597
^$ should do the trick.
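A minimal sketch of the condition in use, assuming the .htaccess file sits in the document root and borrowing the page name myfile.php from later in this thread. With nothing between ^ and $, the pattern can only match an empty string:

```apache
RewriteEngine On

# ^ immediately followed by $ matches only an empty string,
# so this condition is true when the request carried no Referer value
RewriteCond %{HTTP_REFERER} ^$

# In .htaccess the rule pattern is tested against the path relative to
# this directory (no leading slash); "-" means no rewrite, [F] sends 403 Forbidden
RewriteRule ^myfile\.php$ - [F]
```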

Ben
__________________
Veterinary Practice Management System by SoftFooding
Internet Data Usage Calculator: Estimate your monthly bandwidth usage for your Internet connection.
3rd August 2008, 05:08 PM   #3
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
Thanks,

If the user agent has spaces in it, will the following work?

RewriteEngine On
RewriteCond %{REQUEST_URI} /inflatableretube\.php [NC]
RewriteCond %{HTTP_REFERER} ^$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1$ [NC]
RewriteRule .* - [F,L]
3rd August 2008, 05:53 PM   #4
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
OK, I think I've got it working. I just need to wait for the next attempt to find out. At least it doesn't reject it.

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0(.*)compatible(.*)MSIE(.*)6\.0(.*)Windows(.*)N T(.*)5\.1 [NC]
3rd August 2008, 07:41 PM   #5
Samizdata
Virtual Dilettante
Join Date: Nov 2006
Location: Planet Earth
Posts: 182
You can deal with the special characters by escaping them or enclosing the string in quotes.

Example 1:
Code:
RewriteCond %{HTTP_USER_AGENT} \(compatible;\ MSIE\ 5.0\)
Example 2:
Code:
RewriteCond %{HTTP_USER_AGENT} "(compatible; MSIE 5.0)"
Test the response by changing your browser's user-agent, or at www.wannabrowser.com.
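Combining a quoted user-agent pattern with the empty-referer check gives something like the sketch below. The quotes let the pattern contain spaces; the parentheses are still escaped because in a regular expression they are grouping characters, not literal brackets. The file name myfile.php is only an example:

```apache
RewriteEngine On

# No Referer sent
RewriteCond %{HTTP_REFERER} ^$
# Quoted pattern: spaces are allowed; "\(" and "\." match a literal "(" and "."
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1" [NC]
RewriteRule ^myfile\.php$ - [F]
```

The unquoted equivalent escapes each space with a backslash instead, for example Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0 and so on.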

You might also want to consider the IP and any HTTP headers if it's a nuisance bot.
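As one illustration of checking other headers: ordinary browsers normally send an Accept-Language header, and some bots that fake a browser user-agent do not. Treating a missing header as suspicious is a heuristic, not a rule, so the sketch below (reusing the JavaScript path from this thread) is an assumption to check against your own logs before relying on it:

```apache
RewriteEngine On

# %{HTTP:Name} reads any request header; a missing header expands to an empty string
RewriteCond %{HTTP:Accept-Language} ^$
# ...and the client claims to be IE6
RewriteCond %{HTTP_USER_AGENT} "MSIE 6\.0" [NC]
RewriteRule ^js/script\.js$ - [F]
```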

...
__________________
The Silhouettes - 50th Anniversary Website
3rd August 2008, 09:00 PM   #6
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
Thanks, I had overcome that problem, but I have now found that if you access a page directly, %{HTTP_REFERER} does not exist, so checking what it contains does not work. I need to be able to check for the existence of %{HTTP_REFERER}, and I haven't worked that one out yet.
If it was always the same IP that would have been much simpler, but the IP changes each time.
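In mod_rewrite there is no separate "does not exist" case to test for: when a request has no Referer header, %{HTTP_REFERER} expands to an empty string, so ^$ already matches direct hits (the "-" in the access log is just the log format's placeholder for a missing value). A positive test for "a referer was sent" is the pattern ".", which needs at least one character. A sketch, assuming the .htaccess is in the document root and using the two file names from this thread:

```apache
RewriteEngine On

# Any request that does carry a referer: stop here and serve it normally
RewriteCond %{HTTP_REFERER} .
RewriteRule ^ - [L]

# Only reached when the referer was absent or empty
RewriteRule ^(myfile\.php|js/script\.js)$ - [F]
```

On its own this refuses every direct hit on those two files, bookmarks included, so in practice it would be combined with a user-agent condition.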
4th August 2008, 09:16 AM   #7
TygerTyger
Lumberjack and OK
Join Date: Aug 2004
Posts: 833
Quote:
Originally Posted by percepts
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1$ [NC]
Are you sure that's correct? $ anchors the match to the end of the value, so unless the user-agent string ends with "Windows NT 5.1", it should be .*$ or extended further.
4th August 2008, 09:19 AM   #8
MrBen
Munky!
Join Date: Sep 2003
Location: nr Woking, England
Posts: 2,597
Or just leave the $ off.
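To illustrate the difference: a real IE6-on-XP header ends with ")" or with extra tokens, as in "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", so a pattern ending in "5\.1$" can never match it. Dropping the $ turns the pattern into a prefix match. A sketch with the anchored alternative commented out (myfile.php is only an example):

```apache
RewriteEngine On

# Anchored at both ends: matches only if the string stops right after "5.1",
# which a header ending in ")" or "; SV1)" never does
#RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1$" [NC]

# No $: a prefix match, so "...Windows NT 5.1)" and "...Windows NT 5.1; SV1)" both match
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1" [NC]
RewriteRule ^myfile\.php$ - [F]
```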

Ben
5th August 2008, 04:45 PM   #9
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
I think it's working now. The curious thing is that if you have a list of RewriteCond ORs, then a few ANDs, then an OR and then some more ANDs, it doesn't work.
So I had to split up my rewrites, and I now have the following, which I think works:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} favorstar [NC,OR]
RewriteCond %{HTTP_REFERER} favorstar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webalta [NC,OR]
RewriteCond %{HTTP_REFERER} webalta [NC]
RewriteRule .* - [F,L]

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^$ [NC]
RewriteCond %{HTTP_USER_AGENT} Mozilla/4\.0(.*)compatible(.*)MSIE(.*)6\.0(.*)Windows(.*)N T(.*)5\.1 [NC]
RewriteCond %{REQUEST_URI} /js/script\.js [NC]
RewriteRule .* - [F,L]

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^$ [NC]
RewriteCond %{HTTP_USER_AGENT} Mozilla/4\.0(.*)compatible(.*)MSIE(.*)6\.0(.*)Windows(.*)N T(.*)5\.1 [NC]
RewriteCond %{REQUEST_URI} /myfile\.php [NC]
RewriteRule .* - [F,L]
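For reference, a plain list of RewriteCond lines has no brackets for grouping. mod_rewrite reads the conditions top to bottom: [OR] joins a condition to the next one, and a condition without [OR] is ANDed with whatever follows. A mixed list is therefore evaluated as "(A OR B) AND (C OR D)", which is easy to misread. A sketch using names from this thread:

```apache
RewriteEngine On

# These four conditions mean:
#   (UA contains favorstar OR Referer contains favorstar)
#   AND (request is for myfile.php OR request is for js/script.js)
RewriteCond %{HTTP_USER_AGENT} favorstar [NC,OR]
RewriteCond %{HTTP_REFERER} favorstar [NC]
RewriteCond %{REQUEST_URI} /myfile\.php [NC,OR]
RewriteCond %{REQUEST_URI} /js/script\.js [NC]
RewriteRule ^ - [F]
```

Any other grouping, such as (A AND B) OR C, has to be split into separate rule blocks, as above.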

got the little B*******!!!


thanks
5th August 2008, 06:32 PM   #10
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
I'm just wondering what other people do in this scenario. Although I have set up .htaccess to stop one particular page and file from being accessed directly, there is a risk that genuine hits from people using a favourites link will be rejected if their user-agent is the same as the one I have blocked for those files.
Is there any way to track the robot down and get it stopped? I've looked at the IPs, but they come from all over the place, many countries, so I guess it must be randomising them somehow.
5th August 2008, 08:12 PM   #11
Samizdata
Virtual Dilettante
Join Date: Nov 2006
Location: Planet Earth
Posts: 182
Quote:
Originally Posted by percepts
I've looked at the IPs but they come from all over the place
That suggests that what you are seeing is a botnet - the hits come from "zombies", usually computers owned by innocent (if stupid) people who have no idea they have been compromised, or that their computer is attempting to access your site.

As for your .htaccess: you only need to turn on the RewriteEngine once, the L after the F is redundant (F already stops further rewriting), and you seem to have an accidental space in "N T".

Code:
# Turn on mod_rewrite
RewriteEngine On

# Unwanted user-agent list
RewriteCond %{HTTP_USER_AGENT} favorstar [NC,OR]
# Last one has no OR
RewriteCond %{HTTP_USER_AGENT} ^web [NC]
# Let them have robots.txt
RewriteCond %{REQUEST_URI} !^/robots\.txt
# But nothing else
RewriteRule .* - [F]

# if a request for the javascript file
RewriteCond %{REQUEST_URI} /js/script\.js [NC,OR]
# or the php file
RewriteCond %{REQUEST_URI} /myfile\.php [NC]
# has a blank referrer
RewriteCond %{HTTP_REFERER} ^$ [NC]
# and a particular user-agent
RewriteCond %{HTTP_USER_AGENT} Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.1\)$ [NC]
# choke on this
RewriteRule .* - [F]
Botnets don't usually give up in a hurry even if blocked - once the "controller" gives them a target they keep hammering away obliviously, and I usually block their IPs for a few months.
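For blocking zombie IPs on Apache 2.4, a sketch using mod_authz_core rather than mod_rewrite. The addresses are documentation placeholders (192.0.2.0/24 and 198.51.100.7), not real offenders, and Require in an .htaccess file needs AllowOverride AuthConfig (or All) for that directory. Apache 2.2, current when this thread was written, used Order and Deny from instead.

```apache
<RequireAll>
    # Everyone is allowed in...
    Require all granted
    # ...except these placeholder addresses (a whole /24 range and a single host)
    Require not ip 192.0.2.0/24 198.51.100.7
</RequireAll>
```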

Unless you are having a specific problem, there is no point blocking referrers from Webalta (a very large Russian search engine), as you are unlikely to get any (I do block their bot, though). The same goes for favorstar, I suspect, and that would also be blocked on my sites due to its user-agent.

In fact I block anything with "bot", "spider" or "crawl" that is not a mainstream search engine.
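A sketch of that kind of rule. The allowed list (googlebot, bingbot, slurp) is only an illustration of "mainstream" crawlers and would need adjusting; the robots.txt exception follows the same idea as in the code above:

```apache
RewriteEngine On

# Skip well-known search engines (example list only)
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot|slurp) [NC]
# Refuse anything else that calls itself a bot, spider or crawler
RewriteCond %{HTTP_USER_AGENT} (bot|spider|crawl) [NC]
# ...but always let robots.txt through
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^ - [F]
```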

And much else besides.

...
5th August 2008, 08:27 PM   #12
percepts
Senile Member
Join Date: Mar 2005
Posts: 1,009
Thanks.

The hits I'm getting from these bots seem pointless. They just keep reading the same two files umpteen times a day. It's no worry bandwidth-wise, just annoying because they get logged as real visitors.
5th August 2008, 08:36 PM   #13
Samizdata
Virtual Dilettante
Join Date: Nov 2006
Location: Planet Earth
Posts: 182
A couple more thoughts:

The fact that you have attracted a botnet usually means that one of your files has been identified (not necessarily accurately) as vulnerable. Check it is secure and change the filename, and make sure it does not get listed in any search engines (which is how most targets are found).

Because the user-agent you cited is a valid one you do risk blocking genuine users, so you might be better off making the changes above - blocking the compromised IPs won't matter as the file won't exist and they will get a 404, and zombies won't choose an alternative target by themselves.

I don't know how you analyse your logs but filtering out 404s should be easy enough.

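EDIT: Yes, as originally posted I forgot to escape some of the spaces in the user-agent string above (after 4\.0 and after NT); every literal space in that unquoted pattern needs a backslash.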

...

Last edited by Samizdata : 5th August 2008 at 08:50 PM.
All times are GMT.
Copyright © 1998-2008 United Communications Limited. All Rights Reserved. Registered in England and Wales 3651923 - VAT Reg No. 737662309