How do I write a robots.txt that stops bots from visiting URLs like:



1 Answer


I would recommend disallowing register and login entirely in robots.txt:

User-agent: *
Disallow: /qa/?qa=login
Disallow: /qa/?qa=register

However, be aware that robots.txt is just a friendly request to bots (mainly search engine crawlers) not to crawl particular pages. It does not stop anyone from ignoring your wishes.
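
To illustrate the "friendly request" part: a well-behaved crawler reads robots.txt and checks each URL against it before requesting the page. Here is a minimal sketch of that check using Python's standard urllib.robotparser module. The "User-agent: *" line, the example.com host and the bot name ExampleBot are assumptions made for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules from the answer. The "User-agent: *" line is an assumption,
# added so that the rules apply to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /qa/?qa=login
Disallow: /qa/?qa=register
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules allow this (hypothetical) bot to fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the site from the question.
    for url in (
        "https://example.com/qa/?qa=login&to=%3D42",
        "https://example.com/qa/?qa=register",
        "https://example.com/qa/?qa=questions",
    ):
        print(url, "->", "allowed" if may_fetch(url) else "disallowed")
```

A bot that never performs this check is not affected by the rules at all, which is why robots.txt alone cannot keep bad actors out.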

If you want to actually stop bad actors from accessing those URLs, you need other tools, for example fail2ban.

A fail2ban filter for the URLs you mentioned could, for instance, look like this:

[INCLUDES]
before = botsearch-common.conf

[Definition]
failregex = ^\[\] <HOST> \"(GET|POST|HEAD) \/qa\/\?qa=(login|register)&to=%3D[0-9]+ HTTP\/\S+\"

Of course you'll have to adjust the filter to the log format of your webserver, but the above should give you a general idea.
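
One way to sanity-check such a regex before deploying it is to try it against a sample log line. fail2ban ships the fail2ban-regex tool for exactly that; the sketch below only mimics the idea in plain Python. Assumptions: the <HOST> tag is replaced by a simplified IPv4-only pattern, the sample log lines are made up, and the leading "[]" stands for what remains of the bracketed timestamp after fail2ban has cut it out (as far as I understand its matching).

```python
import re

# The failregex from the filter above, unchanged.
FAILREGEX = (
    r'^\[\] <HOST> \"(GET|POST|HEAD) '
    r'\/qa\/\?qa=(login|register)&to=%3D[0-9]+ HTTP\/\S+\"'
)

# Simplified stand-in for fail2ban's <HOST> tag (IPv4 only, an assumption;
# the real tag also covers IPv6 addresses and host names).
HOST = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'

PATTERN = re.compile(FAILREGEX.replace('<HOST>', HOST))


def offending_host(line: str):
    """Return the client address if the line matches the filter, else None."""
    match = PATTERN.search(line)
    return match.group('host') if match else None


if __name__ == "__main__":
    # Hypothetical log lines in the format the regex expects.
    samples = [
        '[] 203.0.113.7 "GET /qa/?qa=login&to=%3D42 HTTP/1.1" 200 512',
        '[] 203.0.113.7 "GET /qa/?qa=questions HTTP/1.1" 200 512',
    ]
    for line in samples:
        print(offending_host(line), "<-", line)
```

If your real log lines do not match here, they will not match in fail2ban either, so adjust the regex to your log format first.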

Addendum: To avoid confusion, fail2ban is for stopping bots and other bad actors from continued access to your site. It does not prevent their first request to a particular URL. If you want to block access to particular URLs entirely, you'd have to look into approaches other than robots.txt or fail2ban. Redirecting or rewriting the URL(s) to an error page based on client IP and path might work for that, as might putting an application-level gateway between the internet and your website.
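
To make the "based on client IP and path" idea concrete, here is a rough sketch of the decision such a gateway would make, written as a small Python WSGI middleware. It is not something you can put in front of Q2A as-is (Q2A is a PHP application, and in practice a rule like this would usually live in the web server or reverse proxy configuration). The names block_auth_pages, BLOCKED_PAGES and ALLOWED_IPS, and the address in the allow list, are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical settings: which ?qa= pages to protect and who may still see them.
BLOCKED_PAGES = {"login", "register"}
ALLOWED_IPS = {"198.51.100.10"}


def block_auth_pages(app):
    """Wrap a WSGI app and answer 403 for protected pages from unknown clients."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        query = parse_qs(environ.get("QUERY_STRING", ""))
        page = query.get("qa", [""])[0]
        client = environ.get("REMOTE_ADDR", "")

        if (
            path.startswith("/qa/")
            and page in BLOCKED_PAGES
            and client not in ALLOWED_IPS
        ):
            # Reject before the request ever reaches the wrapped application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]

        return app(environ, start_response)

    return middleware
```

Unlike fail2ban, a check like this already rejects the first request, because it looks at every request before it reaches the site.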

I will give it a try.