Forum Moderators: goodroi
The info page says it's a distributed crawler, so just like my policy for the chronic robots.txt violator Grub, I banned the user agent and the entire IP block associated with the offending IP.
The problem seems to be that you are making inaccurate assumptions about the reasons for my post. I didn't post here to help you debug your bot (although I offered multiple times to answer specific syntax-related questions), I posted here to document a robots.txt violation. I think it is very telling that rather than accept whatever help is offered, you go on the defensive and attack anyone that doesn't submit to your testing procedures.
As a result, my offer to answer syntax-related questions is now withdrawn.
I posted here to document a robots.txt violation
And you have not provided the least bit of information on what your robots.txt is like; just how that qualifies as "documenting" is beyond me :(
you go on the defensive and attack anyone that doesn't submit to your testing procedures.
As if I had asked you for something weird, like a sample of your DNA: robots.txt compliance can be tested with the following:
1) robots.txt in question
2) list of URLs to be tested
If you know some better way of tracking bugs then be my guest - offer your solution. :)
As a result, my offer to answer syntax-related questions is now withdrawn.
Oh dear, but what exactly did you expect me to ask? It's like you have a million uniquely numbered balls mixed together, you pull out a few of them and hide them in your hands, and then ask me to guess the numbers: just how unreasonable is this? :(
Just to show some good will I will try:
1) Do you have trailing /'s in the Disallow statements in your robots.txt, i.e. the final slash in the Disallow line below:
User-agent: *
Disallow: /somedir/
2) What is the HTTP response code for a HEAD request on your robots.txt, i.e. (using Cygwin's utility):
HEAD [example.com...]
3) Do you use Unix style end of lines in robots.txt?
4) Do you have robots.txt's on all subdomains of your site?
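Questions 1 and 3 above can be answered offline from the raw bytes of a robots.txt file. What follows is only a minimal Python sketch, not anything taken from MJ12bot: the function names and the SAMPLE content are made up for illustration, and the parsing is deliberately simplistic.

```python
# Hypothetical sample: Windows-style line endings, one Disallow rule
# with a trailing slash and one without.
SAMPLE = b"User-agent: *\r\nDisallow: /somedir/\r\nDisallow: /private\r\n"


def line_ending_style(raw):
    """Classify the line endings used in a raw robots.txt body."""
    crlf = raw.count(b"\r\n")
    lf = raw.count(b"\n") - crlf   # bare LF (Unix style)
    cr = raw.count(b"\r") - crlf   # bare CR (old Mac style)
    kinds = [name for name, n in
             (("unix", lf), ("windows", crlf), ("old-mac", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def disallow_paths(raw):
    """Return every non-empty Disallow value, ignoring comments."""
    paths = []
    for line in raw.decode("utf-8", errors="replace").splitlines():
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def missing_trailing_slash(paths):
    """Return the Disallow paths that do not end with a slash."""
    return [p for p in paths if not p.endswith("/")]


print(line_ending_style(SAMPLE))
print(missing_trailing_slash(disallow_paths(SAMPLE)))
```

A rule without a trailing slash is not an error in itself; it simply matches every path that starts with that prefix, which is why the question is worth asking.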
Let it be clear that I certainly put some effort in it :)
I posted here to document a robots.txt violation
And you have not provided the least bit of information on what your robots.txt is like; just how that qualifies as "documenting" is beyond me
I offered to supply more information multiple times, you threw a hissy fit and wouldn't accept that offer.
As if I had asked you for something weird, like a sample of your DNA
A perfect example of why I'm not inclined to help you debug your software and why I rescinded my offer to help you. Your posts are filled with insults and sarcasm. Your behavior reminds me of that of spammers when they get reported -- shift the blame, insist on personal details.
If you know some better way of tracking bugs then be my guest - offer your solution.
Your lack of reading comprehension is disturbing. Or maybe it's just denial.
Just to show some good will I will try:
Let it be clear that I certainly put some effort in it
You've got to be kidding. Your "effort" and "good will" only came on page 2 of this thread and only after I had already withdrawn my offer because of your behavior. And your "effort" and "good will" post was still littered with insulting sarcasm.
Although I thought we were done before you went and edited more spin into your last post, I guess you'll still feel a need to have the last word, so go right ahead. If it's just more repetition of what's already been covered, my participation in this thread is done.
Your behavior reminds me of that of spammers when they get reported -- shift the blame, insist on personal details.
I hope you are not reporting spammers the same way you report bugs here by just stating that server X is spamming without offering any proof of your words whatsoever.
I can't validate a bug report without details and your refusal to provide robots.txt + URLs to validate your claim gives no other choice but to ignore your bug report. I wash my hands now: I did all I could to sort this problem, including readiness to publish robots.txt checking code (for peer review), and if that's not enough then tough luck: I have better things to do than to argue with someone who refuses to substantiate their claim.
I am forced to post all this to ensure that whoever comes across to read about MJ12bot not obeying robots.txt will see who here was cooperative and who was not. No sarcasm intended :)
I am forced to post all this to ensure that whoever comes across to read about MJ12bot not obeying robots.txt will see who here was cooperative and who was not.
Oh, what the heck, if you're going to keep spinning my actions over and over again, I guess I'll have to keep setting the record straight. You suggest that I wasn't being cooperative by not complying with your debugging procedures. I say you were being uncooperative by not accepting the help that I offered you. Is there any real need to keep hashing that out or do you think readers of the thread can decide for themselves?
I hope you are not reporting spammers the same way you report bugs here by just stating that server X is spamming without offering any proof of your words whatsoever.
More repetition. I did offer more proof. You chose not to accept that offer.
I can't validate a bug report without details and your refusal to provide robots.txt + URLs to validate your claim gives no other choice but to ignore your bug report.
That's more repetition. I already responded to that previously.
I wash my hands now: I did all I could to sort this problem
Not exactly. You refused and belittled my offer to supply you with more information.
if that's not enough then tough luck
Tough luck for who? The website administrators who haven't yet banned your bot? I couldn't care less if you fix the bugs in your software.
I have better things to do than to argue with someone who refuses to substantiate their claim.
But apparently not better things to do than keep putting your spin on offers of substantiation that were rejected.
You suggest that I wasn't being cooperative by not complying with your debugging procedures.
Please post here what information in your view is necessary to ascertain whether there is or there is not a bug in a robots.txt code used by my bot. You clearly know more about software debugging than me, so please share your knowledge for the benefit of all.
My view is probably old fashioned, but the following data is minimally needed to achieve required result:
1) robots.txt in question
2) URLs to check
You see, I am not being funny here, but if the code takes robots.txt + URLs to validate as input, then that's what is minimally required to verify whether it works or not.
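To make that input/output shape concrete, here is a minimal Python sketch of a checker that takes a robots.txt body plus a list of URLs and reports which URLs may be fetched. It uses Python's standard-library urllib.robotparser, not MJ12bot's own code, so it only illustrates the kind of test being asked for; the check_urls helper, the sample robots.txt and the example.com URLs are all assumptions.

```python
from urllib.robotparser import RobotFileParser


def check_urls(robots_txt, urls, user_agent="MJ12bot"):
    """Map each URL to True (may fetch) or False (disallowed)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Hypothetical inputs: 1) the robots.txt in question, 2) the URLs to test.
ROBOTS_TXT = """\
User-agent: *
Disallow: /somedir/
"""

URLS = [
    "http://example.com/index.html",
    "http://example.com/somedir/page.html",
    "http://example.com/somedir",
]

results = check_urls(ROBOTS_TXT, URLS)
for url, allowed in results.items():
    print("ALLOWED " if allowed else "DISALLOWED", url)
```

With this parser the rule is a plain prefix match on the path, so /somedir without the trailing slash is not covered by Disallow: /somedir/. Other parsers may differ in edge cases, which is exactly why both inputs are needed to reproduce a report.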
Just who is spinning here: me when I ask for minimally required data to verify code, or you who refuses to provide this publicly available data? It's not like I am asking for your c/c to validate a credit card payment module here! Just who is being unreasonable here?
I think it's my last post on this matter unless you are going to provide what is minimally required to check if code is faulty or not.
You clearly know more about software debugging than me, so please share your knowledge for the benefit of all.
I find it odd that you seem to think sarcasm would encourage someone to help you.
My view is probably old fashioned, but the following data is minimally needed to achieve required result:
Again, more repetition. Your debugging demands have already been covered in this thread.
Just who is spinning here: me when I ask for minimally required data to verify code, or you who refuses to provide this publicly available data?
You, when you claim that what might be the easiest data for you to verify code is the minimally required data. I offered data--you rejected it. And also you are spinning when you choose to ignore or belittle privacy concerns raised earlier in the thread.
Just who is being unreasonable here?
In my opinion, you are for the reasons stated above. I don't consider it unreasonable to withhold personally-identifiable information from an entity responsible for a bot that left evidence of malicious behavior on a site I administer. And I certainly don't think it unreasonable to withhold such information from someone who chooses sarcasm and insults as their primary style of communication.
I think it's my last post on this matter unless you are going to provide what is minimally required to check if code is faulty or not.
Good, I thought this thread was over many posts back. I think both sides of the argument have been thoroughly covered. We disagree on what is minimally required to check your code.
Jazzguy, you are not a programmer, are you? actually are you a "guy"? All this thread looked like a discussion with my wife, lol. :)
See, I understand your point, you're not MJ12bot's debugger and are not willing to help. That's fine, it's your choice. But also understand that nobody will stand by quietly while being accused of having buggy software without proof ;)
Lord Majestic, your project seems very interesting, it got my attention you're using MS technology on the distributed client.
Wow, congratulations.
you are not a programmer, are you? actually are you a "guy"?
Yes and yes (insult noted).
See, I understand your point, you're not MJ12bot's debugger and are not willing to help. That's fine, it's your choice. But also understand that nobody will stand by quietly while being accused of having buggy software without proof
That is certainly understandable. And he could have made his debugging requirements known without all of the insults and sarcasm. He also could have accepted the information that I offered him, and waited until after he had evaluated it before making a determination on whether or not it was sufficient.
And he could have made his debugging requirements known without all of the insults and sarcasm.
I asked for robots.txt in the 2nd post in this thread, I fail to see any sarcasm in it, here it is:
Just out of interest, can you post your robots.txt?
You told me off, and since the original post was not about my bot I went away, as I generally don't like to get into flame wars. However, you proceeded to mention my bot being banned, which prompted me to ask for robots.txt again since it concerned my code, which I could check.
I am looking at my crawler stats right now and I see that out of 860k URLs checked in this session just over 33k were disallowed by robots.txt, so clearly my bot does not just ignore robots.txt. Since I know for a fact that it works in principle, I can't just simulate a case that I have no idea about.
My bot is legit and every HTTP request contains a link to a page where more information is given and anyone has a chance to contact me with any problems (like other people did). If you choose to remain anonymous and refuse to share publicly available information then it's your choice - just don't expect anything to happen with your "report" as it lacks credibility, not least due to your insistence on hiding behind anonymity, but mainly due to the lack of data minimally necessary to verify it.
Anyway, I think I am going to publish code that does the job (checks if URL should not be retrieved for a given robots.txt) so that anybody who questions MJ12bot's support for robots.txt can see for themselves. We come in peace :)