Comment Spam Compiled and Interpreted

Following on from Automated Blog Comment Spam? and the feedback (many thanks), figured I’d compile (and interpret) some of it into something more ordered.

Gnomes or Robots?

The answer to who (or what) is posting comment spam seems to be both: sad gnomes with little life, and automated scripts / programs. Given that, my conclusion is still that different approaches are required to prevent human-submitted spam vs. script-submitted spam (emphasis on prevent – see “Remove the Incentive” below).

I have yet to find any hard figures, but I imagine the more serious problem is spam automation. That is based on anecdotal evidence about attacks on some of the well-known blogging apps, as well as on solutions people have adopted that dramatically reduced spam. Obviously any automated process can generate volumes vastly greater than anything possible via manual data entry.

No Bars to Legitimate Use

…or the “Accessibility Curse”. There seems to be general agreement that posting a comment on a blog must be easy for legitimate users. In fact, the ideal scenario is that legitimate users are not impacted at all by whatever spam protection mechanisms are in place.

Some people are willing to require user sign-up / authentication and have found that’s already enough to discourage spammers. The risk, though, is discouraging legitimate use. Also, as sites like Hotmail have discovered, it’s quite possible to automate registration and login with scripts, although it’s a lot more work. I really think this suggests that making your comment posting API more complex is enough to discourage today’s breed of spammers (more on that shortly).

There was some talk about using captchas to sift out the humans from the scripts. The key arguments against them focused on accessibility for legitimate users: are the images actually readable? What about the visually impaired? A couple of answers there – check out the ASCII-based captchas Wez uses on his blog, which are very readable but would still require a PhD in Computer Science to analyse programmatically. Also check out Colin’s thoughts on Turing, With Audio.

A while back Christian raised another question here, about ingenious ways to circumvent captchas. People seem to have reacted to this as “The End of Captchas!”. In fact I expect this has happened only rarely, and it’s not difficult to stop anyway: either research hotlinking prevention or use Wez’s ASCII captchas, which are, by nature, not hotlinkable.
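On the hotlinking side, the usual trick is to refuse to serve the captcha image unless the request’s Referer header points back at your own site. Below is a minimal sketch, assuming a hypothetical `ALLOWED_HOSTS` set and request headers available as a plain dict. The Referer header is optional and easy to forge, so this only stops another site from embedding your captcha for its own visitors. It does nothing against a script making requests directly.

```python
from urllib.parse import urlparse

# Hypothetical: the hosts your blog pages (and comment forms) are served from.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def may_serve_captcha(headers: dict, allow_missing: bool = False) -> bool:
    """Decide whether to serve the captcha image for this request.

    A browser loading the <img> sends the page it came from as the Referer;
    if another site hotlinks the captcha, the Referer names that site instead.
    """
    referer = headers.get("Referer")
    if not referer:
        # Some browsers and proxies strip the Referer. Rejecting these is
        # stricter, but may inconvenience a few legitimate users.
        return allow_missing
    return urlparse(referer).hostname in ALLOWED_HOSTS
```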

Although it’s possible to implement captchas in a secure and accessible manner, they’re still an extra step for legitimate users, and I believe they’re overkill for the problem. What’s required is not sifting out the human users but rather sifting out the legitimate user agents (web browsers) from the scripts…

Preventing Automation

For me there’s now enough anecdotal evidence to suggest that making your posting API a little more complex is enough to block scripts that post spam automatically.

One comment mentioned Pete Bowyer’s simple but effective solution, which requires a single extra step from users with a web browser but would need more than just LWP::Simple to script.

Elsewhere a WordPress user described the immediate effect on spam of simply renaming the POST URL. One of the comments following from that was particularly interesting;

The renaming trick works for most of the spam robots – as long as you remember to delete wp-comments-post.php off your server too as somebody mentioned :p There are however, a few robots out there which seem to parse the entire index.php file to find what the comments file name is, I’ve also changed the comment form variables but still a few get through probably because the robot parses the comments form and gets the variable names too. So, as somebody mentioned, this is like the cold war where you have to adapt to constantly keep ahead of the spammers.

For those that go so far as parsing forms, Spam Stopgap Extreme;

This prevents spammers from automatically scraping the form, because anyone wanting to submit a comment *must* execute the javascript md5.
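The exact scheme Spam Stopgap Extreme uses isn’t described here, but the general idea can be sketched on the server side. Issue a one-time nonce with the form. The page’s Javascript computes an md5 over it and writes the result into a hidden field. On POST, the server recomputes the value and compares. The function names, the nonce-plus-post-ID input and the in-memory store are all assumptions for illustration. md5 isn’t doing any security work here. It is just a computation that a form-scraping script can’t reproduce without running (or reimplementing) the page’s Javascript.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store of issued nonces -> post id; a real blog would persist these.
_issued: dict[str, str] = {}


def issue_challenge(post_id: str) -> str:
    """Create a one-time nonce to embed in the comment form for this post."""
    nonce = secrets.token_hex(8)
    _issued[nonce] = post_id
    return nonce


def expected_token(nonce: str, post_id: str) -> str:
    """The value the page's Javascript md5 routine is assumed to compute."""
    return hashlib.md5((nonce + post_id).encode("utf-8")).hexdigest()


def verify(nonce: str, post_id: str, token: str) -> bool:
    """Accept the POST only if the nonce was issued for this post and the token matches."""
    if _issued.pop(nonce, None) != post_id:  # unknown, reused, or wrong post
        return False
    return hmac.compare_digest(expected_token(nonce, post_id), token)
```

Because each nonce is removed when it is checked, a script can’t simply replay one captured form submission over and over.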

That leaves spammers hunting for a Javascript runtime they can use… When I suggested something similar, people of course pointed out that some users surf with Javascript disabled. Another angle might be something like this;

…with a form like;

The knowledge of which form fields are actually meant to be filled in is contained in the CSS. If scripts get as far as parsing that, it could be made more difficult by relating styles to tags via CSS class selectors. The uniqueId in the POST URL identifies which set of fields contains the real data, while a script that parses the form could be fooled into submitting data in the wrong fields, thereby identifying itself. Anyway, it serves as yet another possible solution in the arms race…

Blacklisting

Thanks to a tip-off from Amit, it turns out there is already a central service to help with blacklisting, described here. There’s also this WordPress plugin, which uses some of the RBL (Realtime Blackhole List) services that evolved for dealing with email spam.
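RBL lookups work over plain DNS. Reverse the octets of the commenter’s IPv4 address, append the list’s zone, and do an ordinary A-record lookup. If the name resolves, the address is listed. Below is a minimal sketch, with `dnsbl.example.org` standing in as a hypothetical zone for whichever list a plugin actually queries.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a DNSBL about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """True if the list's zone returns an A record for the commenter's IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treated as "not listed"
```

Failed lookups are treated as “not listed”, so an unreachable list lets comments through rather than blocking everyone. That is a design choice rather than a requirement.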

If we’re headed in that direction, I guess techniques that have been employed to combat email spam (e.g. Bayesian filters) are worth researching.
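To give a flavour of the Bayesian approach, here’s a toy two-class naive Bayes classifier with add-one smoothing. You train it on comments you’ve already marked as spam or ham. It’s a sketch only. Real filters, like those used for email, handle tokenisation, thresholds and training data far more carefully.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-case words and numbers; real filters tokenise far more carefully."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Toy two-class (spam/ham) naive Bayes classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text: str, label: str) -> None:
        self.words[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab_size = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_docs = sum(self.docs.values())
        log_score = {}
        for label, counts in self.words.items():
            total = sum(counts.values())
            # Smoothed prior P(label), then one smoothed likelihood per token;
            # the extra +1 leaves room for words never seen in training.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for word in tokens(text):
                score += math.log((counts[word] + 1) / (total + vocab_size + 1))
            log_score[label] = score
        # P(spam | text) = 1 / (1 + exp(ham - spam)), computed without overflow.
        diff = log_score["ham"] - log_score["spam"]
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))
```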

Regarding RBLs and blacklisting, this paper (the subject being email spam) highlights some of the problems. In fact, reading that, almost all of the problems being described, apart from “Collateral Damage and Legitimate Users”, relate to RBLs being centralized services.

Bearing that in mind, Marcus’s suggestion could well be the way to go;

RSS would provide a distributed solution.

Not just that, it attaches a name to the data, allowing “consumers” to pick who they trust for their blacklists, rather than a central service where data is provided anonymously.

There’s also a built-in mechanism for keeping the data fresh and managing bottlenecks: each blogger keeps their own blacklist, which is periodically updated from other people’s feeds. There’s probably a Web Service-killing insight hidden in there as well – something like: “A distributed and scalable Web is not a normalized Web” – but that’s another story…
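The merge step could be sketched as follows. It assumes, hypothetically, that each trusted blogger publishes their blacklist as an RSS 2.0 feed with one entry per `<item><title>`. The sketch records which named sources listed each entry, so you can decide how much weight each one gets. Fetching the feeds is left out to keep it self-contained.

```python
import xml.etree.ElementTree as ET


def entries_from_feed(rss_xml: str) -> set[str]:
    """Extract blacklist entries, assuming one entry per <item><title>."""
    root = ET.fromstring(rss_xml)
    return {
        title.text.strip().lower()
        for title in root.iterfind("./channel/item/title")
        if title.text and title.text.strip()
    }


def merge_blacklists(own: set[str], trusted_feeds: dict[str, str]) -> dict[str, set[str]]:
    """Map each entry to the sources ("me" or a named feed publisher) that listed it."""
    merged: dict[str, set[str]] = {}
    for entry in own:
        merged.setdefault(entry.strip().lower(), set()).add("me")
    for publisher, rss_xml in trusted_feeds.items():
        for entry in entries_from_feed(rss_xml):
            merged.setdefault(entry, set()).add(publisher)
    return merged
```

Keeping the publisher names is what makes this different from an anonymous central list. You could, for example, only act on entries that two or more people you trust agree on.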

Remove the Incentive

Simon pointed out how he uses redirects to eliminate the PageRank benefit of links in comments, basically preventing Googlebot from indexing them.
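One way to do the redirect trick is to rewrite each link in a comment so that it points at a local redirect script. That script would then be excluded from crawling, for example via robots.txt; this is an assumption about the setup, not a description of Simon’s. The `/redirect.php?url=` path is hypothetical, and the regex only handles double-quoted `href` attributes. Rewriting links when the comment HTML is generated would be more robust than patching HTML after the fact.

```python
import html
import re
from urllib.parse import quote

# Hypothetical local redirect script, assumed to be disallowed in robots.txt
# so crawlers never follow it through to the linked site.
REDIRECT_PATH = "/redirect.php?url="

# Only double-quoted href attributes are handled in this sketch.
HREF = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)


def rewrite_links(comment_html: str) -> str:
    """Point every href in a comment at the local redirect script."""
    def to_redirect(match: re.Match) -> str:
        target = html.unescape(match.group(1))  # e.g. "&amp;" back to "&"
        return 'href="' + REDIRECT_PATH + quote(target, safe="") + '"'

    return HREF.sub(to_redirect, comment_html)
```

For example, `<a href="http://spam.example/?a=1&amp;b=2">` becomes `<a href="/redirect.php?url=http%3A%2F%2Fspam.example%2F%3Fa%3D1%26b%3D2">`.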

Personally, I still think that eliminating PageRank is the best solution, simply because it attacks the economics of comment spam. As email spam has shown, as long as there’s an economic incentive, spammers will take more and more advanced steps to avoid filters and counter-measures.

Simon’s approach seems to have been highly effective, judging from the lack of spam he gets. Technically, I guess this violates the principle of “no bars to legitimate use”: what if you want legitimate users to be able to post links and have Google associate PageRank with them? It also assumes you’re dealing with “smart spammers” who realise what you’ve done. It doesn’t actually prevent spam, and a “dumb spammer” may post anyway.

Markus made a similar remark;

There is a third party involved here that could do a lot to help. If we had a simple way of reporting the spam links to Google then the incentive could be destroyed at source. Google could drop any spam promoted website.

To an extent that’s already a possibility, as Simon described here.

Economics

Diana C. told the story of how she dealt with one comment spammer (at the end);

Within 24 hours, I got a response from a wholesale pill supplier, who explained that they received copies of the diet-pills web site’s emailed feedback, and they apologized for the spam, and told me that they were immediately discontinuing their wholesale relationship with the diet-pills web site because they have a strict anti-spam policy.

If that’s representative of comment spammers, they’re simply acting as (semi-authorized) middlemen in a marketing process. One non-technical approach may be to shift the pressure onto the suppliers with “naming and shaming” for those who fail to keep their own house in order.

Finally, an amusing economic spin for those looking for opportunities is Kitten

Category: programming | Date: 2004-12-28

