Tuesday, August 7, 2012

Can You Break My CAPTCHA?

By Gursev Kalra.

I wrote a simple CAPTCHA scheme and wanted to share it with the awesome security community as a CAPTCHA-breaking exercise. To solve the CAPTCHA, an individual (or machine) has to enter only the characters with a white border and ignore the other text. I understand that at this stage the lettering may sometimes be hard to read; we're working on that, but for now, let's see how far we can get with this POC design. Here are a couple:

Anti-Automation Mechanisms

The main intent was to make Noise removal, Segmentation and Classification interdependent and increase the complexity of automatic solvers. Here are the anti-automation mechanisms in this CAPTCHA:

  1. Closeness of noise to real text : The noise is of the same style (i.e. alphanumeric) as the real text and is superimposed on it. The noise font is the same size as the text to be solved. This makes the noise harder to remove, and regular noise removal algorithms may be ineffective. The random background line serves to highlight the white border and also acts as a noise source.
  2. Hard to Segment : The CAPTCHA solution and noise are mixed up in an unpredictable fashion, with random positioning variation on the X and Y axes. It may therefore be hard to segment the characters from each other and single out individual characters for classification.
  3. Anti-Classification : While writing custom solvers, statistical analysis is often performed on the CAPTCHA text by plotting the pixel densities against the X and Y axes to identify the correct characters. When text is superimposed with no clear demarcation of the text boundaries, these classification techniques do not work.
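
To make the pixel-density idea from point 3 concrete, here is a minimal sketch of projection-profile segmentation. It assumes the image has already been reduced to a binary grid (1 = ink, 0 = background); the function names and the two tiny sample grids are hypothetical and only illustrate why superimposed text leaves no empty columns to split on.

```python
def projection_profiles(image):
    """Return (column_sums, row_sums) for a binary image (1 = ink, 0 = background)."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return col_sums, row_sums


def split_on_gaps(profile):
    """Return (start, end) pairs for runs of non-zero density; end is exclusive."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            segments.append((start, i))    # an empty column closes the run
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


# Hypothetical toy data: two glyphs separated by empty columns...
separated = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# ...versus two glyphs superimposed, leaving no empty column between them.
overlapping = [
    [1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1],
]

separated_cols, _ = projection_profiles(separated)
overlapping_cols, _ = projection_profiles(overlapping)
separated_segments = split_on_gaps(separated_cols)
overlapping_segments = split_on_gaps(overlapping_cols)
```

On the separated grid the column profile drops to zero between the glyphs, so two segments come back. On the overlapping grid the profile never reaches zero and the whole image is returned as a single segment, which is the effect this CAPTCHA aims for.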


You can find 200 samples for download here:

Where to start?

Since this is a new scheme, you may not be able to use any of the popular CAPTCHA breaking tools out there to defeat it. Instead, one approach is to use graphics editing software such as Adobe Photoshop to modify a sample. Once you have a set of actions (e.g. apply effect X, then apply filter Y) that you can repeat on multiple samples to reliably solve them, you have a potential solution! Then just post your solution or questions in the comments below and we can discuss!
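
A repeatable set of actions can also be written down as a small pipeline of image transforms. The sketch below is only an illustration: it assumes a grayscale image stored as rows of 0-255 values, and the two steps (a threshold at 200 followed by an inversion) are hypothetical placeholders, not a known solution to this CAPTCHA.

```python
def threshold(level):
    """Build a step that maps pixels >= level to 255 and all others to 0."""
    def step(image):
        return [[255 if pixel >= level else 0 for pixel in row] for row in image]
    return step


def invert(image):
    """Flip dark and light pixels."""
    return [[255 - pixel for pixel in row] for row in image]


def apply_pipeline(image, steps):
    """Apply each step in order, so the same actions can be replayed on any sample."""
    for step in steps:
        image = step(image)
    return image


# Hypothetical action list: keep only near-white pixels, then invert.
pipeline = [threshold(200), invert]

# Hypothetical 2x3 grayscale sample.
sample = [
    [250, 120, 30],
    [210, 90, 255],
]
result = apply_pipeline(sample, pipeline)
```

Keeping the steps in a list makes it easy to try a different order or swap in another filter, and then run the same list over all 200 samples to see how reliably it works.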


  1. The trick with this captcha would be to create sample images of every character in each font used, and do pattern matching onto your challenges. The character with the closest match wins. It's relatively easy to classify edges, so that gives plenty of optimisation room.

    You need to do some warping of the text to prevent this.

  2. @codeinsecurity, Thank you for your inputs.

    1. I attempted text warping but that was altering text beyond recognition for a good number of CAPTCHAs. But yeah, it will be a good thing to include and I need to experiment more with warping coefficients.
    2. If this CAPTCHA gets rolled out, the font size will also be varied.
    3. I am interested to know how pattern matching would be successful if both the noise and solution characters are nearly visible?

    1. The noise characters are negated, since they don't match the pattern characters - they'd be interrupted by real characters.

      If you include random character sizes, this will make it more difficult, as long as you vary the size per-character. I can see that the final character tends to be reasonably accessible, so working back from there to identify size might be possible. In fact, heuristically guessing the font size based on any single highly-matching pattern would be relatively trivial.

      I think that if you include the normal "I can't read this, give me another one" feature, you're going to be dealing with a situation where the bot will simply attempt to classify all characters to within a set probability, and ask for a new one if it cannot.

      Whilst this is all quite interesting, I think you may have missed out on a relatively recent revelation: CAPTCHAs are dead. It's very cheap to hire 3rd world labour to solve them, and it's even cheaper to set up a clickjacking botnet to get unsuspecting users to solve them for you. The only reason they're still around is to block the low-hanging fruit attacks. If your CAPTCHA is custom and is not defeated by absolutely trivial automated analysis, a bot probably won't try.

    2. Thank you for your inputs. So I will add random font size (and maybe even font) for each character and adjust the warping coefficients to make it harder.

      I do agree on the 'CAPTCHAs are dead' against the human solvers or clickjacking botnets.

  3. I coded a little attack to remove the noise this afternoon. It doesn't work on all the samples and needs some refinement, but it's the first time I've tried to play with CAPTCHAs :)
    Here are 2 "cleared" images: pic.twitter.com/lbzY8rfe

    1. Very neat, Eloi. Thank you for taking the time. So how did you approach it? Do you mind sharing your algorithm?

  4. I noticed the white borders appear to have a fixed width. It may be possible to detect the border areas which are between 2 blobs. Then for each of these, you could decide which of the 2 blobs is most likely to be the character. I'm sure there would be a lot more to it than that and it probably wouldn't work as well as expected, but I might give it a try sometime.
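
The fixed-width border idea in the last comment could be prototyped along these lines. This is a rough sketch under stated assumptions: one row of grayscale pixels, pure white (255) for the border, and a known border width. The function name and the sample row are hypothetical, and deciding which neighbouring blob is the real character is left out.

```python
def border_runs(row, border_width, white=255):
    """Return (start, end) pairs for runs of white pixels that are exactly
    border_width long and have a non-white pixel on both sides (end is exclusive)."""
    runs = []
    i = 0
    n = len(row)
    while i < n:
        if row[i] == white:
            j = i
            while j < n and row[j] == white:
                j += 1                      # extend to the end of the white run
            if j - i == border_width and i > 0 and j < n:
                runs.append((i, j))         # right width and flanked on both sides
            i = j
        else:
            i += 1
    return runs


# Hypothetical row: a 2-pixel run between blobs, a 3-pixel run, and a run at the edge.
row = [0, 0, 255, 255, 0, 0, 255, 255, 255, 0, 255, 255]
candidates = border_runs(row, border_width=2)
```

Only the first run qualifies here: the second is too wide and the third touches the image edge, so it is not between two blobs. Real samples would likely need a tolerance on both the white level and the width.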