Defeating reCAPTCHA with Clarifai
Thomas Clarke, Lily Elshaktori, Christopher Green, Sam Thomas
University of Birmingham
- reCAPTCHA is really irritating!
- Can we break reCAPTCHA's image-recognition-based challenges using Clarifai's object recognition API?
- Extract reCAPTCHA iframe from target website;
- Parse webpage:
  - Extract search tag;
  - Extract image grid;
- Split image grid into component images;
- Query and obtain tags from component images using Clarifai API;
- Correlate tags:
  - Use the Porter stemming algorithm to remove morphological and inflectional suffixes;
  - Augment each tag set with the stemmed forms and synonyms of its tags;
  - Find exact matches by checking the cardinality of each image's tag-set intersection with the base (search) tag set;
  - Find partial matches by comparing each remaining image's tag-set intersection with the tags of the exact-match images against a threshold;
- Generate list of matched images.
- Future work: construct a browser plug-in that automatically intercepts reCAPTCHA iframes and solves challenges.
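
The tag-correlation steps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `simple_stem` is a crude suffix stripper standing in for the Porter stemmer, and the `SYNONYMS` table, the `threshold` value, and all function names are hypothetical. The tag sets are assumed to have already been returned by the object recognition API.

```python
from typing import Dict, List, Set

# Hypothetical synonym table; the source of synonyms is not specified here.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "street": {"road"},
}


def simple_stem(word: str) -> str:
    """Crude suffix stripper used as a stand-in for the Porter stemmer."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def augment(tags: Set[str]) -> Set[str]:
    """Extend a tag set with the stemmed forms and synonyms of its tags."""
    out: Set[str] = set()
    for tag in tags:
        tag = tag.lower()
        stem = simple_stem(tag)
        out.update({tag, stem})
        out.update(SYNONYMS.get(tag, set()))
        out.update(SYNONYMS.get(stem, set()))
    return out


def correlate(search_tag: str, image_tags: List[Set[str]],
              threshold: int = 2) -> List[int]:
    """Return the indices of images that match the search tag."""
    base = augment({search_tag})
    augmented = [augment(tags) for tags in image_tags]

    # Exact matches: the image's tags intersect the base tag set.
    exact = [i for i, tags in enumerate(augmented) if len(tags & base) > 0]

    # Pool the tags of all exact matches.
    pool: Set[str] = set()
    for i in exact:
        pool |= augmented[i]

    # Partial matches: enough tags shared with the exact-match images.
    partial = [i for i, tags in enumerate(augmented)
               if i not in exact and len(tags & pool) >= threshold]

    return sorted(exact + partial)


demo = [
    {"car", "road", "wheel"},           # exact: contains "car"
    {"tree", "sky"},                    # no match
    {"wheels", "street", "building"},   # partial: shares "wheel" and "road"
    {"vehicle", "grass"},               # exact: synonym of "car"
]
print(correlate("cars", demo))  # → [0, 2, 3]
```

The threshold trades recall for precision: a lower value admits more partial matches, at the cost of more false positives.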