First and foremost, congratulations to Yoko Harada, one of this year’s Ruby Heroes! She has made substantial contributions to the Ruby community, particularly through her work on the open source projects JRuby and Nokogiri. Yoko inspired me to share an example of one of the many projects made possible by free and open source software, and by Nokogiri specifically.
In July 2011, Girl Develop It hosted a Hackathon for Humanity (in the Hamptons, thanks to Deborah Jackson & JumpThru!), where Nathan Hurst and I used Nokogiri to parse data from Backpage.com and flag potential evidence of human trafficking. We used wget to pull the posts and Nokogiri to parse the data into a Rails application and Postgres database, which I then queried to identify potential child prostitution advertisements. We were able to flag hundreds of them to be investigated and removed by Backpage.com.

The project readme file expands a bit on our methodology. Basically, we’d read that many human trafficking rings diversify their criminal businesses by engaging in many types of trafficking and other illegal activity, so we cross-referenced phone numbers across different subject areas on the site. We found dozens of posts that, once examined by the human eye, were clearly not legal, and hardly any false positives. This method proved quite accurate and a lot more manageable than manually sorting through all of the posts.

We definitely wouldn’t have pulled that off, especially not in a weekend hackathon, without Nokogiri being free, open source, and delightfully easy to use.

The project is on GitHub at https://github.com/girldevelopit/traffic-report. Note that this is the code from a one-time weekend project, but if you’re interested in forking it, or just learning from it and building a service to continue this work, please do! We’re happy to share any undocumented lessons or answer questions; just reach out by email (nathan@ohours.org or vanessa@developersforgood.org).
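
For anyone curious what the phone-number cross-referencing might look like, here is a minimal sketch in plain Ruby. It is not the code from the repository: the `Post` struct, the `PHONE_PATTERN` regex, and the method names are all made up for illustration, and it assumes the posts have already been parsed into a section name and body text (the step we used Nokogiri for).

```ruby
require "set"

# Hypothetical record for an already-parsed post. The real project kept
# posts in a Rails app backed by Postgres; this struct is only a stand-in.
Post = Struct.new(:id, :section, :body)

# Loose pattern for US-style phone numbers such as "(555) 123-4567" or
# "555.123.4567". This is an assumption, not the pattern the project used.
PHONE_PATTERN = /(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)/

# Returns the distinct phone numbers found in a piece of text,
# normalized to ten digits with no punctuation.
def extract_phones(text)
  text.scan(PHONE_PATTERN).map(&:join).uniq
end

# Maps each phone number to the set of site sections it appears in,
# keeping only numbers seen in at least `min_sections` different sections.
def cross_section_phones(posts, min_sections: 2)
  sections_by_phone = Hash.new { |hash, phone| hash[phone] = Set.new }
  posts.each do |post|
    extract_phones(post.body).each do |phone|
      sections_by_phone[phone] << post.section
    end
  end
  sections_by_phone.select { |_phone, sections| sections.size >= min_sections }
end

# Returns the posts that contain a number shared across sections.
# These are candidates for human review, not conclusions.
def flag_posts(posts)
  shared = cross_section_phones(posts)
  posts.select do |post|
    extract_phones(post.body).any? { |phone| shared.key?(phone) }
  end
end
```

Calling `flag_posts` on an array of `Post` records returns only the posts whose phone number also shows up under a different section. In practice the same idea could be expressed as a SQL query that groups by phone number, and every flagged post would still need a person to look at it.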