I recently wrote a simple PHP web app for creating Twitter data collection campaigns. It lets you download hundreds of thousands of tweets [tested with up to 1.2 million tweets on an Amazon EC2 instance] matching a specific set of keywords. The data it produces is stored in a MySQL database, from which it can be exported to CSV or any other format of your choice.

The collected data is formatted and stored in a MySQL database. The following fields are recorded:

Fields per Tweet



Fields per Twitter User


The source code/app can be downloaded from here.

The instructions for its usage are detailed in this blog post.

Step 1: Create a database
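
If you prefer the terminal to a GUI tool, here is a minimal sketch of creating the database with the standard `mysql` command-line client. The function name `create_tfdc_db`, the database name `tfdc`, and the `root` account are placeholders I made up for illustration; the app does not require these particular names, so use whatever matches your server.

```shell
# Create an empty database for the collector if it does not exist yet.
# Assumptions: the mysql client is installed and the "root" account may
# create databases. -p makes the client prompt for the password.
create_tfdc_db() {
  local db_name="${1:-tfdc}"   # "tfdc" is only a placeholder default
  mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS \`${db_name}\`;"
}

# Usage: create_tfdc_db tfdc
```

Whatever name you pick here is the one to enter in db/db_config.php in Step 3.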


Step 2: Create a new Twitter App [link]. Get your app's credentials from Twitter and add them to db/140dev_config.php.

Step 3: Edit db/db_config.php to match your own MySQL database server settings



Step 4: Import the database structure [mysql_database_schema.sql], found inside the db folder, into your MySQL database.
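
The import can also be done from a terminal by feeding the schema file to the `mysql` client. This is a rough sketch: the function name `import_tfdc_schema`, the `tfdc` database name, and the `root` account are my own placeholders. The default path assumes you run it from the project's root directory.

```shell
# Load the schema file into an existing database.
# Assumptions: the database was already created in Step 1 and the
# "root" account may write to it. -p prompts for the password.
import_tfdc_schema() {
  local db_name="${1:-tfdc}"                             # placeholder name
  local schema_file="${2:-db/mysql_database_schema.sql}"
  if [ ! -f "$schema_file" ]; then
    echo "Schema file not found: $schema_file" >&2
    return 1
  fi
  mysql -u root -p "$db_name" < "$schema_file"
}

# Usage (from the project root): import_tfdc_schema tfdc
```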

Before proceeding, make sure that libssh2 is installed on your server. If it isn’t, install it now: the script uses the ssh2 library to open an SSH session to its host machine and run the data collector script in a new screen session.

Ubuntu: sudo apt-get install libssh2-php
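
To check whether the extension is actually loaded, you can look for `ssh2` in the module list printed by `php -m`. The helper name `php_has_ssh2` is just something I picked for this sketch. Keep in mind that this inspects the command-line PHP, which on some setups is configured separately from the PHP your web server runs.

```shell
# Succeeds (exit status 0) if the PHP CLI lists the ssh2 extension.
# grep -q: quiet, -i: ignore case, -x: match the whole line only.
php_has_ssh2() {
  php -m | grep -qix 'ssh2'
}

# Usage:
#   if php_has_ssh2; then echo "ssh2 is loaded"; else echo "ssh2 is missing"; fi
```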

OPTIONAL: If you don’t want the script to SSH into your machine, you’ll have to run your campaign yourself by running the following commands in a terminal on your host machine:

_php gettweets.php &

_php parsetweets.php

Replace the campaign name in these commands with the name you chose while creating the campaign.

Important: Don’t forget to restart your web server once the installation is complete.


Update the SSH credentials in the file NiceSSH.class.php to match those of your server.



Step 5 [the last step!]: Open the TFDC project directory on your web server in a browser, and click on "Create a new campaign".


