I recently wrote a simple PHP web app for creating Twitter data collection campaigns. It lets you download hundreds of thousands of tweets [tested with up to 1.2 million tweets on an Amazon EC2 instance] that match a specific set of keywords. The data is stored in a MySQL database and can then be converted to CSV or any other format of your choice.
The collected data is stored in a structured form. Here are the fields that are recorded:
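As a rough illustration of the CSV conversion, here is a minimal sketch that turns the tab-separated output of the `mysql` client's `--batch` mode into quoted CSV. The helper name `tsv_to_csv` is my own, and the table name `tweets` in the usage comment is an assumption, so check your database for the actual table names.

```shell
#!/bin/sh
# Convert tab-separated rows on stdin to CSV on stdout.
# Every field is wrapped in double quotes, and embedded quotes are doubled.
tsv_to_csv() {
    awk -F '\t' '{
        for (i = 1; i <= NF; i++) {
            gsub(/"/, "\"\"", $i)
            printf "%s\"%s\"", (i > 1 ? "," : ""), $i
        }
        print ""
    }'
}

# Usage (hypothetical user, database and table names):
#   mysql --batch -u myuser -p mydb -e 'SELECT * FROM tweets' | tsv_to_csv > tweets.csv
```

In batch mode the client escapes tabs and newlines inside values as `\t` and `\n`, so those sequences will appear literally in the CSV unless you post-process them.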
Fields per Tweet
Fields per Twitter User
The source code/app can be downloaded from here.
The instructions for its usage are detailed in this blog post.
Step 1: Create a database
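If you prefer the command line for this step, a minimal sketch could look like the one below. It assumes the `mysql` command-line client is installed and that your account is allowed to create databases. The database name `tfdc` is just a placeholder I picked.

```shell
#!/bin/sh
# Placeholder database name; choose your own.
DB_NAME="tfdc"

# Print the SQL statement that creates the database if it does not exist yet.
create_db_sql() {
    printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$1"
}

# Show the statement for review. To actually run it, pipe it to the client:
#   create_db_sql "$DB_NAME" | mysql -u root -p
create_db_sql "$DB_NAME"
```

Whatever name you choose here is the one to use later in db/db_config.php.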
Step 2: Create a new Twitter App [link]. Get your app’s credentials from Twitter and add them to db/140dev_config.php.
Step 3: Edit db/db_config.php to match your own MySQL database server settings.
Before proceeding, make sure that libssh2 is installed on your server. If it isn’t, install it now: the script uses the ssh2 library to open a new SSH session to its host machine and run the data collector script in a new screen session.
Ubuntu: sudo apt-get install libssh2-php
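To check whether the extension is actually available, you can look for `ssh2` in the module list printed by `php -m`. The sketch below does that; the helper name `has_module` is my own. Keep in mind that the command-line PHP and the PHP used by your web server may load different configurations, so treat this as a quick sanity check only.

```shell
#!/bin/sh
# Succeeds when the module list on stdin contains a line equal to the given name
# (case-insensitive, whole-line match).
has_module() {
    grep -qix "$1"
}

# Only run the check when a php binary is on the PATH.
if command -v php >/dev/null 2>&1; then
    if php -m | has_module ssh2; then
        echo "ssh2 extension is loaded"
    else
        echo "ssh2 extension is missing"
    fi
else
    echo "php not found on PATH"
fi
```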
OPTIONAL: If you don’t want the script to SSH into your machine, you can run your campaign yourself with the following commands in a terminal on your host machine:
php get_tweets.php <campaign-name> &
php parse_tweets.php <campaign-name> &
Replace <campaign-name> with the name you chose while creating the campaign.
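Processes started with a plain `&` can die when you close the terminal. One way around that, sketched below, is to start both scripts with `nohup` and send their output to log files. The function name `start_campaign` and the log file names are my own choices, and the sketch assumes you run it from the directory that contains get_tweets.php and parse_tweets.php.

```shell
#!/bin/sh
# Start both collector scripts for one campaign, detached from the terminal.
start_campaign() {
    campaign="${1:-}"
    if [ -z "$campaign" ]; then
        echo "usage: start_campaign <campaign-name>" >&2
        return 1
    fi
    # nohup keeps the processes alive after logout; output goes to per-campaign logs.
    nohup php get_tweets.php "$campaign" > "get_tweets_$campaign.log" 2>&1 &
    nohup php parse_tweets.php "$campaign" > "parse_tweets_$campaign.log" 2>&1 &
}

# Usage (hypothetical campaign name):
#   start_campaign my-campaign
```

Running the same two commands inside a screen session, as the app itself does over SSH, is another option.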
Important: Don’t forget to restart your web server once the installation is complete.
Update the SSH credentials in the file NiceSSH.class.php to match those of your server.
Step 5 [the last step!]: Open the TFDC project directory on your web server in a browser, and click on “create a new campaign”.