Who Paid the Bill?? Tracking Campaign Donations in Congressional Politics | Bills


Methodology: How stuff gets done

The idea behind this project was two-fold:

  1. To collect and present the data requested.
  2. To automate data collection and presentation so that the site stays dynamic and can pick up new data through tools like cron.

To do this, I used a number of techniques, which I will briefly outline here:

First, I needed to collect the name and campaign donation information for each Congressman in the current (112th) Congress. To do this, I used the following process:

  1. Select a site that has the information that I require:
    • I decided that Open Secrets was the best option because its data is consistent and it is widely regarded as reputable.
  2. Then, I had to figure out how to get the info from their site. To this end, I discovered that they use a unique numerical identifier for each Congressman. In addition, they make it easy to detect which election cycle you're accessing information from. In the following examples, I've bolded the relevant fields:
  3. Then, I needed to download their index file and run an XSL Stylesheet on it to extract the relevant profile id numbers to a text file (Note: I'm only looking at career and 2010 data right now).
  4. Next, I wrote a shell script to handle the following processes:
  5. Next, I uploaded the transformed XML to the eXist database on Obdurodon so that I could run queries.
  6. Finally, I used XQuery and PHP to search for the data that I wanted and output it in a readable format.
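
The ID-extraction and URL-building steps above can be sketched roughly as follows. This is only an illustration in Python, not the XSL Stylesheet and shell script the project actually used. The URL pattern, the `id` and `cycle` parameter names, the cycle labels, and the sample index markup are all assumptions made for the example, since the real site's layout is not reproduced here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical URL pattern; the real site's addresses are not shown in this write-up.
PROFILE_URL = "https://example.org/profile?id={member_id}&cycle={cycle}"


def extract_ids(index_xml):
    """Collect the numeric id from every link in a well-formed index document."""
    root = ET.fromstring(index_xml)
    ids = []
    for link in root.iter("a"):
        match = re.search(r"[?&]id=(\d+)", link.get("href", ""))
        # Skip links without an id and ids that were already seen.
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def build_urls(ids, cycles):
    """Pair every profile id with every election cycle of interest."""
    return [PROFILE_URL.format(member_id=i, cycle=c) for i in ids for c in cycles]


# Made-up index file standing in for the downloaded one.
SAMPLE_INDEX = """<index>
  <a href="profile?id=101">Member A</a>
  <a href="profile?id=102&amp;cycle=2010">Member B</a>
  <a href="about">About</a>
  <a href="profile?id=101">Member A (duplicate)</a>
</index>"""

if __name__ == "__main__":
    # "Career" and "2010" mirror the two data sets mentioned above;
    # how the site itself labels them is an assumption.
    for url in build_urls(extract_ids(SAMPLE_INDEX), ["Career", "2010"]):
        print(url)
```

Writing the resulting list to a text file, one URL per line, gives a loop-friendly input for whatever tool does the downloading.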

Second, I needed to get the bills from the current Congressional session, make them queryable, and present them in a readable format:

  1. These are freely available to the public in relatively decent, though not well-formed, XML from the House.gov site. I followed the same steps as above, using sed and JTidy to convert the files into well-formed XML.
  2. Then, I used an XSL Stylesheet to extract all of the links from the index file and output them to a text file so I could use an iterative loop and curl to "grab" all of the files.
  3. Next, because I can link to the full text of each bill, I only needed to grab specific data from it, so I applied an XSL Stylesheet and stored the resulting files in the eXist database.
  4. Finally, I used XQuery and PHP to search for the data that I wanted and output it in a readable format.
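
As a rough illustration of step 2, the link extraction and download loop might look like the sketch below. It uses Python in place of the XSL Stylesheet and curl described above, and it assumes the index has already been made well-formed. The sample index, the `example.org` addresses, and the function names are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.request import urlopen


def extract_links(index_xml):
    """Return the href of every link in a well-formed index, in document order."""
    root = ET.fromstring(index_xml)
    return [a.get("href") for a in root.iter("a") if a.get("href")]


def fetch_over_http(url):
    """Default fetcher: download one URL and return its bytes."""
    with urlopen(url) as response:
        return response.read()


def download_all(links, out_dir, fetch=fetch_over_http):
    """Fetch each link in turn and save it under out_dir,
    named after the last path segment of its URL."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in links:
        target = out_dir / url.rstrip("/").rsplit("/", 1)[-1]
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved


# Made-up index; the addresses are placeholders, not real bill locations.
SAMPLE_BILL_INDEX = """<index>
  <a href="https://example.org/bills/bill-0001.xml">Bill 1</a>
  <a href="https://example.org/bills/bill-0002.xml">Bill 2</a>
</index>"""

if __name__ == "__main__":
    # Offline demonstration: a stand-in fetcher returns canned bytes
    # instead of touching the network.
    def fake_fetch(url):
        return ("<bill source='%s'/>" % url).encode("utf-8")

    with tempfile.TemporaryDirectory() as tmp:
        links = extract_links(SAMPLE_BILL_INDEX)
        for path in download_all(links, tmp, fetch=fake_fetch):
            print(path.name)
```

Passing the fetcher in as a parameter keeps the loop testable without network access; in a real run the default `fetch_over_http` would be used.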

Then, in order to connect the data, I took the following steps:

  1. Created an XML file that contains each Congressman's name, state, state abbreviation, and party affiliation to use to synchronize data:
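
To show how such a lookup file could be used to line up records from the two data sets, here is a minimal sketch in Python. The element and attribute names (`members`, `member`, `name`, `state`, `abbr`, `party`) are assumptions rather than the project's actual schema, and the two entries are placeholders, not real members.

```python
import xml.etree.ElementTree as ET

# Placeholder entries only; the structure is illustrative, not the project's real file.
SAMPLE_MEMBERS = """<members>
  <member>
    <name>Jane Example</name>
    <state abbr="PA">Pennsylvania</state>
    <party>D</party>
  </member>
  <member>
    <name>John Sample</name>
    <state abbr="OH">Ohio</state>
    <party>R</party>
  </member>
</members>"""


def normalize(name):
    """Lower-case and collapse whitespace so 'Jane  Example' matches 'jane example'."""
    return " ".join(name.lower().split())


def load_members(members_xml):
    """Map each normalized name to its state, abbreviation, and party."""
    lookup = {}
    for member in ET.fromstring(members_xml).iter("member"):
        state = member.find("state")
        lookup[normalize(member.findtext("name"))] = {
            "name": member.findtext("name"),
            "state": state.text,
            "abbr": state.get("abbr"),
            "party": member.findtext("party"),
        }
    return lookup


def annotate(record, lookup):
    """Attach state and party to a donation or bill record keyed by member name.
    Records with no match in the lookup are returned unchanged."""
    info = lookup.get(normalize(record["member"]))
    return {**record, **info} if info else record


if __name__ == "__main__":
    lookup = load_members(SAMPLE_MEMBERS)
    print(annotate({"member": "JANE  EXAMPLE"}, lookup))
```

Matching on a normalized name is the simplest choice for a sketch; a shared identifier would be more robust wherever both sources provide one.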

To Be Continued...