The following is a collection of projects I've worked on that focus on data integration. Several of them are private projects covered by non-disclosure agreements, so some details have been omitted.
USDA Market News Report Parser
Screenshot of the USDA report parser showing a dump of collected data
  • Private project developed for a hedge fund working in the agriculture industry
  • Collects data from USDA reports, which are written as human-friendly text and are difficult to parse programmatically
  • Scheduled report parsing keeps the user's database up to date with the latest data
  • Supports back-filling of historical reports to recover from data loss or to populate data when a new report is added to the parsing system
  • User-friendly administration interface built on Django
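The parser's code is private. What follows is only a minimal Python sketch of two ideas from the list above: pulling numbers out of a human-readable text report, and working out which dates still need back-filling. Everything here is hypothetical and is neither the real USDA format nor the project's implementation. That includes the row layout, the `ROW_PATTERN` regex, `parse_report`, `backfill_dates`, and the assumption that reports are published on weekdays only.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical row layout: a label, two or more spaces, a low-high price
# range, then a weighted average, e.g. "Steers Choice 1-3    185.00-190.00   187.25".
ROW_PATTERN = re.compile(
    r"^(?P<label>[A-Za-z][A-Za-z0-9 \-]*?)\s{2,}"
    r"(?P<low>\d[\d,]*\.\d+)-(?P<high>\d[\d,]*\.\d+)\s+"
    r"(?P<avg>\d[\d,]*\.\d+)$"
)


@dataclass
class PriceRow:
    label: str
    low: float
    high: float
    avg: float


def _to_float(text: str) -> float:
    # Numbers may carry thousands separators, e.g. "1,170.50".
    return float(text.replace(",", ""))


def parse_report(text: str) -> list[PriceRow]:
    """Extract price rows, skipping titles, column headers, notes and blank lines."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(PriceRow(
                label=match["label"].strip(),
                low=_to_float(match["low"]),
                high=_to_float(match["high"]),
                avg=_to_float(match["avg"]),
            ))
    return rows


def backfill_dates(start: date, end: date, loaded: set[date]) -> list[date]:
    """Weekdays in [start, end] with no stored data (assumes weekday-only publishing)."""
    missing = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in loaded:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

Lines that do not fit the expected layout are simply ignored. This is convenient for a sketch, but a production parser would probably log unmatched lines so that format changes in the reports are noticed.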
CharityJobHub
Screenshot showing an example of site usage with a listing of jobs
  • Collects job postings from the largest charity and non-profit job posting sites via scraping
  • Harmonizes the job postings into a single database of listings
  • Provides a filtering interface that lets users select postings based on their relevance to their careers
  • Supports both positive ("must have one of ...") and negative ("must not have any of ...") filters on all important columns, which no job posting site offers
  • Helps users keep track of relevant postings with a subscribable calendar of application deadlines and in-interface marking of "applied" and "interesting" jobs
  • Updates daily and deletes expired postings (which many sites leave up for months)
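Below is a Python sketch of one way the positive and negative filters described above could work. The posting fields, `ColumnFilter`, and `filter_postings` are hypothetical names. Case-insensitive substring matching is an assumed design choice, not necessarily what CharityJobHub does.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFilter:
    """Positive and negative terms for one column (case-insensitive substring match)."""
    must_have_one_of: set[str] = field(default_factory=set)
    must_not_have_any_of: set[str] = field(default_factory=set)

    def accepts(self, value: str) -> bool:
        text = value.lower()
        # Positive filter: if any terms are given, at least one must appear.
        if self.must_have_one_of and not any(t.lower() in text for t in self.must_have_one_of):
            return False
        # Negative filter: none of the excluded terms may appear.
        return not any(t.lower() in text for t in self.must_not_have_any_of)


def filter_postings(postings: list[dict[str, str]],
                    filters: dict[str, ColumnFilter]) -> list[dict[str, str]]:
    """Keep postings that pass the filter on every filtered column; a missing column counts as empty."""
    return [
        posting for posting in postings
        if all(f.accepts(posting.get(column, "")) for column, f in filters.items())
    ]
```

Substring matching keeps the sketch short but is crude. A negative term such as "intern" would also exclude an "International Development Officer" posting, so a real implementation would likely match on whole words or tokens.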
Home Depot Price Historizer
Screenshot of the Home Depot Price Historizer showing some output from the database
  • Private project collecting prices for 99,941 Home Depot products across Canada
  • Identifies the best regional and nationwide prices
  • Tracks changing prices of all items over time to identify patterns
  • Tracks inventory levels of all products for all stores nationwide
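Here is a rough Python sketch of the regional and nationwide price comparison and the change tracking listed above, run over a hypothetical table of daily price observations. The `Observation` fields, `best_prices`, and `price_changes` are assumptions made for this example. The project's actual schema is not public.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Observation(NamedTuple):
    product_id: str
    store_id: str
    region: str
    day: date
    price: float


def best_prices(observations: list[Observation], on_day: date):
    """Cheapest observation per (product, region) and per product nationwide on one day."""
    regional: dict[tuple[str, str], Observation] = {}
    national: dict[str, Observation] = {}
    for obs in observations:
        if obs.day != on_day:
            continue
        key = (obs.product_id, obs.region)
        if key not in regional or obs.price < regional[key].price:
            regional[key] = obs
        if obs.product_id not in national or obs.price < national[obs.product_id].price:
            national[obs.product_id] = obs
    return regional, national


def price_changes(observations: list[Observation]):
    """(product, store, day, old_price, new_price) whenever a store's price differs from its previous observation."""
    series: dict[tuple[str, str], list[Observation]] = defaultdict(list)
    for obs in observations:
        series[(obs.product_id, obs.store_id)].append(obs)
    changes = []
    for (product_id, store_id), history in series.items():
        history.sort(key=lambda o: o.day)
        # Compare each observation with the one before it for the same store.
        for prev, cur in zip(history, history[1:]):
            if cur.price != prev.price:
                changes.append((product_id, store_id, cur.day, prev.price, cur.price))
    return sorted(changes)
```

With close to 100,000 products, the same logic would more likely run as grouped SQL queries over the database than as in-memory Python. The sketch only shows the shape of the computation.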
Book Market Spider
Screenshot of the output of a report generated from the book market spider
  • Private project collecting prices for books across multiple online book markets
  • Keeps track of offerings, quantities, and prices
  • Used to estimate how many books were sold and at what prices
  • Ultimately determines what the market is willing to pay for a particular book
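The list above does not say how sales are estimated. This sketch assumes a common approach for marketplace data: compare successive snapshots of the listings and treat any drop in quantity as units sold. `Snapshot`, `estimate_sales`, and `market_price` are hypothetical, as are two further choices. The earlier asking price stands in for the sale price, and the median represents "what the market will pay". None of this is the spider's actual logic.

```python
from statistics import median
from typing import Optional

# Hypothetical snapshot: listing_id -> (quantity on offer, asking price), one per scrape.
Snapshot = dict[str, tuple[int, float]]


def estimate_sales(before: Snapshot, after: Snapshot,
                   count_removed_as_sold: bool = True) -> list[tuple[float, int]]:
    """Estimate (price, units) sold between two snapshots of the same market."""
    sales = []
    for listing_id, (qty_before, price) in before.items():
        if listing_id in after:
            # A drop in stock is assumed sold at the earlier asking price;
            # restocks produce negative values and are ignored below.
            sold = qty_before - after[listing_id][0]
        elif count_removed_as_sold:
            # A vanished listing may have sold out or been withdrawn;
            # this flag chooses which interpretation to use.
            sold = qty_before
        else:
            sold = 0
        if sold > 0:
            sales.append((price, sold))
    return sales


def market_price(sales: list[tuple[float, int]]) -> Optional[float]:
    """Median price per unit across estimated sales, or None when nothing sold."""
    prices = [price for price, units in sales for _ in range(units)]
    return median(prices) if prices else None
```

The median is used rather than the mean so that a few unusually expensive or cheap listings do not dominate the estimate.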
Landlord Finder
Screenshot of the output of a report generated from the landlord finder
  • Private project collecting rental postings across several websites
  • Identifies landlords and property management companies common to several properties
  • Can be used to identify trends in rental pricing
  • Useful for market research for property management companies to identify potential clients
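Here is a minimal Python sketch of how landlords common to several properties might be identified. It normalizes the contact phone number on each posting, then groups the distinct property addresses by that number. The field names (`phone`, `address`), the North American phone normalization, and the functions below are hypothetical. The project may match landlords on other signals, such as names or email addresses.

```python
import re
from collections import defaultdict


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop a leading North American '1' country code."""
    digits = re.sub(r"\D", "", raw)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def normalize_address(raw: str) -> str:
    """Very simple normalization: lowercase, drop commas, collapse whitespace."""
    return " ".join(raw.lower().replace(",", " ").split())


def landlords_with_multiple_properties(postings: list[dict[str, str]],
                                       min_properties: int = 2) -> dict[str, set[str]]:
    """Map each contact phone to its distinct addresses, keeping contacts with enough properties."""
    properties: dict[str, set[str]] = defaultdict(set)
    for posting in postings:
        phone = normalize_phone(posting.get("phone", ""))
        if phone:
            properties[phone].add(normalize_address(posting["address"]))
    return {phone: addrs for phone, addrs in properties.items()
            if len(addrs) >= min_properties}
```

Because the same unit is often cross-posted to several sites, grouping by distinct address is what stops one property from being counted multiple times.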