The following is a collection of projects I’ve worked on that focus on data integration. Unfortunately, several of them are private projects covered by non-disclosure clauses, so some details have been omitted.
USDA Market News Report Parser
USDA report parser showing a dump of some collected data
  • Private project developed for a hedge fund working in the agriculture industry
  • Collects data from USDA reports, which are human-friendly text reports that are difficult to parse programmatically
  • Scheduled execution of report parsing ensures user’s database is always up to date with the latest data
  • Report back-filling supported to populate previous data in the case of data loss or the addition of a new report to the parsing system
  • User-friendly interface for administration based on Django
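To give a flavour of the kind of parsing involved, here is a minimal sketch in Python. The report layout, the regular expression, and the names `parse_line` and `parse_report` are all invented for illustration; this is not the project's code, and real USDA report formats differ.

```python
import re
from typing import Dict, List, Optional

# Hypothetical line layout, invented for this sketch:
#   "<commodity name>  <low price> - <high price>"
# Real USDA reports use many different layouts.
LINE_RE = re.compile(
    r"^(?P<commodity>[A-Za-z][A-Za-z #0-9]*?)\s{2,}"
    r"(?P<low>\d+\.\d+)\s*-\s*(?P<high>\d+\.\d+)\s*$"
)


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Return a row for a price line, or None for headers, notes and blanks."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "commodity": match["commodity"].strip(),
        "low": float(match["low"]),
        "high": float(match["high"]),
    }


def parse_report(text: str) -> List[Dict[str, object]]:
    """Collect every line of the report that looks like a price row."""
    rows = []
    for line in text.splitlines():
        row = parse_line(line)
        if row is not None:
            rows.append(row)
    return rows
```

Lines that do not match the expected layout are skipped rather than raising an error, since human-oriented reports tend to mix titles, footnotes and data freely.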
CharityJobHub
Screenshot showing an example of site usage with a listing of jobs
  • Collects job postings from the largest charity and non-profit job posting sites via scraping
  • Harmonizes the job postings into a single database of listings
  • Provides a filtering interface to users to select postings based on the relevance to their careers
  • Supports both positive (“must have one of …”) and negative (“must not have any of …”) filters on all important columns, unlike any other job posting site
  • Helps users keep track of relevant postings with a subscribable calendar of application deadlines and in-interface marking of “applied” and “interesting” jobs
  • Updates daily and deletes expired postings (which many sites leave up for months)
Upward trend of the number of collected jobs from CharityJobHub
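The positive and negative filters can be sketched roughly as follows. This is an illustration only: the `matches` function, its parameters, and the dictionary-based posting format are assumptions made for the example, not the site's actual implementation.

```python
from typing import Dict, Iterable


def matches(
    posting: Dict[str, str],
    must_have_any: Iterable[str] = (),
    must_not_have: Iterable[str] = (),
    column: str = "title",
) -> bool:
    """Apply a positive and a negative keyword filter to one column.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    text = posting.get(column, "").lower()
    positives = [term.lower() for term in must_have_any]
    negatives = [term.lower() for term in must_not_have]

    # Positive filter: at least one term must appear (skipped if no terms given).
    if positives and not any(term in text for term in positives):
        return False
    # Negative filter: none of the terms may appear.
    return not any(term in text for term in negatives)
```

Calling `matches` once per column and combining the results with `all(...)` would extend the same idea to filters on several columns at once.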
Home Depot Price Historizer
Home Depot Price Historizer screenshot showing some output from the database
  • Private project collecting prices for 99,941 Home Depot products Canada-wide
  • Capable of identifying best regional and nation-wide prices
  • Tracks changing prices of all items over time to identify patterns
  • Tracks inventory numbers of all products for all stores nation-wide
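Finding the best regional and nation-wide prices comes down to a grouped minimum over the collected observations. The sketch below assumes a hypothetical `(sku, region, store, price)` tuple per observation and a made-up `best_prices` function; the private project's actual schema is not shown here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical observation shape: (sku, region, store, price).
Observation = Tuple[str, str, str, float]


def best_prices(
    observations: Iterable[Observation],
) -> Tuple[Dict[str, Observation], Dict[Tuple[str, str], Observation]]:
    """Return the cheapest observation per SKU nationally and per (SKU, region).

    On a tie, the first observation seen is kept.
    """
    national: Dict[str, Observation] = {}
    regional: Dict[Tuple[str, str], Observation] = {}
    for obs in observations:
        sku, region, _store, price = obs
        if sku not in national or price < national[sku][3]:
            national[sku] = obs
        key = (sku, region)
        if key not in regional or price < regional[key][3]:
            regional[key] = obs
    return national, regional
```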
Book Market Spider
Screenshot of the output of a report generated from the book market spider
  • Private project collecting prices for books across multiple online book markets
  • Keeps track of offerings, quantities, and prices
  • Used to determine estimates of books sold at what prices
  • Ultimately determines what the market is willing to pay for a particular book
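One way to estimate sales from tracked offerings is to compare two consecutive snapshots and treat any drop in quantity as units sold at the listed price. The sketch below takes that approach; the `estimate_sales` function and the snapshot format are assumptions for illustration, and the heuristic itself is an assumption too, since a listing that disappears may have been withdrawn rather than sold.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical snapshot shape: offer id -> (price, quantity available).
Snapshot = Dict[str, Tuple[float, int]]


def estimate_sales(previous: Snapshot, current: Snapshot) -> Counter:
    """Estimate units sold per price between two snapshots.

    Assumption: a quantity decrease (or a vanished offer) means a sale
    at the price seen in the earlier snapshot.
    """
    sold: Counter = Counter()
    for offer_id, (price, quantity) in previous.items():
        # An offer missing from the newer snapshot counts as quantity 0.
        remaining = current.get(offer_id, (price, 0))[1]
        if remaining < quantity:
            sold[price] += quantity - remaining
    return sold
```

Summing these estimates over many snapshots gives a rough picture of how many copies move at each price point.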
Landlord Finder
Screenshot of the output of a report generated from the landlord finder
  • Private project collecting rental postings across several websites
  • Identifies landlords and property management companies common to several properties
  • Can be used to identify trends in rental pricing
  • Useful for market research for property management companies to identify potential clients
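Identifying a landlord common to several properties can be sketched as grouping postings by a normalized contact detail. The example below uses the phone number as the key; the field names, the `normalize_phone` and `find_common_landlords` helpers, and the choice of phone number as the identifier are all assumptions for illustration rather than the project's actual method.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop any country code by taking the last 10."""
    return re.sub(r"\D", "", raw)[-10:]


def find_common_landlords(
    postings: Iterable[Dict[str, str]], min_properties: int = 2
) -> Dict[str, List[str]]:
    """Map each contact number to its addresses, keeping multi-property ones."""
    by_contact = defaultdict(set)
    for posting in postings:
        key = normalize_phone(posting.get("phone", ""))
        if key:  # skip postings without a usable phone number
            by_contact[key].add(posting["address"])
    return {
        phone: sorted(addresses)
        for phone, addresses in by_contact.items()
        if len(addresses) >= min_properties
    }
```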