Craig I. Hagan
215 South Broadway #194
Salem, NH 03079
hagan@cih.com

Education: University of Massachusetts at Amherst, BSCS

SKILLS:
Oracle DBA: 11g, RAC, Active Dataguard, Fast Start Failover, RMAN backup and recovery
O/S: Linux, OpenBSD, FreeBSD, OSF/1, Solaris, SunOS, Windows NT, HPUX, IRIX, AIX, SCO, Mach, UnixWare
Networking: TCP/IP, DNS, INN, DHCP, NT RAS, NIS/YP, DCE/DFS, OpenAFS, Coda, NFS, AMD, Cisco routers
Software: TIS FWTK, Sendmail V8, Postfix, Qmail, Veritas Netbackup
Languages: Perl, C, C++, HTML, PHP, CGI scripting, JavaScript, UNIX shell scripting

EXPERIENCE:

Amazon.com 4/2004 - Present
Senior Database Engineer
* Operation of merchant-related databases, both RAC and non-RAC. Handled everything from deploying new schemas and tracking down application performance issues to off-hours support and scaling the fleet to ensure success during peak shopping season.
* Team lead for 10g and 11g upgrades. Validated and improved the Oracle process to decrease downtime and reduce errors.
* Team lead for lease replacements. This involved documenting and executing the process of creating additional standby systems, flipping to them, and releasing the old systems.
* First production evaluation and deployment of Broker-managed Fast Start Failover. Wrote procedures and tools to ensure proper process in managing a failover, generate correct configuration, and validate that primary and standby systems are being properly run.
* Technical and project lead for moving all databases for European sites from US-based datacenters to EU-based datacenters. This involved scheduling staff, writing and proving procedures to build and maintain standby databases across the WAN, and writing tools to quickly manage the additional database fleet to ensure the viability of the project.
* Wrote an audit framework and toolset that periodically checked databases to ensure proper operation and configuration of the database and operating system, and that primary and standby systems were configured in a similar manner.
* Descaled a four-node RAC cluster to a single node to improve performance.
* Proof of concept for S3/RMAN backups, plus backup/restore performance testing for 1T+ databases and archive logs. Modified all tools in our backup/restore chain to work with S3, RMAN compression, and encryption. Packaged the tools for use in Amazon's automated deployment system.
* RMAN net duplication testing. Performed proof of concept and initial procedure documentation for using RMAN net duplication to create a standby from a live database. This is now used in production on a case-by-case basis.
* Conducted evaluation and deployment of Active Dataguard. Performed testing both in and out of production and wrote documentation and procedures for using ADG. Wrote management software to handle DNS CNAME movement based on the status of the standby (open and up to date vs. closed or behind).
* Tested and deployed 11g SQL Plan Management. Wrote documentation and performed the initial evaluation of 11g SQL Plan Management. This has resulted in increased availability, particularly during changes, because applications reliably get the correct plan from the optimizer.

Amazon.com 4/2003 - 4/2004
Database Engineer
* Responsible for developing and maintaining programs and procedures to manage non-Dataguard standby database systems and archived log management for both RAC and single-instance databases.
* Developed tools to monitor various aspects of production systems to improve availability.
* Managed several corporate databases, including design, analysis, capacity planning, and daily operations.
* Created a training lab for team members to practice procedures and to simulate fault conditions.
* Production support for a complex replicated database environment, including OLTP and DSS systems running on both RAC and single-instance databases.

Cray Research 4/2002 - 4/2003
Senior Systems Engineer
* Developed a Cray-branded release of Linux for rapidly creating and easily managing large clusters of machines. This included software development, release management, and functional verification of the incorporated software.
* Enhanced the above system to allow customers to transparently pick and choose between the compilers and interconnects available for software development and benchmarking.
* Created a prototype cluster release on Itanium2 hardware.
* Participated in a comparison between the ia64 system and other computers for competitive analysis.
* Responsible for specifying hardware in response to customer bids, sizing systems appropriately within a realistic cost budget.
* Evaluated PVFS (a clustered filesystem) for use in HPC clusters.
* Customer deployment of an MTA supercomputer and customization of the system to integrate with the customer's computing infrastructure.

3/2001 - 4/2002 Fat Mice Technologies
Principal Consultant working at Cray Research
* Linux systems programming, including NFS client/server verification and performance enhancements.
* Created policies and procedures for installation and maintenance of cluster fileservers.
* Wrote kernel patches for Linux which improve read I/O by almost 100% on large fibre-attached RAID arrays. This patch was accepted into both RedHat and mainline kernel trees.
* Sun/Solaris NFS server tuning.
* Linux Beowulf cluster configuration and development.
* AFS client and server setup and evaluation on both Linux and Solaris.
* Evaluation of Linux/Solaris ATM environment using classical IP.
* Development of a transparent firewall proxy system to perform packet size conversion and interrupt offloading for a small-packet gigabit Ethernet connection to a supercomputer.

12/2000 - 3/2001 Amazon.com
Performance Engineer
* Evaluate third-party hardware and software to determine performance and scaling ability.
* Determine causes and solutions for problems impacting site availability or performance.
* Evaluate new high availability solutions for Amazon.com systems.
* Profile and benchmark internally written software to identify and improve hotspots as well as to assure an appropriate end user experience.
* Create and utilize a replica of Amazon.com's production environment to allow performance and availability testing of software.
* Proactively work with software engineers to improve performance and availability of Amazon.com software applications.
* Architect new systems solutions to take advantage of new technologies while improving service capacity and decreasing cost to the business.

6/1999 - 12/2000 Amazon.com
Systems Engineer
* Designed and implemented a high-capacity electronic mail system capable of handling over 20 million unique messages per day, including both customer contact and corporate email.
* Wrote a POP/IMAP proxy to distribute user load across multiple back-end mail systems, making it easier to scale corporate mail across multiple machines.
* Created a mail monitoring/reporting system using in-band messages to allow usage tracking and trouble notification across a large pool of mail servers, regardless of location in the network topology.
* Backup systems team lead. Responsibilities included scaling the system from 4T/day to 8T/day, improving system monitoring, reliability, and throughput, establishing policies and procedures, and preparing a proper cost analysis of the system.
* Created Linux server standards and procedures, including an automated installation environment, a customized kernel appropriate for Amazon.com production use, hardware selection/standardization, and performance/capacity/reliability testing of these solutions.
* Lead engineer for 24/7 operational support of over 200 machines.
  This included training new personnel, setting standards, creating procedures, writing scripts to automate systems management, as well as on-call support.

8/1998 - 6/1999 GTE Internetworking (BBN Planet)
Distributed Hosting Engineer
* Architected web server caching strategy for Internet hosting facilities, including product selection, network placement, and configuration to take advantage of Hopscotch(tm) technology to achieve best-path routing to end users.
* Specified, designed, and implemented a file replication system for Solaris, including roll-out and training of NOC administrators.

10/1995 - 8/1998 Consulting

Client: Yoyodyne Entertainment, a subdivision of Yahoo!; 1/1997 - 8/1998
Architected and deployed an AIX and Linux mail system using Qmail for a subscription-based marketing program.
* Currently processes over a million mail messages a day.
* Has been tested to scale linearly in direct proportion to the number of mail servers and bandwidth available.
Designed and implemented a packet filtering and application gateway firewall to protect internal servers and prevent unauthorized access to internal business systems.

Client: Lotus Software; 1/1998
Presented a firewall tutorial covering TCP/IP networking, different types of firewalls and how they work, troubleshooting issues which administrators may face, and a brief overview of several vendor solutions available at the time.

Client: InUnity Corporation; 9/1997 - 1/1998
Implemented a web-based mutual fund compliance site using PHP and Oracle on Linux.
* System is currently in use by several major financial companies.

Client: Wildfire Communications, Inc; 2/1997 - 7/1997
Responsible for upgrading the LAN from a flat to a switched and routed topology and deploying production business systems on NT server platforms.
* Designed and built a custom Internet firewall system using proxy agents and packet filtering technology.
* Handled movement of the network to a new location, including dismantling, re-deploying, and wiring all hardware and critical services.
* Transitioned network systems to an RFC 1918 unrouted network.
* Implemented dialup network access system.
* Created hardware and software configuration guidelines.

Client: Borland International (Open Environment); 7/1996 - 1/1997
* Responsible for training junior staff to manage both UNIX systems and the corporate network.
* Reconfigured UNIX servers to prevent service outages.
* Identified and resolved intermittent network problems.
* Installed and configured ISDN to connect the California and Boston offices prior to frame relay connectivity.
* Handled local router and network configuration changes to support the frame relay connection to corporate headquarters.
* DCE cell troubleshooting and administration.

Client: Addison-Wesley Longman; 4/96 - 7/96
* Implemented a prototype firewall and recommended the final firewall solution to prevent unauthorized access to systems.
* Conducted investigations to determine the extent of system penetration after a cracking incident, including creating new policies and procedures to reduce the likelihood of a recurrence.
* Administered UNIX systems.
* Migrated systems and services from SunOS 4 to Solaris (SunOS 5).
* Identified and resolved major trouble areas with UNIX-provided services to improve performance and availability.
* Responsible for the WAN, including router configuration and adding new sites to the WAN.

Client: Federal Home Loan Mortgage Corporation; 10/1995 - 4/1996
Part of the team handling system administration for the Loan Prospector DCE/Encina environment as it transitioned into production status.
* Capacity planning and system configuration for new server machines.
* Systems administration and deployment for both production and non-production systems.
* Second tier support.

9/1994 - 10/1996 Open Environment Corporation
Systems Engineer
Development, deployment, and administration of a DCE/DFS computing environment to host production development, corporate applications, and customer education.
* Responsible for network configuration and management, including packet filtering and internal/external security.
* Evaluated 100 Mbit Ethernet technology for potential use in the internal network.
* LAN/WAN setup and administration.
* Internet services management and deployment.
* DCE/DFS cell configuration and administration.

1/1993 - 9/1994 Department of Computer Science, UMass/Amherst
Associate Software Specialist
Systems administration and support for a heterogeneous UNIX, PC, and Macintosh network. Migration of data and existing information services to the Alpha platform from Ultrix. Introduction of WWW and SLIP/PPP services.
* Systems security sweeps, including account management, software updates, and creating local security policy and assuring that systems complied with it.
* Extended DNS to provide load balancing to increase fault tolerance for mission-critical applications.
* Maintained an extensive software library for the following platforms: OSF/1, Ultrix, Solaris, SunOS, and IRIX.
* Ported public domain software and rewrote local systems to run on the newly introduced DEC Alpha platform.

Interests: Sailing, hiking, harmonica, cooking, Linux.