HPC Power Management
My Ph.D. work was on power scheduling for high performance computing (HPC) systems. A single energy-efficient, large-scale HPC system can consume upwards of 10 megawatts of power, though generally these systems consume only 70% of their maximum. The rate of energy consumption can change substantially and rapidly (several megawatts in under a second), which can create a significant problem for the power grid. My work looks at how to tightly enforce a variable upper bound on total system power consumption.
Realtime Distributed System Introspection
The power scheduling work grew out of my interest in realtime introspection for large-scale distributed systems. Large-scale computing systems are composed of many geographically dispersed nodes. Managing and operating these platforms can be extremely challenging, since errors and poor performance may involve complex interactions among a large number of nodes.
Colorado College Teaching
2017-2018 Academic Year
- Block 1 - Non-teaching
- Block 2 - Parallel Programming
- Block 3 - Capstone Project
- Block 4 - Computer Science 2
- Half-block - Computer Language as Language
- Block 5 - Computer Science 1
- Block 6 - Non-teaching
- Block 7 - Computer Science 1
- Block 8 - Computational Thinking
- Block B - Computational Thinking
Colorado College Projects
HPC resources are valuable for scientific research and education. Using hardware generously provided by ITS, I built and operate the Math and Computer Science Computational Network (mcscn). This cluster is available to support student and class projects needing compute resources beyond those provided by a commodity laptop. Some additional information on mcscn can be found here.
- Simulating Power Scheduling at Scale (E2SC 2017)
- A Unified Platform for Exploring Power Management Strategies (E2SC 2016)
- Systemwide Power Management with Argo (HPPAC 2016)
- Dynamic Power Sharing for Higher Job Throughput (SC 2015)
- POW: System-wide Dynamic Reallocation of Limited Power in HPC (HPDC 2015)
- Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework (IPDPS 2017)
- A Scalable Observation System for Introspection and In Situ Analytics (ESPT 2016)
Graduate School Era - 2011-2017
Teaching - Instructor of Record
- CIS322 Introduction to Software Engineering - Winter 2017
- CIS122 - Summer 2013
Teaching - Teaching Assistant
- CIS210 - Winter 2016, Fall 2012, Winter 2012, Fall 2011
- CIS410 - Spring 2014
- CIS211 - Winter 2013, Spring 2013
- CIS110 - Spring 2012
Teaching - Special Projects
- CIS410 Spring 2014 - Designed the Parallel Programming labs and projects
- CIS210 Fall 2013 - Designed the lab component to complement the lectures
- CIS210 Fall 2012 - Assisted in an experimental delivery of CIS210 as a bilingual course using both Python and Java
- Summer/Fall 2016 - LLNL - Implementation of RMAP and PowSched algorithms as SLURM plugins
- Summer 2015 - LLNL - Wrote a decoupled power scheduling solution for evaluation
- Summer 2014 - LLNL - Built a hierarchical publish/subscribe system for introspection and control at scale
- Summer 2013 - EGI - Ported a scientific code to OpenCL
Industry Era - 2005-2011
Between my undergraduate and graduate studies, I ran a micro-company in the DC metro area. The projects I worked on fell into three basic categories: big data, operational support systems, and professional training.
Big Data Projects
The largest system I worked on could ingest 2.5 terabytes of operational data per day, at 30,000 records per second from sources distributed all over the world, and report on events within 40 seconds of occurrence via a relational database. A custom ETL framework was developed to support the rate of data ingestion and deliver high availability. Presenting the data through a traditional relational database provided data consistency and made it easy to present and analyze the data in the context of other business data.
Several systems of this shape were built for different customers, each tailored to the customer's specific business needs and enterprise information architecture. These systems reduced customer service costs, automatically detected misconfigurations, analyzed internal network traffic, and supported information security activities.
Operational Support Systems
Operational support systems (OSSes) are business applications that track and manage processes involving both automated and manual components. My first OSS project was a 12-week project that handled order intake and tracking for a small company. I was responsible for requirements gathering, system implementation, migration of legacy data, and training. The system freed one full-time employee, improved customer satisfaction, and enabled a business model for the customer.
College Era - 1998-2005
During my undergraduate studies I worked as a web developer, Linux systems administrator, and help desk technician for various departments at the University of Oregon. I also did a little work in support of research efforts in software analysis and networking.