69. TEMPERATURE AND COOLING MANAGEMENT IN SERVERS

Department: Computer Science & Engineering
Faculty Advisor(s): Tajana Simunic-Rosing

Primary Student
Name: Christine Shun Yee Chan
Email: csc019@ucsd.edu
Phone: 858-790-8852
Grad Year: 2016

Abstract
In this work we propose a joint energy, thermal and cooling management technique (JETC) that significantly lowers per-server cooling and energy costs. State-of-the-art thermal management systems decouple the optimization of cooling from workload scheduling and ignore the thermal dependencies between CPU and memory, which leads to suboptimal solutions and non-linearity in cooling costs. This motivates us to develop a holistic solution that integrates energy, thermal and cooling management to maximize energy savings with a negligible performance hit from workload migrations and emergency throttling. We develop a comprehensive thermal and cooling model, which JETC uses for online optimization. JETC considers the thermal state of the CPU and memory, the dependencies between them, and fan speed to arrive at energy-efficient decisions. The CPU actuator saves cooling costs by removing hot spots between and within the sockets, and reduces the effects of thermal coupling with memory. The memory actuator reduces the operational energy of memory by clustering pages onto a subset of memory modules. Extensive experimental results show that JETC achieves a 50.7% average energy reduction in the memory and cooling subsystems. We also present results from ongoing investigations into the impact of fan cooling on hard disk throughput. More recent measurements show that fan vibrations in high-end servers and their rack neighbors cause hard disk bandwidth to drop dramatically, with a corresponding decrease in application performance. We quantify the performance and energy cost of these fan vibrations and propose a thermal, energy and cooling management technique that handles thermal issues jointly with the application SLA. Results show that we can not only meet thermal constraints but also improve performance compared to conventional methods.
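
The page-clustering idea behind the memory actuator can be illustrated with a small sketch. This is not the JETC algorithm itself, which the abstract does not specify: the function names, the single per-module capacity, and the greedy packing rule are all assumptions made for illustration, and the sketch ignores thermal state and CPU-memory coupling entirely. It only shows how consolidating pages onto fewer modules leaves the remaining modules empty, so that they could in principle be placed in a low-power state.

```python
from typing import List


def cluster_pages(pages_per_module: List[int], capacity: int) -> List[int]:
    """Return a target page count per module that uses as few modules as possible.

    Hypothetical illustration only: modules that already hold the most pages
    are filled first, so fewer pages would need to migrate.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    total = sum(pages_per_module)
    if total > capacity * len(pages_per_module):
        raise ValueError("pages exceed total memory capacity")

    # Visit modules from most to least populated (stable for ties).
    order = sorted(range(len(pages_per_module)),
                   key=lambda i: pages_per_module[i],
                   reverse=True)

    target = [0] * len(pages_per_module)
    remaining = total
    for i in order:
        placed = min(capacity, remaining)
        target[i] = placed
        remaining -= placed
        if remaining == 0:
            break
    return target


def active_modules(allocation: List[int]) -> int:
    """Count the modules that still hold at least one page."""
    return sum(1 for pages in allocation if pages > 0)


if __name__ == "__main__":
    before = [30, 10, 50, 10]          # pages currently on each module
    after = cluster_pages(before, capacity=100)
    print(before, "->", after)
    print("active modules:", active_modules(before), "->", active_modules(after))
```

In this example, four partially used modules collapse onto one, because all 100 pages fit within a single module's assumed capacity. A real policy would also have to weigh migration cost and the temperature of the module receiving the pages.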
