Driven by the explosive growth of artificial intelligence, data centers are projected to consume as much as 12 percent of total US electricity by 2028, according to Lawrence Berkeley National Laboratory. Improving data center energy efficiency is one way scientists are attempting to make AI more sustainable.
Toward that goal, researchers at MIT and the MIT-IBM Watson AI Lab have developed a rapid prediction tool that tells data center operators how much power running a particular AI workload on a certain processor or AI accelerator chip will consume.
Their method produces reliable power estimates in a few seconds, unlike traditional modeling techniques, which can take hours or even days to obtain results. Moreover, their forecasting tool can be applied to a wide range of hardware configurations, even emerging designs that have not yet been deployed.
Data center operators can use these estimates to allocate limited resources more effectively across multiple AI models and processors, improving energy efficiency. The tool can also let algorithm developers and model providers assess the potential energy consumption of a new model before it is deployed.
“The AI sustainability challenge is an urgent question we need to answer. Because our inference method is fast, convenient, and provides direct feedback, we hope it will make algorithm developers and data center operators more likely to think about reducing energy consumption,” says Kyungmi Lee, an MIT postdoc and lead author of a paper on this technique.
She is joined on the paper by Xie Song, a graduate student in electrical engineering and computer science (EECS); Yoon Kyung Lee and Xin Zhang, research managers at IBM Research and the MIT-IBM Watson AI Lab; Tamar Eilam, IBM Fellow, chief scientist for sustainable computing at IBM Research, and a member of the MIT-IBM Watson AI Lab; and senior author Anantha P. Chandrakasan, MIT provost, Vannevar Bush Professor of Electrical Engineering and Computer Science, and a member of the MIT-IBM Watson AI Lab. The research is being presented this week at the IEEE International Symposium on Performance Analysis of Systems and Software.
Accelerating Energy Assessment
Inside a data center, thousands of powerful graphics processing units (GPUs) operate to train and deploy AI models. The power consumption of a particular GPU will vary depending on its configuration and the workload it is handling.
Many traditional methods for estimating energy consumption involve dividing the workload into separate stages and simulating how each module inside the GPU is used, one stage at a time. But AI workloads like model training and data preprocessing are very large and can take hours or even days to simulate in this manner.
“As an operator, if I want to compare different algorithms or configurations to find the most energy-efficient way to proceed, a simulation that takes several days becomes very impractical,” says Lee.
To speed up the prediction process, the MIT researchers turned to less-detailed information that can be obtained much faster. They found that AI workloads often contain many repeated patterns, and that these patterns can be used to generate the information needed for power estimates that are quick yet reliable.
In many cases, algorithm developers write programs to run as efficiently as possible on the GPU. For example, they use well-structured optimizations to distribute work across parallel processing cores and move pieces of data in the most efficient way.
“These optimizations used by software developers create a regular structure, and we are trying to take advantage of that,” Lee explains.
The researchers developed a lightweight estimation model, called EnergAIzer, which captures the GPU’s power usage patterns arising from those optimizations.
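The article does not publish the tool's code, but the core idea can be pictured with a toy sketch. In the hypothetical Python below, a workload trace is a list of GPU operations; because optimized workloads reuse a small set of regular patterns, each unique pattern is characterized once and then scaled by its repeat count. All of the names and cost constants here are illustrative assumptions, not the actual EnergAIzer model.

```python
from collections import Counter
from dataclasses import dataclass

# Toy sketch of pattern-based energy estimation (illustrative only;
# not the actual EnergAIzer model). Optimized workloads reuse a few
# regular operation "shapes", so each unique pattern only needs to
# be characterized once.

@dataclass(frozen=True)
class Op:
    kind: str   # e.g., "matmul", "memcpy"
    size: int   # problem size in elements

def per_pattern_energy(op: Op) -> float:
    # Hypothetical per-pattern cost model (joules). In practice this
    # would come from modeling how the pattern uses compute units and
    # memory bandwidth, not from made-up constants like these.
    rates = {"matmul": 2e-9, "memcpy": 5e-10}
    return rates[op.kind] * op.size

def estimate_energy(trace: list[Op]) -> float:
    # Characterize each unique pattern once, then scale by repeat count.
    counts = Counter(trace)
    return sum(per_pattern_energy(op) * n for op, n in counts.items())

# A repetitive trace: the same two patterns repeated across 1,000 layers.
trace = [Op("matmul", 4096 * 4096), Op("memcpy", 4096)] * 1000
print(f"Estimated energy: {estimate_energy(trace):.2f} J")
```

Because the estimate touches each unique pattern only once, its cost grows with the number of distinct patterns rather than the total number of operations, which is where the speedup over stage-by-stage simulation comes from.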
An Accurate Assessment
But while their estimates were fast, the researchers found that they did not account for all energy costs. For example, whenever a GPU runs a program, there is a fixed energy cost to load and configure that program. Then, every time the GPU runs an operation on a portion of the data, an additional energy cost is incurred.
Due to hardware variability or contention when accessing or transferring data, a GPU may also be unable to use all of the available bandwidth, causing operations to slow down and consume more energy than expected.
To account for these additional costs and variations, the researchers collected real measurements from GPUs to generate correction terms applied to their estimation model.
“This way, we can make predictions that are fast and also very accurate,” Lee says.
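The article does not spell out how the correction terms are computed, so the following is one generic way to picture the calibration step, an assumption rather than the paper's procedure: fit a scale and an offset to the fast analytical estimate using a handful of real GPU measurements, so the offset absorbs fixed setup costs and the scale absorbs systematic slowdowns from contention and underutilized bandwidth.

```python
import numpy as np

# Generic calibration sketch (not the paper's actual procedure):
# correct a fast analytical estimate with terms fitted to a few
# real GPU measurements.

# Hypothetical data: fast analytical estimates (J) for five workloads...
raw_estimates = np.array([10.0, 22.0, 35.0, 48.0, 60.0])
# ...and the energy actually measured on the GPU for the same runs.
measured = np.array([14.5, 27.8, 43.1, 58.9, 73.2])

# Fit measured ~= scale * raw + offset by least squares.
A = np.vstack([raw_estimates, np.ones_like(raw_estimates)]).T
(scale, offset), *_ = np.linalg.lstsq(A, measured, rcond=None)

def corrected(raw: float) -> float:
    # Apply the fitted correction terms to a fast raw estimate.
    return scale * raw + offset

print(f"scale={scale:.3f}, offset={offset:.2f} J")
print(f"corrected estimate for a raw 40 J prediction: {corrected(40.0):.1f} J")
```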
Finally, a user can provide information about their workload, such as the AI models they want to run and the number and length of user inputs to process, and EnergAIzer will produce an energy consumption estimate in a matter of seconds.
The user can change the GPU configuration or adjust the operating speed to see how such design choices affect overall power consumption.
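The article describes the tool's inputs and outputs but not its programming interface, so the call pattern below is purely hypothetical; it is meant only to show the kind of what-if comparison across GPUs and operating speeds that the text describes.

```python
from dataclasses import dataclass

# Purely hypothetical interface, invented to illustrate the workflow
# the article describes; it is not EnergAIzer's real API.

@dataclass
class Workload:
    model: str               # AI model to run
    num_requests: int        # number of user inputs to process
    tokens_per_request: int  # average input length

def estimate_joules(workload: Workload, gpu: str, clock_mhz: int) -> float:
    # Stand-in for the real estimator: scale a made-up per-token cost
    # by a made-up clock factor. Returns an energy estimate in joules.
    per_token_j = {"A100": 0.02, "H100": 0.015}[gpu]
    clock_factor = clock_mhz / 1400
    return (workload.num_requests * workload.tokens_per_request
            * per_token_j * clock_factor)

# Compare hardware and clock-speed choices for one workload.
wl = Workload(model="llm-7b", num_requests=10_000, tokens_per_request=512)
for gpu, clock in [("A100", 1400), ("H100", 1400), ("H100", 1200)]:
    print(f"{gpu} @ {clock} MHz: {estimate_joules(wl, gpu, clock):,.0f} J")
```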
When the researchers tested EnergAIzer on real AI workloads running on actual GPUs, it predicted power consumption with only 8 percent error, comparable to traditional methods that can take hours to generate results.
Their method can also be used to predict the power consumption of future GPUs and emerging device configurations, as long as the hardware does not change drastically from today’s designs.
In the future, the researchers want to test EnergAIzer on the latest GPU configurations and scale the model so it can be applied to multiple GPUs collaborating to run workloads.
“To really make an impact on sustainability, we need a tool that can provide fast energy estimation solutions across the entire stack for hardware designers, data center operators, and algorithm developers, so they can all be more aware of power consumption. With this tool, we’ve taken a step toward that goal,” says Lee.
This research was partially funded by the MIT-IBM Watson AI Lab.