Power estimation through power emulation

Associated Cases: 0
Associated Defendants: 0
Accused Products: 0
Forward Citations: 68
Petitions: 0
Assignments: 1
First Claim
1. A method comprising producing a circuit-implemented emulation that emulates a power-model-enhanced circuit, the power-model-enhanced circuit comprising a functional circuit and power estimation circuitry, the power estimation circuitry being adapted to generate an estimate of the power consumption of functional circuitry of the functional circuit, the estimate being generated as a function of input signals applied to the circuit-implemented emulation when it is executed.
Abstract
The time required to estimate the amount of power that will be consumed by a circuit under design is significantly reduced. Specifically, the steps involved in power estimation (power model evaluation, aggregation) are implemented as power estimation circuitry that is added to the design of the functional circuit during circuit design. The resulting power-model-enhanced circuit is mapped onto a hardware emulation platform, one of whose outputs is the estimated power computed by the power estimation circuitry during the emulation. As compared to state-of-the-art commercial power estimation tools, speedups from around 10-fold to over 500-fold can be realized.
75 Citations
CMOS circuit leakage current calculator  
Patent #
US 7,904,847 B2
Filed 02/18/2008

Current Assignee
Mentor Graphics Corporation

Sponsoring Entity
International Business Machines Corporation

POWER MANAGEMENT OF DATA PROCESSING RESOURCES, SUCH AS POWER ADAPTIVE MANAGEMENT OF DATA STORAGE OPERATIONS  
Patent #
US 20110239013A1
Filed 08/28/2008

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Aggregate power display for multiple data processing systems  
Patent #
US 8,055,926 B2
Filed 02/28/2008

Current Assignee
Lenovo International Limited

Sponsoring Entity
International Business Machines Corporation

PERFORMING DATA STORAGE OPERATIONS WITH A CLOUD STORAGE ENVIRONMENT, INCLUDING AUTOMATICALLY SELECTING AMONG MULTIPLE CLOUD STORAGE SITES  
Patent #
US 20100332401A1
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

SYSTEMS AND METHODS FOR MANAGEMENT OF VIRTUALIZATION DATA  
Patent #
US 20100070725A1
Filed 09/03/2009

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

CLOUD GATEWAY SYSTEM FOR MANAGING DATA STORAGE TO CLOUD STORAGE SITES  
Patent #
US 20100333116A1
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

CLOUD STORAGE AND NETWORKING AGENTS, INCLUDING AGENTS FOR UTILIZING MULTIPLE, DIFFERENT CLOUD STORAGE SITES  
Patent #
US 20100332818A1
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

DATA OBJECT STORE AND SERVER FOR A CLOUD STORAGE ENVIRONMENT, INCLUDING DATA DEDUPLICATION AND DATA MANAGEMENT ACROSS MULTIPLE CLOUD STORAGE SITES  
Patent #
US 20100332456A1
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

PERFORMING DATA STORAGE OPERATIONS IN A CLOUD STORAGE ENVIRONMENT, INCLUDING SEARCHING, ENCRYPTION AND INDEXING  
Patent #
US 20100332479A1
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

PERFORMING DATA STORAGE OPERATIONS WITH A CLOUD ENVIRONMENT, INCLUDING CONTAINERIZED DEDUPLICATION, DATA PRUNING, AND DATA TRANSFER  
Patent #
US 20100332454A1
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

CMOS Circuit Leakage Current Calculator  
Patent #
US 20090210831A1
Filed 02/18/2008

Current Assignee
Mentor Graphics Corporation

Sponsoring Entity
Mentor Graphics Corporation

AGGREGATE POWER DISPLAY FOR MULTIPLE DATA PROCESSING SYSTEMS  
Patent #
US 20090222682A1
Filed 02/28/2008

Current Assignee
Lenovo International Limited

Sponsoring Entity
Lenovo International Limited

ARRANGEMENT FOR TRANSMITTING INFORMATION  
Patent #
US 20080262825A1
Filed 05/02/2007

Current Assignee
Infineon Technologies AG

Sponsoring Entity
Infineon Technologies AG

POWER ESTIMATION EMPLOYING CYCLE-ACCURATE FUNCTIONAL DESCRIPTIONS  
Patent #
US 20070022395A1
Filed 03/31/2006

Current Assignee
NEC Corporation

Sponsoring Entity
NEC Corporation

Power estimation employing cycle-accurate functional descriptions  
Patent #
US 7,260,809 B2
Filed 03/31/2006

Current Assignee
NEC Corporation

Sponsoring Entity
NEC Corporation

System and method for emulating a logic circuit design using programmable logic devices  
Patent #
US 20060247909A1
Filed 08/18/2005

Current Assignee
Himanshu Sharma, Patkar Sachin B., Desai Madhav P., Purandare Mitra Sudhir

Sponsoring Entity
Himanshu Sharma, Patkar Sachin B., Desai Madhav P., Purandare Mitra Sudhir

SYSTEM AND METHOD FOR ANALYZING POWER CONSUMPTION OF ELECTRONIC DESIGN UNDERGOING EMULATION OR HARDWARE BASED SIMULATION ACCELERATION  
Patent #
US 20060277509A1
Filed 06/05/2006

Current Assignee
Cadence Design Systems Incorporated

Sponsoring Entity
Cadence Design Systems Incorporated

Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites  
Patent #
US 8,285,681 B2
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Systems and methods for management of virtualization data  
Patent #
US 8,307,177 B2
Filed 09/03/2009

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Performing data storage operations with a cloud environment, including containerized deduplication, data pruning, and data transfer  
Patent #
US 8,407,190 B2
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

System and method for analyzing power consumption of electronic design undergoing emulation or hardware based simulation acceleration  
Patent #
US 8,453,086 B2
Filed 06/05/2006

Current Assignee
Cadence Design Systems Incorporated

Sponsoring Entity
Cadence Design Systems Incorporated

Measuring data switching activity in a microprocessor  
Patent #
US 8,458,501 B2
Filed 07/27/2010

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Performing data storage operations in a cloud storage environment, including searching, encryption and indexing  
Patent #
US 8,612,439 B2
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Power management of data processing resources, such as power adaptive management of data storage operations  
Patent #
US 8,707,070 B2
Filed 08/28/2008

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Cloud storage and networking agents, including agents for utilizing multiple, different cloud storage sites  
Patent #
US 8,849,955 B2
Filed 03/31/2010

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites  
Patent #
US 8,849,761 B2
Filed 09/14/2012

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Information management of data associated with multiple cloud services  
Patent #
US 8,950,009 B2
Filed 03/07/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Veryifing low power functionality through RTL transformation  
Patent #
US 8,954,904 B1
Filed 04/30/2013

Current Assignee
JASPER DESIGN AUTOMATION INC.

Sponsoring Entity
JASPER DESIGN AUTOMATION INC.

Power management of data processing resources, such as power adaptive management of data storage operations  
Patent #
US 9,021,282 B2
Filed 12/23/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Power aware retention flop list analysis and modification  
Patent #
US 9,104,824 B1
Filed 04/30/2013

Current Assignee
JASPER DESIGN AUTOMATION INC.

Sponsoring Entity
JASPER DESIGN AUTOMATION INC.

Performing data storage operations with a cloud environment, including containerized deduplication, data pruning, and data transfer  
Patent #
US 9,171,008 B2
Filed 03/26/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Information management of data associated with multiple cloud services  
Patent #
US 9,213,848 B2
Filed 01/05/2015

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Unified access to personal data  
Patent #
US 9,262,496 B2
Filed 03/07/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficiently restoring execution of a backed up virtual machine based on coordination with virtual-machine-file-relocation operations  
Patent #
US 9,417,968 B2
Filed 09/22/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficient live-mount of a backed up virtual machine in a storage management system  
Patent #
US 9,436,555 B2
Filed 09/22/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites  
Patent #
US 9,454,537 B2
Filed 09/24/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Specification for automatic power management of network-on-chip and system-on-chip  
Patent #
US 9,477,280 B1
Filed 09/24/2014

Current Assignee
Netspeed Systems Inc.

Sponsoring Entity
Netspeed Systems Inc.

Seamless virtual machine recall in a data storage system  
Patent #
US 9,489,244 B2
Filed 02/17/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Systems and methods to process block-level backup for selective file restoration for virtual machines  
Patent #
US 9,495,404 B2
Filed 12/06/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Information management of data associated with multiple cloud services  
Patent #
US 9,571,579 B2
Filed 12/14/2015

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Creation of virtual machine placeholders in a data storage system  
Patent #
US 9,652,283 B2
Filed 10/14/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Archiving virtual machines in a data storage system  
Patent #
US 9,684,535 B2
Filed 03/14/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Virtual server agent load balancing  
Patent #
US 9,703,584 B2
Filed 01/06/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficiently restoring execution of a backed up virtual machine based on coordination with virtual-machine-file-relocation operations  
Patent #
US 9,710,465 B2
Filed 09/22/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Systems and methods to identify unprotected virtual machines  
Patent #
US 9,740,702 B2
Filed 06/28/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Creation of virtual machine placeholders in a data storage system  
Patent #
US 9,766,989 B2
Filed 03/28/2017

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Virtual machine change block tracking  
Patent #
US 9,823,977 B2
Filed 12/29/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficiently restoring execution of a backed up virtual machine based on coordination with virtual-machine-file-relocation operations  
Patent #
US 9,928,001 B2
Filed 06/23/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

File manager integration with virtualization in an information management system with an enhanced storage manager, including user control and storage management of virtual machines  
Patent #
US 9,939,981 B2
Filed 06/17/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Unified access to personal data  
Patent #
US 9,959,333 B2
Filed 02/01/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Archiving virtual machines in a data storage system  
Patent #
US 9,965,316 B2
Filed 02/27/2017

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Virtual server agent load balancing  
Patent #
US 9,977,687 B2
Filed 06/29/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Virtual machine change block tracking  
Patent #
US 9,983,936 B2
Filed 11/20/2014

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficiently restoring execution of a backed up virtual machine based on coordination with virtual-machine-file-relocation operations  
Patent #
US 9,996,534 B2
Filed 06/09/2017

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Virtual machine change block tracking  
Patent #
US 9,996,287 B2
Filed 12/29/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficient live-mount of a backed up virtual machine in a storage management system  
Patent #
US 10,048,889 B2
Filed 06/23/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Information management of data associated with multiple cloud services  
Patent #
US 10,075,527 B2
Filed 01/06/2017

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Systems and methods to process block-level backup for selective file restoration for virtual machines  
Patent #
US 10,108,652 B2
Filed 06/29/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Targeted backup of virtual machine  
Patent #
US 10,152,251 B2
Filed 10/25/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Targeted snapshot based on virtual machine location  
Patent #
US 10,162,528 B2
Filed 10/25/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites  
Patent #
US 10,248,657 B2
Filed 09/07/2016

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Information management of data associated with multiple cloud services  
Patent #
US 10,264,074 B2
Filed 08/31/2018

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Data recovery using a cloud-based remote data recovery center  
Patent #
US 10,346,259 B2
Filed 03/06/2013

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Power management of data processing resources, such as power adaptive management of data storage operations  
Patent #
US 10,379,598 B2
Filed 03/25/2015

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

External dynamic virtual machine synchronization  
Patent #
US 10,387,073 B2
Filed 03/29/2017

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Heartbeat monitoring of virtual machines for initiating failover operations in a data storage management system, including virtual machine distribution logic  
Patent #
US 10,417,102 B2
Filed 09/26/2017

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficiently restoring execution of a backed up virtual machine based on coordination with virtual-machine-file-relocation operations  
Patent #
US 10,437,505 B2
Filed 02/13/2018

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Efficient live-mount of a backed up virtual machine in a storage management system  
Patent #
US 10,452,303 B2
Filed 03/05/2018

Current Assignee
CommVault Systems Incorporated

Sponsoring Entity
CommVault Systems Incorporated

Apparatus and method for connecting a hardware emulator to a computer peripheral  
Patent #
US 20070016396A9
Filed 05/31/2002

Current Assignee
Intellectual Ventures Assets 130 LLC

Sponsoring Entity
Intellectual Ventures Assets 130 LLC

Method and system for debugging an electronic system using instrumentation circuitry and a logic analyzer  
Patent #
US 7,065,481 B2
Filed 07/31/2002

Current Assignee
Synopsys Incorporated

Sponsoring Entity
Synplicity Inc.

Method and system for debugging an electronic system  
Patent #
US 7,072,818 B1
Filed 11/28/2000

Current Assignee
Synopsys Incorporated

Sponsoring Entity
Synplicity Inc.

System and method of workload-dependent reliability projection and monitoring for microprocessor chips and systems  
Patent #
US 20050257078A1
Filed 04/21/2004

Current Assignee
GlobalFoundries Inc.

Sponsoring Entity
GlobalFoundries Inc.

Method and apparatus for efficient register-transfer level (RTL) power estimation  
Patent #
US 20040019859A1
Filed 07/29/2002

Current Assignee
NEC Corporation

Sponsoring Entity
NEC Corporation

Element placement method and apparatus  
Patent #
US 20040139413A1
Filed 08/18/2003

Current Assignee
California Institute of Technology

Sponsoring Entity
California Institute of Technology

Method for improving timing behavior in a hardware logic emulation system  
Patent #
US 20020178427A1
Filed 05/25/2001

Current Assignee
Quickturn Design Systems Incorporated

Sponsoring Entity
Quickturn Design Systems Incorporated

45 Claims
 1. A method comprising producing a circuit-implemented emulation that emulates a power-model-enhanced circuit, the power-model-enhanced circuit comprising a functional circuit and power estimation circuitry, the power estimation circuitry being adapted to generate an estimate of the power consumption of functional circuitry of the functional circuit, the estimate being generated as a function of input signals applied to the circuit-implemented emulation when it is executed.
 2. A circuit-implemented emulation of a power-model-enhanced circuit, the power-model-enhanced circuit comprising a functional circuit that is interconnected with power estimation circuitry, the power estimation circuitry being adapted to generate an estimate of the power consumption of functional circuitry of the functional circuit, the estimate being generated as a function of input signals applied to the circuit-implemented emulation when it is executed.
 3. The method of claims 1 or 2 wherein the circuit-implemented emulation is a general-purpose circuit that has been configured to emulate the power-model-enhanced circuit.
 4. The method of claims 1 or 2 wherein the circuit-implemented emulation is an array of gates that have been interconnected in such a way as to emulate the power-model-enhanced circuit.
 5. The method of claims 1 or 2 wherein the circuit-implemented emulation is a programmable gate array that is programmed in such a way as to emulate the power-model-enhanced circuit.
 6. The method of claims 1 or 2 wherein the execution of the circuit-implemented emulation includes applying a set of signals to a portion of the circuit-implemented emulation that emulates the functional circuitry, and receiving an indication of said estimate from a portion of the circuit-implemented emulation that emulates the power estimation circuitry.
 8. The method of claims 1 or 2 wherein the power estimation circuitry estimates the power consumption as a function of a) said input signals and b) coefficients that characterize the power consumption characteristics of the functional circuitry.
 11. The method of claims 1 or 2 wherein the power estimation circuitry includes at least one power model circuit to which at least one of said input signals is applied, said power model circuit generating an estimate of the power consumption of at least a portion of the functional circuitry.
 17. The method of claims 1 or 2 wherein the functional circuitry includes a plurality of clusters each formed of two or more circuit components, and the power estimation circuitry includes power model circuits each associated with a respective one of the clusters, each power model circuit being adapted to estimate the power consumption of each of the circuit components of the associated cluster on a time-shared basis, the clusters being formed in such a way that error in the power estimate made by the power estimation circuitry is less than if the clusters were to be formed in at least one other way.
 23. The method of claims 1 or 2 wherein the functional circuitry comprises at least first and second circuit components, and the power estimation circuitry estimates the power consumption of said first and second circuit components at different associated sampling rates.
 25. A method comprising generating a description of a power-model-enhanced circuit, the power-model-enhanced circuit comprising a functional circuit and power estimation circuitry that is adapted to generate a succession of estimates of the power consumption of a plurality of components of the functional circuit in response to signals that are input to those components, producing a circuit-implemented emulation of the power-model-enhanced circuit by configuring a configurable circuit system in response to the description of the power-model-enhanced circuit, executing the circuit-implemented emulation with a test bench, and obtaining the power consumption estimates from the emulated power estimation circuitry.
Specification
This application claims the benefit of U.S. provisional application Ser. No. 60/522,333 filed Sep. 16, 2004.
BACKGROUND OF THE INVENTION

The present invention relates to techniques for estimating the power consumed by electronic circuits and systems.
Power consumption has emerged as a primary design metric for a wide range of electronic systems. Minimizing and managing power consumption requires appropriate tool support for power consumption estimation (hereinafter "power estimation") and optimization at various stages in the design methodology, or "design flow." Extensive research in the low power design area has addressed the problem of power estimation for circuits described at varying levels of abstraction, including the transistor level, logic (or gate) level, register-transfer level, and system level. These technologies have been incorporated into several commercial power estimation tools.
At the transistor level, power estimation is typically performed as a byproduct of circuit simulation. Gate-level power estimation requires the computation of signal statistics for the signals in the circuit, which can be performed through simulation, probabilistic analysis, or simulation with statistical sampling. Of these, simulation with a comprehensive test bench is the most commonly used in practice, due to its accuracy and the ability to produce detailed feedback such as power breakdown versus time for different circuit components. At the register-transfer level, approaches to power estimation include analytical techniques, characterization-based macromodels, or fast synthesis into gate-level descriptions. While a few attempts have been made to perform power estimation at the behavioral level, accuracy is limited due to the lack of structural circuit information in behavioral descriptions. At the system level, most research has focused on developing power models for different system components, including processors, memories, on-chip buses and others.
In practice, most current commercial design flows utilize register-transfer-level and gate-level power estimation tools. However, due to their poor efficiency for large designs, the applicability of those tools is limited until late in the design flow, or they are applied only to small parts of a design.
Advances in fabrication technologies have led to shrinking device sizes and consequently to increasing chip complexities. This increase in complexity is straining the capabilities of conventional power estimation tools. For example, in an experiment conducted by the applicants, register-transfer-level power estimation for a 1.25-million-transistor MPEG-4 decoder circuit when decoding just 4 frames of a video stream required 43 minutes for one state-of-the-art commercial power estimation tool and 55 minutes for another. Gate- and transistor-level power estimation tools can be as much as 100 times slower. The slow speed of power estimation tools limits their utility in the design flow and certainly renders them impractical for use in an iterative manner for architectural exploration. Hence, efficient power estimation for large designs is a critical challenge.
Speedup techniques such as statistical sampling and circuit partitioning for parallel mixed-level simulation offer useful improvements in efficiency but are not sufficient in the face of ever-increasing circuit complexities. Raising the level of abstraction to the system level can lead to substantial efficiency improvements, but accuracy is then significantly compromised.
SUMMARY OF THE INVENTION

Power estimation is typically performed by evaluating software-implemented power estimation models (hereinafter "power models") for different circuit components, based on the input and output values of each component during circuit simulation. The present invention is informed by our prior realization that those power models can themselves be thought of as synthesizable functions and implemented as circuitry—referred to herein as "power estimation circuitry." See our paper with S. Chakradhar, "Efficient RTL power estimation for large designs," in Proc. Int. Conf. VLSI Design, January 2003. That paper, as well as all of the prior art cited herein, is hereby incorporated by reference as though fully set forth herein.
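To make the "power models as synthesizable functions" idea concrete, the kind of model being described can be sketched in ordinary software. The following is a minimal illustration, not text from the patent: a hypothetical linear switching-activity macromodel whose function names and coefficient values are invented for the example.

```python
# Hypothetical linear power macromodel: a component's per-cycle power is
# modeled as a constant (static) term plus a coefficient weighted by the
# switching activity (bit toggles) observed on its sampled signals.
def toggles(prev_bits: int, curr_bits: int) -> int:
    """Count the bit positions that changed between two sampled values."""
    return bin(prev_bits ^ curr_bits).count("1")

def macromodel_power(prev: int, curr: int, base: float, coeff: float) -> float:
    """Estimate one component's per-cycle power from its toggle count."""
    return base + coeff * toggles(prev, curr)

# All 8 bits of a bus flip between two cycles; with hypothetical
# characterization coefficients base=0.2 and coeff=0.05 per toggle,
# the estimate is 0.2 + 8 * 0.05.
estimate = macromodel_power(0b00001111, 0b11110000, base=0.2, coeff=0.05)
```

In power emulation, a function of the input signals like this one is what gets synthesized into power estimation circuitry rather than evaluated in a software simulator.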
We refer to our inventive technique as "power emulation." Power estimation circuitry is added to the circuit description of the design of the circuit whose power is to be estimated, referred to herein as the "functional circuit." The functional circuit plus power estimation circuitry—referred to herein as a "power-model-enhanced circuit"—is then emulated by producing, in response to the circuit description, a circuit-implemented emulation that emulates the power-model-enhanced circuit in just the same way that the functional circuit could or would be emulated. Illustratively, the power-model-enhanced circuit is realized on an emulation platform by configuring a configurable circuit system in response to the circuit description. In the disclosed embodiment, in particular, the power-model-enhanced circuit is realized by programming one or more FPGAs (field-programmable gate arrays) of the emulation platform. Among the outputs of the emulated power-model-enhanced circuit, once executed on the emulation platform, is the estimated power computed by the power estimation circuitry.
The power estimation circuitry is not intended to be included in the final design of the functional circuit. Rather, it is intended that the power estimation circuitry be included in the circuit design only initially, in order to evaluate the power consumption characteristics of the functional circuit. Once the final design of the functional circuit has been decided upon, the functional circuit would be manufactured without the power estimation circuitry. (The power estimation circuitry could, however, be included in the final design if there were some specialized need or desire for it.)
Advantageously, we have found that the present invention can facilitate a speedup in power estimation, as compared to existing power estimation tools, by factors of 10 to over 500, depending on the application, with little or no loss of accuracy in the estimation. Thus, much like functional emulation, the power emulation technique of the present invention can enable the investigation of circuit characteristics in the context of realistic system environments and workloads, such as booting up an operating system. Using prior-art power estimation tools, that is a task that can often be achieved as a practical matter only after circuit fabrication.
When added to the functional circuit, the power estimation circuitry could, in many cases, cause the power-model-enhanced circuit to be too large to be handled by whatever emulation platform may be available to the user. In one case, for example, we added power estimation circuitry to the register-transfer-level design of an MPEG-4 decoder circuit. It was computed that the invention would decrease the time required for power estimation by a factor of about 400 as compared to a commercially available power estimation tool. However, straightforward realization of the power estimation circuitry using an FPGA-based emulation platform would have increased the overall area (number of primitive FPGA elements required to implement the circuit) by a factor of as much as 18.2, greatly outstripping the capacity of the emulation platform that was available.
In accordance with a feature of the invention, embodiments of the invention keep the size of the power-model-enhanced circuit to workable levels by employing one or more of a suite of techniques that reduce the size of the power estimation circuitry. These include power model reuse across different circuit components, regulating the granularity of components for power modeling, exploiting inter-component power correlations, resource sharing for power model computations, and the use of block memories for efficient storage within power models.
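One of these area-reduction ideas, sharing a single power model across the components of a cluster on a time-shared basis, can be pictured in software as round-robin sampling. This is an illustrative sketch only; the trace values and coefficients are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of time-shared power model evaluation: one shared
# model services n cluster components in round-robin order, sampling a
# different component each cycle and scaling each sample by n so the
# running total still approximates the whole cluster's consumption.
def shared_model_energy(traces, base, coeff):
    """traces: one list of sampled signal values per component, equal lengths."""
    n = len(traces)
    cycles = len(traces[0])
    total = 0.0
    for cycle in range(1, cycles):
        comp = cycle % n  # the one component the shared model samples this cycle
        flips = bin(traces[comp][cycle - 1] ^ traces[comp][cycle]).count("1")
        total += n * (base + coeff * flips)  # one sample stands in for all n
    return total
```

The clustering step described in the claims chooses which components share a model so that the error introduced by this sampling stays small.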
In particular experiments in which one or more of the aforementioned techniques were used to design the power estimation circuitry, the power-model-enhanced circuit was, on average, 3.1 times the area of the functional circuit, which was well within the capabilities of the considered emulation platform. The amount of time required by that particular design for power estimation was on the order of only 1/200th, or 0.5%, of the time required for each of two commercially available power estimation tools. And the cost of power emulation in terms of estimation accuracy averaged a modest 3.4% loss of accuracy.
The invention is applicable at any level of abstraction of the functional circuit, e.g., transistor level, logic (or gate) level, register-transfer level, or system level. Indeed, we believe that the invention can significantly extend the scope of current register-transfer-level, gate-level or other-level power estimation techniques, making them applicable to large designs with little or no tradeoff in accuracy. The advantages of the invention as compared to commercially available power estimation tools are particularly manifest when the functional circuit is particularly large and complex.
BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram of a functional circuit to which has been added power estimation circuitry pursuant to the principles of the present invention;
FIG. 2 is a block diagram (or “netlist”) of a typical power model of the power estimation circuitry of FIG. 1;
FIG. 3 is a flow diagram depicting an illustrative design flow incorporating the principles of the present invention;
FIG. 4 is a flow diagram depicting illustrative details of one of the steps of the design flow depicted in FIG. 3;
FIG. 5 is a generic power model that can be used as the power model for a cluster of components of the functional circuit of FIG. 1; and
FIGS. 6-11 are charts and graphs helpful in explaining various aspects of the illustrative implementation of the invention.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
1.0 Overview
The concept of power emulation pursuant to the principles of the present invention is applicable at different levels of abstraction. It is here presented in the context of register-transfer level (RTL) power estimation. Since RTL descriptions in practice can contain an arbitrary combination of macroblocks (arithmetic units, registers, multiplexers, etc.) and random logic gates, the descriptions herein apply directly to gate-level descriptions as a special case.
The power-model-enhanced circuit of FIG. 1 includes a functional circuit 10, which is illustratively a binary search circuit of conventional design, represented at the register-transfer level. The binary search circuit 10 includes a number of computational units 101, registers 102 and buses 106, operating under the control of a controller 104. Inputs 105 for the binary search circuit are the conventional “first,” “last,” “value,” and “data” inputs. The output of the binary search circuit, indicated at 111, is labeled “out”.
In accordance with the principles of the invention, functional circuit 10 is interconnected with power estimation circuitry comprising power models 112, power strobe generator 113 and power aggregator 115. The power estimation circuitry is adapted to generate at least one estimate—illustratively a succession of estimates—of the power consumption of at least a portion of the functional circuit, the estimate(s) being generated as a function of input signals applied to the power-model-enhanced circuit once it has been realized as a circuit-implemented emulation (as described below) and the emulation is thereafter executed.
In particular, each RTL component (the various computational units, registers, the controller, etc.) of the binary search circuit 10 has an associated power model. For clarity, not all of the power models are explicitly shown. Moreover, although not shown in this particular FIG., a single power model can be used to service all components in a cluster of the RTL components. This is described in further detail hereinbelow. Each power model computes the current power consumption of the associated functional circuit component whenever the power model is triggered, or strobed, by power strobe generator 113.
Computing the power consumption of a component requires a power model to take account of both the input and output signals of the component. It is possible, however, to not actually connect a component's outputs to the associated power model. Rather, the power model can be designed in such a way—based on a knowledge of the function that the associated component performs—as to take account of what the output of the component will be for a given set of inputs and to thus compute the power consumed by that component. This approach will make the power model more complicated than it would otherwise be, but may be desirable because it reduces the number of leads connecting the functional circuit to the power estimation circuitry and thus achieves circuit simplification at the functional circuit/power estimation circuitry interface.
Power strobe generator 113 provides triggers to each of the power models 112 via strobe leads 114, causing the power models to evaluate the power consumption of the associated circuit components at that particular time. When strobed by power strobe generator 113, each power model outputs a signal to power aggregator 115 indicating the evaluated power consumption of the associated component at that particular time. Power aggregator 115 implements a sequence of additions to accumulate the total power from the outputs of the power models and thus the total power consumption of the RTL components. The total power is output on lead 117.
Power strobe generation is similar to clock generation and is done separately for each clock domain in the design. For example, power strobe generator 113 can receive each of the different clock signals that may be used in the functional circuit and can strobe those power models whose associated components' states are expected to be affected by any given clocking. FIG. 1 shows a single such clock signal being provided on clock lead 116.
Each power model is a circuit implementation of a power macromodel constructed using known techniques. Each macromodel is illustratively a cycle-accurate linear-regression-based macromodel that expresses the power consumed in an RTL component with n input/output bits as

Σ<sub>i=1</sub><sup>n</sup> Coeff<sub>i</sub>*T(x<sub>i</sub>),

where Coeff<sub>i</sub> represents the power model coefficients, and T(x<sub>i</sub>) is the transition count (0 or 1) at each input/output bit. Further description of such macromodels can be found, for example, in L. Benini et al., “Regression models for behavioral power estimation,” Proc. Int. Wkshp. Power & Timing Modeling, Optimization, and Simulation (PATMOS), 1996, and in Q. Wu et al., “Cycle-accurate macro-models for RT-level power analysis,” IEEE Trans. VLSI Systems, vol. 6, pp. 520-528, December 1998.
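In software terms, evaluating this macromodel for one cycle amounts to summing the coefficients of the input/output bits that toggled. The Python sketch below illustrates the computation; the coefficient values are invented placeholders, not characterization data.

```python
# Sketch of a cycle-accurate linear-regression power macromodel
# evaluation. Coefficients are hypothetical placeholders.

def macromodel_power(coeffs, prev_bits, curr_bits):
    """One cycle's estimate: sum of Coeff_i * T(x_i), where the
    transition count T(x_i) is 1 iff input/output bit i toggled."""
    return sum(c for c, p, q in zip(coeffs, prev_bits, curr_bits) if p != q)

# A 4-bit component in which bits 0 and 2 toggle this cycle:
print(macromodel_power([0.8, 1.2, 0.5, 0.9], [0, 1, 1, 0], [1, 1, 0, 0]))  # 0.8 + 0.5
```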
FIG. 2 shows a circuit implementation of such a power model 112 used for the purpose of power emulation pursuant to the principles of the invention. The inputs to the power model include the input/output bits 21 of the associated component being monitored and a power strobe (POW_STROBE) 22 from power strobe generator 113. The output of the power model is an estimate of the associated component's power consumption at the time of the strobe. That estimate is a function of at least a) the input bits and b) coefficients that characterize the power consumption characteristic of the circuitry whose power is being estimated.
In particular, the power model illustratively performs the computation

Power = tc(queue_x<sub>1</sub>(0), queue_x<sub>1</sub>(1))*Coeff<sub>1</sub> + . . . + tc(queue_x<sub>N</sub>(0), queue_x<sub>N</sub>(1))*Coeff<sub>N</sub>,

where tc represents the transition count (EXCLUSIVE-OR) function carried out by exclusive-OR gates 24. The inputs to tc come from a set of internal queues 23 that maintain the previous and current values of each component input/output. Since the transition count is a binary value, the multiplications in the power model equation are implemented simply using vector AND gates 25. The products of the coefficients and respective transition counts are added by power summation 26 to obtain the power consumed by the component in the current strobe period. The output of power summation 26 is strobed into output register 28, whose contents are provided on lead 29 to power aggregator 115.
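A behavioral sketch of this datapath, with internal queues holding previous/current bit values, exclusive-OR transition counting, and coefficient gating and summation, might look as follows in Python (the class name and coefficient values are illustrative):

```python
# Behavioral model of the power-model datapath of FIG. 2.
# Coefficients are illustrative placeholders.

class PowerModel:
    def __init__(self, coeffs):
        self.coeffs = coeffs
        self.queue = [[0, 0] for _ in coeffs]   # [previous, current] per bit

    def clock(self, bits):
        """Shift the monitored input/output bit values into the queues."""
        for q, b in zip(self.queue, bits):
            q[0], q[1] = q[1], b

    def strobe(self):
        """On POW_STROBE: XOR queued values (tc), gate coefficients, sum."""
        return sum(c * (q[0] ^ q[1]) for c, q in zip(self.coeffs, self.queue))

pm = PowerModel([2.0, 1.0, 3.0])
pm.clock([0, 0, 1])
pm.clock([1, 0, 0])      # bits 0 and 2 transition between the two cycles
print(pm.strobe())       # -> 5.0 (2.0 + 3.0)
```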
FIG. 3 is a flow diagram depicting an illustrative design flow incorporating the principles of the present invention.
Step 31 receives the functional circuit RTL design described in a circuit-description language such as Verilog, VHDL, or SystemC. This step determines what power models are required for every component in the design. Preconstructed power models are stored in power model library 37. The preconstructed power models are described using the same circuit-description language in which the components of the functional circuit are described. Reference may be had in this regard to our above-cited January 2003 paper. The required power models determined by step 31 are identified to step 38, which obtains code from library 37 implementing those models. Step 38 derives optimized versions of the models using the techniques of resource sharing and block memory usage as described below, and it stores the derived optimized power models in optimized power model library 35. Step 31 inserts into the RTL design, from optimized power model library 35, the code describing the required power models, as well as the other required power estimation circuitry.
Step 32 comprises a number of substeps that are shown in FIG. 4 and are described below. In overview, step 32 optimizes the description of the power-model-enhanced RTL design so that it can meet a target area budget (based on the capacity of the emulation platform), while minimizing any loss in estimation accuracy. The output of step 32 is an RTL description that is used to configure a general-purpose circuit to emulate the power-model-enhanced circuit. In particular, the RTL description is fed for this purpose to an FPGA synthesis tool flow at step 33. The resulting logic-level description, or netlist, is downloaded to an FPGA-based emulation platform at step 34 for programming of the FPGA—interconnecting its array of gates—to become a circuit-implemented emulation of the power-model-enhanced circuit. The FPGA is then exercised by test bench 36, which applies a set of signals to the portion of the emulation that emulates functional circuit 10. The portion of the emulation that emulates the power estimation circuitry thereupon provides indications of the power estimates that it generates. Those estimates, taken over time, constitute a power profile for the functional circuit. The power profile, more particularly, may be, for example, a measure of the functional circuit's average power consumption, its peak power consumption, or a cycle-by-cycle power consumption profile of the entire circuit or any part thereof, as suits the circuit designer's needs. It can also be used to separate the static part of a circuit's power consumption (e.g., leakage) from the dynamic part.
Illustrative details of step 32 are shown in FIG. 4. The methodology takes as its input a) the power-model-enhanced RTL circuit design and its test bench, b) optimized power model library 35, and c) parameters including a target area constraint (target_area) imposed by the emulation platform and a selected clustering algorithm control factor k as described below. The output of step 32 is a power-emulation-ready RTL description, i.e., a description of the power-model-enhanced circuit, that can meet the constraint target_area with a minimum loss of estimation accuracy.
The following is an overview of the various steps shown in FIG. 4. Further details as to how various of those steps are illustratively implemented are presented thereafter.
Step 41 involves running an RTL simulation using conventional simulation software for a short, user-selected interval to generate the power profiles for all the components—that is, their power consumption characteristics over time, given a set of inputs from the test bench. The power profiles are then used at step 42 to generate various indicators of the components' power consumption characteristics, these being, in this embodiment, (i) the mean and (ii) the variance of the component power profiles, and (iii) inter-component power correlation factors. These statistics are used by the area reduction techniques carried out at steps 43-45.
Step 43 identifies components whose power consumption statistics are strongly and linearly correlated, based on whether an intercomponent power correlation factor (described below) exceeds a fixed or, alternatively, a userspecified threshold. The power models for components whose power consumption statistics are strongly and linearly correlated are combined into a new power model, which can estimate the power consumption for all the components by monitoring the inputs of any one of the correlated components. This reduces the number of components with unique power models.
Step 44 identifies sets of components for which construction of higher granularity power models is suitable. To the extent that that is the case, optimized power model library 35 is updated accordingly, as shown in FIG. 3 by an arrow from step 32 to library 35. This is desirable since the higher granularity power models can be used for other (subsequent) designs for which one may wish to perform power emulation. Moreover, the process of constructing higher granularity power models is similar to the process of constructing the original power models themselves, making such updating a logical way of constructing the higher granularity power models. Since the number of such sets is exponential, one can use empirical studies to consider only connected components (higher potential of area savings) and small sets with up to three components (likely to have lower loss of estimation accuracy). Finally, if the fitting error for the resultant power model is higher than is adjudged to be desirable, then the new power model is not a good choice and should be dropped.
The task now is to reduce the number of power models further by determining component clusters that can be mapped to generic power models. Steps 45-48 provide a two-phase strategy in order to meet the target area constraint with a minimum loss of accuracy. In the first phase, at step 45, a hierarchical clustering algorithm is used to determine, from among the possible clustering solutions that meet the target area constraint, some number k of those solutions. Larger values of k provide greater flexibility in meeting power estimation circuitry design objectives, at a cost of additional time consumed by the design flow. In the second phase, at step 46, we first compute a measure of the relative significance of each component to the overall power profile, based on the component power mean and variance. This allows us to compute a desirable sampling rate for each component (i.e., how often its inputs are sampled by the associated power model) for any given power model latency (i.e., the number of clock cycles that the power model uses to carry out a power computation after having done the sampling).
The area-optimized solutions of step 45 can result in undersampling (an actual sampling rate that is less than the desirable sampling rate) for some components and oversampling (an actual sampling rate that is greater than the desirable sampling rate) for others. Undersampling can result in higher estimation errors. Hence, steps 47 and 48 attempt to minimize component undersampling. For each of the k solutions identified in step 45, a classical multi-way component swapping between clusters is performed at step 47 to minimize the undersampling. Two components that belong to different clusters are chosen, and the impact of swapping them (moving each into the other's original cluster) on undersampling is computed. A sequence of such swaps is constructed that results in a cumulative reduction in undersampling. In order to explore many solutions, swaps that locally increase the undersampling may be accepted (in the hope that they lead to a sequence with a better cumulative reduction). The k initial solutions produced by step 45 are thus converted into k further-optimized solutions. Step 48 then examines the clustering solutions produced by step 47, and chooses the solution with the lowest undersampling to generate the power-model-enhanced RTL circuit description ready for power emulation.
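As a rough illustration of the swapping phase (a greedy variant only; the flow described above also accepts swaps that locally increase undersampling), the Python sketch below uses a simplified model in which every component in a cluster of m components is sampled at rate 1/m, and undersampling is the total shortfall against each component's desired rate. All names and rates are invented.

```python
# Greedy component-swapping sketch (invented names and rates).
import itertools

def undersampling(clusters, desired):
    """Total shortfall of actual sampling rate (1/cluster size) vs. desired."""
    total = 0.0
    for cl in clusters:
        actual = 1.0 / len(cl)
        total += sum(max(0.0, desired[c] - actual) for c in cl)
    return total

def one_improving_swap(clusters, desired):
    """Apply the first pairwise swap that lowers undersampling."""
    base = undersampling(clusters, desired)
    for i, j in itertools.combinations(range(len(clusters)), 2):
        for a in clusters[i]:
            for b in clusters[j]:
                trial = list(clusters)
                trial[i] = [c for c in clusters[i] if c != a] + [b]
                trial[j] = [c for c in clusters[j] if c != b] + [a]
                if undersampling(trial, desired) < base - 1e-12:
                    clusters[i][:], clusters[j][:] = trial[i], trial[j]
                    return True
    return False

def minimize_undersampling(clusters, desired):
    while one_improving_swap(clusters, desired):
        pass
    return clusters

# 'a' wants frequent sampling but sits in the larger (slower) cluster:
clusters = [["a", "b", "c"], ["x", "y"]]
desired = {"a": 0.5, "b": 0.1, "c": 0.1, "x": 0.1, "y": 0.1}
minimize_undersampling(clusters, desired)
print(undersampling(clusters, desired))  # -> 0.0 after 'a' moves
```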
Further specifics of steps 42-47 are detailed in the following sections.
2.0 Reduction of Area Requirements—Steps 42-45
This section presents a suite of techniques that reduce the area requirements of the power-model-enhanced circuit. These techniques are based on the observation that power models dominate the overall circuit area, since they are instantiated for every component in the design. The suite of techniques attempts to reduce the number of power models in a design. The techniques also help make area-efficient implementations of the power model logic, without a significant loss of power estimation accuracy. In a given application, any number of these techniques, including none of them, may be used depending, for example, on the extent to which it is desired or necessary to reduce the size of the power estimation circuitry, and thus of the overall power-model-enhanced circuit, in order to meet constraints imposed by the emulation platform—notably the available FPGA area.
2.1 Power Model Re-Use Through Clustering—Step 45
The number of power models required for a design can be reduced by grouping components into clusters and by using a single power model to service all components in a cluster on a time-shared basis. In effect, a component may be considered by the power model (or “sampled”) only once in several cycles, similar to statistical sampling. See, for example, R. Burch et al., “A Monte Carlo approach for power estimation,” IEEE Trans. VLSI Systems, vol. 1, pp. 63-71, March 1993.
The architecture of a generic power model that services a cluster of M components is shown in FIG. 5. It consists of (i) input multiplexers 54a and 54b that select the component inputs 51 to be monitored at a particular time and the corresponding macromodel coefficients, (ii) a ROM 56 containing the arrays of coefficients for each type of component in the cluster, and (iii) a basic N-bit power model 55, such as of the type shown in FIG. 2, for calculating the component power consumption value, where N is the maximum number of input bits that are monitored among all components in a cluster, this being referred to as the maximum bit width. (In this embodiment the outputs of the various components are not measured directly but are taken into account in the design of the power models, as was suggested earlier.) The area of the generic power model is chiefly governed by tradeoffs between the number of components being serviced (which determines the multiplexer size) and the largest bit width component (which determines the size of the adder tree within the power model).
Control logic 58, responsive to an overall clock signal of the power-model-enhanced circuit, controls the selection of which component's inputs are to be sampled at any given time by the power model. To this end, control logic 58 generates a log<sub>2</sub>M-bit-wide selection signal that is applied to multiplexers 54a and 54b, thereby identifying the selected component. The algorithm by which control logic 58 generates the selection signal is determined based on how often the various components are to be sampled, per the considerations described above.
In operation, control logic 58 identifies a particular component to multiplexers 54a and 54b. Multiplexer 54a responds by providing the (up to) N input bits of that component to “Inputs” of power model 55. At the same time, multiplexer 54b selects, as an address for ROM 56, the address on the one of its M K-bit-wide inputs associated with the selected component. The selected address is provided to ROM 56, causing the latter to output the corresponding N coefficients at “dout” to power model 55. Power model 55 is thus provided with the inputs necessary for the power consumption computation, as described above in connection with FIG. 2, and it provides the computed power on lead 57 to power aggregator 115.
Clustering reduces area because it shares power model resources, but there are a few caveats with the generic power model that affect its efficiency. The maximum number of monitored points from the serviced components determines the power model bit width. For some components in the cluster, this requirement means that the input bits and matching coefficient array are padded with zeros. Coefficient ROM 56 must have a data bit width of N*coeff_width to meet the bandwidth requirement of the power model. At the cost of estimation accuracy, we can relax this requirement and allow multiple cycles for the power model's power computation. ROM 56 is illustratively implemented as a clocked device to support this multicycle feature. The size of ROM 56 is dictated by the heterogeneity of the components in a cluster. When there are multiple instances of the same type of component, only a single copy of the coefficients is stored in the ROM.
FIG. 6 shows the impact of clustering on area reduction and estimation error for a bubble sort circuit that we investigated. The design contained 777 RTL components, and we considered various clustering solutions by varying the number of generic power models allowed. At one extreme, there are 777 power models (one power model per component); this configuration results in the highest area cost of about 25,000 LUTs, with zero estimation error. (A LUT is a standard area measurement unit in this technology.) When the number of generic power models is reduced to six, the area curve reaches a minimum of 7,615 LUTs, about 3 times smaller. At the same time the estimation error has risen to about 1%.
As the number of power models is decreased further, we first note that the estimation error increases sharply. This is to be expected, since the estimation error depends on the frequency with which a component is sampled for power consumption, and sampling frequency decreases as the number of components serviced by a model increases. Secondly, we observe that area requirements start increasing again. The parabolic nature of the area curve in FIG. 6 is explained by tradeoffs between multiplexer and adder area costs. Decreasing the number of power models means that each model services more components, thus requiring larger multiplexers, a situation that begins to outweigh the benefits of having fewer adders. Thus, we must carefully consider the conflicting trends imposed by the multiplexer and adder costs of a generic power model while performing clustering.
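These conflicting trends can be made concrete with a toy greedy merging loop in Python: each cluster's power-model area is modeled as adder cost plus input-multiplexer cost, and the pair of clusters whose merge saves the most area is combined until a target is met. The unit LUT costs and the linear multiplexer model are invented placeholders for real FPGA LUT counts.

```python
# Toy agglomerative cluster merging with an invented area model.
AREA_ADD = 10.0          # assumed LUTs per adder stage

def area_mux(n):
    """Assumed area of an n-to-1 input multiplexer (linear model)."""
    return 2.0 * n

def cluster_area(bw, size):
    # (max bit width - 1) adders plus a mux selecting among 'size' members
    return (bw - 1) * AREA_ADD + bw * area_mux(size)

def merge_clusters(components, target_area):
    """components: list of (name, bit_width) pairs. Greedily merge the
    pair of clusters giving the best area savings until the total power
    model area meets target_area (or one cluster remains)."""
    clusters = [([name], bw) for name, bw in components]
    def total():
        return sum(cluster_area(bw, len(m)) for m, bw in clusters)
    while total() > target_area and len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, bwi), (mj, bwj) = clusters[i], clusters[j]
                saving = (cluster_area(bwi, len(mi)) + cluster_area(bwj, len(mj))
                          - cluster_area(max(bwi, bwj), len(mi) + len(mj)))
                if best is None or saving > best[0]:
                    best = (saving, i, j)
        _, i, j = best
        (mi, bwi), (mj, bwj) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mi + mj, max(bwi, bwj)))
    return clusters

# Merging the two 8-bit units is cheaper than widening the 4-bit one:
print(merge_clusters([("u1", 8), ("u2", 8), ("u3", 4)], 150.0))
```

Note how the saving computed for each candidate pair penalizes merges that raise the cluster's maximum bit width, reproducing the padding cost discussed above.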
The clustering is illustratively carried out using a hierarchical clustering algorithm such as that disclosed in A. K. Jain et al., Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, N.J., 1988. This algorithm takes as its input the list of components, and outputs several candidate clustering solutions that meet the specified target area constraint. With an initial state wherein every component forms a distinct cluster and each cluster is associated with a power model, the algorithm proceeds as follows:
 1. Evaluate pairwise the cost of combining two clusters into a single cluster. The cost is given by the size in LUTs of a generic power model that will be used to service all the components in the two clusters. In other words, if CL<sub>i</sub> and CL<sub>j</sub> are two clusters, the area cost of a generic power model that services the cluster CL<sub>i</sub>+CL<sub>j</sub> is approximately given by

Area(CL<sub>i</sub>+CL<sub>j</sub>) ≈ (max(BW<sub>CL<sub>i</sub></sub>, BW<sub>CL<sub>j</sub></sub>)−1)*Area<sub>add</sub> + max(BW<sub>CL<sub>i</sub></sub>, BW<sub>CL<sub>j</sub></sub>)*Area<sub>mux</sub>(|CL<sub>i</sub>|+|CL<sub>j</sub>|)
 where the first term denotes the contribution due to the power model computation and the second term denotes the contribution due to the input multiplexer. BW<sub>CL<sub>i</sub></sub> denotes the bit width of the largest component in cluster CL<sub>i</sub> (whose cardinality is |CL<sub>i</sub>|), Area<sub>add</sub> denotes the size of a basic adder required to add the products of the power model coefficients and transition counts, and the function Area<sub>mux</sub>(n) returns the area corresponding to an n-to-1 multiplexer.
 2. Choose the pair of clusters that can be combined to result in the best area savings (Area(CL<sub>i</sub>)+Area(CL<sub>j</sub>)−Area(CL<sub>i</sub>+CL<sub>j</sub>)) and update the bit width of the resultant cluster as max(BW<sub>CL<sub>i</sub></sub>, BW<sub>CL<sub>j</sub></sub>).
 3. Repeat the above steps until k solutions that meet the target area constraint are found or all components are in a single cluster.
2.2 Exploiting Inter-Component Power Correlations—Steps 42-43
The power consumptions of several components in a design are often correlated due to the functional circuit topology. Correlations can be exploited to reduce the number of components being explicitly monitored, since the power consumption of correlated components can be potentially inferred by monitoring one component in that set. For example, if P<sub>x </sub>and P<sub>y </sub>are power consumption variables correlated by a function ƒ such that P<sub>y</sub>=ƒ(P<sub>x</sub>), then we can monitor only component x to obtain P<sub>x</sub>, and apply ƒ to compute P<sub>y</sub>, as long as a selected correlation criterion is met.
In particular, for power emulation, since the correlation function will also be implemented as circuitry, it is desirable for function ƒ to be simple, requiring very few circuit resources. A linear fitting function, for example, meets these requirements. Additionally, the linear correlation must be strong. The correlation between two components can be expressed by the statistical correlation coefficient (ρ) between two power consumption variables P<sub>x</sub> and P<sub>y</sub> as follows:

ρ = E[(P<sub>x</sub>−μ<sub>x</sub>)(P<sub>y</sub>−μ<sub>y</sub>)]/(σ<sub>x</sub>σ<sub>y</sub>) = Cov(P<sub>x</sub>, P<sub>y</sub>)/(σ<sub>x</sub>σ<sub>y</sub>),

where μ<sub>x</sub>, μ<sub>y</sub> are the means and σ<sub>x</sub>, σ<sub>y</sub> are the standard deviations of P<sub>x</sub>, P<sub>y</sub>. See, for example, G. G. Roussas, A Course in Mathematical Statistics, Second Edition, Academic Press, London, UK, 1997. The value of ρ can vary from −1 to 1, where a large value of ρ (positive or negative) indicates strong linear correlation.
Given a reference component, a threshold value for ρ may be chosen such that any components with a correlation coefficient of at least that amount—that is, components having corresponding power consumption variables that are linearly correlated to at least a predetermined extent—can be grouped together and replaced by a linearly scaled version of the reference component.
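A minimal sketch of this screening step, assuming per-cycle power traces are available from the short RTL simulation: compute Pearson's ρ between each component's trace and the reference, and keep those whose magnitude clears the threshold. The traces below are made up for illustration.

```python
# Correlation screening sketch (made-up power traces).
import math

def pearson(xs, ys):
    """Statistical correlation coefficient rho between two traces."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_group(reference, traces, threshold=0.5):
    """Components whose trace is strongly linearly correlated with the
    reference; their power can be inferred from the reference alone."""
    return [name for name, t in traces.items()
            if abs(pearson(reference, t)) >= threshold]

p1 = [1.0, 2.0, 3.0, 4.0]                  # reference component's trace
traces = {
    "mux2": [2.0, 4.0, 6.0, 8.0],          # rho = 1: perfectly correlated
    "reg":  [4.0, 1.0, 3.0, 2.0],          # |rho| = 0.4: too weak
}
print(correlated_group(p1, traces))        # -> ['mux2']
```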
The following two examples are provided to show (i) varying degrees of linear correlation between component power and (ii) how components with similar values of ρ can be collapsed into a single power model.
FIG. 7 plots the correlation between the power profiles of various component pairs in the aforementioned bubble sort circuit design. Using a 12-to-1 multiplexer as the reference component (power consumption P<sub>1</sub>), we examine its correlation with two other 12-to-1 multiplexers (power consumptions P<sub>2</sub> and P<sub>3</sub>), and a register that forms an input to our reference component (power consumption P<sub>4</sub>). FIG. 7(a) shows that P<sub>1</sub> and P<sub>2</sub> are perfectly correlated with ρ=1 (it turns out that they are duplicates of the same component, implemented in the functional circuit in order to improve the circuit's timing characteristics). FIG. 7(b) shows that P<sub>1</sub> and P<sub>3</sub> are weakly correlated with ρ=0.263, while FIG. 7(c) shows that P<sub>1</sub> and P<sub>4</sub> are strongly correlated nonlinearly, but weakly correlated linearly. Thus, in this example, we monitor P<sub>1</sub>, P<sub>3</sub> and P<sub>4</sub>, and use P<sub>1</sub> to infer P<sub>2</sub>.
FIG. 8 illustrates how power correlations can be exploited to optimize the power estimation circuitry for the bubble sort circuit design. The histogram of FIG. 8(a) shows the distribution of correlation coefficients for all components in the design, relative to one specific OR gate. There are 36 components that have a correlation coefficient ρ>0.5 (we assume 0.5 to be the correlation threshold in this example). Therefore, there are 36 components in the bubble sort circuit design whose power consumption can be computed by a power model that monitors only the single OR gate. The computed power is then scaled up to reflect the power consumption of the 36 components. The scaling can be implemented in any of a number of equivalent ways, including (i) as part of the power model itself, (ii) as a separate unit that is cascaded to the output of the appropriate power model, or (iii) as part of the power aggregation circuitry. FIG. 8(b) shows the estimation error that results from different approaches to estimating the power consumption of the 36 components identified in FIG. 8(a). The 36 components are responsible for 1.04% of the total power consumption. Ignoring the power consumed by these components when computing the total circuit's power consumption will therefore result in an error of 1.04% (see the bar marked “DROP” in FIG. 8(b)). By naively substituting the OR gate power for the power of any component in the group, the estimation error improves to 0.75% (see the bar marked “DIRECT” in FIG. 8(b)). However, based on further analysis, we observed that it was possible to scale the power consumption of the OR gate by a factor of 4 to approximately include the power consumption of all 36 components. This approach (marked “SCALED” in FIG. 8(b)) results in an estimation error of only 0.13%. To save area, the scaling factor is chosen as a power of 2 so that it can be implemented in circuitry as a bit shift operation.
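The power-of-2 choice can be sketched in one line: pick the exponent whose power of two is nearest (in log scale) to the true group-to-reference power ratio, so that the hardware scaling reduces to a bit shift. The ratio used below is illustrative.

```python
# Power-of-two scaling sketch (illustrative ratio).
import math

def shift_scale(ratio):
    """Exponent k such that 2**k best approximates a positive ratio
    (nearest in log scale), implementable as a left shift by k."""
    return round(math.log2(ratio))

k = shift_scale(3.7)     # nearest power of two to 3.7 is 4
print(1 << k)            # -> 4
```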
2.3 Changing Component Granularity—Step 44
A power-model-enhanced RTL circuit description contains power models for every component in an RTL design. We can modify this policy by increasing the granularity of the components for which power models are (pre)constructed and instantiated. In other words, we can construct a new entity comprising several RTL components, characterize this entity, and use the resultant power model. Thus, by increasing the component granularity, we lower the number of power models, leading to a decrease in area. However, as shown by the following example, increasing component granularity has a significant impact on estimation accuracy.
We consider a design that implements the popular DES encryption algorithm and contains several chains of two-input OR gates. In the power-model-enhanced RTL circuit description, a power model is dedicated to each OR gate, but we can combine several consecutive gates in a chain to form a wide-OR entity and construct the corresponding power model. FIG. 9 plots the impact on estimation accuracy as the size of the coalesced gate increases (from 3 inputs to 11 inputs). The plot shows that the absolute error increases monotonically. This trend can be explained by the fact that when several 2-input gates are coalesced and subsumed by a large power model, the internal signals are no longer explicitly modeled, and accuracy becomes subject to the effectiveness of the new power macromodel. This implies that it is often only practical to group small numbers of components into a single entity.
3.0 Resource Sharing For Power Model Computation—Step 38
Classical resource sharing techniques can be employed to make the computation in the power model area-efficient. In particular, the power consumption computation performed by a power model can be carried out over multiple power-model-enhanced circuit clock cycles, thereby allowing adder circuitry within the power model to be used multiple times successively in the course of the computation. A power model with N bits of input typically requires a chain of N−1 adders to compute the power. The area requirements can be reduced using a statically scheduled tree configuration.
An adder tree with a width of A adders computes a sum in log<sub>2</sub>(A) cycles, assuming all terms can be read in one cycle. However, the bandwidth limitations of circuitry restrict the number of macromodel coefficients that can be read in a cycle. A scheduler reads one new input value for each adder per cycle, reducing the required bandwidth and simplifying control logic. Assuming a one-cycle latency for coefficient storage, the sampling period T<sub>sample </sub>for a power model with bit width N and A adders is given by<maths id="MATHUS00005" num="5"><math overflow="scroll"><mrow><msub><mi>T</mi><mi>sample</mi></msub><mo>=</mo><mrow><mrow><mo>⌈</mo><mfrac><mi>N</mi><mi>A</mi></mfrac><mo>⌉</mo></mrow><mo>+</mo><mrow><msub><mi>log</mi><mn>2</mn></msub><mo></mo><mrow><mo>(</mo><mi>A</mi><mo>)</mo></mrow></mrow><mo>+</mo><mn>1</mn></mrow></mrow></math></maths>
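The sampling-period formula can be sanity-checked with a small helper. This is only a sketch; it assumes A is a power of two so that log<sub>2</sub>(A) is an integer.

```python
import math

def sampling_period(n_bits, n_adders):
    """T_sample = ceil(N / A) + log2(A) + 1: ceil(N/A) cycles to feed one
    new input per adder per cycle, log2(A) cycles for the adder tree to
    reduce, plus one cycle of coefficient-storage latency.
    Assumes n_adders is a power of two."""
    return math.ceil(n_bits / n_adders) + int(math.log2(n_adders)) + 1
```

For example, a 16-bit power model evaluated with 4 adders needs 4 + 2 + 1 = 7 cycles per sample.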
Since resource sharing increases the inter-component sampling period, estimation error also increases. For example, FIG. 10 plots the area and estimation error for the bubble sorting circuit design as a function of the number of adders allowed per power model. With 8 adders, we obtain the minimum area (7504 LUTs) and the estimation error is almost negligible (0.26%). As expected, estimation error declines as we increase the number of adders per power model. At the same time, area exhibits an interesting trend: it descends rapidly, reaches a minimum, and then rises slowly. Scheduling overhead dominates power model area for a small number of adders, where large multiplexers are placed at the input of each adder to select the correct coefficient during each cycle of computation. An increasing number of adders lessens (and often eliminates) the scheduling overhead. Also, adders are area-efficient because FPGA architectures are typically optimized with dedicated carry-chain logic. Thus, for a growing number of adders beyond the optimal minimum of 8, we see a slowly increasing curve.
3.1 Using Block Memories—Step 38
When clustering is applied to create a generic power model, there must be a coefficient array for each type of component supported in the cluster. The size of each array increases to match the maximum bit width of the generic model (to avoid extra control logic). If implemented directly in lookup tables (LUTs) on an FPGA, the coefficient arrays are a major contributor to the area overhead. Fortunately, FPGAs provide block memories, which are ideal for storing coefficients. It is, in fact, desirable to map the power models' coefficient ROMs to the FPGA's block memories. For example, Xilinx's CORE Generator tool offers the ability to configure a block memory macro with parameters such as width and depth. Since block RAM has at best a one-cycle latency, it is essential to read multiple coefficients per cycle. This is achieved by packing coefficients into long words and fetching the data appropriately for the power computations.
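The coefficient-packing idea can be sketched in software as follows. The widths and helper names are illustrative only; real block-memory words would be sized by the memory macro's width and depth parameters.

```python
def pack_coefficients(coeffs, coeff_width=8, word_width=32):
    """Pack fixed-point power-model coefficients into wide memory words so
    that word_width // coeff_width coefficients arrive per one-cycle read."""
    per_word = word_width // coeff_width
    mask = (1 << coeff_width) - 1
    words = []
    for base in range(0, len(coeffs), per_word):
        word = 0
        for slot, c in enumerate(coeffs[base:base + per_word]):
            word |= (c & mask) << (slot * coeff_width)
        words.append(word)
    return words

def unpack_word(word, coeff_width=8, word_width=32):
    """Recover the coefficients delivered by a single block-memory read."""
    per_word = word_width // coeff_width
    mask = (1 << coeff_width) - 1
    return [(word >> (slot * coeff_width)) & mask for slot in range(per_word)]
```

With 8-bit coefficients and 32-bit words, each one-cycle read delivers four coefficients to the adder tree.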
4.0 Sampling Rates
Steps 46 and 47 in FIG. 4 relate to component sampling. This section provides further details relative to those steps.
4.1 Determining Optimum Component Sampling Rates—Step 46
We derive the optimum sampling rates for each component based on the observation that components whose power consumptions have a higher mean and variance must be sampled more frequently. Let comp<sub>1</sub>, comp<sub>2 </sub>. . . comp<sub>n </sub>denote n RTL components of a design. Assuming that we are sampling this set of components, the objective is to minimize the aggregate error due to sampling. If δP<sub>i </sub>represents the estimated error due to sampling a component comp<sub>i</sub>, then the aggregate error for the entire design is given by<maths id="MATHUS00006" num="6"><math overflow="scroll"><mrow><mrow><mi>Δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>P</mi></mrow><mo>=</mo><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mo></mo><mrow><mi>δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>P</mi><mi>i</mi></msub></mrow></mrow></mrow></math></maths>
Furthermore, during minimization, the errors associated with components with higher power should be considered more significant as compared to the errors associated with components with lower power. Therefore, we weight the estimated error δP<sub>i </sub>by the fractional power ƒ<sub>i </sub>given by the following:<maths id="MATHUS00007" num="7"><math overflow="scroll"><mrow><msub><mi>f</mi><mi>i</mi></msub><mo>=</mo><mrow><msub><mi>P</mi><mi>compi</mi></msub><mo>/</mo><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mo></mo><msub><mi>P</mi><mi>compj</mi></msub></mrow></mrow></mrow></math></maths>
Therefore, the objective function being minimized can be written as<maths id="MATHUS00008" num="8"><math overflow="scroll"><mrow><mrow><mi>Minimize</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>Δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>P_weighted</mi></mrow><mo>=</mo><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mo></mo><mrow><msub><mi>f</mi><mi>i</mi></msub><mo>*</mo><mi>δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>P</mi><mi>i</mi></msub></mrow></mrow></mrow></math></maths>
For normally distributed power profiles of an RTL component comp<sub>i</sub>, δP<sub>i </sub>is governed by the following equation, as described, for example, in R. Burch et al., “A Monte Carlo approach for power estimation,” IEEE Trans. VLSI Systems, Vol. 1, pp. 63-71, March 1993:<FORM>δP<sub>i</sub>≈t*s<sub>comp</sub><sub><sub2>i</sub2></sub>/√N<sub>i</sub></FORM>
In the above equation, s<sub>comp</sub><sub><sub2>i </sub2></sub>refers to the standard deviation of the power profile of comp<sub>i</sub>, N<sub>i </sub>is the number of samples for the component comp<sub>i </sub>and t is a positive constant. Therefore, the objective function can be rewritten as<maths id="MATHUS00009" num="9"><math overflow="scroll"><mrow><mrow><mi>Minimize</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>Δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>P_weighted</mi></mrow><mo>=</mo><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mo></mo><mrow><msub><mi>f</mi><mi>i</mi></msub><mo>*</mo><mrow><msub><mi>s</mi><mi>compi</mi></msub><mo>/</mo><msqrt><msub><mi>N</mi><mi>i</mi></msub></msqrt></mrow></mrow></mrow></mrow></math></maths>
The constraints that must be obeyed during minimization can be formulated as follows. If N<sub>tot </sub>denotes the total number of simulation cycles, then<FORM>N<sub>1</sub>+ . . . +N<sub>n</sub>≦N<sub>tot</sub>, </FORM><FORM>and </FORM><FORM>N<sub>i</sub>≧1, ∀i=1 . . . n </FORM>
Since the above constraints are linear and the objective function is nonlinear, the minimization problem is a linearly constrained optimization problem. It can be solved with any of several well-known solvers, such as MINOS. See, for example, “Using AMPL/MINOS (http://www.ampl.com/BOOKLETS/amplminos.pdf).” Such a solver can be used to determine the values of N<sub>i</sub>. Once N<sub>1</sub>, . . . , N<sub>n </sub>are determined, the sampling rate R<sub>i </sub>for each component follows directly as<FORM>R<sub>i</sub>=N<sub>i</sub>/N<sub>tot </sub></FORM>
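Although a general solver such as MINOS handles the constraints exactly, the structure of the solution can be seen from the unconstrained case: with weights w<sub>i</sub> = ƒ<sub>i</sub>·s<sub>comp<sub2>i</sub2></sub>, setting the gradient of the Lagrangian of the objective to zero gives N<sub>i </sub>proportional to w<sub>i</sub><sup>2/3</sup>. The following sketch applies that closed form, ignoring the N<sub>i</sub>≧1 and integrality constraints:

```python
def optimal_sampling_rates(weights):
    """Closed-form sketch of: minimize sum(w_i / sqrt(N_i)) subject to
    sum(N_i) = N_tot, where w_i = f_i * s_comp_i.  Differentiating the
    Lagrangian gives -w_i / (2 * N_i**1.5) + lam = 0, so N_i is
    proportional to w_i ** (2/3).  The N_i >= 1 and integrality
    constraints are ignored here; a solver such as MINOS handles the
    general problem."""
    shares = [w ** (2.0 / 3.0) for w in weights]
    total = sum(shares)
    # Sampling rates R_i = N_i / N_tot; N_tot cancels out of the ratio.
    return [s / total for s in shares]
```

Components with a larger weighted standard deviation thus receive a more-than-proportional share of the sampling budget, consistent with the observation that high-mean, high-variance components must be sampled more often.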
FIG. 11 shows the results of the above optimization procedure for the above-mentioned DES design. The design contains 1520 RTL components, and for each component, we plot the sampling rate computed based on the mean and standard deviation of the component's power consumption characteristics. For example, point P denotes the highest sampling rate of 0.2864 and corresponds to a component characterized by high mean power (10.8 μW) and high standard deviation (6.1 μW).
4.2 Minimizing Undersampling—Step 47
Let clusters CL<sub>1</sub>, CL<sub>2 </sub>. . . CL<sub>n </sub>denote a solution that is output by the above-mentioned hierarchical clustering algorithm. Assuming a uniform sampling rate for all the components in a given cluster, we can determine a measure of the estimation error introduced for a component comp<sub>j </sub>in cluster CL<sub>i </sub>by computing the distance from its optimum sampling rate (denoted by the undersampling factor δR<sub>ji</sub>):<maths id="MATHUS00010" num="10"><math overflow="scroll"><mtable><mtr><mtd><mrow><mrow><mrow><mi>δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>R</mi><mi>ji</mi></msub></mrow><mo>=</mo><mrow><msub><mi>R</mi><mi>j</mi></msub><mo>-</mo><mrow><mn>1</mn><mo>/</mo><mrow><mo>|</mo><msub><mi>CL</mi><mi>i</mi></msub><mo>|</mo></mrow></mrow></mrow></mrow><mo>,</mo><mrow><mrow><mi>if</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>R</mi><mi>j</mi></msub></mrow><mo>></mo><mrow><mn>1</mn><mo>/</mo><mrow><mo>|</mo><msub><mi>CL</mi><mi>i</mi></msub><mo>|</mo></mrow></mrow></mrow></mrow></mtd></mtr><mtr><mtd><mrow><mrow><mo>=</mo><mn>0</mn></mrow><mo>,</mo><mrow><mrow><mi>if</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>R</mi><mi>j</mi></msub></mrow><mo>≤</mo><mrow><mn>1</mn><mo>/</mo><mrow><mo>|</mo><msub><mi>CL</mi><mi>i</mi></msub><mo>|</mo></mrow></mrow></mrow></mrow></mtd></mtr></mtable></math></maths>where, (a) 1/|CL<sub>i</sub>| denotes the uniform sampling rate for all components in a cluster CL<sub>i </sub>with cardinality |CL<sub>i</sub>|, (b) R<sub>j </sub>is the optimum sampling rate given in Section 4.1, and (c) the undersampling is zero if the optimum component sampling rates are met by the clustering solution, i.e., if R<sub>j</sub>≦1/|CL<sub>i</sub>|. 
Therefore, the aggregate undersampling for the present clustering solution is given by<maths id="MATHUS00011" num="11"><math overflow="scroll"><mrow><mrow><mi>Δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>R</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mrow><mo>(</mo><mrow><msub><mi>CL</mi><mn>1</mn></msub><mo>,</mo><mrow><msub><mi>CL</mi><mn>2</mn></msub><mo></mo><mi>…</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>CL</mi><mi>n</mi></msub></mrow></mrow><mo>)</mo></mrow></mrow><mo>=</mo><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mrow><munder><mo>∑</mo><mrow><msub><mi>comp</mi><mi>j</mi></msub><mo>∈</mo><msub><mi>CL</mi><mi>i</mi></msub></mrow></munder><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mrow><mi>δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>R</mi><mi>ji</mi></msub></mrow></mrow></mrow></mrow></math></maths>
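The undersampling factor and its aggregate follow directly from their definitions; the sketch below computes ΔR for a candidate clustering (the component identifiers and data layout are hypothetical):

```python
def undersampling(clusters, optimal_rates):
    """Aggregate undersampling Delta-R for a clustering solution.
    `clusters` is a list of clusters, each a list of component ids;
    `optimal_rates` maps component id -> optimum sampling rate R_j.
    Every component in cluster CL_i is sampled at the uniform rate
    1/|CL_i|; a component contributes R_j - 1/|CL_i| when its optimum
    rate exceeds that uniform rate, and zero otherwise."""
    total = 0.0
    for cluster in clusters:
        uniform = 1.0 / len(cluster)
        for comp in cluster:
            total += max(0.0, optimal_rates[comp] - uniform)
    return total
```

This quantity is exactly what the iterative improvement algorithm below tries to drive down while respecting the area constraint.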
We minimize ΔR(CL<sub>1</sub>, CL<sub>2 </sub>. . . CL<sub>n</sub>) by using an iterative improvement algorithm based on the Kernighan-Lin heuristic to carefully select components that must be moved to other clusters to reduce undersampling, while ensuring that the target area constraint is not violated. See, for example, B. Kernighan and S. Lin, “An Efficient Heuristic Procedure for Partitioning Graphs,” The Bell System Tech. J., Vol. 49, pp. 291-307, February 1970. The main steps of the algorithm are briefly outlined below:
 1. For every component (comp<sub>j </sub>in CL<sub>i</sub>), evaluate the gain of moving the component to every other cluster CL<sub>k </sub>from the perspective of undersampling:<maths id="MATHUS00012" num="12"><math overflow="scroll"><mtable><mtr><mtd><mrow><mrow><mi>Gain</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mrow><mo>(</mo><mrow><msub><mi>comp</mi><mi>j</mi></msub><mo>→</mo><msub><mi>CL</mi><mi>k</mi></msub></mrow><mo>)</mo></mrow></mrow><mo>=</mo><mi/><mo></mo><mrow><mrow><mi>Δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>R</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mrow><mo>(</mo><mrow><mrow><msub><mi>CL</mi><mn>1</mn></msub><mo></mo><mi>…</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>CL</mi><mi>i</mi></msub></mrow><mo>,</mo><mrow><msub><mi>CL</mi><mi>k</mi></msub><mo></mo><mi>…</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>CL</mi><mi>n</mi></msub></mrow></mrow><mo>)</mo></mrow></mrow><mo>-</mo></mrow></mrow></mtd></mtr><mtr><mtd><mrow><mi/><mo></mo><mrow><mi>Δ</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mi>R</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><mrow><mo>(</mo><mrow><mrow><mrow><msub><mi>CL</mi><mn>1</mn></msub><mo></mo><mi>…</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>CL</mi><mi>i</mi></msub></mrow><mo>-</mo><msub><mi>comp</mi><mi>j</mi></msub></mrow><mo>,</mo><mrow><msub><mi>CL</mi><mi>k</mi></msub><mo>+</mo><mrow><msub><mi>comp</mi><mi>j</mi></msub><mo></mo><mi>…</mi><mo></mo><mstyle><mtext> </mtext></mstyle><mo></mo><msub><mi>CL</mi><mi>n</mi></msub></mrow></mrow></mrow><mo>)</mo></mrow></mrow></mrow></mtd></mtr></mtable></math></maths>
 2. Evaluate the area in each case. If the target area constraint is not violated, choose the component-to-cluster move that results in the highest gain. Here, a move is chosen even if the highest gain is negative (i.e., it increases undersampling) so as to enable hill-climbing out of local minima. Lock the component-to-cluster move for the rest of this pass.
 3. Repeat Steps 1 and 2 until all modules are locked, and return the clustering solution with the lowest aggregate undersampling observed.
 4. Terminate the algorithm if the clustering solution returned is inferior to the starting solution in aggregate undersampling cost. Otherwise, repeat Steps 1, 2 and 3.
5.0 Variations, Alternatives and Uses of Power Emulation
The results obtained from power emulation may be used to redesign the circuit using known design techniques, so that its power consumption is reduced. If the circuit contains a programmable processor, the result of power emulation may also be used to optimize the software running on the processor using known techniques, so that the circuit's power consumption is reduced.
Power emulation can be used to analyze the power consumption of a circuit during manufacturing test, under the application of a given set of test patterns. The results obtained from power emulation may thus be used to optimize the test patterns or the circuit itself so that the power consumption during manufacturing test is minimized.
The power estimation circuitry can be enhanced to process the power estimates computed by the power models in order to produce information useful to the designer. For example, the power estimation circuitry can be enhanced to automatically identify components with the highest power consumption, or components whose power consumption is above a specified threshold.
The power models for different parts of a circuit may operate at different levels of abstraction. For example, consider a circuit that contains a processor, memory, and bus, in addition to other circuitry. The power model for the processor could operate at the instruction level (i.e., compute the processor's power consumption by only observing the sequence of instructions it executes), while the power model for the memory may be based on the type of operations it performs (read, write, idle, etc.), and the power model for the bus may be based on the types of transactions it executes.
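As one illustration of such an abstraction level, an instruction-level processor power model only needs a per-instruction energy table and the executed instruction stream. The sketch below is hypothetical: the instruction names and energy values are invented for illustration, and real models would be calibrated per processor.

```python
# Hypothetical per-instruction energy table (illustrative values only).
INSTR_ENERGY = {"add": 2.0, "mul": 5.0, "load": 7.0, "store": 6.0, "nop": 1.0}

def instruction_level_power(trace, energy_table=INSTR_ENERGY):
    """Estimate average processor power at the instruction level: sum the
    energy of each executed instruction and divide by the cycle count
    (one cycle per instruction is assumed in this sketch)."""
    return sum(energy_table[instr] for instr in trace) / len(trace)
```

A memory or bus model would look the same in structure, with the table keyed by operation type (read, write, idle) or transaction type instead of by instruction.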
Power emulation can be extended so that the circuitry added during emulation also computes the voltage drops seen on the supply and ground wires for each circuit component. The power estimation circuitry can also be extended to identify thermal hotspots in the circuit. Another possible extension is to use additional circuitry during emulation to monitor the logical values at a subset of signals in the circuit and compute the electrical noise that would be generated at one or more signals (e.g., due to capacitive or inductive coupling).
The foregoing merely illustrates the principles of the invention. Those skilled in the art will be able to devise numerous arrangements, methods and techniques that, although not explicitly shown or described herein, embody those principles of the invention and thus are within their spirit and scope.