[CodeGen] Add Performance Monitor
Add support for -polly-codegen-perf-monitoring. When performance monitoring
is enabled, we emit performance monitoring code during code generation that
prints, at program exit, statistics about the total number of cycles executed
as well as the number of cycles spent in scops. This gives an estimate of how
useful polyhedral optimizations might be for a given program.

Example output:

  Polly runtime information
  -------------------------
  Total: 783110081637
  Scops: 663718949365

In the future, we might also add functionality to measure how much time is spent
in optimized scops and how many cycles are spent in the fallback code.
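
As a rough illustration of how the two numbers in the example output relate, the following C++ sketch computes the share of cycles spent in scops. The names scopShare and printScopShare are hypothetical and not part of this commit; for the values shown above the share comes out at roughly 85%.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Fraction of all executed cycles that were spent inside scops.
// Returns 0.0 when no cycles were recorded, to avoid dividing by zero.
double scopShare(std::uint64_t TotalCycles, std::uint64_t ScopCycles) {
  // Cycles spent in scops are a subset of the total.
  assert(ScopCycles <= TotalCycles && "scop cycles cannot exceed the total");
  if (TotalCycles == 0)
    return 0.0;
  return static_cast<double>(ScopCycles) / static_cast<double>(TotalCycles);
}

// Hypothetical helper: print the share as a percentage.
void printScopShare(std::uint64_t TotalCycles, std::uint64_t ScopCycles) {
  std::printf("Scops account for %.2f%% of all cycles\n",
              100.0 * scopShare(TotalCycles, ScopCycles));
}
```

Calling printScopShare(783110081637ULL, 663718949365ULL) would report the share for the run shown in the example output. A high share suggests that most of the run time is already covered by scops.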

Reviewers: bollu,sebpop

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31599

llvm-svn: 299359
tobiasgrosser committed Apr 3, 2017
1 parent 1179470 commit 65371af
Showing 5 changed files with 473 additions and 0 deletions.
132 changes: 132 additions & 0 deletions polly/include/polly/CodeGen/PerfMonitor.h
@@ -0,0 +1,132 @@
//===--- PerfMonitor.h --- Monitor time spent in scops --------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#ifndef PERF_MONITOR_H
#define PERF_MONITOR_H

#include "polly/CodeGen/IRBuilder.h"

namespace llvm {
class Function;
class Module;
class Value;
class Instruction;
} // namespace llvm

namespace polly {

class PerfMonitor {
public:
/// Create a new performance monitor.
///
/// @param M The module for which to generate the performance monitor.
PerfMonitor(llvm::Module *M);

/// Initialize the performance monitor.
///
/// Ensure that all global variables, functions, and callbacks needed to
/// manage the performance monitor are initialized and registered.
void initialize();

/// Mark the beginning of a timing region.
///
/// @param InsertBefore The instruction before which the timing region starts.
void insertRegionStart(llvm::Instruction *InsertBefore);

/// Mark the end of a timing region.
///
/// @param InsertBefore The instruction before which the timing region ends.
void insertRegionEnd(llvm::Instruction *InsertBefore);

private:
llvm::Module *M;
PollyIRBuilder Builder;

/// Indicates if performance profiling is supported on this architecture.
bool Supported;

/// The cycle counter at the beginning of the program execution.
llvm::Value *CyclesTotalStartPtr;

/// The total number of cycles spent within scops.
llvm::Value *CyclesInScopsPtr;

/// The value of the cycle counter at the beginning of the last scop.
llvm::Value *CyclesInScopStartPtr;

/// A memory location which serves as argument of the RDTSCP function.
///
/// The value written to this location is currently not used.
llvm::Value *RDTSCPWriteLocation;

/// A global variable that tracks whether the performance monitor
/// initialization has already been run.
llvm::Value *AlreadyInitializedPtr;

llvm::Function *insertInitFunction(llvm::Function *FinalReporting);

/// Add function @p Fn to the list of global constructors.
///
/// If no global constructors are available in this current module, insert
/// a new list of global constructors containing @p Fn as only global
/// constructor. Otherwise, append @p Fn to the list of global constructors.
///
/// All functions listed as global constructors are executed before the
/// main() function is called.
///
/// @param Fn Function to add to global constructors
void addToGlobalConstructors(llvm::Function *Fn);

/// Add global variables to module.
///
/// Insert a set of global variables that are used to track performance,
/// into the module (or obtain references to them if they already exist).
void addGlobalVariables();

/// Get a reference to the intrinsic "i64 @llvm.x86.rdtscp(i8*)".
///
/// The rdtscp function returns the current value of the processor's
/// time-stamp counter as well as the current CPU identifier. On modern x86
/// systems, the returned value is independent of the dynamic clock frequency
/// and consistent across multiple cores. It can consequently be used to get
/// accurate and low-overhead timing information. Although the counter wraps
/// around, it can be used reliably even for long time intervals: on a 1 GHz
/// processor, the 64-bit counter wraps only about every 584 years.
///
/// The RDTSCP instruction is "pseudo" serializing:
///
/// "The RDTSCP instruction waits until all previous instructions have been
/// executed before reading the counter. However, subsequent instructions may
/// begin execution before the read operation is performed."
///
/// To ensure that no later instructions are scheduled before the RDTSCP
/// instruction, it is often recommended to schedule a cpuid call after the
/// RDTSCP instruction. We do not do this yet, accepting some imprecision in
/// our timing in exchange for reduced overhead.
///
/// @returns A reference to the declaration of @llvm.x86.rdtscp.
llvm::Function *getRDTSCP();

/// Get a reference to "int atexit(void (*function)(void))" function.
///
/// This function registers function pointers that are to be called when the
/// program terminates.
///
/// @returns A reference to @atexit().
llvm::Function *getAtExit();

/// Create function "__polly_perf_final_reporting".
///
/// This function finalizes the performance measurements and prints the
/// results to stdout. It is expected to be registered with 'atexit()'.
llvm::Function *insertFinalReporting();
};
} // namespace polly

#endif
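
The wrap-around interval mentioned in the getRDTSCP() comment can be checked with a short calculation: a 64-bit counter holds 2^64 ticks, which at 1 GHz lasts roughly 584 years. The sketch below is only an illustration; the function name yearsUntilWrap and the 365.25-day year are assumptions of this example, not part of the commit.

```cpp
#include <cassert>
#include <cmath>

// Approximate number of years until a 64-bit counter that ticks at
// FrequencyHz wraps around. Assumes a year of 365.25 days.
double yearsUntilWrap(double FrequencyHz) {
  assert(FrequencyHz > 0.0 && "frequency must be positive");
  const double Ticks = std::ldexp(1.0, 64); // 2^64 counter values
  const double SecondsPerYear = 365.25 * 24.0 * 60.0 * 60.0;
  return Ticks / FrequencyHz / SecondsPerYear;
}
```

For example, yearsUntilWrap(1e9) is a little over 584 and yearsUntilWrap(3e9) is just under 195, so wrap-around is not a practical concern for program-length measurements.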
1 change: 1 addition & 0 deletions polly/lib/CMakeLists.txt
@@ -43,6 +43,7 @@ add_polly_library(Polly
CodeGen/Utils.cpp
CodeGen/RuntimeDebugBuilder.cpp
CodeGen/CodegenCleanup.cpp
CodeGen/PerfMonitor.cpp
${GPGPU_CODEGEN_FILES}
Exchange/JSONExporter.cpp
Support/GICHelper.cpp
18 changes: 18 additions & 0 deletions polly/lib/CodeGen/CodeGeneration.cpp
@@ -21,6 +21,7 @@

#include "polly/CodeGen/IslAst.h"
#include "polly/CodeGen/IslNodeBuilder.h"
#include "polly/CodeGen/PerfMonitor.h"
#include "polly/CodeGen/Utils.h"
#include "polly/DependenceInfo.h"
#include "polly/LinkAllPasses.h"
@@ -45,6 +46,11 @@ static cl::opt<bool> Verify("polly-codegen-verify",
cl::Hidden, cl::init(true), cl::ZeroOrMore,
cl::cat(PollyCategory));

static cl::opt<bool>
PerfMonitoring("polly-codegen-perf-monitoring",
cl::desc("Add run-time performance monitoring"), cl::Hidden,
cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));

namespace {
class CodeGeneration : public ScopPass {
public:
@@ -145,6 +151,18 @@ class CodeGeneration : public ScopPass {
IslNodeBuilder NodeBuilder(Builder, Annotator, this, *DL, *LI, *SE, *DT, S,
StartBlock);

if (PerfMonitoring) {
PerfMonitor P(EnteringBB->getParent()->getParent());
P.initialize();
P.insertRegionStart(SplitBlock->getTerminator());

BasicBlock *MergeBlock = SplitBlock->getTerminator()
->getSuccessor(0)
->getUniqueSuccessor()
->getUniqueSuccessor();
P.insertRegionEnd(MergeBlock->getTerminator());
}

// First generate code for the hoisted invariant loads and transitively the
// parameters they reference. Afterwards, for the remaining parameters that
// might reference the hoisted loads. Finally, build the runtime check
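
The calls to initialize(), insertRegionStart() and insertRegionEnd() in the hunk above emit IR, so their effect at run time is not visible in this diff. The following plain C++ sketch shows the bookkeeping one would expect, based only on the member documentation in PerfMonitor.h; PerfMonitor.cpp is not shown here, so the details are assumptions. The namespace sketch and all function names are hypothetical, and std::chrono::steady_clock stands in for the RDTSCP intrinsic so that the example stays portable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

namespace sketch {

// Stand-ins for the global variables the monitor adds to the module.
std::uint64_t CyclesTotalStart = 0;  // counter value at program start
std::uint64_t CyclesInScops = 0;     // accumulated time spent in scops
std::uint64_t CyclesInScopStart = 0; // counter value at the last scop entry
bool AlreadyInitialized = false;     // guards against repeated initialization

// Assumption: a nanosecond clock replaces the cycle counter read by RDTSCP.
std::uint64_t readCounter() {
  using namespace std::chrono;
  return static_cast<std::uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
          .count());
}

// Registered with atexit(); prints the statistics in the commit's format.
void finalReporting() {
  std::uint64_t Total = readCounter() - CyclesTotalStart;
  std::printf("Polly runtime information\n");
  std::printf("-------------------------\n");
  std::printf("Total: %llu\n", static_cast<unsigned long long>(Total));
  std::printf("Scops: %llu\n", static_cast<unsigned long long>(CyclesInScops));
}

// Runs its body once; later calls return immediately.
void init() {
  if (AlreadyInitialized)
    return;
  AlreadyInitialized = true;
  CyclesTotalStart = readCounter();
  std::atexit(finalReporting);
}

// Mimics a global constructor: init() runs before main() is entered.
bool InitializedAtStartup = (init(), true);

// Record the counter when a scop is entered.
void regionStart() { CyclesInScopStart = readCounter(); }

// Add the time elapsed since the matching regionStart() to the scop total.
void regionEnd() {
  std::uint64_t Now = readCounter();
  assert(Now >= CyclesInScopStart && "counter must not run backwards");
  CyclesInScops += Now - CyclesInScopStart;
}

} // namespace sketch
```

A program would wrap each scop in sketch::regionStart() and sketch::regionEnd(); the report is then printed once when the process exits. Guarding init() with a flag means the start timestamp is taken and the exit handler is registered only once, no matter how many times initialization is requested.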