Profile:   Statistical Profiler
1 Toplevel Interface
profile-thunk
profile
2 Collecting Profile Information
create-sampler
3 Analyzing Profile Data
analyze-samples
profile
node
edge
4 Profile Renderers
4.1 Textual Rendering
render
4.2 Graph Rendering
render
7.7.0.7

Profile: Statistical Profiler

The profile collection implements a statistical profiler. The profiling is done by running a background thread that collects stack snapshots either via continuation-mark-set->context or via Errortrace, meaning that the result is an estimate of the execution costs.

When using continuation-mark-set->context, it is limited to the kind of information that continuation-mark-set->context produces (most notably being limited to functions calls, and subject to compiler optimizations); but the result is often useful. In practice, since this method does not require recompilation of your source and has very little runtime overhead, it can be used for longer runs which compensates for these limits.

When using Errortrace, profiles are more precise and more fine-grained (expression-level instead of function-level) but profiling has higher overhead and recompilation may be necessary.

    1 Toplevel Interface

    2 Collecting Profile Information

    3 Analyzing Profile Data

    4 Profile Renderers

      4.1 Textual Rendering

      4.2 Graph Rendering

1 Toplevel Interface

The profiler can be invoked directly from the command-line using the raco profile command, which takes a file name as argument, and runs the profiler on the main submodule of that file (if it exists), or on the module itself (if there is no main submodule). To allow control over its behavior, raco profile accepts flags that correspond to those of profile-thunk below.

 (require profile) package: profile-lib

This module provides one procedure and one macro that are convenient high-level entry points for profiling expressions. It abstracts over details that are available through other parts of the library, and is intended as a convenient tool for profiling code.

procedure

(profile-thunk thunk    
  [#:delay delay    
  #:repeat iterations    
  #:threads threads?    
  #:render renderer    
  #:periodic-renderer periodic-renderer    
  #:use-errortrace? use-errortrace?    
  #:order order])  any/c
  thunk : (-> any/c)
  delay : (>=/c 0.0) = 0.05
  iterations : exact-nonnegative-integer? = 1
  threads? : any/c = #f
  renderer : (-> profile? (or/c 'topological 'self 'total) any/c)
   = text:render
  periodic-renderer : 
(or/c #f (list/c (>=/c 0.0)
                 (-> profile?
                     (or/c 'topological 'self 'total)
                     any/c)))
   = #f
  use-errortrace? : any/c = #f
  order : (or/c 'topological 'self 'total) = 'topological
Executes the given thunk and collect profiling data during execution, eventually analyzing and rendering this. Returns the value of the profiled expression. Keyword arguments can customize the profiling:
  • The profiler works by starting a “sampler” thread to periodically collect stack snapshots (using continuation-mark-set->context). To determine the frequency of these collections, the sampler thread sleeps delay seconds between collections. Note that this will be close, but not identical to, the frequency in which data is actually sampled. (The delay value is passed on to create-sampler, which creates the sampler thread.)

  • Due to the statistical nature of the profiler, longer executions result in more accurate analysis. You can specify a number of iterations to repeat the thunk to collect more data.

  • Normally, the sampler collects only snapshots of the current-thread’s stack. Profiling a computation that creates threads will therefore lead to bad analysis: the timing results will be correct, but because the profiler is unaware of other threads the observed time will be suspiciously small, and work done in other threads will not be included in the results. To track all threads, specify a non-#f value for the threads? argument—this will execute the computation in a fresh custodian, and keep track of all threads under this custodian.

  • Once the computation is done and the sampler is stopped, the accumulated data is analyzed (by analyze-samples) and the resulting profile value is sent to the renderer function. See Profile Renderers for available renderers.

  • To provide feedback information during execution, specify a periodic-renderer. This should be a list holding a delay time (in seconds) and a renderer function. The delay determines the frequency in which the renderer is called, and it should be larger than the sampler delay (usually much larger since it can involve more noticeable overhead, and it is intended for a human observer).

  • When use-errortrace? is not #f, more accurate stack snapshots are captured using Errortrace. Note that when this is provided, it will only profile uncompiled files and files compiled while using errortrace-compile-handler, and the profiled program must be run using
      racket -l errortrace -t program.rkt
    Removing compiled files (with extension .zo) is sufficient to enable this.

  • The order value is passed to the renderer to control the order of its output. By default, entries in the profile are sorted topologically, but they can also be sorted by the time an entry is on top of the stack ('self) or appears anywhere on the stack ('total). Some renderers may ignore this option.

syntax

(profile expr keyword-arguments ...)

A macro version of profile-thunk. Keyword arguments can be specified as in a function call: they can appear before and/or after the expression to be profiled.

2 Collecting Profile Information

 (require profile/sampler) package: profile-lib

procedure

(create-sampler to-track 
  delay 
  [super-cust 
  custom-keys 
  #:use-errortrace? use-errortrace?]) 
  ((symbol?) (any/c) . ->* . any/c)
  to-track : 
(or/c thread? custodian?
      (listof (or/c thread? custodian?)))
  delay : (>=/c 0.0)
  super-cust : custodian? = (current-custodian)
  custom-keys : (listof any/c) = '()
  use-errortrace? : any/c = #f
Creates a stack-snapshot collector thread, which tracks the given to-track value every delay seconds. The to-track value can be either a thread (track just that thread), a custodian (track all threads managed by the custodian), or a list of threads and/or custodians. If a custodian is given, it must be subordinate to super-cust, which defaults to the current custodian.

When custom-keys are provided, the sampler takes snapshots of the continuation marks corresponding to the given keys, in addition to taking snapshots of the stack.

When use-errortrace? is not #f, the errortrace-key is used to sample snapshots instead of the implicit key used by continuation-mark-set->context.

The resulting value is a controller function, which consumes a message consisting of a symbol and an optional argument, and can affect the sampler. The following messages are currently supported:
  • 'pause and 'resume will stop or resume snapshot collection. These messages can be nested. Note that the thread will continue running—it will just stop collecting snapshots.

  • 'stop kills the sampler thread. It should be called when no additional data should be collected. (This is currently irreversible: there is no message to start a new sampler thread.)

  • 'set-tracked! with a value will change the tracked object(s) which were initially specified as the to-track argument.

  • 'set-tracked! with a numeric value will change the delay that the sampler is taking between snapshots. Note that although changing this means that the snapshots are not uniformly distributed, the results will still be correct: the cpu time between samples is taken into account when the collected data is analyzed.

  • A 'get-snapshots message will make the controller return the currently collected data. Note that this can be called multiple times, each call will return the data that is collected up to that point in time. In addition, it can be (and usually is) called after the sampler was stopped.

    The value that is returned should be considered as an undocumented internal detail of the profiler, intended to be sent to analyze-samples for analysis. The reason this is not done automatically, is that a future extension might allow you to combine several sampler results, making it possible to combine a profile analysis from several individual runs, possibly from different machines.

  • Finally, a 'get-custom-snapshots message will make the controller return the currently collected snapshots corresponding to custom-keys. This returns a list of samples, where each sample is a list of vectors of marks in the same format as the output of continuation-mark-set->list*.

3 Analyzing Profile Data

 (require profile/analyzer) package: profile-lib

Once a profile run is done and the results are collected, the next step is to analyze the data. In this step sample times are computed and summed, a call-graph representing observed function calls is built, and per-node and per-edge information is created. This is the job of the main function provided by profile/analyzer.

procedure

(analyze-samples raw-sample-data)  profile?

  raw-sample-data : any/c
This function consumes the raw result of the sampler (given in an undocumented form), analyzes it, and returns a profile value holding the analyzed results. Without this function, the results of the sampler should be considered meaningless.

struct

(struct profile (total-time
    cpu-time
    sample-number
    thread-times
    nodes
    *-node))
  total-time : exact-nonnegative-integer?
  cpu-time : exact-nonnegative-integer?
  sample-number : exact-nonnegative-integer?
  thread-times : 
(listof (cons exact-nonnegative-integer?
              exact-nonnegative-integer?))
  nodes : (listof node?)
  *-node : node?
Represents an analyzed profile result.

struct

(struct node (id src thread-ids total self callers callees))

  id : (or/c #f symbol? any/c)
  src : (or/c #f srcloc?)
  thread-ids : (listof exact-nonnegative-integer?)
  total : exact-nonnegative-integer?
  self : exact-nonnegative-integer?
  callers : (listof edge?)
  callees : (listof edge?)
Represents a function call (or, when using Errortrace, expression) node in the call graph of an analyzed profile result.

struct

(struct edge (total caller caller-time callee callee-time))

  total : exact-nonnegative-integer?
  caller : node?
  caller-time : exact-nonnegative-integer?
  callee : node?
  callee-time : exact-nonnegative-integer?
Represents an edge between two function call nodes in the call graph of an analyzed profile result or, in Errortrace mode, an edge corresponding to an evaluation step between two expressions.

4 Profile Renderers

After collecting the profile samples and analyzing the data, the last step of the profiling process is to render the results. The profile collection provides several renderers, each providing a rendering function that consumes a profile instance. See the analyzer section for a description of the profile struct if you want to implement a new renderer.

4.1 Textual Rendering

 (require profile/render-text) package: profile-lib

procedure

(render profile-data    
  [order    
  #:truncate-source truncate-source    
  #:hide-self hide-self%    
  #:hide-subs hide-subs%])  void?
  profile-data : profile?
  order : (or/c 'topological 'self 'total) = 'topological
  truncate-source : exact-nonnegative-integer? = 50
  hide-self% : (between/c 0 1) = 1/100
  hide-subs% : (between/c 0 1) = 1/50
Prints the given profile results as a textual table.

The printout begins with general information about the profile, followed by a table with an entry for each node in the call graph. The entries are displayed in a topological order by default (roughly, since the graph can have cycles). This means that it is usually easy to find the callers and callees of a function in its close environment.

Each row in the table has the following format:

                          B [M1] M2%

  [N1] N2(N3%) N4(N5%)  A ...path/to/source.rkt:12:34

                          C [M3] M4%

with the following meaning of the numbers and labels:

The function has a few keyword arguments to customize its output:
  • The truncate-source argument determines the length that the source string should take (together with its label).

  • hide-self% and hide-subs% control hiding some of the nodes. A node is hidden if its self time (N3 in the above example) is smaller than hide-self% and if all places where it occurs as a caller or a callee have percentages that are smaller than hide-subs%. The reason for requiring both conditions is to avoid having “dangling references” to hidden nodes.

  • The order argument determines the order in which entries appear in the output. If order is 'topological (the default), entries are sorted topologically, grouping callers and callees close together. If order is 'self, entries are sorted by how often they appear at the top of a stack snapshot. If order is 'total, entries are sorted by how often they appear anywhere in a stack snapshot.

4.2 Graph Rendering

 (require profile/render-graphviz) package: profile-lib

procedure

(render profile-data    
  [order    
  #:hide-self hide-self%    
  #:hide-subs hide-subs%])  void?
  profile-data : profile?
  order : (or/c 'topological 'self 'total) = 'topological
  hide-self% : (between/c 0 1) = 1/100
  hide-subs% : (between/c 0 1) = 1/50
Prints the given profile results as a Graphviz directed graph.

This is an experimental module, provided mostly as a proof-of-concept. It renders the profile’s call-graph as a graph representation for one of the Graphviz tools to render. Nodes are colored according to their ‘self’ percentages, and edges.

The keyword arguments control hiding nodes in the same way as with the textual renderer. The order argument is ignored.