Data Flow Tracking
Data Flow Tracking is currently not supported between multiple fragments in a distributed application.
The Holoscan SDK provides the Data Flow Tracking APIs as a mechanism to profile your application and analyze the data flow between operators in the graph of a fragment.
Currently, data flow tracking is only supported between the root operators and leaf operators of a graph (support for tracking data flow between any pair of operators in a graph is planned in the future).
A root operator is an operator without any predecessor nodes.
A leaf operator (also known as a sink operator) is an operator without any successor nodes.
When data flow tracking is enabled, every message is tracked from the root operators to the leaf operators. Then, the maximum (worst-case), average and minimum end-to-end latencies of one or more paths between the root and the leaf operators can be retrieved using the Data Flow Tracking APIs.
The API also provides the ability to retrieve the number of messages sent from the root operators.
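To make the root, leaf, and path terminology concrete, here is a minimal sketch in plain Python that does not use the SDK. The graph dictionary, the operator names, and the helpers find_roots, find_leaves, and find_paths are hypothetical and are not part of the Holoscan API. The sketch only shows how root-to-leaf paths could be enumerated as comma-separated lists of operator names.

```python
# Illustrative only: a toy operator graph, not the Holoscan SDK's data structures.
from typing import Dict, List


def find_roots(graph: Dict[str, List[str]]) -> List[str]:
    """Return operators that have no predecessor (no incoming edge)."""
    successors = {dst for dsts in graph.values() for dst in dsts}
    return [op for op in graph if op not in successors]


def find_leaves(graph: Dict[str, List[str]]) -> List[str]:
    """Return operators that have no successor (no outgoing edge)."""
    return [op for op, dsts in graph.items() if not dsts]


def find_paths(graph: Dict[str, List[str]]) -> List[str]:
    """Enumerate every root-to-leaf path as a comma-separated string."""
    paths: List[str] = []

    def walk(op: str, trail: List[str]) -> None:
        trail = trail + [op]
        if not graph[op]:  # reached a leaf operator
            paths.append(",".join(trail))
            return
        for nxt in graph[op]:
            walk(nxt, trail)

    for root in find_roots(graph):
        walk(root, [])
    return paths


# Hypothetical graph: "tx" feeds two branches that both end at "rx".
GRAPH = {"tx": ["mx1", "mx2"], "mx1": ["rx"], "mx2": ["rx"], "rx": []}

if __name__ == "__main__":
    print("roots:", find_roots(GRAPH))
    print("leaves:", find_leaves(GRAPH))
    for path in find_paths(GRAPH):
        print("path:", path)
```

In this toy graph there is one root, one leaf, and two paths, so end-to-end latencies would be reported separately for each of the two paths.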
The Data Flow Tracking feature is also illustrated in the flow_tracker example.
Look at the C++ and Python API documentation for exhaustive definitions.
Before an application (C++/Python) is run with the run() method, data flow tracking can be enabled by calling the track() method in C++ and by using the Tracker class in Python.
auto app = holoscan::make_application<MyPingApp>();
auto& tracker = app->track(); // Enable Data Flow Tracking
// Change tracker and application configurations
...
app->run();
from holoscan.core import Tracker

app = MyPingApp()
with Tracker(app) as tracker:
    # Change tracker and application configurations
    ...
    app.run()
After an application has been run, data flow tracking results can be accessed by various functions:
- print() (C++/Python): Prints all data flow tracking results, including end-to-end latencies and the number of source messages, to the standard output.
- get_num_paths() (C++/Python): Returns the number of paths between the root operators and the leaf operators.
- get_path_strings() (C++/Python): Returns a vector of strings, where each string represents a path between the root operators and the leaf operators. A path is a comma-separated list of operator names.
- get_metric() (C++/Python): Returns the value of different metrics based on the arguments.
  - get_metric(std::string pathstring, holoscan::DataFlowMetric metric) returns the value of the metric metric for the path pathstring. The metric can be one of the following (each is also exposed in Python):
    - holoscan::DataFlowMetric::kMaxE2ELatency: the maximum end-to-end latency in the path
    - holoscan::DataFlowMetric::kAvgE2ELatency: the average end-to-end latency in the path
    - holoscan::DataFlowMetric::kMinE2ELatency: the minimum end-to-end latency in the path
    - holoscan::DataFlowMetric::kMaxMessageID: the message number or ID which resulted in the maximum end-to-end latency
    - holoscan::DataFlowMetric::kMinMessageID: the message number or ID which resulted in the minimum end-to-end latency
  - get_metric(holoscan::DataFlowMetric metric = DataFlowMetric::kNumSrcMessages) returns a map from each source operator and its edge to the number of messages sent from the source operator to that edge.
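The per-path metrics listed above can be illustrated with a small sketch in plain Python. This is not how the SDK computes them internally. The function summarize_latencies, the dictionary keys, and the sample values are hypothetical and only mirror the meaning of the five path metrics.

```python
# Illustrative only: mirrors the meaning of the path metrics, not the SDK code.
from statistics import mean
from typing import Dict, Union

Number = Union[int, float]


def summarize_latencies(latencies_by_id: Dict[int, float]) -> Dict[str, Number]:
    """Reduce {message ID: end-to-end latency} to the five per-path metrics."""
    if not latencies_by_id:
        raise ValueError("no messages recorded for this path")
    # Message IDs whose latency is the largest and the smallest.
    max_id = max(latencies_by_id, key=latencies_by_id.get)
    min_id = min(latencies_by_id, key=latencies_by_id.get)
    return {
        "max_e2e_latency": latencies_by_id[max_id],
        "avg_e2e_latency": mean(list(latencies_by_id.values())),
        "min_e2e_latency": latencies_by_id[min_id],
        "max_message_id": max_id,
        "min_message_id": min_id,
    }


# Hypothetical latencies for three messages on one path (arbitrary units).
SAMPLE = {1: 4.0, 2: 10.0, 3: 1.0}

if __name__ == "__main__":
    for name, value in summarize_latencies(SAMPLE).items():
        print(f"{name}: {value}")
```

Reporting the message IDs alongside the extreme latencies makes it possible to look up the offending message later, for example in a per-message log.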
In the above example, the data flow tracking results can be printed to the standard output like the following:
auto app = holoscan::make_application<MyPingApp>();
auto& tracker = app->track(); // Enable Data Flow Tracking
// Change application configurations
...
app->run();
tracker.print();
app = MyPingApp()
with Tracker(app) as tracker:
    # Change tracker and application configurations
    ...
    app.run()
    tracker.print()
Data flow tracking can be customized using a few optional configuration parameters. The track() method in C++ (or the Tracker class in Python) can be configured to skip a few messages at the beginning of an application’s execution as a warm-up period. It is also possible to discard a few messages at the end of an application’s run as a wrap-up period. Additionally, outlier end-to-end latencies can be ignored by setting a latency threshold: observed latencies below this minimum value are ignored.
For effective benchmarking, it is common practice to include warm-up and cool-down periods by skipping the initial and final messages.
Listing 37 Optional parameters to track()
Fragment::track(uint64_t num_start_messages_to_skip = kDefaultNumStartMessagesToSkip,
                uint64_t num_last_messages_to_discard = kDefaultNumLastMessagesToDiscard,
                int latency_threshold = kDefaultLatencyThreshold);
Listing 38 Optional parameters to Tracker
Tracker(num_start_messages_to_skip=num_start_messages_to_skip,
        num_last_messages_to_discard=num_last_messages_to_discard,
        latency_threshold=latency_threshold)
The default values of these parameters of track() are as follows:
- kDefaultNumStartMessagesToSkip: 10
- kDefaultNumLastMessagesToDiscard: 10
- kDefaultLatencyThreshold: 0 (do not filter out any latency values)
These parameters can also be configured using the helper functions set_skip_starting_messages, set_discard_last_messages and set_skip_latencies.
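The combined effect of the three parameters can be sketched in plain Python as follows. This is an illustration under assumptions, not the SDK's implementation: the function filter_latencies is hypothetical, the order in which the three filters are applied is assumed, and treating a latency exactly equal to the threshold as kept is also an assumption. Only the parameter names and the default values (10, 10, and 0) come from the text above.

```python
# Illustrative only: shows the intent of the three tracking parameters.
from typing import List


def filter_latencies(
    latencies: List[float],
    num_start_messages_to_skip: int = 10,
    num_last_messages_to_discard: int = 10,
    latency_threshold: float = 0,
) -> List[float]:
    """Drop warm-up and wrap-up messages, then ignore latencies below the threshold."""
    # Clamp at 0 so a short run yields an empty result instead of a negative index.
    end = max(len(latencies) - num_last_messages_to_discard, 0)
    kept = latencies[num_start_messages_to_skip:end]
    # Assumption: a value equal to the threshold is kept.
    return [value for value in kept if value >= latency_threshold]


if __name__ == "__main__":
    observed = [float(i) for i in range(1, 31)]  # 30 hypothetical latencies
    print(filter_latencies(observed))
    print(filter_latencies(observed, latency_threshold=15))
```

Note that with the default values a run of 20 or fewer messages would leave nothing to report in this sketch, which is one reason to lower the skip and discard counts for short test runs.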
The Data Flow Tracking API provides the ability to log every message’s graph-traversal information to a file. This enables developers to analyze the data flow at a granular level. When logging is enabled, every message’s received and sent timestamps at every operator between the root and the leaf operators are logged after a message has been processed at the leaf operator.
Logging is enabled by calling the enable_logging method in C++ and by providing the filename parameter to Tracker in Python.
auto app = holoscan::make_application<MyPingApp>();
auto& tracker = app->track(); // Enable Data Flow Tracking
tracker.enable_logging("logging_file_name.log");
...
app->run();
app = MyPingApp()
with Tracker(app, filename="logger.log") as tracker:
    ...
    app.run()
The logger file logs the paths of the messages after a leaf operator has finished its compute method.
Every path in the logfile includes an array of tuples of the form:
“(root operator name, message receive timestamp, message publish timestamp) -> … -> (leaf operator name, message receive timestamp, message publish timestamp)”.
This log file can further be analyzed to understand latency distributions, bottlenecks, data flow and other characteristics of an application.
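As a starting point for such an analysis, the sketch below parses one path entry and derives simple timings from it. It assumes that each entry is a single line of the form "(name,receive,publish) -> (name,receive,publish)" with integer timestamps; the exact on-disk layout, the timestamp units, and the sample line are assumptions, so adjust the pattern to the file your application actually produces. The helper names are hypothetical, and measuring the end-to-end latency from the root's receive timestamp to the leaf's publish timestamp is one possible definition, not necessarily the one the SDK uses.

```python
# Illustrative only: the log line layout below is an assumption.
import re
from typing import Dict, List, Tuple

Hop = Tuple[str, int, int]  # (operator name, receive timestamp, publish timestamp)

# Matches one "(name,receive,publish)" tuple, tolerating optional spaces.
HOP_PATTERN = re.compile(r"\(\s*([^,()]+?)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_path(line: str) -> List[Hop]:
    """Turn one logged path into a list of (name, receive, publish) hops."""
    return [(name, int(rx), int(tx)) for name, rx, tx in HOP_PATTERN.findall(line)]


def end_to_end_latency(hops: List[Hop]) -> int:
    """Assumed definition: leaf publish timestamp minus root receive timestamp."""
    return hops[-1][2] - hops[0][1]


def per_operator_time(hops: List[Hop]) -> Dict[str, int]:
    """Time between receiving and publishing a message at each operator."""
    return {name: tx - rx for name, rx, tx in hops}


# Hypothetical log entry with made-up timestamps.
SAMPLE_LINE = "(tx,100,130) -> (mx,150,210) -> (rx,230,260)"

if __name__ == "__main__":
    hops = parse_path(SAMPLE_LINE)
    print("end-to-end:", end_to_end_latency(hops))
    print("per operator:", per_operator_time(hops))
```

Collecting these values over all logged entries gives the raw data for latency histograms or for spotting the operator that dominates a path.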