The Isaac Scheduler

The Isaac SDK uses a scheduler to manage and dispatch work, following the Earliest Deadline First (EDF) model. The scheduler attempts to ensure that jobs are dispatched according to their desired start times. The scheduler also manages multithreading concerns and core management. See the Scheduling Interface and Policy section for details.

Jobs are specified by a configuration structure and referenced via opaque handles after creation. Jobs are dispatched either to a pool of worker threads or executed on a dedicated thread. The scheduler also tracks statistics for jobs when possible. See the Job Description and Job Statistics sections for more information.

Thread and core management are configurable, and scheduler timing can be tuned with clock scaling and a system called the time machine. The time machine removes idle periods from the schedule to shorten training times. See the Scheduler Configuration and Time Machine and Clock Scaling sections for more information.

All codelets use the scheduler implicitly, reducing the need to actively manage jobs. In most cases it is unnecessary to create jobs directly. However, understanding the scheduler makes it easier to optimize application graphs and codelet runtime performance.

Job Description

The scheduler operates in terms of JobDescriptors and JobHandles. A JobDescriptor specifies how a job is to be executed and a JobHandle is used to track a created job.

The JobDescriptor has the following fields:

  • JobAction action: This is a function pointer which takes no arguments and has no return value.

  • std::string execution_group: This optional field specifies which execution unit is responsible for executing the job. If no name is specified, a default group is assigned based on execution characteristics. Execution groups define relationships between jobs and hardware cores. See the Scheduler Configuration section for more information.

  • std::string name: This optional field can hold a human-readable name to help with diagnostics.

  • int64_t period: This field specifies the period of the job in nanoseconds. This is only applicable to jobs which are periodic. This field is mutually exclusive with target_start_time.

  • int64_t target_start_time: This field specifies the target start time of the job in nanoseconds. It applies only to event and one-shot jobs.

  • int64_t slack: This field specifies how much delay is allowable when scheduling a job. This is mainly used as a tiebreaker when two jobs are scheduled for the same time window.

  • std::optional<int64_t> deadline: This value in nanoseconds is used to detect if a job is running longer than desired. By default, periodic jobs set this value to their period.

  • int priority: The value in this field is used as a tiebreaker if two jobs start in the same time window. Higher values have priority.

  • ExecutionMode execution_mode: This field specifies how a job is executed. Options include the following:

    • kBlocking: The job is executed repeatedly on a dedicated thread. This is useful for dedicated hardware polling or other activities which must run repeatedly and have dedicated resources.
    • kBlockingOneShot: The job is executed once on a dedicated thread.
    • kOneShotTask: The job is executed once within the worker thread pool.
    • kPeriodicTask: The job is executed periodically within the worker thread pool.
    • kEventTask: The job is executed on event notification within the worker thread pool.
  • EventTriggerMode event_trigger_mode: This field specifies how multiple received events are handled. Options include the following:

    • kIgnoreWhileRunning: An event is ignored if it is received while the job is scheduled or executing.
    • kQueueAtMostOnce: If an event is received while the job is scheduled or executing, one more instance (at most) of the job is scheduled to execute.
  • bool has_statistics: This field controls if runtime statistics are collected for the job. Default behavior is to collect statistics.

See gems/scheduler/job_descriptor.hpp for more details.
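
As an illustration of how these fields fit together, the sketch below defines a simplified stand-in for JobDescriptor and fills it in for a periodic job. The struct layout, the default values, the use of std::function for JobAction, and the helper makePeriodicJob are assumptions made for this example only; the authoritative definitions are in gems/scheduler/job_descriptor.hpp.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Simplified stand-ins for the types described above. Field names follow the
// text; defaults and layout are assumptions made for illustration.
using JobAction = std::function<void()>;

enum class ExecutionMode {
  kBlocking, kBlockingOneShot, kOneShotTask, kPeriodicTask, kEventTask
};
enum class EventTriggerMode { kIgnoreWhileRunning, kQueueAtMostOnce };

struct JobDescriptor {
  JobAction action;
  std::string execution_group;      // empty: let the scheduler pick a default group
  std::string name;                 // optional, for diagnostics
  int64_t period = 0;               // nanoseconds, periodic jobs only
  int64_t target_start_time = 0;    // nanoseconds, event and one-shot jobs only
  int64_t slack = 0;
  std::optional<int64_t> deadline;  // nanoseconds
  int priority = 0;
  ExecutionMode execution_mode = ExecutionMode::kOneShotTask;
  EventTriggerMode event_trigger_mode = EventTriggerMode::kIgnoreWhileRunning;
  bool has_statistics = true;
};

// Hypothetical helper: describes a job that runs every period_ns nanoseconds
// in the worker thread pool.
JobDescriptor makePeriodicJob(std::string name, int64_t period_ns, JobAction action) {
  JobDescriptor d;
  d.name = std::move(name);
  d.action = std::move(action);
  d.execution_mode = ExecutionMode::kPeriodicTask;
  d.period = period_ns;
  d.deadline = period_ns;  // mirrors the documented default for periodic jobs
  return d;
}
```

A call such as makePeriodicJob("tick", 10000000, callback) would describe a job that runs every 10 ms. The target_start_time field is left untouched because it is mutually exclusive with period.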

Job Statistics

Jobs which execute within the worker thread pools can collect behavior statistics. Jobs which execute on their own threads do not collect statistics as there is no clearly defined way to measure their performance. A summary report of all jobs executed is written to the log file at the end of application execution.

Below is a sample jobs report. A report is generated for each execution group. See gems/scheduler/job_statistics.hpp for more details about what is tracked.

===============================================================================================================================================
|                                              Job Statistics Report                                                                          |
===============================================================================================================================================
Name                                               | Job Mode          |   #Run |     Avg Time |     Load |  Overrun | Avg Overrun | Exec Delay
-----------------------------------------------------------------------------------------------------------------------------------------------
websight/WebsightServer                            | Blocking          |    N/A |       N/A ms |    N/A % |    N/A % |    N/A ms   |    N/A mus
NodeQueue                                          | Blocking One Shot |    N/A |       N/A ms |    N/A % |    N/A % |    N/A ms   |    N/A mus
ASIO Service                                       | Blocking          |    N/A |       N/A ms |    N/A % |    N/A % |    N/A ms   |    N/A mus
===============================================================================================================================================
|                                              End Job Statistics Report                                                                      |
===============================================================================================================================================

===============================================================================================================================================
|                                              Job Statistics Report                                                                          |
===============================================================================================================================================
Name                                               | Job Mode          |   #Run |     Avg Time |     Load |  Overrun | Avg Overrun | Exec Delay
-----------------------------------------------------------------------------------------------------------------------------------------------
global_plan/isaac.planner.GlobalPlanner            | Periodic          |     72 |     10.88 ms |   1.10 % |   0.00 % |   0.00 ms   |  -36.9 mus
scan_localization/robot_viewer                     | Periodic          |   3558 |      0.02 ms |   0.09 % |   0.00 % |   0.00 ms   |  -37.5 mus
scan_localization/flatscan_viewer                  | Event             |   1422 |      0.13 ms |   0.26 % |   0.00 % |   0.00 ms   |  102.9 mus
..tion/isaac.navigation.ParticleFilterLocalization | Event             |   1422 |     16.17 ms |  32.36 % |   0.00 % |   0.00 ms   |   66.6 mus
global_localization/grid_search_localizer          | Periodic          |     20 |      4.30 ms |   1.91 % |   0.00 % |   0.00 ms   |  -92.5 mus
replay_bridge/ReplayBridge                         | Periodic          |    763 |      0.06 ms |   0.06 % |   0.00 % |   0.00 ms   |  -30.7 mus
sim_range_scan/isaac.flatsim.SimRangeScan          | Periodic          |   1525 |      0.96 ms |   1.92 % |   0.00 % |   0.00 ms   |  -26.2 mus
recorder_bridge/RecorderBridge                     | Periodic          |   1906 |      0.04 ms |   0.09 % |   0.00 % |   0.00 ms   |  -33.4 mus
map/restricted_area                                | Periodic          |     77 |      0.08 ms |   0.01 % |   0.00 % |   0.00 ms   |  -38.7 mus
map/waypoints                                      | Periodic          |     77 |      0.10 ms |   0.01 % |   0.00 % |   0.00 ms   |  -38.4 mus
..aluation/isaac.navigation.LocalizationEvaluation | Periodic          |   1525 |      0.03 ms |   0.06 % |   0.00 % |   0.00 ms   |  -57.7 mus
FailsafeHeartBeat                                  | Periodic          |   7629 |      0.00 ms |   0.01 % |   0.00 % |   0.00 ms   |  -34.6 mus
random_walk/isaac.navigation.RandomWalk            | Periodic          |    305 |      0.01 ms |   0.00 % |   0.00 % |   0.00 ms   |  -46.8 mus
_pose_tree_bridge/PoseTreeJsonBridge               | Periodic          |   1526 |      0.09 ms |   0.18 % |   0.00 % |   0.00 ms   |  -49.8 mus
_config_bridge/ConfigBridge                        | Event             |      1 |      0.16 ms | 100.00 % |   0.00 % |   0.00 ms   |   37.0 mus
_statistics/NodeStatistics                         | Periodic          |    306 |      0.24 ms |   0.10 % |   0.00 % |   0.00 ms   |  -35.2 mus
..mulation/isaac.flatsim.DifferentialBaseSimulator | Periodic          |   7620 |      0.11 ms |   1.12 % |   0.00 % |   0.00 ms   |  -54.3 mus
..simulation/isaac.flatsim.DifferentialBasePhysics | Periodic          |   7620 |      0.08 ms |   0.81 % |   0.00 % |   0.00 ms   |  -64.1 mus
map_bridge/isaac.map.MapBridge                     | Event             |      4 |      3.31 ms |   0.46 % |   0.00 % |   0.00 ms   |   42.6 mus
goal_behavior/isaac.navigation.SelectorBehavior    | Periodic          |   1525 |      0.01 ms |   0.02 % |   0.00 % |   0.00 ms   |  -85.4 mus
go_to/isaac.navigation.GoTo                        | Periodic          |    763 |      0.09 ms |   0.09 % |   0.00 % |   0.00 ms   |  -89.5 mus
localize/isaac.navigation.LocalizeBehavior         | Periodic          |   1525 |      0.04 ms |   0.07 % |   0.00 % |   0.00 ms   |  -80.2 mus
control/isaac.planner.DifferentialBaseControl      | Periodic          |   7620 |      0.08 ms |   0.77 % |   0.00 % |   0.00 ms   |  -65.5 mus
lqr/isaac.planner.DifferentialBaseLqrPlanner       | Periodic          |    763 |     18.34 ms |  18.39 % |   0.00 % |   0.00 ms   |  -83.4 mus
..ometry/isaac.navigation.DifferentialBaseOdometry | Periodic          |   7619 |      0.12 ms |   1.24 % |   0.00 % |   0.00 ms   |  -84.4 mus
local_map/isaac.navigation.LocalMap                | Event             |   1519 |      8.91 ms |  17.79 % |   0.00 % |   0.00 ms   |   47.7 mus
..eractive_markers_bridge/InteractiveMarkersBridge | Periodic          |   1525 |      0.01 ms |   0.01 % |   0.00 % |   0.00 ms   |  -59.2 mus
===============================================================================================================================================
|                                              End Job Statistics Report                                                                      |
===============================================================================================================================================

Scheduling Interface and Policy

The scheduler follows the principles of Earliest Deadline First, but it does not require a known execution time. Instead, the desired start time is used as a proxy for the deadline. Jobs are executed in order of their desired start times, with priority and available slack as tiebreakers if needed.
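
The ordering rule above can be sketched as a comparator. This is a minimal illustration, not the scheduler's actual implementation: PendingJob, dispatchesBefore, and dispatchOrder are hypothetical names, time windows are simplified to exact equality of start times, and the assumption that smaller slack wins a remaining tie is not stated in this document.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, reduced view of a pending job: only the fields that the
// ordering described above depends on.
struct PendingJob {
  std::string name;
  int64_t target_start_time;  // nanoseconds
  int priority;               // higher values win ties
  int64_t slack;              // nanoseconds of tolerable delay
};

// Returns true if a should be dispatched before b.
// Assumption: ties on start time are broken by priority first, then by the
// smaller slack.
bool dispatchesBefore(const PendingJob& a, const PendingJob& b) {
  if (a.target_start_time != b.target_start_time) {
    return a.target_start_time < b.target_start_time;
  }
  if (a.priority != b.priority) {
    return a.priority > b.priority;
  }
  return a.slack < b.slack;
}

// Returns the job names in the order in which they would be dispatched.
std::vector<std::string> dispatchOrder(std::vector<PendingJob> jobs) {
  std::stable_sort(jobs.begin(), jobs.end(), dispatchesBefore);
  std::vector<std::string> names;
  for (const PendingJob& job : jobs) {
    names.push_back(job.name);
  }
  return names;
}
```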

In order to expose the ability to pin tasks to specific hardware cores, the scheduler uses the concept of execution groups. Each group represents a set of cores on which associated jobs may be executed. The scheduler operates in a greedy manner and allocates default groups to all cores not explicitly allocated by a configuration file.

The Isaac SDK requires at least one core to operate certain internal behaviors. If all cores are allocated, the scheduler still reserves one core for operations.

Execution groups are defined in gems/scheduler/execution_group_descriptor.hpp. They have three primary values:

  • std::string name: Specifies the name of the group for indexing.
  • std::vector<int> cores: Specifies a list of cores on which the group operates.
  • bool has_workers: A flag which controls whether the system spawns worker threads. If set to true then one worker thread per core in the group is spawned and pinned to those cores.
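
The following sketch mirrors the three fields in a stand-in struct and shows two consequences of the description above: how many worker threads a group would spawn, and which cores remain for default groups. The helpers workerThreadCount and unallocatedCores are hypothetical, and the real allocation logic may differ (for example, it also reserves a core for internal operations).

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Stand-in mirroring the three fields listed above; the real definition is in
// gems/scheduler/execution_group_descriptor.hpp.
struct ExecutionGroupDescriptor {
  std::string name;
  std::vector<int> cores;
  bool has_workers;
};

// One worker thread per core when has_workers is set, none otherwise.
std::size_t workerThreadCount(const ExecutionGroupDescriptor& group) {
  return group.has_workers ? group.cores.size() : 0;
}

// Hypothetical helper: cores in [0, total_cores) that no configured group
// claims and that would therefore be left for default groups.
std::vector<int> unallocatedCores(int total_cores,
                                  const std::vector<ExecutionGroupDescriptor>& groups) {
  std::set<int> claimed;
  for (const ExecutionGroupDescriptor& group : groups) {
    claimed.insert(group.cores.begin(), group.cores.end());
  }
  std::vector<int> free_cores;
  for (int core = 0; core < total_cores; ++core) {
    if (claimed.count(core) == 0) {
      free_cores.push_back(core);
    }
  }
  return free_cores;
}
```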

The primary entry point to the scheduler is the interface in gems/scheduler/scheduler.hpp. The following four functions are the primary methods used to interact with the scheduler.

std::optional<JobHandle> createJob(const JobDescriptor& descriptor);
void destroyJob(const JobHandle& handle);
void startJob(const JobHandle& handle) const;
void waitForJobDestruction(const JobHandle& handle) const;

These functions create, destroy, and start jobs. You must call waitForJobDestruction after destroying a job to ensure that it is safe to free resources, because the job may still be executing when destroyJob is invoked.
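
The required call order can be sketched as follows. To keep the example compilable on its own, it uses stand-in JobDescriptor and JobHandle types and a FakeScheduler that only records calls; all three, along with the helper runWithJob, are hypothetical and are not part of the SDK. The fake's methods are also non-const, unlike the real interface.

```cpp
#include <optional>
#include <string>
#include <vector>

// Stand-ins so the sketch compiles on its own; the real types are declared in
// the gems/scheduler headers.
struct JobDescriptor { std::string name; };
struct JobHandle { int id; };

// Fake scheduler that only records which calls were made, in order.
class FakeScheduler {
 public:
  std::optional<JobHandle> createJob(const JobDescriptor& descriptor) {
    if (descriptor.name.empty()) return std::nullopt;  // arbitrary failure rule for the demo
    calls.push_back("create");
    return JobHandle{next_id_++};
  }
  void startJob(const JobHandle&) { calls.push_back("start"); }
  void destroyJob(const JobHandle&) { calls.push_back("destroy"); }
  void waitForJobDestruction(const JobHandle&) { calls.push_back("wait"); }

  std::vector<std::string> calls;

 private:
  int next_id_ = 0;
};

// Creates and starts a job, runs body while the job is alive, then tears the
// job down in the order the text requires. Returns false if creation failed.
template <typename Scheduler, typename Body>
bool runWithJob(Scheduler& scheduler, const JobDescriptor& descriptor, Body&& body) {
  std::optional<JobHandle> handle = scheduler.createJob(descriptor);
  if (!handle) return false;  // createJob returns an optional, so creation can fail
  scheduler.startJob(*handle);
  body();
  scheduler.destroyJob(*handle);
  // Only after this returns is it safe to free resources the job action uses.
  scheduler.waitForJobDestruction(*handle);
  return true;
}
```

Because runWithJob is templated on the scheduler type, the same call order would apply to any object that offers these four methods.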

The following convenience functions combine some of the common actions used when interacting with the scheduler.

std::optional<JobHandle> createJobAndStart(const JobDescriptor& descriptor);
void destroyJobAndWait(const JobHandle& handle);

The following functions handle event based tasks:

void registerEvents(const JobHandle& handle, const std::unordered_set<std::string>& events) const;
void unregisterEvents(const JobHandle& handle, const std::unordered_set<std::string>& events) const;
void notify(const std::string& event, int64_t target_time) const;
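
How an event-triggered job reacts to repeated notifications depends on its event_trigger_mode (see Job Description). The sketch below models the two documented modes with a small state tracker. EventJobState and its methods are hypothetical names used only for illustration, and the real scheduler's bookkeeping may differ.

```cpp
enum class EventTriggerMode { kIgnoreWhileRunning, kQueueAtMostOnce };

// Hypothetical bookkeeping for a single event-triggered job.
class EventJobState {
 public:
  explicit EventJobState(EventTriggerMode mode) : mode_(mode) {}

  // Called when a registered event arrives.
  void onEvent() {
    if (!busy_) {
      busy_ = true;  // the job becomes scheduled
      return;
    }
    if (mode_ == EventTriggerMode::kQueueAtMostOnce) {
      queued_ = true;  // remember at most one additional run
    }
    // kIgnoreWhileRunning: the event is dropped.
  }

  // Called when the job finishes executing.
  void onFinished() {
    if (!busy_) return;
    ++runs_;
    busy_ = queued_;  // a queued event schedules exactly one more run
    queued_ = false;
  }

  bool busy() const { return busy_; }
  int runs() const { return runs_; }

 private:
  EventTriggerMode mode_;
  bool busy_ = false;    // scheduled or executing
  bool queued_ = false;  // one more run is pending
  int runs_ = 0;
};
```

In this model, three events that arrive before the first run completes lead to one run under kIgnoreWhileRunning and two runs under kQueueAtMostOnce.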

The following function returns statistics for a given job. See the Job Statistics section for more information.

JobStatistics getJobStatistics(const JobHandle& handle) const;

These functions control the time machine functionality of the scheduler. See Time Machine and Clock Scaling for more information.

void enableTimeMachine();
void disableTimeMachine();

Time Machine and Clock Scaling

To accelerate simulation and training of algorithms, the scheduler is tied to the Isaac system clock, which supports clock scaling, and to a system called the time machine.

Clock scaling allows you to speed up or slow down the system clock for the Isaac SDK. The time machine detects gaps in the schedule and removes them at run time. This can greatly reduce training times during simulation with no discernible impact on training results.

The time machine does not work with blocking jobs because it is unable to detect when those jobs are idle.
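
To make the two mechanisms concrete, the toy model below computes how much wall-clock time a sequential schedule needs with and without gap removal, and how a clock scale maps elapsed wall time to application time. It is a sketch under simplifying assumptions (a single worker, known job durations, and hypothetical names TimedJob, wallTimeNeeded, and scaledElapsed), not a description of how the SDK implements the time machine.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical job model: when the job should start (application time) and
// how long it computes, both in nanoseconds.
struct TimedJob {
  int64_t start_ns;
  int64_t duration_ns;
};

// Application time that passes during wall_elapsed_ns of real time.
int64_t scaledElapsed(int64_t wall_elapsed_ns, double clock_scale) {
  return static_cast<int64_t>(wall_elapsed_ns * clock_scale);
}

// Wall-clock time needed to run all jobs one after another on one worker.
// With the time machine enabled, idle gaps before a job's start are skipped
// by jumping the application clock forward instead of waiting.
int64_t wallTimeNeeded(std::vector<TimedJob> jobs, bool use_time_machine) {
  std::sort(jobs.begin(), jobs.end(),
            [](const TimedJob& a, const TimedJob& b) { return a.start_ns < b.start_ns; });
  int64_t app_time = 0;  // application clock
  int64_t wall = 0;      // real time spent
  for (const TimedJob& job : jobs) {
    if (job.start_ns > app_time) {
      const int64_t gap = job.start_ns - app_time;
      if (!use_time_machine) {
        wall += gap;  // the idle wait happens in real time
      }
      app_time = job.start_ns;  // either way the application clock reaches the start
    }
    wall += job.duration_ns;
    app_time += job.duration_ns;
  }
  return wall;
}
```

In this model only the idle gaps disappear; the time spent executing jobs is unchanged, which is consistent with the restriction that the time machine cannot help with blocking jobs whose idle periods it cannot observe.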

Scheduler Configuration

The scheduler exposes a few basic parameters in the application configuration file. The following sample JSON blocks demonstrate how to set them. If no default configuration is specified, the system attempts to generate one itself.

"scheduler": {
   "use_time_machine": true,
   "clock_scale": 1.0,
   "execution_groups": [
     {
       "name": "MyTestWorkerGroup",
       "cores": [0,1,2,3],
       "workers": true
     },
     {
       "name": "MyTestBlockerGroup",
       "cores": [4,5,6,7],
       "workers": false
     }
   ]
 }

"scheduler": {
   "use_time_machine": true,
   "clock_scale": 1.0,
   "default_execution_group_config": [
     {
       "worker_cores": [0,1],
       "blocker_cores": [4,5]
     }
   ]
 }