Runtime#
- group runtime
- Enums - 
enum class ExceptionMode : std::uint8_t#
- Enum for exception handling modes. - Values: - 
enumerator IMMEDIATE#
- Handles exceptions immediately. Any throwable task blocks until completion. 
 - 
enumerator DEFERRED#
- Defers all exceptions until the current scope exits. 
 - 
enumerator IGNORED#
- All exceptions are ignored. 
 
 - Functions - 
std::int32_t start(std::int32_t argc, char **argv)#
- Starts the Legate runtime. - This makes the runtime ready to accept requests made via its APIs - Parameters:
- argc – Number of command-line flags 
- argv – Command-line flags 
 
- Returns:
- Non-zero value when the runtime start-up failed, 0 otherwise 
 
 - 
bool has_started()#
- Checks if the runtime has started. - Returns:
- true if the runtime has started, false if the runtime has not started yet or after finish() is called.
 
 - 
bool has_finished()#
- Checks if the runtime has finished. - Returns:
- true if finish() has been called, false otherwise.
 
 - 
std::int32_t finish()#
- Waits for the runtime to finish. - The client code must call this to make sure all Legate tasks run - Returns:
- Non-zero value when the runtime encountered a failure, 0 otherwise 
 
 - 
template<typename T>
 void register_shutdown_callback(T &&callback)#
- Registers a callback that should be invoked during the runtime shutdown. - Any callbacks will be invoked before the core library and the runtime are destroyed. Callbacks must not throw. Multiple registrations of the same callback are not deduplicated, so clients are responsible for registering a callback only once if it is meant to be invoked only once. Callbacks are invoked in FIFO order, so any callbacks that are registered by another callback are added to the end of the list of callbacks. Callbacks can launch tasks, and the runtime will make sure they complete before initiating its shutdown. - Parameters:
- callback – A shutdown callback 
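The FIFO rule above can be illustrated with a small standalone model. This is only a sketch of the documented ordering, not Legate code: `ShutdownQueue`, `register_callback`, `run_all`, and `demo_shutdown_order` are hypothetical names, and the real runtime may store its callbacks differently.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's list of shutdown callbacks.
class ShutdownQueue {
 public:
  void register_callback(std::function<void()> callback)
  {
    assert(static_cast<bool>(callback));  // an empty std::function cannot be invoked
    callbacks_.push_back(std::move(callback));
  }

  // Drains the queue front to back. A callback registered while draining is
  // appended to the end, so it runs after everything that was already queued.
  void run_all()
  {
    while (!callbacks_.empty()) {
      auto callback = std::move(callbacks_.front());
      callbacks_.pop_front();
      callback();
    }
  }

 private:
  std::deque<std::function<void()>> callbacks_;
};

// Registers "a" and "b", where "a" registers "c" while it runs, and returns
// the order in which the callbacks were invoked.
std::vector<std::string> demo_shutdown_order()
{
  ShutdownQueue queue;
  std::vector<std::string> order;

  queue.register_callback([&] {
    order.push_back("a");
    queue.register_callback([&] { order.push_back("c"); });
  });
  queue.register_callback([&] { order.push_back("b"); });
  queue.run_all();
  return order;
}
```

In this model the callback registered from inside "a" runs last, after "b", which matches the description of nested registrations being appended to the end of the list.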
 
 - 
mapping::Machine get_machine()#
- Returns the machine for the current scope. - Returns:
- Machine object 
 
 - 
bool is_running_in_task()#
- Checks if the code is running in a task. - Returns:
- true if the code is running in a task, false otherwise
 
 - 
class Library#
- #include <legate/runtime/library.h> A library class that provides APIs for registering components. Public Functions - 
std::string_view get_task_name(LocalTaskID local_task_id) const#
- Returns the name of a task. - Parameters:
- local_task_id – Task id 
- Returns:
- Name of the task 
 
 - 
template<typename REDOP>
 GlobalRedopID register_reduction_operator(
- LocalRedopID redop_id,
- Registers a library-specific reduction operator.

The type parameter REDOP points to a class that implements a reduction operator. Each reduction operator class has the following structure:

    struct RedOp {
      using LHS = ...; // Type of the LHS values
      using RHS = ...; // Type of the RHS values

      static const RHS identity = ...; // Identity of the reduction operator

      template <bool EXCLUSIVE>
      LEGATE_HOST_DEVICE inline static void apply(LHS& lhs, RHS rhs) { ... }

      template <bool EXCLUSIVE>
      LEGATE_HOST_DEVICE inline static void fold(RHS& rhs1, RHS rhs2) { ... }
    };

Semantically, Legate performs reductions of values V0, …, Vn to element E in the following way:

    RHS T = RedOp::identity;
    RedOp::fold(T, V0)
    ...
    RedOp::fold(T, Vn)
    RedOp::apply(E, T)

I.e., Legate gathers all reduction contributions using fold and applies the accumulator to the element using apply.

Oftentimes, the LHS and RHS of a reduction operator are the same type and fold and apply perform the same computation, but that is not mandatory. For example, one may implement a reduction operator for subtraction, where fold would sum up all RHS values whereas apply would subtract the aggregate value from the LHS.

The reduction operator id (REDOP_ID) can be local to the library but should be unique for each operator within the library.

Finally, the contract for apply and fold is that they must update the reference atomically when EXCLUSIVE is false.

Warning - Because the runtime can capture the reduction operator and wrap it with CUDA boilerplate only at compile time, the registration call should be made in a .cu file that would be compiled by NVCC. Otherwise, the runtime would register the reduction operator in CPU-only mode, which can degrade the performance when the program performs reductions on non-scalar stores. - Template Parameters:
- REDOP – Reduction operator to register 
- Parameters:
- redop_id – Library-local reduction operator ID 
- Returns:
- Global reduction operator ID 
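The subtraction example mentioned above can be sketched without any Legate headers. Everything here is illustrative: `SubRedOp` and `reduce_into` are hypothetical names, `LEGATE_HOST_DEVICE` is left out so the code compiles on its own, and only the exclusive path is covered (the non-exclusive path would need atomic updates, which this sketch rejects at compile time rather than fake).

```cpp
#include <cassert>
#include <vector>

// Illustrative subtraction operator following the documented structure.
struct SubRedOp {
  using LHS = long;
  using RHS = long;

  static constexpr RHS identity = 0;

  // Subtracts the aggregated contribution from the element.
  template <bool EXCLUSIVE>
  static void apply(LHS& lhs, RHS rhs)
  {
    static_assert(EXCLUSIVE, "this sketch covers only the exclusive path");
    lhs -= rhs;
  }

  // Sums up contributions, so fold and apply deliberately differ.
  template <bool EXCLUSIVE>
  static void fold(RHS& rhs1, RHS rhs2)
  {
    static_assert(EXCLUSIVE, "this sketch covers only the exclusive path");
    rhs1 += rhs2;
  }
};

// Mirrors the documented sequence: fold every contribution into an
// accumulator that starts at the identity, then apply it to the element.
template <typename RedOp>
typename RedOp::LHS reduce_into(typename RedOp::LHS element,
                                const std::vector<typename RedOp::RHS>& values)
{
  typename RedOp::RHS acc = RedOp::identity;
  for (const auto& value : values) {
    RedOp::template fold<true>(acc, value);
  }
  RedOp::template apply<true>(element, acc);
  return element;
}
```

With contributions 1, 2 and 3 folded into the accumulator, applying it to an element holding 10 leaves 4. An empty set of contributions leaves the element unchanged, because the accumulator is still the identity.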
 
 
 
 - 
struct ResourceConfig#
- #include <legate/runtime/resource.h> POD for library configuration. Public Members - 
std::int64_t max_tasks = {1024}#
- Maximum number of tasks that the library can register. 
 - 
std::int64_t max_dyn_tasks = {0}#
- Maximum number of dynamic tasks that the library can register (cannot exceed max_tasks) 
 - 
std::int64_t max_reduction_ops = {}#
- Maximum number of custom reduction operators that the library can register. 
 
 - 
class Runtime#
- #include <legate/runtime/runtime.h> Class that implements the Legate runtime. The Legate runtime provides common services, including library registration, store creation, operator creation and submission, resource management and scoping, and communicator management. Legate libraries are thereby freed from the details of distributed programming and can focus on their domain logic. Public Functions - Library create_library(
- std::string_view library_name,
- const ResourceConfig &config = ResourceConfig{},
- std::unique_ptr<mapping::Mapper> mapper = nullptr,
- std::map<VariantCode, VariantOptions> default_options = {},
- Creates a library. - A library is a collection of tasks and custom reduction operators. The maximum number of tasks and reduction operators can be optionally specified with a ResourceConfig object. Each library can optionally have a mapper that specifies mapping policies for its tasks. When no mapper is given, the default mapper is used.
 
 - std::optional<Library> maybe_find_library(
- std::string_view library_name,
- Attempts to find a library. - If no library exists for a given name, a null value will be returned 
 
 - Library find_or_create_library(
- std::string_view library_name,
- const ResourceConfig &config = ResourceConfig{},
- std::unique_ptr<mapping::Mapper> mapper = nullptr,
- const std::map<VariantCode, VariantOptions> &default_options = {},
- bool *created = nullptr,
- Finds or creates a library. - The optional configuration and mapper objects are picked up only when the library is created. - Parameters:
- library_name – Library name. Must be unique to this library 
- config – Optional configuration object 
- mapper – Optional mapper object 
- default_options – Optional default task variant options 
- created – Optional pointer to a boolean flag indicating whether the library has been created because of this call 
 
- Returns:
- Context object for the library 
 
 
 - 
AutoTask create_task(Library library, LocalTaskID task_id)#
- Creates an AutoTask. - Parameters:
- library – Library to query the task 
- task_id – Library-local Task ID 
 
- Returns:
- Task object 
 
 - ManualTask create_task(
- Library library,
- LocalTaskID task_id,
- const tuple<std::uint64_t> &launch_shape,
- Creates a ManualTask. - Parameters:
- library – Library to query the task 
- task_id – Library-local Task ID 
- launch_shape – Launch domain for the task 
 
- Returns:
- Task object 
 
 
 - ManualTask create_task( )#
- Creates a ManualTask. - This overload should be used when the lower bounds of the task’s launch domain should be non-zero. Note that the upper bounds of the launch domain are inclusive (whereas the launch_shape in the other overload is exclusive). - Parameters:
- library – Library to query the task 
- task_id – Library-local Task ID 
- launch_domain – Launch domain for the task 
 
- Returns:
- Task object 
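The difference between the exclusive `launch_shape` and the inclusive launch domain can be made concrete with a small standalone model. `InclusiveDomain`, `domain_from_shape`, and `volume` are hypothetical helpers written for this illustration. They are not Legate types, and they assume the shape-based overload starts at zero in every dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a launch domain with inclusive bounds.
struct InclusiveDomain {
  std::vector<std::int64_t> lo;
  std::vector<std::int64_t> hi;
};

// A launch shape has exclusive extents and an implicit lower bound of zero,
// so shape {4, 2} corresponds to the points (0, 0) through (3, 1).
InclusiveDomain domain_from_shape(const std::vector<std::uint64_t>& launch_shape)
{
  InclusiveDomain domain;
  for (const auto extent : launch_shape) {
    domain.lo.push_back(0);
    domain.hi.push_back(static_cast<std::int64_t>(extent) - 1);
  }
  return domain;
}

// Number of points in a domain whose upper bounds are inclusive.
std::uint64_t volume(const InclusiveDomain& domain)
{
  std::uint64_t result = 1;
  for (std::size_t i = 0; i < domain.lo.size(); ++i) {
    result *= static_cast<std::uint64_t>(domain.hi[i] - domain.lo[i] + 1);
  }
  return result;
}
```

A domain with bounds (2, 1) to (5, 2) has the same number of points as shape {4, 2}, but it can only be expressed through the domain-based overload because its lower bounds are non-zero.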
 
 
 - void issue_copy(
- LogicalStore &target,
- const LogicalStore &source,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
- Issues a copy between stores. - The source and target stores must have the same shape. - Parameters:
- target – Copy target 
- source – Copy source 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_copy(
- LogicalStore &target,
- const LogicalStore &source,
- std::optional<std::int32_t> redop_kind,
- Issues a copy between stores. - The source and target stores must have the same shape. - Parameters:
- target – Copy target 
- source – Copy source 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_gather(
- LogicalStore &target,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
- Issues a gather copy between stores. - The indirection store and the target store must have the same shape. - Parameters:
- target – Copy target 
- source – Copy source 
- source_indirect – Store for source indirection 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_gather(
- LogicalStore &target,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<std::int32_t> redop_kind,
- Issues a gather copy between stores. - The indirection store and the target store must have the same shape. - Parameters:
- target – Copy target 
- source – Copy source 
- source_indirect – Store for source indirection 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_scatter(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
- Issues a scatter copy between stores. - The indirection store and the source store must have the same shape. - Parameters:
- target – Copy target 
- target_indirect – Store for target indirection 
- source – Copy source 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_scatter(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- std::optional<std::int32_t> redop_kind,
- Issues a scatter copy between stores. - The indirection store and the source store must have the same shape. - Parameters:
- target – Copy target 
- target_indirect – Store for target indirection 
- source – Copy source 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_scatter_gather(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
- Issues a scatter-gather copy between stores. - The indirection stores must have the same shape. - Parameters:
- target – Copy target 
- target_indirect – Store for target indirection 
- source – Copy source 
- source_indirect – Store for source indirection 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
 
 
 - void issue_scatter_gather(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<std::int32_t> redop_kind,
- Issues a scatter-gather copy between stores. - The indirection stores must have the same shape. - Parameters:
- target – Copy target 
- target_indirect – Store for target indirection 
- source – Copy source 
- source_indirect – Store for source indirection 
- redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator. 
 
- Throws:
- std::invalid_argument – If the store’s type doesn’t support the reduction operator 
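The shape requirements of the gather, scatter, and scatter-gather copies follow from the usual element-wise meaning of these operations. The sketch below models them on 1-D vectors with integer indices. That meaning is an assumption based on the conventional definitions rather than on the text above, and the function names are hypothetical stand-ins, not Legate calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: target[i] = source[source_indirect[i]].
// The target takes the shape of the indirection vector.
std::vector<int> gather(const std::vector<int>& source,
                        const std::vector<std::size_t>& source_indirect)
{
  std::vector<int> target(source_indirect.size());
  for (std::size_t i = 0; i < source_indirect.size(); ++i) {
    target[i] = source.at(source_indirect[i]);
  }
  return target;
}

// Scatter: target[target_indirect[i]] = source[i].
// The indirection vector and the source must have the same shape.
void scatter(std::vector<int>& target,
             const std::vector<std::size_t>& target_indirect,
             const std::vector<int>& source)
{
  assert(target_indirect.size() == source.size());
  for (std::size_t i = 0; i < source.size(); ++i) {
    target.at(target_indirect[i]) = source[i];
  }
}

// Scatter-gather: target[target_indirect[i]] = source[source_indirect[i]].
// Only the two indirection vectors must have the same shape.
void scatter_gather(std::vector<int>& target,
                    const std::vector<std::size_t>& target_indirect,
                    const std::vector<int>& source,
                    const std::vector<std::size_t>& source_indirect)
{
  assert(target_indirect.size() == source_indirect.size());
  for (std::size_t i = 0; i < source_indirect.size(); ++i) {
    target.at(target_indirect[i]) = source.at(source_indirect[i]);
  }
}
```

In each case the loop runs over the indirection data, which is why the documented shape constraints are stated in terms of the indirection stores.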
 
 
 - 
void issue_fill(const LogicalArray &lhs, const LogicalStore &value)#
- Fills a given array with a constant. - Parameters:
- lhs – Logical array to fill 
- value – Logical store that contains the constant value to fill the array with 
 
 
 - 
void issue_fill(const LogicalArray &lhs, const Scalar &value)#
- Fills a given array with a constant. - Parameters:
- lhs – Logical array to fill 
- value – Value to fill the array with 
 
 
 - LogicalStore tree_reduce(
- Library library,
- LocalTaskID task_id,
- const LogicalStore &store,
- std::int32_t radix = 4,
- Performs reduction on a given store via a task. - Parameters:
- library – The library for the reducer task 
- task_id – reduction task ID 
- store – Logical store to reduce 
- radix – Optional radix value that determines the maximum number of input stores to the task at each reduction step 
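To show how the radix bounds the fan-in of each reduction step, here is a standalone model in which a plain sum stands in for the reducer task. `TreeResult` and `tree_reduce_model` are hypothetical names, and the real runtime may group its inputs differently. The sketch only shows that each step consumes at most `radix` inputs, so the number of rounds grows logarithmically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct TreeResult {
  long value;          // final reduced value
  std::size_t rounds;  // number of reduction rounds performed
};

// Reduces the inputs in rounds. Each round splits the current values into
// groups of at most `radix` elements and replaces every group with its sum.
TreeResult tree_reduce_model(std::vector<long> inputs, std::size_t radix)
{
  assert(radix >= 2 && !inputs.empty());
  std::size_t rounds = 0;
  while (inputs.size() > 1) {
    std::vector<long> next;
    for (std::size_t begin = 0; begin < inputs.size(); begin += radix) {
      const std::size_t end = std::min(begin + radix, inputs.size());
      const auto first = inputs.begin() + static_cast<std::ptrdiff_t>(begin);
      const auto last = inputs.begin() + static_cast<std::ptrdiff_t>(end);
      next.push_back(std::accumulate(first, last, 0L));
    }
    inputs = std::move(next);
    ++rounds;
  }
  return {inputs.front(), rounds};
}
```

With the default radix of 4, sixteen inputs collapse in two rounds (16 to 4 to 1), while seventeen inputs need a third round (17 to 5 to 2 to 1).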
 
 
 
 - 
void submit(AutoTask &&task)#
- Submits an AutoTask for execution. - Each submitted operation goes through multiple pipeline steps to eventually get scheduled for execution. It’s not guaranteed that the submitted operation starts executing immediately. - The runtime takes the ownership of the submitted task. Once submitted, the task becomes invalid and is not reusable. - Parameters:
- task – An AutoTask to execute 
 
 - 
void submit(ManualTask &&task)#
- Submits a ManualTask for execution. - Each submitted operation goes through multiple pipeline steps to eventually get scheduled for execution. It’s not guaranteed that the submitted operation starts executing immediately. - The runtime takes the ownership of the submitted task. Once submitted, the task becomes invalid and is not reusable. - Parameters:
- task – A ManualTask to execute 
 
 - LogicalArray create_array(
- const Type &type,
- std::uint32_t dim = 1,
- bool nullable = false,
- Creates an unbound array. - Parameters:
- type – Element type 
- dim – Number of dimensions 
- nullable – Nullability of the array 
 
- Returns:
- Logical array 
 
 
 - LogicalArray create_array( )#
- Creates a normal array. - Parameters:
- shape – Shape of the array. The call does not block on this shape 
- type – Element type 
- nullable – Nullability of the array 
- optimize_scalar – When true, the runtime internally uses futures optimized for storing scalars 
 
- Returns:
- Logical array 
 
 
 - LogicalArray create_array_like(
- const LogicalArray &to_mirror,
- std::optional<Type> type = std::nullopt,
- Creates an array isomorphic to the given array. - Parameters:
- to_mirror – The array whose shape would be used to create the output array. The call does not block on the array’s shape. 
- type – Optional type for the resulting array. Must be compatible with the input array’s type 
 
- Returns:
- Logical array isomorphic to the input 
 
 
 - StringLogicalArray create_string_array(
- const LogicalArray &descriptor,
- const LogicalArray &vardata,
- Creates a string array from the existing sub-arrays. - The caller is responsible for making sure that the vardata sub-array is valid for all the descriptors in the descriptor sub-array - Parameters:
- descriptor – Sub-array for descriptors 
- vardata – Sub-array for characters 
 
- Throws:
- std::invalid_argument – When any of the following is true: 1) descriptor or vardata is unbound or N-D where N > 1 2) descriptor does not have a 1D rect type 3) vardata is nullable 4) vardata does not have an int8 type
- Returns:
- String logical array 
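The caller-side requirement that vardata be valid for every descriptor can be pictured with a standalone model. The sketch assumes that each descriptor is a 1-D rect with inclusive bounds indexing into the character data, and that a rect with `hi < lo` denotes an empty string. Both conventions, along with `Rect1`, `descriptors_valid`, and `decode`, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical 1-D rect with inclusive bounds.
struct Rect1 {
  std::int64_t lo;
  std::int64_t hi;
};

// Checks that every non-empty descriptor stays within the character data.
bool descriptors_valid(const std::vector<Rect1>& descriptor, std::size_t vardata_size)
{
  for (const auto& rect : descriptor) {
    if (rect.hi < rect.lo) {
      continue;  // an empty range references no characters
    }
    if (rect.lo < 0 || static_cast<std::uint64_t>(rect.hi) >= vardata_size) {
      return false;
    }
  }
  return true;
}

// Materializes the strings described by the two sub-arrays.
std::vector<std::string> decode(const std::vector<Rect1>& descriptor,
                                const std::vector<std::int8_t>& vardata)
{
  assert(descriptors_valid(descriptor, vardata.size()));
  std::vector<std::string> result;
  for (const auto& rect : descriptor) {
    std::string text;
    for (std::int64_t i = rect.lo; i <= rect.hi; ++i) {
      text.push_back(static_cast<char>(vardata[static_cast<std::size_t>(i)]));
    }
    result.push_back(std::move(text));
  }
  return result;
}
```

A check like `descriptors_valid` is the kind of validation the caller is expected to have done before creating the string array, since the listed exceptions cover only shape, type, and nullability.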
 
 
 - ListLogicalArray create_list_array(
- const LogicalArray &descriptor,
- const LogicalArray &vardata,
- std::optional<Type> type = std::nullopt,
- Creates a list array from the existing sub-arrays. - The caller is responsible for making sure that the vardata sub-array is valid for all the descriptors in the descriptor sub-array - Parameters:
- descriptor – Sub-array for descriptors 
- vardata – Sub-array for vardata 
- type – Optional list type the returned array would have 
 
- Throws:
- std::invalid_argument – When any of the following is true: 1) type is not a list type 2) descriptor or vardata is unbound or N-D where N > 1 3) descriptor does not have a 1D rect type 4) vardata is nullable 5) vardata and type have different element types
- Returns:
- List logical array 
 
 
 - 
LogicalStore create_store(const Type &type, std::uint32_t dim = 1)#
- Creates an unbound store. - Parameters:
- type – Element type 
- dim – Number of dimensions of the store 
 
- Returns:
- Logical store 
 
 - LogicalStore create_store( )#
- Creates a normal store. - Parameters:
- shape – Shape of the store. The call does not block on this shape. 
- type – Element type 
- optimize_scalar – When true, the runtime internally uses futures optimized for storing scalars 
 
- Returns:
- Logical store 
 
 
 - LogicalStore create_store( )#
- Creates a normal store out of a Scalar object. - Parameters:
- scalar – Value of the scalar to create a store with 
- shape – Shape of the store. The volume must be 1. The call does not block on this shape. 
 
- Returns:
- Logical store 
 
 
 - LogicalStore create_store(
- const Shape &shape,
- const Type &type,
- void *buffer,
- bool read_only = true,
- const mapping::DimOrdering &ordering = mapping::DimOrdering::c_order(),
- Creates a store by attaching to an existing allocation. - This call does not block on the input shape. - Parameters:
- shape – Shape of the store. The call does not block on this shape. 
- type – Element type 
- buffer – Pointer to the beginning of the allocation to attach to; allocation must be contiguous, and cover the entire contents of the store (at least extents.volume() * type.size() bytes)
- read_only – Whether the allocation is read-only 
- ordering – In what order the elements are laid out in the passed buffer 
 
- Returns:
- Logical store 
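The size requirement on the buffer and the default C ordering can be checked with two small helpers before attaching an allocation. This is a standalone sketch: `required_bytes` and `c_order_offset` are hypothetical names, and plain vectors of extents stand in for the `Shape` and `Type` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimum allocation size in bytes: number of elements times element size.
std::size_t required_bytes(const std::vector<std::size_t>& extents, std::size_t elem_size)
{
  std::size_t volume = 1;
  for (const auto extent : extents) {
    volume *= extent;
  }
  return volume * elem_size;
}

// Element offset of `point` in a C-ordered (row-major) buffer, where the
// last dimension varies fastest.
std::size_t c_order_offset(const std::vector<std::size_t>& extents,
                           const std::vector<std::size_t>& point)
{
  assert(extents.size() == point.size());
  std::size_t offset = 0;
  for (std::size_t i = 0; i < extents.size(); ++i) {
    assert(point[i] < extents[i]);
    offset = offset * extents[i] + point[i];
  }
  return offset;
}
```

For a 3 x 4 store of 8-byte elements the buffer must hold at least 96 bytes, and the element at point (1, 2) sits at element offset 6 in C order.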
 
 
 - LogicalStore create_store(
- const Shape &shape,
- const Type &type,
- const ExternalAllocation &allocation,
- const mapping::DimOrdering &ordering = mapping::DimOrdering::c_order(),
- Creates a store by attaching to an existing allocation. - Parameters:
- shape – Shape of the store. The call does not block on this shape. 
- type – Element type 
- allocation – External allocation descriptor 
- ordering – In what order the elements are laid out in the passed allocation 
 
- Returns:
- Logical store 
 
 
 - std::pair<LogicalStore, LogicalStorePartition> create_store(
- const Shape &shape,
- const tuple<std::uint64_t> &tile_shape,
- const Type &type,
- const std::vector<std::pair<ExternalAllocation, tuple<std::uint64_t>>> &allocations,
- const mapping::DimOrdering &ordering = mapping::DimOrdering::c_order(),
- Creates a store by attaching to multiple existing allocations. - External allocations must be read-only. - Parameters:
- Throws:
- std::invalid_argument – If any of the external allocations are not read-only 
- Returns:
- A pair of a logical store and its partition 
 
 
 - void prefetch_bloated_instances(
- const LogicalStore &store,
- tuple<std::uint64_t> low_offsets,
- tuple<std::uint64_t> high_offsets,
- bool initialize = false,
- Gives the runtime a hint that the store can benefit from bloated instances. - The runtime currently does not look ahead in the task stream to recognize that a given set of tasks can benefit from the ahead-of-time creation of “bloated” instances encompassing multiple slices of a store. This means that the runtime constructs bloated instances incrementally, and completes them only once it has seen all the slices, resulting in intermediate instances that (temporarily) increase the memory footprint. This function can be used to give the runtime a hint ahead of time about the bloated instances, which would be reused by the downstream tasks without going through the same incremental process. - For example, let’s say we have a 1-D store A of size 10 and we want to partition A across two GPUs. By default, A would be partitioned equally and each GPU gets an instance of size 5. Suppose we now have a task that aligns two slices A[1:10] and A[:9]. The runtime would partition the slices such that the task running on the first GPU gets A[1:6] and A[:5], and the task running on the second GPU gets A[6:] and A[5:9]. Since the original instance on the first GPU does not cover the element A[5] included in the first slice A[1:6], the mapper needs to create a new instance for A[:6] that encompasses both of the slices, leading to an extra copy. In this case, if the code calls prefetch_bloated_instances(A, {0}, {1}) to pre-allocate instances that contain one extra element on the right before it uses A, the extra copy can be avoided. - A couple of notes about the API: - Unless initialize is true, the runtime assumes that the store has been initialized. Passing an uninitialized store would lead to a runtime error.
- If the store has pre-existing instances, the runtime may combine those with the bloated instances if such a combination is deemed desirable. 
 - Note - This API is experimental - Parameters:
- store – Store to create bloated instances for 
- low_offsets – Offsets to bloat towards the negative direction 
- high_offsets – Offsets to bloat towards the positive direction 
- initialize – If true, the runtime will issue a fill on the store to initialize it. The default value is false
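The two-GPU example in the description can be replayed with half-open intervals. The sketch below is a standalone model of that arithmetic only: `Interval`, `tile`, `slice_tile`, `bloat`, and `covers` are hypothetical helpers, and the equal-tiling rule is an assumption chosen to reproduce the pieces named in the example.

```cpp
#include <algorithm>
#include <cassert>

// Half-open interval [lo, hi) of element indices.
struct Interval {
  long lo;
  long hi;
};

// Piece `index` of an equal tiling of [0, size) into `parts` pieces.
Interval tile(long size, long parts, long index)
{
  const long tile_size = (size + parts - 1) / parts;
  return {std::min(index * tile_size, size), std::min((index + 1) * tile_size, size)};
}

// Piece `index` of the slice [offset, offset + length) of the store.
Interval slice_tile(long offset, long length, long parts, long index)
{
  const Interval piece = tile(length, parts, index);
  return {piece.lo + offset, piece.hi + offset};
}

// Grows an instance by the given offsets, clamped to the store bounds.
Interval bloat(Interval base, long low_offset, long high_offset, long size)
{
  return {std::max(base.lo - low_offset, 0L), std::min(base.hi + high_offset, size)};
}

bool covers(Interval outer, Interval inner)
{
  return outer.lo <= inner.lo && inner.hi <= outer.hi;
}
```

For the first GPU, the default instance [0, 5) does not cover the piece [1, 6) of the slice A[1:10]. Bloating it with low offset 0 and high offset 1 gives [0, 6), which covers both pieces that the task on that GPU needs.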
 
 
 
 - 
void issue_mapping_fence()#
- Issues a mapping fence. - A mapping fence, when issued, blocks mapping of all downstream operations until those preceding the fence get mapped. An issue_mapping_fence call returns immediately after the request is submitted to the runtime, and the fence asynchronously goes through the runtime analysis pipeline just like any other Legate operation. The call also flushes the scheduling window for batched execution. - Mapping fences only affect how the operations are mapped and do not change their execution order, so they are semantically a no-op. Nevertheless, they are sometimes useful when the user wants to control how resources are consumed by independent tasks. Consider a program with two independent tasks A and B, both of which discard their stores right after their execution. If the stores are too big to be allocated all at once, mapping A and B in parallel (which can happen because A and B are independent and thus nothing stops them from getting mapped concurrently) can lead to a failure. If a mapping fence exists between the two, the runtime serializes their mapping and can reclaim the memory space from stores that would be discarded after A’s execution to create allocations for B. 
 - 
void issue_execution_fence(bool block = false)#
- Issues an execution fence. - An execution fence is a join point in the task graph. All operations prior to a fence must finish before any of the subsequent operations start. - All execution fences are mapping fences by definition; i.e., an execution fence not only prevents the downstream operations from being mapped ahead of itself but also precedes their execution. - Parameters:
- block – When true, the control code blocks on the fence and all operations that have been submitted prior to this fence.
 
 - 
void raise_pending_exception()#
- Raises a pending exception. - When the exception mode of a scope is “deferred” (i.e., Scope::exception_mode() == ExceptionMode::DEFERRED), the exceptions from tasks in the scope are not immediately handled, but are pushed to the pending exception queue. Accumulated pending exceptions are not flushed until raise_pending_exception is invoked. The function throws the first exception in the pending exception queue and clears the queue. If there is no pending exception to be raised, the function does nothing. - Throws:
- legate::TaskException – When there is a pending exception to raise 
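The queue behaviour described above (throw the first pending exception, clear the rest, do nothing when the queue is empty) can be modelled in a few lines. This is a standalone sketch, not Legate code: `PendingExceptions` and `raise_and_capture` are hypothetical, and `std::runtime_error` stands in for `legate::TaskException`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the pending exception queue of a deferred scope.
class PendingExceptions {
 public:
  void push(std::runtime_error error) { queue_.push_back(std::move(error)); }

  // Throws the first queued exception and discards the others.
  // Does nothing when no exception is pending.
  void raise_pending()
  {
    if (queue_.empty()) {
      return;
    }
    std::runtime_error first = std::move(queue_.front());
    queue_.clear();
    throw first;
  }

  std::size_t size() const { return queue_.size(); }

 private:
  std::deque<std::runtime_error> queue_;
};

// Returns the message of the raised exception, or an empty string if
// nothing was pending.
std::string raise_and_capture(PendingExceptions& pending)
{
  try {
    pending.raise_pending();
  } catch (const std::runtime_error& error) {
    return error.what();
  }
  return {};
}
```

In this model, an exception queued after the first one is dropped when the queue is cleared, so it is never observed by the caller.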
 
 - 
std::uint32_t node_count() const#
- Returns the total number of nodes. - Returns:
- Total number of nodes 
 
 - 
std::uint32_t node_id() const#
- Returns the current rank. - Returns:
- Rank ID 
 
 
 