`nemo_eval.adapters.server`#

Main server that holds the entry point to all the interceptors.

The architecture is as follows:

     ┌───────────────────────┐
     │                       │
     │ NVIDIA Eval Factory   │
     │                       │
     └───▲──────┬────────────┘
         │      │
 returns │      │
         │      │ calls
         │      │
         │      │
     ┌───┼──────┼──────────────────────────────────────────────────┐
     │   │      ▼                                                  │
     │ AdapterServer (@ localhost:<free port>)                     │
     │                                                             │
     │   ▲      │       chain of RequestInterceptors:              │
     │   │      │       flask.Request                              │
     │   │      │       is passed on the way up                    │
     │   │      │                                                  │   ┌──────────────────────┐
     │   │ ┌────▼───────────────────────────────────────────────┐  │   │                      │
     │   │ │intcptr_1─────►intcptr_2───►...───►intcptr_N────────┼──┼───►                      │
     │   │ │                     │                              │  │   │                      │
     │   │ └─────────────────────┼──────────────────────────────┘  │   │                      │
     │   │                       │(e.g. for caching interceptors,  │   │  upstream endpoint   │
     │   │                       │ this "shortcut" will happen)    │   │   with actual model  │
     │   │                       │                                 │   │                      │
     │   │                       └─────────────┐                   │   │                      │
     │   │                                     │                   │   │                      │
     │ ┌─┼─────────────────────────────────────▼────┐              │   │                      │
     │ │intcptr'_M◄──intcptr'_2◄──...◄───intcptr'_1 ◄──────────────┼───┤                      │
     │ └────────────────────────────────────────────┘              │   └──────────────────────┘
     │                                                             │
     │              Chain of ResponseInterceptors:                 │
     │              requests.Response is passed on the way down    │
     │                                                             │
     │                                                             │
     └─────────────────────────────────────────────────────────────┘

In other words, interceptors are independent pieces of logic that should be relatively easy to add separately.
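The flow in the diagram can be illustrated with a minimal, self-contained sketch. It does not use the real `RequestInterceptor` or `ResponseInterceptor` classes: `SketchRequest`, `SketchResponse`, `run_chain`, and the step functions are hypothetical names introduced only for illustration. It also assumes that a short-circuited response still passes through every response step, which is how the diagram reads.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class SketchRequest:
    """Hypothetical stand-in for the flask.Request moving through the chain."""
    body: str


@dataclass
class SketchResponse:
    """Hypothetical stand-in for the requests.Response moving back."""
    body: str
    trace: List[str] = field(default_factory=list)


# A request step either returns a request (continue along the chain) or a
# response (take the "shortcut" and skip the upstream endpoint).
RequestStep = Callable[[SketchRequest], Union[SketchRequest, SketchResponse]]
ResponseStep = Callable[[SketchResponse], SketchResponse]


def run_chain(
    request: SketchRequest,
    request_steps: List[RequestStep],
    response_steps: List[ResponseStep],
    upstream: Callable[[SketchRequest], SketchResponse],
) -> SketchResponse:
    current = request
    response = None
    for step in request_steps:
        result = step(current)
        if isinstance(result, SketchResponse):
            response = result  # shortcut: remaining request steps are skipped
            break
        current = result
    if response is None:
        response = upstream(current)  # no shortcut: call the model endpoint
    for step in response_steps:
        response = step(response)
    return response


cache = {}           # request body -> cached response body
upstream_calls = []  # records which requests reached the fake endpoint


def cache_lookup(req: SketchRequest) -> Union[SketchRequest, SketchResponse]:
    if req.body in cache:
        return SketchResponse(body=cache[req.body], trace=["cache-hit"])
    return req


def tag_response(resp: SketchResponse) -> SketchResponse:
    resp.trace.append("tagged")
    return resp


def fake_upstream(req: SketchRequest) -> SketchResponse:
    upstream_calls.append(req.body)
    return SketchResponse(body=req.body.upper(), trace=["upstream"])


first = run_chain(SketchRequest("hello"), [cache_lookup], [tag_response], fake_upstream)
cache["hello"] = first.body  # a real caching interceptor would store this itself
second = run_chain(SketchRequest("hello"), [cache_lookup], [tag_response], fake_upstream)
```

In this sketch the second call never reaches `fake_upstream`, yet its response still goes through `tag_response`, mirroring the shortcut arrow in the diagram.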

Module Contents#

Classes#

_AdapterServer

Main server that serves on a local port and holds the chain of interceptors.

Functions#

create_server_process

Create and start a server process, returning the process and the config.

API#

nemo_eval.adapters.server.create_server_process( adapter_config: nemo_eval.api.AdapterConfig, ) → Tuple[multiprocessing.Process, nemo_eval.api.AdapterConfig][source]#

Create and start a server process, returning the process and the config.

This ensures that the factory function does not need any complex serialization for multiprocessing.

Parameters:

api_url – The API URL the adapter will call
adapter_config – Configuration for the adapter server

Returns:

Tuple of (process, adapter_config) where process is the running server process, and adapter_config is the configuration with port filled in.
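As a rough illustration of what "configuration with port filled in" could mean, the sketch below picks a free local port by binding to port 0 and returns a copy of a config object with that port set. `SketchConfig`, `find_free_port`, and `fill_in_port` are hypothetical names, not part of `nemo_eval`, and the fields of the real `AdapterConfig` may differ. Starting the server process itself is left out.

```python
import socket
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for AdapterConfig; the real fields may differ."""
    api_url: str
    host: str = "127.0.0.1"
    port: Optional[int] = None


def find_free_port(host: str = "127.0.0.1") -> int:
    # Binding to port 0 asks the operating system for any unused port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def fill_in_port(config: SketchConfig) -> SketchConfig:
    # Leave an explicitly configured port untouched.
    if config.port is not None:
        return config
    return replace(config, port=find_free_port(config.host))


# The URL below is a placeholder, not a real endpoint.
filled = fill_in_port(SketchConfig(api_url="http://example.invalid/v1"))
```

Keeping the config a plain dataclass of strings and integers means it pickles trivially, which fits the stated goal of avoiding complex serialization when handing it to a `multiprocessing.Process`. Note that the port is released before the server binds it, so another program could in principle claim it in between.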

class nemo_eval.adapters.server._AdapterServer(adapter_config: nemo_eval.api.AdapterConfig)[source]#

Main server that serves on a local port and holds the chain of interceptors.

Initialization

Initializes the app, creates the server, and adds the interceptors.

Parameters:

adapter_config – Should be obtained from the evaluation script; see api.py

adapter_host: str#: None

adapter_port: int#: None

request_interceptors: list[nemo_eval.adapters.interceptors.RequestInterceptor]#: None

response_interceptors: list[nemo_eval.adapters.interceptors.ResponseInterceptor]#: None

app: flask.Flask#: None

api_url: str#: None

_build_interceptor_chains( use_reasoning: bool, end_reasoning_token: str, custom_system_prompt: Optional[str], max_logged_requests: Optional[int], max_logged_responses: Optional[int], )[source]#

run() → None[source]#: Start the Flask server.

_EXCLUDED_HEADERS#: ['content-encoding', 'content-length', 'transfer-encoding', 'connection']

classmethod _process_response_headers( response: requests.Response, ) → list[tuple[str, str]][source]#: Process response headers, removing excluded ones.
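The header filtering described here might look like the following sketch. It takes a plain mapping instead of a `requests.Response` so that it stays self-contained. `filter_headers` is a hypothetical name, and the case-insensitive comparison is an assumption, since the source does not say how names are matched.

```python
from typing import List, Mapping, Tuple

# Same names as listed in _EXCLUDED_HEADERS above.
EXCLUDED_HEADERS = ["content-encoding", "content-length", "transfer-encoding", "connection"]


def filter_headers(headers: Mapping[str, str]) -> List[Tuple[str, str]]:
    # Assumption: header names are compared case-insensitively.
    return [
        (name, value)
        for name, value in headers.items()
        if name.lower() not in EXCLUDED_HEADERS
    ]


upstream_headers = {
    "Content-Type": "application/json",
    "Content-Length": "128",
    "Transfer-Encoding": "chunked",
    "X-Request-Id": "abc123",
}
forwarded = filter_headers(upstream_headers)
```

These headers describe how one particular connection framed and encoded the body, so a proxy that re-sends the body generally should not copy them verbatim.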

_handler(path: str) → flask.Response[source]#

_log_response_interceptor_error( interceptor: nemo_eval.adapters.interceptors.ResponseInterceptor, adapter_response: nemo_eval.adapters.interceptors.AdapterResponse, error: Exception, ) → None[source]#
