core.inference.text_generation_server.tokenization

Tokenization utilities.

Module Contents

Functions

tokenize_prompts

Tokenize prompts and make them available on all ranks.

_tokenize_prompts_and_batch

Given a set of prompts and a number of tokens to generate:

API

core.inference.text_generation_server.tokenization.tokenize_prompts(
tokenizer,
prompts=None,
tokens_to_generate=None,
add_BOS=None,
rank=0,
data_parallel=False,
)

Tokenize prompts and make them available on all ranks.

Parameters:

data_parallel (bool) – Broadcast tokens across a single data parallel model replica.
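
A minimal usage sketch (not from the source): it assumes a Megatron-style tokenizer object built elsewhere, and assumes the function returns a padded 2D token tensor together with a per-prompt length tensor, broadcast from `rank` to the other ranks. Neither return shape is guaranteed by this page.

    from core.inference.text_generation_server.tokenization import tokenize_prompts

    # Hypothetical: `tokenizer` is a Megatron-style tokenizer constructed elsewhere.
    tokens, lengths = tokenize_prompts(
        tokenizer,
        prompts=["Hello world", "The quick brown fox"],
        tokens_to_generate=32,
        add_BOS=False,
        rank=0,
    )
    # Assumed shapes (illustrative only):
    #   tokens:  [batch_size, max_prompt_len + tokens_to_generate] long tensor
    #   lengths: [batch_size] long tensor of unpadded prompt lengths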

core.inference.text_generation_server.tokenization._tokenize_prompts_and_batch(
tokenizer,
prompts,
tokens_to_generate,
add_BOS,
)

Given a set of prompts and a number of tokens to generate:

  • tokenize the prompts;

  • set the sequence length to the longest tokenized prompt length plus the number of tokens to generate;

  • pad all the sequences to this length so they can be stacked into a 2D tensor (see the sketch below).
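
A self-contained sketch of these three steps, not the actual implementation. The `tokenize` method and the `bos` and `eod` token-id attributes are assumptions modeled on common Megatron-style tokenizers, and padding with the EOD id is an illustrative choice.

    import torch

    def _tokenize_and_batch_sketch(tokenizer, prompts, tokens_to_generate, add_BOS):
        # 1. Tokenize each prompt, optionally prepending a BOS token.
        tokens = [tokenizer.tokenize(p) for p in prompts]
        if add_BOS:
            tokens = [[tokenizer.bos] + t for t in tokens]

        # 2. Sequence length = longest tokenized prompt + tokens to generate.
        lengths = [len(t) for t in tokens]
        seq_len = max(lengths) + tokens_to_generate

        # 3. Pad every sequence to seq_len (here with the assumed EOD id) so
        #    the batch stacks into a single 2D [batch, seq_len] tensor.
        padded = [t + [tokenizer.eod] * (seq_len - len(t)) for t in tokens]
        return (torch.tensor(padded, dtype=torch.long),
                torch.tensor(lengths, dtype=torch.long))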