core.inference.text_generation_server.tokenization

Tokenization utilities.

Module Contents

Functions

tokenize_prompts

Tokenize prompts and make them available on all ranks.

_tokenize_prompts_and_batch

Given a set of prompts and a number of tokens to generate:

API

core.inference.text_generation_server.tokenization.tokenize_prompts(
tokenizer,
prompts=None,
tokens_to_generate=None,
add_BOS=None,
rank=0,
data_parallel=False,
)

Tokenize prompts and make them available on all ranks.

Parameters:

data_parallel (bool) – Broadcast tokens across a single data parallel model replica.
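
A minimal usage sketch (not from the source): it assumes a Megatron-style tokenizer object built elsewhere, and assumes the function returns a padded 2D token tensor together with a per-prompt length tensor, broadcast from `rank` to the other ranks. Neither return shape is guaranteed by this page.

    from core.inference.text_generation_server.tokenization import tokenize_prompts

    # Hypothetical: `tokenizer` is a Megatron-style tokenizer constructed elsewhere.
    tokens, lengths = tokenize_prompts(
        tokenizer,
        prompts=["Hello world", "The quick brown fox"],
        tokens_to_generate=32,
        add_BOS=False,
        rank=0,
    )
    # Assumed shapes (illustrative only):
    #   tokens:  [batch_size, max_prompt_len + tokens_to_generate] long tensor
    #   lengths: [batch_size] long tensor of unpadded prompt lengths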

core.inference.text_generation_server.tokenization._tokenize_prompts_and_batch(
tokenizer,
prompts,
tokens_to_generate,
add_BOS,
)

Given a set of prompts and a number of tokens to generate:

  • tokenize the prompts;

  • set the sequence length to the longest tokenized prompt length plus the number of tokens to generate;

  • pad all the sequences to this length so they can be stacked into a 2D tensor (see the sketch below).
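
A self-contained sketch of these three steps, not the actual implementation. The `tokenize` method and the `bos` and `eod` token-id attributes are assumptions modeled on common Megatron-style tokenizers, and padding with the EOD id is an illustrative choice.

    import torch

    def _tokenize_and_batch_sketch(tokenizer, prompts, tokens_to_generate, add_BOS):
        # 1. Tokenize each prompt, optionally prepending a BOS token.
        tokens = [tokenizer.tokenize(p) for p in prompts]
        if add_BOS:
            tokens = [[tokenizer.bos] + t for t in tokens]

        # 2. Sequence length = longest tokenized prompt + tokens to generate.
        lengths = [len(t) for t in tokens]
        seq_len = max(lengths) + tokens_to_generate

        # 3. Pad every sequence to seq_len (here with the assumed EOD id) so
        #    the batch stacks into a single 2D [batch, seq_len] tensor.
        padded = [t + [tokenizer.eod] * (seq_len - len(t)) for t in tokens]
        return (torch.tensor(padded, dtype=torch.long),
                torch.tensor(lengths, dtype=torch.long))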