morpheus.utils.cudf_subword_helper


Functions

create_tokenizer(vocab_hash_file, do_lower_case)

Create a subword tokenizer from a vocabulary hash file.

create_vocab_table(vocabpath)

Create a vocabulary table from the vocab.txt file.

get_cached_tokenizer(vocab_hash_file, ...)

Get cached subword tokenizer.

tokenize_text_series(vocab_hash_file, ...)

Tokenizes a text series using the BERT subword tokenizer and a vocab hash file.

Classes

Feature(input_ids, input_mask, segment_ids)

Attributes

create_tokenizer(vocab_hash_file, do_lower_case)[source]

Create a subword tokenizer from a vocabulary hash file.

Parameters
vocab_hash_file : str

Path to hash file containing vocabulary of words with token-ids. This can be created from the raw vocabulary using the cudf.utils.hash_vocab_utils.hash_vocab function.

do_lower_case : bool

If set to true, original text will be lowercased before encoding.

Returns
cudf.core.subword_tokenizer.SubwordTokenizer

Subword tokenizer

create_vocab_table(vocabpath)[source]

Create a vocabulary table from the vocab.txt file.

Parameters
vocabpath : str

Path of the vocabulary file.

Returns
np.array

id2vocab: np.array, dtype=<U5
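
The helper's job can be sketched in a few lines: read the vocab.txt file (one token per line, token id equal to line number) into an array indexed by id. This is an illustrative sketch, not the Morpheus implementation; the function name `create_vocab_table_sketch` is hypothetical.

```python
import numpy as np


def create_vocab_table_sketch(vocabpath):
    """Illustrative sketch (not the Morpheus implementation): build an
    id-to-token lookup array from a vocab.txt file, one token per line."""
    with open(vocabpath, encoding="utf-8") as f:
        tokens = [line.strip() for line in f]
    # Token ids are the line numbers, so a plain array indexed by id suffices.
    return np.array(tokens)
```

With such a table, `id2vocab[token_id]` recovers the token string for any id produced by the tokenizer.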

get_cached_tokenizer(vocab_hash_file, do_lower_case)[source]

Get cached subword tokenizer. Creates tokenizer and caches it if it does not already exist.

Parameters
vocab_hash_file : str

Path to hash file containing vocabulary of words with token-ids. This can be created from the raw vocabulary using the cudf.utils.hash_vocab_utils.hash_vocab function.

do_lower_case : bool

If set to true, original text will be lowercased before encoding.

Returns
cudf.core.subword_tokenizer.SubwordTokenizer

Cached subword tokenizer
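
The caching behavior described above ("creates tokenizer and caches it if it does not already exist") follows a common memoized-factory pattern. A minimal sketch, assuming a memoization keyed on the `(vocab_hash_file, do_lower_case)` pair; `_build_tokenizer` is a hypothetical stand-in for the real cudf tokenizer construction, and the actual Morpheus cache may be implemented differently:

```python
import functools


def _build_tokenizer(vocab_hash_file, do_lower_case):
    # Hypothetical stand-in: the real helper constructs a
    # cudf.core.subword_tokenizer.SubwordTokenizer here.
    return ("tokenizer-for", vocab_hash_file, do_lower_case)


@functools.lru_cache(maxsize=None)
def get_cached_tokenizer_sketch(vocab_hash_file, do_lower_case):
    """Return the same tokenizer object for repeated (file, flag) pairs,
    building it only on the first call."""
    return _build_tokenizer(vocab_hash_file, do_lower_case)
```

Repeated calls with the same arguments return the identical cached object, so the (relatively expensive) tokenizer construction happens at most once per distinct configuration.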

tokenize_text_series(vocab_hash_file, do_lower_case, text_ser, seq_len, stride, truncation, add_special_tokens)[source]

Tokenizes a text series using the BERT subword tokenizer and a vocab hash file.

Parameters
vocab_hash_file : str

Path of the vocab hash file to use (created using perfect_hash.py with the compact flag).

do_lower_case : bool

If set to true, original text will be lowercased before encoding.

text_ser : cudf.Series

Text Series to tokenize

seq_len : int

Sequence length to use (two special tokens are added for the NER classification job).

stride : int

Stride for the tokenizer

truncation : bool

If set to true, strings will be truncated and padded to seq_len; each input string will result in exactly one output sequence. If set to false, an input may produce multiple output sequences when it generates more tokens than seq_len.

add_special_tokens : bool

Whether or not to encode the sequences with the special tokens of the BERT classification model.

Returns
collections.namedtuple

A named tuple with the fields input_ids, input_mask, and segment_ids.
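
The shape of the return value, and the interaction between truncation and stride described above, can be illustrated without a GPU. This is a sketch under assumed semantics (each additional window advances by `stride` tokens when truncation is disabled), not the actual cudf kernel; `num_output_sequences` is a hypothetical helper:

```python
import math
from collections import namedtuple

# The return type described above: three aligned arrays/tensors.
Feature = namedtuple("Feature", ["input_ids", "input_mask", "segment_ids"])


def num_output_sequences(num_tokens, seq_len, stride, truncation):
    """Illustrative sketch (assumed semantics, not the cudf kernel):
    with truncation=True every input yields exactly one sequence; with
    truncation=False, each extra window advances by `stride` tokens."""
    if truncation or num_tokens <= seq_len:
        return 1
    return 1 + math.ceil((num_tokens - seq_len) / stride)
```

For example, under these assumptions an input of 17 tokens with seq_len=16 yields a single (truncated) sequence when truncation is true, but two overlapping sequences when truncation is false.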

© Copyright 2023, NVIDIA. Last updated on Apr 11, 2023.