bridge.models.qwen3_asr.hf_qwen3_asr.processing_qwen3_asr#
Module Contents#
Classes#
Constructs a Qwen3ASR processor.
Functions#
Computes the output length of the convolutional layers and the output length of the audio encoder
Data#
API#
- class bridge.models.qwen3_asr.hf_qwen3_asr.processing_qwen3_asr.Qwen3ASRProcessorKwargs#
Bases: transformers.processing_utils.ProcessingKwargs
- _defaults#
None
- bridge.models.qwen3_asr.hf_qwen3_asr.processing_qwen3_asr._get_feat_extract_output_lengths(input_lengths)#
Computes the output length of the convolutional layers and the output length of the audio encoder
- class bridge.models.qwen3_asr.hf_qwen3_asr.processing_qwen3_asr.Qwen3ASRProcessor(
- feature_extractor=None,
- tokenizer=None,
- chat_template=None,
)#
Bases: transformers.processing_utils.ProcessorMixin

Constructs a Qwen3ASR processor. [Qwen3ASRProcessor] offers all the functionalities of [WhisperFeatureExtractor] and [Qwen2TokenizerFast]. See [~Qwen3ASRProcessor.__call__] and [~Qwen3ASRProcessor.decode] for more information.

- Parameters:
feature_extractor ([WhisperFeatureExtractor], optional) – The audio feature extractor.
tokenizer ([Qwen2TokenizerFast], optional) – The text tokenizer.
chat_template (Optional[str], optional) – The Jinja template to use for formatting the conversation. If not provided, the default chat template is used.
Initialization
- attributes#
['feature_extractor', 'tokenizer']
- feature_extractor_class#
'WhisperFeatureExtractor'
- tokenizer_class#
('Qwen2Tokenizer', 'Qwen2TokenizerFast')
- __call__(
- text: transformers.tokenization_utils_base.TextInput = None,
- audio: transformers.audio_utils.AudioInput = None,
- **kwargs,
)#
Main method to prepare one or several text sequence(s) and audio(s) for the model. This method forwards the text and kwargs arguments to Qwen2TokenizerFast's [~Qwen2TokenizerFast.__call__] if text is not None to encode the text. To prepare the audio(s), this method forwards the audio and kwargs arguments to WhisperFeatureExtractor's [~WhisperFeatureExtractor.__call__] if audio is not None. Please refer to the docstring of the above two methods for more information.

- Parameters:
text (str, List[str], List[List[str]]) – The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as a list of strings (pretokenized), you must set is_split_into_words=True (to lift the ambiguity with a batch of sequences).
audio (np.ndarray, List[np.ndarray]) – The audio or batch of audios to be prepared. Each audio can be a NumPy array.
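The conditional dispatch described above can be sketched standalone with toy stand-ins for the tokenizer and feature extractor (the real processor delegates to Qwen2TokenizerFast and WhisperFeatureExtractor; the framing parameters below are illustrative assumptions):

```python
import numpy as np

def toy_call(text=None, audio=None):
    # Mimics the dispatch in Qwen3ASRProcessor.__call__: encode text if
    # given, extract audio features if given, and merge the outputs.
    outputs = {}
    if text is not None:
        # Stand-in for Qwen2TokenizerFast.__call__ (toy whitespace ids).
        outputs["input_ids"] = [hash(tok) % 1000 for tok in text.split()]
    if audio is not None:
        # Stand-in for WhisperFeatureExtractor.__call__: toy framing with
        # 25 ms windows and a 10 ms hop at 16 kHz (assumed values).
        frame, hop = 400, 160
        n_frames = max(0, 1 + (len(audio) - frame) // hop)
        outputs["input_features"] = np.stack(
            [audio[i * hop : i * hop + frame] for i in range(n_frames)]
        )
    return outputs

out = toy_call(text="transcribe this", audio=np.zeros(16000, dtype=np.float32))
```

Passing only text or only audio yields a dict with just the corresponding keys, mirroring the if-not-None forwarding documented above.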
- replace_multimodal_special_tokens(text, audio_lengths)#
- get_chunked_index(
- token_indices: numpy.ndarray,
- tokens_per_chunk: int,
)#
Splits a token index list into chunks based on token value ranges.

Given a list of token indices, returns a list of (start, end) index tuples representing slices of the list where the token values fall within successive ranges of tokens_per_chunk.

For example, if tokens_per_chunk is 1000, the function will create chunks such that:
- the first chunk contains token values < 1000,
- the second chunk contains values >= 1000 and < 2000, and so on.

- Parameters:
token_indices (np.ndarray) – A monotonically increasing list of token index values.
tokens_per_chunk (int) – Number of tokens per chunk (used as the chunk size threshold).
- Returns:
A list of tuples, each representing the start (inclusive) and end (exclusive) indices of a chunk in token_indices.
- Return type:
list[tuple[int, int]]
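The chunking behavior documented above can be sketched as a standalone function (a sketch of the documented contract, not the library's exact implementation):

```python
import numpy as np

def get_chunked_index(token_indices, tokens_per_chunk):
    # Split a monotonically increasing index array into (start, end)
    # slices, where slice k covers values in
    # [k * tokens_per_chunk, (k + 1) * tokens_per_chunk).
    chunks = []
    start = 0
    bound = tokens_per_chunk
    for i, value in enumerate(token_indices):
        if value >= bound:
            chunks.append((start, i))  # close the current chunk
            start = i
            bound += tokens_per_chunk
    chunks.append((start, len(token_indices)))  # final (possibly short) chunk
    return chunks

indices = np.array([0, 500, 999, 1000, 1500, 2500])
# -> [(0, 3), (3, 5), (5, 6)]: values < 1000, then < 2000, then the rest.
```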
- apply_chat_template(conversations, chat_template=None, **kwargs)#
- property model_input_names#
- bridge.models.qwen3_asr.hf_qwen3_asr.processing_qwen3_asr.__all__#
['Qwen3ASRProcessor']