Converter
Added in version 2.24.
- class Converter(*args, **kwargs)
Implementations: CharsetConverter, ZlibCompressor, ZlibDecompressor
GConverter is an interface for streaming conversions.
GConverter is implemented by objects that convert
binary data in various ways. The conversion can be
stateful and may fail at any place.
Some example conversions are: character set conversion, compression, decompression and regular expression replace.
Methods
- class Converter
- convert(inbuf: list[int], outbuf: list[int], flags: ConverterFlags) tuple[ConverterResult, int, int]
This is the main operation used when converting data. It is meant to be called multiple times in a loop, and each call does some work: producing some output (in outbuf), consuming some input (from inbuf), or both. If it is not possible to do any work, an error is returned.
Note that a single call may not consume all input (or any input at all). A call may also produce output even if given no input, because of state stored in the converter.
If any data was either produced or consumed, and then an error happens, then only the successful conversion is reported and the error is returned on the next call.
A full conversion loop involves calling this method repeatedly, each time giving it new input and new output space. When there is no more input data after the data in inbuf, the flag INPUT_AT_END must be set. Unless an error happens, the loop returns CONVERTED each time until all data is consumed and all output is produced, and then FINISHED is returned instead. Note that FINISHED may be returned even if INPUT_AT_END is not set, for instance in a decompression converter where the end of the data is detectable from the data itself (and there might even be other data after the end of the compressed data).
When some data has successfully been converted, bytes_read is set to the number of bytes read from inbuf, and bytes_written is set to the number of bytes written to outbuf. If there is more data to output or consume (i.e. unless INPUT_AT_END is specified), CONVERTED is returned; if no more data is to be output, FINISHED is returned.
On error, ERROR is returned and error is set accordingly. Some errors need special handling:
NO_SPACE is returned if there is not enough space to write the resulting converted data. The application should call the function again with a larger outbuf to continue.
PARTIAL_INPUT is returned if there is not enough input to fully determine what the conversion should produce, and the INPUT_AT_END flag is not set. This happens, for example, with an incomplete multibyte sequence when converting text, or when a regexp matches up to the end of the input (and may match further input). It may also happen when inbuf_size is zero and there is no more data to produce.
When this happens the application should read more input and then call the function again. If further input shows that there is no more data, call the function again with the same data but with the INPUT_AT_END flag set. This may cause the conversion to finish, as in the regexp match case, or to fail again with PARTIAL_INPUT, as in a charset conversion where the input really is partial.
After convert() has returned FINISHED, the converter object is in an invalid state in which convert() must not be called anymore. At this point you can only free the object or call reset() to reset it to the initial state.
If the flag FLUSH is set, the conversion is modified to try to write out all internal state to the output. The application has to call the function multiple times with the flag set, and when the available input has been consumed and all internal state has been produced, FLUSHED (or FINISHED if really at the end) is returned instead of CONVERTED. This is somewhat similar to what happens at the end of the input stream, but done in the middle of the data.
This has different meanings for different conversions. For instance, in a compression converter it means flushing all the compression state into the output, so that uncompressing the data produced so far gives back all the input data. Doing this may make the final file larger due to padding, though. Another example is a regexp conversion where the flushed data ends in a match but a longer match is also possible. In the non-flushed case the converter would ask for more input, but when flushing it treats this as the end of input and performs the match.
Flushing is not always possible (for example, if a charset converter flushes at a partial multibyte sequence). Converters are supposed to try to produce as much output as possible and then return an error (typically PARTIAL_INPUT).
Added in version 2.24.
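The compression case of FLUSH described above can be illustrated with Python's standard zlib module. This is only an analogy under stated assumptions: it does not call Converter or ZlibCompressor at all, and Z_SYNC_FLUSH is used here as a stand-in for flushing in the middle of the data. The helper name compress_with_mid_flush is hypothetical.

```python
import zlib


def compress_with_mid_flush(first: bytes, second: bytes):
    """Compress two pieces, flushing all compressor state after the first.

    Analogy only: plain zlib, not the Converter API.
    """
    comp = zlib.compressobj()
    # Z_SYNC_FLUSH writes out all pending state, so `part` alone is enough
    # to recover everything that has been fed to the compressor so far.
    part = comp.compress(first) + comp.flush(zlib.Z_SYNC_FLUSH)
    recovered = zlib.decompressobj().decompress(part)
    # Continue with the rest of the data and finish the stream normally.
    full = part + comp.compress(second) + comp.flush()
    return recovered, full


data = b"streaming conversion " * 4
recovered, full = compress_with_mid_flush(data[:40], data[40:])
print(recovered == data[:40])
print(zlib.decompress(full) == data)
```

The flushed stream is typically a little larger than one compressed without the intermediate flush, which is the padding cost mentioned above.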
- Parameters:
inbuf – the buffer containing the data to convert.
outbuf – a buffer to write converted data in.
flags – a ConverterFlags controlling the conversion details.
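The conversion loop described for convert() can be sketched in plain Python. Everything in this block is a hypothetical stand-in, not the real binding: Result and Flags mimic ConverterResult and ConverterFlags, NoSpaceError mimics the NO_SPACE error, and ToyHexEncoder is a toy converter invented for illustration. The point is the shape of the driver: keep unconsumed input, retry with a larger outbuf on NO_SPACE, set INPUT_AT_END on the last chunk, and stop on FINISHED.

```python
import enum


class Result(enum.Enum):
    """Stand-in for ConverterResult (only the values used here)."""
    CONVERTED = 1
    FINISHED = 2


class Flags(enum.Flag):
    """Stand-in for ConverterFlags (only the values used here)."""
    NONE = 0
    INPUT_AT_END = 1


class NoSpaceError(Exception):
    """Stand-in for the NO_SPACE error."""


class ToyHexEncoder:
    """Hypothetical converter: writes two hex digits per input byte."""

    def convert(self, inbuf, outbuf, flags):
        n = min(len(inbuf), len(outbuf) // 2)
        if inbuf and n == 0:
            raise NoSpaceError("need at least 2 bytes of output space")
        # A real converter would report PARTIAL_INPUT for an empty inbuf
        # without INPUT_AT_END; the driver below never makes that call.
        outbuf[:2 * n] = inbuf[:n].hex().encode("ascii")
        if n == len(inbuf) and Flags.INPUT_AT_END in flags:
            return Result.FINISHED, n, 2 * n
        return Result.CONVERTED, n, 2 * n


def convert_all(converter, chunks, out_size=1):
    """Feed chunks to converter.convert() until it reports FINISHED."""
    chunks = list(chunks) or [b""]
    out = bytearray()
    pending = b""  # input handed to the converter but not yet consumed
    for index, chunk in enumerate(chunks):
        pending += chunk
        last = index == len(chunks) - 1
        flags = Flags.INPUT_AT_END if last else Flags.NONE
        while pending or last:
            outbuf = bytearray(out_size)
            try:
                result, n_read, n_written = converter.convert(
                    pending, outbuf, flags)
            except NoSpaceError:
                out_size *= 2  # retry the same input with a larger outbuf
                continue
            out += outbuf[:n_written]
            pending = pending[n_read:]
            if result is Result.FINISHED:
                return bytes(out)


print(convert_all(ToyHexEncoder(), [b"\x01\xab", b"\xff"]))
```

Starting with a deliberately tiny out_size forces the NO_SPACE path on the first call, so the retry logic is exercised even with this trivial converter.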
Virtual Methods
- class Converter
- do_convert(inbuf: list[int] | None, outbuf: list[int], flags: ConverterFlags) tuple[ConverterResult, int, int]
This is the main operation used when converting data. It is meant to be called multiple times in a loop, and each call does some work: producing some output (in outbuf), consuming some input (from inbuf), or both. If it is not possible to do any work, an error is returned.
Note that a single call may not consume all input (or any input at all). A call may also produce output even if given no input, because of state stored in the converter.
If any data was either produced or consumed, and then an error happens, then only the successful conversion is reported and the error is returned on the next call.
A full conversion loop involves calling this method repeatedly, each time giving it new input and new output space. When there is no more input data after the data in inbuf, the flag INPUT_AT_END must be set. Unless an error happens, the loop returns CONVERTED each time until all data is consumed and all output is produced, and then FINISHED is returned instead. Note that FINISHED may be returned even if INPUT_AT_END is not set, for instance in a decompression converter where the end of the data is detectable from the data itself (and there might even be other data after the end of the compressed data).
When some data has successfully been converted, bytes_read is set to the number of bytes read from inbuf, and bytes_written is set to the number of bytes written to outbuf. If there is more data to output or consume (i.e. unless INPUT_AT_END is specified), CONVERTED is returned; if no more data is to be output, FINISHED is returned.
On error, ERROR is returned and error is set accordingly. Some errors need special handling:
NO_SPACE is returned if there is not enough space to write the resulting converted data. The application should call the function again with a larger outbuf to continue.
PARTIAL_INPUT is returned if there is not enough input to fully determine what the conversion should produce, and the INPUT_AT_END flag is not set. This happens, for example, with an incomplete multibyte sequence when converting text, or when a regexp matches up to the end of the input (and may match further input). It may also happen when inbuf_size is zero and there is no more data to produce.
When this happens the application should read more input and then call the function again. If further input shows that there is no more data, call the function again with the same data but with the INPUT_AT_END flag set. This may cause the conversion to finish, as in the regexp match case, or to fail again with PARTIAL_INPUT, as in a charset conversion where the input really is partial.
After convert() has returned FINISHED, the converter object is in an invalid state in which convert() must not be called anymore. At this point you can only free the object or call reset() to reset it to the initial state.
If the flag FLUSH is set, the conversion is modified to try to write out all internal state to the output. The application has to call the function multiple times with the flag set, and when the available input has been consumed and all internal state has been produced, FLUSHED (or FINISHED if really at the end) is returned instead of CONVERTED. This is somewhat similar to what happens at the end of the input stream, but done in the middle of the data.
This has different meanings for different conversions. For instance, in a compression converter it means flushing all the compression state into the output, so that uncompressing the data produced so far gives back all the input data. Doing this may make the final file larger due to padding, though. Another example is a regexp conversion where the flushed data ends in a match but a longer match is also possible. In the non-flushed case the converter would ask for more input, but when flushing it treats this as the end of input and performs the match.
Flushing is not always possible (for example, if a charset converter flushes at a partial multibyte sequence). Converters are supposed to try to produce as much output as possible and then return an error (typically PARTIAL_INPUT).
Added in version 2.24.
- Parameters:
inbuf – the buffer containing the data to convert.
outbuf – a buffer to write converted data in.
flags – a ConverterFlags controlling the conversion details.
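From the implementer's side, the contract above might look like the following toy sketch. It is plain Python and does not subclass the real Converter interface: Result, Flags, ConversionError and ToyHexDecoder are hypothetical stand-ins, and error codes are modelled as strings purely for illustration. It shows the three rules an implementation has to follow: report NO_SPACE when outbuf cannot hold any result, report PARTIAL_INPUT when the input is incomplete, and report an error only on a call that made no progress.

```python
import enum


class Result(enum.Enum):
    """Stand-in for ConverterResult (only the values used here)."""
    CONVERTED = 1
    FINISHED = 2


class Flags(enum.Flag):
    """Stand-in for ConverterFlags (only the values used here)."""
    NONE = 0
    INPUT_AT_END = 1


class ConversionError(Exception):
    """Stand-in for the error reported alongside ERROR."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


class ToyHexDecoder:
    """Hypothetical converter: turns pairs of hex digits into bytes."""

    def do_convert(self, inbuf, outbuf, flags):
        at_end = Flags.INPUT_AT_END in flags
        pairs = len(inbuf) // 2           # complete digit pairs available
        n = min(pairs, len(outbuf))       # pairs that fit into outbuf
        if n == 0:
            if pairs > 0:
                raise ConversionError("NO_SPACE")
            if inbuf or not at_end:
                # A lone digit, or nothing to do while more input may follow.
                raise ConversionError("PARTIAL_INPUT")
            return Result.FINISHED, 0, 0
        # Invalid digits raise ValueError here; a real converter would
        # report a conversion error instead.
        outbuf[:n] = bytes.fromhex(inbuf[:2 * n].decode("ascii"))
        done = at_end and 2 * n == len(inbuf)
        return (Result.FINISHED if done else Result.CONVERTED), 2 * n, n


decoder = ToyHexDecoder()
out = bytearray(8)
# Two full pairs are converted; the trailing lone digit is left unread.
print(decoder.do_convert(b"41426", out, Flags.NONE), bytes(out[:2]))
```

Calling do_convert again with only the leftover digit raises PARTIAL_INPUT, which mirrors the rule that a successful partial conversion is reported first and the error is deferred to the next call.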