Transformations
Transformations are the basic building blocks of ETL processes. Each transformation reads rows from its input and sends transformed rows to its output.
You’re highly encouraged to use the rdc.etl.transform.Transform class as a base for your custom transforms, as it defines the whole I/O logic. All transformations provided by the package are subclasses of rdc.etl.transform.Transform.
- class rdc.etl.transform.Transform(transform=None, input_channels=None, output_channels=None)
Base class and decorator for transformations.
- transform(hash, channel=0)
Core transformation method that will be called for each input data row.
- INPUT_CHANNELS
List of input channel names.
- OUTPUT_CHANNELS
List of output channel names.
Example:
>>> @Transform
... def my_transform(hash, channel=STDIN):
...     yield hash.copy({'foo': hash['foo'].upper()})
>>> print list(my_transform(
...     H(('foo', 'bar'), ('bar', 'alpha')),
...     H(('foo', 'baz'), ('bar', 'omega')),
... ))
[H{'foo': 'BAR', 'bar': 'alpha'}, H{'foo': 'BAZ', 'bar': 'omega'}]
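To make the "base class and decorator" idea more concrete, here is a minimal standalone sketch in Python 3. It does not import rdc.etl and is not the library's implementation: MiniTransform, UpperFoo, upper_foo and the use of plain dicts as rows are all hypothetical stand-ins, chosen only to mirror the shape of the signatures documented above.

```python
# Standalone sketch. MiniTransform is a hypothetical stand-in, NOT rdc.etl's Transform.
STDIN = 0  # assumed default input channel id, mirroring channel=0 in transform()


class MiniTransform:
    """Toy transformation usable either as a base class or as a decorator."""

    # Assumed channel declarations; the real class documents lists of channel names.
    INPUT_CHANNELS = [STDIN]
    OUTPUT_CHANNELS = [0]

    def __init__(self, transform=None):
        # Decorator usage: the decorated generator function replaces transform().
        if transform is not None:
            self.transform = transform

    def transform(self, row, channel=STDIN):
        raise NotImplementedError("override transform() or pass a callable")

    def __call__(self, *rows, channel=STDIN):
        # Call transform() once per input row and relay every row it yields.
        for row in rows:
            yield from self.transform(row, channel)


class UpperFoo(MiniTransform):
    """Subclass usage: override transform() directly."""

    def transform(self, row, channel=STDIN):
        # Build a new dict so the input row is left untouched.
        yield {**row, "foo": row["foo"].upper()}


@MiniTransform
def upper_foo(row, channel=STDIN):
    """Decorator usage: same logic, written as a plain generator function."""
    yield {**row, "foo": row["foo"].upper()}


rows = [
    {"foo": "bar", "bar": "alpha"},
    {"foo": "baz", "bar": "omega"},
]
from_subclass = list(UpperFoo()(*rows))
from_decorator = list(upper_foo(*rows))
```

In this sketch both forms produce the same output rows. Subclassing is the more natural fit when a transformation needs configuration or state, while the decorator form keeps one-off row functions short.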
Builtin transformations reference
Design notes