Transformations
Transformations are the basic building blocks of ETL processes. Each transformation reads rows from its input and sends transformed rows to its output.
You’re highly encouraged to use the rdc.etl.transform.Transform class as a base for your custom transformations, as it defines the whole I/O logic. All transformations provided by the package are subclasses of rdc.etl.transform.Transform.
class rdc.etl.transform.Transform(transform=None, input_channels=None, output_channels=None)
    Base class and decorator for transformations.

    transform(hash, channel=0)
        Core transformation method that will be called for each input data row.

    INPUT_CHANNELS
        List of input channel names.

    OUTPUT_CHANNELS
        List of output channel names.

Example:
    >>> @Transform
    ... def my_transform(hash, channel=STDIN):
    ...     yield hash.copy({'foo': hash['foo'].upper()})
    >>> print list(my_transform(
    ...     H(('foo', 'bar'), ('bar', 'alpha')),
    ...     H(('foo', 'baz'), ('bar', 'omega')),
    ... ))
    [H{'foo': 'BAR', 'bar': 'alpha'}, H{'foo': 'BAZ', 'bar': 'omega'}]
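
The class is described as both a base class and a decorator. As a rough illustration of how a single class can play both roles, here is a self-contained sketch that does not import rdc.etl. The name `SketchTransform`, its `__call__` behaviour, and the use of plain dicts in place of the library's hash objects are all assumptions made for this example; it is not the library's implementation and it ignores channels entirely.

```python
class SketchTransform(object):
    """Hypothetical stand-in for illustration only, not rdc.etl's Transform."""

    def __init__(self, transform=None):
        # Decorator usage: the decorated generator function is stored on the
        # instance and shadows the default transform method below.
        if transform is not None:
            self.transform = transform

    def transform(self, row, channel=0):
        # Base-class usage: subclasses override this and yield output rows.
        raise NotImplementedError

    def __call__(self, *rows):
        # Pass every input row through transform and stream what it yields.
        for row in rows:
            for out in self.transform(row):
                yield out


# Decorator style, mirroring the shape of the example above.
@SketchTransform
def upper_foo(row, channel=0):
    out = dict(row)  # copy so the input row is left untouched
    out['foo'] = row['foo'].upper()
    yield out


# Subclass style, overriding the core transform method.
class UpperFoo(SketchTransform):
    def transform(self, row, channel=0):
        out = dict(row)
        out['foo'] = row['foo'].upper()
        yield out


rows = [{'foo': 'bar', 'bar': 'alpha'}, {'foo': 'baz', 'bar': 'omega'}]
decorated_result = list(upper_foo(*rows))
subclass_result = list(UpperFoo()(*rows))
```

In this sketch both styles produce the same rows. The decorator form is shorter for a one-off function, while a subclass is presumably the better fit when a transformation needs its own configuration or state.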
Builtin transformations reference
Design notes