Transformations

Transformations are the basic building blocks of ETL processes. Each transformation reads rows from its input and sends transformed rows to its output.
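Without any library, the idea can be sketched with plain Python generators: a transformation consumes rows and yields zero, one, or several rows per input row, and transformations chain by feeding one's output into the next. This is a minimal sketch only. Rows are modelled as plain dicts for illustration (the package's own examples use an `H` hash type), and the names `upper_foo` and `drop_omega` are hypothetical.

```python
# Library-free sketch: each "transformation" is a generator over rows (dicts here).

def upper_foo(rows):
    """Yield a copy of each row with its 'foo' value upper-cased."""
    for row in rows:
        out = dict(row)  # copy, so the input row is left untouched
        out['foo'] = out['foo'].upper()
        yield out


def drop_omega(rows):
    """Yield only the rows whose 'bar' value is not 'omega' (zero output rows is allowed)."""
    for row in rows:
        if row['bar'] != 'omega':
            yield row


source = [
    {'foo': 'bar', 'bar': 'alpha'},
    {'foo': 'baz', 'bar': 'omega'},
]

# Chaining: the output of one transformation is the input of the next.
result = list(drop_omega(upper_foo(source)))
print(result)  # [{'foo': 'BAR', 'bar': 'alpha'}]
```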

You are strongly encouraged to use the rdc.etl.transform.Transform class as the base for your custom transformations, as it implements all of the input/output logic. Every transformation provided by the package is a subclass of rdc.etl.transform.Transform.
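The benefit of inheriting the I/O logic can be illustrated with a small, self-contained sketch. `BaseTransformSketch` is a hypothetical stand-in, not the real rdc.etl.transform.Transform: the convention of calling an instance with rows and getting a list back is an assumption made for illustration. The point it shows is that subclasses only write the per-row `transform()` method, while the loop over input rows lives in the base class.

```python
class BaseTransformSketch:
    """Hypothetical stand-in for a base class that owns the I/O loop."""

    def transform(self, row, channel=0):
        # Default behaviour: pass the row through unchanged.
        yield row

    def __call__(self, *rows, channel=0):
        # Feed each input row to transform() and collect everything it yields.
        output = []
        for row in rows:
            output.extend(self.transform(row, channel))
        return output


class UpperFoo(BaseTransformSketch):
    # Only the per-row logic is written here; the I/O loop is inherited.
    def transform(self, row, channel=0):
        yield dict(row, foo=row['foo'].upper())


class Duplicate(BaseTransformSketch):
    # One input row may produce several output rows (or none at all).
    def transform(self, row, channel=0):
        yield row
        yield dict(row)


result = UpperFoo()({'foo': 'bar', 'bar': 'alpha'}, {'foo': 'baz', 'bar': 'omega'})
print(result)
# [{'foo': 'BAR', 'bar': 'alpha'}, {'foo': 'BAZ', 'bar': 'omega'}]
```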

class rdc.etl.transform.Transform(transform=None, input_channels=None, output_channels=None)

Base class and decorator for transformations.

transform(hash, channel=0): Core transformation method that will be called for each input data row.

INPUT_CHANNELS: List of input channel names.

OUTPUT_CHANNELS: List of output channel names.

Example:

>>> @Transform
... def my_transform(hash, channel=STDIN):
...     yield hash.copy({'foo': hash['foo'].upper()})

>>> print list(my_transform(
...         H(('foo', 'bar'), ('bar', 'alpha')),
...         H(('foo', 'baz'), ('bar', 'omega')),
...     ))
[H{'foo': 'BAR', 'bar': 'alpha'}, H{'foo': 'BAZ', 'bar': 'omega'}]
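The example above works because a class whose constructor takes the transform function as its first positional argument can double as a decorator: `@Transform` over a function is equivalent to `my_transform = Transform(my_transform)`. The sketch below shows that mechanism under stated assumptions. `TransformSketch`, the values of `STDIN`/`STDOUT`, and the rule that explicit channel lists override the class-level `INPUT_CHANNELS`/`OUTPUT_CHANNELS` are all hypothetical illustrations; the real rdc.etl internals may differ.

```python
# Hypothetical stand-ins: the real rdc.etl channel constants and internals may differ.
STDIN = 0   # assumed default input channel, matching transform(hash, channel=0)
STDOUT = 0  # assumed default output channel


class TransformSketch:
    """Usable both as a base class and as a decorator (illustrative only)."""

    INPUT_CHANNELS = [STDIN]
    OUTPUT_CHANNELS = [STDOUT]

    def __init__(self, transform=None, input_channels=None, output_channels=None):
        # Used as @TransformSketch, the decorated function arrives here as the
        # first positional argument and replaces the default transform().
        # Stored on the instance, it is called unbound: transform(row, channel).
        if transform is not None:
            self.transform = transform
        # Per-instance channel lists fall back to the class-level defaults.
        self.input_channels = list(
            self.INPUT_CHANNELS if input_channels is None else input_channels)
        self.output_channels = list(
            self.OUTPUT_CHANNELS if output_channels is None else output_channels)

    def transform(self, row, channel=STDIN):
        # Default behaviour for subclasses that do not override it: pass through.
        yield row

    def __call__(self, *rows, channel=STDIN):
        # Reject rows sent to a channel this transformation does not declare.
        if channel not in self.input_channels:
            raise ValueError('unknown input channel: %r' % (channel,))
        return [out for row in rows for out in self.transform(row, channel)]


@TransformSketch
def upper_foo(row, channel=STDIN):
    yield dict(row, foo=row['foo'].upper())


print(upper_foo({'foo': 'bar', 'bar': 'alpha'}))
# [{'foo': 'BAR', 'bar': 'alpha'}]
```

Accepting the function as the first positional argument means no separate decorator factory is needed, and the same class still works as an ordinary base class when no function is passed.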

Built-in transformations reference

Extracts
Loads
- DatabaseLoad
Maps
Filters
Joins
Utilities
- Log
- Stop
- Override
- Clean
- SimpleTransform
Flow-related

Design notes

Input / output design