Kickstart

To get started, you should also read the practical examples in the Cookbook.

Create an empty project

To bootstrap an ETL project on your computer, you can use the provided PasteScript template.

pip install PasteScript
paster create -t etl_project MyProject

Overview of concepts

Extract

Extract is a flexible base class for writing extract transformations. We use a generator here; real-life extracts would usually read from databases, web services, or files.

from rdc.etl.transform.extract import Extract

@Extract
def my_extract():
    yield {'foo': 'bar', 'bar': 'min'}
    yield {'foo': 'boo', 'bar': 'put'}

For more information, see the extracts reference.
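
The generator above yields hard-coded rows. As a rough sketch of what a file-backed source might look like, the plain-Python generator below reads rows from CSV text using only the standard library. It does not use rdc.etl at all: `extract_rows` and `SAMPLE_CSV` are illustrative names, and whether such a generator can be wrapped with `@Extract` in the same way as `my_extract` is an assumption to check against the extracts reference.

```python
import csv
import io

# Hypothetical sample data standing in for a real file on disk.
SAMPLE_CSV = "foo,bar\nbar,min\nboo,put\n"


def extract_rows(stream):
    """Yield one plain dict per CSV row, keyed by the header line."""
    for row in csv.DictReader(stream):
        yield dict(row)


# io.StringIO lets the sketch run without touching the filesystem.
rows = list(extract_rows(io.StringIO(SAMPLE_CSV)))
```

With this sample input, the generator produces the same two rows as the hard-coded example.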

Transform

Transform is a flexible base class for all kinds of transformations.

from rdc.etl.transform import Transform

@Transform
def my_transform(hash, channel):
    yield hash.update({
        'foo': hash['foo'].upper()
    })

For more information, see the transformations reference.
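
To show the row-level logic on its own, here is a minimal plain-Python sketch of the same upper-casing step, written without rdc.etl. The name `upper_foo` is illustrative. Note one difference from the example above: on a plain `dict`, `update()` returns `None`, so the sketch builds a new dict instead. The hash object passed to a real transformation may behave differently, which is an assumption to verify in the transformations reference.

```python
def upper_foo(rows):
    """Yield a copy of each row with the 'foo' value upper-cased."""
    for row in rows:
        # Build a new dict rather than yielding dict.update(), which
        # returns None for plain dicts.
        yield dict(row, foo=row['foo'].upper())


result = list(upper_foo([{'foo': 'bar', 'bar': 'min'}]))
```

Because a new dict is built for every row, the input rows are left unmodified.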

Load

We’ll use the screen as our load target ...

from rdc.etl.transform.util import Log

my_load = Log()

For more information, see the loads reference.

Note

Log is not a “load” transformation stricto sensu (it acts as an identity transformation, sending whatever arrives on its default input channel to the default output channel), but we’ll use it as such for demonstration purposes.
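
The identity behaviour described in this note can be sketched in plain Python as a generator that reports each row and then passes it along unchanged. This is only an illustration of the idea, not how Log is implemented; `log_rows` and its `emit` parameter are hypothetical names.

```python
def log_rows(rows, emit=print):
    """Report each row through emit, then yield it unchanged."""
    for row in rows:
        emit(row)
        yield row


# Collect the reported rows in a list instead of printing them.
seen = []
passed = list(log_rows([{'foo': 'BAR', 'bar': 'min'}], emit=seen.append))
```

Here `seen` and `passed` end up with the same content, which is what makes the step an identity transformation.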

Run

Let’s create a Job. It will be used to:

Connect transformations
Manage threads
Monitor execution

from rdc.etl.job import Job

job = Job()
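
This page does not describe how a Job manages its threads. As a generic illustration of the pattern only, the sketch below runs two stages in separate threads connected by standard-library queues. Everything in it (`run_stage`, the `_DONE` sentinel, the queue layout) is an assumption made for the example and says nothing about rdc.etl internals.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(func(item))


first, second, third = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    threading.Thread(target=run_stage, args=(str.upper, first, second)),
    threading.Thread(target=run_stage, args=(lambda s: s + '!', second, third)),
]
for worker in workers:
    worker.start()

# Feed the first stage, then signal that no more input will come.
for value in ('bar', 'boo'):
    first.put(value)
first.put(_DONE)

for worker in workers:
    worker.join()

# Drain the final queue up to the sentinel.
results = []
while True:
    item = third.get()
    if item is _DONE:
        break
    results.append(item)
```

With one worker per stage and FIFO queues, items leave the pipeline in the order they entered it.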

The Job has an add_chain() method that can be used to easily plug an ordered list of transformations together.

job.add_chain(my_extract, my_transform, my_load)

Our job is ready; you can now run it.

job()

For more information, see the jobs documentation.
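
To make the idea of chaining ordered transformations concrete, the following plain-Python sketch wires generator stages together in sequence. It is single-threaded and does not use rdc.etl; `chain`, `extract`, and `upper_foo` are illustrative names, and the real add_chain() may work quite differently.

```python
def extract():
    """Yield the two sample rows used throughout this page."""
    yield {'foo': 'bar', 'bar': 'min'}
    yield {'foo': 'boo', 'bar': 'put'}


def upper_foo(rows):
    """Yield a copy of each row with the 'foo' value upper-cased."""
    for row in rows:
        yield dict(row, foo=row['foo'].upper())


def chain(source, *stages):
    """Feed the rows produced by source through each stage, in order."""
    stream = source()
    for stage in stages:
        stream = stage(stream)
    return stream


output = list(chain(extract, upper_foo))
```

Since every stage is a generator, rows flow through the chain one at a time and nothing runs until the result is consumed.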