Kickstart
To get started, also read the practical examples in the Cookbook.
Create an empty project
To bootstrap an ETL project on your computer, use the provided PasteScript template:
pip install PasteScript
paster create -t etl_project MyProject
Overview of concepts
Extract
Extract is a flexible base class for writing extract transformations. Here we use a generator; real-life jobs would usually read from databases, web services, files, and so on.
from rdc.etl.transform.extract import Extract

@Extract
def my_extract():
    yield {'foo': 'bar', 'bar': 'min'}
    yield {'foo': 'boo', 'bar': 'put'}
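In real jobs the extract step usually reads from an external source. As a minimal sketch of the “files” case, the plain generator below reads rows from CSV text with the standard csv module. It is deliberately not decorated with @Extract so that it runs without rdc.etl installed; the name csv_extract and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE_CSV = "foo,bar\nbar,min\nboo,put\n"


def csv_extract(stream):
    """Yield one dict per CSV row, keyed by the column names in the header line."""
    for row in csv.DictReader(stream):
        yield dict(row)  # normalise to a plain dict, whatever mapping DictReader returns


rows = list(csv_extract(io.StringIO(SAMPLE_CSV)))
print(rows)  # → [{'foo': 'bar', 'bar': 'min'}, {'foo': 'boo', 'bar': 'put'}]
```

With a real file you would pass open(path, newline='') instead of the StringIO object, as the csv module documentation recommends.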
Transform
Transform is a flexible base class for all kinds of transformations.
from rdc.etl.transform import Transform

@Transform
def my_transform(hash, channel):
    # Replace 'foo' with its upper-cased value, then yield the updated row.
    hash.update({
        'foo': hash['foo'].upper(),
    })
    yield hash
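The transformation logic itself can be tried in plain Python. The sketch below assumes rows are ordinary dicts rather than the hash objects rdc.etl passes in, and upper_foo is a hypothetical name. One pitfall with plain dicts: dict.update() returns None, so yield the row after updating it rather than yielding the result of update().

```python
def upper_foo(rows):
    """Upper-case the 'foo' field of each row; other fields pass through unchanged."""
    for row in rows:
        row = dict(row)  # work on a copy so the caller's dicts are not modified
        row.update({'foo': row['foo'].upper()})  # update() itself returns None
        yield row


source = [{'foo': 'bar', 'bar': 'min'}, {'foo': 'boo', 'bar': 'put'}]
result = list(upper_foo(source))
print(result)  # → [{'foo': 'BAR', 'bar': 'min'}, {'foo': 'BOO', 'bar': 'put'}]
```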
Load
We’ll use the screen as our load target:
from rdc.etl.transform.util import Log
my_load = Log()
For more information, see the loads reference.
Note
Strictly speaking, Log is not a “load” transformation: it acts as an identity transformation, sending whatever arrives on its default input channel to its default output channel. We use it as one here for demonstration purposes.
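An identity transformation passes every row through unchanged, so anything chained after it still receives the data. The hypothetical log_rows generator below sketches that idea in plain Python; it is not rdc.etl's Log. Its write parameter (also hypothetical) defaults to print and can be swapped for another callable, for example to capture output in a test.

```python
def log_rows(rows, write=print):
    """Identity stage: report each row with write(), then yield it unchanged."""
    for row in rows:
        write(row)
        yield row


captured = []
passed_on = list(log_rows([{'foo': 'BAR'}, {'foo': 'BOO'}], write=captured.append))
print(passed_on == captured)  # → True
```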
Run
Let’s create a Job. It will be used to:
- Connect transformations
- Manage threads
- Monitor execution
from rdc.etl.job import Job
job = Job()
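To give an idea of what managing threads can involve, here is a minimal sketch that runs each stage in its own thread and connects consecutive stages with queue.Queue objects, using a sentinel object to mark the end of the stream. It is an illustration under assumptions, not a description of how rdc.etl's Job is implemented: run_threaded and the one-row-in, rows-out stage signature are hypothetical, input and output channels are not modelled, and error handling is omitted (an exception inside a stage would leave the caller waiting forever).

```python
import queue
import threading

_END = object()  # sentinel marking the end of a stream


def run_threaded(source, *stages):
    """Run each stage in its own thread, linked by FIFO queues; return the final rows."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def feed():
        # Push every source row into the first queue, then signal the end.
        for row in source:
            queues[0].put(row)
        queues[0].put(_END)

    def work(stage, inbox, outbox):
        # Apply the stage to each incoming row until the sentinel arrives.
        while True:
            row = inbox.get()
            if row is _END:
                outbox.put(_END)  # propagate end-of-stream downstream
                return
            for out in stage(row):
                outbox.put(out)

    threads = [threading.Thread(target=feed)]
    for i, stage in enumerate(stages):
        threads.append(threading.Thread(target=work, args=(stage, queues[i], queues[i + 1])))
    for t in threads:
        t.start()

    # Collect the output of the last stage on the calling thread.
    results = []
    while True:
        row = queues[-1].get()
        if row is _END:
            break
        results.append(row)
    for t in threads:
        t.join()
    return results


def upper(row):
    # A stage takes one row and yields zero or more output rows.
    yield dict(row, foo=row['foo'].upper())


rows = run_threaded([{'foo': 'bar'}, {'foo': 'boo'}], upper)
print(rows)  # → [{'foo': 'BAR'}, {'foo': 'BOO'}]
```

Because every queue is first-in, first-out and each stage has a single worker thread, rows come out in the order they went in.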
The Job has an add_chain() method that plugs an ordered list of transformations together.
job.add_chain(my_extract, my_transform, my_load)
Our job is ready, so let’s run it:
job()
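Chaining and running can also be mimicked with plain generators: each stage takes an iterable of rows and returns a new one, and nothing executes until the end of the chain is consumed. The helper chain_stages and the three stage functions are hypothetical stand-ins for add_chain() and the transformations above, and everything runs sequentially in the calling thread.

```python
def extract():
    """Source stage: produce the two sample rows."""
    yield {'foo': 'bar', 'bar': 'min'}
    yield {'foo': 'boo', 'bar': 'put'}


def upper_foo(rows):
    """Transform stage: upper-case the 'foo' field."""
    for row in rows:
        yield dict(row, foo=row['foo'].upper())


def show(rows):
    """Identity stage: print each row and pass it on."""
    for row in rows:
        print(row)
        yield row


def chain_stages(source, *stages):
    """Wire source -> stages[0] -> stages[1] ... and return the last iterable."""
    rows = source()
    for stage in stages:
        rows = stage(rows)
    return rows


pipeline = chain_stages(extract, upper_foo, show)  # nothing has run yet
result = list(pipeline)  # consuming the chain pulls rows through every stage
```

Here list(pipeline) plays roughly the role that job() plays above: it is the call that actually drives data through the chain.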