Maps

Maps are transforms that yield rows based on the value of a single input field. Combined with FileExtract, for example, a map can parse the content of a file and yield rows that carry the information extracted from it.

By default, maps read their input from the topic (_) field.
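To make the "one input field" idea concrete, here is a minimal plain-Python sketch that does not use rdc.etl at all. The names `apply_map` and `split_pairs` are hypothetical, introduced only for illustration. The sketch only shows how one field is selected and handed to the map logic; how the real transform combines the yielded data with the rest of the input row is not covered here.

```python
def apply_map(row, func, field='_'):
    """Hypothetical stand-in for a map: pass one field of `row` to `func`.

    `field` defaults to the topic field '_', mirroring the documented default.
    """
    for out in func(row[field]):
        yield out


def split_pairs(text):
    """Example map logic: turn 'key=value' lines into one dict per line."""
    for line in text.split('\n'):
        key, value = line.split('=', 1)
        yield {'key': key, 'value': value}


# Default input field: the topic field '_'.
default_rows = list(apply_map({'_': 'a=1\nb=2'}, split_pairs))

# Explicit input field: read from 'body' instead.
named_rows = list(apply_map({'body': 'x=9'}, split_pairs, field='body'))
```

One input row can produce several output rows, which is the main difference between a map and a one-to-one transform.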

Map (base class and decorator)

class rdc.etl.transform.map.Map(map=None, field=None)

Base class for mappers.

map

The callable holding the map logic. It receives the value of the hash's input field and yields the output data, one item at a time.

field

The input field.

Example:

>>> from rdc.etl.transform.map import Map
>>> from rdc.etl.transform.util import clean

>>> @Map
... def my_map(s_in):
...     for line in s_in.split('\n'):
...         yield {'f%d' % i: v for i, v in enumerate(line.split(':'))}

>>> map(clean, my_map({'_': 'a:b:c\nb:c:d\nc:d:e'}))
[H{'f0': 'a', 'f1': 'b', 'f2': 'c'}, H{'f0': 'b', 'f1': 'c', 'f2': 'd'}, H{'f0': 'c', 'f1': 'd', 'f2': 'e'}]

CsvMap

class rdc.etl.transform.map.csv.CsvMap(field=None, delimiter=None, quotechar=None, headers=None, skip=None)

Reads CSV content and yields its values, line by line.

delimiter

The CSV delimiter.

quotechar

The CSV quote character.

headers

The list of column names, for use when the CSV does not provide them in its first line.

skip

The number of lines to skip before output is actually yielded.
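The four options above can be illustrated with a small sketch built on Python's standard csv module. This is not the CsvMap implementation: the function name `csv_rows`, its default values, and the choice to drop skipped lines before the header line is read are all assumptions made for the example.

```python
import csv
import io


def csv_rows(content, delimiter=',', quotechar='"', headers=None, skip=0):
    """Yield one dict per CSV record (illustrative stand-in, not CsvMap)."""
    reader = csv.reader(io.StringIO(content),
                        delimiter=delimiter, quotechar=quotechar)

    # Assumption: skipped lines are dropped before anything is interpreted.
    for _ in range(skip):
        next(reader, None)

    # Without explicit headers, take the column names from the next line.
    if headers is None:
        headers = next(reader, None)
        if headers is None:
            return  # nothing left to read

    for record in reader:
        yield dict(zip(headers, record))


content = '# exported file\nid;name\n1;Ada\n2;"Lovelace; A."\n'

# One junk line to skip, column names taken from the file itself.
from_file_headers = list(csv_rows(content, delimiter=';', skip=1))

# Two lines to skip (junk + header line), column names supplied by the caller.
from_given_headers = list(csv_rows(content, delimiter=';', skip=2,
                                   headers=['ident', 'label']))
```

The quoted value in the last record shows why quotechar matters: the delimiter inside the quotes is kept as part of the value rather than splitting the field.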

XmlMap

class rdc.etl.transform.map.xml.XmlMap(map_item=None, xpath=None, field=None)

Reads an XML document and yields values for each child of the root element.

Warning

This does not work; do not use it (or fix it first :p).

Definitions:

XML Item: In the context of an XmlMap, an XML Item is either a child of the XML root, if no XPath has been provided, or one item returned by the XPath provided.

map_item

Will be called for each input XML Item, and should return a dictionary of values for this item.

field

The input field (defined in parent).

xpath

XPath used to select items before running them through map_item().
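Since XmlMap itself is flagged as not working, the following sketch shows the intended behaviour with the standard library only. It is an illustration under assumptions, not the class's code: `xml_items` and `book_to_row` are hypothetical names, and xml.etree.ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET


def xml_items(content, map_item, xpath=None):
    """Yield map_item(item) for each XML Item (stand-in, not XmlMap)."""
    root = ET.fromstring(content)
    # No xpath: the items are the direct children of the root element.
    # With xpath: the items are whatever the expression selects.
    items = list(root) if xpath is None else root.findall(xpath)
    for item in items:
        yield map_item(item)


def book_to_row(item):
    """Example map_item: return a dictionary of values for one item."""
    return {'id': item.get('id'), 'title': item.findtext('title')}


doc = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<shelf><book id='2'><title>Emma</title></book></shelf>"
    "</library>"
)

# Root children only: the first <book> and the <shelf> wrapper.
root_children = list(xml_items(doc, book_to_row))

# XPath selection: every <book>, at any depth, in document order.
all_books = list(xml_items(doc, book_to_row, xpath='.//book'))
```

Without an xpath the nested book is not reached, because only the root's direct children count as items; the xpath variant selects both books.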