Maps

Maps are transforms that yield rows derived from the value of a single input field. Used together with FileExtract, for example, a map can parse the file content and yield rows enriched with the parsed values.

By default, maps use the topic (_) field as input.

Map (base class and decorator)

class rdc.etl.transform.map.Map(map=None, field=None)

Base class for mappers.

map: Map logic callable. Takes the value of the hash's input field and yields output data.

field: The input field.

Example:

>>> from rdc.etl.transform.map import Map
>>> from rdc.etl.transform.util import clean

>>> @Map
... def my_map(s_in):
...     for line in s_in.split('\n'):
...         yield {'f%d' % i: v for i, v in enumerate(line.split(':'))}

>>> map(clean, my_map({'_': 'a:b:c\nb:c:d\nc:d:e'}))
[H{'f0': 'a', 'f1': 'b', 'f2': 'c'}, H{'f0': 'b', 'f1': 'c', 'f2': 'd'}, H{'f0': 'c', 'f1': 'd', 'f2': 'e'}]
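
To make the data flow explicit, here is a minimal standalone sketch of what a map does with a row. It is not the rdc.etl implementation: apply_map and split_lines are hypothetical names, plain dicts stand in for hashes, and only the mapped values are yielded, since this page does not say whether the real Map copies the other input fields into each output row.

```python
def apply_map(func, row, field='_'):
    """Hypothetical stand-in for a map: feed one field of `row` to `func`."""
    # Read the input field; '_' mirrors the default topic field.
    value = row[field]
    # `func` may yield any number of output rows for a single input row.
    for out in func(value):
        yield dict(out)


def split_lines(text):
    # One output row per line, one field per colon-separated value.
    for line in text.split('\n'):
        yield {'f%d' % i: v for i, v in enumerate(line.split(':'))}


# Default input field.
rows = list(apply_map(split_lines, {'_': 'a:b:c\nb:c:d'}))

# Explicit input field, as the `field` parameter would select it.
named = list(apply_map(split_lines, {'content': 'x:y'}, field='content'))
```

One input row can therefore produce several output rows, which is what distinguishes a map from a one-to-one transform.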

CsvMap

class rdc.etl.transform.map.csv.CsvMap(field=None, delimiter=None, quotechar=None, headers=None, skip=None)

Reads CSV content and yields its values, line by line.

delimiter: The CSV delimiter.

quotechar: The CSV quote character.

headers: The list of column names, if the CSV does not contain them in its first line.

skip: The number of lines to skip before output is yielded.
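
The way these four options interact can be sketched with the standard csv module. This is an illustration under assumptions, not CsvMap itself: csv_rows is a hypothetical helper, the ',' and '"' defaults are chosen for the sketch, and skipping lines before reading the header line is an assumed order that this page does not confirm.

```python
import csv
import io


def csv_rows(text, delimiter=',', quotechar='"', headers=None, skip=0):
    """Hypothetical helper: yield one dict per CSV record found in `text`."""
    reader = csv.reader(io.StringIO(text, newline=''),
                        delimiter=delimiter, quotechar=quotechar)
    # Drop the first `skip` records before anything else is read (assumed order).
    for _ in range(skip):
        next(reader, None)
    # Without explicit headers, treat the first remaining record as column names.
    if headers is None:
        headers = next(reader, None)
        if headers is None:
            return  # Nothing left to read.
    for values in reader:
        yield dict(zip(headers, values))


# Header line inside the content, after one line to skip.
with_header = list(csv_rows('junk\nid;name\n1;"Smith; J"\n2;Doe\n',
                            delimiter=';', skip=1))

# No header line in the content: column names are supplied explicitly.
no_header = list(csv_rows('1,a\n2,b\n', headers=['n', 'l']))
```

The quoted value shows why quotechar matters: the delimiter inside "Smith; J" is kept as part of the field rather than splitting it.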

XmlMap

class rdc.etl.transform.map.xml.XmlMap(map_item=None, xpath=None, field=None)

Reads an XML document and yields values for each child of the root.

Warning

This does not work; do not use it (or fix it first :p).

Definitions:

XML Item: In the context of an XmlMap, an XML Item is either a child of the XML root, if no xpath has been provided, or one of the items returned by the provided XPath.

map_item: Called for each input XML Item; should return a dictionary of values for that item.

field: The input field (defined in parent).

xpath: XPath used to select items before running them through map_item().
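
The XML Item definition can be illustrated with the standard library. This sketch is independent of XmlMap and does not fix it: xml_rows and item_to_dict are hypothetical names, and xml.etree.ElementTree only supports a limited subset of XPath, which may differ from what the real class is meant to accept.

```python
import xml.etree.ElementTree as ET


def xml_rows(text, map_item, xpath=None):
    """Hypothetical helper: yield map_item(item) for each selected XML item."""
    root = ET.fromstring(text)
    # No xpath: the items are the direct children of the root element.
    # With xpath: the items are whatever the expression selects.
    items = list(root) if xpath is None else root.findall(xpath)
    for item in items:
        yield map_item(item)


def item_to_dict(item):
    # Turn each child element of the item into a field: tag -> text.
    return {child.tag: child.text for child in item}


flat = ('<users><user><id>1</id><name>Ann</name></user>'
        '<user><id>2</id><name>Bob</name></user></users>')
nested = '<root><meta/><list><user><id>3</id></user></list></root>'

# Items are the children of the root.
from_root = list(xml_rows(flat, item_to_dict))

# Items are selected by an explicit path.
from_xpath = list(xml_rows(nested, item_to_dict, xpath='./list/user'))
```

Here the callable passed as map_item receives an element and returns a plain dictionary, matching the contract described for map_item above.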