Maps
Maps are transforms that yield rows depending on the value of one input field. Used together with FileExtract,
for example, a map can parse the file content and yield rows enriched with the parsed values.
By default, maps use the topic (_) field as input.
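The idea can be sketched in plain Python without rdc.etl. This is only an illustration: `apply_map` and `split_lines` are hypothetical names, and merging each yielded dictionary into a copy of the input row is an assumption about how the added values end up in the output, not confirmed library behaviour.

```python
def apply_map(row, mapper, field='_'):
    """Yield one output row per dictionary produced by ``mapper``.

    Hypothetical helper: assumes each yielded dictionary is merged into
    a copy of the input row.
    """
    for values in mapper(row[field]):
        out = dict(row)      # copy, so the input row is left untouched
        out.update(values)   # add the fields produced by the mapper
        yield out


def split_lines(text):
    """Hypothetical mapper: one dictionary per non-empty line."""
    for number, line in enumerate(text.splitlines(), start=1):
        if line:
            yield {'line_no': number, 'line': line}


# One input row whose topic field ('_') holds two lines of text.
rows = list(apply_map({'_': 'a:b\nc:d', 'source': 'demo.txt'}, split_lines))
```

One input row becomes two output rows here, each keeping the `source` field and gaining `line_no` and `line`.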
Map (base class and decorator)
class rdc.etl.transform.map.Map(map=None, field=None)
    Base class for mappers.

    field
        The input field.
    Example:

    >>> from rdc.etl.transform.map import Map
    >>> from rdc.etl.transform.util import clean
    >>> @Map
    ... def my_map(s_in):
    ...     for l in s_in.split('\n'):
    ...         yield {'f%d' % i: v for i, v in enumerate(l.split(':'))}
    >>> map(clean, my_map({'_': 'a:b:c\nb:c:d\nc:d:e'}))
    [H{'f0': 'a', 'f1': 'b', 'f2': 'c'}, H{'f0': 'b', 'f1': 'c', 'f2': 'd'}, H{'f0': 'c', 'f1': 'd', 'f2': 'e'}]
CsvMap
class rdc.etl.transform.map.csv.CsvMap(field=None, delimiter=None, quotechar=None, headers=None, skip=None)
    Reads a CSV and yields its values, line by line.

    delimiter
        The CSV delimiter.

    quotechar
        The CSV quote character.

    headers
        The list of column names, if the CSV does not contain them as its first line.

    skip
        The number of lines to skip before output is actually yielded.
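What the four options mean can be sketched with the standard library's `csv` module. This is a stand-in for illustration, not CsvMap itself: `csv_rows` is a hypothetical function, its default values are assumptions, and so is the choice to skip lines before reading the header line.

```python
import csv
import io


def csv_rows(content, delimiter=',', quotechar='"', headers=None, skip=0):
    """Yield one dictionary per CSV line (illustrative stand-in only)."""
    reader = csv.reader(io.StringIO(content),
                        delimiter=delimiter, quotechar=quotechar)
    for _ in range(skip):
        next(reader, None)          # drop leading lines (assumed order)
    if headers is None:
        headers = next(reader, [])  # first remaining line holds the names
    for values in reader:
        yield dict(zip(headers, values))


# A comment line to skip, then a header line, then two data lines.
content = '# exported file\nid;name\n1;"Doe; John"\n2;Jane\n'
rows = list(csv_rows(content, delimiter=';', skip=1))

# No header line in the content, so the column names are passed in.
named = list(csv_rows('1,a\n2,b\n', headers=['id', 'label']))
```

The quote character is what lets `"Doe; John"` survive as a single value even though it contains the delimiter.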
XmlMap
class rdc.etl.transform.map.xml.XmlMap(map_item=None, xpath=None, field=None)
    Reads an XML document and yields values for each child of the root.

    Warning
        This does not work; don’t use it (or fix it first :p).

    Definitions:
        XML Item: In the context of an XmlMap, an XML Item is either a child of the XML root if no xpath has been provided, or one item returned by the XPath provided.

    map_item
        Called for each input XML Item; should return a dictionary of values for this item.

    field
        The input field (defined in parent).

    xpath
        XPath used to select items before running them through map_item().
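The definition of an XML Item can be illustrated with the standard library's `xml.etree.ElementTree`, independently of XmlMap. This is a sketch under assumptions: `xml_items` and this `map_item` are hypothetical functions written for the example, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def xml_items(content, xpath=None):
    """Return the XML Items: root children, or the matches of ``xpath``."""
    root = ET.fromstring(content)
    return list(root) if xpath is None else root.findall(xpath)


def map_item(item):
    """Hypothetical item mapper: one dictionary of values per item."""
    return {'id': item.get('id'), 'title': item.findtext('title')}


content = (
    '<catalog>'
    '<book id="1"><title>A</title></book>'
    '<shelf><book id="2"><title>B</title></book></shelf>'
    '</catalog>'
)

# Without an xpath, the items are the direct children of the root.
direct_tags = [item.tag for item in xml_items(content)]

# With an xpath, the items are whatever the expression selects.
books = [map_item(item) for item in xml_items(content, './/book')]
```

Without an xpath the nested book is not an item of its own, since only `book` and `shelf` are children of the root; the `.//book` expression selects both books.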