Rule Class Reference

Overview

You use rules to define your data processing pipeline. A rule has methods and attributes that specify where the data lives and how and when to process it:

Tasks

Initializing a Class

rule

Scheduling

freq

interval

Processing

map()

reduce()

Class Methods and Attributes

class rule(source=source_url[, target=target_url][, name=name])

Declares a new rule or a reusable rule.

Parameters:
  • source_url – The URL where the source data can be retrieved
  • target_url – Name of the table inside of triv.io or the external destination
  • name – Make the rule reusable as part of a rule chain.
Return type:

a rule instance

Discussion

You create a rule in one of two ways: supply a source and an optional target name to create an instance, or specify only the name keyword argument, without a source or target, to create a reusable rule (see Chaining).

Example:

quote = rule("http://download.finance.yahoo.com/d/quotes.csv?s=AAPL&f=nsl1op")

If you do not supply a target, triv.io uses the basename of the source URL as the table name inside of triv.io. To specify the table name, declare the rule with a second argument like this:

quote = rule("http://download.finance.yahoo.com/d/quotes.csv?s=AAPL&f=nsl1op", 'aapl')

This saves the records once a day to a table named aapl.

Note: Your data is stored in triv.io if you specify the target URL without a scheme.

If you include a scheme and net location, your data will be exported to an external system, as in this example:

quote = rule("http://download.finance.yahoo.com/d/quotes.csv?s=AAPL&f=nsl1op",
             'mysql://mysql.example.com/aapl')
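The target rules described above (a default table name taken from the source basename, internal storage when the target has no scheme, export when it has a scheme and net location) can be sketched with the standard library. The helper names default_table_name and is_external_target are hypothetical and are not part of triv.io; whether triv.io strips the file extension from the basename is not stated, so this sketch keeps it.

```python
from posixpath import basename
from urllib.parse import urlparse


def default_table_name(source_url):
    """Return the basename of the URL path, ignoring the query string."""
    return basename(urlparse(source_url).path)


def is_external_target(target):
    """True when the target carries both a scheme and a net location."""
    parts = urlparse(target)
    return bool(parts.scheme and parts.netloc)


source = "http://download.finance.yahoo.com/d/quotes.csv?s=AAPL&f=nsl1op"
print(default_table_name(source))
print(is_external_target("aapl"))
print(is_external_target("mysql://mysql.example.com/aapl"))
```

Under these assumptions a bare name such as 'aapl' is treated as an internal table, while the mysql:// URL is treated as an external destination.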

To create a reusable rule, simply pass the name keyword argument:

sum = rule(name="sum")

Frequency Constants

The rule class defines the following constants for specifying the frequency of rule processing:

rule.YEARLY
rule.MONTHLY
rule.DAILY
rule.HOURLY
rule.MINUTELY

Instance Methods and Attributes

freq

Denotes the period on which the rule is evaluated. See Frequency Constants for valid values. Defaults to rule.DAILY.

Example: Executing a rule once an hour:

instance.freq = rule.HOURLY

interval

The interval between each freq iteration. For example, a freq of rule.HOURLY with an interval of 2 schedules a job once every 2 hours. Defaults to 1.

Example: Executing a rule once every 15 minutes:

instance.freq = rule.MINUTELY
instance.interval = 15
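To make the interaction of freq and interval concrete, here is a minimal sketch that lists the run times a given combination would produce. It does not use triv.io: the PERIODS table and the run_times helper are hypothetical stand-ins, and the string keys only mirror the names of the frequency constants. YEARLY and MONTHLY are left out because they are not fixed-length periods.

```python
from datetime import datetime, timedelta

# Hypothetical stand-ins for rule.DAILY, rule.HOURLY and rule.MINUTELY;
# the values of the real constants are not documented here.
PERIODS = {
    "DAILY": timedelta(days=1),
    "HOURLY": timedelta(hours=1),
    "MINUTELY": timedelta(minutes=1),
}


def run_times(start, freq, interval=1, count=4):
    """Return the first `count` run times, spaced freq * interval apart."""
    step = PERIODS[freq] * interval
    return [start + i * step for i in range(count)]


start = datetime(2024, 1, 1, 0, 0)
for moment in run_times(start, "MINUTELY", interval=15):
    print(moment.isoformat())
```

With a freq of MINUTELY and an interval of 15, consecutive run times in this sketch are 15 minutes apart.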

@instance.map

Declares that the function following it is the mapping function. For example, to create the identity mapping function (the default if you do not specify a mapping function), declare it like so:

@instance.map
def identity(record, params):
  yield record
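The decorator form can be illustrated without triv.io. The sketch below uses a hypothetical SketchRule class whose map method records the decorated function as the mapper; it is an assumption about how such a decorator could work, not the triv.io implementation. It also assumes that each record is a dict with a "symbol" field.

```python
def identity_map(record, params):
    """Default mapper: pass each record through unchanged."""
    yield record


class SketchRule:
    """Hypothetical stand-in for a rule instance; not the triv.io class."""

    def __init__(self):
        self.mapper = identity_map

    def map(self, func):
        # Register func as the mapper and hand it back unchanged,
        # so the decorated name still refers to the original function.
        self.mapper = func
        return func


instance = SketchRule()


@instance.map
def upper_symbol(record, params):
    # Assumes each record is a dict with a "symbol" field.
    yield dict(record, symbol=record["symbol"].upper())


print(list(instance.mapper({"symbol": "aapl", "price": 1.0}, {})))
```

Because the mapper is a generator, it may yield zero, one, or several records for each input record.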

@instance.reduce

Declares that the function following it is the reducer function. For example, to create the identity reducer function (the default if you do not specify a reducer function), declare it like so:

@instance.reduce
def identity(iter, params):
  for key, values in iter:
    yield key, values
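A reducer with the signature shown above can be exercised locally. The sketch below assumes that mapper output is a sequence of (key, value) pairs and that values are grouped by key before the reducer is called; the source only shows the reducer iterating over (key, values), so the group_by_key helper is a hypothetical stand-in for whatever triv.io does between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(pairs):
    """Sort (key, value) pairs and yield (key, values) groups."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def total(groups, params):
    """Reducer in the documented shape: sum the values for each key."""
    for key, values in groups:
        yield key, sum(values)


pairs = [("aapl", 2), ("goog", 5), ("aapl", 3)]
print(list(total(group_by_key(pairs), {})))
```

The reducer here takes the same two arguments as the identity reducer; only the body of the loop differs.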