DataStore Protocol Reference


DataStores are the connectors to third-party systems. They provide the methods needed to fetch data and the hints needed to convert that data into segments and records.

DataStores are provided in the open-source library available in our trivio.datastores GitHub project.

To add a new third-party system or internet protocol, you create a class that implements the datastore protocol. You then register the DataStore to handle a specific internet scheme (http, ftp, etc.) or a specific URL. DataStores registered for a specific URL take precedence over DataStores registered to handle an internet scheme.
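The precedence rule can be pictured as two lookup tables, with the URL table consulted first. The following is only a minimal sketch and not the library's registration API: register_url, register_scheme, lookup, and both store classes are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical registries; the real library's registration API may differ.
_by_url = {}
_by_scheme = {}


def register_url(url, datastore_class):
    """Register a DataStore class for one exact URL."""
    _by_url[url] = datastore_class


def register_scheme(scheme, datastore_class):
    """Register a DataStore class for every URL with the given scheme."""
    _by_scheme[scheme] = datastore_class


def lookup(url):
    """Return the class for url; an exact URL match wins over a scheme match."""
    if url in _by_url:
        return _by_url[url]
    return _by_scheme.get(urlparse(url).scheme)


class ExampleHTTPStore(object):
    """Hypothetical DataStore handling the whole http scheme."""


class ExampleSpecialStore(object):
    """Hypothetical DataStore handling a single URL."""


register_scheme('http', ExampleHTTPStore)
register_url('http://example.com/special.csv', ExampleSpecialStore)
```

With these registrations, lookup('http://example.com/special.csv') resolves to the URL-specific class, while any other http URL falls back to the scheme-level class.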

Objects implementing the datastore protocol should expose the attributes and methods documented in this guide. As a convenience we provide an abstract class that you can choose to inherit from, but doing so is not a requirement so long as the following attributes and methods are defined.

Class Methods


Invoked when a given source URL is matched to your DataStore.

Parameters: parsed_url – The parsed URL that was mapped to the DataStore.

input_stream(stream, size, url, params)

Returns an iterator of records.

rtype: tuple([record iterator], size, url)


This is an optional method that DataStores implement if they wish to interpret the given URL into records. Note: some DataStores, like the S3DataStore, simply construct Segments that contain URLs to a different protocol such as HTTP, in which case the HTTPDataStore does the heavy lifting.

A toy implementation of this method would look like this:

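class MyDataStore(object):
  @classmethod
  def input_stream(cls, stream, size, url, params=None):
    # yields one hard-coded record paired with its index; size is returned as None
    return enumerate([['1','2','3']]), None, url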
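For a slightly fuller picture, here is a sketch of an input_stream that splits a comma-separated text stream into records. It assumes the stream is a file-like object that yields text lines and that a record is a list of strings. Neither detail is confirmed by this reference, and CSVDataStore is a hypothetical name.

```python
import csv
import io


class CSVDataStore(object):
    """Hypothetical DataStore that reads comma-separated text."""

    @classmethod
    def input_stream(cls, stream, size, url, params=None):
        # csv.reader lazily yields one list of strings per line,
        # so the whole stream is never held in memory at once.
        records = csv.reader(stream)
        return records, size, url


# Usage: an in-memory stream stands in for a real download.
stream = io.StringIO('1,2,3\n4,5,6\n')
records, size, url = CSVDataStore.input_stream(
    stream, None, 'http://example.com/data.csv')
rows = list(records)
```

The size and url values are passed through unchanged here; a real DataStore may need to adjust them.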

Instance Attributes


The value of the URL minus the scheme and network location. This value is used to group all your records into a table.

Implementation Note: DataStores typically return parsed_url.path:

class MyDataStore(object):
  ... # other attributes and properties

  @property
  def path(self):
    return self.parsed_url.path

Returns the scheme that the DataStore handles.

Implementation Note: DataStores typically return parsed_url.scheme:

@property
def scheme(self):
  return self.parsed_url.scheme
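To see what the path and scheme attributes would hold for a concrete address, the following sketch parses a made-up URL with the Python 3 standard library (urllib.parse; on Python 2 the equivalent module is urlparse).

```python
from urllib.parse import urlparse

# A made-up URL used only for illustration.
parsed_url = urlparse('http://example.com/logs/access.csv?year=2012')

# Everything after the scheme and network location, without the query string.
path = parsed_url.path

# The protocol portion of the URL.
scheme = parsed_url.scheme
```

Here path is '/logs/access.csv' and scheme is 'http'.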

Returns the URL used to initialize the DataStore.

Implementation Note: DataStores typically return a reconstructed parsed_url minus authentication:

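import urlparse  # Python 2 module; use urllib.parse on Python 3

@property
def url(self):
  """Returns the url for the source minus authentication"""
  parsed_url = self.parsed_url
  scheme, netloc, path, params, query, fragment = parsed_url
  # port is an int (or None), so convert it before joining with the hostname
  port = str(parsed_url.port) if parsed_url.port else None
  netloc = ':'.join(filter(None, [parsed_url.hostname, port]))
  url = urlparse.urlunparse((scheme, netloc, path, params, query, fragment))

  return url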
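The same idea can be tried outside a class. The sketch below is a standalone Python 3 variant; strip_auth is a hypothetical helper name and the URLs are made up for illustration.

```python
from urllib.parse import urlparse, urlunparse


def strip_auth(url):
    """Rebuild url without the user:password part of the network location."""
    parsed_url = urlparse(url)
    scheme, netloc, path, params, query, fragment = parsed_url
    # hostname may be None and port is an int, so skip missing parts
    # and convert the rest to strings before joining.
    parts = [parsed_url.hostname, parsed_url.port]
    netloc = ':'.join(str(part) for part in parts if part is not None)
    return urlunparse((scheme, netloc, path, params, query, fragment))


clean = strip_auth('ftp://user:secret@example.com:2121/data/file.csv')
```

Here clean is 'ftp://example.com:2121/data/file.csv'. A URL that carries no credentials comes back unchanged, apart from the hostname being lowercased by the parser.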

Instance Methods


Returns a Python datetime object representing the time of the very first record in the DataStore. This method will not be invoked if rule.dtstart was specified. datetime.utcnow() should be returned if this value cannot be determined.
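The fallback behaviour described above might look like the following sketch. The function name earliest_record_time and the idea of passing in a list of known timestamps are assumptions made for illustration; they are not the protocol's actual method name or signature.

```python
from datetime import datetime, timezone


def earliest_record_time(timestamps):
    """Return the earliest known timestamp, or the current UTC time."""
    known = [ts for ts in timestamps if ts is not None]
    if known:
        return min(known)
    # Naive UTC "now", the same kind of value datetime.utcnow() produces.
    return datetime.now(timezone.utc).replace(tzinfo=None)


first = earliest_record_time([datetime(2012, 3, 1), datetime(2012, 1, 15)])
fallback = earliest_record_time([])
```

When at least one timestamp is known the earliest one is returned; otherwise the caller gets the current time in UTC.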

segment_between(start, end)

Returns a Segment that fits between the start and end times.
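As a rough illustration, a DataStore backed by one file per day could implement segment_between as below. Everything except the method name is an assumption: the Segment namedtuple is a stand-in for the library's own Segment type, DailyFileStore is a hypothetical class, and treating the interval as half-open (start included, end excluded) is a guess.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the library's Segment type.
Segment = namedtuple('Segment', ['start', 'end', 'urls'])


class DailyFileStore(object):
    """Hypothetical DataStore backed by one file per day."""

    def __init__(self, files):
        # files maps a datetime to the URL of that day's file
        self.files = files

    def segment_between(self, start, end):
        # Assumed half-open interval: start is included, end is excluded.
        urls = [url for when, url in sorted(self.files.items())
                if start <= when < end]
        return Segment(start, end, urls)


store = DailyFileStore({
    datetime(2012, 1, 1): 'http://example.com/2012-01-01.csv',
    datetime(2012, 1, 2): 'http://example.com/2012-01-02.csv',
    datetime(2012, 1, 3): 'http://example.com/2012-01-03.csv',
})
segment = store.segment_between(datetime(2012, 1, 1), datetime(2012, 1, 3))
```

The resulting segment lists the URLs for January 1 and January 2 only, since the end time is treated as exclusive in this sketch.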