DataStore Protocol Reference

Overview

DataStores are the connectors to third-party systems. They provide the methods needed to fetch data and the hints needed to convert that data into segments and records.

DataStores are provided in the open-source library available in our trivio.datastores GitHub project.

To add a new third-party system or internet protocol, create a class that implements the datastore protocol. Then register the DataStore to handle either a specific internet scheme (http, ftp, etc.) or a specific URL. DataStores registered for a specific URL take precedence over DataStores registered to handle an internet scheme.
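The precedence rule can be illustrated with a small sketch. The registration API itself is not described in this guide, so DataStoreRegistry, register_url, register_scheme and lookup below are hypothetical names used only to show URL registrations winning over scheme registrations:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


class DataStoreRegistry(object):
    """Hypothetical registry; not part of the triv.io API."""

    def __init__(self):
        self.by_url = {}
        self.by_scheme = {}

    def register_url(self, url, datastore_class):
        self.by_url[url] = datastore_class

    def register_scheme(self, scheme, datastore_class):
        self.by_scheme[scheme] = datastore_class

    def lookup(self, url):
        # A DataStore registered for the exact URL takes precedence.
        if url in self.by_url:
            return self.by_url[url]
        # Otherwise fall back to the DataStore registered for the scheme.
        return self.by_scheme.get(urlparse(url).scheme)


class GenericHTTPStore(object):
    pass


class FeedStore(object):
    pass


registry = DataStoreRegistry()
registry.register_scheme('http', GenericHTTPStore)
registry.register_url('http://example.com/feed', FeedStore)
```

With these registrations, looking up http://example.com/feed yields FeedStore, while any other http URL yields GenericHTTPStore.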

Objects implementing the datastore protocol should expose the attributes and methods documented in this guide. As a convenience, we provide an abstract class, triv.io.DataSource, which you can choose to inherit from, but this is not a requirement so long as the following attributes and methods are defined.

Class Methods

DataStore(parsed_url):

Invoked when triv.io matches a given source URL to your DataStore.

Parameters:parsed_url – The parsed URL that triv.io mapped to the DataStore.

input_stream(stream, size, url, params):

Returns an iterator of records, together with the size and url, as a tuple.

rtype:tuple([record iterator], size, url)

Discussion

This is an optional method that DataStores implement if they wish to interpret the given url into records. Note: some DataStores, like the S3DataStore, simply construct Segments that contain URLs for a different protocol, such as http, in which case the HTTPDataStore does the heavy lifting.

A toy implementation of this method would look like this:

class MyDataStore(object):
  @staticmethod
  def input_stream(stream, size, url, params):
    return enumerate([['1','2','3']]), None, url
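A slightly fuller sketch that actually reads from the stream might look like the following. It assumes the stream yields comma-separated text lines and that params can be ignored; LineDataStore is a hypothetical name, and the (index, record) pairs mirror the enumerate call in the toy example above:

```python
import io


class LineDataStore(object):
    """Hypothetical DataStore that treats each line as one record."""

    @staticmethod
    def input_stream(stream, size, url, params):
        # Split each line on commas; params is unused in this sketch.
        records = (line.rstrip('\n').split(',') for line in stream)
        # Return (record iterator, size, url) as the protocol describes.
        return enumerate(records), size, url


stream = io.StringIO(u'1,2,3\n4,5,6\n')
records, size, url = LineDataStore.input_stream(
    stream, None, 'http://example.com/data.csv', {})
rows = list(records)
```

Because the records are produced by a generator, nothing is read from the stream until the caller iterates.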

Instance Attributes

path

The URL minus the scheme and network location. Triv.io uses this value to group all your records into a table.

Implementation Note: DataStores typically return parsed_url.path:

class MyDataStore(object):
  ... # other attributes and properties

  @property
  def path(self):
    return self.parsed_url.path

scheme

Returns the scheme that the DataStore handles.

Implementation Note: DataStores typically return parsed_url.scheme:

@property
def scheme(self):
  return self.parsed_url.scheme

url

Returns the URL used to initialize the DataStore.

Implementation Note: DataStores typically return a reconstructed parsed_url minus authentication:

@property
def url(self):
  """Returns the url for the source minus authentication"""
  parsed_url = self.parsed_url
  scheme, netloc, path, params, query, fragment = parsed_url
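  # parsed_url.port is an int (or None), so convert it before joining
  port = str(parsed_url.port) if parsed_url.port is not None else None
  netloc = ':'.join(filter(None, [parsed_url.hostname, port]))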
  url = urlparse.urlunparse((scheme, netloc, path, params, query, fragment))

  return url
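The same idea can be tried outside a class. The standalone helper below is a minimal sketch, not the library's own code; url_without_auth is a hypothetical name, and the import falls back to the Python 2 urlparse module used in the snippet above:

```python
try:
    from urllib.parse import urlparse, urlunparse  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse      # Python 2


def url_without_auth(url):
    """Rebuild url with any user:password@ prefix removed from the netloc."""
    parsed = urlparse(url)
    netloc = parsed.hostname or ''
    # port is an int or None, so format it rather than joining it directly.
    if parsed.port is not None:
        netloc = '{0}:{1}'.format(netloc, parsed.port)
    return urlunparse(parsed._replace(netloc=netloc))


clean = url_without_auth('http://user:secret@example.com:8080/data?x=1')
```

Note that the hostname attribute is lowercased by the URL parser, so the rebuilt URL may differ in case from the original host.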

Instance Methods

earliest_record_time()

Returns a Python datetime object representing the time of the very first record in the DataStore. This method will not be invoked if rule.dtstart was specified. Return datetime.utcnow() if this value cannot be determined.

rtype:datetime

segment_between(start, end)

Returns a Segment that fits between the start and end times.
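As a rough illustration of the two instance methods, the sketch below uses an in-memory list of (timestamp, value) tuples. The Segment class is not described in this guide, so the namedtuple here is only a hypothetical stand-in, as are InMemoryDataStore and the memory:// URL:

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in; the real Segment class may look quite different.
Segment = namedtuple('Segment', ['url', 'start', 'end'])


class InMemoryDataStore(object):
    """Toy store backed by a list of (timestamp, value) tuples."""

    def __init__(self, records):
        self.records = sorted(records, key=lambda record: record[0])

    def earliest_record_time(self):
        # Fall back to the current UTC time when it cannot be determined.
        if not self.records:
            return datetime.utcnow()
        return self.records[0][0]

    def segment_between(self, start, end):
        # Describe the slice of data covering the requested time range.
        return Segment('memory://records', start, end)


store = InMemoryDataStore([
    (datetime(2013, 1, 2), 'b'),
    (datetime(2013, 1, 1), 'a'),
])
first = store.earliest_record_time()
segment = store.segment_between(datetime(2013, 1, 1), datetime(2013, 1, 2))
```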