DataStores are the connectors to third-party systems. They provide the methods needed to fetch data and the hints needed to convert that data into segments and records.
DataStores are provided in the open-source library available in our trivio.datastores GitHub project.
To add a new third-party system or internet protocol, you create a class that implements the DataStore protocol. You then register the DataStore to handle a specific internet scheme (http, ftp, etc.) or a specific URL. DataStores registered for a specific URL take precedence over DataStores registered to handle an internet scheme.
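The precedence rule can be illustrated with a small sketch. The registry below is hypothetical: the names `register_scheme`, `register_url`, and `lookup` are assumptions made for illustration and are not taken from the triv.io library, whose actual registration API may differ. It is written for Python 3.

```python
from urllib.parse import urlparse

# Hypothetical registries (illustration only, not the triv.io API).
_by_scheme = {}
_by_url = {}

def register_scheme(scheme, datastore_cls):
    """Register a DataStore class for every URL that uses the given scheme."""
    _by_scheme[scheme] = datastore_cls

def register_url(url, datastore_cls):
    """Register a DataStore class for one exact URL."""
    _by_url[url] = datastore_cls

def lookup(url):
    """Return the DataStore class for url, preferring an exact URL match."""
    if url in _by_url:
        return _by_url[url]
    # Fall back to the scheme-level registration, or None if nothing matches.
    return _by_scheme.get(urlparse(url).scheme)

class HTTPStore(object):
    """Placeholder DataStore registered for the whole http scheme."""

class SpecialStore(object):
    """Placeholder DataStore registered for a single URL."""

register_scheme('http', HTTPStore)
register_url('http://example.com/feed', SpecialStore)
```

With these registrations, `lookup('http://example.com/feed')` resolves to `SpecialStore`, while any other http URL falls back to `HTTPStore`.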
Objects implementing the DataStore protocol should expose the attributes and methods documented in this guide. As a convenience, we provide an abstract class, triv.io.DataSource, which you can choose to inherit from; this is not a requirement so long as the following attributes and methods are defined.
Invoked when triv.io matches a given source URL to your DataStore.
Parameters: parsed_url – The parsed URL that triv.io mapped to the DataStore.
Returns an iterator of records.
rtype: tuple([record iterator], size, url)
Discussion
This is an optional method that DataStores implement if they wish to interpret the given URL into records. Note: some DataStores, like the S3DataStore, simply construct Segments that contain URLs for a different protocol such as http, in which case the HTTPDataStore does the heavy lifting.
A toy implementation of this method would look like this:
class MyDataStore(object):
    @staticmethod
    def input_stream(stream, size, url):
        # Yield a single hard-coded record; the size is unknown, so return None.
        return enumerate([['1','2','3']]), None, url
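For a slightly more realistic picture, here is a minimal sketch that reads records from the stream instead of hard-coding them. It assumes the stream is a file-like object that yields text lines of comma-separated values, which the guide does not specify. The class name `CSVLineDataStore` is hypothetical, and passing `size` through unchanged is also an assumption.

```python
import io

class CSVLineDataStore(object):
    """Hypothetical DataStore that treats each line as one comma-separated record."""

    @staticmethod
    def input_stream(stream, size, url):
        # Lazily split each line into fields so large streams are not loaded at once.
        records = (line.rstrip('\n').split(',') for line in stream)
        # Pair each record with an index, mirroring the toy implementation.
        return enumerate(records), size, url

# Usage with an in-memory stream standing in for fetched data.
stream = io.StringIO(u"a,b,c\n1,2,3\n")
records, size, url = CSVLineDataStore.input_stream(
    stream, None, 'http://example.com/data.csv')
rows = list(records)
```

Because the records are produced by a generator, nothing is read from the stream until the iterator is consumed.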
The value of the URL minus the scheme and network location (netloc). Triv.io uses this value to group all your records into a table.
Implementation note: DataStores typically return parsed_url.path:
class MyDataStore(object):
    ... # other attributes and properties
    @property
    def path(self):
        # Use the parsed URL stored on the DataStore, as in the other properties.
        return self.parsed_url.path
Returns the scheme that the DataStore handles.
Implementation note: DataStores typically return parsed_url.scheme:
@property
def scheme(self):
    return self.parsed_url.scheme
Returns the URL used to initialize the DataStore.
Implementation note: DataStores typically return a reconstructed parsed_url minus authentication:
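import urlparse  # Python 2 module; in Python 3 use urllib.parse

@property
def url(self):
    """Returns the url for the source minus authentication"""
    parsed_url = self.parsed_url
    scheme, netloc, path, params, query, fragment = parsed_url
    # Rebuild netloc from host and port only, dropping any user:password.
    # The port is an int (or None), so convert it to a string before joining.
    port = str(parsed_url.port) if parsed_url.port else None
    netloc = ':'.join(filter(None, [parsed_url.hostname, port]))
    url = urlparse.urlunparse((scheme, netloc, path, params, query, fragment))
    return url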
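The same idea can be tried outside a class. The following is a minimal standalone sketch, assuming Python 3, where the parsing functions live in `urllib.parse`. The helper name `strip_auth` is hypothetical and is not part of the triv.io library.

```python
from urllib.parse import urlparse, urlunparse

def strip_auth(url):
    """Return url with any user:password credentials removed from the netloc."""
    parsed = urlparse(url)
    # hostname excludes credentials and port (and is lowercased by urlparse).
    netloc = parsed.hostname or ''
    if parsed.port is not None:
        netloc = '{0}:{1}'.format(netloc, parsed.port)
    # ParseResult is a namedtuple, so _replace swaps in the cleaned netloc.
    return urlunparse(parsed._replace(netloc=netloc))

cleaned = strip_auth('ftp://user:secret@example.com:2121/logs/2013?x=1')
```

Dropping the credentials keeps secrets out of anything that stores or displays the DataStore's URL.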
Returns a Python datetime object representing the time of the very first record in the DataStore. This method is not invoked if rule.dtstart was specified. Return datetime.utcnow() if this value cannot be determined.
rtype: datetime
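The fallback behaviour described above might be sketched as follows. The function name `earliest_record_time` and its `timestamps` argument are hypothetical; a real DataStore would obtain the first record's time from the third-party system it wraps.

```python
from datetime import datetime

def earliest_record_time(timestamps):
    """Return the earliest datetime in timestamps, or the current UTC time if none exist."""
    timestamps = list(timestamps)
    if not timestamps:
        # The start time cannot be determined, so fall back to "now" in UTC.
        return datetime.utcnow()
    return min(timestamps)

first = earliest_record_time([datetime(2013, 5, 2), datetime(2013, 1, 1)])
```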
Returns a Segment that fits between the start and end times.