# Overview
Sayari's bulk data provides access to billions of entities (i.e., vertices, nodes) and relationships (i.e., edges) as displayed in Sayari's suite of products. This documentation outlines the data structure, supported data formats, and data delivery mechanisms of Sayari's bulk data offering.
# Data Structure
Our data is delivered in two sets of files: entities and relationships. Within each file, a row describes one entity or relationship.
Note: Any entity or relationship may have multiple attributes of the same type; for example, an entity may have multiple addresses (physical, mailing, etc.). Accordingly, all attributes are included as arrays.
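Because every attribute is an array, consumer code should iterate over attribute values rather than assume a single one. The following is a minimal Python sketch, assuming a row has already been parsed into a dictionary. The `address` key and its `value` and `type` subfields are hypothetical names chosen for illustration, not Sayari's documented schema.

```python
# Hypothetical parsed row: the attribute name "address" and its subfields
# ("value", "type") are illustrative assumptions, not the documented schema.
entity_row = {
    "entity_id": "abc123",
    "address": [
        {"value": "1 Main St, Springfield", "type": "physical"},
        {"value": "PO Box 42, Springfield", "type": "mailing"},
    ],
}


def attribute_values(row, name):
    """Return every value of an attribute; a missing attribute yields []."""
    return [item["value"] for item in row.get(name) or []]


addresses = attribute_values(entity_row, "address")
```

Treating a missing attribute as an empty list keeps downstream loops free of special cases.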
# Entities
Like an entity profile in the Sayari suite of products, a row in an entity file describes a single entity, including its attributes, risk factors, and summary information.
Summary information includes properties that describe the entity, as outlined below:
Field | Type | Description |
---|---|---|
entity_id | string | Primary key |
type | string | Entity type, see Entities |
label | string | Most commonly reported name |
label_en | string | Most commonly reported ASCII name |
closed | boolean | Whether an entity is closed |
degree | long | Number of unique neighboring entities |
edge_counts | map | Number of neighbors per edge type |
sanctioned | boolean | See Sanctioned |
pep | boolean | See Politically Exposed Person (PEP) |
source | array[string] | List of data sources an entity was referenced in |
num_documents | long | Number of source documents an entity was referenced in |
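The summary fields above map naturally onto a typed record. The sketch below is one possible Python representation, assuming `long` corresponds to `int` and `map` to a dictionary of integer counts. The sample rows, entity types, and edge type names are invented for illustration, and `from_row` and `flagged` are hypothetical helpers.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class EntitySummary:
    entity_id: str               # primary key
    type: str                    # entity type
    label: str                   # most commonly reported name
    label_en: str                # most commonly reported ASCII name
    closed: bool
    degree: int                  # "long" in the table above
    edge_counts: Dict[str, int]  # "map" in the table above (assumed str -> int)
    sanctioned: bool
    pep: bool
    source: List[str]
    num_documents: int


def from_row(row):
    """Build a summary from a parsed row, ignoring non-summary keys."""
    names = [f.name for f in fields(EntitySummary)]
    return EntitySummary(**{name: row[name] for name in names})


def flagged(entities):
    """Return the IDs of entities marked as sanctioned or as a PEP."""
    return [e.entity_id for e in entities if e.sanctioned or e.pep]


# Invented sample rows; none of these values come from real data.
rows = [
    {"entity_id": "e1", "type": "company", "label": "Acme", "label_en": "Acme",
     "closed": False, "degree": 2, "edge_counts": {"shareholder_of": 2},
     "sanctioned": False, "pep": False, "source": ["registry_a"],
     "num_documents": 3, "address": []},
    {"entity_id": "e2", "type": "person", "label": "J. Doe", "label_en": "J. Doe",
     "closed": False, "degree": 1, "edge_counts": {"director_of": 1},
     "sanctioned": True, "pep": False, "source": ["registry_a", "list_b"],
     "num_documents": 5, "address": []},
]

entities = [from_row(r) for r in rows]
flagged_ids = flagged(entities)
```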
# Relationships
A row in a relationship file describes a single relationship, including its attributes and summary information. Each relationship connects two entities (i.e., vertices), which are identified by their entity_id values in the src and dst fields.
Field | Type | Description |
---|---|---|
src | string | entity_id of the tail vertex |
dst | string | entity_id of the head vertex |
type | string | Relationship type, see Relationships |
from_date | string | Start date of a relationship |
date | string | As-of date of a relationship |
to_date | string | End date of a relationship |
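To show how `src` and `dst` tie relationship rows back to entities, the sketch below derives per-entity neighbor counts from a handful of rows. It assumes that neighbors are counted in both directions and that a per-type count means unique neighbors per relationship type; the source does not state how Sayari computes `degree` or `edge_counts`, so this is an illustration rather than Sayari's method. The entity IDs and relationship type names are invented.

```python
from collections import defaultdict

# Invented sample rows; the type names are illustrative only.
relationships = [
    {"src": "A", "dst": "B", "type": "shareholder_of"},
    {"src": "A", "dst": "B", "type": "director_of"},
    {"src": "C", "dst": "A", "type": "shareholder_of"},
]


def summarize(rels):
    """Count unique neighbors per entity, overall and per relationship type.

    Assumption: both endpoints of a relationship count each other as
    neighbors, regardless of direction.
    """
    neighbors = defaultdict(set)
    per_type = defaultdict(lambda: defaultdict(set))
    for rel in rels:
        src, dst, rel_type = rel["src"], rel["dst"], rel["type"]
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        per_type[src][rel_type].add(dst)
        per_type[dst][rel_type].add(src)
    degree = {entity: len(ns) for entity, ns in neighbors.items()}
    edge_counts = {
        entity: {t: len(ns) for t, ns in types.items()}
        for entity, types in per_type.items()
    }
    return degree, edge_counts


degree, edge_counts = summarize(relationships)
```

Using sets means the two A-to-B rows contribute a single neighbor to the overall count for A, while still being counted separately under each relationship type.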
# Data Formats
# Data Delivery
# Signed URLs
Sayari provides bulk data access via signed URLs. Each signed URL provides time-limited access to its corresponding data file. Signed URLs are delivered as a text file of newline-delimited URLs.
Note: Signed URLs are valid for a maximum of 7 days.
# Example usage
wget -i signed_urls.txt -c -t 3
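In the command above, `-i` reads URLs from the given file, `-c` resumes partially downloaded files, and `-t 3` limits each download to three attempts. For environments without `wget`, a rough Python equivalent is sketched below using only the standard library. The helper names are hypothetical, the example URLs are placeholders, and this sketch neither resumes partial downloads nor retries.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def read_url_list(text):
    """Split newline-delimited text into URLs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url):
    """Derive a local file name from the URL path, ignoring the query string."""
    return PurePosixPath(urlsplit(url).path).name


def download_all(list_path):
    """Download every URL listed in the file into the current directory."""
    with open(list_path, encoding="utf-8") as handle:
        urls = read_url_list(handle.read())
    for url in urls:
        urlretrieve(url, local_name(url))


# Placeholder URLs for illustration; real signed URLs will look different.
sample = (
    "https://example.com/bulk/entities/part-0001?signature=abc\n"
    "\n"
    "https://example.com/bulk/relationships/part-0002?signature=def\n"
)
urls = read_url_list(sample)
names = [local_name(u) for u in urls]
```

Calling `download_all("signed_urls.txt")` would then fetch each file. Deriving the file name from the URL path alone keeps the signature parameters out of local file names.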
# Suggested tools
# SFTP Download
Sayari provides bulk data access via SFTP. To use SFTP, generate a Secure Shell (SSH) key pair and share the public key with Sayari. For guidance on generating an SSH key pair, see the tutorial "Generate a Secure Shell (SSH) key pair for an SFTP dropbox". After receiving the public key, Sayari will provide the corresponding username.
# Example usage
sftp <username>@<host>
sftp> get path/to/file