# Overview

This documentation outlines supported data delivery methods and the data schema for Sayari's bulk data product.

# Data Delivery

# Signed URLs

Sayari provides bulk data access via signed URLs. Signed URLs provide time-limited access to their corresponding data files. Note: signed URLs are valid for a maximum of 7 days.

# Example Usage

wget -i signed_urls.txt -c -t 3
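
For environments without wget, the same idea can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names (`local_name`, `read_urls`, `download_all`) are hypothetical, the input is assumed to be a text file with one signed URL per line, and the retry count mirrors the `-t 3` flag above. Unlike `wget -c`, it does not resume partial downloads.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlsplit


def local_name(url: str) -> str:
    """Derive a local file name from the URL path, ignoring the signature query string."""
    return Path(unquote(urlsplit(url).path)).name


def read_urls(list_path: str) -> list[str]:
    """Read one URL per line, skipping blank lines."""
    with open(list_path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def download_all(list_path: str, out_dir: str = ".", tries: int = 3) -> None:
    """Download every URL in the list, retrying each one up to `tries` times."""
    for url in read_urls(list_path):
        target = Path(out_dir) / local_name(url)
        for attempt in range(1, tries + 1):
            try:
                urllib.request.urlretrieve(url, target)
                break
            except OSError:  # URLError and HTTPError are subclasses of OSError
                if attempt == tries:
                    raise
                time.sleep(2 ** attempt)  # simple backoff before the next try
```

Calling `download_all("signed_urls.txt")` would fetch each file into the current directory. Because signed URLs expire, a failed download after the validity window requires a fresh URL rather than more retries.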

# Suggested Tools

# SFTP Download

Sayari provides bulk data access via SFTP. SFTP requires generating a Secure Shell (SSH) key pair and sharing the public key with Sayari. For guidance on how to generate an SSH key pair, please review the following tutorial: Generate a Secure Shell (SSH) key pair for an SFTP dropbox. After you share the public key, Sayari will provide the corresponding username.

# Example Usage

sftp [email protected]
sftp> get path/to/file

# Suggested Tools

# Bulk Data Format

Bulk data exports contain information about entities (AKA nodes, vertices) and relationships (AKA links, edges) in the Sayari graph. The data are distributed in two sets of files: one set for entities and one set for relationships between those entities.

V1 exports support CSV and parquet formats. V2 exports are only in parquet format. When the data are in CSV format, both sets of files are gzipped, with the following characteristics:

  • Each file begins with a header containing column names
  • The delimiter is a comma - ,
  • The quote character (used to escape data that contains the delimiter) is a double quote - "
  • Complex data (lists, maps, etc.) are serialized to JSON before being written

When the data are in parquet format, compression is snappy, and complex data are represented as native parquet structures (not JSON).
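
As an illustration of the CSV characteristics listed above, the following minimal sketch reads a gzipped CSV export with the Python standard library and decodes JSON-serialized columns. The function name `read_csv_export` is hypothetical, and the sketch assumes UTF-8 encoding and that empty cells stand for missing values; neither is stated in this documentation.

```python
import csv
import gzip
import json


def read_csv_export(path, json_columns):
    """Yield one dict per CSV row, decoding the listed columns from JSON."""
    # Assumption: files are UTF-8 encoded.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        # The header row supplies the column names.
        reader = csv.DictReader(fh, delimiter=",", quotechar='"')
        for row in reader:
            for col in json_columns:
                # Assumption: an empty cell means the attribute is absent.
                row[col] = json.loads(row[col]) if row.get(col) else None
            yield row
```

All other columns are returned as strings, so numeric columns such as `i` need an explicit conversion (for example `int(row["i"])`).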

# Entities Data

The entity files contain information about entities in the graph. Each entity is uniquely identified by the entity_id column.

# Entities Schema V1

Though each entity is uniquely identified by entity_id, the same entity can appear multiple times in the V1 schema dataset, in order to support multiple attributes of the same type for that entity (e.g. aliases). To distinguish these rows, an integer index column i is used. For example:

| entity_id | i | name |
| --- | --- | --- |
| 0-AWvH8du6YBVdRcxTSPUQ | 0 | {"value":"HESCO ENGINEERING & CONSTRUCTION CO (UK)"} |
| 0-AWvH8du6YBVdRcxTSPUQ | 1 | {"value":"HESCO ENGINEERING & CONSTRUCTION CO"} |
| 0-AWvH8du6YBVdRcxTSPUQ | 2 | {"value":"HESCO ENGINEERING AND CONSTRUCTION COMPANY LIMITED"} |
| 0-AWvH8du6YBVdRcxTSPUQ | 3 | {"value":"HESCO ENG & CON. CO"} |
| 0-FvUP2Fo3TqTfglWwsPsw | 0 | {"value":"ЛЕОНИД СОФРОНОВИЧ КОРНИЛОВ"} |
| 0-FvUP2Fo3TqTfglWwsPsw | 1 | {"value":"LEONID SAFRONOVICH KORNILOV"} |

This sample shows two entities with information split up over six rows. The first four rows correspond to information about entity 0-AWvH8du6YBVdRcxTSPUQ and the last two rows correspond to information about entity 0-FvUP2Fo3TqTfglWwsPsw.

If a single row per entity is desired, the filter WHERE i = 0 can be applied. This row will include the most commonly cited attribute value of each type (e.g. name, address).
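
Outside of SQL, the same filter can be sketched in plain Python. The rows below mirror the sample above, with the name column already decoded from JSON and i converted to an integer; the variable names are illustrative only. The second part shows the alternative of gathering every name for an entity across its rows.

```python
from collections import defaultdict

# Sample rows mirroring the table above.
rows = [
    {"entity_id": "0-AWvH8du6YBVdRcxTSPUQ", "i": 0, "name": {"value": "HESCO ENGINEERING & CONSTRUCTION CO (UK)"}},
    {"entity_id": "0-AWvH8du6YBVdRcxTSPUQ", "i": 1, "name": {"value": "HESCO ENGINEERING & CONSTRUCTION CO"}},
    {"entity_id": "0-AWvH8du6YBVdRcxTSPUQ", "i": 2, "name": {"value": "HESCO ENGINEERING AND CONSTRUCTION COMPANY LIMITED"}},
    {"entity_id": "0-AWvH8du6YBVdRcxTSPUQ", "i": 3, "name": {"value": "HESCO ENG & CON. CO"}},
    {"entity_id": "0-FvUP2Fo3TqTfglWwsPsw", "i": 0, "name": {"value": "ЛЕОНИД СОФРОНОВИЧ КОРНИЛОВ"}},
    {"entity_id": "0-FvUP2Fo3TqTfglWwsPsw", "i": 1, "name": {"value": "LEONID SAFRONOVICH KORNILOV"}},
]

# Equivalent of WHERE i = 0: keep exactly one row per entity.
primary = {r["entity_id"]: r for r in rows if r["i"] == 0}

# Alternative: collect every name of each entity, ordered by i.
names = defaultdict(list)
for r in sorted(rows, key=lambda r: (r["entity_id"], r["i"])):
    if r["name"] is not None:
        names[r["entity_id"]].append(r["name"]["value"])
```

Here `primary` holds two rows, one per entity, while `names` keeps all four aliases of the first entity and both spellings of the second.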

Summary fields include:

  • type: type of entity
  • label: best name for entity
  • label_en: best ASCII name for entity, if one exists
  • num_documents: number of underlying documents entity was extracted from
  • sanctioned: whether the entity is sanctioned
  • pep: whether the entity is a politically exposed person
  • degree: number of distinct neighboring entities
  • sources: the distinct data sources where an entity has been cited
  • edge_counts: counts the number of neighbors per edge type

Summary fields are repeated for all rows with the same entity ID. For example, if an entity has 10 rows (i = 0 through 9), these fields hold the same values in every row.

The entity attribute fields are:

  • Identifier: An ID number that uniquely identifies one entity when value and type are taken into account.
  • Additional Information: A generic attribute used to hold miscellaneous information not covered by any other attribute. Includes 'value' (for the attribute itself), 'type' (a name, e.g. 'Real property description') and 'extra' (a miscellaneous field to hold any other details) fields.
  • Address: A physical location description. Addresses may exist as a simple string ('123 South Main St., South Bend, IN 46556'), or may be in smaller chunks with separate fields ('Number: 123,' 'Street name: South Main...'). Where possible, these fields will be parsed using the Libpostal ontology (https://github.com/openvenues/libpostal#parser-labels), which facilitates more robust address analysis and comparison.
  • Name: An entity's name. The value may be straightforward (e.g. 'Acme LLC,' 'John Doe') or context-specific (e.g. 'Jones v. Smith' as a legal matter name).
  • Status: The status of an entity.
  • Business Purpose: Text and/or a code (NAICS, NACE, ISIC, etc.) that describes what a company is legally allowed to do or produce.
  • Shares: Shares associated with an entity (e.g. its number of issued shares, or the number of shares held by a shareholder).
  • Position: An attribute used for many different relationship types that allows for the inclusion of a title or designation (e.g. member_of_the_board_of, Position: 'Secretary of the Board,' or shareholder_of, Position: 'Minority shareholder').
  • Monetary Value: The financial value of an asset (e.g. FOB, CIF).
  • Company Type: A type of legal entity in a given jurisdiction (e.g. 'LLC,' 'Sociedad Anonima,' 'Private Company Limited by Shares').
  • Contact: Contact information for an entity.
  • Weak Identifier: A non-unique ID number, like a partially redacted tax ID or a registry identifier whose value and type may be shared by multiple entities.
  • Risk Intelligence: Risk intelligence metadata.
  • Measurement: A numerical representation in a standard unit of some dimension of an entity, for example, weight.
  • Financials: A summary of financial information at one point in time.
  • Date Of Birth: Birth date of a person.
  • Country: An affiliation of an entity with a given country through residence, nationality, etc.
  • Gender: A person's gender.
  • Translated Name: A name that has been translated to English.

Below is the full entities data schema, which provides more detail on these attribute fields. Unlike the summary fields, the values in the attribute columns for a single entity change from row to row. This is illustrated in the sample above, where the value of the name column changes along with the index i.

root
 |-- entity_id: string (nullable = true)
 |-- i: integer (nullable = true)
 |-- type: string (nullable = true)
 |-- label: string (nullable = true)
 |-- label_en: string (nullable = true)
 |-- num_documents: long (nullable = true)
 |-- sanctioned: boolean (nullable = true)
 |-- pep: boolean (nullable = true)
 |-- degree: long (nullable = true)
 |-- edge_counts: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- out: long (nullable = true)
 |    |    |-- in: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |-- gender: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |-- business_purpose: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- code: string (nullable = true)
 |-- person_status: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |-- finances: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: double (nullable = true)
 |    |-- context: string (nullable = true)
 |    |-- type: string (nullable = true)
 |    |-- currency: string (nullable = true)
 |-- name: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- language: string (nullable = true)
 |    |-- context: string (nullable = true)
 |-- identifier: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- additional_information: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- address: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- language: string (nullable = true)
 |    |-- house: string (nullable = true)
 |    |-- house_number: string (nullable = true)
 |    |-- po_box: string (nullable = true)
 |    |-- building: string (nullable = true)
 |    |-- entrance: string (nullable = true)
 |    |-- staircase: string (nullable = true)
 |    |-- level: string (nullable = true)
 |    |-- unit: string (nullable = true)
 |    |-- road: string (nullable = true)
 |    |-- metro_station: string (nullable = true)
 |    |-- suburb: string (nullable = true)
 |    |-- city_district: string (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- state_district: string (nullable = true)
 |    |-- island: string (nullable = true)
 |    |-- state: string (nullable = true)
 |    |-- postcode: string (nullable = true)
 |    |-- country_region: string (nullable = true)
 |    |-- country: string (nullable = true)
 |    |-- world_region: string (nullable = true)
 |    |-- category: string (nullable = true)
 |    |-- near: string (nullable = true)
 |-- shares: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- num_shares: double (nullable = true)
 |    |-- monetary_value: double (nullable = true)
 |    |-- currency: string (nullable = true)
 |    |-- percentage: double (nullable = true)
 |    |-- type: string (nullable = true)
 |-- company_type: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |-- weak_identifier: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- date_of_birth: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |-- translated_name: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- original: string (nullable = true)
 |    |-- context: string (nullable = true)
 |-- status: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- text: string (nullable = true)
 |-- country: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- context: string (nullable = true)
 |    |-- state: string (nullable = true)
 |-- contact: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- position: struct (nullable = true)
 |    |-- extra: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |    |-- date: string (nullable = true)
 |    |-- from_date: string (nullable = true)
 |    |-- to_date: string (nullable = true)
 |    |-- value: string (nullable = true)
 |-- sources: array (nullable = true)
 |    |-- element: string (containsNull = true)

# Entities Schema V2

In the V2 entities schema, there is one row per entity_id. When an entity has multiple attributes of a given type, they are collected into an array. In parquet and JSON, nested data is represented in native structures.

entity_id name
---32HZZhC_ZzK0G6g8JgA [[[Transliterated By -> Sayari],,,, Zhang Zhiwei,, transliteration,], [,,,, 张志伟,, primary,]]
---3sWkBuU5NybZXrv7RSQ [[[Transliterated By -> Sayari],,,, Hu Guoxing,, transliteration,], [,,,, 胡国兴,, primary,]]
---7mC9Y8c_ME2hhpMGD3w [[[Transliterated By -> Sayari],,,, Dong Guan Shi Zhong Ji Dian Zi Cai Liao You Xian Gong Si,, transliteration,], [,,,, 东莞市中洁电子材料有限公司,, primary,], [,,,, Dongguan Zhongjie Electronic Material Co., Ltd.,, google_translate, 东莞市中洁电子材料有限公司]]
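
To illustrate how an array-valued attribute might be consumed, here is a minimal Python sketch that groups the entries of a V2 name array by their context field. The record is hand-built from the third sample row above, with null fields omitted; `names_by_context` is a hypothetical helper, and the record is assumed to have been loaded into plain dicts and lists by whatever Parquet reader is in use.

```python
def names_by_context(name_array):
    """Group name values by their `context` field (e.g. primary, transliteration)."""
    grouped = {}
    for item in name_array or []:  # a null array is treated as empty
        grouped.setdefault(item.get("context"), []).append(item.get("value"))
    return grouped


# Hand-built from the third sample row above; null fields are omitted.
record = {
    "entity_id": "---7mC9Y8c_ME2hhpMGD3w",
    "name": [
        {
            "extra": {"Transliterated By": "Sayari"},
            "value": "Dong Guan Shi Zhong Ji Dian Zi Cai Liao You Xian Gong Si",
            "context": "transliteration",
        },
        {"value": "东莞市中洁电子材料有限公司", "context": "primary"},
        {
            "value": "Dongguan Zhongjie Electronic Material Co., Ltd.",
            "context": "google_translate",
            "original": "东莞市中洁电子材料有限公司",
        },
    ],
}

grouped = names_by_context(record["name"])
```

With this grouping, `grouped["primary"]` holds the original-script name and `grouped["google_translate"]` the English translation.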

Summary fields include:

  • type: type of entity
  • label: best name for entity
  • label_en: best ASCII name for entity, if one exists
  • num_documents: number of underlying documents entity was extracted from
  • sanctioned: whether the entity is sanctioned
  • pep: whether the entity is a politically exposed person
  • degree: number of distinct neighboring entities
  • closed: whether or not an entity is closed (relevant for company entities only)
  • edge_counts: counts the number of neighbors per edge type
  • sources: the distinct data sources where an entity has been cited

The attribute fields are the same as in the V1 schema with a few exceptions:

  • the identifier and weak identifier attributes have been combined into one column, in order to better represent how they are shown in the application
  • the name and translated name attributes have been combined into one column, in order to better represent how they are shown in the application
  • the position attribute column has been removed, as it is an attribute that is only relevant in relationships data
  • the person status attribute column has been removed, as it has been deprecated and is always null

See the schema below for more information on how the attributes data is structured.

The V2 schema also includes risk factor information. The schema below can be used to demonstrate the data structure of risk factors, but may not list all of them, as new risk factors are being added over time. A complete list of risk factors can be found here.

root
 |-- entity_id: string (nullable = true)
 |-- type: string (nullable = true)
 |-- label: string (nullable = true)
 |-- label_en: string (nullable = true)
 |-- num_documents: long (nullable = true)
 |-- sanctioned: boolean (nullable = true)
 |-- pep: boolean (nullable = true)
 |-- degree: long (nullable = true)
 |-- closed: string (nullable = true)
 |-- edge_counts: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- in: long (nullable = true)
 |    |    |-- out: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |-- risk_factors: struct (nullable = true)
 |    |-- sanctioned: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- pep: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- cpi_score: struct (nullable = true)
 |    |    |-- value: double (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- eu_high_risk_third: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- basel_aml: struct (nullable = true)
 |    |    |-- value: double (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_modern_slavery: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- high_risk_cash_intensive_business_purpose: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- state_owned: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- formerly_sanctioned: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_terrorism: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_organized_crime: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_financial_crime: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_bribery_and_corruption: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_other: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- reputational_risk_cybercrime: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- regulatory_action: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- law_enforcement_action: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- export_controls: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- psa_sanctioned: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- psa_pep: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- sanctioned_distance: struct (nullable = true)
 |    |    |-- value: double (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- pep_distance: struct (nullable = true)
 |    |    |-- value: double (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |-- xinjiang_geospatial: struct (nullable = true)
 |    |    |-- value: boolean (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |-- name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- language: string (nullable = true)
 |    |    |-- context: string (nullable = true)
 |    |    |-- original: string (nullable = true)
 |-- identifier: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- date_of_birth: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |-- country: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- context: string (nullable = true)
 |    |    |-- state: string (nullable = true)
 |-- shares: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- num_shares: double (nullable = true)
 |    |    |-- monetary_value: double (nullable = true)
 |    |    |-- currency: string (nullable = true)
 |    |    |-- percentage: double (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- address: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- language: string (nullable = true)
 |    |    |-- house: string (nullable = true)
 |    |    |-- house_number: string (nullable = true)
 |    |    |-- po_box: string (nullable = true)
 |    |    |-- building: string (nullable = true)
 |    |    |-- entrance: string (nullable = true)
 |    |    |-- staircase: string (nullable = true)
 |    |    |-- level: string (nullable = true)
 |    |    |-- unit: string (nullable = true)
 |    |    |-- road: string (nullable = true)
 |    |    |-- metro_station: string (nullable = true)
 |    |    |-- suburb: string (nullable = true)
 |    |    |-- city_district: string (nullable = true)
 |    |    |-- city: string (nullable = true)
 |    |    |-- state_district: string (nullable = true)
 |    |    |-- island: string (nullable = true)
 |    |    |-- state: string (nullable = true)
 |    |    |-- postcode: string (nullable = true)
 |    |    |-- country_region: string (nullable = true)
 |    |    |-- country: string (nullable = true)
 |    |    |-- world_region: string (nullable = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- near: string (nullable = true)
 |    |    |-- x: double (nullable = true)
 |    |    |-- y: double (nullable = true)
 |    |    |-- precision_code: string (nullable = true)
 |-- additional_information: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- business_purpose: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- code: string (nullable = true)
 |    |    |-- standard: string (nullable = true)
 |-- company_type: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |-- gender: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |-- finances: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: double (nullable = true)
 |    |    |-- context: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- currency: string (nullable = true)
 |-- contact: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- status: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- context: string (nullable = true)
 |-- sources: array (nullable = true)
 |    |-- element: string (containsNull = true)

# Relationships Data

The relationships files contain information about relationships between the entities in the entities data. Their structure is the same in V1 and V2. The three key fields are src, dst, and type:

src                     dst                     type
X1hfIDVLf09lSB_FoK7osw  nUVBufHeWAaa7UilnXTj3g  SHAREHOLDER_OF
_-UUAFZzEKD0-BTQyzbJyw  4k6j6CKzVIgWdYUV-6OhsA  SHAREHOLDER_OF
FWW-PkGN0mXMuxLznOL9kQ  QzFDUYCAkV1tmhO4x7tqfg  LINKED_TO
QhMECOtIAN9FZQ2N8vX9Hw  7Op53EC0EOl1M38FWcd75A  SHAREHOLDER_OF
kBmQaD7KoDMCad2IpjbaYw  _86dvjUnV8whpBJwHdm4rQ  LEGAL_REPRESENTATIVE_OF

In this example, the first row indicates that the entity with entity_id X1hfIDVLf09lSB_FoK7osw is a shareholder of the entity with entity_id nUVBufHeWAaa7UilnXTj3g. Rows in this table are unique across these three fields, so there is only a single row with src = X1hfIDVLf09lSB_FoK7osw, dst = nUVBufHeWAaa7UilnXTj3g, and type = SHAREHOLDER_OF.
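
Because rows are unique on (src, dst, type), that triple can serve as a lookup key. The following is a minimal sketch using only the Python standard library. It assumes the CSV layout described earlier (header row, comma delimiter); the function name index_relationships is hypothetical, and the sample rows are copied from the table above.

```python
import csv
import io

# Sample rows copied from the table above, laid out as CSV with a header row.
SAMPLE_CSV = """src,dst,type
X1hfIDVLf09lSB_FoK7osw,nUVBufHeWAaa7UilnXTj3g,SHAREHOLDER_OF
_-UUAFZzEKD0-BTQyzbJyw,4k6j6CKzVIgWdYUV-6OhsA,SHAREHOLDER_OF
FWW-PkGN0mXMuxLznOL9kQ,QzFDUYCAkV1tmhO4x7tqfg,LINKED_TO
"""


def index_relationships(csv_text):
    """Index rows by (src, dst, type); raise ValueError if a key repeats."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["src"], row["dst"], row["type"])
        if key in index:
            raise ValueError(f"duplicate relationship key: {key}")
        index[key] = row
    return index


relationships = index_relationships(SAMPLE_CSV)
key = ("X1hfIDVLf09lSB_FoK7osw", "nUVBufHeWAaa7UilnXTj3g", "SHAREHOLDER_OF")
print(key in relationships)  # True
```

For real exports, the same function could be fed the decompressed text of each gzipped relationships file; the duplicate check then doubles as a sanity test on the download.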

The relationships files contain several date fields:

  • date
  • from_date
  • to_date

These fields, when populated, describe the time period for which the relationship is valid. Relationships also have the following attribute fields:

  • position
  • additional_information
  • shares
  • business_purpose

See the full relationships schema below for more details on these attribute fields.
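
The from_date and to_date fields described above can be used to test whether a relationship was valid on a given day. The sketch below is illustrative only. It assumes that populated dates are full YYYY-MM-DD strings and that an unpopulated field is empty or null; neither is stated in this document, so adjust the parsing if the data contain partial dates. The helper names parse_day and is_active_on, and the sample row, are hypothetical.

```python
from datetime import date


def parse_day(value):
    """Return a date for a 'YYYY-MM-DD' string, or None if unpopulated.

    Assumption: populated dates are full ISO dates; other formats raise ValueError.
    """
    if not value:
        return None
    return date.fromisoformat(value)


def is_active_on(row, day):
    """Return False only if from_date/to_date place `day` outside the period."""
    start = parse_day(row.get("from_date"))
    end = parse_day(row.get("to_date"))
    if start is not None and day < start:
        return False
    if end is not None and day > end:
        return False
    return True


# Hypothetical row; the dates are invented for illustration.
row = {"from_date": "2015-03-01", "to_date": "2018-06-30"}
print(is_active_on(row, date(2016, 1, 1)))  # True
print(is_active_on(row, date(2019, 1, 1)))  # False
```

A row with neither field populated is treated as active here, since nothing in the row rules the day out; a stricter policy may suit some analyses better.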

The final relationship field is match_keys. This field is populated only when type = POSSIBLY_SAME_AS, which indicates that two entities are possibly the same entity. An example of data in the match_keys field is as follows:

[
    {
        "key": "house_number",
        "value": "5",
        "entity1": "5",
        "entity2": "5"
    },
    {
        "key": "road",
        "value": "ROOSEVELT STR YALTA CRIMEA",
        "entity1": "Roosevelt Str. Yalta Crimea",
        "entity2": "Roosevelt Str. Yalta Crimea"
    },
    {
        "key": "postcode",
        "value": "98600",
        "entity1": "98600",
        "entity2": "98600"
    },
    {
        "key": "name",
        "value": "YALTA MERCHANT SEA PORT",
        "entity1": "YALTA MERCHANT SEA PORT",
        "entity2": "Yalta Merchant Sea Port"
    }
]

Each item in the array indicates a field that matched between the two entities. key gives the field name, value gives the normalized value, entity1 gives the value for the src entity, and entity2 gives the value for the dst entity. The above sample illustrates that the two entities are possibly the same due to a shared name and partial address match.
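
As a rough illustration of how the match_keys array might be consumed, the sketch below parses a JSON-serialized value (the form it takes in CSV exports) and reports, for each matched field, the normalized value and whether the two entities' raw values were identical. The function name summarize_match_keys is hypothetical; the sample items are taken from the example above.

```python
import json

# Two items taken from the match_keys example above.
MATCH_KEYS_JSON = """[
  {"key": "name", "value": "YALTA MERCHANT SEA PORT",
   "entity1": "YALTA MERCHANT SEA PORT", "entity2": "Yalta Merchant Sea Port"},
  {"key": "postcode", "value": "98600", "entity1": "98600", "entity2": "98600"}
]"""


def summarize_match_keys(raw):
    """Map each matched field to (normalized value, raw values identical?)."""
    summary = {}
    for item in json.loads(raw):
        identical = item["entity1"] == item["entity2"]
        summary[item["key"]] = (item["value"], identical)
    return summary


summary = summarize_match_keys(MATCH_KEYS_JSON)
print(summary["name"])      # ('YALTA MERCHANT SEA PORT', False)
print(summary["postcode"])  # ('98600', True)
```

The name field shows why the normalized value is useful: the raw values differ only in letter case, yet both normalize to the same string.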

# Relationships Schema

Below is the full schema for relationships files when read in parquet format. The CSV files (available in V1) have the same fields, but with complex fields (lists, maps, etc.) serialized as JSON strings. The structure is identical between the V1 and V2 schemas.

root
 |-- src: string (nullable = true)
 |-- dst: string (nullable = true)
 |-- type: string (nullable = true)
 |-- date: string (nullable = true)
 |-- from_date: string (nullable = true)
 |-- to_date: string (nullable = true)
 |-- position: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |-- additional_information: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- shares: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extra: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- date: string (nullable = true)
 |    |    |-- from_date: string (nullable = true)
 |    |    |-- to_date: string (nullable = true)
 |    |    |-- num_shares: double (nullable = true)
 |    |    |-- monetary_value: double (nullable = true)
 |    |    |-- currency: string (nullable = true)
 |    |    |-- percentage: double (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- match_keys: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- entity1: string (nullable = true)
 |    |    |-- entity2: string (nullable = true)
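
When reading the CSV form of these files, the array columns in the schema above arrive as JSON strings and need to be decoded before use. The following is a minimal sketch under a few assumptions: an unpopulated column is an empty string, embedded double quotes are escaped by doubling (the default for Python's csv module), and only the array columns shown in the schema are decoded. The function name decode_row and the sample row, including its IDs and share values, are invented for illustration.

```python
import csv
import io
import json

# Columns the schema above lists as arrays; in CSV they are JSON strings.
COMPLEX_COLUMNS = ("position", "additional_information", "shares", "match_keys")

# Hypothetical single-row CSV; IDs and share values are invented.
SAMPLE_CSV = (
    'src,dst,type,shares\n'
    'A,B,SHAREHOLDER_OF,"[{""percentage"": 25.0, ""currency"": ""USD""}]"\n'
)


def decode_row(row):
    """Return a copy of `row` with complex columns parsed from JSON.

    Unpopulated or absent complex columns become empty lists.
    """
    decoded = dict(row)
    for column in COMPLEX_COLUMNS:
        text = decoded.get(column)
        decoded[column] = json.loads(text) if text else []
    return decoded


row = decode_row(next(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print(row["shares"][0]["percentage"])  # 25.0
```

With parquet exports this step is unnecessary, since complex data are already native structures.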