# Data Model

## High Level

### Corporate Network

In this example, we'll look at how source documents from public UK company registries are modeled using Sayari's ontology.

The image below shows a simple network of companies (yellow) and people (blue).

*[Image: network of companies and people]*

The entities are connected by several different relationship types, based on the data in the registry. The highlighted entity in the bottom right (VICTORIA BECKHAM LIMITED) has a number of attributes, shown below.

### Attributes

*[Image: attributes of VICTORIA BECKHAM LIMITED]*

The bolded text lists attribute types (Name, Address, etc.). Entities and relationships can have multiple attributes of any given type. For example, VICTORIA BECKHAM LIMITED has three names and two addresses.

Note that each attribute can contain a number of fields. For example, the shares attribute has several fields that can be used to characterize a shareholding relationship.
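To make this shape concrete, here is a minimal Python sketch, assuming an in-memory representation in which each attribute type maps to a list of values and each value is a dictionary of named fields. The class, its method names, the example company, and the field names used for `shares` (`num_shares`, `share_class`, `percentage`) are all hypothetical illustrations, not Sayari's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    # attribute type -> list of values; each value is a dict of named fields
    attributes: dict[str, list[dict[str, Any]]] = field(default_factory=dict)

    def add(self, attr_type: str, **fields: Any) -> None:
        """Append one more value of `attr_type`; earlier values are kept."""
        self.attributes.setdefault(attr_type, []).append(fields)

    def get(self, attr_type: str) -> list[dict[str, Any]]:
        """Return every value of `attr_type` (empty list if there are none)."""
        return self.attributes.get(attr_type, [])


# Hypothetical example data: two names, one address, one multi-field shares value.
company = Entity("ent-001", "company")
company.add("name", value="EXAMPLE HOLDINGS LIMITED")
company.add("name", value="EXAMPLE HOLDINGS (UK) LIMITED")
company.add("address", value="1 Example Street, London")
company.add("shares", num_shares=100, share_class="ORDINARY", percentage=50.0)

for attr_type, values in company.attributes.items():
    print(attr_type, len(values))
```

Keeping every value in a list, rather than overwriting, is what lets one entity hold several names or addresses at once.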

Attributes for a single entity can be drawn from a large number of records. For example, the first address listed (Unit 33, Ransomes Dock Business Centre, 35-37 Parkgate Road, London SW11 4NP) was found in 6 records. Sayari uses several entity resolution techniques to identify cases where the same entity is mentioned in multiple records.
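Once several records have been resolved to the same entity, their attribute values can be pooled. The sketch below assumes a simplified input of `(record_id, attr_type, value)` tuples with hypothetical record IDs, and counts how many distinct records support each value, in the spirit of the "found in 6 records" figure above. It compares values as exact strings; a real pipeline would likely normalize them first.

```python
from collections import defaultdict


def aggregate_attributes(mentions):
    """Pool attribute values from records already resolved to one entity.

    `mentions` is a list of (record_id, attr_type, value) tuples. Returns
    (attr_type, value, record_count) rows, best-supported values first.
    Values are compared as exact strings (no normalization).
    """
    support = defaultdict(set)  # (attr_type, value) -> IDs of supporting records
    for record_id, attr_type, value in mentions:
        support[(attr_type, value)].add(record_id)
    rows = [(t, v, len(ids)) for (t, v), ids in support.items()]
    # Most-supported first; ties broken alphabetically for stable output.
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


ADDRESS = "Unit 33, Ransomes Dock Business Centre, 35-37 Parkgate Road, London SW11 4NP"

# Hypothetical record IDs: the address appears in six distinct records,
# and rec-1 mentions it twice (it is still counted once).
mentions = [(f"rec-{i}", "address", ADDRESS) for i in range(1, 7)]
mentions.append(("rec-1", "address", ADDRESS))
mentions.append(("rec-2", "name", "VICTORIA BECKHAM LIMITED"))

for attr_type, value, count in aggregate_attributes(mentions):
    print(f"{attr_type}: {value} ({count} records)")
```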

### Relationship Attributes

Relationships can also have attributes:

*[Images: a relationship and its attributes]*

Here, the position attribute carries useful information from the original source, which in this case is the UK PSC (People with Significant Control) register.
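A relationship can be pictured as a typed link between two entity IDs that carries its own attributes. This is a minimal sketch under that assumption; the relationship type names, entity IDs, and the wording of the `position` value are hypothetical placeholders, not values taken from the PSC register or from Sayari's list of relationship types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Relationship:
    source_id: str
    target_id: str
    rel_type: str
    # attribute type -> list of values; each value is a dict of named fields
    attributes: dict[str, list[dict[str, Any]]] = field(default_factory=dict)


def with_position(relationships: list[Relationship], text: str) -> list[Relationship]:
    """Return relationships whose `position` attribute contains `text` (case-insensitive)."""
    needle = text.lower()
    return [
        rel
        for rel in relationships
        if any(needle in str(p.get("value", "")).lower()
               for p in rel.attributes.get("position", []))
    ]


# Hypothetical IDs, types, and position wording, for illustration only.
psc_link = Relationship(
    source_id="person-001",
    target_id="company-001",
    rel_type="person_with_significant_control_of",
    attributes={"position": [{"value": "Person with significant control"}]},
)
director_link = Relationship("person-002", "company-001", "director_of")

print([r.source_id for r in with_position([psc_link, director_link], "significant control")])
```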

### Fuzzy Entity Resolution

As mentioned above, it is often possible to resolve two entities with such a high degree of confidence that we are comfortable saying they are the same entity. This type of resolution is what allows us to aggregate attributes from multiple records under a single entity like VICTORIA BECKHAM LIMITED. However, in other cases we have reason to believe that two entities are the same but are not confident enough to merge them.
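The merge-or-link decision can be illustrated with two confidence cut-offs: at or above the higher one, the entities are merged; between the two, they stay separate but are linked. The sketch below assumes a single similarity score in [0, 1]. The thresholds, the `possibly_same_as` label, the example names, and the token-overlap (Jaccard) name similarity are hypothetical stand-ins; this page does not describe how Sayari actually scores candidate matches.

```python
from enum import Enum


class Decision(Enum):
    MERGE = "merge"                        # treat as one entity
    POSSIBLY_SAME_AS = "possibly_same_as"  # keep separate, but link them
    DISTINCT = "distinct"                  # no link


# Hypothetical cut-offs chosen for illustration only.
MERGE_THRESHOLD = 0.95
LINK_THRESHOLD = 0.75


def name_similarity(a: str, b: str) -> float:
    """Jaccard overlap of upper-cased name tokens (a deliberately crude stand-in)."""
    ta, tb = set(a.upper().split()), set(b.upper().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def decide(score: float) -> Decision:
    """Map a match score to merge, link, or no link."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= MERGE_THRESHOLD:
        return Decision.MERGE
    if score >= LINK_THRESHOLD:
        return Decision.POSSIBLY_SAME_AS
    return Decision.DISTINCT


# Hypothetical names: the honorific lowers the token overlap to 3/4.
score = name_similarity("MR JOHN PAUL SMITH", "John Paul Smith")
print(score, decide(score))
```

A production matcher would typically weigh more evidence than a name alone; the point here is only the two-tier outcome.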

In these cases we produce a "possibly-same-as" relationship between the two entities in question. For example:

*[Image: a "possibly-same-as" relationship]*

In this case, VICTORIA CAROLINE BECKHAM of SPICE GIRLS LLP is connected to MR CONOR ROBERT DUFFICY through a shared association with a company, INSIDE TRACK PRODUCTIONS. A possible match for MR CONOR ROBERT DUFFICY was found in another registry, where his connection to HAMLET ASSOCIATES LIMITED is disclosed. The attributes for HAMLET ASSOCIATES LIMITED are shown below:

*[Image: attributes of HAMLET ASSOCIATES LIMITED]*

Although we started in the UK registry with VICTORIA BECKHAM, this information comes from an entirely different registry: the Maltese company registry. By providing these fuzzy "possibly-same-as" relationships, we are able to bridge the gap between registries in different countries.
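How these links bridge registries can be shown with a breadth-first search that either may or may not cross `possibly_same_as` edges. This is a minimal sketch, assuming an edge list of `(a, b, rel_type)` tuples; the node IDs and the `associated_with` relationship type are hypothetical labels for the entities in the example above.

```python
from collections import deque


def find_path(edges, start, goal, allow_possible=True):
    """Breadth-first search over an undirected view of the graph.

    `edges` is a list of (a, b, rel_type) tuples. When `allow_possible` is
    False, "possibly_same_as" edges are skipped, so the search only follows
    confidently resolved links. Returns the node list, or None if unreachable.
    """
    adjacency = {}
    for a, b, rel_type in edges:
        if rel_type == "possibly_same_as" and not allow_possible:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical IDs for the example entities; "_mt" marks the Maltese registry.
EDGES = [
    ("victoria_beckham", "inside_track", "associated_with"),
    ("dufficy_uk", "inside_track", "associated_with"),
    ("dufficy_uk", "dufficy_mt", "possibly_same_as"),
    ("dufficy_mt", "hamlet_mt", "associated_with"),
]

print(find_path(EDGES, "victoria_beckham", "hamlet_mt"))
print(find_path(EDGES, "victoria_beckham", "hamlet_mt", allow_possible=False))
```

Without the fuzzy edge the two registries form disconnected components, so the second search finds nothing.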

### API

The information depicted on this page comes from our API. You can see a worked example of API usage here.

## Low Level

This page contains a brief overview of Sayari's internal data storage model. Understanding our data model will make it easier to use the API.

The image below provides a visual aid:

*[Image: data model overview]*

### Entity

An entity (AKA node, vertex) is a discrete "thing" that Sayari provides information about. Usually it is a legal entity, company, or natural person.

You can see a list of entity types here.

### Relationship

A relationship (AKA link, edge) represents a connection between two entities.

You can see a list of relationship types here.

### Attributes

Attributes are facts about entities or relationships. For example, the name of an entity would be stored as an attribute.

You can see a list of attribute types here.

### Record

We use the term record to indicate a structured representation of a source document.

For example, we might have an HTML file as a source document. The record generated from this document would consist of all the entities and relationships found in the document, represented in a single, consistent format.

### Document (or "Source Document")

The raw, unstructured data that Sayari collects. It is processed into structured records and is usually retained for sourcing and provenance purposes.

At present, our documents are often HTML pages or PDFs.
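The concepts above can be tied together in a few Python dataclasses. This is a minimal sketch, assuming that a record keeps a link back to its source document and holds entities and relationships whose attributes map each attribute type to a list of field dictionaries. All class names, field names, relationship types, and IDs are hypothetical; they are not Sayari's storage schema. The `check_links` helper shows one consistency rule that such a structure makes easy to enforce.

```python
from dataclasses import dataclass, field
from typing import Any

# attribute type -> list of values; each value is a dict of named fields
Attributes = dict[str, list[dict[str, Any]]]


@dataclass
class Document:
    """Raw source material, retained for provenance."""
    document_id: str
    media_type: str  # e.g. "text/html" or "application/pdf"
    content: bytes


@dataclass
class Entity:
    entity_id: str
    entity_type: str  # e.g. "company" or "person"
    attributes: Attributes = field(default_factory=dict)


@dataclass
class Relationship:
    source_id: str
    target_id: str
    rel_type: str
    attributes: Attributes = field(default_factory=dict)


@dataclass
class Record:
    """Structured, consistently formatted representation of one document."""
    record_id: str
    document_id: str  # provenance: the Document this record came from
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

    def check_links(self) -> None:
        """Raise ValueError if a relationship references an entity not in this record."""
        known = {e.entity_id for e in self.entities}
        for rel in self.relationships:
            for end in (rel.source_id, rel.target_id):
                if end not in known:
                    raise ValueError(f"{rel.rel_type} references unknown entity {end!r}")


# Hypothetical example: one HTML page yielding one record.
doc = Document("doc-001", "text/html", b"<html>...</html>")
record = Record(
    record_id="rec-001",
    document_id=doc.document_id,
    entities=[
        Entity("e1", "company", {"name": [{"value": "EXAMPLE LIMITED"}]}),
        Entity("e2", "person", {"name": [{"value": "JANE DOE"}]}),
    ],
    relationships=[Relationship("e2", "e1", "director_of")],
)
record.check_links()  # passes: both ends of the relationship are present
```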

### Entity Resolution

Entities are resolved to the best of our ability. This means that when Sayari holds two different records that mention the same entity, we aim to produce only one entity.

An example of entity resolution is depicted in the image above. The top entity is found in two different records.

Entity resolution is not perfect, and we frequently store duplicate entities. Improving entity resolution is an ongoing goal.
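One simple way to group mentions from different records is union-find over shared match keys: any two mentions that share a key end up in the same entity, transitively. This is a minimal sketch, assuming exact-match keys such as a company number or a normalized name; the keys, the company number, and the mention IDs are hypothetical, and this is not a description of Sayari's resolution techniques. Because keys must match exactly, a variant spelling stays a separate entity, which is one way duplicates can arise.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path to the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def resolve(mentions: dict[str, set[str]]) -> list[list[str]]:
    """Group mention IDs that share any match key, transitively."""
    uf = UnionFind()
    owner: dict[str, str] = {}  # match key -> first mention seen with it
    for mention_id, keys in mentions.items():
        uf.find(mention_id)  # register mentions that share no keys
        for key in keys:
            if key in owner:
                uf.union(mention_id, owner[key])
            else:
                owner[key] = mention_id
    groups: dict[str, list[str]] = defaultdict(list)
    for mention_id in mentions:
        groups[uf.find(mention_id)].append(mention_id)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical mentions ("record/local-id") and match keys.
mentions = {
    "rec-1/e1": {"company_number:01234567"},
    "rec-2/e7": {"company_number:01234567", "name:EXAMPLE LIMITED"},
    "rec-3/e2": {"name:EXAMPLE LIMITED"},
    "rec-4/e5": {"name:EXAMPLE LTD"},  # variant spelling, no shared key
}
for group in resolve(mentions):
    print(group)
```

Here rec-1 and rec-3 share no key directly but are joined through rec-2, while the "EXAMPLE LTD" mention is left as a likely duplicate.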