# Ontology Resolution Overview

When we identify entities in public records, we extract as much information as we can about them, from dates of birth to corporate ID numbers, addresses, and alternate names. In the best circumstances, a given document will provide a unique, consistently formatted identification number that appears elsewhere and allows us to track a particular entity from document to document.

To benefit from these diverse identifying attributes, Sayari’s data scientists and analysts have created a series of rules around entity resolution – the merging of two or more entities based on specific shared characteristics. Each time Graph receives new data, potential matches are checked against these rules, and entities are resolved again, allowing the system to improve with each new data source.

In some cases, however, we may only have a name and address for an entity, or just a name. To avoid merging false positives while still surfacing likely matches, we maintain a separate set of rules that require less evidence than resolution does. When one of these rules is met, Graph generates a Possibly Same As relationship between two or more entities.

The sections below contain our rules for entity resolution and for the generation of Possibly Same As relationships. Each rule is sufficient on its own to trigger a resolution or the generation of a relationship.

# Entity Resolution Rules

# Unique identifier

Andrés Manuel López Obrador has RFC (Mexican tax ID number) LOOA531113F15. Because RFC numbers are unique identifiers, any person sharing this RFC number will be merged with his profile.

# Notes

  • Identifiers qualify as unique if they apply to only one entity. Examples include U.S. social security numbers, Mexican RFC numbers, and Chinese company names. Non-unique identifiers are, in isolation, rarely used for entity resolution.
  • This rule is transitive. If Andrés Manuel's profile shares a second unique identifier with a third profile, all three are merged.
  • Merger occurs regardless of entity name.
  • Entity type acts as a fail-safe. People will not be merged with companies.
  • Graph will not resolve between otherwise-identical identifiers of different types (e.g. Peru company ID 12345 and Togo company ID 12345).
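
The notes above can be illustrated with a small union-find sketch. This is not Graph's implementation; the profile keys, the `entity_type` and `identifiers` fields, and the identifier type names are all hypothetical. Keying each identifier on entity type, identifier type, and value is one way to get transitive merging while keeping people apart from companies and Peru ID 12345 apart from Togo ID 12345.

```python
from collections import defaultdict


def resolve_by_unique_id(profiles):
    """Group profile keys that share a unique identifier (illustrative only).

    `profiles` maps a hypothetical profile key to a dict with
    'entity_type' and 'identifiers' (a set of (id_type, value) pairs).
    """
    parent = {key: key for key in profiles}

    def find(key):
        # Follow parent links to the group representative (path halving).
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a, b):
        parent[find(a)] = find(b)

    # Bucket on entity type, identifier type, and value, so identical values
    # of different identifier types or entity types never collide.
    first_seen = {}
    for key, profile in profiles.items():
        for id_type, value in profile["identifiers"]:
            bucket = (profile["entity_type"], id_type, value)
            if bucket in first_seen:
                union(key, first_seen[bucket])
            else:
                first_seen[bucket] = key

    groups = defaultdict(set)
    for key in profiles:
        groups[find(key)].add(key)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: p1 and p2 share an RFC, p2 and p3 share another ID.
example_profiles = {
    "p1": {"entity_type": "person", "identifiers": {("mx_rfc", "LOOA531113F15")}},
    "p2": {"entity_type": "person",
           "identifiers": {("mx_rfc", "LOOA531113F15"), ("other_unique_id", "111")}},
    "p3": {"entity_type": "person", "identifiers": {("other_unique_id", "111")}},
    "c1": {"entity_type": "company", "identifiers": {("peru_company_id", "12345")}},
    "c2": {"entity_type": "company", "identifiers": {("togo_company_id", "12345")}},
}
merged = resolve_by_unique_id(example_profiles)
# merged == [["c1"], ["c2"], ["p1", "p2", "p3"]]
```

Names play no part in the grouping, which mirrors the note that merger occurs regardless of entity name.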

# Chinese company name

If two companies have identical names that consist solely of Chinese characters and are at least eight characters long, the companies will be merged.
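
A minimal sketch of this check, assuming "Chinese characters" means code points in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that names are compared exactly. Both are assumptions for illustration; the source does not say how the production rule treats extension blocks, punctuation, or whitespace.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts.
HAN_ONLY_8_PLUS = re.compile(r"[\u4e00-\u9fff]{8,}")


def qualifies_as_chinese_name(name: str) -> bool:
    """True if the whole name is eight or more Chinese characters."""
    return HAN_ONLY_8_PLUS.fullmatch(name) is not None


def should_merge_companies(name_a: str, name_b: str) -> bool:
    """Illustrative rule: identical names that both qualify."""
    return name_a == name_b and qualifies_as_chinese_name(name_a)


# Made-up company names for demonstration.
long_name = "示例国际贸易有限公司"   # 10 Chinese characters
short_name = "示例公司"              # 4 Chinese characters
```

Here `should_merge_companies(long_name, long_name)` is true, while the short name and any name containing Latin characters fail the check.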

# Relationship-target pair

Lumber LLC has a unique identifier, and a document shows that Jack Lumber (no ID) is a director. If another person named Jack Lumber appears in another document as a director of Lumber LLC, the two Jack Lumber profiles are merged.

# Notes

  • Merger occurs regardless of whether Jack Lumber has an identifier or not.
  • Entity type, relationship type, and relationship directionality are taken into consideration.
  • If the two Jack Lumbers have different unique identifiers (UK person numbers excluded), the merger will not occur.
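
The rule and its notes can be sketched as a comparison of two mentions. The `Mention` fields, the identifier type names, and the reading of "different unique identifiers" as conflicting values for the same identifier type are assumptions made for this example, not a description of Graph's internals.

```python
from dataclasses import dataclass

# Hypothetical label for the identifier type the rule exempts.
IGNORED_ID_TYPES = {"uk_person_number"}


@dataclass(frozen=True)
class Mention:
    name: str
    entity_type: str
    relationship: str      # e.g. "director_of"
    direction: str         # "outgoing" or "incoming" relative to the entity
    target_id: str         # unique identifier of the relationship target
    unique_ids: frozenset = frozenset()  # (id_type, value) pairs, may be empty


def blocking_ids(mention):
    """Unique identifiers that can veto a merge, keyed by identifier type."""
    return {t: v for t, v in mention.unique_ids if t not in IGNORED_ID_TYPES}


def should_merge(a: Mention, b: Mention) -> bool:
    same_pair = (
        (a.name, a.entity_type, a.relationship, a.direction, a.target_id)
        == (b.name, b.entity_type, b.relationship, b.direction, b.target_id)
    )
    if not same_pair:
        return False
    ids_a, ids_b = blocking_ids(a), blocking_ids(b)
    # Conflicting values for a shared identifier type block the merge.
    return all(ids_a[t] == ids_b[t] for t in ids_a.keys() & ids_b.keys())


jack_1 = Mention("Jack Lumber", "person", "director_of", "outgoing", "lumber-llc")
jack_2 = Mention("Jack Lumber", "person", "director_of", "outgoing", "lumber-llc")
```

With this sketch, `should_merge(jack_1, jack_2)` is true even though neither profile carries an identifier of its own.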

# Whitelisted non-unique identifier

Mohammed Chams (unknown passport 12345) is linked to Trade Shipments, LLC in China. Mohammed is merged with two other people named Mohammed Chams (unknown passport 12345) linked to five other companies in different countries.

# Notes

  • Only three types of non-unique identifiers currently qualify under this rule: Brazilian OAB numbers, partial Brazilian CPF numbers (as redacted by the Brazilian government in CNPJ data), and passports where the issuing country is unknown.
  • This process involves name component sorting (standardizes case, white space, and order).
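
Name component sorting can be sketched in a few lines. The function names and the identifier type labels below are hypothetical, and requiring a matching name alongside the identifier is a reading of the example above rather than a documented detail.

```python
# Hypothetical labels for the three whitelisted identifier types.
WHITELISTED_ID_TYPES = {"br_oab", "br_cpf_partial", "passport_unknown_country"}


def name_key(name: str) -> str:
    """Standardize case, white space, and component order."""
    return " ".join(sorted(name.casefold().split()))


def whitelisted_match(a, b) -> bool:
    """Compare two (name, id_type, id_value) tuples under this rule."""
    name_a, type_a, value_a = a
    name_b, type_b, value_b = b
    return (
        type_a in WHITELISTED_ID_TYPES
        and (type_a, value_a) == (type_b, value_b)
        and name_key(name_a) == name_key(name_b)
    )


key = name_key("  Chams   MOHAMMED ")
# key == "chams mohammed"
```

Because the components are sorted, "Mohammed Chams" and "CHAMS Mohammed" produce the same key.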

# Person name and address

Florida Company A, Paraguay Company B, and Luxembourg Company C are linked to an individual named Zhang Wei at 123 Oak St. #56, London, England. The three Zhang Wei profiles are merged.

# Notes

  • This process involves name component sorting (standardizes case, white space, and order).
  • Address matching takes place in two stages. First, the person’s name must match, along with at least one of the address’s postal code, house number, or road name. Second, for candidates that pass the first stage, we compare the full address.
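
The two-stage check could look roughly like the sketch below. It assumes addresses have already been parsed into components, and it treats the full-address comparison as a case- and whitespace-insensitive equality test. Both are simplifications chosen for illustration; the `Address` type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    house_number: Optional[str]
    road: Optional[str]
    postal_code: Optional[str]
    full: str


def name_key(name: str) -> str:
    """Standardize case, white space, and component order."""
    return " ".join(sorted(name.casefold().split()))


def normalize(text: str) -> str:
    return " ".join(text.casefold().split())


def passes_stage_one(name_a, addr_a, name_b, addr_b) -> bool:
    """Names match and at least one address component matches."""
    if name_key(name_a) != name_key(name_b):
        return False
    pairs = [
        (addr_a.postal_code, addr_b.postal_code),
        (addr_a.house_number, addr_b.house_number),
        (addr_a.road, addr_b.road),
    ]
    return any(x is not None and x == y for x, y in pairs)


def same_person(name_a, addr_a, name_b, addr_b) -> bool:
    """Stage two runs only for candidates that pass stage one."""
    return (
        passes_stage_one(name_a, addr_a, name_b, addr_b)
        and normalize(addr_a.full) == normalize(addr_b.full)
    )


oak_56 = Address("123", "Oak St.", None, "123 Oak St. #56, London, England")
oak_57 = Address("123", "Oak St.", None, "123 Oak St. #57, London, England")
```

In this sketch, two Zhang Wei profiles at `oak_56` merge, while a Zhang Wei at `oak_57` passes the first stage but fails the full-address comparison.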

# Possibly Same As Relationship Generation Rules

# Company name and address

American Company Inc. (referenced in corporate data from Florida) and American Company Inc. (referenced in tax data from Paraguay) are both located at 350 Oak Rd., Miami Beach, FL. Graph draws a Possibly Same As relationship between the two based on their shared name and address.

# Notes

  • We currently resolve and merge people based on a shared name and address, but not companies. While people rarely change their most common identifiers (name, date of birth, ID number), companies often change names, legal structures, addresses, and ID numbers, and holding companies frequently share nearly identical names with their subsidiaries. To avoid merging false positives together, our resolution rules for companies are therefore more conservative than for people.

# Person name and date of birth

Victoria Caroline Beckham, born in April 1974, appears in three UK corporate data sources. Graph draws a Possibly Same As relationship between the three based on their shared name and date of birth.

# Notes

  • The relationship may be generated from either a full (dd-mm-yyyy) or a partial (mm-yyyy) date of birth.
  • The dates may differ in precision (e.g. 1 April 1974 and April 1974), but must not conflict.
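
One way to express "may differ in format but may not conflict" is to treat a missing day as a wildcard. The `BirthDate` type below is a hypothetical representation used only for this sketch.

```python
from typing import NamedTuple, Optional


class BirthDate(NamedTuple):
    year: int
    month: int
    day: Optional[int] = None  # None for an mm-yyyy date


def dates_compatible(a: BirthDate, b: BirthDate) -> bool:
    """True when the two dates agree wherever both specify a value."""
    if (a.year, a.month) != (b.year, b.month):
        return False
    return a.day is None or b.day is None or a.day == b.day


full = BirthDate(1974, 4, 1)   # 1 April 1974
partial = BirthDate(1974, 4)   # April 1974
```

Here `dates_compatible(full, partial)` is true, while 1 April 1974 and 2 April 1974 conflict and would not produce a relationship under this sketch.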

# Name + reference target match

Three people named Jose Mauricio Bringas Reyes are mentioned in records about the Mexican company Constructora Reyes. There is not sufficient evidence to resolve the three people. Graph draws a Possibly Same As relationship between the three based on their shared name and reference in records mentioning the same company.

# Notes

  • This is a weaker version of the ‘Relationship-target pair’ resolution rule. In this version, the relationships between each entity in question (Jose) and the reference target (Constructora Reyes) may differ (e.g. shareholder vs. auditor).
  • The entity in question (Jose) need only be mentioned in the same document as the reference target (Constructora Reyes), not necessarily linked to it.
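
As a rough sketch, this rule amounts to grouping profiles by name and reference target while ignoring relationship type. The tuple layout and the identifiers below are hypothetical, and names are compared exactly for simplicity.

```python
from collections import defaultdict
from itertools import combinations


def possibly_same_as(mentions):
    """Return sorted profile pairs that share a name and a reference target.

    `mentions` is an iterable of (profile_id, name, target_id) tuples; the
    relationship type, if any, is deliberately not part of the key.
    """
    by_key = defaultdict(set)
    for profile_id, name, target_id in mentions:
        by_key[(name, target_id)].add(profile_id)

    pairs = set()
    for profile_ids in by_key.values():
        pairs.update(combinations(sorted(profile_ids), 2))
    return sorted(pairs)


example_mentions = [
    ("jose-1", "Jose Mauricio Bringas Reyes", "constructora-reyes"),
    ("jose-2", "Jose Mauricio Bringas Reyes", "constructora-reyes"),
    ("jose-3", "Jose Mauricio Bringas Reyes", "constructora-reyes"),
    ("jose-4", "Jose Mauricio Bringas Reyes", "another-company"),
]
links = possibly_same_as(example_mentions)
# links == [("jose-1", "jose-2"), ("jose-1", "jose-3"), ("jose-2", "jose-3")]
```

The fourth profile shares the name but not the reference target, so it receives no link.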

# Co-director grouping

Large Sun Holding Corp. and Freight Forwarders LLC both have three directors named Mohamed Abdallah, Philippe Lasalle, and Peng Yongqing. For each of the three names, Graph draws a Possibly Same As relationship between the matching director profiles, based on their mutually shared and co-occurring names tied to separate entities.

# Notes

  • Currently only active in Mexican and Lebanese corporate data sources. We hope to expand the functionality going forward.
  • Requires a minimum of three qualifying entities.
  • Specialized logic exists to avoid matching based on hyper-connected nodes. If KPMG is listed as an auditor for 30% of all Lebanese companies, for instance, we will not draw Possibly Same As relationships to or from it based on this rule, nor will we include it in our grouping calculations.
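
The grouping and the hyper-connected-node filter can be sketched as follows. The `min_shared` and `max_share` parameters are hypothetical: the source states the minimum of three qualifying entities but does not give the threshold at which a node counts as hyper-connected, so the 0.3 default is only a placeholder borrowed from the KPMG example.

```python
from collections import Counter
from itertools import combinations


def co_director_links(boards, min_shared=3, max_share=0.3):
    """Return (name, company_a, company_b) links for co-occurring names.

    `boards` maps a company name to the set of names linked to it.
    Names linked to at least `max_share` of all companies are treated as
    hyper-connected and excluded from both grouping and linking.
    """
    counts = Counter(name for names in boards.values() for name in set(names))
    limit = max_share * len(boards)
    hubs = {name for name, count in counts.items() if count >= limit}

    links = []
    for co_a, co_b in combinations(sorted(boards), 2):
        shared = (set(boards[co_a]) & set(boards[co_b])) - hubs
        if len(shared) >= min_shared:
            links.extend((name, co_a, co_b) for name in sorted(shared))
    return links


# Made-up data: two companies share three directors; KPMG appears everywhere.
example_boards = {
    "Large Sun Holding Corp.": {"Mohamed Abdallah", "Philippe Lasalle", "Peng Yongqing", "KPMG"},
    "Freight Forwarders LLC": {"Mohamed Abdallah", "Philippe Lasalle", "Peng Yongqing", "KPMG"},
}
for i in range(1, 9):
    example_boards[f"Filler Co {i}"] = {"KPMG", f"Director {i}"}

example_links = co_director_links(example_boards)
```

In this example only the three shared directors are linked across the two companies; KPMG is filtered out before the shared-name count is taken.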