# Ontology Resolution Overview

When we identify entities in public records, we extract as much information about them as we can, including dates of birth, corporate ID numbers, addresses, and additional names. In the best case, a record provides a unique, consistently formatted identification number that also appears in other records, allowing us to track a particular entity from one data source or country to another.

In reality, many data sources do not provide this level of detail. To account for this, we use a set of entity resolution rules, developed jointly by Sayari data engineers and analysts, that determine when two entities are merged based on shared characteristics. Each time we add new data to Graph, entities that look similar are checked against these rules and our profiles are resolved again, so the database improves with each data refresh cycle.

In some cases, however, we may not have enough information to determine whether two similar profiles are the same. To avoid merging false positives while still flagging potential matches that would otherwise be missed (false negatives), we apply a separate set of rules, short of the evidence needed for resolution, that generate a Possibly Same As relationship between two or more entities when met.

The sections below list our rules for entity resolution and for generating Possibly Same As relationships. Each rule is, on its own, sufficient to trigger a resolution or generate a relationship.

# Entity Resolution Rules

# Unique identifier

Andrés Manuel López Obrador has RFC (Mexican tax ID number) LOOA531113F15. Because RFC numbers are unique identifiers, any person sharing this RFC number will be merged with his profile.

# Notes

  • Identifiers qualify as unique if they apply to only one entity. Examples include U.S. social security numbers, Mexican RFC numbers, and Chinese company names. Non-unique identifiers are, in isolation, rarely used for entity resolution.
  • Resolution is transitive. If Andrés Manuel's profile shares a second unique identifier with a third profile, all three profiles are merged.
  • Merger occurs regardless of entity name.
  • Entity type acts as a fail-safe. People will not be merged with companies.
  • Graph will not resolve between otherwise-identical identifiers of different types (e.g. Peru company ID 12345 and Togo company ID 12345).
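The notes above describe three properties: transitivity, identifier keys scoped by type, and entity type as a fail-safe. The sketch below shows how those properties could fit together using union-find. It is illustrative only; the `Profile` class and identifier type labels such as `mx_rfc` are hypothetical, not Graph's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical minimal profile: an ID, an entity type, and a set of
    # (identifier_type, value) pairs treated as unique identifiers.
    pid: str
    entity_type: str  # e.g. "person" or "company"
    identifiers: set = field(default_factory=set)


def resolve_by_unique_id(profiles):
    """Group profiles that share any unique identifier.

    Union-find makes resolution transitive. The key includes the identifier
    type (so Peru ID 12345 never matches Togo ID 12345) and the entity type
    (so people are never merged with companies).
    """
    parent = {p.pid: p.pid for p in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (entity_type, id_type, value) -> first profile ID
    for p in profiles:
        for id_type, value in p.identifiers:
            key = (p.entity_type, id_type, value)
            if key in first_seen:
                parent[find(p.pid)] = find(first_seen[key])
            else:
                first_seen[key] = p.pid

    groups = {}
    for p in profiles:
        groups.setdefault(find(p.pid), []).append(p.pid)
    return sorted(sorted(g) for g in groups.values())
```

For example, if profile `a` shares an RFC with `b`, and `b` shares a different identifier with `c`, all three end up in one group, while two companies with the same number under different ID types stay apart.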

# Chinese company name

If two companies have identical names consisting solely of Chinese characters, and those names are at least eight characters long, the companies will be merged.
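A minimal sketch of this check follows. It assumes "Chinese characters" means the CJK Unified Ideographs block plus Extension A; the exact character set Graph uses is not documented here, and the function names are hypothetical.

```python
import re

# Assumption: Chinese characters approximated by CJK Unified Ideographs
# (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF).
_CJK_ONLY = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")
MIN_CHARS = 8


def is_qualifying_chinese_name(name: str) -> bool:
    """True if the name is only Chinese characters and long enough."""
    return len(name) >= MIN_CHARS and _CJK_ONLY.fullmatch(name) is not None


def should_merge_chinese_companies(name_a: str, name_b: str) -> bool:
    # Merge only when the names are identical and the name qualifies.
    return name_a == name_b and is_qualifying_chinese_name(name_a)
```

A short name such as 中国银行 (four characters) or a name mixing Latin letters and Chinese characters would not qualify under this sketch.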

# Relationship-target pair

Lumber LLC has a unique identifier, and a document shows that Jack Lumber (no ID) is a director. If another person named Jack Lumber appears in another document as a director of Lumber LLC, the two Jack Lumber profiles are merged.

# Notes

  • Merger occurs whether or not Jack Lumber has an identifier.
  • Entity type, relationship type, and relationship directionality must also match.
  • If the two Jack Lumbers have different unique identifiers (UK person numbers excluded), the merger will not occur.
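The rule can be pictured as grouping mentions by a composite key and then vetoing pairs with conflicting identifiers. In the sketch below, the `Mention` record, the direction labels, and the `uk_person_number` type name are hypothetical; it also interprets "UK person numbers excluded" as meaning that differing UK person numbers do not block a merge.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical: identifier types whose differences do not block a merge.
NON_BLOCKING_ID_TYPES = {"uk_person_number"}


@dataclass(frozen=True)
class Mention:
    # Hypothetical record: one entity mentioned with a link to a target.
    mention_id: str
    name: str
    entity_type: str       # e.g. "person"
    relationship: str      # e.g. "director"
    direction: str         # e.g. "to_target"
    target_id: str         # the target's unique identifier
    unique_id: Optional[Tuple[str, str]] = None  # (id_type, value), if any


def _ids_conflict(a: Mention, b: Mention) -> bool:
    if a.unique_id is None or b.unique_id is None:
        return False
    if a.unique_id[0] in NON_BLOCKING_ID_TYPES or b.unique_id[0] in NON_BLOCKING_ID_TYPES:
        return False
    return a.unique_id != b.unique_id


def relationship_target_pairs(mentions):
    """Pairs of mentions sharing name, entity type, relationship type,
    direction, and target, excluding pairs with conflicting unique IDs."""
    groups = defaultdict(list)
    for m in mentions:
        key = (m.name, m.entity_type, m.relationship, m.direction, m.target_id)
        groups[key].append(m)

    pairs = []
    for members in groups.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if not _ids_conflict(a, b):
                    pairs.append((a.mention_id, b.mention_id))
    return pairs
```

Two "Jack Lumber" director mentions for the same target pair up, while a "Jack Lumber" shareholder mention lands under a different key and is not paired.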

# Whitelisted non-unique identifier and name

Mohammed Chams (unknown passport 12345) is linked to Trade Shipments, LLC in China. Mohammed is merged with two other people named Mohammed Chams (unknown passport 12345) linked to five other companies in different countries.

# Notes

  • This is currently used for a few select identifier types, including partially redacted CPFs in Brazil and passports issued by an unknown country.
  • This process involves name component sorting, which standardizes case, whitespace, and token order.
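The sketch below pairs name component sorting with a whitelist of identifier types. The whitelist labels (`br_cpf_partial`, `passport_unknown_country`) are hypothetical stand-ins for the two identifier types named above. The sorting step is one plausible reading of "standardizes case, white space, and order".

```python
from collections import defaultdict

# Hypothetical labels for the whitelisted non-unique identifier types.
WHITELISTED_ID_TYPES = {"br_cpf_partial", "passport_unknown_country"}


def sort_name_components(name: str) -> str:
    """Lowercase, collapse whitespace, and sort tokens so word order is ignored."""
    return " ".join(sorted(name.lower().split()))


def group_by_name_and_whitelisted_id(rows):
    """rows: iterable of (profile_id, name, id_type, id_value) tuples.

    Returns groups of two or more profiles sharing a sorted name plus a
    whitelisted identifier; other identifier types are ignored here.
    """
    groups = defaultdict(list)
    for pid, name, id_type, id_value in rows:
        if id_type not in WHITELISTED_ID_TYPES:
            continue
        groups[(sort_name_components(name), id_type, id_value)].append(pid)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Under this sketch, "Mohammed Chams", "CHAMS  Mohammed", and " mohammed chams " all normalize to `chams mohammed`, so three profiles with the same unknown-country passport number fall into one group.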

# Person name and address

Florida Company A, Paraguay Company B, and Luxembourg Company C are linked to an individual named Zhang Wei at 123 Oak St. #56, London, England. The three Zhang Wei profiles are merged.

# Notes

  • This process involves name component sorting, which standardizes case, whitespace, and token order.
  • Address matching takes place in two stages. First, the person’s name must match, along with at least one of the address’s postal code, house number, or road name. Second, if that first check passes, we compare the full addresses.
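The two-stage comparison can be sketched as a cheap component check followed by a full-address comparison. The `Address` fields and the normalization (lowercasing, replacing punctuation with spaces, collapsing whitespace) are assumptions. Exact equality of normalized full addresses stands in for whatever comparison Graph actually uses.

```python
import re
from dataclasses import dataclass


@dataclass
class Address:
    # Hypothetical parsed address; field names are illustrative.
    house_number: str
    road: str
    postal_code: str
    full: str


def normalize_name(name: str) -> str:
    # Name component sorting: case, whitespace, and token order.
    return " ".join(sorted(name.lower().split()))


def normalize_text(text: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def same_person_by_address(name_a, addr_a, name_b, addr_b) -> bool:
    if normalize_name(name_a) != normalize_name(name_b):
        return False
    # Stage 1: at least one non-empty component must match.
    components = ("postal_code", "house_number", "road")
    if not any(
        getattr(addr_a, c)
        and normalize_text(getattr(addr_a, c)) == normalize_text(getattr(addr_b, c))
        for c in components
    ):
        return False
    # Stage 2: only then compare the full addresses.
    return normalize_text(addr_a.full) == normalize_text(addr_b.full)
```

Under these assumptions, "Zhang Wei" at "123 Oak St. #56, London, England" matches "Wei Zhang" at "123 Oak St #56 London England". The same person at unit #57 passes stage 1 on the house number but fails stage 2.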

# Person name and contact information

Alfonso Nicolau in Record A reports his phone number as 802-497-2211. In Record B, another person with the same name reports the same phone number. The two Alfonso Nicolau profiles are merged.

# Notes

  • This currently applies to phone and fax numbers.
  • Poor-quality phone numbers (e.g. 00000) are dropped from our database to avoid false positives.
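Here is a minimal sketch of phone-based matching that drops poor-quality numbers first. The quality heuristics are assumptions chosen for illustration, not Graph's actual filter: fewer than seven digits, or a single repeated digit such as 00000. The same applies to the name normalization (case and whitespace only).

```python
import re
from typing import Optional


def usable_phone(raw: str) -> Optional[str]:
    """Digits only, or None if the number looks like placeholder data."""
    digits = re.sub(r"\D", "", raw)
    # Assumed heuristics: too short, or one digit repeated (e.g. 00000).
    if len(digits) < 7 or len(set(digits)) == 1:
        return None
    return digits


def same_person_by_phone(name_a, phone_a, name_b, phone_b) -> bool:
    pa, pb = usable_phone(phone_a), usable_phone(phone_b)
    same_name = " ".join(name_a.casefold().split()) == " ".join(name_b.casefold().split())
    return pa is not None and pa == pb and same_name
```

With this sketch, "802-497-2211" and "(802) 497-2211" both reduce to `8024972211`, so the two Alfonso Nicolau records match, whereas two records sharing "00000" never do.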

# Company name in a trade data source

Two companies called "Acme Group Trading, LLC" appear on bills of lading as shipment counterparties in separate trade data records from the U.S. and Colombia. The two profiles for Acme Group Trading, LLC are merged.

# Notes

  • The name must be at least six characters or three tokens (e.g. 'Do Re Mi') long.
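The length requirement can be expressed as a simple guard before an exact-name match. The sketch assumes names are compared after case folding and whitespace collapsing. It also assumes the six-character minimum counts non-space characters; the document does not say how whitespace is counted.

```python
def _normalize(name: str) -> str:
    return " ".join(name.casefold().split())


def qualifies_for_trade_merge(name: str) -> bool:
    """At least six non-space characters, or at least three tokens."""
    normalized = _normalize(name)
    non_space_chars = len(normalized.replace(" ", ""))
    return non_space_chars >= 6 or len(normalized.split()) >= 3


def should_merge_trade_companies(name_a: str, name_b: str) -> bool:
    # Identical normalized names, and the name is long enough to trust.
    return _normalize(name_a) == _normalize(name_b) and qualifies_for_trade_merge(name_a)
```

Under these assumptions, "Acme" alone is too short to merge on, while a three-token name such as "A B C" passes via the token alternative.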

# Possibly Same As Relationship Generation Rules

# Company name and address

American Company Inc. (referenced in corporate data from Florida) and American Company Inc. (referenced in tax data from Paraguay) are both located at 350 Oak Rd., Miami Beach, FL. Graph draws a Possibly Same As relationship between the two based on their shared name and address.

# Notes

  • We currently resolve and merge people, but not companies, based on a shared name and address. People rarely change their most common identifiers (name, date of birth, ID number), whereas companies often change names, legal structures, addresses, and ID numbers, and holding companies frequently share nearly identical names with their subsidiaries. To avoid merging false positives, our resolution rules for companies are therefore more conservative than those for people.

# Person name and date of birth

Victoria Caroline Beckham, born in April 1974, appears in three UK corporate data sources. Graph draws a Possibly Same As relationship between the three based on their shared name and date of birth.

# Notes

  • May be generated from either a full date of birth (dd-mm-yyyy) or a month and year only (mm-yyyy).
  • The dates may differ in precision (e.g. 1 April 1974 and April 1974), but they may not conflict.
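The "may differ in precision but may not conflict" condition can be captured by treating a missing day as a wildcard. In this sketch, the `PartialDate` type and the case-and-whitespace name comparison are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: int
    day: Optional[int] = None  # None when only mm-yyyy is known


def dates_compatible(a: PartialDate, b: PartialDate) -> bool:
    """True if the two dates could describe the same birthday."""
    if (a.year, a.month) != (b.year, b.month):
        return False
    # A missing day never conflicts; two known days must agree.
    return a.day is None or b.day is None or a.day == b.day


def possibly_same_by_name_and_dob(name_a, dob_a, name_b, dob_b) -> bool:
    # Assumption: names compared after case folding and whitespace collapsing.
    same_name = " ".join(name_a.casefold().split()) == " ".join(name_b.casefold().split())
    return same_name and dates_compatible(dob_a, dob_b)
```

For example, 1 April 1974 and April 1974 are compatible, but 1 April 1974 and 2 April 1974 conflict.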

# Name + reference target match

Three people named Jose Mauricio Bringas Reyes are mentioned in records about the Mexican company Constructora Reyes. There is not sufficient evidence to resolve the three people. Graph draws a Possibly Same As relationship between the three based on their shared name and reference in records mentioning the same company.

# Notes

  • This is a weaker version of the ‘Relationship-target pair’ resolution rule. Here, the relationships linking each entity in question (Jose) to the reference target (Constructora Reyes) may differ (e.g. shareholder vs. auditor).
  • The entity in question (Jose) need only be mentioned in the same document as the reference target (Constructora Reyes), not necessarily linked to it.
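Because this rule only requires co-mention, the grouping key drops relationship type and direction entirely. In the sketch below, the document structure (dicts with `people` and `targets` keys) and the case-and-whitespace name normalization are hypothetical.

```python
from collections import defaultdict


def co_mention_groups(documents):
    """documents: iterable of dicts with hypothetical keys
    'people' (list of (mention_id, name)) and 'targets' (list of target IDs).

    Groups same-named mentions that appear in documents referencing the
    same target, regardless of how (or whether) they are linked to it.
    """
    groups = defaultdict(set)
    for doc in documents:
        for mention_id, name in doc["people"]:
            key_name = " ".join(name.casefold().split())
            for target in doc["targets"]:
                groups[(key_name, target)].add(mention_id)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Three "Jose Mauricio Bringas Reyes" mentions across three documents about Constructora Reyes form one candidate group, and a Possibly Same As relationship, not a merge, would be drawn among them.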

# Co-director grouping

Large Sun Holding Corp. and Freight Forwarders LLC both have three directors named Mohamed Abdallah, Philippe Lasalle, and Peng Yongqing. Graph draws Possibly Same As relationships between the same-named directors of the two companies, based on their mutually shared, co-occurring names tied to separate entities.

# Notes

  • Currently active in Mexican, Lebanese, and Thai corporate data sources. We hope to expand the functionality going forward.
  • Requires a minimum of two to three qualifying entities.
  • Specialized logic prevents matching based on hyper-connected nodes. For instance, if KPMG is listed as an auditor for 30% of all Lebanese companies, we will neither draw Possibly Same As relationships to or from it based on this rule nor include it in our grouping calculations.
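One way to picture co-director grouping is to compare every pair of companies and count shared names, after excluding names that appear at too many companies. In this sketch, `min_shared` and `max_degree` are hypothetical parameters. The documented threshold ("two to three qualifying entities") is ambiguous. The document also describes the hyper-connection cutoff as a share of companies, which the sketch simplifies to an absolute count.

```python
from collections import Counter
from itertools import combinations


def co_director_links(company_directors, min_shared=3, max_degree=50):
    """company_directors: dict of company -> set of normalized director names.

    Returns ((company_a, name), (company_b, name)) pairs for companies that
    share at least `min_shared` names, ignoring names linked to more than
    `max_degree` companies (a stand-in for hyper-connected nodes).
    """
    degree = Counter(name for names in company_directors.values() for name in names)
    links = []
    for (co_a, names_a), (co_b, names_b) in combinations(sorted(company_directors.items()), 2):
        shared = {n for n in names_a & names_b if degree[n] <= max_degree}
        if len(shared) >= min_shared:
            for name in sorted(shared):
                links.append(((co_a, name), (co_b, name)))
    return links
```

With the example above, the three shared directors of Large Sun Holding Corp. and Freight Forwarders LLC yield three candidate links. If one of those names were attached to more companies than the cutoff allows, it would be ignored, and the remaining two shared names would fall below the threshold.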