Tuesday, November 22, 2016

Name Matching with SQL

You are given a list of names and asked to looked them up in your organization's database. Manual look-up is tedious and time-consuming. 

In this code example I am matching names in a random list against first and last names in our organization's database (ourdb). This solution works as well as any.

with list as (
select first_name,
       last_name,
       first_name||n.last_name n1
  from random_list
  ),

ourdb as (
select id_number,
       first_name,
       last_name,
       city,
       state,
       first_name||a.last_name n2
  from entity
where person_or_org = 'P' --exclude orgs
   )
  
select t.*,
      ourdb.id_number,
      ourdb.first_name,
      ourdb.last_name,
      ourdb.city,
      ourdb.state,
      soundex(n1) as sdx_n1,
      soundex(n2) as sdx_n2,
      utl_match.edit_distance_similarity(n1, n2) as ed,
      utl_match.jaro_winkler_similarity(n1, n2) as jw

from ourdb, list t

where utl_match.jaro_winkler_similarity(n1, n2) > 95 order by sort_order asc, ed desc

These algorithms work pretty well with some futzing. You will likely want to tweak the code, altering the similarity value (number in bold above, "95"), or replacing the other algorithm after the "where" clause, to hone in on potential candidates. (WARNING: you will want to keep that number pretty high, unless you want your query to run for days.)

Fields in your database such as "city" and "state" (or even an engagement or connectivity score to your organization) can help you to identify the right individuals.

https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm

No comments:

Post a Comment