If all of your data came from a single source of complete, unambiguous records, the task would be relatively simple to solve.
In the real world, however, the picture is normally very different. Data is typically far from complete, frequently ambiguous, and often scattered over many different data sources that record many different attributes with few overlapping fields. Collecting data from all the different sources into a single, central storage area is sometimes impossible within a reasonable timeframe.
In this case we have company data from different systems and time periods. We need to find a model that assigns one and the same company ID to two or more records when they refer to the same entity. For example (company ID, then name):
555 1ST FRANKLIN FINANCIAL
555 1ST FRANKLIN FINANCIAL CORP
555 1ST FRANKLIN FINANCIAL CORP.
555 1ST FRANKLIN FINANCIAL CORPORATION
556 THE FRANCLIN COMPANY
You may use any publicly available company information to enrich the given data sets, and build reasonable algorithms that predict a unique ID and assign it to every available record.
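As a starting point, the ID assignment illustrated by the records above can be sketched as name normalization followed by greedy fuzzy grouping. This is only a minimal baseline under stated assumptions, not a prescribed solution: the function names `normalize` and `assign_ids`, the `LEGAL_SUFFIXES` list, the 0.9 similarity threshold, and the starting ID are all hypothetical choices made for illustration, and the similarity measure is Python's standard `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to ignore; extend it for real data.
LEGAL_SUFFIXES = {"CORP", "CORPORATION", "INC", "INCORPORATED",
                  "CO", "COMPANY", "LLC", "LTD"}


def normalize(name: str) -> str:
    """Upper-case the name, drop punctuation, a leading THE,
    and trailing legal-form tokens such as CORP or COMPANY."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    if tokens and tokens[0] == "THE":
        tokens = tokens[1:]
    # Keep at least one token so a name never normalizes to nothing
    # just because it consists only of legal-form words.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def assign_ids(names, threshold=0.9, first_id=1):
    """Greedy grouping: each name joins the first existing group whose
    normalized key is at least `threshold` similar, else starts a new one.
    Returns a dict mapping the original name to its group ID."""
    keys = []      # one normalized representative key per group
    result = {}
    for name in names:
        key = normalize(name)
        for idx, rep in enumerate(keys):
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                result[name] = first_id + idx
                break
        else:
            # No sufficiently similar group found: open a new one.
            keys.append(key)
            result[name] = first_id + len(keys) - 1
    return result


if __name__ == "__main__":
    sample = [
        "1ST FRANKLIN FINANCIAL",
        "1ST FRANKLIN FINANCIAL CORP",
        "1ST FRANKLIN FINANCIAL CORP.",
        "1ST FRANKLIN FINANCIAL CORPORATION",
        "THE FRANCLIN COMPANY",
    ]
    for name, company_id in assign_ids(sample, first_id=555).items():
        print(company_id, name)
```

With the sample names, the four 1ST FRANKLIN FINANCIAL variants normalize to the same key and share one ID, while THE FRANCLIN COMPANY gets its own. A greedy pass like this depends on input order and compares every name against every group, so a realistic solution would likely add blocking, extra attributes (address, public registry data), and a tuned threshold.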