Deduplication

iLoader has the ability to perform a limited dedupe operation when loading data.

These are the parameters that can be added to a table declaration in order to invoke the deduping process.

Dedupe_Table

Required. You must specify the temporary deduping table that will be used, in the form [database].[table].

Dedupe_Selection

Optional. You can specify which records from the deduping table are to be used in the data load, for example, Dedupe_Selection = FIELD1 = 'FEMALE'.

Dedupe_Sort

Optional. This is the field upon which the data is sorted in order to obtain the first discrete values in the required order. It is the short field name without square brackets.

Dedupe_Order

Optional. If a Dedupe_ID has been specified, this parameter specifies the type of sort to be performed on the ID column before selecting the first record for each unique key in the deduping table. This must be ASC or DESC. The default value is ASC.

Dedupe_Key

Optional. If a composite key field is being used for the update then this parameter specifies how the composite key should be created on the working table. In addition, a key field must be set in the table definition file as this will be required to perform the FirstDiscrete function which selects only the first occurrence of each key in the sorted list.

Example

Firstly, the data is loaded into the dedupe table. A sub selection can be performed here by adding a Dedupe_Selection parameter (not present in this example).

The Dedupe_Order is Desc so the table is sorted by descending order on the Dedupe_Sort field (the final field in the example). A FirstDiscrete is then performed to select each first instance of the key field which is the first column in this instance. This data is then exported to a data file which is then used to perform the final load.

The Table Declaration for the example described above would look as follows. Note that the script is created as usual with the table declaration and definition wizards. The additional lines are manually added to change the load from a standard data file list update into a dedupe update.

BEGIN TABLE_DECLARATION

format = DELIMITED

delimiter = COMMA

qualifier = NONE

scriptfile = CustomerUpdate.txt

datfilelist = Updatelist.txt

Dedupe_table = [test].[dedupe_table]

Dedupe_Sort = FID

Dedupe_Order = Desc

type = UPDATE

width = 5

date = 03032004

END TABLE_DECLARATION

TABLE = [test].[people]

KEY = Customer_ID

BEGIN TABLE_DEFINITION

Customer_ID , TEXT , 0 , 4

Surname , TEXT , 1 , 20

Title , TEXT , 2 , 10

Category , TEXT , 3 , 1

FID , INTEGER , 4 , 5

END TABLE_DEFINITION

Additionally, the following line could be added to the table declaration script to select on records in the update file which fall into category C.

	Online & Instructor-Led Courses \| Training Videos \| Webinar Recordings
	© Alterian. All Rights Reserved. \| Privacy Policy \| Legal Notice