This short lesson focuses on typical questions from day-to-day practice and possible solutions.
Marking duplicates
In practice, “duplicates” of one piece of information often exist, but you only want to consider the “newest” record. For this problem QlikView offers so-called “inter-record functions” in the script. These allow you to access previous or following records while a source is being loaded. A solution for our problem could look like this:
//in the first step we load the table without any changes
RawTable:
LOAD * FROM [$(PATH_DATASTAGING)TestSource.qvd] (qvd);

//afterwards we sort it by our duplicate key and, e.g., a timestamp
//(newest record first, so that it is the one that gets flagged)
SortedTable:
NOCONCATENATE
LOAD * RESIDENT RawTable
ORDER BY %DublettenId, %Date DESC;

//in order to prevent circular references, we remove the old table
DROP TABLE RawTable;

//afterwards we compare the key and flag the first (= newest) record per group
FinalTable:
NOCONCATENATE
LOAD *,
     IF(PREVIOUS(%DublettenId) = %DublettenId, 0, 1) AS FLAG_VALID
RESIDENT SortedTable;

//at last we remove the intermediate table, again to prevent circular references
DROP TABLE SortedTable;
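For readers who want to see the mechanics outside of QlikView, here is a minimal Python sketch of the same flagging logic; the tuple layout and names are made up for illustration, and the two stable sorts mirror ORDER BY key, date DESC:

```python
def flag_newest(records):
    # records: iterable of (duplicate_key, date, payload) tuples
    # two stable sorts == ORDER BY duplicate_key, date DESC
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    ordered = sorted(ordered, key=lambda r: r[0])
    flagged = []
    prev_key = object()  # sentinel: never equal to a real key
    for key, date, payload in ordered:
        # like PREVIOUS(): compare with the row handled just before;
        # the first (= newest) record per key gets FLAG_VALID = 1
        flagged.append((key, date, payload, 0 if key == prev_key else 1))
        prev_key = key
    return flagged
```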
Resolving “multiple” keys
In practice it often happens that a relation between two tables exists on multiple fields. To improve performance and to keep the data model “clean”, it is worth combining those key fields into one with a simple trick.
//initial situation: the tables are linked on two key fields
Table1:
LOAD %CostcenterId,
     %ProductId,
     ...
RESIDENT Table1;

Table2:
LOAD %CostcenterId,
     %ProductId,
     ...
RESIDENT Table2;

//we create one combined key to improve performance;
//the delimiter prevents collisions between different key combinations
Table1:
LOAD %CostcenterId & 'xx' & %ProductId AS %Connection,
     ..
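A quick Python sketch of why the delimiter in the combined key matters (the function name and key values are illustrative only): without a separator, different key pairs can collapse into the same string.

```python
# Sketch of the combined-key trick, assuming string key parts.
# Without a delimiter, ('1', '12') and ('11', '2') would both
# produce the ambiguous key '112'.
def combined_key(costcenter_id, product_id, sep="xx"):
    return f"{costcenter_id}{sep}{product_id}"
```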
Classification of data
Assign information based on a “map”. The following requirements apply to the map:
- The assignment table consists of two columns: the first holds the comparison value, the second the future “class value”.
- Both columns must be named, although the names are not relevant later on. If you forget to name, e.g., an inline table, the first value will never generate a hit (it is treated as the header).
- When the assignment table is applied, the values of a given field are compared; on a hit the value is “replaced” by the assignment value, otherwise the original value persists.
If you are wondering why we don’t pull the data into the data model via a JOIN, the answer is the third point above: a JOIN would always enlarge the table and make it impossible to “keep” the original value on a miss.
Example using an inline table:
//at first we build our inline table
[Discount Deadlines]:
LOAD * INLINE [
    Discount, Deadline
    VZ1Y, 1 Year
    VZ2Y, 2 Year
    B1Y, 1 Year
];

//afterwards we build a mapping table from it
Discount_Mapping:
MAPPING LOAD
    Discount,
    Deadline
RESIDENT [Discount Deadlines];

//don't forget to delete the source table
DROP TABLE [Discount Deadlines];

//the map created this way can now be applied to the orders
Orders:
LOAD *,
     ApplyMap('Discount_Mapping', Discount, Discount) AS [Discount Deadline];
SQL SELECT * FROM Orders;
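The behaviour of ApplyMap – keep the original value on a miss – can be sketched with a plain Python dict; the values come from the inline table above, while the helper name is made up:

```python
# Sketch of APPLYMAP's behaviour with a plain dict: on a miss the
# original value is kept (via dict.get's default), unlike a JOIN,
# which would yield a NULL instead.
discount_map = {"VZ1Y": "1 Year", "VZ2Y": "2 Year", "B1Y": "1 Year"}

def apply_map(mapping, value):
    # the third APPLYMAP argument is the default:
    # here we fall back to the original value itself
    return mapping.get(value, value)
```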
Classification of values based on a lower and an upper bound. This function can also be used for more complex questions such as time-course analyses. To understand the principle behind it, it is easiest to use the example of a real-estate agency that wants to classify apartments by living space.
//at first we load the apartments
Apartments:
LOAD ApartmentId,
     [Living space],
     Postcode;
SQL SELECT * FROM Apartments WHERE "Living space" > 0;

//loading of the size groups (lower bound, upper bound, class)
Sizegroups:
LOAD * INLINE [
    VonQM, BisQM, SizeGroup
    0, 10, small
    10.0001, 40, intermediate
    40.0001, 100, big
    100.0001, 99999, gigantic
];

//now we match each living space against the intervals
IVMatch:
IntervalMatch([Living space])
LOAD DISTINCT VonQM, BisQM
RESIDENT Sizegroups;
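The interval logic itself can be sketched in a few lines of Python; the bounds are copied from the inline table above, the function name is an assumption for illustration, and both bounds are treated as inclusive here:

```python
# Sketch of the IntervalMatch idea: find the class whose
# [lower, upper] interval contains the value.
SIZE_GROUPS = [
    (0, 10, "small"),
    (10.0001, 40, "intermediate"),
    (40.0001, 100, "big"),
    (100.0001, 99999, "gigantic"),
]

def size_group(living_space):
    for lower, upper, label in SIZE_GROUPS:
        # both interval bounds count as inclusive in this sketch
        if lower <= living_space <= upper:
            return label
    return None  # no interval matched
```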
The same technique can also be used for events, and in the background it leads to the following data model (a link table connecting the facts with the intervals):
Performance – concatenating fact tables
Concatenating fact (transaction) tables – even with different numbers of columns and different fields – yields a big performance improvement, as long as all key fields are “covered”. As long as every key field is included, later drill-downs into the details are harmless. If not all key fields overlap, concatenation is still possible, but drill-downs can show gaps in the details, which you may have to “explain”.
Attention: QlikView automatically concatenates tables with the same structure. This can lead to big problems, but also creates great opportunities. The following example shows how to control this behaviour:
//automatic merge: same set of field names, field order is irrelevant
LOAD a, b, c FROM Table1.csv;
LOAD a, c, b FROM Table2.csv;

//manual merge despite differing fields
LOAD a, b, c FROM Table1.csv;
CONCATENATE
LOAD a, c FROM Table2.csv;

//manual "not" merge
LOAD a, b, c FROM Table1.csv;
NOCONCATENATE
LOAD a, b, c FROM Table2.csv;
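The rule behind the automatic merge can be stated in one line: two loads are concatenated automatically when their sets of field names are identical, regardless of field order. A tiny Python sketch of that rule (illustrative only):

```python
# Sketch of QlikView's auto-concatenation rule: two loads merge
# automatically when their *sets* of field names are identical
# (field order does not matter).
def auto_concatenates(fields_a, fields_b):
    return set(fields_a) == set(fields_b)
```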
In the last case you should rename the table and its fields right away: two tables that share all their fields would otherwise be linked on every one of them, and there is a high risk that your data model “bursts” (synthetic keys, circular references).
Executing a function for every row of a table
QlikView lets you define your own subroutines and call them with parameters. The simplest example is writing a message to a log file. (This is purely about the technique – don’t question its usefulness!)
//definition of the subroutine
SUB Qvc.Log (_msg, _level)
    ...
END SUB;

//here we load the initial table
Persons:
LOAD id, name FROM Table1.csv;

//how many rows the table has
LET varNoOfRecords = NoOfRows('Persons');

//creation of a dummy table so the loop has a load context
LOAD 'F' AS DummyField AUTOGENERATE (1);

//loop over all rows; PEEK uses 0-based indices, hence the -1
FOR ii = 0 TO $(varNoOfRecords) - 1
    LET varName = PEEK('name', $(ii), 'Persons');
    CALL Qvc.Log('Hello $(varName)');
NEXT;
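The mechanics of that loop, sketched in Python; a list of dicts stands in for the Persons table, and all names are illustrative:

```python
# Sketch of the row-by-row loop: iterate over a table and call a
# function for every row, the way FOR / PEEK / CALL do in the script.
def log(msg, level="INFO"):
    line = f"[{level}] {msg}"
    print(line)
    return line

persons = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

greetings = []
for ii in range(len(persons)):       # 0-based, like PEEK's row index
    name = persons[ii]["name"]       # PEEK('name', ii, 'Persons')
    greetings.append(log(f"Hello {name}"))
```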