Lesson 1 – Data sources and data structures
This lesson deals with the import of data sources and the general understanding of data structures.
In today’s world, there are a variety of possible sources of data to be loaded and merged. The most common formats are:
- SQL Databanks (MSSQL, MYSQL, Postgres SQL, …)
- NOSQL Datenbanks (CouchDB, ElasticSearch, NoSQL, …)
- REST – SOAP – XML Web-Interfaces
- Excel, CSV, TXT files
This list can be extended in practice as desired. Fortunately, QlikView provides a “load” option for each data source.
AAttached a “Load-Script” for loading an Excel file:
(biff, embedded labels, table is Sheet1$);
The easiest way to create this syntax is to use the wizards in QlikView. Experienced users use “variables” as a reference to the filename.
You have the option of using <Ctrl-T> Or the menu item “Table structure displays” to get a preview of the imported data. Use this function, especially in “new” data sources to be able to get a faster picture about connections.
When loading data, QlikView operates in a predefined order. The following example with explanations:
//loading of the data starts here
a + b AS c
1 AS a,
2 AS b
In principle, QlikView runs the script from “top to bottom”. However, “blocks” are executed in reverse order. So in the “MyZuneiterBlock” the “lower” script is executed and then the result as “table” to the “upper” block. For SQL-experienced developers, you can imagine this as a sub-select, which is then accessed in the actual SQL.
In the course of the training, we will learn to integrate “more complex” data sources.
Data structures are a technical term which, from a purely technical point of view, describes the storage and storage of data. Here, however, I describe data structures in the sense of data models, which serve to find and define the structure for the data to be processed (in particular for the data to be stored) in the systems. The goal is to know the advantages of the various data models and then to respond to the relevant requirements during data modeling.
The “Star schema” is the most simple and popular data model.
There is a movement table (measures) that uses a foreign key to reference the master tables (dimensions).
This data model provides benefits for performance, simple querying, and writing aggregation functions (pivotalizations), but this type of data model does not provide sufficient means to ensure data integrity.
The “Snowflake Schema” is the most common format for managing data. It is a combination of several stars Schemata which are connected.
Here, too, there is a movement table (measures), which is linked to the master tables (dimensions) by means of foreign keys, which are then again linked via characteristics to further “detail master data”.
Entity Relationship Mapping
Behind this “magic concept” is hidden the simple but also very dangerous “foreign key recognition logic” of QlikView, which assumes that a field with the same name represents a relationship between 2 data tables.
In determining a relation between two tables, there are the following logical connections:
- 1:1 – For each record from table A there exists a record in the table B
- 1:n – For each record from table A, one or more records exist in table B
- m:n – Multiple records from table B exist for multiple records from table A.
Practical examples of the processing of the respective relationship will follow later.