Data Warehouse with Kettle (Pentaho Data Integration)

 

Course Introduction

Nowadays information processing has leveraged into two wide area of organizational activities, operation and decision-oriented analysis.  The later problem – mostly required by strategic levels – currently falls into a growingly popular IT solutions, i.e. Business Intelligence (BI).

Online Analytical Processing (OLAP) as part of BI denotes descriptive modeling for analysis based decision-oriented information processing. It is an approach to quickly answer multi-dimensional analytical queries.
Mondrian / Pentaho Analysis is a free open source OLAP Server written in Java. With its constant growth of developers and users adoption around the world, it is now one of the most affordable and popular OLAP solution.

Pair with JPivot you can access Mondrian’s OLAP data in web based manner with unlimited clients. Thus you can easily maintained and deployed OLAP solutions in your organizations.

Who Should Attend ?

This course is designed for business analysts, IT developer, database administrator and those who new and interest to OLAP and Pentaho Analysis.

Especially those who already implement ERP (Enterprise Resource Planning), CRM (Customer Relationship Management), SCM (Supply Chain Managemen), HRM (Human Resource Management), and other system which had huge data and need to be analysed in multi dimensional way.

Objectives

At the completion of this course, attendee should be able to :

  • understand basic concepts of OLAP and what kinds of problems can it solved
  • understand multi dimensional modeling like cubes, dimension and measures
  • develop OLAP Schemas for use with Pentaho Analysis
  • integrate Pentaho Analysis solution into Pentaho BI Platform (BI Server)
  • develop OLAP security / user access control

 

Course Requirements

  • PC or Laptop with minium of 2GHz CPU, 1 GB of RAM, DVD Drive and 2 GB of available hard disk space.
  • Softwares :
    1. Microsoft Windows XP Pro
    2. Java Runtime Environment (JRE)
    3. Apache Tomcat
    4. MySQL 5.0 Database Server
    5. Pentaho Data Integration
    6. Pentaho Data Analysis (Mondrian)

Course Outline

  1. Introduction to Data Warehouse
    • Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
    • Data Warehouse
    • Data Warehouse and OLAP
    • Delivering Solution with ETL (Extract, Transform, Load) Tool
  2. Pentaho Data  Integration (Kettle)
    • Introduction to Kettle as ETL
    • Architecture
    • Components
      • Spoon : Graphical UI Designer for job / transformation steps
      • Pan : Command line batch script for transformation execution
      • Kitchen : Command line batch script for transformation execution
      • Carte : Cluster server
  3. Installation and Configuration
    • Java Runtime
    • Pentaho Data Integration
    • Apache Web Server and MySQL  using XAMPP package
    • Apache Tomcat
    • Pentaho Analysis Service (Mondrian)
  4. Getting Started with Spoon
    • File system and RDBMS based Repository
    • Spoon Development Environment
    • Database Connections
    • Job and Transformation
      • Creating first job
      • Creating first transformation
      • Calling transformation from job
    • Job / Transformation Steps and Hops
    • Metadata
    • Log
  5. Short Introduction to MySQL
    • MySQL Storage Engines
    • Administering MySQL via PHPMyAdmin
  6. Dimensional Modelling
    • Normalized versus Dimensional Model
    • Fact Tables
    • Dimension Tables
    • Task :
      • Create a Kettle  transformation  to map PHI-Minimart  transactional database sample to dimensional modeling database
      • Create logs for each steps
  7. Hands On : Viewing OLAP Data using Mondrian / JPivot
    • Mondrian Installation
    • Creating scheme based on our fact and dimension tables
    • View and navigate our Cube using Web Browser
  8. Data Staging
    • What is Data Staging?
    • Background : Physical I/O  versus In-Memory Processing
    • Task :
      • Create  a  transformation  to  join  from  3  data  sources  :  text  file,  Excel
        spreadsheet, and RDBMS
      • Create a same functional transformation using staging database
  9. Advance Controls
    • Environment Variables
    • Shared Objects
    • Error Handling
    • Email job results
    • Task :
      • Refining existing transformations to use email alert
  10. Slowly Changing Dimension (SCD)
    • SCD Types
  11. Automation
    • Using Cron / Windows Task Sheduler to schedule our ETL job

 

About Administrator

ok deh
This entry was posted in Komputer. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s