Pentaho Data Integration (Kettle)

Upload: dodiep

Post on 26-Apr-2018

Pentaho Data Integration (Kettle)

www.robertomarchetto.com

PDI Overview (Kettle)

● An entry-level tool for data manipulation (ETL)
● PDI (Kettle) reads procedures stored in XML format
● Spoon is a graphical tool used to develop those procedures
● Procedures are designed by linking components
● Many data sources can be used: JDBC, files, web services
● JavaScript and Java support for complex routines
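
Because procedures are stored as XML, their components and links can be inspected with any XML parser. The sketch below reads a simplified, hand-written stand-in for a transformation file; real files carry many more elements, and the step names "Query users" and "Write users_dimension" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a transformation file (assumption: real files
# contain many more elements; the step names here are hypothetical).
SAMPLE_KTR = """<transformation>
  <info><name>users_dimension</name></info>
  <order>
    <hop><from>Query users</from><to>Write users_dimension</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Query users</name><type>TableInput</type></step>
  <step><name>Write users_dimension</name><type>TableOutput</type></step>
</transformation>"""


def describe_transformation(xml_text):
    """Return the procedure name, its components (steps) and its links (hops)."""
    root = ET.fromstring(xml_text)
    name = root.findtext("info/name")
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops


name, steps, hops = describe_transformation(SAMPLE_KTR)
print(name)
for step_name, step_type in steps:
    print(f"step: {step_name} ({step_type})")
for source, target in hops:
    print(f"hop: {source} -> {target}")
```

Each hop is one link drawn between two components in Spoon.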

Development environment

Example, source database

Example, destination database

Schema comparison

Procedure users_dimension

Query users:

SELECT u.id, CONCAT(u.first_name, ' ', u.last_name) as fullname, u.title
FROM users u
WHERE u.first_name is not null and u.last_name is not null
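
To see which rows this query passes on to the dimension, here is a minimal sketch against an in-memory SQLite table. The sample rows are invented for illustration, and SQLite's `||` operator stands in for `CONCAT`, so this approximates the query rather than copying it.

```python
import sqlite3

# In-memory stand-in for the source database; the rows are invented samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, first_name TEXT, last_name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("u1", "Ada", "Lovelace", "Analyst"),
        ("u2", None, "Smith", "Manager"),   # dropped: no first name
        ("u3", "Grace", None, None),        # dropped: no last name
    ],
)

# Same filter as the slide's query; || replaces CONCAT for SQLite.
rows = conn.execute(
    """
    SELECT u.id, u.first_name || ' ' || u.last_name AS fullname, u.title
    FROM users u
    WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
    """
).fetchall()
print(rows)  # [('u1', 'Ada Lovelace', 'Analyst')]
```

Only users with both name parts survive, so `fullname` is never built from a missing value.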

Testing

Procedure accounts_dimension

Query accounts:

select a.id, a.name, a.industry, a.billing_address_postalcode, a.billing_address_city, a.billing_address_country
from accounts a

Procedure opportunities_fact

Query opportunities:

SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id, o.sales_stage, o.name, o.amount
FROM opportunities o
WHERE o.sales_stage in ('Closed Won', 'Closed Lost')
ORDER BY o.id
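
The fact query keeps only opportunities that have reached a final stage. A small sketch with invented sample rows in an in-memory SQLite table (an assumption; the slides do not name the source database engine) shows the effect of the filter and the ordering:

```python
import sqlite3

# In-memory stand-in for the source table; all rows are invented samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities (id TEXT, date_entered TEXT, date_closed TEXT, "
    "assigned_user_id TEXT, sales_stage TEXT, name TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("o3", "2018-01-10", "2018-02-01", "u1", "Closed Won", "Deal C", 1200.0),
        ("o1", "2018-01-05", None, "u1", "Prospecting", "Deal A", 500.0),
        ("o2", "2018-01-07", "2018-01-20", "u2", "Closed Lost", "Deal B", 300.0),
    ],
)

# Same filter and ordering as the slide's query.
closed = conn.execute(
    """
    SELECT o.id, o.sales_stage, o.amount
    FROM opportunities o
    WHERE o.sales_stage IN ('Closed Won', 'Closed Lost')
    ORDER BY o.id
    """
).fetchall()
print(closed)  # [('o2', 'Closed Lost', 300.0), ('o3', 'Closed Won', 1200.0)]
```

The open opportunity `o1` never reaches the fact table.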

Procedure dates_dimension
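
Only the title of this slide survives, so the layout of the dimension is not known. As a hedged illustration of what a date dimension usually holds, the sketch below generates one row per calendar day; the column names (`date_key`, `full_date`, `quarter`, and so on) are assumptions, not the author's schema.

```python
from datetime import date, timedelta


def build_dates_dimension(start, end):
    """Return one row per day from start to end inclusive (hypothetical columns)."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                # Integer key in YYYYMMDD form, a common convention for date dimensions.
                "date_key": current.year * 10000 + current.month * 100 + current.day,
                "full_date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "day": current.day,
            }
        )
        current += timedelta(days=1)
    return rows


dates = build_dates_dimension(date(2018, 4, 25), date(2018, 4, 27))
for row in dates:
    print(row["date_key"], row["full_date"], "Q%d" % row["quarter"])
```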

Collect procedures in a job

Using JNDI

● Edit the JNDI configuration in /simple-jndi/jdbc.properties or C:/Documents and Settings/<user>/.pentaho/simple-jndi/default.properties
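
Entries in these files follow the simple-jndi `name/key=value` convention. The sketch below renders one such entry; the connection name `sales_dwh`, the driver, the URL and the credentials are hypothetical placeholders to be replaced with real values.

```python
def jndi_entry(name, driver, url, user, password):
    """Render one data source in simple-jndi 'name/key=value' form."""
    fields = [
        ("type", "javax.sql.DataSource"),
        ("driver", driver),
        ("url", url),
        ("user", user),
        ("password", password),
    ]
    return "\n".join(f"{name}/{key}={value}" for key, value in fields)


# All values below are hypothetical placeholders.
entry = jndi_entry(
    name="sales_dwh",
    driver="com.mysql.jdbc.Driver",
    url="jdbc:mysql://localhost:3306/sales_dwh",
    user="etl_user",
    password="change_me",
)
print(entry)
```

A procedure can then refer to the connection by its JNDI name instead of embedding host and credentials.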

Running procedures

● Directly from Spoon
● From Pentaho BI Suite
● Using the command line (Kitchen, Pan)

kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic

● In a clustered environment
● Using a web service (Carte)
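
Kitchen runs jobs (.kjb) and Pan runs transformations (.ktr). A minimal sketch of a wrapper that picks the tool from the file extension and builds the command in the style shown above; the helper name `pdi_command` is hypothetical, and the sketch only builds the command so that it works without a PDI installation.

```python
def pdi_command(path, level="Basic"):
    """Build a Kitchen (job) or Pan (transformation) command line for Windows."""
    lowered = path.lower()
    if lowered.endswith(".kjb"):
        tool = "kitchen.bat"
    elif lowered.endswith(".ktr"):
        tool = "pan.bat"
    else:
        raise ValueError(f"not a job (.kjb) or transformation (.ktr): {path}")
    return [tool, f"/file:{path}", f"/level:{level}"]


job_cmd = pdi_command(r"D:\Jobs\jobname.kjb")
print(" ".join(job_cmd))  # kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic
```

The resulting list could be passed to `subprocess.run` from the PDI installation directory; a non-zero return code indicates that the run failed.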

Publishing on Pentaho

Running from Pentaho

Scheduling

● Using Pentaho's scheduler
● Using an external scheduler (cron)
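
With cron, scheduling comes down to one crontab line that calls Kitchen. The sketch below formats such a line for a daily run; the installation path `/opt/pentaho/data-integration` and the job path are hypothetical, and the `-file=` / `-level=` option style is the one used by `kitchen.sh` on Linux.

```python
def cron_line(minute, hour, job_path,
              kitchen="/opt/pentaho/data-integration/kitchen.sh"):
    """Format a crontab entry that runs a job once a day at hour:minute."""
    return f"{minute} {hour} * * * {kitchen} -file={job_path} -level=Basic"


# Hypothetical paths; runs the job every day at 02:30.
line = cron_line(30, 2, "/opt/etl/jobs/jobname.kjb")
print(line)
```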