Category Archives: Pentaho
Pentaho 9.1 is available!
It’s that time of the year! A new release is available!
Go get EE through the support portal, and CE in the usual place!
Main features
· Google Dataproc support
· Catalog Steps in Spoon
· New Upgrade Utility
· And a bunch of consolidation stuff:
o 20+ continuous improvements
o 10+ platform updates
o 200+ performance/quality bug fixes
Google Dataproc
You can now access and process data from a Google Dataproc cluster in PDI. Google Dataproc is a cloud-native Spark and Hadoop managed service that has built-in integration with other Google Cloud Platform services, such as BigQuery and Cloud Storage. With PDI and Google Dataproc, you can migrate from on-premise to the Google Cloud.
You can use PDI’s Google Dataproc driver and named connection feature to access data on your Google Dataproc cluster as you would other Hadoop clusters, like Cloudera and Amazon EMR. See Set up the Pentaho Server to connect to a Hadoop cluster for further instructions.
§ What’s New
‒ New Hadoop driver
‒ AEL-Spark support
§ Version:
‒ Google Dataproc – 1.4 (Ubuntu 18.04 LTS, Hadoop 2.9, Spark 2.4)
§ Benefits
‒ Enables processing of large data sets in Google Dataproc clusters
‒ On-premises-to-cloud data movement/migration
§ Hadoop Driver supports the following:
‒ Multi-cluster
‒ HDFS
‒ Hive
‒ PMR Hive
‒ Oozie
‒ Sqoop
‒ Hadoop Job Executor
‒ Pig
‒ Parquet / Avro / ORC
§ VFS support for GCS
§ HBase is not supported
Lumada Data Catalog steps for PDI
Lumada Data Catalog lets data engineers, data scientists, and business users accelerate metadata discovery and data categorization, and permits data stewards to manage sensitive data. Data Catalog collects metadata for various types of data assets and points to the asset’s location in storage. Data assets registered in Data Catalog are known as data resources.
You can use the following four new PDI steps to work with Data Catalog metadata and data resources within your PDI transformations:
Search Data Catalog’s existing metadata for specific data resources, including their storage location.
Revise the existing Data Catalog tags associated with an existing data resource.
Read CSV text files or Parquet data from a Data Catalog data resource stored in a Hadoop or S3 ecosystem and output the data payload as rows for use in a transformation.
Encode CSV text files or Parquet data using the schema defined in PDI to create a new data resource, or to replace or update an existing data resource in Data Catalog.
New Upgrade utility
• Current Scope:
‒ Covers 9.0 to 9.1 only (support for 8.3 LTS is coming later)
• Reliable upgrades and rollback
‒ An initial environment check detects which product components are installed and upgrades only what is there
‒ Whitelist to persist customizations
‒ All plug-ins are persisted across the upgrade
‒ All database driver JARs are automatically whitelisted
Compatibility Updates
Other improvements:
§ Data Integration
‒ S3 Multipart Upload now allows configurable part sizes (PDI-16606)
‒ MongoDB Plug-in now allows PLAIN credentials for LDAP integration (PDI-17228)
§ Dashboards / Reporting
‒ 10-100x performance improvement for certain large slices and roll-ups for Mondrian Cubes (JIRA Link)
‒ Option to remove/hide the filter panel when used in a dashboard (ANALYZER-2270)
‒ Count and Count Distinct Summary on currency fields uses the default format (PIR-699)
‒ Admins can now customize the template(s) used for exporting to PDF and Excel (ANALYZER-12)
§ Platform
‒ Passwords stored in the BA Server config files and repository are now encrypted (BISERVER-3497)
‒ Users are now able to change their own password (BISERVER-13699)
-pedro
Pentaho 9.0 is available
Without further ado: Get Enterprise Edition here, and get Community Edition here
PDI Multi Cluster Hadoop Integration
Capability
Pentaho connects to Cloudera Distribution for Hadoop (CDH), Hortonworks Data Platform (HDP), and Amazon Elastic MapReduce (EMR). Pentaho also supports many related services such as HDFS, HBase, Oozie, Zookeeper, and Spark.
Before this release, the Pentaho Server and the PDI design-time environment (Spoon) could work with only one Hadoop cluster at a time, so running against multiple Hadoop clusters required multiple transformations, instances, and pipelines. The 9.0 release includes major architectural changes that make it easy to configure, connect to, and manage multiple Hadoop clusters.
· Users can access and process data from multiple Hadoop clusters, from different distributions and versions, all from a single transformation and a single instance of Pentaho.
· Within Spoon, users can now set up three distinct cluster configurations, each referencing a specific cluster, without having to restart Spoon. There is also a new configuration UI to easily configure your Hadoop drivers for managing different clusters.
· Improved cluster configuration experience and secure connections with the new UI
· Supports the following distributions: Hortonworks HDP v3.0, 3.1; Cloudera CDH v6.1, 6.2; Amazon EMR v5.21, 5.24.
Use Cases and Benefits
· Enables hybrid big data processing (on-premises or cloud), all within a single pipeline
· Simplifies Pentaho’s integration with Hadoop clusters, including an enhanced UX for cluster configuration
Key Considerations
· Adaptive Execution Layer Spark isn’t validated to execute pipelines connecting to multiple Hadoop clusters.
· Pentaho Map Reduce isn’t validated to execute pipelines connecting to multiple Hadoop clusters.
Additional Resources
See Adding a new driver for how to add a driver. See Connecting to a Hadoop cluster with the PDI client for how to create a named connection.
· Follow the suggestions in the Big Data issues troubleshooting sections to help resolve common issues when working with Big Data and Hadoop, especially Legacy mode activated when named cluster configuration cannot be located.
PDI AEL-Spark Enhancements
Capability
The Pentaho Adaptive Execution Layer (AEL) provides flexible and transparent data processing with Spark in addition to the native Kettle engine. The goal of AEL is to let you develop complex pipelines visually and then execute them on Kettle or Spark, depending on data volume and SLA requirements. AEL allows PDI users to designate Spark, rather than Kettle, as the execution engine for their transformations.
The v9.0.0 release includes the following performance and flexibility enhancements to AEL-Spark:
· Step-level, Spark-specific performance tuning options
· Enhanced logging configuration and richer information written to the PDI logs
Use Cases and Benefits
· Eliminates the black-box feel with better visibility into Spark execution
· Gives advanced Spark users tools to improve performance
Key Considerations
Users must be aware of the following additional items related to AEL v9.0.0:
· Spark v2.2 is not supported.
· Native HBase steps are only available for CDH and HDP distributions.
· Spark 2.4 is the highest Spark version currently supported.
Additional Resources
See the following documentation for more details: About Spark Tuning in PDI, Setup Spark Tuning, Configuring Application Tuning Parameters for Spark
Virtual File System (VFS) Enhancements
Capability
The changes to the VFS fall into two main areas: named VFS connections with new providers, and the new file browsing experience described under Use Cases and Benefits below.
· We added Amazon S3 and Snowflake Staging as VFS providers for named VFS connections and introduced the Pentaho VFS (pvfs) scheme, which can reference defined VFS connections and their protocols. For the S3 protocol, 9.0 supports S3A and session tokens.
· The general format of a Pentaho VFS URL is:
pvfs://VFS_Connection/path (where the path includes a namespace, bucket, or similar)
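For example, if you have defined a named VFS connection called mys3data that points at Amazon S3 (the connection name, bucket, and path here are purely illustrative), a file in one of its buckets could be referenced from any supported step as:
pvfs://mys3data/my-bucket/sales/2020/orders.csv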
Use Cases and Benefits
· Pentaho VFS gives you an abstraction over the protocol. If you change your provider in the future, all your jobs and transformations keep working once you update the VFS connection. Today you reference S3; tomorrow you might reference another provider, for example HCP or Google Cloud Storage. With Pentaho VFS, the maintenance burden of such a change is much lower.
· VFS connections also enable you to use different accounts and servers (including namespaces, buckets, or similar) within one PDI transformation, for example processing data from S3 across different buckets and accounts in a single transformation.
· Combining named VFS connections with the new file browsing experience provides a convenient way to easily access remote locations and extend the reach of PDI. The new file browser also offers the ability to manage files across those remote locations. For example, a user can easily copy files from Google Cloud into an S3 bucket using the browser’s copy and paste capabilities. A user can then easily reference those files using their named connections, in supported steps and job entries.
A user can manage all files, whether they are local or remote, from a central location. For example, there is no need to log in to the Amazon S3 Management Console to create folders or to rename, delete, move, or copy files. Even copying between the local filesystem and S3 is possible, and you can upload and download files from within Spoon.
The new file browser also offers capabilities such as search, which allows a user to find filenames which match a specified search string. The file browser also remembers a user’s most recently accessed jobs and transformations for easy reference.
Key Considerations
As of PDI 9.0, the following protocols are supported: Amazon S3, Snowflake Staging (read only), HCP, and Google Cloud Storage.
The following steps and job entries have been updated to use the new file open save dialog for 9.0: Avro input, Avro output, Bulk load into MSSQL, Bulk load into MySQL, Bulk load from MySQL, CSV File Input, De-serialize from file, Fixed File Input, Get data from XML, Get file names, Get files rows count, Get subfolder names, Google Analytics, GZip CSV input, Job (job entry), JSON Input, JSON Output, ORC input, ORC output, Parquet Input, Parquet output, Text file output, Transformation (job entry)
The File / Open dialog still uses the old browsing dialog. The new VFS browser for opening jobs and transformations can be reached through the File / Open URL menu entry.
Additional Resources
See Virtual File System connections, Apache Supported File Systems and Open a transformation for more information.
Cobol copybook steps
Capability
PDI now has two transformation steps that can be used to read mainframe records from a file and transform them into PDI rows.
· Copybook input: This step reads the mainframe binary data files that were originally created using the copybook definition file and outputs the converted data to the PDI stream for use in transformations.
· Read metadata from Copybook: This step reads the metadata of a copybook definition file to use with ETL Metadata Injection in PDI.
Use Cases and Benefits
Pentaho Data Integration supports simplified integration with fixed-length records in mainframe binary data files, so that more users can ingest, integrate, and blend mainframe data as part of their data integration pipelines. This capability is critical if your business relies on massive amounts of customer and transactional datasets generated in mainframes that you want to search and query to create reports.
Key Considerations
These steps work with fixed-length COBOL records only. Variable record types, such as VB, VBS, and OCCURS DEPENDING ON, are not supported.
Additional Resources
For more information about using copybook steps in PDI, see Copybook steps in PDI
Additional Enhancements
New Pentaho Server Upgrade Installer
The Pentaho Server Upgrade Installer is an easy to use graphical user interface that automatically applies the new release version to your archive installation of the Pentaho Server. You can upgrade versions 7.1 and later of the Pentaho Server directly to version 9.0 using this simplified upgrade process via the user interface of the Pentaho Server Upgrade Installer.
See Upgrade the Pentaho Server for instructions.
Snowflake Bulk Loader improvement
The Snowflake Bulk Loader adds support for table previews in PDI 9.0. When connected to Snowflake, select a table in the drop-down menu on the Output tab; the preview window is then populated with the columns and data types associated with that table, so you can match the expected column layout and data types against your data file.
For more information, please see the job entry documentation of the Snowflake Bulk Loader.
Redshift IAM security support and Bulk load improvements
With this release, you have more Redshift database connection authentication choices:
· Standard credentials (default) – user password
· IAM credentials
· Profile located on local drive in AWS credentials file
Bulk load into Amazon Redshift enhancements: New Options tab and Columns option in the Output tab of the Bulk load into Amazon Redshift PDI entry. Use the settings on the Options tab to indicate if all the existing data in the database table should be removed before bulk loading. Use the Columns option to preview the column names and associated data types within your selected database table.
See Bulk load into Amazon Redshift for more information.
Improvements in AMQP and UX changes in Kinesis
The AMQP Consumer step now supports binary messages, which, for example, allows processing of Avro-formatted data.
Within the Kinesis Consumer step, users can change the output field names and types.
See the documentation of the AMQP Consumer and Kinesis Consumer steps for more details.
Metadata Injection (MDI) Improvements
In PDI 9.0.0, we continue to enable more steps to support metadata injection (MDI):
· Split Field to Rows
· Delete
· String operations
In the Excel Writer step, the previously missing MDI option “Start writing at cell” has been added, so this option can now be injected as well.
Additionally, the metadata injection example is now available in the samples folder:
/samples/transformations/metadata-injection-example
See ETL metadata injection for more details.
Excel Writer: Performance improvement
The performance of the Excel Writer has been drastically improved when using templates. A sample test file with 40,000 rows needed about 90 seconds before 9.0 and now processes in about 5 seconds.
For further details, please see PDI-18422.
JMS Consumer changes
In PDI 9.0, we added the following fields to the JMS Consumer step: MessageID, JMS timestamp and JMS Redelivered.
This addition enables restartability and allows duplicate messages to be skipped.
For further details, please see PDI-18104 and the step documentation.
Text file output: Header support with AEL
You can set up the Text file output step to run on the Spark engine via AEL, and its Header option now works with AEL.
For further details, please see PDI-18083 and the Using the Text File Output step on the Spark engine documentation.
Transformation & Job Executor steps, Transformation & Job entries: UX improvement
Before 9.0, when passing parameters to transformations and jobs, the options “Stream column name” vs. “Value” (“Field to use” vs. “Static input value”) were ambiguous and led to hard-to-find issues.
In 9.0, we added behavior that prevents a user from entering values into both fields, avoiding these situations.
For further details, please see PDI-17974.
Spoon.sh Exit code improvement
Spoon.sh (which is called by kitchen.sh and pan.sh) could return the wrong exit status in certain situations.
In 9.0, we added a new environment variable, FILTER_GTK_WARNINGS, to control this behavior for warnings that affect the exit code. If the variable is set to any value, a filter is applied that ignores GTK warnings. If you do not want to filter any warnings, unset FILTER_GTK_WARNINGS.
For further details, please see PDI-17271.
Dashboard: Option for exporting analyzer report into CSV format.
It is now possible to export an Analyzer report to a CSV file even when it is embedded in a dashboard.
In the previous release the export option was available, but without the CSV format.
The CSV format was previously available only when using Analyzer outside of dashboards; this change provides functional parity between standalone Analyzer reports and reports embedded in dashboards.
For further details, please see PDB-1327.
Analyzer: Use of date picker when selecting ranges for a Fiscal Date level relative filter.
Before 9.0 and for an AnalyzerFiscalDateFormat annotation on a level in a Time dimension, Analyzer did not show the “Select from date picker” link.
Now, relative dates can be looked up from the current date on the Date level, and the date picker can also be used to select the nearest fiscal time period.
For further details, please see ANALYZER-3149.
Mondrian: Option for setting the ‘cellBatchSize’ default value.
In a default installation, mondrian.properties did not include mondrian.rolap.cellBatchSize as a configurable property.
This improvement includes the property in mondrian.properties by default in new builds, with its default value clearly indicated, so customers do not run into performance issues caused by a value that is set too low.
The default value has been updated to mondrian.rolap.cellBatchSize=1000000.
This value was chosen because it allows a very large 25M-cell report to run while keeping total server memory usage around 6.7 GB, which is under the 8 GB we list as the minimum memory required for a Pentaho server.
Pentaho 8.2 is available!
I’ve come to accept my inefficiency at keeping up with the technical blog posts. This is the point where one accepts his complete uselessness (and I don’t even know if that’s a real word!).
Anyway – on to the good things:
Pentaho 8.2 is available!
Release overview
- Enhance Ecosystem Integration
  - Hitachi Content Platform (HCP) Connector
  - MapR DB Support
  - Google Encryption Support
- Improve Edge to Cloud Processing
  - Enhanced AEL
  - Streaming AMQP
- Better Data Operations
  - Expanded Lineage
  - Status Monitoring UX
  - OpenJDK Support
- Enable Data Science & Visualization
  - Python Executor
  - PDI Data Science Notebook (Jupyter) Integration
  - Push Streaming
- Improve Platform Stability and Usability
  - JSON Enhancements
  - BA Chinese Language Localization for PUC
  - Expanded MDI
  - Additional Improvements
Ecosystem Integration
Hitachi Content Platform (HCP) Connectivity
- Globally Compliant Retention Platform (GCRP)
  - Meet compliance and legal retention requirements (WORM, SEC 17A-4, CFTC and MSRB)
- Secure Analytics Archive
  - Big data source/target (landing) for secure analytic workflows
  - Better data portability
  - Multi-tenant
  - Protect data with much higher durability (up to fifteen 9s) and availability (up to ten 9s) with HCP
MapR DB support
Google Cloud Encryption
Edge to Cloud Processing
Adaptive Execution Layer (AEL) Improvements
- Support for executing MDI-driven transformations via the “ETL Metadata Injection” step
- Support for the sub-transformation steps Simple Mapping and Mapping (the Transformation Executor was already supported)
- Native Spark implementation for HBase and Hive
- Support for S3 Cloud storage from AEL with native integration
Streaming AMQP
Data Operations
Lineage improvements
- What’s New
  - Architecture improvements for 3rd-party lineage bridges (like IGC)
  - Step and job entry “description” fields added to the lineage data output
  - Continued upgrades to the Custom Lineage Analyzers for the following steps and job entries: Hadoop File Input & Output, Spark Submit, Mapping (sub-transformation), and ETL Metadata Injection (added the relation to the sub-transformation being executed)
- Benefits
  - Better and easier integration of 3rd-party lineage bridges, also for future partnering
  - Improved use of lineage information for documentation and compliance use cases
  - Expanded coverage of data lineage across steps and job entries
Monitoring Status Page Update
OpenJDK support
Advanced Analytics and Visualizations
Python Executor
- Automated ability to get fields from the Python script
- Allows for multiple inputs (row by row or all rows)
- Ability to pick a Python environment from one or more installed Python installations (e.g., virtual environments)
- Each step gets its own Python session (a minimal script sketch follows below)
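To make this concrete, here is a minimal sketch of a script body as it might be used with the Python Executor step. The variable names df and out are placeholders for whatever input and output variable names you configure in the step dialog (they are not a fixed API of the step), and the fallback block only exists so the sketch also runs on its own outside PDI.
import pandas as pd

# Inside the step, "df" is assumed to be the configured name for the incoming rows
# (delivered as a pandas DataFrame when all rows are passed at once).
try:
    df
except NameError:
    # Outside PDI, build a small sample frame so the sketch is self-contained.
    df = pd.DataFrame({"quantity": [2, 3], "unit_price": [10.0, 4.5]})

# "out" stands for the variable the step would be configured to read back as output rows.
out = df.copy()
out["revenue"] = out["quantity"] * out["unit_price"]  # example derived field
print(out)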
PDI Data Science Notebook (Jupyter) Integration
- Pentaho Server needs to be running to host a PDI Data Service
- PDI Spoon needs to be connected to the repository to save/deploy/edit the Data Service
- PDI Data Service Client Jars need to be made available to be used by Jupyter Notebook
- Compatible with Python 2.7.x or Python 3.5.x
- Compatible with Jupyter Notebook 5.6.x
- Python JDBC package dependencies include JayDeBeApi and jpype (see the connection sketch below)
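As a rough sketch of how the pieces above fit together in a notebook cell (the driver class name, JDBC URL, credentials, jar location, and data service name below are assumptions for illustration, not values from this release announcement):
import glob
import jaydebeapi  # pip install JayDeBeApi JPype1

# Connect to a PDI data service exposed by a running Pentaho Server.
conn = jaydebeapi.connect(
    "org.pentaho.di.trans.dataservice.jdbc.ThinDriver",     # thin driver class (assumed)
    "jdbc:pdi://localhost:8080/kettle?webappname=pentaho",  # thin JDBC URL (assumed host/port)
    ["admin", "password"],                                   # repository credentials (placeholders)
    glob.glob("pdi-dataservice-client/*.jar"),               # the client jars mentioned above
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_data_service LIMIT 10")     # data service name is a placeholder
for row in cursor.fetchall():
    print(row)
cursor.close()
conn.close()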
Streaming Visualizations and CTools (Push)
Platform Updates
JSON Input updates
BA Chinese Language Localization for PUC
Expanded Metadata Injection support
- Get System Data
- Execute Row SQL Script
- Execute SQL Script
- User Defined Java Class
- AMQP Consumer
- AMQP Producer
- JMS Consumer
- JMS Producer
- Add a Checksum
- Set Field Value
- Set Field Value to a Constant
BA Analyzer Numeric Level Comparison Filters
- The numeric level comparison filters can be parametrized for use with Dashboard Designer
- The filters can be applied via the report URL
- If you are working with a high cardinality level, it may make sense to optimize performance by adjusting the mondrian.olap.maxConstraints property (ensure joins are handled by the underlying database) and/or rounding your data to manage cardinality
Additional Enhancements
PDI Step & Job Entry Improvements
- User Defined Java Class step: support for Java 1.8
  - Allows PDI users to make use of newer Java language features (e.g., enhanced for loops, lambda expressions, varargs).
- Text File Output step: added support for variables in the “Split every … rows” property
  - Makes it easier to create flexible output file sizes controlled by variables.
- FTPS job entries: support for the “Advanced server protection level”
  - All FTPS entries now support the “private” protection level, so data is secured for both integrity and confidentiality.
- REST Client step: allows custom content-type headers
  - Many REST servers require custom content types, in particular W3C-semantics-compliant data stores such as AllegroGraph and MarkLogic Server.
- Text File Input step: provides the full stack trace when a file cannot be opened
  - The full stack trace provides valuable debugging information and allows root-cause analysis so problems can be resolved more quickly.
- Calculator step: added exceptions when a file is not found
  - Instead of producing bad data when a file is not available, the process now ends with an error to notify the user of the issue.
BA Improvements
- PUC upload/download: users with ‘publish content’ permission can now upload and download files in PUC
  - No longer any need to rely on a few users with full ‘admin’ rights to move content between environments.
- Scheduling access: PUC users without scheduling permissions can no longer see the scheduling perspective
  - More logical permissions and user experience for BA customers.
- MDX performance: MDX optimizations for some scenarios that include subtotals, numeric filters, and percentages
  - Better performance in some Analyzer/Mondrian query scenarios.
- Analyzer business groups: global setting option to expand or collapse Analyzer business groups
  - Long lists of fields can be rolled up by default when a report is opened, reducing scrolling and improving the UX.
- Analyzer numeric dimension filters (stretch goal): comparison filters (<, >, between, …) on numeric levels (e.g., age, credit score, customer ID)
  - Much greater flexibility to query data with numeric levels (e.g., show me sales for customers between the ages of 18 and 30). Previously, every distinct level value had to be added manually to an include filter.
Catching up with Kettle REMIX
Dear Kettle and Neo4j friends,
Since I joined the Neo4j team in April I haven’t given you any updates despite the fact that a lot of activity has been taking place in both the Neo4j and Kettle realms.
First and foremost, you can grab the cool Neo4j plugins from neo4j.kettle.be (the plugin in the marketplace is always out of date since it takes weeks to update the metadata).
Then based on valuable feedback from community members we’ve updated the DataSet plugin (including unit testing) to include relative paths for filenames (for easier git support), to avoid modifying transformation metadata and to set custom variables or parameters.
I’ve also created a plugin to debug transformations and jobs a bit easier. You can do things like set specific logging levels on steps (or only for a few rows) and work with zoom levels.
Then, back on the subject of Neo4j, I’ve created a plugin to log the execution results of transformations and jobs (and a bit of their metadata) to Neo4j.
(Image: graph of a transformation executing a bunch of steps – metadata on the left, logging nodes on the right.)
Those working with Azure might enjoy the Event Hubs plugins for a bit of data streaming action in Kettle.
The Kettle Needful Things plugin aims to fix bugs and solve silly problems in Kettle. For now it sets the correct local metastore on Carte servers AND… features a new launcher script called Maitre. Maitre supports transformations and jobs, with local, remote, and clustered execution.
The Kettle Environment plugin aims to take a stab at life-cycle management by allowing you to define a list of Environments:
In each Environment you can set all sorts of metadata but also the location of the Kettle and MetaStore home folders.
Finally, because downloading, patching, installing and configuring all this is a lot of work, I’ve created an automated process which does this for you on a daily basis (for testing), so you can download Kettle Community Edition version 8.1.0.0 patched to 8.1.0.4, with all the extra plugins above, in its 1GB glory at: remix.kettle.be
To get it on your machine simply run:
wget remix.kettle.be -O remix.zip
You can also try these plugins (except for Needful Things and Environment) live on my sandbox WebSpoon server. You can easily run your own WebSpoon from the docker container, which is also updated daily.
If you have suggestions, bugs, or rants, please feel free to leave them here or in the respective GitHub projects. Any feedback is, as always, more than welcome. In fact, thank you all for the feedback given so far. It’s making all the difference. If you feel the need to contribute more opinions on the subject of Kettle, feel free to send me a mail (mattcasters at gmail dot com) to join our kettle-community Slack channel.
Enjoy!
Matt
Pentaho Community Meeting – PCM18! Bologna, Italy, November 23-25!
PCM 18!!
If you’ve been in one, no more words are needed, just go ahead and register! If you don’t know what I’m talking about, just go ahead and register as well!
It’s the best example of what Pentaho – now part of Hitachi Vantara – is all about: a very passionate group of people who are absolutely world class at what they do and still know how to have a good time!
(Image: PCM17 group photo)
Now shamelessly copy-pasting the content from it-novum:
Pentaho Community Meeting 2018
Pentaho Community Meeting 2018 will take place in Bologna from November 23-25. It will be organized by the Pentaho User Group Italia and by it-novum, the host of PCM17. As always, it will be a 3-day event full of presentations, networking and fun, and we invite Pentaho users of every kind to participate!
For PCM18 we will meet in the beautiful city of Bologna. The guys of Italia User Group will take care of the venue and the program. With Virgilio Pierini as group representative we not only have a Pentaho enthusiast but also a native of Bologna guiding us to the beautiful corners of the hometown of Europe’s oldest university!
What is Pentaho Community Meeting?
Pentaho Community Meeting is an informal gathering for Pentaho users from around the world. We meet to discuss the latest and greatest in Pentaho products and exciting geek stuff (techie track) as well as best practices of Pentaho implementations and successful projects (business track). Read this summary of Pentaho Community Meeting 2017 to learn more.
PCM18 is open to everyone who does something with Pentaho (development, extensions, implementation) or plans to do data integration, analytics or big data with Pentaho. Several Pentaho folks – architects, designers, product managers – will share their latest developments with us.
The event is community-oriented and open-minded. There’s room for networking and exchanging ideas and experiences. Participants are free to break off into groups and work together.
Call for Papers
For sure, this is intended to be a community event – for the community and by the community. To propose a talk for the agenda, please use the contact form to send a brief description, including your name and title, in English by September 30th.
Agenda
The agenda will be updated continuously, so stay tuned for updates! All updates will be posted on twitter, too.
Friday, November 23 | Hackathon
We start the three-day PCM with a hackathon, snacks and drinks. After a 2-hour hackathon, a highly esteemed jury will award the most intelligent/awkward/funny hacks.
Saturday, November 24 | Conference Day
- Welcome speech | Stefan Müller and the org team
- The future of Pentaho in Hitachi Vantara | Pedro Alves, Hitachi Vantara
- What’s new in PDI 9.0 | Jens Bleuel, Hitachi Vantara
- Useful Kettle plugins | Matt Casters, Neo4j (and founder of Kettle)
- IoT and AI: Why innovation is a societal imperative | Wael Elrifai, VP for Solution Engineering – Big Data, IOT & AI, Hitachi Vantara
- Pentaho at CERN | Gabriele Thiede, CERN
- Pentaho User Group Italia
- SSBI (Self Service BI ) – Pentaho Plugin Update | Pranav Lakhani, SPEC INDIA
- Scaling Pentaho Server with Kubernetes | Diethard Steiner
- Capitalizing on Lambda & Kappa Architectures for IoT with Pentaho | Issam Hizaji, Lead Sales Engineer, Data Analytics & IoT | Emerging & Southern
After the lunch, everybody splits up to join the business or the techie track.
Sunday, November 25 | Social Event
Brunch, sightseeing and… let’s see!
—-
Anyway, believe me, you want to go! GO REGISTER HERE!
Pentaho 8.1 is available
For CE, you can find it on the community home!
Cloud
Google Storage (EE)
Google BigQuery – JDBC Support (EE/CE)
Google BigQuery – Bulk Loader (EE)
Analytics over BigQuery (EE/CE, depending on the tool used)
AWS S3 Security Improvements (IAM) (EE/CE)
Big Data / Adaptive Execution Layer (AEL) Improvements
Bigger and Better (EE/CE)
Sub Transformation support (EE/CE)
Big Data formats: Added support for ORC (EE/CE)
Worker Nodes (EE)
Streaming
New Streaming Datasources: MQTT and JMS (ActiveMQ / IBM MQ) (EE/CE)
Streaming results (EE/CE)
CTools and Streaming Visualizations (EE/CE)
Time Series Visualizations (EE/CE)
Data Exploration Tool Updates (EE)
Additional Updates
Download it
Go get your Enterprise Edition or trial version from the usual places
For CE, you can find it on the community home!
Farewell Pentaho
Dear Kettle friends,
12 years ago I joined a wonderful team of people at Pentaho who thought they could make a real change in the world of business analytics. At that point I had recently open-sourced my own data integration tool (in a category then still called ‘ETL’) named Kettle, and so I joined in the role of Chief Architect of Data Integration. The title sounded great and the job included everything from writing articles (and a book), massive amounts of coding, testing, software releases, giving support, doing training, workshops, … In other words, life was simply doing everything I possibly and impossibly could to make our software succeed when deployed by our users. With Kettle now being one of the most popular data integration tools on the planet, I think it’s safe to say that this goal has been reached and that it’s time for me to move on.
I don’t just want to announce my exit from Pentaho/Hitachi Vantara. I would also like to thank all the people involved in making our success happen. First and foremost I want to express my gratitude to the founders (Richard, Doug, James, Marc, …) for even including a crazy Belgian like myself on the team but I also want to extend my warmest thanks to everyone who I got to become friends with at Pentaho for the always positive and constructive attitude. Without exaggeration I can say it’s been a lot of fun.
I would also explicitly like to thank the whole community of users of Kettle (now called Pentaho Data Integration). Without your invaluable support in the form of new plugins, bug reports, documentation, forum posts, talks, … we could never have pulled off what we did in the past 12 years! I hope we will continue to meet at one of the many popular community events.
Finally I want to thank everyone at Hitachi and Hitachi Vantara for being such a positive and welcoming group of people. I know that Kettle is used all over Hitachi and I’m quite confident this piece of software will not let you down any time soon.
Now I’m going to go skiing for a week and when I get back it’s time to hunt for a new job. I can’t wait to see what impossible problems need solving out there…
Cheers,
Matt
Pentaho 8 is now available!
I recently wrote about everything you needed to know about Pentaho 8. And now it’s available! Go get your Enterprise Edition or trial version from the usual places.
For CE, you can find it on the new community home!
Enjoy!
-pedro
A new collaboration space
With the move to Hitachi Vantara we’re not letting the community go away – quite the contrary. One of the first things we’re doing is giving the community a new home, here: http://community.pentaho.com
We’re trying to gather people from the forums, user groups, and elsewhere, and give them a better and more modern collaboration space. This space will remain open, not least because its content is extremely valuable, so the ultimate decision is yours.
Your mission, should you choose to accept it, is to register and try this new home. We’re counting on your help to make it a better space.
See you in http://community.pentaho.com
Cheers!
-pedro