Pentaho 7.1 is available!
Remember how, at the time of the previous release, I said Pentaho 7.0 was the best release ever? Well, that was true until today! But not any more, as 7.1 is even better! :p Here's the highlights list:
- Visual Data Experience
  - Data Exploration (PDI)
    - Drill down
    - New visualizations: Geo Map, Sunburst, Heat Grid
    - Tab persistency
    - Several other improvements, including performance
  - Viz API 3.0 (Beta)
    - Viz API 3.0, with documentation
    - Rollout of consistent visualizations between Analyzer, PDI and Ctools
- Enterprise Platform
  - VCS-friendly features
    - File / repository abstraction
    - PDI files properly indented
    - Repository performance improvements
  - Reintroducing Ops Mart
  - New default theme on the User Console
  - Pentaho Mobile deprecation
- Big Data Innovation
  - AEL – Adaptive Execution Layer (via Spark)
  - Hadoop security
    - Kerberos impersonation (for Hortonworks)
    - Ranger support
  - Microsoft Azure HD Insights shim
Adaptive Execution with Spark
With AEL, you pick the engine that executes a transformation through a run configuration. In 7.1 there are two:

- Pentaho – the classic Pentaho (Kettle) engine
- Spark – you've guessed it: the same transformation, running on an Apache Spark cluster
[Image: AEL execution of Spark]
Scale as you go
Pentaho's message is one of future-proofing the IT architecture: leveraging the best of what the different technologies have to offer, without imposing a certain configuration or persona as the starting point. The market is moving towards a demand for BA/DI to come together in a single platform. Pentaho has an advantage here, as we have seen with our customers that BI and DI are better together; it's what sets us apart from the competition. Gartner predicts that BI and discovery tool vendors will partner to accomplish this, while larger, proprietary vendors will attempt to build these platforms themselves. Either way, Pentaho has a unique and early lead in delivering this platform.
A good example is the story we can tell about governed blending. We don't need to impose any pre-determined configuration on customers; we can start with the simple use of data services and unmaterialized data sets. If it's fast enough, we're done. If not, we can materialize the data into a database or even an enterprise data warehouse. If it's fast enough, we're done. If not, we can resort to other technologies – NoSQL, Lucene-based engines, etc. If it's fast enough, we're done. If everything else fails, we can set up an SDR blueprint, the ultimate scalability solution. And throughout this entire journey, we never let go of the governed blending message.
This is an insanely powerful and differentiated message: we allow our customers to start simple, and only go down the more complex routes when needed. And when going down any one of those paths, the user knows, accepts and sees the value in the extra complexity needed to address scalability.
Adaptive Execution Layer
[Image: AEL conceptual diagram]
Implementation in 7.1 – Spark
[Image: An architectural diagram so beautiful it should almost be roughly correct]
Runtime flow
[Image: Creating and selecting a Spark run configuration]
- Some steps are safe to run in parallel, while others are not parallelizable or are not recommended to run in clustered engines such as Spark. All the steps that take one row as input and one row as output (Calculator, Filter Rows, Select Values, etc.) are parallelizable. Steps that require access to other rows, or that depend on the position and order of rows in the row set, still run on Spark, but have to run on the edge node, which implies a collect of the RDDs (Spark's distributed datasets) from the worker nodes. It is what it is. And how do we know which is which? We simply tell PDI which steps are safe to run in parallel, and which are not (there's a sketch of this right after the list)
- Some steps can leverage Spark's native APIs for performance and optimization. When that's the case, we can pass PDI a native implementation of the step, greatly increasing the scalability at possible bottleneck points. Examples of these steps are the Hadoop file inputs, HBase lookups, and many more
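To make that distinction concrete, here is a minimal sketch written directly against Spark's Java RDD API. This is not AEL's actual internal code, and the step names and row handling are simplified stand-ins; it only illustrates why a one-row-in/one-row-out step distributes freely while an order-dependent step forces a collect back to a single node.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AelSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ael-sketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> rows = sc.parallelize(Arrays.asList("10", "25", "7", "42"));

            // A Calculator-style step: one row in, one row out. It needs no
            // knowledge of other rows, so Spark can run it on the worker nodes.
            JavaRDD<Integer> doubled = rows.map(r -> Integer.parseInt(r) * 2);

            // A step whose output depends on the position and order of rows in
            // the row set cannot be evaluated per partition: the RDD is
            // collected to one place (AEL's edge node) and processed there.
            List<Integer> onEdgeNode = doubled.collect();
            int runningTotal = 0;
            for (int v : onEdgeNode) {
                runningTotal += v; // order-dependent logic runs in one process
            }
            System.out.println("Running total: " + runningTotal);
        }
    }
}
```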
Feedback please!
Visual Data Experience (PDI) Improvements
In the 7.1 release, Pentaho provides new Data Explorer capabilities that support the following key use cases more completely:
- Data Inspection: During the process of cleansing, preparing, and onboarding data, organizations often need to validate the quality and consistency of data across sources. Data Explorer enables easier identification of these issues, informing how PDI transformations can be adjusted to deliver clean data.
- BI Prototyping: As customers deliver analytics-ready data to business analysts, Data Explorer reduces the iterations between business and IT. Specifically, it enables the validation of the metadata models that are required for using Pentaho BA. Models can be created in PDI and tested in Data Explorer, ensuring data sources are analytics-ready when published to BA.
New visualization: Heatgrid
This chart can display two measures (metrics) and two attributes (categories) at once. Attributes are displayed on the axes, and measures are represented by the size and color of the points on the grid. It is most useful for comparing metrics at the ‘intersection’ of two dimensions, as seen in the comparison of quantity and price across combinations of territories and years below (did I just define what a heatgrid is?! No wonder it’s taking me hours to write this post!):
[Image: Look at all those squares!]
New visualization: Sunburst
[Image: Circles are also pretty!]
New visualization: Geo Maps
The geo map uses the same auto-geocoding as Analyzer, with the out-of-the-box ability to plot latitude and longitude pairs, all countries, all country subdivisions (state/province), major cities in select countries, as well as United States counties and postal codes.
[Image: Geo Map visualization]
Drill down capabilities
When using dimensions in Data Explorer charts or pivot tables, users can now expand hierarchies to see the next level of data. This is done by double-clicking a level in the visualization (for instance, double-click a ‘country’ bar in a bar chart to drill down to the ‘city’ data).
[Image: Drill down in the visualizations…]
This can be done through the visualizations or through the labels / axes. Once again, look at this as the beginning of a coherent way to handle data exploration!
[Image: … or from where it makes more sense]
And this is only the first of a new set of actions we’ll introduce here…
Analysis persistency
In 7.0, these capabilities were a one-time inspection only. Now we’ve taken it a step further – the analyses get persisted with the transformations. You can now use them to validate the data, get insights right on the spot, and make sure everything is lined up to show to the business users.
[Image: Analysis persistency indicator]
Viz API 3.0
Every old-timer knows how much disparity we’ve had throughout the stack in terms of offering consistent visualizations. This is not an easy challenge to solve – the reason they are different is that different parts of our stack were created at completely different times and places, so a lot of different technologies were used. An immediate consequence is that we can’t just add a new viz and expect it to be available in several places of the stack.
We’ve been working on a visualization layer, codenamed VizAPI, for a while actually, but now we’ve reached a point where we can make it available in beta form. It brings this much-needed consistency and consolidation.
[Image: Viz API compatible containers]
In order to make this effort worthwhile, we needed to tackle it in the following order:
- Define the VizAPI structure
- Implement the VizAPI in several parts of the product
- Document and allow users to extend it
And… we did it. We re-implemented all the visualizations in this new VizAPI structure and adapted three containers – Analyzer, Ctools and DET (Data Exploration) in PDI. As a consequence, the look and feel of the visualizations is the same everywhere.
[Image: Analyzer visualizations are now much better looking _and_ usable]
One important note though – users migrating from earlier versions will still default to the “old” VizAPI (yeah, we called it the same as well, isn’t that smart :/ ) so as not to risk interfering with existing installations. In order to test an existing project with the new visualizations, you need to change the VizAPI version number in analyzer.properties; new installs will default to the new one.
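As a sketch of what that switch looks like – note that the exact property key below is my illustration, not something I pulled from the release, so confirm it against the 7.1 documentation before editing anything:

```properties
# pentaho-solutions/system/analyzer/analyzer.properties
# Illustrative only: check the 7.1 docs for the exact property name.
# Bumping the version opts an existing install into the new Viz API 3.0 (beta).
viz.api.version=3.0
```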
In order to allow people to include their own visualizations and promote more contributions to Pentaho (I’d love to start seeing more contributions to the marketplace with new and shiny vizzes), we really need to make it easy for people to learn how to create them.
And I think we did that! Even though this deserves its own blog post, just take a look at the documentation the team prepared for it.
[Image: Instructions for how to add new visualizations]
You’ll see this documentation has “beta” written on it. The reason is simple – we decided to put it out there, collect feedback from the community, and implement any changes / fine-tuning before the 8.0 timeframe, when we’ll lock this down, guaranteeing long-term support for new visualizations.
MS HD Insights
HD Insights (HDI) is a hosted Hadoop cluster that is part of Microsoft’s Azure cloud offering. HDI is based on the Hortonworks Data Platform (HDP). One of the major differences between the standard HDP release and HDI is the storage layer: HDI connects to local cluster storage via HDFS, or to Azure Blob Storage (ABS) via the WASB protocol.
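For reference, WASB addresses follow Azure's standard scheme (the container and account names below are placeholders):

```
wasb://mycontainer@myaccount.blob.core.windows.net/path/to/data
```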
Hortonworks security support
Added support for Hadoop user impersonation.
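On the Hadoop side, user impersonation relies on the standard proxy-user mechanism configured in core-site.xml. A minimal sketch, assuming the PDI server authenticates as a service user named pentaho (the hostname and groups are placeholders; lock these down for production):

```xml
<!-- core-site.xml: allow the "pentaho" service user to impersonate end users -->
<property>
  <name>hadoop.proxyuser.pentaho.hosts</name>
  <value>pdi-server.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.pentaho.groups</name>
  <value>analysts,etl</value>
</property>
```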
Data Processing: Enhanced Spark Submit and SparkSQL JDBC
Spark Submit and SparkSQL JDBC support now extends to the following (a connection sketch follows the list):
- Amazon EMR
- MapR
- Azure HD Insights
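SparkSQL is reachable over JDBC through Spark's Thrift server, which speaks the HiveServer2 wire protocol, so the standard Hive JDBC driver applies. A minimal sketch – the host, port, credentials and table are placeholders, and the Hive JDBC driver jar must be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkSqlJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Spark's Thrift server implements the HiveServer2 protocol, hence
        // the jdbc:hive2 URL scheme. 10000 is the usual default port.
        String url = "jdbc:hive2://spark-thrift.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sales")) {
            while (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}
```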
VCS Improvements
Repository agnostic transformations and jobs
[Image: The classic way to reference dependent objects]
[Image: The current approach to define dependencies]
KTR / KJB XML format
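As listed in the highlights above, 7.1 now writes KTR / KJB files with consistent, proper indentation, which makes line-based diffs in a version control system actually readable. A trimmed, illustrative fragment of a transformation file (the element names are real KTR markup; the values are placeholders):

```xml
<transformation>
  <info>
    <name>clean_sales_data</name>
  </info>
  <step>
    <name>Filter rows</name>
    <type>FilterRows</type>
  </step>
</transformation>
```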
Repository performance improvements
Operations Mart Updates
Other Data Integration Improvements
Metadata Injection Enhancement
Lineage Collection Enhancement
XML Input Step Enhancement
New Mobile approach (and the deprecation of Pentaho Mobile)
Pentaho User Console Updates
[Image: Sapphire theme in PUC]