Tag Archives: feature

The New Auto Number Feature in Dynamics 365


Auto numbers, or unique alphanumeric strings, are often needed by CRM users to uniquely identify records in the UI. In the past, we developed plugins that generated unique identifiers for us using counters or random strings.

Microsoft has released an “Auto number” attribute in version 9.0 of Dynamics 365. Adding the attribute from the UI is not yet supported, but it will be soon. Until then, we can add the attribute via the API.

Let’s add one.

1. Create a console application in Visual Studio.


2. Add the Dynamics 365 v9.0 DLLs as references. Currently, a preview version of the DLLs is available on NuGet.

https://www.nuget.org/packages/Microsoft.CrmSdk.CoreAssemblies/9.0.0.4-Preview

3. Create a connection to your organization and call the CreateAutoNumberAttribute function from the Main() function.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Configuration;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;
using Microsoft.Xrm.Sdk.Client;
using Microsoft.Xrm.Sdk.Messages;
using Microsoft.Xrm.Sdk.Metadata;
using System.ServiceModel.Description;

static void Main(string[] args)
{
    string orgServiceUrl = ConfigurationManager.AppSettings["orgServiceUrl"];
    string userName = ConfigurationManager.AppSettings["username"];
    string password = ConfigurationManager.AppSettings["password"];

    Uri oUri = new Uri(orgServiceUrl);

    // Your client credentials
    ClientCredentials clientCredentials = new ClientCredentials();
    clientCredentials.UserName.UserName = userName;
    clientCredentials.UserName.Password = password;

    // Create your Organization Service Proxy
    OrganizationServiceProxy _serviceProxy = new OrganizationServiceProxy(oUri, null, clientCredentials, null);

    CreateAutoNumberAttribute("account", "new_accountnumber", _serviceProxy);
}

4. The following function creates an auto number attribute of type “string” on the “Account” entity.

“AutoNumberFormat” is the new property that has been added to the AttributeMetadata class to help us add the auto number attribute.

public static void CreateAutoNumberAttribute(string entityName, string attributeName, OrganizationServiceProxy _orgServiceProxy)
{
    CreateAttributeRequest newAttributeReq = new CreateAttributeRequest
    {
        EntityName = entityName.ToLower(),
        Attribute = new StringAttributeMetadata
        {
            AutoNumberFormat = "ACC-{RANDSTRING:6}",
            LogicalName = attributeName.ToLower(),
            SchemaName = attributeName.ToLower(),
            MaxLength = 150,
            RequiredLevel = new AttributeRequiredLevelManagedProperty(AttributeRequiredLevel.None),
            FormatName = StringFormatName.Text,
            DisplayName = new Microsoft.Xrm.Sdk.Label("Account Number", 1033),
            Description = new Microsoft.Xrm.Sdk.Label("Unique alphanumeric number for each account", 1033)
        }
    };

    _orgServiceProxy.Execute(newAttributeReq);
}

5. Time to check out the new attribute!


6. Let’s create a few accounts and check the auto numbers!


7. Version 9.0 of Dynamics 365 supports the following tokens for auto numbers:

Static String
DATE:[format]
SEQNUM:[size]
RANDSTRING:[length]
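These tokens can be combined with static text in a single format string. The examples below are illustrative only (the actual output depends on the runtime values of the tokens and the sequence seed):

```
ACC-{RANDSTRING:6}               e.g.  ACC-A1B2C3
CAR-{SEQNUM:4}                   e.g.  CAR-1001
INV-{DATE:yyyyMM}-{SEQNUM:5}     e.g.  INV-201802-00042
```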

There you have it! We’re happy to have this feature added to Dynamics 365. Be sure to check out our blog for more updates, news, and tips for Dynamics 365!

Happy Dynamics 365’ing!


PowerObjects- Bringing Focus to Dynamics CRM

Data as a Feature: The New Differentiator for Software Builders


It is a great time to be alive if you are a technology consumer. The era of “there’s an app for that” has quickly evolved into the era of “there are 10 apps for that”, meaning users have more choice than ever over which applications they adopt. Lower barriers to building and distributing an application have opened the field to more software builders than ever.

With an influx of new applications, product managers are having to find creative new ways to differentiate their offerings from the pack. User experience has proven to be one of the new differentiators and, in many cases, is perceived as more important to users than core features and capabilities. In the new era of software, the best applications—the ones that stick—are those that pair great user experience with the powerful potential that lies within data. The best applications treat data as a feature of their product or service.

What exactly is “data as a feature”? It is the act and process of treating data as a core component of an application in a way that delivers value to the end user. One of the primary drivers for any product manager is to build a product that helps users achieve a goal or set of goals. When designed and packaged in the right way, data is a potent asset that allows users to reach goals and appreciate the full value of an application.

Need proof? Look at consumer applications like Mint.com and Strava—two of the most successful apps in their respective domains. Mint.com took the traditionally difficult task of deciphering financial information and made it easy for virtually any user to intelligently manage their finances. Strava accomplished the same feat in the world of personal fitness by allowing users to train smarter and reach cycling and running goals. Both applications provide highly visual interfaces that allow users of any competency to intuitively interact with data. Having these data experiences within the context of the application allows users to not only consume helpful insights, but act on those insights at the point of consumption.

And although data as a feature started in consumer products, business applications are capitalizing on the same practice as well. The HR & Finance application, Workday, uses thoughtfully designed charts and visualizations to give its non-technical, non-analyst users the power to make data-driven decisions around recruiting and workforce management.

Software product managers are a rare breed. Not many people have the creative vision, technical chops, and pragmatic decision-making skills that come with the job of product management. To meet demands of the next generation of application users, product managers must employ all of these weapons and look for new ways to differentiate their products from an increasingly crowded market. With a surplus of data at our disposal, many software builders are unknowingly sitting on gold mines of untapped data that can be unlocked for their users. By treating data as a feature, product managers can make stickier applications by helping users reach goals faster and more confidently with data.

 
To explore the concept of data as a feature and learn key considerations around embedded analytics, download the complimentary new O’Reilly eBook, Data as a Feature: A Guide for Product Managers.


The TIBCO Blog

Power BI Desktop February Feature Summary

There are two themes for this month’s Power BI Desktop update. The first is a focus on some of your major asks including multi-selecting data points across multiple charts and syncing slicer values across multiple pages of your report. We also have a major update to our SAP HANA connector. Our second theme is a focus on performance improvements across three different areas of our product.


Don’t forget you can now get Power BI Desktop from the Windows Store if you’re on Windows 10!


Here’s the complete list of February updates:

Reporting

Analytics

Custom visuals

Data connectivity

Other

For a summary of the major updates, you can watch the following video:

We are very excited to announce the release of one of our most highly requested features! You can now Ctrl+click on data points in multiple charts and have the cross filters applied to your entire report. As you select data points across multiple charts, we will apply both filters together. For example, if you select Computers in a column chart and Regular in a pie chart, then all the charts on your report page will be filtered to products that are both computers and regular.


Watch the following video to learn more about multi-selecting data points:

Another huge ask from the community is to have slicers that apply to multiple pages. To meet this request, we’ve added a “Sync slicers” pane that will let you select slicers to stay in sync across multiple pages. For instance, selecting “Red” on a product color slicer on page 1 will be reflected in the product color slicer on page 2 if the two pages are synced. You can reach the Sync slicers pane from the View tab.

Once you’ve opened the pane, select a slicer to see the options. You can quickly make the slicer apply to every page by clicking the ‘Add to all’ link. A hidden slicer will be put on every page, synced to the currently selected slicer. You can also use the right column to pick if the slicer is visible on each page or not.


Watch the following video to learn more about syncing slicers:

We also have a minor improvement to our numeric slicer preview visual. Now if you create a range slicer using a whole number column, it will snap to rounded values and not apply decimal filters.


Watch the following video to learn more about the numeric slicer improvement:

Starting this month, we are planning on flighting a new, faster way to geocode your map data. This new method can have up to a 6x improvement on your geocoding speed depending on the amount of data you have. While you shouldn’t notice any changes to the actual location results you are getting back, the results will come back much faster. We wanted to be very safe with this change, so we are choosing to flight this feature. This means we will be releasing it to a small group of users and will over time slowly increase the number of users getting the feature. When we flight features, they only affect the Power BI service, not your Desktop reports, but we wanted to share the feature details here so everyone is aware of the changes.

By default, Power BI will hide your data labels if you place them inside the data points and they don’t fit. This month we added a formatting option that lets the data label overflow past the edge of the data point, like the experience you’d see in Excel. Note that we will still automatically hide data labels that would collide with others to prevent overlapping text.


Watch the following video to learn more about the overflow formatting option:

 

We have so many options now in our formatting and analytics pane that it can sometimes be hard to find the option you are looking for. To help with this, we’ve added a search box that will filter down the options to only those that match your search terms.


Watch the following video to learn more about searching the formatting pane:

Until now, you could only bring your own date tables by importing from Excel (using PowerPivot) or through LiveConnect (SSDT). Now in Power BI, you can mark a table as a date table. This will enable date hierarchies and time intelligence for the selected table.

In order for a table to be marked as a date table, it must contain at least one column of type ‘Date’, or ‘Date/Time’.

You can mark your own table as a date table either through right-clicking on the desired table or using the ribbon button.


When marking a table as a date table, you must select a date column to be used as a primary date column. Make sure that the column you pick for this:

  • Contains no null values
  • Contains unique values
  • Contains contiguous date values (from start to end)
  • If it is a date/time column, the timestamp must be the same across each value
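As a minimal sketch, a calculated table built with DAX’s CALENDAR function satisfies these requirements by construction, since it generates a contiguous, unique range of dates with no nulls (the table name and date range here are just illustrative):

```dax
Dates = CALENDAR ( DATE ( 2018, 1, 1 ), DATE ( 2018, 12, 31 ) )
```

Once created, such a table can be marked as a date table with its single date column as the primary date column.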


When a table is marked as a date table, all of our built-in date tables associated with it are removed. If you later un-mark it as a date table, our built-in date tables will be re-created for the date columns in the table.

Watch the following video to learn more about marking a custom date table:

Our quick measures feature is now generally available and no longer requires the preview feature switch to be turned on. Additionally, thanks to the above feature to mark your own custom date table, you no longer need to use Power BI’s date hierarchy for the time intelligence quick measures. You can use the date field from your own date table in any quick measure.

We’ve added two new DAX functions this month that help if you’re working with date-time data across timezones. DAX has long supported the NOW() and TODAY() functions that return time and date in the timezone that the function’s being used – so if a .pbix file is passed to someone in a different timezone they’ll see different results. UTCNOW() and UTCTODAY() will always return the current time or date in UTC so you can guarantee consistent results wherever you are (and also when you upload the workbook to the Power BI service).
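As a hedged sketch of the difference (the Sales table and its OrderDate column are hypothetical), compare a measure built on TODAY() with one built on UTCTODAY(); only the latter is guaranteed to return consistent results across timezones:

```dax
Orders Today (Local) =
CALCULATE ( COUNTROWS ( Sales ), FILTER ( Sales, Sales[OrderDate] = TODAY () ) )

Orders Today (UTC) =
CALCULATE ( COUNTROWS ( Sales ), FILTER ( Sales, Sales[OrderDate] = UTCTODAY () ) )
```

The first measure can change depending on where the .pbix file is opened; the second evaluates against UTC everywhere, including in the Power BI service.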

Custom visuals enable developers and users to visualize their data in the way that best helps them understand the data and gain insights from it. Often, these visuals are specific to the organization’s needs and the way that it runs its business. The organization typically doesn’t want to publish the visual to the public. Nevertheless, there is a real need to get those visuals into the hands of data analysts inside the organization, the ones that build and maintain the analytics reports and dashboards. It’s especially important that it’s easy for them to discover the appropriate visuals and to know they don’t need to maintain and update these visuals to use them.

Your organization’s Power BI admin can deploy specific visuals as part of your organizational repository. This means you’ll see these deployed visuals under a new section of the Power BI custom visual store integrated into Power BI Desktop called My organization. Since these visuals have been vetted by your company, you can have confidence that they are well made.


This feature is currently in preview, so you’ll need to enable it from the Options menu (File->Options and settings->Options->Preview features).

Learn more about organizational custom visuals.

If you are a Power BI admin, you can add new custom visuals to this part of the store by going to the Admin portal in the Power BI service and selecting Organization visuals tab. From there you can click the Add a custom visual button and upload the .pbiviz file for the custom visual you want.


Learn more about deploying and managing organization custom visuals.

Watch the following video to learn more about organizational custom visuals:

The Network Chart by ZoomCharts has the ability to visualize data as a network structure and filter data through a variety of gestures and smart touches on all devices. The network structure lets you have multiple vertex types (more than two), making it possible to see the connections among many categories. The box select feature lets you easily select multiple nodes within the chart to filter your other visuals. Visualizing the data as a network structure reveals the implied connections in your data, making important decisions more efficient and obvious.


Some practical use cases of the interactive Network Chart and Filter visual are:

  • Customer segmentation
  • Cost and revenue analysis
  • Log/Event analysis

There are two versions of this chart. The free version is available on Microsoft AppSource and the customizable paid version can be purchased on the ZoomCharts website.

Features of the free Network Chart visual include:

  • Fully depict data as a network structure with the ability to filter the data from within the visual
  • Smooth connecting visuals
  • Cross-category filtering
  • Dynamic force feedback layout
  • Expanding/collapsing vertices
  • Hide/Unhide vertices
  • Select one or multiple vertices
  • Category-coded colors for vertices (non-configurable)
  • Inner and outer labels for vertices (non-configurable)
  • Dynamic radius for vertices (non-configurable)

Paid features include:

  • Legend control
  • Customizable vertices
  • Support for images inside vertices
  • Value-based gradient colors for vertices and edges
  • Inside/outside label configuration
  • Configurable dynamic radius for vertices
  • Configurable dynamic width for edges
  • Edge decorations
  • Up to 9 categories

Watch the following video to learn more about the Network Chart:

Slope Chart by MAQ Software allows you to analyze trends in data at a glance. This visual is useful for comparing interactions between two data points based on time or other user-chosen parameters. A slope chart is most useful if there are notable changes to highlight or for showing comparisons between groups.

Some key features of the visual include:

  • Quick comparison of indicator growth or loss across categories
  • Easy interaction with many data points using zoom functionality
  • Quickly download an image of the chart with the capture image widget


Watch the following video to learn more about the Slope Chart:

The Filter by List visual lets you bulk apply filter values for a report. Currently if you want to filter a report with a large number of categories, such as 100 products, you’d need to check each box in the filter pane one by one. With this visual you can do this immediately by pasting the entire list of products into the slicer visual.


Watch the following video to learn more about the Filter by List visual:

This visual helps you find overlaps and gaps of dates in your data. This is useful for showing things such as team vacation schedules or project plans.


Watch the following video to learn more about the as Timeline visual:

T-accounts are used by most accounting professionals to understand difficult accounting journal entries and with this visual you can now create T-accounts in Power BI directly.


Watch the following video to learn more about the T-Accounts visual:

Power BI has long supported SAP HANA, both for import and DirectQuery. However, the DirectQuery support largely treated SAP HANA as a relational source, meaning that care had to be taken to ensure that measures aggregated correctly when dealing with non-additive SAP HANA measures like Counters or ratios and to ensure that the reports had good performance.

There is now a preview of a new approach to SAP HANA that will treat it as a multi-dimensional source by default, in a similar manner to SAP Business Warehouse or Analysis Services. You’ll need to turn on the “New SAP HANA connector” preview feature switch under File > Options and Settings > Options > Preview features.

When connecting to SAP HANA using this approach, a single analytic or calculation view is selected, and all the measures, hierarchies, and attributes of that view will be available in the field list. As visuals are created, the correct aggregate data will always be retrieved from SAP HANA.

It is still possible to use the old approach to DirectQuery over SAP HANA. This is controlled by selecting File > Options and settings and then Options > DirectQuery, then selecting the option Treat SAP HANA as a relational source. Check this option to create a DirectQuery report over SAP HANA using the old approach.

Note that this option controls the approach that will be used for any new connections created. It has no effect on any existing SAP HANA connections in the current report, nor on connections in any other reports that are opened. This means that any reports connecting to SAP HANA that were created prior to February 2018 will continue to treat SAP HANA as a relational source.

Given the completely different behavior between the two approaches, it’s not possible to switch an existing report from one approach to the other.

The two approaches are described in detail in our documentation.

While in preview, reports created using the new multi-dimensional approach can’t be published to the Power BI service. Doing so will result in errors when opened within the service.

This latest update includes a large number of improvements to the performance of reports using DirectQuery. We expect some improvement to many queries and substantial improvement in particular cases. The key performance improvements are:

  • Combining additive and non-additive aggregates into single query. Power BI will now combine both additive aggregates (e.g., SUM, MIN) and non-additive aggregates (e.g., DISTINCTCOUNT) into a single SQL query to the underlying source, rather than sending two queries. This will improve the performance of any visual containing both DistinctCount and another additive aggregate.
  • Improved handling of queries with a large number of literals. The performance of any query involving a large number of literals has been improved, such as queries that involve measure filtering, TopN filtering, or bi-directional relationships.
  • Use of a single SQL query rather than multiple for some visuals and calculations. Power BI used to send several SQL queries using DISTINCTCOUNT for certain kinds of visuals and calculations. For example, this could occur when using bi-directional relationships, or using a quick measure, or in a measure using CALCULATE of DISTINCTCOUNT. Now, a single SQL query is sent, resulting in an order of magnitude improvement in some cases.
  • Optimized queries to reduce post-aggregation. In some scenarios, Power BI used to send SQL queries at a lower granularity than needed for the visual and perform post-aggregation locally. Now, it pushes all the joins and post-aggregations down to the underlying source, improving performance. For example, this applies to any visual where the underlying measure column is directly related to only a subset of the columns being grouped by (and there is some non-trivial filtering), and most quick measures.
  • Optimized use of TREATAS in calculations. Power BI will optimize common patterns for the use of TREATAS in calculations to combine into a single query. For example, a visual containing the following two measures:

    CALCULATE(SUM(Table1[Col1]), TREATAS(VALUES(Table2[Col3]), Table1[Col3])) and
    CALCULATE(MIN(Table1[Col2]), TREATAS(VALUES(Table2[Col3]), Table1[Col3]))

    would previously have resulted in two separate SQL queries, but will now result in just one.

  • Fewer SQL queries are sent when there is multi-column tuple filtering. For example, if Include or Exclude is used to filter out data items based on multiple columns, e.g., to exclude the items (“USA”,”Old”) and (“France”,”Young”) from a visual.

We’ve made improvements to our performance when opening and saving files, especially when your files are quite large. In some cases, you can see between a 40% and 60% performance improvement.

We’ve done some work to improve the Show items with no data feature.


Some visible improvements you’ll see are:

  • For visuals that have Show items with no data enabled, you’ll see major performance improvements for visuals with columns or filters directly related in the model (that is, they have a one-to-many relationship). On average, we improved performance by 100%.
  • Filters related to columns that have Show items with no data enabled now respect the model’s bidirectional cross filtering.

We are planning to release a new feature for reports in the Power BI service very soon where we will persist filters, slicers, and other data view changes as your report consumers last left them. This will be unique to each of them, so they will no longer need to spend their valuable time setting up or filtering the reports on every visit. Instead, they can pick up right where they left off!

In order to give you more control over this feature when it does get released in the service, we have added a new option under Report settings called “Persistent Filters” in this month’s release. The feature is on by default, but you can choose to disable it here.


Watch the following video to learn more about the persistent filters option:

From the new Consulting Services resource on the Help ribbon, you can explore a wide variety of curated fixed price, fixed scope, and fixed duration consulting services from our vibrant partner community. These affordable packaged briefings, assessments, workshops, proof-of-concept, and implementation offerings can help you get started with Power BI, take your skills to the next level, or help set you on the path to solve difficult business problems. 

Watch the following video to learn more about the Consulting Services resource:

That’s all for this month! We hope that you enjoy these updates and continue sending us your feedback. Please don’t forget to vote for other features that you’d like to see in the Power BI Desktop.



Microsoft Power BI Blog | Microsoft Power BI

Power BI Service and Mobile January Feature Summary

The Power BI service and mobile team is thrilled to kick off the first feature summary blog for 2018! With a new year in full swing, we’ve been hard at work on features to make it easier for you to achieve more with your data. In January, we released some highly anticipated features in the Power BI service and mobile apps. Here’s a quick recap in case you missed it:

Dashboards and reports provide two distinct ways of presenting information in Power BI, each with its own advantages. Since we launched the product, dashboards have been the primary way to share content with others in the service. However, there has always been an overwhelming ask from our community and users to extend the same capability to reports – and that’s exactly what we delivered in January. Yes, you read that right – you can now share reports directly with others in Power BI!

The feature works just like dashboard sharing. Simply head to any report, look for the “Share” option in the top bar, and select it to launch into a familiar pane experience where you can select the recipients that you want to share the report with and choose to give them re-share permissions.


After you share a report, it will appear in the recipients’ “Shared with me” page, and they will receive an e-mail notification if you’ve selected that option.

In addition, we have also made it easy for you to quickly find and access Power BI content by allowing you to favorite reports, paralleling the functionality in existing dashboards. Just select the “Favorite” option with the star on the top bar of a report, and it should immediately appear in your Favorites content list. Learn more

It doesn’t stop there. You can also access reports that are shared with you on the go using your Power BI mobile app and even favorite them for quick access!

Starting in January, dataset owners can choose to use AAD OAuth credentials when connecting the Power BI service to Azure SQL databases. With this update, Power BI can leverage all the security features of AAD when connecting to Azure SQL – including multi-factor authentication, when it’s configured. The OAuth authentication method is supported when connecting for both import and DirectQuery. In addition, when connecting to these data sources via DirectQuery, the Power BI service can now be configured to connect to the data source with the end users’ credentials, allowing the service to respect the security settings configured at the data source level. Learn more

We’re pleased to announce that Power BI customers around the globe who enable auditing in their tenant will now have these audit logs stored in the same Office 365 region as their tenant. This region is determined for each tenant based on the country selected during initial signup for the first Microsoft service in the tenant, such as Power BI or Office 365. Specific audit log locations can be found in the Office 365 trust center by finding the Exchange Online locations based on your tenant country. Another benefit of this change is that the data will be available much sooner than it was previously, when customers would often not see events in their audit logs for up to 24 hours after they originally occurred. Now, these same events will be available within an hour or so, and often even sooner.

We are excited to announce the general availability of embedding interactive Power BI reports in SharePoint Online. This feature enables SharePoint authors to easily embed Power BI reports directly on their page without any code! Since the introduction of this feature almost a year ago, we have made significant improvements based on your feedback to ensure reliability and performance.


To learn how to use the Power BI web part, see Embed with report web part in SharePoint Online on the Power BI documentation site.

We are giving IT admins more control over whether users in their organization can view reports containing custom visuals. Please note that the setting cannot be limited to specific groups and applies to the entire organization. Learn more


Power BI has had seamless connectivity to Azure Analysis Services since it was first launched. However, we only supported live connections – until now. We are excited to announce that Power BI now supports imported data from Azure Analysis Services and for the scheduling of those imports to be refreshed.

We continued to make enhancements and shipped a new update for the Power BI On-premises data gateway in January. The update includes a beta release of the PowerShell Cmdlets to manage on-premises data gateway in the public PowerShell gallery, configuration that lets you control the number of mashup engine containers that can run in parallel, and a new version of the Mashup Engine. Try it out for yourself by installing the new gateway and continue to send us feedback on any new capabilities you’d like to see in the future.

We’ve continued to improve your mobile experience by now allowing you to directly navigate to links within custom visuals and open them on your favorite mobile browser.

Resources


Microsoft Power BI Blog | Microsoft Power BI

Multi-objective Optimization for Feature Selection: Part 3

In my previous posts (Part 1 and Part 2), we discussed why feature selection is a great technique for improving your models. By having the model analyze the important signals, we can focus on the right set of attributes for optimization. As a side effect, fewer attributes also mean that you can train your models faster, making them less complex and easier to understand. Finally, less complex models tend to have a lower risk of overfitting. This means that they are more robust when it comes to creating predictions for new data points.

We also discussed why a brute force approach to feature selection is not feasible for most data sets and tried multiple heuristics for overcoming the computation problem. We looked at evolutionary algorithms, which turned out to be fast enough for most data sets. And they have a higher likelihood of finding the optimal attribute subset.

So, we’ve found the solution then, right? Well, not quite yet.

Regularization for Feature Selection

So far, we have been optimizing for model accuracy alone. We know from regular machine learning methods that this is a bad idea. If we only look for the most accurate model on the training data it will lead to overfitting. Our models will perform worse on new data points as a result. Most people only think about overfitting when it comes to the model itself. We should be equally concerned about overfitting when we make any decision in data analysis.

If we decide to merge two data sets, take a sample, or filter down the data, we are making a modeling decision. In fact, a more sophisticated machine learning model could have made this decision on its own. We need to be careful about overfitting and validating all decisions, not just the model itself. This is the reason it does not make sense to separate data preparation from modeling. We need to do both in an integrated fashion, and validate them together.

It doesn’t matter if we do feature selection automatically or manually. Any selection becomes a part of the model. And as such it needs to be validated and controlled for overfitting. Learn more about this in our recent blog series on correct validation.

We need to perform regularization to control overfitting for feature selection, just like we do for any other machine learning method. The idea behind regularization is to penalize complexity when building models; the concept is truly at the core of statistical learning. The image below defines the regularized risk based on the empirical risk and the structural risk. The empirical risk is the error we make on the training data, which is simply the data we use for doing our feature selection. The structural risk is a measurement of complexity. In the case of an SVM, the structural risk is higher for a narrower margin; in the case of feature selection, it is simply the number of features. The more features, the higher the structural risk. We of course want to minimize both risks, error and complexity, at the same time.

[Image: regularized risk as a trade-off between empirical and structural risk]
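Written out, with symbol names of my own choosing (the original image may use different notation), the regularized risk is:

```latex
R_{reg}(f) = R_{emp}(f) + C \cdot R_{struct}(f)
```

Here R_emp(f) is the error on the training data, R_struct(f) measures complexity (for feature selection, the number of selected features), and C is the trade-off factor discussed next.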

Minimizing the number of features and maximizing the prediction accuracy are conflicting goals. Fewer features mean reduced model complexity; more features mean more accurate models. If we can’t minimize both risks at the same time, we need to define which one is more important, so that we can decide in cases where we must sacrifice one for the other.

When we make decisions dealing with conflicting objectives, we can introduce a trade-off factor. This is the factor “C” in the formula above. The problem is that we cannot determine C without running multiple experiments. What are the possible accuracy ranges we can reach with our models on the given data set? Do we need 10 or 100 features for this? We cannot define C without knowing those answers. We could use a hold-out data set for testing and then try to optimize for a good value of C, but that takes time.

Wouldn’t it be great if we didn’t have to deal with this additional parameter “C” at all? Imagine instead finding a complete range of potential solutions: some models optimized for high accuracy, others using as few attributes as possible, and of course some solutions covering the trade-off in between. Ideally, an automated algorithm would find all of them in a fast and feasible way.

The good news: this is exactly the key idea behind multi-objective optimization. We will see that we can adapt the evolutionary feature selection from our previous post. Once we do that, the optimization will deliver all good solutions rather than a single one.

Or… You just optimize for both simultaneously

We want to maximize the accuracy and minimize the number of features at the same time. Let’s begin by drawing this solution space so that we can compare different solutions. The result will be an image like the one below. We use the number of features on the x-axis and the achieved accuracy on the y-axis. Each point in this space is now representing a model using a specific feature set. The orange point on the left, for example, represents a model which uses only one feature. And the accuracy of this model is 60% (or “0.6” like in the image).

Is the point on the left now better or worse than the other ones? We do not know. Sometimes we prefer fewer features, which makes this a good solution. Sometimes we prefer more accurate models, where more features work better. One thing is for sure: we want to find solutions towards the top left corner of this chart. Those are the models that use as few features as possible while also being the most accurate. This means that we should prefer solutions in this corner over those towards the bottom right.

Let’s have a look at some examples to make this clearer. In the image below we added three blue points to the orange ones we already had. Are any of those points better than the orange ones? The blue point on the left has only one attribute, which is good. But we have a better model with one attribute: the orange point above it. Hence, we prefer the left orange solution over the left blue one. Something similar is true for the blue point on the right. We achieve 85% accuracy, but this is already possible with the solution using only 5 instead of 6 features. We would then prefer the less complex model over the right blue point. The blue point in the middle is even worse: it has lower accuracy and more features than necessary. We certainly would prefer any of the orange points over this blue one.

In short, the blue points are clearly inferior to the orange ones. We can say that the orange points dominate the blue ones. The image below shows a bigger set of points (blue) which are all dominated by the orange ones:
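The dominance relation described above can be written down in a few lines. This is my own sketch (the function name `dominates` and the `(num_features, accuracy)` encoding are not from the original post):

```python
def dominates(a, b):
    """Return True if solution a dominates solution b.

    Each solution is a (num_features, accuracy) pair. a dominates b
    when it is no worse in both objectives (fewer-or-equal features,
    higher-or-equal accuracy) and strictly better in at least one.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

# The orange one-feature model at 60% dominates a one-feature model at 55%,
# and a 5-feature model at 85% dominates a 6-feature model at the same accuracy.
print(dominates((1, 0.60), (1, 0.55)))  # True
print(dominates((5, 0.85), (6, 0.85)))  # True
print(dominates((1, 0.60), (5, 0.85)))  # False: neither dominates the other
```

Note that two points like (1, 0.60) and (5, 0.85) are mutually non-dominated: each is better on one objective.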

We will now transform those concepts into a new feature selection algorithm. Evolutionary algorithms are among the best when you optimize for multiple conflicting criteria. All we need is a new selection approach for our evolutionary feature selection. This new selection technique will simultaneously optimize for more accuracy and less features.

We call this approach non-dominated sorting selection. We can simply replace the single-objective tournament selection with the new one. The image below shows the idea of non-dominated sorting. We start with the first rank of points which are dominating the other points. Those solutions (feature sets in our case) make it into the population of the next generation. They are shown as transparent below. After removing those feature sets, we look for the next rank of dominating points (in solid orange below). We again add those points to the next population as well. We continue this until we reach the desired population size.

The result of such a non-dominated sorting selection is what we call a Pareto front. See the image below for an example. Those are the feature sets which dominate all others. Any solution taken from this Pareto front is equally good. Some are more accurate, some are less complex. They describe the trade-off between both conflicting objectives. We will find those solutions without the need of defining a trade-off factor beforehand.
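To make the selection step concrete, here is a small Python sketch of non-dominated sorting. This is my own illustration of the technique, not RapidMiner's implementation, and it uses the same hypothetical `(num_features, accuracy)` encoding as before:

```python
def dominates(a, b):
    # a dominates b: no worse in both objectives, strictly better in one
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def non_dominated_sort(solutions):
    """Group (num_features, accuracy) pairs into ranked Pareto fronts.

    Rank 0 is the Pareto front: solutions no other solution dominates.
    Each later rank is the front of what remains after removing all
    earlier ranks, mirroring the selection step described above.
    """
    remaining = list(solutions)
    fronts = []
    while remaining:
        front = [s for s in remaining
                 if not any(dominates(o, s) for o in remaining)]
        fronts.append(front)
        remaining = [s for s in remaining if s not in front]
    return fronts

candidates = [(1, 0.60), (3, 0.70), (5, 0.85), (1, 0.55), (6, 0.85), (4, 0.65)]
fronts = non_dominated_sort(candidates)
print(fronts[0])  # the Pareto front: [(1, 0.6), (3, 0.7), (5, 0.85)]
```

In an evolutionary feature selection, the population of the next generation would be filled front by front from `fronts[0]` onward until the desired population size is reached.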

[Image: example of a Pareto front]

Multi-Objective Feature Selection in Practice

This is one of the things which makes multi-objective optimization so great for feature selection. We can find all potentially good solutions without defining a trade-off factor. Even better, we can find all those solutions with a single optimization run, so it is also a very fast approach. We can inspect the solutions on such a Pareto front, and by doing so we can learn from the interactions of features. Some features might be very important in smaller feature sets, but become less important in larger ones. It can happen that interactions of other features become stronger predictors instead. Those are additional insights we can get from the Pareto front. We can see at a glance what the achievable accuracy range is and which range of feature set sizes we should focus on. Do we need to consider 10 to 20 features, or between 100 and 200? This is valuable information for building models.

Let’s now run such a multi-objective optimization for feature selection. Luckily, we do not need to code all those algorithms. In RapidMiner, we just need to make two little adaptations in the visual workflow. First, we have to change the selection scheme from tournament selection to non-dominated sorting. This is a parameter of the regular evolutionary feature selection operator. Second, we need to add a second performance criterion besides the accuracy: the number of features. Although not necessary, I have also defined some parameters to add some visual output. This will show the movement of the Pareto front during the optimization, and it will also display the details of all feature sets on the Pareto front at the end.

Results

We are going to use the Sonar data set. As a reminder, the attributes represent bands in a frequency spectrum. And the goal is to classify if an object is a rock or a mine. The basic setup is the same. We have now applied the discussed changes to turn this into a multi-objective feature selection. The image below shows the resulting Pareto front of the feature selection:

[Image: Pareto front for the Sonar data set]

The resulting Pareto front has 9 different feature sets. They span a range between 1 and 9 attributes, and the accuracy range we can achieve is between 73% and 82%. We show the attribute count as a negative number, simply because RapidMiner always tries to maximize all objectives. Minimizing the number of attributes and maximizing their negative count are equivalent. This also means that this Pareto front moves towards the top right corner, not the top left like we discussed before.

Here is the full trade-off between complexity and accuracy. We can see that it does not make sense to use more than 9 features. And we should not accept less than 73% accuracy, since that result can already be achieved with one single feature.

It is also interesting to look further into the details of the resulting attribute sets. If we configure the workflow, we end up with a table showing the details of the solutions:

We again see the range of attributes (between 1 and 9) and accuracies (between 73% and 82%). We can gain some additional insights if we look into the actual attribute sets. If we use only one single feature, it should be attribute_12. But if we use two features, attribute_12 is inferior to the combination of attribute_11 and attribute_36. This is another indicator of why hill-climbing heuristics like forward selection have such a hard time.

The next attribute we should add is attribute_6. But we should drop it again for the attribute sets with 4 and 5 features in favor of other combinations. This attribute becomes interesting only for the larger sets again.

Finally, we can see that the largest attribute sets consisting of 8 or 9 attributes cover all relevant areas of the frequency spectrum. See the previous post for more details on this.

It is these insights which make multi-objective feature selection the go-to-method for this problem. Seeing good ranges for attribute set sizes or the interactions between features allow us to build better models. But there is more! We can use multi-objective feature selection for unsupervised learning methods like clustering. Stay tuned, we discuss this in the next blog post.

RapidMiner Processes

You can download RapidMiner here. Download the processes below to build this machine learning model yourself in RapidMiner.

Download the zip-file and extract its contents. The result will be an .rmp file which can be loaded into RapidMiner via “File” -> “Import Process”.


RapidMiner

Power BI Desktop January Feature Summary

Happy new year to everyone! We are kicking off the new year with an update to Power BI Desktop focused on incremental improvements to popular features you are already using, including automatic date hierarchy, data label and axis formatting, and our relative date slicer. The ability to hide pages is another big update that gives you much more flexibility over how users consume your reports.


Don’t forget you can now get Power BI Desktop from the Windows Store if you’re on Windows 10!


Here’s the complete list of January updates:

Reporting

Analytics

Custom Visuals

Data connectivity

Other

For a summary of the major updates, you can watch the following video:

While creating your reports, you may have needed to hide a page or two, perhaps because the work on them was still in progress while you wanted users to have access to the other parts of the report. Hiding pages is also useful when you start using drillthrough or creating your own navigation through bookmarking, and you don’t want your users to access a page through the normal page navigation.

Starting this month, you can hide any number of pages by right-clicking the page name and selecting Hide.


While you are editing the report, you’ll still see the pages, but they will look different to indicate they are hidden. You won’t be able to see the pages at all when you move to reading mode.


Watch the following video to learn more about hiding pages:

You can now control the background color for data labels in Cartesian and map visuals. This is useful for stylistic changes and readability, especially when you have a situation where the data label is halfway over a data point. You’ll find the option under the Data labels card in the formatting pane.

Watch the following video to learn more about data label background colors:

If you have a chart where the axis labels are being cut off, previously your only option would be to increase the size of your chart.


Of course, that isn’t a great option for many of you as report page space is limited, so this month we are adding a maximum size control for the axis to the formatting pane. This new slider can be used to increase the percentage of the chart used by the axis labels.


Watch the following video to learn more about axis size control:

Another new axis formatting option we’ve added is the ability to control the inner padding between bars. You can make the inner padding up to 50% of the bar’s width.


Or all the way down to zero to get a histogram effect.


Watch the following video to learn more about padding control:

For a long time now, we’ve been automatically creating a date hierarchy for you whenever you are using our internal date table. We haven’t been showing that hierarchy in the field list though. Instead we have just created it automatically for you when you drag your data column into a visual. With this latest update you can now see the entire hierarchy in the field list. You’ll need to turn on the preview feature switch under File > Options and Settings > Options > Preview features.


This is useful not only because we are being more transparent about automatically generating this hierarchy, but because you can now use individual fields from the hierarchy in your visuals just by dragging or checking the specific level you are interested in. Previously, to do this you would have needed to put the entire hierarchy into the visual and then remove the levels you didn’t want.

Watch the following video to learn more about date hierarchies:

When using the relative date slicer, you may want to start further back in time than today. This is very common whenever your data is complete until a few days later when all sales are finalized. You can now do this with the relative date slicer by setting an anchor date. Once you have an anchor date set, the slicer will be evaluated relative to that specific date. You can set it in the date range options in the formatting pane.


Watch the following video to learn more about anchor dates:

After bringing Q&A to Power BI Desktop in December, we’re continuing to improve the natural language engine itself. You can now ask top-n type questions such as “What are my top 3 products by sales?”

Or “top manufacturer by NSAT”

Both top and bottom work, so try it out on your data!

This quick measure was suggested by Daniil Maslyuk in the quick measures gallery. It calculates the Pearson correlation coefficient between two measures within a certain category – a really common statistical measure of how related two values are.
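The quick measure itself is written in DAX; as a rough illustration of the statistic it computes, here is the Pearson coefficient in plain Python (my own sketch, not the gallery code):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series.

    Ranges from -1 (perfect negative linear relationship) through 0
    (no linear relationship) to +1 (perfect positive relationship).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term divided by the product of the standard deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

print(pearson([1, 2, 3], [2, 4, 6]))   # close to 1.0: perfectly correlated
print(pearson([1, 2, 3], [6, 4, 2]))   # close to -1.0: anti-correlated
```

In the quick measure, this coefficient is evaluated per value of the chosen category, with each category member contributing one (x, y) pair from the two measures.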

With the new PowerApps custom visual, you aren’t only able to quickly find insights in your data, you can take action on them as well. PowerApps makes it easy to build automated workflows, and with the new custom visual, that experience is embedded right in your report. If you want to learn more about this visual, make sure to read our dedicated blog post or check out the custom visual on AppSource.

You can also watch the following video to learn more about the PowerApps custom visual:

The TreeViz custom visual by AutoCont CZ a.s. lets you display your hierarchical data in a tree-like structure. You can expand and collapse each level and decide how many child data points to show before grouping the smaller ones into a single child group.


You can check out this custom visual on AppSource.

Watch the following video to learn more about the TreeViz custom visual:

The Funnel with Source custom visual is perfect for tracking any metric of interest over various stages, along with the source through which each data point entered the funnel. For example, in the case of sales journey data, this visual displays the channel through which any lead enters the funnel and then the corresponding journey through the various stages of the sales cycle. You can also filter data based on the selection of a channel and on the selection of any stage of the sales journey.


You can find this visual on AppSource.

Watch the following video to learn more about the Funnel with Source custom visual:

Box and whisker charts are most commonly used in statistical analysis to see the spread of your data. They consist of two parts—the main body called the box and the thin vertical lines coming out of the box called whiskers. The first quartile forms the bottom and the third quartile forms the top of the box. The whiskers connect the minimum and the maximum values to the box. In addition to showing median, first and third quartile, and maximum and minimum values, the Box and Whisker chart by MAQ Software is also used to depict mean, standard deviation and quartile deviation.
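The statistics such a chart encodes can be computed directly. This is a hedged Python sketch using simple linear interpolation for the quartiles; real implementations, including this visual, may use slightly different quantile conventions or whisker rules:

```python
def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    In a box-and-whisker chart the box spans Q1 to Q3, the line inside
    the box marks the median, and the whiskers reach the min and max.
    """
    s = sorted(values)

    def quantile(p):
        # Linear interpolation between the closest ranks.
        k = (len(s) - 1) * p
        lo = int(k)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (k - lo)

    return s[0], quantile(0.25), quantile(0.5), quantile(0.75), s[-1]

print(five_number_summary([7, 15, 36, 39, 40, 41]))
```

The mean, standard deviation, and quartile deviation the visual also supports are straightforward additions on top of the same sorted data.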


Make sure to check out this visual on AppSource.

Watch the following video to learn more about the Box and Whisker chart custom visual:

The Agility Planning Matrix Light by Managility is a matrix visual for planning and budgeting. Some benefits of this matrix visual are the ability to selectively drill down on specific rows and to display the currently applied filters on the visual itself.


Check out this custom visual on AppSource.

Watch the following video to learn more about the Agility Planning Matrix Light custom visual:

The Image Grid custom visual by Fredrik Hedenström is like a word cloud for images. You can display many images and choose whether their size is based on the sort order or on the value of a measure. You can also decide to show them in a grid view instead, with the top image determined by the sort order.


Get this custom visual on AppSource, and watch the following video to learn more about the Image Grid custom visual:

The Gantt Chart by MAQ Software visual is used for scheduling and management of tasks. It shows additional details by providing a grid where you can view more data related to each task and display the hierarchy of data categories. The visual can also be sorted on any data point of a task.


You can download this custom visual on AppSource.

Watch the following video to learn more about the Gantt Chart custom visual:

One of the most frequent requests from customers using the Azure SQL Database and Data Warehouse connectors has been being able to leverage Azure Active Directory authentication in order to connect to these Azure services within Power BI.

With this month’s update for Power BI Desktop, we’re introducing a new “Microsoft Account” option within the Credentials dialog for the Azure SQL Database and Data Warehouse connectors, allowing users to authenticate using their AAD accounts.

Note that support for this new authentication type in the Power BI service will come in a few weeks. Please stay tuned to this blog for updates on this capability.


If you’re using the Windows store app, you can now specify the display language instead of defaulting to the Windows display language. You can also specify the model language separately from the display language. This is useful if you work for a global company and people from all over the world are consuming and creating reports in languages other than the default language the reports were created in.

You’ll see these new options under the Regional Settings tab in the Options dialog. Changing the application language will ignore the Windows default display language and switch to the chosen language. Changing the model language will decouple the model language from the application language for any newly created reports. (Note that this won’t affect the model language for existing reports.) Changing either of these settings requires a restart.


That’s all for this month! We hope that you enjoy these updates and continue sending us your feedback. Please don’t forget to vote for other features that you’d like to see in the Power BI Desktop.


Microsoft Power BI Blog | Microsoft Power BI

Congratulations to the Winners of the “Bookmarking Feature” Contest!

You may remember that the “Bookmarking Feature” contest launched a few weeks ago to showcase the recently updated bookmarking feature in Power BI Desktop.

We had so many amazing submissions that showcased your creativity as well as the functionality of the bookmark feature. Power BI Program Managers Will Thompson and Amanda Cofsky reviewed each entry and, with great difficulty, have narrowed down the winners!

Bookmark Feature Winners

Our grand prize winner is The Power BI Whisky Experience by Ryan Bentham. Judges had this to say about the report: “This is a well-designed report with a fun topic. It makes great use of bookmarking in several ways to create an app-like experience. It’s a lot of fun to change all the parameters and see what kind of whiskeys come up!”

Our first runner up is Team Performance vs. Target by Pawel Jaszczynski. It should be noted that this report was the best business-focused report that was submitted. Our judges also note that “this report is a very beautiful and well-organized report. It also takes great advantage of both drill-through and bookmarking to create a great example of a report that could be used daily in a sales department.”

The next runner up is Power BI Quest by David Eldersveld; judges’ note: “Seeing this entry was very exciting for us! When we first started working on the bookmarking feature, one of the first things we thought was how fun it would be to use the feature to create a ‘Choose your own adventure’ style game, so it was very cool to see someone actually do that!”

Thank you so much for your awesome participation! If you missed this contest, check out the “Year in Review” Contest!


Microsoft Power BI Blog | Microsoft Power BI

Multi-objective Optimization for Feature Selection: Part 3

In my previous posts (Part 1 and Part 2), we discussed why feature selection is a great technique for improving your models. By having the model analyze the important signals, we can focus on the right set of attributes for optimization. As a side effect, less attributes also mean that you can train your models faster, making them less complex and easier to understand. Finally, less complex models tend to have a lower risk of overfitting. This means that they are more robust when it comes to creating the predictions for new data points.

We also discussed why a brute force approach to feature selection is not feasible for most data sets and tried multiple heuristics for overcoming the computation problem. We looked at evolutionary algorithms which turned out to be fast enough for most data sets. And they have a higher likelihood to find the optimal attribute subset.

So, we’ve found the solution then, right? Well, not quite yet.

Regularization for Feature Selection

So far, we have been optimizing for model accuracy alone. We know from regular machine learning methods that this is a bad idea. If we only look for the most accurate model on the training data it will lead to overfitting. Our models will perform worse on new data points as a result. Most people only think about overfitting when it comes to the model itself. We should be equally concerned about overfitting when we make any decision in data analysis.

If we decide to merge two data sets, take a sample, or filter down the data, we are making a modeling decision. In fact, a more sophisticated machine learning model could have made this decision on its own. We need to be careful about overfitting and validating all decisions, not just the model itself. This is the reason it does not make sense to separate data preparation from modeling. We need to do both in an integrated fashion, and validate them together.

It doesn’t matter if we do feature selection automatically or manually. Any selection becomes a part of the model. And as such it needs to be validated and controlled for overfitting. Learn more about this in our recent blog series on correct validation.

We need to perform regularization to control overfitting for feature selection, just like we do for any other machine learning method. The idea behind regularization is to penalize complexity when you build models. The concept of regularization is truly at the core of statistical learning. The image below defines regularized risk based on the empirical risk and the structural risk. The empirical risk is the error we make on the training data. Which is simply the data we use for doing our feature selection. And the structural risk is a measurement of complexity. In case of an SVM the structural risk would be a low width of the margin. In case of feature selection it is simply the number of features. The more features, the higher the structural risk. We of course want to minimize both risks, error and complexity, at the same time.

trade off factor Multi objective Optimization for Feature Selection: Part 3

Minimizing the number of features and maximizing the prediction accuracy are conflicting goals. Less features means reduced model complexity. More features mean more accurate models. If we can’t minimize both risks at the same time, we need to define which one is more important. This is necessary so we can decide in cases where we need to sacrifice one for the other.

When we make decisions dealing with conflicting objectives, we can introduce a trade-off factor. This is the factor “C” in the formula above. The problem is that we cannot determine C without running multiple experiments. What are the possible accuracy ranges we can reach with our models on the given data set? Do we need 10 or 100 features for this? We can define C without knowing those answers. We could use hold-out data set for testing and then try to optimize for a good value of C, but that takes time.

Wouldn’t it be great if we didn’t have to deal with this additional parameter “C”? Finding a complete range of potential solutions. Some models would be great for lots of accuracy while others would use as little attributes as possible. And then of course some solutions for the trade-off in between. At this point, we also want to use an automated algorithm in a fast and feasible way.

The good news: this is exactly the key idea behind multi-objective optimization. We will see that we can adapt the evolutionary feature selection from our previous post. Once we do that, our model will deliver all good results rather than a single solution.

Or… You just optimize for both simultaneously

We want to maximize the accuracy and minimize the number of features at the same time. Let’s begin by drawing this solution space so that we can compare different solutions. The result will be an image like the one below. We use the number of features on the x-axis and the achieved accuracy on the y-axis. Each point in this space is now representing a model using a specific feature set. The orange point on the left, for example, represents a model which uses only one feature. And the accuracy of this model is 60% (or “0.6” like in the image).

Is the point on the left now better or worse than the other ones? We do not know. Sometimes we prefer less features which makes this a good solution. Sometimes we prefer more accurate models where more features work better. One thing is for sure, we want to find solutions towards the top left corner in this chart. Those are the models run with as little features as possible, and are also the most accurate. This means that we should prefer solutions in this corner over those more towards the bottom right.

Let’s have a look at some examples to make this clearer. In the image below we added three blue points to the orange ones we already had. Are any of those points better than the orange ones? The blue point on the left has only one attribute, which is good. But we have a better model with one attribute: the orange point on top of it. Hence, we prefer the left orange solution over the left blue one. Something similar is true for the blue point on the right. We achieve 85% accuracy, but this is already possible with the solution using only 5 instead of 6 features. We would prefer the less complex model over the right blue point then. The blue point in the middle is even worse: it has less accuracy and more features than necessary. We certainly would prefer any of the orange points over this blue one.

In short, the blue points are clearly inferior to the orange ones. We can say that the orange points dominate the blue ones. The image below shows a bigger set of points (blue) which are all dominated by the orange ones:
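
This dominance relation is easy to state in code. Below is a minimal sketch in plain Python (not taken from any particular library) that treats each solution as a hypothetical (number of features, accuracy) pair: a solution dominates another if it is no worse in both objectives and strictly better in at least one.

```python
def dominates(a, b):
    """True if solution a dominates solution b.

    Each solution is a (num_features, accuracy) pair: we want
    fewer features and higher accuracy.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

orange_left = (1, 0.60)  # one feature, 60% accuracy
blue_left = (1, 0.55)    # one feature, but less accurate

print(dominates(orange_left, blue_left))  # True
print(dominates(blue_left, orange_left))  # False
```

Note that two points where each is better in one objective, say (1, 0.60) and (5, 0.85), do not dominate each other; both may end up on the Pareto front.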

We will now transform those concepts into a new feature selection algorithm. Evolutionary algorithms are among the best when you optimize for multiple conflicting criteria. All we need is a new selection approach for our evolutionary feature selection. This new selection technique will simultaneously optimize for more accuracy and less features.

We call this approach non-dominated sorting selection. We can simply replace the single-objective tournament selection with the new one. The image below shows the idea of non-dominated sorting. We start with the first rank of points which are dominating the other points. Those solutions (feature sets in our case) make it into the population of the next generation. They are shown as transparent below. After removing those feature sets, we look for the next rank of dominating points (in solid orange below). We again add those points to the next population as well. We continue this until we reach the desired population size.
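
A minimal sketch of this ranking, in plain Python rather than RapidMiner's actual implementation, repeatedly peels off the current non-dominated front until the next population is full. The (number of features, accuracy) pairs are made-up example values:

```python
def dominates(a, b):
    # a dominates b: no worse in both objectives, strictly better in one
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def non_dominated_sort(solutions, target_size):
    """Select solutions front by front until target_size is reached."""
    remaining = list(solutions)
    selected = []
    while remaining and len(selected) < target_size:
        # current front: solutions not dominated by any other remaining one
        front = [s for s in remaining
                 if not any(dominates(o, s) for o in remaining if o is not s)]
        selected.extend(front)
        remaining = [s for s in remaining if s not in front]
    return selected[:target_size]

# Hypothetical (num_features, accuracy) pairs
pop = [(1, 0.60), (2, 0.68), (3, 0.70), (2, 0.55), (4, 0.65)]
print(non_dominated_sort(pop, 3))  # first front: [(1, 0.6), (2, 0.68), (3, 0.7)]
```

Here the dominated points (2, 0.55) and (4, 0.65) are only selected if the desired population size is larger than the first front.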

The result of such a non-dominated sorting selection is what we call a Pareto front. See the image below for an example. Those are the feature sets which dominate all others. Any solution taken from this Pareto front is equally good. Some are more accurate, some are less complex. They describe the trade-off between both conflicting objectives. We will find those solutions without needing to define a trade-off factor beforehand.

[Image: an example Pareto front]

Multi-Objective Feature Selection in Practice

This is one of the things which makes multi-objective optimization so great for feature selection. We can find all potentially good solutions without defining a trade-off factor. Even better, we can find all those solutions in a single optimization run, so it is also a very fast approach. We can inspect the solutions on such a Pareto front and learn from the interactions of features. Some features might be very important in smaller feature sets, but become less important in larger ones, where interactions of other features become stronger predictors instead. Those are additional insights we can get from the Pareto front. We can see at a glance what the achievable accuracy range is and which range of feature set sizes we should focus on. Do we need to consider 10 to 20 features or between 100 and 200? This is valuable information when building models.

Let’s now run such a multi-objective optimization for feature selection. Luckily we do not need to code all those algorithms. In RapidMiner, we just need to make two little adaptations in the visual workflow. First, we have to change the selection scheme from tournament selection to non-dominated sorting. This is a parameter of the regular evolutionary feature selection operator. Second, we need to add a second performance criterion besides the accuracy: the number of features. Although not necessary, I have also defined some parameters to add some visual output. This will show the movement of the Pareto front during the optimization, and it will also display the details of all feature sets on the Pareto front at the end.

Results

We are going to use the Sonar data set. As a reminder, the attributes represent bands in a frequency spectrum. And the goal is to classify if an object is a rock or a mine. The basic setup is the same. We have now applied the discussed changes to turn this into a multi-objective feature selection. The image below shows the resulting Pareto front of the feature selection:

[Image: Pareto front for the Sonar data set]

The resulting Pareto front has 9 different feature sets. They span a range between 1 and 9 attributes, and the accuracy range we can achieve is between 73% and 82%. We show the attribute count as a negative number, simply because RapidMiner always tries to maximize all objectives. Minimizing the number of attributes and maximizing the negative count are the same thing. This also means that this Pareto front will move to the top right corner, not to the top left like we discussed before.

Here is the full trade-off between complexity and accuracy. We can see that it does not make sense to use more than 9 features. And we should not accept less than 73% accuracy, since that result can already be achieved with one single feature.

It is also interesting to look further into the details of the resulting attribute sets. If we configure the workflow, we end up with a table showing the details of the solutions:

We again see the range of attributes (between 1 and 9) and accuracies (between 73% and 82%). We can gain some additional insights if we look into the actual attribute sets. If we use only one single feature, it should be attribute_12. But if we use two features, then attribute_12 is inferior to the combination of attribute_11 and attribute_36. This is another indicator of why hill-climbing heuristics like forward selection have such a hard time.

The next attribute we should add is attribute_6. But we should drop it again for the attribute sets with 4 and 5 features in favor of other combinations. This attribute becomes interesting only for the larger sets again.

Finally, we can see that the largest attribute sets consisting of 8 or 9 attributes cover all relevant areas of the frequency spectrum. See the previous post for more details on this.

It is these insights which make multi-objective feature selection the go-to method for this problem. Seeing good ranges for attribute set sizes or the interactions between features allows us to build better models. But there is more! We can use multi-objective feature selection for unsupervised learning methods like clustering. Stay tuned; we will discuss this in the next blog post.

RapidMiner Processes

You can download RapidMiner here. Download the processes below to build this machine learning model yourself in RapidMiner.

Download the zip-file and extract its contents. The result will be an .rmp file which can be loaded into RapidMiner via “File” -> “Import Process”.

RapidMiner

Power BI Desktop December Feature Summary

This month we are very excited to integrate our Q&A experience into the reporting canvas, offering a whole new way to create your visuals. We are also updating our bookmarking preview with new features that provide you much more flexibility. A highly requested feature, the ability to drill to filter other visuals on the page, is also coming out this month. You can take advantage of several new connectors and major updates to existing ones.

Don’t forget you can now get Power BI Desktop from the Windows Store if you’re on Windows 10!

Here’s the complete list of December updates:

Reporting

Analytics

Data connectivity

For a summary of the major updates, you can watch the following video:

Our goal is to answer questions about your data as quickly and easily as possible, and our Q&A feature can make that easier than ever. For a while now, Q&A has let you ask questions about your data on your dashboards using natural language and get back a visual that answers the question, which you could then pin to the dashboard. This month we are also unlocking this powerful feature on your reports. Double-clicking a blank area of your report page or clicking the Ask a question button on the ribbon opens a question box where you can type your question.

As you type, a visual is created in real time that answers the question, and it updates as you modify or add to your question.

Once your visual is made, you can use it the exact same way as a normal Power BI visual, changing fields around and formatting it to match your report’s style.

With Q&A now built into the report experience, you can get answers to your questions just by typing them. It also lets you quickly create an entire report without needing to use the field well or list. Q&A is currently a preview feature, so you will need to enable it through our Options dialog the first time you want to use it.

Watch the following video to learn more about Q&A integrated into the report canvas:

We have two exciting updates for our bookmarking preview this month, the first of which is the ability to bookmark your cross-highlighted state in addition to your filters, slicers, and other bookmarkable features. Cross-highlighting is an important part of any analysis and can often reveal interesting insights. Now that you can bookmark cross-highlighting, you can easily share these insights with everyone who uses your reports.

Some of the early feedback we received for bookmarking was that you needed more flexibility in what is bookmarked. You wanted to be able to control the types of properties that were bookmarked and what visuals on a page were affected by the bookmark. This month, we’re adding two new sets of controls to let you do just that.

The first lets you pick whether the bookmark will apply the “data” properties such as filters and slicers, the “visual” properties such as spotlight and visibility, and/or the page change that moves users to the page that was visible when the bookmark was added. You can mix and match any of these groups of properties to bookmark just what you need. A common use case for this feature is when you are using bookmarking to switch between visual types, as I showed in the October blog post. When you are doing this, you likely want to turn the data properties off so that filters aren’t reset as users change visual types.

The second lets you pick what visuals the bookmark is applied to. By default, bookmarks apply to the entire page. However, if you want the bookmark to apply to only part of the page, for example because you divided your page in half to do a side by side comparison, you can now do that.

You’ll set up the bookmark how you want it to look and then multi-select the visuals you want included in the bookmark. Then create or update your bookmark while the visuals are still selected. You can now change the bookmark through the bookmark’s menu to Selected visuals instead of All visuals, and it will only apply the bookmark to those visuals you selected. As an example, in the image below, since I only selected the four visuals on the right and not the table and other visuals on the left, the bookmark will only apply to the visuals on the right half of my page.

Watch the following video to learn more about these bookmarking enhancements:

The field properties pane allows you to change the name and description of a field from your fields list. You can access this pane by selecting Properties from the field menu.

In the pane, you’ll be able to rename the field and give it a description.

Once you give a field a description, you can see it anytime by hovering over the field in the list.

If you have multiple people editing your reports, this is a great way for everyone to understand what fields they should use and how measures are being calculated.

The description is also shown if you’re using a connection to an external Analysis Services model. This means you can see this information from centralized IT-developed models if the developer set the description there as well.

Watch the following video to learn more about the field properties pane:

Until now, the way to create a scatter visual was to use two measures, one for each axis, and a categorical value in the details bucket to group on. Now, you don’t need to add a category to the details bucket. We will instead plot all the x and y pairs that you have in your model. To create your scatter visual this way, just pick Don’t summarize as the aggregation type. As a reminder, Don’t summarize is only an option for numeric columns; this won’t work for measures.

Watch the following video to learn more about this scatter chart improvement:

This month we are extending our high density sampling algorithm, which we already support in scatter and line visuals, to maps using latitude and longitude. This new algorithm will better preserve the shape of your data while also surfacing outliers. To do this, we prioritize showing points that aren’t going to be hidden by neighboring points. With this change, you should see a noticeably improved experience with any map visuals using latitude and longitude that exceed the data point limit. This sampling algorithm allows you to represent a large number of data points without sacrificing performance and load time.

If you would like to use the previous sampling algorithm or test out the improvements, you can turn off High Density Sampling in the General card of the formatting pane. We will continue to invest in this area and improve other visuals as well, so be sure to give us feedback on the new experience!

Responsive slicers are the latest update to our responsive visuals work. With responsive slicers, you can resize them to a variety of different sizes and shapes, from horizontal to square to vertical, and the values in the slicer rearrange themselves as you do. You can make both horizontal slicers and date and range slicers responsive. Both the date and range slicers also have the added benefit of improved touch areas, so it’s easier to change them with a fingertip. As with all our responsive visuals, the benefits are best seen when you use them on the Power BI mobile apps as well, since they resize automatically to fit well on all platforms. You can read our documentation to learn more.

Card Browser

The Card Browser visual is a document set viewer featuring flippable, double-sided thumbnails for natural navigation of media collections.

The Preview face of each card renders the headline image, title, and origin of the story with a text sample, enabling rapid discovery of documents of interest.

Flipping the cards reveals the MetaData face, which lists document properties. Clicking a card expands it in place for detailed reading.

Watch the following video to learn more about the Card Browser visual:

Table Heatmap

The Table Heatmap visual is a great way to compare data values in an easy and intuitive way. You can give it a category and all the measures you want to compare, and the visual will then color the matrix in a stepped color gradient. You can either pick your own start and end colors or choose from a list of predefined color palettes, and you can also set how many steps you want.

Watch the following video to learn more about the Table Heatmap visual:

Data Image by CloudScope

The Data Image visual by CloudScope is a great visual for displaying dynamic web images. You can choose to crop the image into a circle and to add a border.

Watch the following video to learn more about the Data Image visual by CloudScope:

Power KPI Matrix

The Power KPI Matrix visual enables balanced scorecards in Power BI and displays an unlimited number of metrics and KPIs in a single, customized list. Power KPI Matrix is similar to the Power KPI custom visual in supporting the creation and display of KPI indicator symbols and values along with current and historical trends of actual and target values, but it has the added benefit of the tabular format that allows for an unlimited number of metrics and KPIs in a single matrix. This visual is very flexible and customizable with fine grain control of many formatting options.

Watch the following video to learn more about the Power KPI Matrix visual:

Text Filter

The Text Filter visual is a search box that filters all the visuals on your report page to the categories in a given field that contain the text you specify.

Watch the following video to learn more about the Text Filter visual:

ChartAccent – LineChart

The ChartAccent visual’s line chart is filled with rich annotation features, including highlighting individual data points, data series, points within a range, or even points above the average value. You have fine-grained control over the data label formatting, the ability to define ranges on the axis, and the ability to annotate the chart with shapes, text, and images. Each annotation is listed on the top right section of the pane, so you can see what you’ve done and easily make edits to or remove annotations.

ChartAccent – BarChart

The ChartAccent visual’s bar chart is just as flexible as the line chart version. Its goal is the same: to give you an easy way to highlight your data points through advanced formatting options.

Watch the following video to learn more about the ChartAccent visuals:

This month we are adding one of the most popular drill features: the ability for other visuals on your report to be filtered as you drill down in a visual. Now, as you drill down in a given category in a visual, you can have it also filter all the visuals on your page by that same category. This new option is on by default for all new reports, but you can turn it off (or turn it on for older reports) through the option in the Format tab.

Watch the following video to learn more about drill filtering other visuals:

This month we’re releasing a new connector for Adobe Analytics. This connector allows you to easily import and analyze your Adobe Analytics data within Power BI.

The connector is currently available as a Beta connector in Power BI Desktop. Note that end-to-end support for this connector in the Power BI Service will become available in a few months.

The new Adobe Analytics connector can be found in the Get Data dialog, under the Online Services category.

After selecting the Adobe Analytics connector, you can authenticate using your credentials.

Once the connection has been successfully established, you can preview and select multiple dimensions and measures within the Navigator dialog to create a single tabular output. As part of this selection, you can also provide any necessary input parameters for the selected items.

After selecting the desired information, you can decide to directly load your query into Power BI Desktop or perform any additional data transformations and filters within the Query Editor UX upfront, via the Edit option in this dialog.

This connector is in preview, so you will need to enable it through our Options dialog the first time you want to use it.

Another new connector in this month’s Power BI Desktop release is a connector for HDInsight Interactive Query. Interactive Query leverages Hive on LLAP in Apache Hive 2.1, bringing interactivity to complex, data-warehouse-style queries on large datasets stored in commodity cloud storage.

The new HDInsight Interactive Query connector can be found under the Azure tab in the Get Data dialog:

Upon selecting this connector, you can specify which HDInsight Interactive Server and Database to connect to, as well as the connectivity mode. This connector supports both Import and DirectQuery modes.

Once connection details have been provided, you can authenticate using user name/password, then navigate, preview, and select one or multiple tables from your database, just like you would do for other data sources.

In partnership with Data.World, we have shipped a new connector that allows you to import Data.World datasets into Power BI. The new Data.World connector can be found under the Online Services category in the Get Data dialog.

This connector requires you to provide an Owner and Dataset ID and, optionally, a query to run.

After specifying these fields, you will be able to preview the dataset and load it directly into Power BI Desktop or leverage the Query Editor to apply data transformations and filters before the data is loaded.

We’ve made a couple of significant improvements to the SAP BW connector in this release, including:

Support for connecting to an SAP BW Message Server

One common request from SAP BW connector users was being able to specify a Message Server rather than an Application Server when setting up the connection from Power BI. To achieve this, we have added a new SAP BW Message Server connector under the Database category in the Get Data dialog.

This new connector allows users to specify a Server, System ID, Client ID, and Logon group for the Message Server to connect to. Both Import and DirectQuery connectivity modes are supported.

Support for Windows Authentication

Windows credentials are now supported when connecting to either SAP BW Application or Message Servers. The extra Windows Authentication option can be found in the Credentials dialog after specifying a server to connect to. This new authentication type allows you to connect using the “current user” or specifying “alternate credentials,” as well as their SNC partner name. You can also specify which SNC library to use.

Please note that these two new features for SAP BW are only available in Power BI Desktop at this point. Support for them in the Power BI service, via the On-premises data gateway, will be added in the future.

We’re glad to announce that the IBM Netezza connector is now GA. This connector has been available as a Beta connector in Power BI Desktop since June and has been supported end-to-end in the Power BI service (via the On-premises data gateway) since August. We have received very positive feedback from users during this period, helping us get to this GA release.

The IBM Netezza connector can be found within the Get Data dialog, under the Database category. The connector allows you to import data or create DirectQuery-based reports on top of IBM Netezza databases.

That’s all for this month! We hope that you enjoy these updates and continue sending us your feedback. Please don’t forget to vote for other features that you’d like to see in the Power BI Desktop.

Microsoft Power BI Blog | Microsoft Power BI

Better Machine Learning Models with Multi-Objective Feature Selection: Part 1

The Basics of Feature Selection

Feature selection can greatly improve your machine learning models. In this blog series, I’ll outline all you need to know about feature selection. In Part 1 below I discuss why feature selection is important, and why it’s in fact a very hard problem to solve. I’ll detail some of the different approaches which are used to solve feature selection today.

Why should we care about Feature Selection?

There is a consensus that feature engineering often has a bigger impact on the quality of a model than the model type or its parameters. Feature selection is a key part of feature engineering. Moreover, kernel functions and hidden layers already perform implicit feature space transformations. So is feature selection still relevant in the age of support vector machines (SVMs) and Deep Learning? Yes, absolutely.

First, we can fool even the most complex model types. If we provide enough noise to overshadow the true patterns, it will be hard to find them. In those cases the model starts to use the noise patterns of the unnecessary features, and that means it does not perform well. It might even perform worse if it starts to overfit to those patterns and fails on new data points. This trap is even easier to fall into when the data has many dimensions. No model type is better than others in this regard; decision trees can fall into it just as well as multi-layer neural networks. Removing noisy features can help the model focus on the relevant patterns.

But there are other advantages of feature selection. If we reduce the number of features, models are generally trained much faster, and often the resulting model is simpler and easier to understand. We should always try to make the work easier for the model. If we focus on the features which carry the signal over those which are noise, we will have a more robust model.

Why is this a hard problem?

Let’s begin with an example. Let’s say we have a data set with 10 attributes (features, variables, columns) and one label (target, class). The label column is the one we want to predict. We’ve trained a model on this data and determined that the accuracy of the model built on all 10 attributes is 62%. Can we identify a subset of those 10 attributes where a trained model would be more accurate?

We can depict any subset of the 10 attributes as a bit vector, i.e. as a vector of 10 binary numbers, 0 or 1. A 0 means that the specific attribute is not used, and a 1 means that the attribute is used for this subset. If we want to indicate that we use all 10 attributes, we would use the vector (1 1 1 1 1 1 1 1 1 1). Feature selection is then the search for the bit vector that produces the optimal accuracy. One possible approach would be to try out all the possible combinations. Let’s start with using only a single attribute. The first bit vector looks like this:

As we can see, when we use the first attribute we come up with an accuracy of 68%. That’s already better than our accuracy with all attributes, 62%.  But can we improve this even more? Let’s try using only the second attribute:

Still better than using all 10 attributes, but not as good as only using the first.

We could continue to go through all possible subsets of size 1, but why should we stop there? We can also try out subsets of 2 attributes now:

Using the first two attributes immediately looks promising with 70% accuracy. We can collect all accuracies of these subsets until we have tried all of the possible combinations:

We call this a brute force approach.

How many combinations did we try for 10 attributes? We have two options for each attribute: we can decide to either use it or not. And we can make this decision for all 10 attributes, which results in 2 × 2 × 2 × … = 2¹⁰ = 1,024 different combinations. One of those combinations does not make any sense, though, namely the one which does not use any features at all. So, this means that we only need to try 2¹⁰ − 1 = 1,023 subsets. Even for a small data set, we can see there are a lot of attribute subsets. It is also helpful to keep in mind that we need to perform a model validation for every single one of those combinations. If we use a 10-fold cross-validation, we need to train 10,230 models. This is still doable for fast model types on fast machines.
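
The brute-force search can be sketched in a few lines of Python. The `evaluate` function below is a hypothetical placeholder; in practice it would train and cross-validate a model on the attribute subset encoded by the bit vector:

```python
from itertools import product

NUM_ATTRIBUTES = 10

def evaluate(mask):
    # Hypothetical placeholder: in practice this would train and
    # cross-validate a model on the attribute subset encoded by mask.
    return sum(mask) / NUM_ATTRIBUTES

best_mask, best_score = None, float("-inf")
count = 0
for mask in product([0, 1], repeat=NUM_ATTRIBUTES):
    if sum(mask) == 0:
        continue  # the empty subset makes no sense
    count += 1
    score = evaluate(mask)
    if score > best_score:
        best_mask, best_score = mask, score

print(count)  # 1023 = 2**10 - 1 non-empty subsets
```

With `repeat=100` the same loop would have to visit 2¹⁰⁰ − 1 bit vectors, which is why this approach breaks down on realistic data sets.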

But what about more realistic data sets? If we have 100 instead of only 10 attributes in our data set, we already have 2¹⁰⁰ − 1 combinations, bringing the number of combinations to 1,267,650,600,228,229,401,496,703,205,375. Even the largest computers can no longer go through all of them.

Heuristics to the Rescue!

Going through all possible attribute subsets is not a feasible approach, then. We should instead focus only on the combinations which are more likely to lead to accurate models, i.e. prune the search space and ignore feature sets which are not likely to produce good models. Of course, there is then no longer a guarantee that we will find the optimal solution: if we ignore complete areas of our solution space, we might also skip the optimal solution. But these heuristics are much faster than the brute-force approach, and often we end up with a good, and sometimes even the optimal, solution in much less time. There are two widely used heuristics for feature selection in machine learning: forward selection and backward elimination.

Forward Selection

The heuristic behind forward selection is very simple. We first try out all subsets with only one attribute and keep the best solution. But instead of trying all possible subsets with two features next, we only try specific 2-subsets: those which contain the best attribute from the previous round. If we do not improve, we stop and deliver the best result from before, i.e. the single attribute. But if we have improved the accuracy, we continue by keeping the best attributes so far and trying to add one more. We repeat this until we no longer improve.

What does this mean for the runtime of our example with 10 attributes from above? We start with the 10 subsets of only one attribute, which is 10 model evaluations. We then keep the best-performing attribute and try the 9 possible combinations with the other attributes, which is another 9 model evaluations. We stop if there is no improvement, or keep the best 2-subset if we get a better accuracy. We then try the 8 possible 3-subsets, and so on. So, instead of going brute force through all 1,023 possible subsets, we go through at most 10 + 9 + … + 1 = 55 subsets, and we often stop much earlier as soon as there is no further improvement. We will see below that this is often the case. This is an impressive reduction in runtime, and the difference becomes even more obvious for a case with 100 attributes. There, we will try at most 5,050 combinations instead of the 1,267,650,600,228,229,401,496,703,205,375 possible ones.
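
Forward selection can be sketched as follows, again in plain Python with a hypothetical `evaluate` function standing in for a full cross-validated model evaluation:

```python
def forward_selection(attributes, evaluate):
    """Greedy forward selection: add the best attribute each round."""
    selected, best_score = [], float("-inf")
    while True:
        candidates = [a for a in attributes if a not in selected]
        if not candidates:
            return selected, best_score
        # evaluate the current subset plus one extra attribute
        score, best_attr = max((evaluate(selected + [a]), a) for a in candidates)
        if score <= best_score:
            return selected, best_score  # no improvement: stop early
        selected.append(best_attr)
        best_score = score

# Made-up fitness: attributes 0 and 3 carry signal, every extra
# attribute costs a small complexity penalty.
def evaluate(subset):
    return 0.5 + 0.1 * (0 in subset) + 0.08 * (3 in subset) - 0.01 * len(subset)

sel, _ = forward_selection(list(range(10)), evaluate)
print(sel)  # [0, 3]
```

Note how greedy the search is: it never revisits an earlier choice, which is exactly what gets it stuck in local optima on real data.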

Backward Elimination

Things are similar with backward elimination; we just turn the direction around. We begin with the subset consisting of all attributes. Then we try to leave out one single attribute at a time. If we improve, we keep going, dropping the attribute whose removal led to the biggest improvement in accuracy. We then go through all possible combinations which leave out one more attribute, in addition to the ones we already left out. We continue doing this until we no longer improve. Again, for 10 attributes this means that we will evaluate at most 1 + 10 + 9 + 8 + … + 2 = 55 combinations.
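
A corresponding sketch of backward elimination, with another made-up `evaluate` function in which attributes 5 through 9 only contribute noise:

```python
def backward_elimination(attributes, evaluate):
    """Greedy backward elimination: drop the attribute whose removal helps most."""
    selected = list(attributes)
    best_score = evaluate(selected)
    while len(selected) > 1:
        # evaluate the current subset with one attribute left out
        score, worst_attr = max(
            (evaluate([a for a in selected if a != drop]), drop)
            for drop in selected
        )
        if score <= best_score:
            break  # leaving out another attribute no longer helps
        selected.remove(worst_attr)
        best_score = score
    return selected, best_score

# Made-up fitness: attributes 0-4 carry signal, 5-9 only add noise.
def evaluate(subset):
    signal = sum(0.05 for a in subset if a < 5)
    noise = sum(0.02 for a in subset if a >= 5)
    return 0.5 + signal - noise

sel, score = backward_elimination(list(range(10)), evaluate)
print(sel)  # [0, 1, 2, 3, 4]
```

On this toy fitness the noisy attributes are all eliminated, but on a multi-modal fitness landscape the same greedy loop can stop at a local optimum just like forward selection.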

Are we done? It looks like we have found heuristics which work much faster than the brute force approach. And in certain cases, these approaches will indeed deliver a very good attribute subset. The problem is that in most cases, they unfortunately will not. For most data sets, the model accuracies form a so-called multi-modal fitness landscape. This means that besides one global optimum there are several local optima. Both methods start somewhere on this fitness landscape and move from there. In the image below, we have marked such a starting point with a red dot. From there, we continue to add (or remove) attributes as long as the fitness improves. The algorithms always climb up the nearest hill in the multi-modal fitness landscape, and if this hill is a local optimum they get stuck there, since no further climbing is possible. These algorithms do not even look out for higher hills; they take whatever they can easily get, which is exactly why we call them "greedy" algorithms. And when they stop improving, there is only a very small likelihood that they made it to the top of the highest hill. It is much more likely that they missed the global optimum we are looking for, which means that the delivered feature subset is often a sub-optimal result.

Slow vs. Bad.  Anything better out there?

This is not good then, is it? We have one technique, the brute force approach, which would deliver the optimal result but is computationally not feasible; as we have seen, we cannot use it at all on realistic data sets. And we have two heuristics, forward selection and backward elimination, which deliver results much quicker but unfortunately run into the first local optimum they find. That means they will most likely not deliver the optimal result.

Don’t give up though – in our next post we will discuss another heuristic which is still feasible even for larger data sets, and which often delivers much better results than forward selection and backward elimination. This heuristic makes use of evolutionary algorithms.


RapidMiner