Tag Archives: analytics

FICO Receives Analytics 50 Award for FICO Score XD


Drexel University’s LeBow College of Business and CIO.com have named analytic software firm FICO a winner of the Analytics 50 Awards for the second year in a row. The awards program honors organizations using analytics to solve business challenges. FICO received the award for FICO® Score XD, which leverages groundbreaking analytic technologies and alternative data to help safely and responsibly expand credit access.

For more information check out the full award article.

Led by Radha Chandra, principal scientist in the Scores business unit at FICO, the FICO analytic development team posed the question: Can alternative data expand credit access?  After extensive research and validation, FICO launched FICO Score XD.  Through the development of FICO® Score XD, FICO provides a potential onramp to credit access for the majority of 50+ million Americans who are identified as ‘unscorable’.

In addition to traditional credit data, FICO® Score XD consumes alternative data from telco, cable, and other payment history, plus public record and property data.  One specific design goal of  FICO Score XD was to mirror the standards of the traditional FICO Score – same 300-850 score range, similar relationship between a given score and likelihood of repayment, and same characteristic logic and treatment.

FICO® Score XD uses positive and negative data from the National Consumer Telecom and Utilities Exchange, Inc. (NCTUE), including new connect requests, payment history, and current and historical account status, as well as data from LexisNexis Risk Solutions, including property ownership records, frequency of residential moves, plus bankruptcies, evictions, and liens. All data sources comply with the Fair Credit Reporting Act (FCRA), the federal law that regulates how consumer reporting agencies use data.

While FICO Scores based on traditional credit data remain the cornerstone of the FICO business, this initiative helps banks and other lenders expand their addressable market by leveraging scores that are built on models utilizing alternative data. Findings demonstrate that consumers with a high FICO® Score XD who go on to obtain credit maintain a high traditional FICO Score in the future – 49% scored 700 just 24 months later.

“FICO’s pioneering analytics are a key driver to our commitment to financial inclusion and provide lenders with a reliable, broad-based risk score that enables precise assessment of creditworthiness for consumers,” said Ethan Dornhelm, vice president, Scores and Analytics, FICO. “It is an honor that FICO – and Radha Chandra on our team – have been recognized by Drexel University for our work to help millions more people gain access to credit.”

FICO Score XD research and solutions have also paved the way for similar initiatives in Mexico, Russia, Turkey and now India. The latest solution – FICO Score X Data for India – was recently announced at Money 20/20.

The Analytics 50 is a collaboration between Drexel University’s LeBow College of Business and CIO.com to recognize companies using analytics to solve business challenges. Honorees were selected by a panel of researchers and practitioners based on the complexity of the business challenges they faced, the analytics solutions implemented, and the solutions’ business impact on the organization.


FICO

Periscope Data launches unified platform for data warehousing and analytics


Periscope Data is trying to make it easier for data teams to process large amounts of information from across their companies with a new cloud data warehouse service that the company announced today.

The Unified Data Platform is designed to provide a single source for companies to perform data processing, analytics, and visualization, which is important for generating insights from the information that’s tied up in business systems.

It’s supposed to solve a key problem: While companies have plenty of data about how their business is doing, it can be hard for analysts to reason over all of it, owing to a series of technical silos that makes collating and processing all that information difficult. Periscope Data’s new product helps fix that by bringing all of a company’s data into Amazon Web Services’ Redshift database.

Once the data is loaded in, teams can query the information using SQL, R, and Python. Periscope Data’s software is also able to provide visualizations in the form of both reports and dashboards. According to Periscope Data CEO Harry Glaser, this whole system is supposed to replace the existing approach to building data warehousing and analytics capabilities, which often requires chaining together multiple software packages.
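
To make the querying step concrete, here is a minimal sketch of what an analyst-side query against a Redshift-backed warehouse might look like in Python. The cluster endpoint, credentials, and table are hypothetical placeholders, and this is a generic pattern, not Periscope Data’s own SDK.

    # Minimal sketch: run SQL against a Redshift-backed warehouse from Python.
    # Redshift speaks the PostgreSQL wire protocol, so psycopg2 works as a driver.
    import pandas as pd
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
        port=5439,
        dbname="analytics",
        user="analyst",
        password="********",
    )

    # Ordinary SQL runs in the warehouse; the result lands in a DataFrame that
    # downstream Python (or R) analysis and dashboards can pick up from here.
    query = """
        SELECT signup_date::date AS day, COUNT(*) AS signups
        FROM users
        GROUP BY 1
        ORDER BY 1
    """
    df = pd.read_sql(query, conn)
    print(df.head())
    conn.close()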

“Generally speaking, if you’re a data team and you’re getting up and running, what you will do is assemble a platform out of various siloed products — some of them B2B products, some of them open source or consumer products — and then stitch them together with spit and duct tape,” he said in an interview with VentureBeat.

That causes a problem, though, since companies will then need to pay engineers to maintain that platform in order to keep their entire data processing regime working. Glaser said that the Unified Data Platform is designed to fix that.

In addition, having one unified platform means that it’s possible to see how changes to data affect an entire system, so companies can see what reports would be affected when a particular information source is shut off, for example.
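
The impact-analysis idea can be illustrated with a toy lineage graph. The table and report names below are hypothetical, and the structure is a generic sketch rather than Periscope Data’s metadata model.

    # Toy impact analysis over a lineage graph: which downstream artifacts
    # (models, reports, dashboards) depend on a given source table?
    from collections import deque

    lineage = {
        "crm.accounts":   ["model.arr", "report.sales_pipeline"],
        "billing.events": ["model.arr"],
        "model.arr":      ["report.board_deck", "dashboard.revenue"],
    }

    def impacted(source):
        """Return everything downstream of `source`, directly or indirectly."""
        seen, queue = set(), deque([source])
        while queue:
            for child in lineage.get(queue.popleft(), []):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

    # Shutting off the CRM feed would affect these artifacts (order may vary):
    print(impacted("crm.accounts"))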

Customers will get billed for the Unified Data Platform based on how many people use the software, as well as how much data they have stored within the data warehouse. When all is said and done, it will end up costing customers tens or hundreds of thousands of dollars per year, according to Glaser.

Glaser said that he could see a future in which customers would be able to select from different cloud platforms and different database engines in order to match the performance characteristics that they want. But that’s still a ways off. Right now, Periscope Data is quite happy working with Amazon, which was a partner for this launch.

What’s not on the table is an on-premises version of the Unified Data Platform. Glaser said that the company is remaining focused on serving only customers who want to have their information in the public cloud, since that’s where the market is headed, in his view.


Big Data – VentureBeat

Two Ways to Approach Federated Queries with U-SQL and Azure Data Lake Analytics

Did you know there are two ways to do federated queries with Azure Data Lake Analytics (ADLA)? By federated queries, I mean a query that combines (federates) data from multiple sources — in this case, from within Azure Data Lake and another data store. Federated queries are one aspect of data virtualization which helps us to access data without requiring the physical movement of data or data integration:

[Diagram: a federated query combining data in Azure Data Lake with data residing in another data store]

The two methods for federated queries with U-SQL and ADLA are:

  1. Schema-less (aka “lazy metadata”)
  2. With a pre-defined schema in an external table

You might be familiar with external tables in SQL Server, Azure SQL Data Warehouse, or APS. In those platforms, external tables work with PolyBase for purposes of querying data where it lives elsewhere, often for the purpose of loading it into the relational database. That same premise exists in Azure Data Lake Analytics as well. However, in the data lake there are two approaches – an external table is still a good idea most of the time, but it isn’t absolutely required.

Option 1: Schema-Less

Following are the components of making schema-less federated queries work in ADLA:

[Diagram: components of the schema-less (lazy metadata) approach in ADLA]

Pros of the schema-less option:

  • Access the data quickly for exploration without requiring an external table to be defined in the ADLA Catalog
  • More closely aligned to a schema-on-read paradigm because of its flexibility 
  • Query flexibility: can retrieve a subset of columns without having to define all the columns

Cons of the schema-less option:

  • Additional “burden” on the data analyst doing the ad hoc querying with U-SQL to always perform the schema-on-read within the query
  • Repeating the same schema-on-read syntax in numerous U-SQL queries, rather than reusing the definition via an external table — so if the source system table or view changes, it could involve altering numerous U-SQL scripts.
  • Requires a rowset in the U-SQL schema-on-read queries – i.e., you cannot do a direct join, so this approach involves slightly longer, more complex syntax

Option 2: With a Pre-Defined Schema in an External Table

The following introduces an external table to the picture in order to enforce a schema:

[Diagram: a federated query using a pre-defined schema via an external table in the ADLA Catalog]

Pros of using an external table:

  • Most efficient for the data analyst doing the ad hoc querying with U-SQL
  • Easier, shorter syntax on the query side because columns and data types have already been predefined in the ADLA Catalog, so a direct join to an external table can be used in the query without having to define a rowset
  • Only one external table to change if a modification does occur to the underlying SQL table

Cons of using an external table:

  • Schema must remain consistent – a downstream U-SQL query will error if a new column is added to the remote source and the external table has not been kept in sync
  • All remote columns must be defined in the external table (not necessarily a big con – but definitely important to know)

In summary, the schema-less approach is most appropriate for initial data exploration because of the freedom and flexibility. An external table is better suited for ongoing, routine queries in which the SQL side is stable and unchanging. Solutions which have been operationalized and promoted to production will typically warrant an external table. 
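
Since U-SQL syntax is beyond the scope of this summary, the contrast can be sketched with a loose pandas analogue: option 1 restates the column list in every ad hoc query (schema-on-read), while option 2 declares the schema once and reuses it, which is roughly the role the external table plays in the ADLA Catalog. File and column names are hypothetical.

    import pandas as pd

    # Option 1 analogue (schema-on-read): every query restates the schema inline.
    orders = pd.read_csv(
        "orders.csv",
        header=None,
        names=["order_id", "customer_id", "amount", "order_date"],
        parse_dates=["order_date"],
    )
    big_orders = orders[orders["amount"] > 1000]

    # Option 2 analogue (pre-defined schema): declare once, reuse everywhere.
    # If the source ever changes, only this one definition needs to be updated.
    ORDERS_SCHEMA = dict(
        header=None,
        names=["order_id", "customer_id", "amount", "order_date"],
        parse_dates=["order_date"],
    )

    def read_orders(path="orders.csv"):
        return pd.read_csv(path, **ORDERS_SCHEMA)

    orders = read_orders()
    big_orders = orders[orders["amount"] > 1000]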

Want to Know More?

During my all-day workshop, we set up each piece step by step, including the service principal, credential, data source, external table, and so forth, so you can see the whole thing in action. The next workshop is in Washington DC on December 8th. For more details and to register, check here: Presenting a New Training Class on Architecting a Data Lake

You Might Also Like…

Querying Multi-Structured JSON Files with U-SQL

Running U-SQL on a Schedule with Azure Data Factory to Populate Azure Data Lake

Handling Row Headers in U-SQL

Data Lake Use Cases and Planning Considerations


Blog – SQL Chick

Why Predictive Analytics Has Become Essential For Optimized FP&A


Let me start with a quote from McKinsey that, in my view, hits the nail right on the head:

“No matter what the context, there’s a strong possibility that blockchain will affect your business. The very big question is when.”

Now, in the industries that I cover in my role as general manager and innovation lead for travel and transportation/cargo, engineering, construction and operations, professional services, and media, I engage with many different digital leaders on a regular basis. We are having visionary conversations about the impact of digital technologies and digital transformation on business models and business processes and the way companies address them. Many topics are at different stages of the hype cycle, but the one that definitely stands out is blockchain as a new enabling technology in the enterprise space.

Just a few weeks ago, a customer said to me: “My board is all about blockchain, but I don’t get what the excitement is about – isn’t this just about Bitcoin and a cryptocurrency?”

I can totally understand his confusion. I’ve been talking to many blockchain experts who know that it will have a big impact on many industries and the related business communities. But even they are uncertain about the where, how, and when, and about the strategy on how to deal with it. The reason is that we often look at it from a technology point of view. This is a common mistake, as the starting point should be the business problem and the business issue or process that you want to solve or create.

In my many interactions with Torsten Zube, vice president and blockchain lead at the SAP Innovation Center Network (ICN) in Potsdam, Germany, he has made it very clear that it’s mandatory to “start by identifying the real business problem and then … figure out how blockchain can add value.” This is the right approach.

What we really need to do is provide guidance for our customers to enable them to bring this into the context of their business in order to understand and define valuable use cases for blockchain. We need to use design thinking or other creative strategies to identify the relevant fields for a particular company. We must work with our customers and review their processes and business models to determine which key blockchain aspects, such as provenance and trust, are crucial elements in their industry. This way, we can identify use cases in which blockchain will benefit their business and make their company more successful.

My highly regarded colleague Ulrich Scholl, who is responsible for externalizing the latest industry innovations, especially blockchain, in our SAP Industries organization, recently said: “These kinds of use cases are often not evident, as blockchain capabilities sometimes provide minor but crucial elements when used in combination with other enabling technologies such as IoT and machine learning.” In one recent and very interesting customer case from the autonomous province of South Tyrol, Italy, blockchain was one of various cloud platform services required to make this scenario happen.

How to identify “blockchainable” processes and business topics (value drivers)

To understand the true value and impact of blockchain, we need to keep in mind that a verified transaction can involve any kind of digital asset such as cryptocurrency, contracts, and records (for instance, assets can be tangible equipment or digital media). While blockchain can be used for many different scenarios, some don’t need blockchain technology because they could be handled by a simple ledger, managed and owned by the company, or have such a large volume of data that a distributed ledger cannot support it. Blockchain would not be the right solution for these scenarios.

Here are some common factors that can help identify potential blockchain use cases:

  • Multiparty collaboration: Are many different parties, and not just one dominant party, involved in the process or scenario? For example, a company with many parties in its ecosystem that are all connected to it, but not to each other in a network or more decentralized structure.
  • Process optimization: Will blockchain massively improve a process that today is performed manually, involves multiple parties, needs to be digitized, and is very cumbersome to manage or be part of?
  • Transparency and auditability: Is it important to offer each party transparency (e.g., on the origin, delivery, geolocation, and hand-overs) and auditable steps? (e.g., How can I be sure that the wine in my bottle really is from Bordeaux?)
  • Risk and fraud minimization: Does it help (or is there a need) to minimize risk and fraud for each party, or at least for most of them in the chain? (e.g., A company might want to know if its goods have suffered any shocks in transit or whether the predefined route was not followed.)

Connecting blockchain with the Internet of Things

This is where blockchain’s value can be increased and automated. Just think about a blockchain that is not just maintained or appended to by a human, but that automatically acquires different signals from sensors, such as geolocation, temperature, shock, usage hours, alerts, etc. One that knows when a payment or any kind of money transfer has been made, a delivery has been received or arrived at its destination, or a digital asset has been downloaded from the Internet. The relevant automated actions or signals are then recorded in the distributed ledger/blockchain.

Of course, given the massive amount of data that is created by those sensors, automated signals, and data streams, it is imperative that only the very few pieces of data coming from a signal that are relevant for a specific business process or transaction be stored in a blockchain. By recording non-relevant data in a blockchain, we would soon hit data size and performance issues.
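
A minimal sketch of that filtering step, assuming a hypothetical shock-sensor feed: routine readings are discarded at the source, and only the business-relevant signal is appended to a simplified hash-chained log that stands in for the distributed ledger.

    import hashlib, json, time

    ledger = []  # simplified in-memory stand-in for a distributed ledger

    def append_block(event):
        """Append an event, chained to the previous entry by its hash."""
        prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
        block = {"event": event, "prev_hash": prev_hash, "ts": time.time()}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()
        ).hexdigest()
        ledger.append(block)

    SHOCK_THRESHOLD_G = 2.5  # hypothetical business-relevant threshold

    def on_sensor_reading(reading):
        # Only signals that matter to the business process are recorded;
        # everything else is dropped instead of bloating the chain.
        if reading["shock_g"] > SHOCK_THRESHOLD_G:
            append_block({"type": "shock_alert", **reading})

    on_sensor_reading({"container": "MSCU1234567", "shock_g": 0.4})  # ignored
    on_sensor_reading({"container": "MSCU1234567", "shock_g": 3.1})  # recorded
    print(len(ledger))  # 1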

Ideas to ignite thinking in specific industries

  • The digital, “blockchained” physical asset (asset lifecycle management): No matter whether you build, use, or maintain an asset, such as a machine, a piece of equipment, a turbine, or a whole aircraft, a blockchain transaction (genesis block) can be created when the asset is created. The blockchain will contain all the contracts and information for the asset as a whole and its parts. In this scenario, an entry is made in the blockchain every time an asset is: sold; maintained by the producer or owner’s maintenance team; audited by a third-party auditor; has malfunctioning parts; sends or receives information from sensors; meets specific thresholds; has spare parts built in; requires a change to the purpose or the capability of the assets due to age or usage duration; receives (or doesn’t receive) payments; etc.
  • The delivery chain, bill of lading: In today’s world, shipping freight from A to B involves lots of manual steps. For example, a carrier receives a booking from a shipper or forwarder, confirms it, and, before the document cut-off time, receives the shipping instructions describing the content and how the master bill of lading should be created. The carrier creates the original bill of lading and hands it over to the ordering party (the current owner of the cargo). Today, that original paper-based bill of lading is required for the freight (the container) to be picked up at the destination (the port of discharge). Imagine if we could do this as a blockchain transaction and by forwarding a PDF by email. There would be one transaction at the beginning, when the shipping carrier creates the bill of lading. Then there would be look-ups, e.g., by the import and release processing clerk of the shipper at the port of discharge and the new owner of the cargo at the destination. Then another transaction could document that the container had been handed over.

The future

I personally believe in the massive transformative power of blockchain, even though we are just at the very beginning. This transformation will be achieved by looking at larger networks with many participants that all have a nearly equal part in a process. Today, many blockchain ideas still have a more centralistic approach, in which one company has a more prominent role than the (many) others and often is “managing” this blockchain/distributed ledger-supported process/approach.

But think about the delivery scenario today, where goods are shipped from one door or company to another door or company, across many parties in the delivery chain: from the shipper/producer via the third-party logistics service provider and/or freight forwarder; to the companies doing the actual transport, like vessels, trucks, aircraft, trains, cars, ferries, and so on; to the final destination/receiver. And all of this happens across many countries, many borders, many handovers, customs, etc., and involves a lot of paperwork, across all constituents.

“Blockchaining” this will be truly transformational. But it will need all constituents in the process or network to participate, even if they have different interests, and to agree on basic principles and an approach.

As Torsten Zube put it, I am neither a “blockchain extremist” nor a denier who believes this is just hype, but a realist open to embracing a new technology in order to change our processes for our collective benefit.

Turn insight into action, make better decisions, and transform your business. Learn how.



Digitalist Magazine

Pentaho Business Analytics Blog

Today, our parent company Hitachi, a global leader across industries, infrastructure and technology, announced the formation of Hitachi Vantara, a company whose aim is to help organizations thrive in today’s uncertain and turbulent times and prepare for the future. This new company unifies the mission and operations of Pentaho,…


Pentaho Business Analytics Blog

How Executives Can Leverage Data Analytics To Enhance Performance

The business world is now firmly in the age of data. Not that data wasn’t relevant before; it was just nowhere close to the speed and volume that’s available to us today. Businesses are buckling under the deluge of petabytes, exabytes, and zettabytes. Within these bytes lies valuable information on customer behavior, key business insights, and revenue generation. However, all that data is practically useless for businesses without the ability to identify the right data. Plus, if they don’t have the talent and resources to capture the right data, organize it, dissect it, draw actionable insights from it and, finally, deliver those insights in a meaningful way, their data initiatives will fail.

Rise of the CDO

Companies of all sizes can easily find themselves drowning in data generated from websites, landing pages, social streams, emails, text messages, and many other sources. Additionally, there is data in their own repositories. With so much data at their disposal, companies are under mounting pressure to utilize it to generate insights. These insights are critical because they can (and should) drive the overall business strategy and help companies make better business decisions. To leverage the power of data analytics, businesses need more “top-management muscle” specialized in the field of data science. This specialized field has led to the creation of roles like Chief Data Officer (CDO).

In addition, with more companies undertaking digital transformations, there’s greater impetus for the C-suite to make data-driven decisions. The CDO helps make data-driven decisions and also develops a digital business strategy around those decisions. As data grows at an unstoppable rate, becoming an inseparable part of key business functions, we will see the CDO act as a bridge between other C-suite execs.

Data skills an emerging business necessity

So far, only large enterprises with bigger data mining and management needs maintain in-house solutions. These in-house teams and technologies handle the growing sets of diverse and dispersed data. Others work with third-party service providers to develop and execute their big data strategies.

As the amount of data grows, the need to mine it for insights becomes a key business requirement. For both large and small businesses, data-centric roles will experience endless upward mobility. These roles include data analysts and scientists. There is going to be a huge opportunity for critical thinkers to turn their analytical skills into rapidly growing roles in the field of data science. In fact, data skills are now a prized qualification for titles like IT project manager and computer systems analyst.

Forbes cited the McKinsey Global Institute’s prediction that by 2018 there could be a massive shortage of data-skilled professionals. This indicates a disruption at the demand-supply level with the needs for data skills at an all-time high. With an increasing number of companies adopting big data strategies, salaries for data jobs are going through the roof. This is turning the position into a highly coveted one.

According to Harvard Professor Gary King, “There is a big data revolution. The big data revolution is that now we can do something with the data.” The big problem is that most enterprises don’t know what to do with data. Data professionals are helping businesses figure that out. So if you’re casting about for where to apply your skills and want to take advantage of one of the best career paths in the job market today, focus on data science.

I’m compensated by University of Phoenix for this blog. As always, all thoughts and opinions are my own.

For more insight on our increasingly connected future, see The $19 Trillion Question: Are You Undervaluing The Internet Of Things?

The post Data Analysts and Scientists More Important Than Ever For the Enterprise appeared first on Millennial CEO.



Digitalist Magazine

FogHorn raises $30 million to provide IoT edge computing analytics


FogHorn, which provides data analytics software for industrial and commercial Internet of Things (IoT) applications, announced today that it has secured $30 million in funding. Intel Capital and Saudi Aramco Energy Ventures co-led this round, with new investor Honeywell Ventures joining in. All previous investors also participated, including March Capital, GE, Dell, Bosch, Yokogawa, Darling Ventures, and The Hive.

“In the industrial application of IoT, such as manufacturing facilities, oil and gas production sites, and transportation systems, there are often hundreds or thousands of physical and video/audio sensors continuously producing a massive amount of high velocity data,” wrote FogHorn CEO David King, in an email to VentureBeat. “This data is being collected at the network ‘edge’, what Cisco coined the ‘fog’ layer several years ago.”

Edge computing is a method of optimizing cloud computing systems by performing data processing at the edge of the network, near the source of the data. According to King, industrial operators face several challenges when collecting and processing data, including a high volume of data-collecting sources, high costs in transporting this data in the cloud, and limits to real-time insights.

While latency may be fine when you are conversing with Amazon’s Alexa, having a delayed response to a gas leak could be extremely dangerous.

FogHorn’s Lightning platform has been purpose-built to run in very small footprint (256MB or smaller) edge computing systems. “The reason this is important is that the vast majority of data streaming from IoT sensors is useless within a very short period of time,” wrote King. “The information that is valuable — the anomalies and hard-to-detect patterns — need to be acted upon while operators can take corrective action.”
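
The pattern King describes can be sketched generically: keep a small rolling window on the edge device, forward only readings that deviate sharply from the local baseline, and discard the rest. This is an illustrative pattern with made-up numbers, not FogHorn’s actual Lightning API.

    from collections import deque
    from statistics import mean, stdev

    WINDOW = deque(maxlen=60)  # small rolling baseline kept on the edge device

    def process_reading(value, send_to_cloud):
        """Forward only anomalous readings; drop routine ones locally."""
        if len(WINDOW) >= 10:
            mu, sigma = mean(WINDOW), stdev(WINDOW)
            if sigma > 0 and abs(value - mu) > 3 * sigma:
                send_to_cloud({"value": value, "baseline": mu})
        WINDOW.append(value)

    # A stable pressure signal with a single spike: only the spike is forwarded.
    alerts = []
    for v in [101.2, 101.3, 101.1, 101.4, 101.2, 101.3, 101.2, 101.1,
              101.3, 101.2, 101.3, 140.0, 101.2]:
        process_reading(v, alerts.append)
    print(alerts)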

FogHorn licenses its software on a subscription basis to dozens of customers, according to King. The chief executive does not see any direct competitors focusing solely on tapping into streaming edge data for analytics, machine learning, and AI. “Amazon Greengrass and Microsoft Azure Edge are now targeting the edge with reduced footprint versions of their heavy cloud software stacks, but both still send most data to the cloud for advanced data science functionality,” he added.

The investment from Saudi Aramco Energy Ventures should secure FogHorn’s foothold in Saudi Arabia, which is one of the world’s biggest oil producers.

“Given the heavy presence of oil and gas, we expect it to be a large market in the future,” wrote King. “By partnering with Saudi Aramco Energy Ventures, we’re just beginning our reach into this market.”

To date, FogHorn has raised a total of $47.5 million. The Mountain View, California-based startup will use the fresh injection of capital to hire more engineers and increase sales and marketing efforts.

Founded in 2014 as part of The Hive incubator, FogHorn currently has more than 40 employees.



Big Data – VentureBeat

Kleiner Perkins leads $15 million investment in Incorta’s data analytics software


Incorta, which provides software to analyze data in real time, today announced that it has secured an additional $15 million in funding. Kleiner Perkins Caufield & Byers led the round, with participation from existing investors GV and Ron Wohl, Oracle’s former executive vice president. This new round of funding comes on the heels of a $10 million investment announced last March.

The San Mateo, California-based startup provides an analytics platform that enables real-time aggregation of large, complex sets of business data, such as for enterprise resource planning (ERP), eliminating the need to painstakingly prepare the data for analysis.

“With Incorta, there is no additional need to put data into a traditional data warehouse ahead of time,” wrote Incorta cofounder and CEO Osama Elkady, in an email to VentureBeat. “This reduces the time to build analytic applications from months to days.”

The startup claims that over the past year it has increased revenue more than 300 times and signed new customers, including Shutterfly.

“Incorta customers who used Exadata or Netezza appliances, or data engines like Redshift or Vertica, enjoyed performance gains of 50 to 100 times when they switched to Incorta, even on the most complex requests,” wrote Elkady.

The enterprise offering is licensed as an annual subscription on a per user basis and can be deployed on Google Cloud, Microsoft Azure, and Amazon Web Services (AWS). The startup is in talks with other cloud providers, according to Elkady.

Today’s fresh injection of capital will be used to further product development and increase sales and marketing. “It will also enable us to more quickly realize our vision for a third-party marketplace where vendors and content providers can build and distribute applications powered by Incorta’s platform,” wrote Elkady.

Founded in 2014, Incorta has raised a total of $27.5 million to date and currently has 70 employees.



Big Data – VentureBeat

TIBCO Named a Leader in The Forrester Wave™: Streaming Analytics, Q3 2017


Forrester has named TIBCO a Leader in The Forrester Wave™: Streaming Analytics, Q3 2017 among thirteen vendors that were evaluated. For the Strategy category, we received a 4.9 out of a possible 5 points.

TIBCO StreamBase is also recognized for “[unifying] real-time analytics” with a “full-featured streaming analytics solution that integrates with applications to automate actions and also offers Live DataMart to create a real-time visual command center.”

Today’s organizations don’t just want streaming analytics or analytics at rest—they want the ability to capture streams, both the raw input and the resulting predictions, analyze them to generate new insights, and then operationalize those insights. Streaming analytics customers will be more successful—and more satisfied in the long term—doing the full analytics round trip, and TIBCO has the tools to do it.

Learn more about TIBCO StreamBase here.

Download a complimentary copy of the report here.

TIBCO is focused on insights. Not the garden variety insights that lay dormant and unactionable on someone’s desk. Rather, TIBCO focuses on perishable insights that companies must act upon immediately to retain customers, remove friction from business processes, and prevent logistics chains from stopping cold. —Excerpt from The Forrester Wave: Streaming Analytics, Q3 2017


The TIBCO Blog

Leveraging Open Source Technologies in Analytics Deployments


Many organizations are eagerly hiring new data scientists fresh out of college. Many of those millennial data scientists have been educated in software development techniques that move away from reliance on traditional and commercial development platforms, toward adoption of open source technologies. Typically, these individuals arrive with skills in R, Python, or other open-source technologies.

Employers, as well as enterprise software vendors like Statistica, are choosing to support the use of these technologies, rather than forcing the new data scientists (who are scarce and highly valued resources) to adopt commercial tools. People with R, Python, C# or other language capabilities can integrate them into the Statistica workspace.

This type of framework allows a simple, one-click deployment. Deploying an R script by itself can be complex and difficult, although there are new, high-level technologies that simplify the process. Statistica has chosen to allow integration of the script directly into a workflow. The developer can then deploy that script into the Statistica enterprise platform and leverage the version control, auditing, digital signatures, and all the other tools needed to meet a company’s regulatory requirements.

That’s a key advantage: The ability to incorporate an open source script into a regulated environment with security and version control without jeopardizing regulatory compliance. This capability is not entirely unique—some other, relatively new platforms can provide this ability to a degree. But it has been feasible in the Statistica platform for a number of years, and is extensively proven in production deployments.

The capability came out of Statistica’s experience in the pharmaceutical industry, one of the most regulated of all commercial environments. Governments require extensive documentation and validation of virtually every business process involved in producing drugs for human consumption. This includes every aspect of data collection and reporting.

We have taken what we learned in this rigorously constrained context and applied it to a general analytical asset. That body of experience is differentiating among analytics platforms, as is the way in which scripts are incorporated into the Statistica workspace.

Within a workspace, we can create a script, and pull in the packages and components from the open source community. This enables Statistica adopters to leverage the collective intelligence of data scientists throughout the world, and contribute to the development of these open source technologies. This is in character with the open source community, in which developers not only contribute new code but inspect, test, criticize, and fine tune each other’s work. Our users are extending the capabilities of Statistica through these collectively developed packages.

The user creates a node in the workspace that can be parameterized. The developer can create an R script, and we can create a set of parameters attached to that node and then deploy that node as a reusable template. That template can be used like any other node within the workspace by a non-developer business user—what we also call a “citizen data scientist.”

We can import the data, merge, translate, create variables, etc. If we want to create a subset of data, we can deploy an R model developed specifically for this purpose by a seasoned data scientist who has, in effect, added it to a library of gadgets that a business user can drop into the workspace, change the parameters, and get the standard output, as well as any downstream data that the script may produce.
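
The reusable, parameterized node described above can be sketched outside Statistica as well. The small wrapper below is a generic analogue of that pattern, not Statistica’s actual node API, and the data and parameters are made up.

    import pandas as pd

    class ScriptNode:
        """A reusable, parameterized wrapper around an analyst-authored script."""

        def __init__(self, func, **defaults):
            self.func = func
            self.defaults = defaults

        def run(self, data, **overrides):
            params = {**self.defaults, **overrides}
            return self.func(data, **params)

    # The seasoned data scientist authors the subsetting logic once...
    def top_quantile_subset(df, column, q):
        return df[df[column] >= df[column].quantile(q)]

    subset_node = ScriptNode(top_quantile_subset, column="revenue", q=0.9)

    # ...and a business user drops the node into a workspace,
    # changing only its parameters to get the standard output.
    sales = pd.DataFrame({"region": list("ABCD") * 25, "revenue": range(100)})
    print(subset_node.run(sales, q=0.75).shape)  # (25, 2)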

From a business perspective, committing to open source adoption is attractive:

  • Because it’s free, the adopter is spared an additional license cost; and
  • Because it opens up a world of new capabilities. There are new open-source packages being developed every day, and some will have quite compelling functionality.

There are, of course, uncertainties in adopting new code from an unregulated developer community. Because Statistica sells into highly regulated markets, we are audited annually to ensure that our code meets the requirements for regulatory compliance. Open-source code does not undergo that kind of audit, and that can introduce certain risks. But the platform enables deployment of the code into a rigorously controlled operational environment, mitigating this risk.

At least as important as the risk management element, the ability to promote the adoption of open-source scripting provides an attractive work environment for the current generation of data scientists. Given the intense competition for recent graduates, this can be a powerful incentive in itself for employers.

Find out more about TIBCO Statistica’s capabilities.


The TIBCO Blog