Category Archives: Sisense

[Infographic] What’s Big in Business Intelligence for 2018?

It’s that time of year again. You know, the time when people and organizations start to put together their big predictions for the year to come. With the explosive growth of BI and analytics in 2017, we had to get in on the action.

Last year we had five big predictions for 2017, which included businesses reaching new levels of data complexity, BI tools liberating users, machines getting smarter, information democracy rising, and BI separating the winners from the losers.

In 2018, it’s not about a complete change in direction. Instead, it’s about building on what has happened in the past year to take BI and analytics to new heights. How will organizations make sense of all of their complex data? Augmented Analytics. What happens when data is truly democratized? Collaborative/cooperative BI. What exactly will separate the winners from the losers? If you’re asking us, it’s embedded analytics.

So, without further ado, here are our top trends that will emerge in 2018.

[Infographic: What’s Big in Business Intelligence for 2018?]


Exploratory and Confirmatory Analysis: What’s the Difference?


How does a detective solve a case? She pulls together all the evidence she has, all the data that’s available to her, and she looks for clues and patterns.

At the same time, she takes a good hard look at individual pieces of evidence. What supports her hypothesis? What bucks the trend? Which factors work against her narrative? What questions does she still need to answer… and what does she need to do next in order to answer them?

Then, adding to the mix her wealth of experience and ingrained intuition, she builds a picture of what really took place – and perhaps even predicts what might happen next.

But that’s not the end of the story. We don’t simply take the detective’s word for it that she’s solved the crime. We take her findings to a court and make her prove it.

In a nutshell, that’s the difference between Exploratory and Confirmatory Analysis.

Data analysis is a broad church, and managing this process successfully involves several rounds of testing, experimenting, hypothesizing, checking, and interrogating both your data and approach.

Putting your case together, and then ripping apart what you think you’re certain about to challenge your own assumptions, are both crucial to Business Intelligence.

Before you can do either of these things, however, you have to be sure that you can tell them apart.

What is Exploratory Data Analysis?

Exploratory data analysis (EDA) is the first part of your data analysis process. There are several important things to do at this stage, but it boils down to this: figuring out what to make of the data, establishing the questions you want to ask and how you’re going to frame them, and coming up with the best way to present and manipulate the data you have to draw out those important insights.

That’s what it is, but how does it work?

As the name suggests, you’re exploring – looking for clues. You’re teasing out trends and patterns, as well as deviations from the model, outliers, and unexpected results, using quantitative and visual methods. What you find out now will help you decide the questions to ask, the research areas to explore and, generally, the next steps to take.

Exploratory Data Analysis involves things like:

  • establishing the data’s underlying structure
  • identifying mistakes and missing data
  • establishing the key variables
  • spotting anomalies
  • checking assumptions and testing hypotheses in relation to a specific model
  • estimating parameters
  • establishing confidence intervals and margins of error
  • figuring out a “parsimonious model” – i.e. one that you can use to explain the data with the fewest possible predictor variables
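To make those first steps concrete, here is a minimal sketch of an initial EDA pass in R (the language this post recommends further down). The data frame and its columns are invented purely for illustration.

```r
# A first EDA pass in R on a hypothetical customer-usage data frame.
set.seed(42)
usage <- data.frame(
  monthly_logins = c(rpois(97, lambda = 20), 150, NA, 3),  # includes an outlier and a missing value
  plan           = sample(c("basic", "pro"), 100, replace = TRUE)
)

str(usage)                               # underlying structure and key variables
summary(usage)                           # quick distributional summary
colSums(is.na(usage))                    # missing data per column
boxplot.stats(usage$monthly_logins)$out  # outliers flagged by the 1.5 * IQR rule
```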

In this way, your Exploratory Data Analysis is your detective work. To make it stick, though, you need Confirmatory Data Analysis.

What is Confirmatory Data Analysis?

Confirmatory Data Analysis is the part where you evaluate your evidence using traditional statistical tools such as significance, inference, and confidence.

At this point, you’re really challenging your assumptions. A big part of confirmatory data analysis is quantifying things like the extent to which any deviation from the model you’ve built could have happened by chance, and the point at which you need to start questioning your model.

Confirmatory Data Analysis involves things like: testing hypotheses, producing estimates with a specified level of precision, regression analysis, and variance analysis.

In this way, your confirmatory data analysis is where you put your findings and arguments to trial.
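As a rough illustration of what that looks like in practice, here is a hedged sketch in R: a two-sample t-test on simulated data, returning a p-value and a confidence interval for the difference. The numbers are made up and stand in for whatever measure your model predicts.

```r
# Quantifying whether an observed difference could plausibly be chance.
set.seed(7)
control   <- rnorm(50, mean = 100, sd = 15)  # simulated baseline scores
treatment <- rnorm(50, mean = 108, sd = 15)  # simulated scores after a change

# Two-sample t-test: significance of the gap plus a 95% confidence interval
t.test(treatment, control, conf.level = 0.95)
```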

Uses of Confirmatory and Exploratory Data Analysis

In reality, exploratory and confirmatory data analysis aren’t performed one after another, but continually intertwine to help you create the best possible model for analysis.

Let’s take an example of how this might look in practice.

Imagine that in recent months, you’d seen a surge in the number of users canceling their product subscription. You want to find out why this is, so that you can tackle the underlying cause and reverse the trend.

This would begin as exploratory data analysis. You’d take all of the data you have on the defectors, as well as on happy customers of your product, and start to sift through looking for clues. After plenty of time spent manipulating the data and looking at it from different angles, you notice that the vast majority of people that defected had signed up during the same month.

On closer investigation, you find out that during the month in question, your marketing team was shifting to a new customer management system and as a result, introductory documentation that you usually send to new customers wasn’t always going through. This would have helped to troubleshoot many teething problems that new users face.

Now you have a hypothesis: people are defecting because they didn’t get the welcome pack (and the easy solution is to make sure they always get a welcome pack!).

But first, you need to be sure that you were right about this cause. Based on your Exploratory Data Analysis, you now build a new predictive model that allows you to compare defection rates between those that received the welcome pack and those that did not. This is rooted in Confirmatory Data Analysis.

The results show a strong association between missing the welcome pack and defecting. Bingo! You have your answer.
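If you wanted to run that comparison yourself, a two-sample test of proportions is one straightforward option. The sketch below uses invented counts; only the shape of the comparison matters.

```r
# Hypothetical check of the welcome-pack story: churn rates with vs. without the pack.
churned <- c(12, 45)    # churned customers: received pack, did not receive pack
totals  <- c(400, 380)  # total customers in each group

# Test of equal proportions, with a confidence interval for the difference
prop.test(churned, totals)
```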

Exploratory Data Analysis and Big Data

Getting a feel for the data is one thing, but what about when you’re dealing with enormous data pools?

After all, there are already so many ways to approach Exploratory Data Analysis – transforming the data through nonlinear operators, projecting it into a different subspace and examining the resulting distribution, or slicing and dicing it along different combinations of dimensions. Add sprawling amounts of data into the mix and suddenly the whole “playing detective” element feels a lot more daunting.

The important thing is to ensure that you have the right tech stack in place to cope with this, and to make sure you have access to the data you need in real time.

Two of the best statistical programming packages available for conducting Exploratory Data Analysis are R and S-Plus; R is particularly powerful and easily integrated with many BI platforms. That’s the first thing to consider.

The next step is ensuring that your BI platform has a comprehensive set of data connectors, that – crucially – allow data to flow in both directions. This means that you can keep importing Exploratory Data Analysis and models from, for example, R to visualize and interrogate results – and also send data back from your BI solution to automatically update your model and results as new information flows into R.
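Here is a hedged sketch of that two-way flow: pull data from the warehouse your BI tool reads, refit a model in R, and write the scores back where dashboards can pick them up. The connection details, table names, and model are all placeholders, not a prescription for any particular platform.

```r
library(DBI)

# Hypothetical warehouse connection
con <- dbConnect(RPostgres::Postgres(), dbname = "warehouse")

# Pull the data the BI tool already visualizes
usage <- dbGetQuery(con, "SELECT account_id, monthly_logins, churned FROM account_usage")

# Refit a simple churn model and score every account
model <- glm(churned ~ monthly_logins, data = usage, family = binomial)
usage$churn_risk <- predict(model, type = "response")

# Send the refreshed scores back for the BI layer to display
dbWriteTable(con, "churn_scores", usage[, c("account_id", "churn_risk")], overwrite = TRUE)

dbDisconnect(con)
```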

In this way, you not only strengthen your Exploratory Data Analysis, you incorporate Confirmatory Data Analysis, too – covering all your bases of collecting, presenting and testing your evidence to help reach a genuinely insightful conclusion.

Your honor, we rest our case.

Ready to learn how to incorporate R for deeper statistical learning? You can watch our webinar with renowned R expert Jared Lander to learn how R can be used to solve real-life business problems.


12 Websites & Blogs Every Data Analyst Should Follow

While demand for data analysts is at an all-time high, the online community still leaves something to be desired. It can be difficult to find good, unbiased online resources and websites dedicated to data professionals. We’ve asked our own data analysts to tell us about some of their favorite sites and created this list of must-follow forums, data analytics blogs, and resource centers. We’re sure there are many additional great ones out there, so if you know of any, please tell us in the comments!

The list is organized in alphabetical order.

Cross Validated (Stack Exchange)


Part of the Stack Exchange network of Q&A communities for developers, Cross Validated is a Q&A site for statistics, machine learning, data analysis, data mining, and visualization. A great place if you’re stuck with a professional question and need answers from fellow professionals.

Data Science and Beyond (Yanir Seroussi)


Mr. Seroussi is an independent data scientist and computer programmer who posts about solving practical problems in data science (such as migrating a web app from MongoDB to Elasticsearch). The blog is fluently written and highly detailed, complete with relevant code samples.

Data Science Central


This website by Vincent Granville offers both a social-community experience and a content repository with an endless flow of new articles on topics such as data plumbing, Hadoop, data visualization, and more.

Learn to make your insights shine with our on-demand webinar “Telling a Story Through Data: Dashboard Design Guidelines”

DBMS2


A blog of sorts, written by Curt Monash of Monash Research and covering database management, data analytics, and related technologies. Offers well-written, comprehensive and vendor-neutral analysis from a technical and business perspective.

DZone


DZone is an online community that publishes resources for software developers, covering topics from big data and AI to data science and analytics. Its material is sourced from community members as well as influencers within the tech space.

Edwin Chen’s Blog


While this blog is not updated very frequently, every post is a fascinating example of practical data analysis, often applied to a real-life use case, along with many clear and intuitive explanations of complex concepts in data science and machine learning.

KDnuggets


KDnuggets is one of the leading big data, data science, and machine learning sites. Content comes from contributors and is edited by Gregory Piatetsky-Shapiro and Matthew Mayo, ranging from tutorials to opinion pieces and everything in between.

KPI Library


While this website requires registration, it’s free to do so, and once you’re in you have access to thousands of examples and suggestions for key performance indicators across dozens of industries, frameworks, and business processes.

Simply Statistics


A site maintained by three professors of biostatistics, featuring a variety of articles and additional media on statistical techniques and deep data analysis. There are practical examples as well as theoretical material and the site is updated fairly regularly.

Statistical Inference, Causal Inference, and Social Science


The articles here are contributed by six different writers, each writing from their own practical experience in modeling and analyzing data and covering a wide range of categories and topics.

R-bloggers


A content hub that aggregates RSS feeds from bloggers who write about the popular open-source R language – a great place to keep your R knowledge up to date and see what’s new in the R community.

What’s The Big Data?


Gil Press is a thought leader in the Big Data sphere and has contributed to some of the milestone work estimating the size and growth of digital data. His personal website and Forbes column are a great source of news and commentary on Big Data, data science, IoT, and related topics.


12 Websites & Blogs Every Data Analyst Should Follow

While demand for data analysts is at an all-time high, the online community still leaves some to be desired. It can be difficult to find good, unbiased online resources and websites dedicated to data professionals. We’ve asked our own data analysts to tell us about some of their favorite sites and created this list of must-follow forums, data analytics blogs, and resource centers. We’re sure there are many additional great ones out there, so if you know of any please tell us in the comments!

List is organized in alphabetical order.

Cross Validated (Stack Exchange)

Cross Validated 12 Websites & Blogs Every Data Analyst Should Follow

Part of the Stack Exchange network of Q&A communities for developers, Cross Validated is a Q&A site for statistics, machine learning, data analysis, data mining, and visualization. A great place if you’re stuck with a professional question and need answers from fellow professionals.

Data Science and Beyond (Yanir Seroussi)

Yanir Seroussi 12 Websites & Blogs Every Data Analyst Should Follow

Mr. Seroussi is an independent data scientist and computer programmer who posts about solving practical problems in data science (such us migrating a web app from MongoDB to Elasticsearch). The blog is fluently written and highly detailed, complete with relevant code samples.

Data Science Central

data science central 12 Websites & Blogs Every Data Analyst Should Follow

This website by Vincent Granville offers both a social-community experience as well as a content repository with an endless flow of new articles posted on topics such as data plumbing, Hadoop, data visualization and more.

Learn to make your insights shine with our on-demand webinar “Telling a Story Through Data: Dashboard Design Guidelines”

DBMS2

DBMS2 12 Websites & Blogs Every Data Analyst Should Follow

A blog of sorts, written by Curt Monash of Monash Research and covering database management, data analytics, and related technologies. Offers well-written, comprehensive and vendor-neutral analysis from a technical and business perspective.

DZone

dzone1 12 Websites & Blogs Every Data Analyst Should Follow

DZone is an online community that publishes resources for software developers and covers topics from big data, AI, data science, and analytics. Their material is sourced from community members as well as influcencers within the tech space.

Edwin Chen’s Blog

Edwin Chen 12 Websites & Blogs Every Data Analyst Should Follow

While this blog is not updated very frequently, every post is a fascinating example of practical data analysis, often applied to a real-life use case, along with many clear and intuitive explanations of complex concepts in data science and machine learning.

KDnuggets

KDnuggets 12 Websites & Blogs Every Data Analyst Should Follow

KDnuggets is one of the leading big data, data science, and machine learning sites. Content is from contributor, but edited by Gregory Piatetsky-Shapiro and Matthew Mayo, and ranges between tutorials to opinion pieces and everything inbetween.

KPI Library

KPI Library  12 Websites & Blogs Every Data Analyst Should Follow

While this website requires registration, it’s absolutely free to do so and once you’re in you have access to literally thousands of examples and suggestions for key performance indicators across dozens of industries, frameworks and business processes.

Simply Statistics

Simply Statistics 12 Websites & Blogs Every Data Analyst Should Follow

A site maintained by three professors of biostatistics, featuring a variety of articles and additional media on statistical techniques and deep data analysis. There are practical examples as well as theoretical material and the site is updated fairly regularly.

Statistical Inference, Causal Inference, and Social Science

andrew gelman 12 Websites & Blogs Every Data Analyst Should Follow

The articles here are contributed by six different writers, each writing from their own practical experience in modeling and analyzing data and covering a wide range of categories and topics.

R-bloggers

rbloggers 12 Websites & Blogs Every Data Analyst Should Follow

Content hub that aggregates RSS feeds of bloggers who write about the popular open-source R language, and a great place to keep your R knowledge up to date and see what’s new in the R community.

What’s The Big Data?

whats the big data 12 Websites & Blogs Every Data Analyst Should Follow

Gil Press is a thought leader in the Big Data sphere and has contributed in developing some of the milestones in estimating the size and growth of digital data. His personal website and Forbes column are a great source for news and commentary on Big Data, data science, IoT and related topics.

Telling a Story Through Data yellow 12 Websites & Blogs Every Data Analyst Should Follow

Let’s block ads! (Why?)

Blog – Sisense

Infographic: How Some of the Biggest Companies in the World Used Growth Hacking to Get to the Top

Some of the biggest tech giants in the world have become household names with a little help from growth hacking. But how exactly did they do it? And how can you learn from their techniques?

Let’s have a look.

[Infographic: How Some of the Biggest Companies in the World Used Growth Hacking to Get to the Top]

Ready to learn how to become a growth hacker? Download our free whitepaper for five simple steps to get started.


How We Found our North Star Metric (& How You Can Find Yours!)

Growth hacking. There’s no doubt you’ve heard this term thrown around recently. But what exactly is it and do you actually need to be doing it?

The long and short of it is yes, you definitely need to be growth hacking. We do it here at Sisense and soon I’ll guide you through exactly how we found our North Star metric and what steps you need to take to find yours.

But before we do that, let’s talk about what exactly a “North Star” is in the first place.

What is a North Star Metric?

When you first think of the North Star you probably imagine a guiding light in the sky that can help you if you ever find yourself lost in the middle of nowhere. While a North Star metric won’t be found among the stars, the concept isn’t too far off.

Your North Star metric is your guiding light when it comes to growth hacking. It’s the metric that shines above all other metrics. It’s a powerful driver that should capture the core value your product delivers to customers.

How We Found Our North Star

Something all businesses think about on a constant basis is churn. Why do customers churn? How can we reduce churn? Where does a customer’s decision to churn really come from?

As Head of Product Intelligence, I was given the job of finding a way to prevent customer churn by recognizing suspect patterns in the data and using them to proactively fix any related issues, both internally and for our customers – no small task. And with so much data available, it can be daunting to figure out exactly where to start. Here’s what I did:

1. Interview

We always stress to our customers that when implementing BI they need to involve the stakeholders in their organization that will actually be using the dashboards from the very beginning. Finding a North Star metric is no different. I started by sitting and interviewing stakeholders and Customer Success Managers in order to define, understand, and distinguish between success and failure stories.

Takeaway: Talk to people! You can choose to dive straight into the data and try to find your way, but if you can’t define how your project will affect the people who will be using it, you’re likely missing a huge piece of the puzzle.

Learn how to become a growth hacker in five easy steps. Download the free whitepaper now.

2. Explore

Once I was able to define what a success and a failure looked like for our internal stakeholders and Customer Success Managers, I began exploring the data.

First, I built a data exploration dashboard that presented, side by side, the usage patterns of happy customers (those that renewed) and sad customers (those that churned). Combining usage data from our events tracking system, demographic data from our CSM, and license details from our back office database, I focused just on the first 12 months of usage.

[Screenshot: data exploration dashboard comparing usage patterns of renewed and churned customers]

During this first phase of research, I looked for differences between the two populations by observing their usage patterns. I compared any and every measure I could think of, including the following (a rough sketch of how they can be computed appears after the list):

  • MAU % Growth: the month-over-month growth in the number of Monthly Active Users.
  • License % Utilization: MAU divided by licenses purchased. When utilization is greater than one, it’s time to grow the account; when it’s less than 0.3, we have an adoption problem and a potential churn risk.
  • Stickiness: Daily Active Users divided by Monthly Active Users (DAU/MAU), which indicates how often the product is being used. A ratio closer to one (greater than 0.7) reflects daily use.
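Here is the promised sketch of those three measures in R, computed for a single hypothetical account. The monthly numbers are invented, and average daily active users stands in for DAU; only the formulas matter.

```r
# Invented usage numbers for one account over six months
account <- data.frame(
  month    = 1:6,
  mau      = c(10, 14, 18, 17, 22, 25),  # monthly active users
  dau_avg  = c(4, 6, 9, 8, 12, 15),      # average daily active users in each month
  licenses = 30                          # licenses purchased
)

account$mau_growth  <- c(NA, diff(account$mau) / head(account$mau, -1))  # MAU % growth, month over month
account$utilization <- account$mau / account$licenses                    # license utilization (flag values below 0.3)
account$stickiness  <- account$dau_avg / account$mau                     # DAU/MAU; above 0.7 suggests daily use

account
```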

I started by plotting these different measures, one by one, on a trend line to get a sense of whether the data was even interesting to look at. The x-axis of my trend line represented the age of the account, my goal being to see how usage patterns changed across the two populations over time. I then plotted each measure, again one by one, on a box plot. Box plots help compare distributions, and in this case I was looking at the distribution of each metric over age/time.

[Screenshot: trend lines and box plots of the usage measures by account age]

From here it was all a matter of trial and error. I tested select use cases but couldn’t find a significant difference between the two populations.

Takeaway: You won’t find your North Star metric right away, and that’s okay. But if you want to get there fast, you need to play with the data in every way you can think of. Don’t be afraid to try new things!

3. Test

When I couldn’t find any significant difference between the two populations, I realized I needed to know more. I talked with more people and thought more deeply about the story behind the happy and sad customers. The key point that kept coming back to mind is that a customer needs to get real business value in order to renew.

To really see whether a customer is getting business value, we needed to focus on dashboard consumption actions, because they indicate how end users are interacting with the system. It became clear that the accounts that churned were focused mostly on data preparation actions rather than end-user engagement – likely leaving them with little reason to justify having BI.

To take the size of a customer’s organization into account, I created a dashboard that broke down the consumption ratio: the percentage of overall actions that are dashboard actions such as querying, creating, sharing, and interacting.

[Screenshot: dashboard breaking down consumption ratio by account]

This calculation follows the reasoning behind the 90-9-1 Rule for Participation Inequality in Social Media, and would answer the question of what percent of the customer’s total actions were dashboard consumption related.
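The calculation itself is simple. Here is an illustrative version in R with invented event counts for two accounts; the column names are hypothetical.

```r
# What share of each account's total actions are dashboard-consumption actions?
actions <- data.frame(
  account   = c("A", "B"),
  dashboard = c(1800, 250),  # querying, creating, sharing, interacting
  data_prep = c(400, 900),   # data preparation actions
  other     = c(300, 150)
)

actions$total             <- actions$dashboard + actions$data_prep + actions$other
actions$consumption_ratio <- actions$dashboard / actions$total  # red flag when prep outweighs consumption

actions
```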

The moment I put this measure into a box plot, it became crystal clear that there was a difference between the happy and sad customers, regardless of customer type or license attributes. I was on to something! But I couldn’t rest easy just yet.

I exported my results and loaded them into R in order to ensure my findings were statistically significant. In order to do this, I applied a statistical significance test and found that the only significant measure was the percent of dashboard consumption ratio.
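The post doesn’t say which significance test was used; a Wilcoxon rank-sum test comparing the consumption ratios of renewed and churned accounts is one common, distribution-free choice. The data below is simulated.

```r
# Simulated consumption ratios for the two populations
set.seed(1)
renewed <- rbeta(60, 8, 2)  # happy customers: ratios skewed high
churned <- rbeta(40, 3, 6)  # churned customers: ratios skewed low

# Non-parametric test of whether the two distributions differ
wilcox.test(renewed, churned)
```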

Later on, I built a decision tree, and this same measure again emerged as the strongest churn predictor. I realized that, although different customer demographics have different benchmarks, whenever the percent of dashboard consumption does not “beat” the percent of data preparation, a red flag needs to be raised. Success!

[Screenshot: decision tree for churn prediction]
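For completeness, here is a minimal decision-tree sketch using R’s rpart package on an invented data frame; the column names and the relationship baked into the data are assumptions for illustration only.

```r
library(rpart)

# Invented account features, with churn loosely driven by consumption ratio
set.seed(2)
accounts <- data.frame(
  consumption_ratio = runif(200),
  mau_growth        = rnorm(200, 0.05, 0.1),
  utilization       = runif(200, 0, 1.2)
)
accounts$churned <- factor(ifelse(accounts$consumption_ratio + rnorm(200, 0, 0.15) < 0.4, "yes", "no"))

# Classification tree: the first split shows the strongest predictor
tree <- rpart(churned ~ consumption_ratio + mau_growth + utilization,
              data = accounts, method = "class")
print(tree)
```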

Takeaway: Dig deeper. It may seem repetitive to run test after test, but this is your overarching, most important metric. Attacking it from all sides with statistical tools and techniques is a must.

4. Monitor

Now that you’ve found your North Star metric, it’s time to roll it out to the people you involved all the way back in step one. Present the metric to them and get them used to working with it in their day-to-day reporting.

Whenever you’re dealing with a customer-based benchmark, which we are here, make sure to create an alert-based system so that users don’t have to constantly check in manually. Of course, we use Sisense Pulse, set the appropriate threshold, and let Sisense do the rest.

Congratulations! You found your North Star metric. Your work, however, is never complete. We know all too well that businesses change faster than a speeding bullet. It’s important you’re constantly monitoring your North Star metric and running tests to make sure it stays accurate. If anything changes, commit to following the winning data and make changes as you go.

Takeaway: Once you find your North Star metric, take a minute to celebrate. Remember, though, it’s not something you can set and forget. You need to iterate, test, and commit to following the data to improve your product.

All data represented in this post is for informational purposes only and is not accurate.


How to Calculate Total Cost of Ownership for Business Intelligence

Imagine you’re comparing gym memberships to figure out which one offers the best value. Sure, you could simply look at the monthly fee and go for the cheapest, but that wouldn’t tell you everything you need to know about the total cost of ownership.

For starters, you’d want to know what the cost includes. Does it offer all the machines and classes? Do you have to rent/buy extra equipment? Then there are the less obvious considerations. Do you need to pay a trainer to get true value? What’s the price of travel? Is there enough capacity to cope with the crowds, even during peak hours?

Loosely speaking, the approach savvy businesses take to buying a gym membership should be the same approach they use when comparing prices for different tech solutions – especially a solution as powerful and intricate as Business Intelligence.

Business Intelligence Pricing – There’s a Catch

There are many things to consider when pricing out the total cost of ownership of BI. To really get a feel for the cost of implementing a BI solution, start by making sure that the platform in question does everything you need and has enough capacity for all of your data – or, if not, work out how much you’ll need to spend on additional technical infrastructure, tools, or the consulting and IT manpower required to tailor a version that does work for you.

Try to estimate how much you’ll need to commit in terms of internal budget and resources, whether you’ll need to pay to take on new staff, and the opportunity costs of taking existing personnel off revenue-generating projects to ensure smooth deployment and daily use.

Then, once you’ve tallied up all the hidden costs of rolling out and operating a workable solution, choose the option that offers the best value for the price tag.

Sounds sensible, right? Well, yes – in 99% of cases, this formula works just fine.

But BI is different. To work out the real cost of using your BI platform, you have to take a final, vital step: calculate the value that a BI solution gives you – its cost of new analytics.


Considering the Cost of New Analytics

Let’s look at the gym membership example again. Imagine that you spot in the small print that one of the gyms is only open on weekends, whereas the other one is open every day.

Until this point, you’d thought Gym A offered the better deal. You’d calculated the total cost of ownership at $820 per year, while Gym B worked out at $1,200 per year.

But if you can only visit Gym A a maximum of twice a week, even if you take every available opportunity to go, you’re still paying a significant amount of money per session. The gym is only open 104 days of the year, so the absolute minimum you pay per workout will be:

$820 / 104 ≈ $7.88

Gym B, on the other hand, might be more expensive, but it’s open seven days a week. In fact, it’s only closed on two days out of the whole year. If you took advantage of this and went there on every possible day, the minimum you’d pay per workout would be:

$1,200 / 363 ≈ $3.31

Suddenly, Gym B looks like a much better option, right?

This is precisely how you need to approach your value assessment of a BI platform, too.

That’s because BI platforms vary wildly in the time it takes you to submit a new data query, generate results and present them in a format that makes sense – for example, an easy-to-process dashboard showing progress on your KPIs.

On first look, it might seem that the annual total cost of ownership of one product is much higher than another. Once you factor in the turnaround time for a data analysis project, though, and divide that number by the maximum number of data projects you can process in a year, the picture can quickly start to look very different indeed.

That’s because BI tools aren’t best measured by total cost of ownership per annum, but by the cost of running each individual analysis.

How to Calculate the Cost of New Analytics

In short, it’s putting a concrete number on the actual value you and your team are going to be getting from a BI solution.

Since we’ve already established that upfront cost is just one aspect of a bigger equation, businesses are now using a newer, more accurate way of measuring the total cost of ownership of a BI solution – one that incorporates the full value potential of BI (how much you and your team will actually benefit from it) by calculating the cost of new analytics.

Ask yourself: what is the cost of a new analytics report for my team? This is precisely how you need to approach your value assessment of a BI platform, because the cost of new analytics captures how quickly your team can churn out (and benefit from) new analytics and reports – in other words, how much value you are getting for how much investment in your BI tool.
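As a rough sketch of the idea (the guide linked below walks through a fuller formula), the core calculation is simply annual total cost of ownership divided by the number of analyses you can realistically produce in a year. The figures here are invented.

```r
# Cost per analysis = annual TCO / analyses produced per year
cost_per_analysis <- function(annual_tco, analyses_per_year) {
  annual_tco / analyses_per_year
}

# Platform A: cheaper per year, but slow turnaround limits output
cost_per_analysis(annual_tco = 80000,  analyses_per_year = 120)   # ~$667 per analysis

# Platform B: pricier per year, but fast turnaround allows far more analyses
cost_per_analysis(annual_tco = 120000, analyses_per_year = 1000)  # $120 per analysis
```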

A Formula for Calculating BI’s Total Cost of Ownership

By incorporating the notion of speed, you quantify how agile a BI tool is – how quickly it lets you move from question to answer.

Get our guide on calculating the total cost of ownership of a BI tool to see an exact formula for quantifying the cost of new analytics and taking all costs – from technical infrastructure to manpower – into account before you buy a business intelligence solution.


Postgres vs. MongoDB for Storing JSON Data – Which Should You Choose?

In the fast-moving world of unstructured data, does it make more sense to use a database management system (DBMS) built from the start to handle the widely accepted JSON data format? Or can an SQL database that now includes JSON functionality be a better choice? Postgres, with its SQL roots, started offering NoSQL functionality early on with its key-value store, hstore, introduced in 2006. JSON document storage and management for Postgres arrived somewhat later, after MongoDB began life in 2009 as a native JSON document DBMS. Since then, MongoDB and Postgres have both been enhancing their JSON storage capabilities.

What is MongoDB? What is PostgreSQL?

The question of MongoDB vs PostgreSQL is not a new one. Let’s take a look at the most basic differences between the two commonly used databases.

MongoDB is an open source database designed for scalability and agility. It uses dynamic schemas, so you can create records without defining the structure first, and it supports hierarchical document data.


On the other hand, PostgreSQL is an open source relational database with a focus on standards compliance and extensibility. PostgreSQL uses both dynamic and static schemas and, unlike MongoDB, supports relational data and normalized form storage.

The Rise and Rise of JSON and JSONB

To better understand the similarities and differences between the two database systems, let’s quickly recap JavaScript Object Notation, or JSON for short. Unstructured and human-readable, the JSON data format is something of a milestone on the road to user-friendly computing. It offers the ability to dump data into a database as it comes. Fields in a data record can be nested, and different fields can be added to individual data records as required. Preferred now by many over XML, the flexible JSON data format is used by a number of NoSQL data stores. Because basic JSON lacks indexing, the JSONB data format was created. It stores data in a binary format instead of a simple JSON blob. JSONB data input is a little slower, but processing is then significantly faster because the data does not need to be reparsed.


Deliberate Constraints and Collateral Limitations

Both Postgres and MongoDB offer JSON and JSONB (MongoDB calls its JSONB “BSON”) data storage functionality. There are, however, differences:

  • The BSON format used by MongoDB is limited to a maximum of 64 bits for representing an integer or floating point number, whereas the JSONB format used by Postgres does not have this limit.
  • Postgres provides data constraint and validation functions to help ensure that JSON documents are more meaningful: for example, preventing attempts to store alphabetical characters where numerical values are expected.
  • MongoDB offers automatic database sharding for easy horizontal scaling of JSON data storage. Scaling of Postgres installations has often been vertical. Horizontal scaling of Postgres is also possible, but tends to be more involved or use an additional third party solution.
  • MongoDB also offers the possibility of increasing write throughput by deferring writing to disk. The tradeoff is potential loss of data, but this may suit users who have less need to persist their data.

In offering both SQL as well as JSON storage, Postgres lets users keep their options open. Data can be routed to a JSON column for possible data modeling afterwards, or to a table using an SQL schema, all within the same Postgres database.
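To illustrate the “options open” point, here is a hedged sketch of a Postgres table that mixes an ordinary relational column with a JSONB column, queried from R via the DBI and RPostgres packages. The connection details, table, and field names are placeholders.

```r
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "appdb")  # hypothetical database

# A relational column and a JSONB column living side by side
dbExecute(con, "
  CREATE TABLE IF NOT EXISTS events (
    id         serial PRIMARY KEY,
    user_email text,
    payload    jsonb
  )")

dbExecute(con, "INSERT INTO events (user_email, payload)
                VALUES ('a@example.com', '{\"action\": \"login\", \"device\": \"mobile\"}')")

# JSONB operators work alongside normal SQL: ->> extracts a field, @> tests containment
dbGetQuery(con, "SELECT user_email, payload ->> 'device' AS device
                 FROM events
                 WHERE payload @> '{\"action\": \"login\"}'")

dbDisconnect(con)
```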

Native JSON Data Stores do not always have the Best Performance

One of the advantages frequently cited for NoSQL database management systems is their performance. Operating with simpler data structures than those of SQL databases, NoSQL database systems have often shown faster speeds of storage and retrieval. While they may lack the ACID (atomicity, consistency, isolation and durability) properties needed for financial transactions, for example, they may offer advantages in handling larger volumes of unstructured data more rapidly.

However, NoSQL fans got a shock when performance ratings from EnterpriseDB (enterprisedb.com) in 2014 showed Postgres performance to be significantly better than that of MongoDB. The tests were based on selecting, loading, and inserting complex document data to the tune of 50 million records. Postgres was about twice as fast in data ingestion, two and a half times as fast in data selection, and three times as fast in data inserts. Postgres also consumed 25% less disk space.

Still, performance ratings are made to be beaten. With the introduction of its WiredTiger database engine, MongoDB 3.0 offered improvements in write speeds (between 7 and 10 times as fast), together with data compression of 50% to cut disk space.

Use Cases and Factors Affecting the Choice of Postgres or MongoDB

The question is – where does this leave us in terms of choosing either Postgres or MongoDB for JSON data storage? The answer is that any choice will depend on your goals and your circumstances.

  • Focus on the application. MongoDB minimizes the number of database management commands needed in application development. This can fit well with rapid prototyping, as well as queries and commands built on demand by the application. On the other hand, the application itself must insert meaningful data. Software maintenance may require more effort afterwards as well.
  • Structure needed later. Postgres offers similar broad powers for unstructured data, but also lets developers migrate to a mixture of unstructured and structured data later. If ACID compliance is likely to be a future requirement as data collected or generated becomes more valuable to its owners, Postgres may be a more suitable choice from the beginning for JSON data storage.
  • Static JSON data. For relatively static JSON data and active data naturally structured for SQL storage, Postgres offers the advantage of efficient JSONB representation and indexing capabilities (although ODBC and BI integration enable running SQL queries in MongoDB reporting as well).
  • JSON data modification. On the other hand, for JSON data that will be modified within the data store, MongoDB, engineered from the start around JSON documents, offers possibilities for updating individual fields that Postgres does not. While Postgres is efficient in the storage and retrieval of JSON documents, JSON field modification in Postgres requires the extraction of the entire JSON document concerned, modification of the field concerned, and the subsequent rewriting of the document back into the data store.
  • Dynamic queries. Typical uses of MongoDB focus on frequently changing data of different types, without any complex transactions between objects. It is suited to dynamic queries of frequently written or read data, offering good performance for the storage of JSON documents with a large number of fields with ad hoc queries on a small subset of those fields.
  • Automatic sharding. The automatic sharding functionality of MongoDB may fit well with IT environments using multiple instances of standardized, commodity hardware (converged architectures).
  • Costs and resources. The availability and costs of hosting platforms for Postgres and MongoDB may be part of the decision criteria, as well as the ease or expense of hiring developers with the corresponding skills. Resources of Postgres knowledge and talent have been built up over time, encouraged among other things by the inclusion of Postgres at no extra cost in many Linux operating systems. On the other hand, since its introduction, MongoDB has already achieved the status of fifth most popular database technology out of all the technologies available (and not just NoSQL), suggesting that it too benefits from a reasonable pool of talent.

Conclusion

Emotions sometimes run high, even when it comes to purely technical choices. Data-driven decisions are not always easy to make when new releases and new performance ratings continually upset previous evaluations. In addition, the use cases above show that there is no automatic winner. If you have already made a choice between Postgres and MongoDB, sunk effort and acquired expertise may make a change undesirable. However, the experiences of some business users related on the net show that sometimes such choices are reversed even after a significant period of deployment and operation.

In the future, a choice between Postgres and MongoDB for JSON storage may depend on yet other factors. When commenting on the release of JSONB functionality for Postgres, Robert Haas, the Chief Architect at EnterpriseDB, said, “The implication for NoSQL solutions is that innovation around the format in which you store your data is not enough; you’ve got to come up with truly novel capabilities for working with the data, which is much harder.”


How to Streamline Query Times to Handle Billions of Records

Here at Sisense, we love a challenge, so when a client comes to us and tells us they need to find a way to run queries on billions of records without this slowing them down, our ears perk up and we leap at the chance to find a solution.

In fact, that’s how we recently found ourselves testing a billion transactional records and three million dimensional records – totaling a whopping 500GB of data – with 100 concurrent users and up to 38 concurrent queries, with a total setup time of just two hours… and an average query time of 0.1 seconds!

But wait! I’m getting ahead of myself. Let’s start by talking through some of the issues that affect how fast you can query data.

How Are You Storing Your Data?

Let’s start with the obvious: data warehousing.

Typically, working with masses of data means you also need extensive data warehousing in place to handle it, alongside Extract-Transform-Load tools that upload data from the original source on a regular basis (Extract), adjust formats and resolve conflicts to make the datasets compatible (Transform), and then deliver all of this data into the analytical repository where it’s ready for you to run queries, calculations, and trend analysis (Load).

This creates a single version of the truth – a source of data that brings together all your disparate pieces into one place. While this is great, there are also some drawbacks to data warehousing.

First of all, data warehouses are highly structured, and the row-and-column schema can be overly restrictive for some forms of data. Also, the sheer volume of data quickly overloads most systems, grinding to a halt if you run queries that attempt to tap into the entire data pool.

Then, there are data marts.

To help tackle the issues that come with working with huge data sets, many IT teams deploy data marts alongside their databases. These essentially siphon off access to a smaller chunk of the data – and then you select which data marts each department or user has access to. The outcome of this is that you put less pressure on your hardware, as your computer is tapping into smaller pools of data, but the flipside is that you have vastly reduced access to the organization’s total data assets in the first place.

At the other end of the scale, you have data lakes.

These are used to store massive amounts of unstructured data, helping to bypass some of the issues that come with using conventional data warehouses. They also make sandboxing easier, allowing you to try out different data models and transformations before you settle on a final schema for your data warehouse – to avoid getting trapped into something that doesn’t work for you.

The trouble with data lakes is that, while they offer formidable capacity for storing data, you need all kinds of tools in place to interface between the data lake and your data warehouse, or with your end data analytics tool if you want to skip warehousing on top. Systems like this that use data lakes aren’t exactly agile, so your IT team will need to be pretty heavily involved in order to extract the insights you want.

Alternatively, you might deal with unstructured data using an unconventional data storage option.

For example, you might use a NoSQL database like MongoDB.

This gives you tons of freedom in terms of the kind of data you add and store, and the way that you choose to store it. MongoDB also makes use of sharding techniques to avoid piling the pressure on your IT infrastructure, allowing for (pretty much) infinite scaling.

The downside, of course, is that the thing that makes this so great – the unstructured, NoSQL architecture – also makes it tricky to feed this data straight into a reporting tool or analytics platform. You need a way to clean up and reconcile the data first.

What About Tools Used for Analysis?

Dynamic DBMS tools like PostgreSQL can open doors.

PostgreSQL is an open-source relational database that works well for analytics and reporting, allowing you to work with an enormous variety of data types – including native data types that give you much more freedom as you come to build and manipulate a BI solution, and “array” types, which help you to aggregate query results rapidly on an ad hoc basis.
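As a quick, hedged illustration of that ad hoc aggregation, the sketch below uses Postgres’s array_agg with a hypothetical orders table, queried from R; the connection details and table are placeholders.

```r
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "appdb")  # placeholder connection

# Collapse each customer's product ids into a single array column, on the fly
dbGetQuery(con, "
  SELECT customer_id, array_agg(product_id) AS products
  FROM   orders
  GROUP  BY customer_id")

dbDisconnect(con)
```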

Introducing PostgreSQL into the mix can be massively helpful in bringing together your disparate strands – but again, it can’t do everything. It can’t help much with qualitative data, and as a general-purpose relational database that wasn’t built primarily for Big Data scale, it can buckle under huge volumes of information.

You can also use R for high end predictive analytics.

Lastly, once you have a solid BI system in place, you can add another layer of awesomeness by using R to build working models for statistical analysis, quickly and easily. R is incredibly versatile, and allows you to move away from static reporting by programming a system for analysis that you can adapt and improve as you go.

The thing is, though, this is an add-on: it doesn’t replace your current BI or data analytics system. R is an excellent programming language that can help you generate predictive analytics fast, but you need to have a rock-solid system in place for handling and preparing data in the first place.
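To give a flavor of what that R layer might look like, here is a small sketch of a logistic-regression churn model on invented data; in practice the inputs would come from the prepared data in your BI system.

```r
# Invented usage features and churn outcomes
set.seed(3)
dat <- data.frame(
  weekly_sessions = rpois(300, 6),
  support_tickets = rpois(300, 2)
)
dat$churned <- rbinom(300, 1, plogis(-1 - 0.3 * dat$weekly_sessions + 0.5 * dat$support_tickets))

# Fit the predictive model and inspect it
model <- glm(churned ~ weekly_sessions + support_tickets, data = dat, family = binomial)
summary(model)

# Score a new account as fresh data arrives
predict(model, newdata = data.frame(weekly_sessions = 2, support_tickets = 5), type = "response")
```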

How to Streamline Everything

I know what you’re thinking: I said I was going to explain how to streamline your data queries to help you generate results faster, but so far, all I’ve done is dangle some potential solutions and then show you how they fall short!

That’s because I haven’t revealed the secret sauce that binds all these pieces together in perfect harmony.

As you can see, each of the tools we’ve discussed is used to fix one problem in the storage, flow, and use of data within your organization, but none of them helps with the big picture.

That’s where Sisense’s Elasticube comes in.

The Elasticube allows you to store data or drag it in directly from your existing stores at lightning speed, giving users unfettered access to their entire pool of data, whatever format it’s kept in (unless you choose to stagger permissions). Thanks to clever use of In-Chip Processing and a Columnar Database structure, you tap into only the data you need for the query, without restricting yourself permanently, as you would with a data mart.

You can then reconcile and harmonize this data with minimal hassle to treat all these strands as a single data source for the purpose of analysis and reporting.

Still within the Elasticube, you can map and manipulate these data sources to build your own dashboards and run your own queries at incredible speed.

Plus, using our range of custom-built connectors, you can link your Sisense Elasticube directly to MongoDB, PostgreSQL, and other DBMS tools, and you can integrate Sisense with R for even more in-depth predictive analytics.

Where the Magic Happens

So that’s the big secret. Using the Sisense Elasticube, I was able to set up a system in 120 minutes that could run concurrent queries on data representing one billion online purchases, from three million origins/ destinations, with an average query time of 0.1 seconds and a maximum query time of just 3 seconds.

Pretty impressive, huh? Here’s what it looked like:

[Screenshot: query performance test results]

And here’s an example dashboard that we used to display the results in real time:

[Screenshot: example dashboard displaying results in real time]

How’s that for streamlined?

Want to see exactly how I built this super-streamlined query model for yourself? Click here for the detailed tutorial.


Infographic: Why You Need Embedded Analytics

These days, customers demand a lot from their service providers. And why shouldn’t they?

But in a world with so many requests coming from your customers, business intelligence shouldn’t fall to the wayside. Embedding analytics not only expands your offering and creates happy users, it also offers a wide variety of benefits internally.

Let’s have a look at how embedded analytics can benefit the C-Suite, Product Development, and R&D teams.

[Infographic: Why You Need Embedded Analytics]

Want to learn even more about how embedded analytics can benefit you? Check out our whitepapers, tailor made for the C-Suite, Head of Product, and R&D Team.
