Tag Archives: data

Kili Technology unveils data annotation platform to improve AI, raises $7 million

January 26, 2021   Big Data

Poor or uncategorized raw data can be a major impediment for enterprises that want to build high-quality artificial intelligence that has a meaningful impact on their business. Organizing unstructured data such as images and audio can present a particularly daunting obstacle in this regard.

Today, Paris-based Kili Technology unveiled its service that allows enterprises to annotate raw data such as video, drone aerial images, contracts, and emails. The company’s collaborative platform enables employees to make the data labeling process more efficient.

The company also said it had raised its first outside funding in a round led by Serena Capital and e.ventures, which invested along with business angels such as Datadog CEO Olivier Pomel, Algolia CEO Nicolas Dessaigne, and PeopleDoc founders Stanislas de Bentzmann and Gus Robertson. After a fast start, the company has ambitious plans to expand its international reach.

“The mission is super simple,” said Kili CEO and cofounder François-Xavier Leduc. “To build AI, you need three things. You need the computing power that you can buy easily on Amazon, you need an algorithm that is available as open source, and you need training sets. We are making the bridge between the raw data and what is required to build AI at scale for companies. Our mission is to help our customers turn this raw data into training data so that they can scale AI applications on their internal challenges to solve their issues.”

The company is part of a fast-moving and competitive data annotation sector. Dataloop last year raised $16 million for its data annotation tools. SuperAnnotate raised $3 million for its AI techniques that speed up data labeling. And earlier last year, IBM released annotation tools that tap AI to label images.

All of these companies have identified the same bottleneck in developing high-quality AI: getting data into a form that can readily be processed to train models. According to Kili, 29,000 gigabytes of unstructured data are published every second, but much of it remains useless when it comes to training AI.

Founded in 2018 by Leduc and CTO Édouard d’Archimbaud, Kili offers a stable of experts to complement a company’s internal teams and help accelerate the annotation process.

Kili builds on work d’Archimbaud did while at BNP Paribas, where he ran the bank’s artificial intelligence lab. His team was trying to build models for processing unstructured data and ended up creating their own tools for data annotation.

Kili’s system, as d’Archimbaud explained, relies on a basic concept, similar to tagging people in a photo on Facebook. When users click on an image, a little box pops up so they can type in a name and attach a label to the image. Kili uses AI to allow enterprises to take this process to an industrialized scale to create higher-quality datasets.

“Before, people were thinking that AI was about algorithms, and having the most state-of-the-art algorithm,” d’Archimbaud said. “But it’s not the case anymore. Today, AI is about having the best data to train models.”

Kili’s cofounders bootstrapped the company for its first two years. But Kili has already attracted large customers in Europe, China, and the U.S. across a variety of industries.

As Kili gained more traction, the cofounders decided to raise their first outside round of funding to accelerate sales and marketing. But they also intentionally sought out business angels who worked in other data-related startups to help provide practical guidance on building a global company and seizing a growing opportunity.

“Two years ago, the data annotations market was estimated to be $2 billion in four years,” Leduc said. “And now it’s estimated to be $4 billion. It’s going to go fast, and it will definitely be huge. And it’s a new category. So there is an opportunity to be a worldwide leader. Today, we are positioned to be one of them.”


Big Data – VentureBeat


Center for Applied Data Ethics suggests treating AI like a bureaucracy

January 22, 2021   Big Data

A recent paper from the Center for Applied Data Ethics (CADE) at the University of San Francisco urges AI practitioners to adopt terms from anthropology when reviewing the performance of large machine learning models. The research suggests using this terminology to interrogate and analyze bureaucracy, states, and power structures in order to critically assess the performance of large machine learning models with the potential to harm people.

“This paper centers power as one of the factors designers need to identify and struggle with, alongside the ongoing conversations about biases in data and code, to understand why algorithmic systems tend to become inaccurate, absurd, harmful, and oppressive. This paper frames the massive algorithmic systems that harm marginalized groups as functionally similar to massive, sprawling administrative states that James Scott describes in Seeing Like a State,” the author wrote.

The paper was authored by CADE fellow Ali Alkhatib, with guidance from director Rachel Thomas and CADE fellows Nana Young and Razvan Amironesei.

The researchers particularly look to the work of James Scott, who has examined hubris in administrative planning and sociotechnical systems. In Europe in the 1800s, for example, timber industry companies began using abridged maps and a field called “scientific forestry” to carry out monoculture planting in grids. While the practice resulted in higher initial yields in some cases, productivity dropped sharply in the second generation, underlining the validity of scientific principles favoring diversity. Like those abridged maps, Alkhatib argues, algorithms can both summarize and transform the world and are an expression of the difference between people’s lived experiences and what bureaucracies see or fail to see.

The paper, titled “To Live in Their Utopia: Why Algorithmic Systems Create Absurd Outcomes,” was recently accepted for publication by the ACM Conference on Human Factors in Computing Systems (CHI), which will be held in May.

Recalling Scott’s analysis of states, Alkhatib warns against harms that can result from unhampered AI, including the administrative and computational reordering of society, a weakened civil society, and the rise of an authoritarian state. Alkhatib notes that such algorithms can misread and punish marginalized groups whose experiences do not fit within the confines of data considered to train a model.

People privileged enough to be considered the default by data scientists and who are not directly impacted by algorithmic bias and other harms may see the underrepresentation of race or gender as inconsequential. Data Feminism authors Catherine D’Ignazio and Lauren Klein describe this as “privilege hazard.” As Alkhatib put it, “other people have to recognize that race, gender, their experience of disability, or other dimensions of their lives inextricably affect how they experience the world.”

He also cautions against uncritically accepting AI’s promise of a better world.

“AIs cause so much harm because they exhort us to live in their utopia,” the paper reads. “Framing AI as creating and imposing its own utopia against which people are judged is deliberately suggestive. The intention is to square us as designers and participants in systems against the reality that the world that computer scientists have captured in data is one that surveils, scrutinizes, and excludes the very groups that it most badly misreads. It squares us against the fact that the people we subject these systems to repeatedly endure abuse, harassment, and real violence precisely because they fall outside the paradigmatic model that the state — and now the algorithm — has constructed to describe the world.”

At the same time, Alkhatib warns people not to see AI-driven power shifts as inevitable.

“We can and must more carefully reckon with the parts we play in empowering algorithmic systems to create their own models of the world, in allowing those systems to run roughshod over the people they harm, and in excluding and limiting interrogation of the systems that we participate in building.”

Potential solutions the paper offers include undermining oppressive technologies and following the guidance of Stanford AI Lab researcher Pratyusha Kalluri, who advises asking whether AI shifts power, rather than whether it meets a chosen numeric definition of fair or good. Alkhatib also stresses the importance of individual resistance and refusal to participate in unjust systems to deny them power.

Other recent solutions include a culture change in computer vision and NLP, reduction in scale, and investments to reduce dependence on large datasets that make it virtually impossible to know what data is being used to train deep learning models. Failure to do so, researchers argue, will leave a small group of elite companies to create massive AI models such as OpenAI’s GPT-3 and the trillion-parameter language model Google introduced earlier this month.

The paper’s cross-disciplinary approach is also in line with a diverse body of work AI researchers have produced within the past year. Last month, researchers released the first details of OcéanIA, which treats a scientific project for identifying phytoplankton species as a challenge for machine learning, oceanography, and science. Other researchers have advised a multidisciplinary approach to advancing the fields of deep reinforcement learning and NLP bias assessment.

We’ve also seen analysis of AI that teams sociology and critical race theory, as well as anticolonial AI, which calls for recognizing the historical context associated with colonialism in order to understand which practices to avoid when building AI systems. And VentureBeat has written extensively about the fact that AI ethics is all about power.

Last year, a cohort of well-known members of the algorithmic bias research community created an internal algorithm-auditing framework to close AI accountability gaps within organizations. That work asks organizations to draw lessons from the aerospace, finance, and medical device industries. Coauthors of the paper include Margaret Mitchell and Timnit Gebru, who used to lead the Google AI ethics team together. Since then, Google has fired Gebru and, according to a Google spokesperson, opened an investigation into Mitchell.

With control of the presidency and both houses of Congress in the U.S., Democrats could address a range of tech policy issues in the coming years, from laws regulating the use of facial recognition by businesses, governments, and law enforcement to antitrust actions to rein in Big Tech. However, a 50-50 Senate means Democrats may be forced to consider bipartisan or moderate positions in order to pass legislation.

The Biden administration emphasized support for diversity and distaste for algorithmic bias in a televised ceremony introducing the science and technology team on January 16. Vice President Kamala Harris has also spoken passionately against algorithmic bias and automated discrimination. In the first hours of his administration, President Biden signed an executive order to advance racial equality that instructs the White House Office of Science and Technology Policy (OSTP) to participate in a newly formed working group tasked with disaggregating government data. This initiative is based in part on concerns that an inability to analyze such data impedes efforts to advance equity.


Big Data – VentureBeat


Improving Dynamics 365 Data Integrations with Alternate Keys

January 22, 2021   CRM News and Info

Dynamics 365 data integrations are something we’re quite familiar with – whether it’s a custom integration tailored for one of our customers or our popular pre-built solution, CRM For Distribution. Microsoft added alternate key support to Dynamics 365, which allows us to greatly improve integration speed.

The challenge

One challenge we’ve had in the past is speed – in other words, how much time it takes to actually get your data from another system into Dynamics 365. If all that’s needed is to Create records, then speed usually isn’t an issue. The challenge comes into play when data already exists in Dynamics 365 or you need to backfill data. This could sometimes take several days to several weeks! One of the main reasons is that each record required 2 steps:

  • First, query Dynamics 365 to determine if the record you want to update already exists
  • If it doesn’t exist, then Create it. If it does exist, then Update it.

That might not seem like much, but if you can imagine trying to backfill thousands or millions of rows of data, this can take a huge toll on performance.

The solution

A few years ago, Microsoft began allowing Dynamics 365 developers to use Upsert statements.  Upserts can dramatically improve the speed of your data processing. Instead of making 2 separate calls to Dynamics 365, an Upsert allows for one call that can determine if a record needs to be Updated or Created based on a unique key. The key defines the unique identifier for the record (using one field or a combo of fields).

With the introduction of alternate keys in Dynamics 365 not too long ago, using Upsert statements became incredibly easy. Alternate keys provide developers a way to specify their own unique identifier for an entity based on the source data. Alternate keys also provide a safety net that prevents duplicate records, because uniqueness is enforced at the database level.
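
To make the upsert-by-alternate-key pattern concrete, here is a minimal sketch (not from the original post) of how an integration could upsert an account through the Dataverse Web API. The org URL, the alternate key column new_erpid, and the token handling are illustrative assumptions.

import requests

# Illustrative assumptions: org URL, an alternate key defined on the custom
# column new_erpid, and a bearer token acquired elsewhere (e.g. via MSAL).
ORG_URL = "https://yourorg.crm.dynamics.com"
ACCESS_TOKEN = "<access token>"

def upsert_account(erp_id, fields):
    """Create or update an account in a single call by addressing the record
    through its alternate key instead of querying for its GUID first."""
    url = f"{ORG_URL}/api/data/v9.2/accounts(new_erpid='{erp_id}')"
    headers = {
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
    }
    # A PATCH against an alternate-key address is an upsert: the record is
    # updated if the key matches an existing row, created otherwise.
    response = requests.patch(url, json=fields, headers=headers)
    response.raise_for_status()

upsert_account("ERP-0001", {"name": "Contoso Ltd", "telephone1": "555-0100"})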

Developers can create an alternate key programmatically or via the Power Platform admin center. You can define up to 5 alternate keys per entity, which is usually sufficient. There are a few other constraints as well, but these likely won’t hold you back from using them:

  • Only field types of decimal, whole number, single line of text, datetime, lookup and picklist can be included in the key
  • All fields included in the key must be allowed to be populated via either a create or update of the entity records
  • Field-level security needs to be turned off for the included fields (which is the default)
  • Fields can’t be logical or inherited. For example, address fields on Account and Contact entities
  • The key size can be a maximum of 900 bytes and 16 columns
  • Certain special characters (/,<,>,*,%,&,:,\) in key field values will cause errors when performing a Get or Patch

The challenge with Alternate Keys

You may run into a challenge when creating Alternate Keys in an existing Dynamics 365 organization if it already contains duplicate records as defined by the Key. In that case, you will notice that the Key status is set to Inactive. In order to Activate it, you must eliminate the duplicate records, either by deleting the duplicates or removing the values on one of the corresponding fields for each duplicate record.

You can find duplicates in Dynamics 365 a couple of different ways:

  • Create and publish a duplicate detection rule, then run a duplicate detection job on all records for that entity. This process works well for the most part, but can be slow if you have a ton of data.
  • Use the Dynamics 365 SDK or third-party tools, which can help you identify duplicates within minutes.

Note: While Dynamics 365 allows you to merge records, this does not remove the duplicate record from the database, so the alternate key will remain Inactive. You can give your end users control over which duplicate records are removed by instructing them to merge the records or utilize a custom merge process. Then, use an automated process to delete the Inactive merged records.

We’ve been implementing alternate keys with Upserts in our data integration solutions for our customers and we have seen an incredible improvement in speed! It allows us to push more data in a timely manner and the keys help enforce record uniqueness at the database level which in turn makes our customers happy.

If you want to learn more about how you can improve Dynamics 365 data integrations with alternate keys, contact Beringer today!

Beringer Technology Group is a leading Microsoft Gold Certified Partner specializing in Microsoft Dynamics 365 and CRM for Distribution. We also provide expert Managed IT Services, Backup and Disaster Recovery, Cloud Based Computing and Unified Communication Solutions.


CRM Software Blog | Dynamics 365


TIBCO4Good and She Loves Data Offer Free Data Skills Workshops During a Time of Vulnerability

January 20, 2021   TIBCO Spotfire


New to designing meaningful, interactive data visualizations and dashboards? Want to learn how to better understand and prepare your data? Then TIBCO4Good™’s recently launched “Back on Your Feet” program is for you. Built in partnership with Singapore Management University (SMU) and She Loves Data, this global program can help you up your data visualization and dashboard building skills and stay competitive in today’s professional job market. 

Back on Your Feet Program: Empowering Individuals Impacted by the Pandemic  

The “Back on Your Feet” program was designed specifically to help educate individuals affected by the global COVID-19 pandemic. It offers a series of workshops and training sessions to empower individuals who may have lost their jobs due to the pandemic and want to increase their competitiveness in an increasingly tough marketplace. 

SMU, a premier university in Asia, internationally recognized for its research and distinguished teaching, has developed the course content, including discussions on how to design, develop, and deploy analytic dashboards. “The pandemic has upended the lives and livelihoods of many around the world. As an engaged city university with a strong suite of expertise in computing research and education, the School of Computing and Information Systems is keen to contribute to this meaningful program by TIBCO through our partnership with She Loves Data to upskill and re-skill individuals impacted by the economic malaise, thereby creating a positive impact on the community in Singapore and beyond,” said Professor Pang Hwee Hwa, Dean of SMU School of Computing and Information Systems.

From left to right: Nina Conseil, Global Certification Programs at She Loves Data; Bernard Tay and Justin Choy, Teaching Assistants; and Rafael J. Barros, Senior Lecturer at the School of Computing and Information Systems, Singapore Management University

Program Benefits and Session Structure

Upon completion of the workshops, participants are certified as dashboard and forecasting specialists and receive a free TIBCO Spotfire® license for a year. The program began in APJ in November and December 2020 and will extend to the U.S. in early 2021, followed closely by EMEA.

Here’s a quick overview of the program structure:

  • Module 1: Dashboard exploration as an end user; Marking and filtering concepts 
  • Module 2: Asking the right questions by understanding what data is important; Foundations of charts and basic visualizations
  • Module 3: Basics in data cleaning, advanced visualizations, and connecting to multiple data sources
  • Module 4: Designing the dashboard for end-user consumption 

Your Chance to Up Your Dashboarding Skills

Join us for the second edition of a series of four training modules designed to help you become a dashboard specialist. In the first session, 230 attendees participated actively in the program. Register now for the next event on February 20th!
And to learn more about other TIBCO4Good initiatives helping communities around the world, head to www.tibco.com/tibco4good.


The TIBCO Blog


“Without Data, Nothing” — Building Apps That Last With Data

January 20, 2021   Sisense

Every company is becoming a data company. Data-Powered Apps delves into how product teams are infusing insights into applications and services to build products that will delight users and stand the test of time.

In philosophy, a “sine qua non” is something without which the phenomenon under consideration cannot exist. For modern apps, that “something” is data and analytics. No matter what a company does, a brilliant app concept alone is insufficient. You have to deftly integrate data and analytics into your product to succeed. 

Whatever your audience, your users are getting more and more used to seeing data and analytics infused throughout apps, products, and services of all kinds. We’ll dig into ways companies can use data and analytics to succeed in the modern app marketplace and look at some now-extinct players that might have thrived with the right data in their platforms.


Sentiment analysis in customer messages

Yik Yak was an anonymous chat app that looked promising initially but failed because of problems that could have been resolved with data and analytics. What made Yik Yak popular was the exotic feature that enabled members to chat anonymously with others in the same geographic vicinity. Unfortunately, that feature was also the cause of the app’s demise: Yik Yak was capitalized as a startup with about $75 million and grew to a value of $400 million before uncontrolled cyberbullying ruined its reputation. After Yik Yak’s name was spoiled as a result of abusive chat, the company could not sell ads on its platform, meaning it could no longer monetize its innovative concept.

How could Yik Yak have used data and analytics to avert disaster? Luma Health showed how message data can be analyzed for mood and meaning by using AI/ML methods on a data lake of chat messages. Yik Yak could have tagged message content with the originating IP address and then quickly blocked messages from that IP after abusive language was detected. This hindsight can now become foresight for other enterprising companies.
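
As a rough sketch of the moderation loop described above (this is not code from Yik Yak or Luma Health), the snippet below scores incoming messages with NLTK's off-the-shelf VADER analyzer as a stand-in for a production AI/ML model and blocks the originating IP once a message crosses an abuse threshold. The threshold and the in-memory blocklist are assumptions.

# Requires: pip install nltk (the VADER lexicon is downloaded on first run).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

ABUSE_THRESHOLD = -0.6  # assumed cutoff for a strongly negative compound score
blocked_ips = set()

def handle_message(ip, text):
    """Return True if the message is accepted, False if it is dropped.
    VADER stands in for the AI/ML moderation models the article alludes to."""
    if ip in blocked_ips:
        return False
    score = analyzer.polarity_scores(text)["compound"]
    if score <= ABUSE_THRESHOLD:
        blocked_ips.add(ip)  # tag the originating IP and block further posts
        return False
    return True

print(handle_message("203.0.113.7", "Great show tonight, loved it!"))
print(handle_message("203.0.113.9", "You are worthless, I hate you."))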

The benefits of leveraging collective data

Color Labs was another successful startup whose failure could have been avoided with the right analytics. Although the company’s investment in AI and convolutional neural networks (CNNs) may have been significant, in retrospect, an innovative use of these technologies on the right data could have given it a better shot at survival. The basic service model behind Color Labs’ app was that users would share images and then see images from other users who were posting pictures in the same vicinity (a media-based counterpart to Yik Yak’s concept). The app failed in part for the same reason new dating apps often fail: needing to go live with a million users on day one. Color Labs’ users joined up only to find little or nothing posted in their vicinity, giving them little incentive to post and share and leaving them feeling alone in an empty room. The company ultimately folded.

How could data insights have solved this problem for Color Labs? Leveraging the right collective datasets with CNNs could have identified images tagged to a geographical place already freely shared on the internet. Those images could be used to populate the app and get the user engagement ball rolling. Using CNNs in that way is expensive but justifiable if it means keeping the company afloat long enough to reach profitability. New dating app startups actually use a similar trick — purchasing a database of names and pictures and then filling in the blanks to create an artificial set of matches to temporarily satisfy new subscribers’ cravings for instant gratification (one such database is marketed as “50,000 profiles.”) The gamble is that new subscribers will remain hopeful long enough for a number of subscribers to join up and validate their existence. Color Labs could have benefited from existing media with a much lower cost in terms of ethical compromise as well.

Forecasting and modeling business costs

Shyp was an ingenious service app that failed for a number of reasons, but one of those reasons could have been fixed easily with data insights. The basic innovation of Shyp was to package an item for you and then ship it using a standard service like FedEx. The company’s shortcut, which turned out to be a business model error, was to charge a fixed rate of $5 for packaging. Whether the item to ship was a mountain bike or a keychain, the flat rate of $5 for packaging was a hole in Shyp’s hull, one that sank the company in short order.

Shyp’s mistake could have been resolved cleverly by using the wealth of existing data about object volume, weight, fragility, temperature sensitivity, and other factors to create an intelligent packaging price calculator. Such a database could even have included local variations in the price of packing materials such as foam peanuts, tape, boxes, and bubble wrap, and have presented the calculation at time of payment. Flat fees are attractive and can be used as loss leaders when trying to gather new customers or differentiate oneself in a crowded market, but if you aren’t Amazon, then you need to square the circle somehow. A data-driven algorithm for shipping prices (or whatever your service is) doesn’t just make good business sense — it can even be a selling point!
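
Purely as an illustration of such a calculator (nothing here comes from Shyp), the sketch below prices packaging from item volume, weight, and fragility; all rates are invented placeholders standing in for the real cost data the article mentions.

from dataclasses import dataclass

@dataclass
class Item:
    length_cm: float
    width_cm: float
    height_cm: float
    weight_kg: float
    fragile: bool

# Invented placeholder rates; in practice these would come from data on
# regional box, foam, tape, and labor costs.
BASE_FEE = 1.00
RATE_PER_LITRE = 0.15     # packing material cost per litre of volume
RATE_PER_KG = 0.40        # handling cost per kilogram
FRAGILE_MULTIPLIER = 1.5  # bubble wrap and extra care

def packaging_quote(item, regional_multiplier=1.0):
    volume_litres = item.length_cm * item.width_cm * item.height_cm / 1000.0
    price = BASE_FEE + volume_litres * RATE_PER_LITRE + item.weight_kg * RATE_PER_KG
    if item.fragile:
        price *= FRAGILE_MULTIPLIER
    return round(price * regional_multiplier, 2)

print(packaging_quote(Item(10, 5, 2, 0.1, False)))     # keychain-sized parcel
print(packaging_quote(Item(170, 60, 30, 14.0, True)))  # mountain bike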

Social vs. personal networks: Sentiment analysis in data

Path fashioned itself as an anti-Facebook: According to its founder, former Facebook developer Dave Morin, Path was a “personal network,” not a social network, where people could share “the story of their lives with their closest friends and family.” And for a moment it almost looked like Path might allow people to do just that. The startup boasted a whopping $500 million value with steadfast investor confidence that lasted all the way until it faded into obscurity, ultimately being purchased by a Korean tech firm and then removed from app stores. Path intended to enforce its mission to provide personal networks of true friends by limiting each user’s friend count to 50. The friend limit was perceived as detrimental to Path’s success at a time when Facebook users often had thousands of friends, but this alone did not account for the apparent irrelevance of the novel app. What was the missing piece? Data analysis.

Path could have sustained itself as a stalwart alternative for Facebook users disenchanted with the endless mill of likes and heart emojis. The key would have lain in sentiment analysis of user message content: By using natural language processing methods to distinguish close friends from distant acquaintances, Path could have offered its users an innovative platform for knowing who their “real friends” were.

Data analytics and the competitive future

We have seen that startup apps based on ingenious concepts and with funding levels over $100 million failed for a variety of reasons that could have been ameliorated or averted with savvy, transformative uses of data, analytics, and insights. One of the original e-hailing taxi companies failed for no other reason than the founding designers’ lack of awareness that Yellow cab drivers in New York at that time did not carry mobile phones!

Data is not only useful for calculating and forecasting the future, it’s a must-have for your app. Every company with a novel concept to unleash into the market must face the reality, as these companies did, that a good idea alone won’t guarantee an app’s success. Innovative use of data in concert with that idea is something that no modern app can survive without.

Jack Cieslak is a 10-year veteran of the tech world. He’s written for Amazon, CB Insights, and others, on topics ranging from ecommerce and VC investments to crazy product launches and top-secret startup projects.


Blog – Sisense


How Profectus Delivers Value from Data

January 9, 2021   Sisense

Every company is becoming a data company. In Data-Powered Businesses, we dive into the ways that companies of all kinds are digitally transforming to make smarter data-driven decisions, monetize their data, and create companies that will thrive in our current era of Big Data.

Streamlining data management across high-volume transactions 

Profectus is an international technology and services company that provides leading technologies for rebate and deal management, contract compliance, and accounts payable audits. Founded 20 years ago, with offices in Australia, New Zealand, the USA, and the UK, its solutions are leveraged by 100 ASX-listed companies, including Westpac and HSBC, Coca-Cola Amatil, Vodafone, Coles, Kmart, JP Morgan, and Rio Tinto, just to name a few.

For Profectus, data is absolutely everything: accounts payable data coming in as direct feeds from ERP finance systems, the hundreds of thousands of invoices its solutions ingest on behalf of customers, and any agreement data those customers have with their suppliers.

“We crunch enormous Accounts Payable data files, and thousands of rebate agreements, and invoices,” Profectus’ Chief Technology Officer, Mark Webster told attendees. “In the retail sector, for one of our biggest customers, we have 4TB of data that we crunch through every few months. That’s billions and billions of rows of data that we go through to find the different variances in order to find the best value for our customers.”


Part of Profectus’ suite of services is ensuring that every transaction is aligned with a particular deal. But Mark revealed that despite the data-rich services the company provides, a lot of teams still use Excel spreadsheets.

“These have their limitations due to their size and data sets,” he explained. “And when you become a large organization, spreadsheets just aren’t going to cut it for you anymore.”

According to Mark, Profectus found that, on average, somewhere between 3.5 and 4 out of every 10,000 transactions contained an error. This number may seem relatively insignificant, but when repeated across millions or even billions of transactions, these errors add up.

“With our solutions, we’re able to save millions of dollars for our clients, simply because we are able to find the details of these transactions buried deep in the data,” Mark explained. “And the reason why we’re able to do that is because we really pride ourselves on focusing on the detail and accuracy of our data analysis. We don’t use aggregate data, we don’t use rollups. We use full detail — and that’s where we find the full value.”

Leveraging smarter data tools to unlock deeper insights

Profectus does a lot of processing, with around 90 people in their office busy “crunching” through row by row of data. But with the company growing fast, the challenge is finding a better way to boost the productivity, efficiency, and accuracy of processing these vast volumes of data at scale.

“Our COO was wondering, how could we possibly bring on more customers and then try to grow the team?” Mark said. “If a customer signs up, well sales are doing their jobs properly. But as they bring all these extra customers on, who can service them? Our business is growing, but our cost base is growing with it, because we just have to hire more and more people to trawl through more and more spreadsheets — that can’t be the sustainable way to do it.”

Profectus began looking for technology to take over and find a solution to automate the process of extracting extremely large volumes of data.

“We wanted to have algorithms, ‘visualization stations’, that actually tease out the differences in the data in a lot more automated way, so that we’re not just throwing more and more human capital at it, but actually leveraging smarter technology,” he added. “Spreadsheets just die at a certain size, and communicating the results becomes extremely difficult.”

“Think about the resources taken for teams to carefully handcraft and curate large spreadsheets, then attach them into an email. Then the customer comes back with various edits and more attachments. Trying to merge all the edits and figure out which version is the right one just gets out of control. And this whole process just breaks down at scale.” 

Discovering the “single source of truth” with Sisense

For Profectus, having a streamlined, automated online system, where there’s a single source of truth was their “holy grail” solution.

“We did a very thorough and rigorous examination of the BI space and we put all of the different platforms through the wringer, but Sisense came out as the leading BI solution on the market,” Mark said. “With Sisense, not only is the data stored safely and securely, but we can extract the full value from our data and we can get the consistent, repeatable, and scalable answers our business needs.

“We also are using embedded analytics, with a portal that our customers can log in to and see easily for a more unified customer experience — and Sisense allows us to do this far more easily.”

Importantly, it was the sheer scalable power of Sisense’s solution that Profectus found was unmatched in the market.

Unlocking data in Snowflake to deliver insights through Sisense

With a high-powered data warehouse in place, Profectus needed a tool to unlock data that answered critical business questions. Through a combined pairing of Sisense and Snowflake, the Profectus team is now able to unlock the data in Snowflake with datasets they provide, including CSVs, spreadsheets, and third-party API integrations. Snowflake’s speed supports the live connections, ensuring Profectus sees the freshest data in its warehouse whenever up-to-date metrics are needed.

“My team now relies on Sisense and Snowflake to simplify a variety of recurring data aggregation workflows, from reports to spend analysis. Anything that used to require manually aggregating and merging spreadsheets can be pulled out of Sisense.”

“As an example, we ran a representative data set that we had in our Snowflake data warehouse through a competing solution, but we killed the process at 20 minutes because that was already unacceptable both from a customer experience and cost perspective,” Mark explained. “With Sisense, we ran the same data set, and it processed the query within 20 seconds! That was our aha moment.”

“This sort of data efficiency gain is a big deal for us, because it helps us to achieve the scale we need to serve our customers and grow as an organization.” 

The data-driven vision for the future

Moving forward, Profectus is excited to reap the benefits of its new “project Delta,” which involves leveraging Sisense’s solution as part of a revolutionary shift towards smarter data-driven decision making.

“Project Delta for us is all about leveraging the right technology solutions to instigate new and exciting change,” Mark explained. “We want to enable behavior change in our customers, and for our customers to be able to optimize their business decisions, transform the way they do business with their suppliers, and help them enjoy much greater value. We’re confidently shifting towards automating a lot of our processing, taking the problem away from all the 90 people who have to manually check line after line of data, and actually getting the computer to do the job.”

“Importantly, we’re putting the right visualizations online to solve our communications problems, so our customers, their suppliers, and our own analysts can all log into the same solution and look at the same source of data treatment. They can all actually see the same story at the same time consistently, with full version control and no errors.”

Ultimately, Profectus wasn’t just looking for a “software vendor,” but a technology and business partner to work together, to help bring these great solutions to market.

“This is where Sisense really shines for us, because they have very much the same vision that we have around how to unlock insights from data and then take powerful actions based on those insights,” Mark added. “Sisense has a very compelling vision, which fits perfectly with what we’re trying to achieve.”


David Huynh is a Customer Success Manager with Sisense. He holds a degree in Business Information Systems and has spent the last 9 years in a variety of fields including sales and project management. David is passionate about helping businesses leverage data and technology to succeed. When not in the office, he enjoys cooking, travelling, and working on cars.


Blog – Sisense


Kubeflow for data scientists introduction

January 8, 2021   BI News and Info

Kubeflow is a machine learning platform that runs on top of Kubernetes and provides end-to-end functionality for executing machine learning projects. Google created it internally to run TensorFlow jobs on Kubernetes and later released it as open source in 2018. Kubeflow has become an essential toolkit for data scientists today since it abstracts them from the underlying complexities of Kubernetes and provides a seamless platform for easy execution and faster delivery of machine learning projects. To appreciate how Kubeflow can make a remarkable difference in a machine learning project, you first need to understand the pain points of data scientists.


Why is Kubeflow important for data scientists?

Around four or five years ago, when the hype around machine learning and data science had just started, everyone rushed to capitalize on the trend. Individuals spent considerable time and effort learning machine learning, while companies pumped millions of dollars overnight into launching their ML and DS projects. Yet, according to a December 2019 report, only 22% of companies running machine learning projects managed to deploy a model to production at all, and more than 43% of respondents admitted they struggled to scale ML projects to the company’s needs.

The main reason behind this high failure rate is that, in those early days, everyone focused only on learning ML and DS concepts through proof-of-concept work in local Jupyter notebooks. Little thought went into how to execute real-world ML projects in practice and deliver them to production successfully, and this gap became visible when those projects started to fail inside companies.


Since the ML project life cycle differs from the traditional software life cycle, the concept of MLOps was soon introduced as a framework, similar to DevOps, to speed up the delivery of ML projects. To bring consistency and ease to scalable model deployment, containerization technologies like Docker and Kubernetes were also introduced for ML projects. Kubernetes is an orchestration framework for containers that allows easier deployment, horizontal scaling, and load balancing for ML models.

However, as another report suggests, 39% of data scientists still find it difficult to work with Docker and Kubernetes. This skill gap becomes a challenge for deploying ML models successfully to production. Even though Docker and Kubernetes can make life easier, they are separate technologies and require different expertise than machine learning to make the best use of them.

There was a growing realization that data scientists should not be exposed to the complexities of managing the infrastructure side of ML projects and should instead be given an abstracted platform where they can focus on what they do best: crunching data and creating ML models. This is where the release of Kubeflow by Google became a game-changer for data scientists, as you’ll see in the next section.

Features of Kubeflow

As mentioned at the beginning, Kubeflow is an end-to-end platform for creating, training, and deploying ML models, and it can run anywhere Kubernetes is already present. Kubeflow is now available as a service on Google Cloud Platform, AWS, and Azure, but you can also install it on-premises or on your local laptop. Let us now do a deep dive into what Kubeflow offers.

Model building and training

Kubeflow provides managed Jupyter Notebook instances that can be used for experimenting and creating prototypes of ML models. It supports the popular libraries scikit-learn, TensorFlow, PyTorch, and XGBoost, and you can also carry out distributed training with the help of TFJobs.


Jupyter Notebook on KubeFlow (Source)

Hyperparameter tuning

Finding the right set of hyperparameters for your model is not an easy manual task: it can be very time-consuming and may not even yield an optimal set of hyperparameters.

Katib is Kubeflow’s hyperparameter tuning system. It runs on Kubernetes underneath and automatically optimizes hyperparameters for the best results in less time.


Hyperparameter Tuning using Kubeflow Katib (Source)

Model deployment and serving

As shown above, deploying and serving ML models in production in a scalable manner is the most challenging task for data scientists, but Kubeflow has made this task much easier with plenty of serving tools available for your needs.

First of all, it provides KFServing, a model serving tool that supports multiple frameworks, including TensorFlow, PyTorch, scikit-learn, XGBoost, and ONNX. Under the hood, KFServing sets up serverless inference on Kubernetes while hiding the underlying complexity from the user. It takes care of autoscaling and health checks of the underlying Kubernetes cluster on its own.

Besides KFServing, Seldon Core and BentoML are other multi-framework serving tools. And if you are working with a TensorFlow model, you can also use TensorFlow Serving, which is available on Kubeflow.


KFServing (Source)

Portability and flexibility

Even though Kubeflow has various components to cater to different phases of the ML project life cycle, it does not force you to use it only end to end. It gives you the flexibility to choose one or more components as per your needs and, to support this flexibility, it also ensures portability across multiple infrastructures and clouds. This enables you to build and train the model externally and then use Kubeflow only for model deployment, or to create and train the model on Kubeflow and then deploy it on some cloud for serving.


Kubeflow provides portability across clouds and other infrastructure (Source)

KubeFlow pipelines for CI/CD

The concept of machine learning pipelines for MLOps actually comes from the DevOps pipeline used to ensure continuous integration and continuous deployment. Kubeflow CI/CD pipelines not only automate ML workflows for faster delivery of changes but also make those workflows reproducible, which helps with scalability.


Kubeflow Pipeline (Source)

Kubeflow Fairing

Kubeflow provides a high-level Python SDK, Fairing, for creating, training, and deploying machine learning models locally and, more importantly, remotely in the cloud. Fairing abstracts users from the complexity of working with the cloud by streamlining the training and deployment process to just a few lines of code, so that as a data scientist you can focus only on the ML models.

As per the current documentation, Fairing supports working with GCP, AWS, Azure, and IBM Cloud.

Example – Kubeflow Fairing with AWS

The example below deals with the House Pricing Prediction problem and shows model creation, training, deployment, and serving using Fairing.

  1. ML Code – This snippet shows the code for training and prediction written inside the HousingServe class. (Additional details are omitted here to keep the focus on the Fairing part; the original code can be found here.)

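A simplified stand-in for that class is sketched below. The original example uses XGBoost on the Ames housing CSV; this version uses scikit-learn and an assumed SalePrice target column so it stays short and self-contained.

import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

class HousingServe:
    """Simplified stand-in for the Fairing example's training/serving class."""

    def __init__(self, model_file="trained_model.joblib"):
        self.model_file = model_file
        self.model = None

    def train(self, csv_path="train.csv"):
        data = pd.read_csv(csv_path)
        X = data.drop(columns=["SalePrice"])  # assumed target column name
        y = data["SalePrice"]
        X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
        self.model = GradientBoostingRegressor().fit(X_train, y_train)
        print("validation R^2:", self.model.score(X_val, y_val))
        joblib.dump(self.model, self.model_file)

    def predict(self, X, feature_names=None):
        """Prediction entry point exposed behind the serving endpoint."""
        if self.model is None:
            self.model = joblib.load(self.model_file)
        return self.model.predict(X)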

  2. AWS Setup – The next section shows how to set up Kubeflow Fairing with an AWS account, Docker registry, and S3 bucket. You will have to replace the details with your own AWS details, but the steps remain similar.

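The AWS wiring usually boils down to a few constants, roughly as sketched below; the region, ECR registry, and S3 bucket name are placeholders you would replace with your own values.

import boto3

AWS_REGION = "us-west-2"  # placeholder region
AWS_ACCOUNT_ID = boto3.client("sts").get_caller_identity()["Account"]

# Images built by Fairing are pushed to your ECR registry.
DOCKER_REGISTRY = f"{AWS_ACCOUNT_ID}.dkr.ecr.{AWS_REGION}.amazonaws.com"

# S3 bucket used for build context and artifacts (placeholder name).
S3_BUCKET = f"kubeflow-fairing-demo-{AWS_ACCOUNT_ID}"
boto3.client("s3", region_name=AWS_REGION).create_bucket(
    Bucket=S3_BUCKET,
    CreateBucketConfiguration={"LocationConstraint": AWS_REGION},
)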

  3. Training remotely on AWS – You can submit your ML training job on AWS in just two lines by using the TrainJob module of Fairing. The HousingServe class, training data, and AWS Docker image are passed as arguments.

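Roughly what those two lines look like with Fairing's TrainJob (module paths follow the kubeflow-fairing SDK and may differ slightly between versions):

from kubeflow.fairing import TrainJob
from kubeflow.fairing.backends import KubeflowAWSBackend

# HousingServe, DOCKER_REGISTRY, and the training CSV come from the earlier
# steps; Fairing packages them into an image, pushes it to ECR, and runs the
# training job on the Kubeflow cluster.
train_job = TrainJob(
    HousingServe,
    input_files=["train.csv", "requirements.txt"],
    docker_registry=DOCKER_REGISTRY,
    backend=KubeflowAWSBackend(),
)
train_job.submit()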

  4. Deploy Model on AWS – Similarly, deploying the ML model on AWS is quite easy with the help of the PredictionEndpoint module of Fairing. Note that this time you pass the trained model file as an argument.

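And a sketch of the deployment call with Fairing's PredictionEndpoint; the trained model artifact is whatever file the training step wrote out:

from kubeflow.fairing import PredictionEndpoint
from kubeflow.fairing.backends import KubeflowAWSBackend

# Unlike training, the trained model file is included so the serving
# container can load it at startup.
endpoint = PredictionEndpoint(
    HousingServe,
    input_files=["trained_model.joblib", "requirements.txt"],
    docker_registry=DOCKER_REGISTRY,
    backend=KubeflowAWSBackend(),
)
endpoint.create()  # prints the endpoint address once the deployment is ready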

  5. Serving Prediction – The earlier step generates a prediction endpoint, which can be used in the following way to serve predictions. Replace <endpoint> with the output of the previous step.

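Calling the endpoint is then an ordinary HTTP request. The exact payload shape depends on the serving wrapper Fairing deploys, so treat the Seldon-style body below as an assumption:

import requests

ENDPOINT = "<endpoint>"  # replace with the address returned by the previous step

# Assumption: the deployed service accepts a Seldon-style ndarray payload.
payload = {"data": {"ndarray": [[3, 1200, 2, 1995]]}}  # one toy feature row
response = requests.post(f"http://{ENDPOINT}/predict", json=payload, timeout=30)
response.raise_for_status()
print(response.json())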

As shown in the example, a data scientist only needs to focus on step 1: creating the ML model and the related data pre-processing tasks. Steps 2 to 5 are standard Fairing code that is relatively easy to run for remote training and deployment in the cloud.

Conclusion

This article gave a gentle introduction to Kubeflow and touched upon why it is an important machine learning toolkit for data scientists. You also saw the various functionalities offered by Kubeflow and, finally, walked through its Python SDK, Fairing, with the help of an example.

If you like this article, you might also like Building Machine Learning Models to Solve Practical Problems – Simple Talk (red-gate.com)


SQL – Simple Talk


Starburst raises $100 million to take on data lake rivals

January 8, 2021   Big Data


Starburst Data has raised $100 million as the data analytics company continues to ride the surge in data lakes. Andreessen Horowitz led the round, which included Index Partners, Coatue, and Salesforce’s venture capital arm.

The funding comes just six months after Starburst raised $42 million, bringing its total to $164 million for a valuation of $1.2 billion. And the latest announcement came on the same day another data lake company, Dremio, announced it had raised $100 million.

So what’s this arms race all about? As companies grapple with growing amounts of information, data lakes allow them to pool structured and unstructured data in one spot, which then facilitates the movement and processing of that data.

“We believe we are solving the biggest problem that the big data era couldn’t: offering fast access to data, regardless of where it lives,” Starburst CEO Justin Borgman wrote in a blog post.

In the case of Starburst, it’s built on Presto, an open source project developed at Facebook. Indeed, three of Starburst’s cofounders came from Facebook, where they worked on the project.

Starburst began life as Hadapt, a startup founded by Borgman. Teradata acquired Hadapt in 2014 but spun Starburst off in 2017. Along the way, Hadapt-Starburst shifted its focus from Hadoop to Presto.


Big Data – VentureBeat


Dynamics 365 CE On Premises: Data Files – CRM database autogrowth

January 6, 2021   Microsoft Dynamics CRM

Data Files – CRM database autogrowth

A Dynamics 365 CE database usually has one data file. If you didn’t change the default configuration, it is very probable that you have autogrowth set to 1 MB, unlimited. That increment is very low: every time the file fills up and needs to grow, SQL Server takes a little time to add another 1 MB, which means growth events occur very often. On the other hand, suppose you have autogrowth set to 10,240 MB (10 GB), unlimited. That increment is very large, and when the file needs to grow, SQL Server can freeze for a while as it allocates another 10 GB.

Let’s take a look at the Autogrowth configuration:


The Autogrowth settings offer the following options:

Autogrowth:

  • Enabled
    • File Growth
      • in Percent
      • in Megabytes
    • Maximum File Size
      • Limited to (MB)
      • Unlimited
  • Disabled

If you enable autogrowth, you need to specify the growth metrics. You can set the file to grow in percent or in megabytes, and you can cap growth at a limited value in MB or leave it unlimited (meaning the file can grow until it consumes the total disk capacity). Suppose you set your file growth in percent; in this case it is pretty common to keep it increasing by 10 percent. That can be very quick if your data file is up to 1 GB (it would grow by ~100 MB). But if your data file is 1 TB, 10 percent means growing by 100 GB, and SQL Server can be very busy until the file growth finishes.


Consider changing the configuration of SQL Server data files for MSCRM. One option may be to increase the maximum allocation size manually and disable automatic growth, to prevent SQL Server from taking time to increase 10 GB frequently. Obviously, it requires monitoring to avoid reaching the maximum disk allocation reserved for data files.
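
As a sketch of what that change could look like (not from the original post; the server, database, and logical file names are placeholders), the T-SQL below pre-grows the data file and disables autogrowth, driven here from Python with pyodbc:

import pyodbc

# Placeholders: adjust the server, the organization database name, and the
# logical data file name (see sys.database_files) for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlserver01;"
    "DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,  # ALTER DATABASE cannot run inside an open transaction
)

sql = """
ALTER DATABASE [Contoso_MSCRM]
MODIFY FILE (NAME = N'mscrm', SIZE = 512000MB, FILEGROWTH = 0);
"""
# SIZE pre-allocates the file up front; FILEGROWTH = 0 disables autogrowth,
# so free space inside the file must be monitored from then on.
conn.cursor().execute(sql)
conn.close()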

Walter Carlin – MBA, MCSE, MCSA, MCT, MCTS, MCPS, MBSS, MCITP, MS

Senior Customer Engineer – Dynamics 365 – Microsoft – Brazil



Dynamics 365 Customer Engagement in the Field


4 Data Modeling Techniques to Drive Business Impact

January 5, 2021   BI News and Info

What Is a Data Model?

Simply put, a data model is a way of organizing disparate data elements while keeping track of how those elements relate to each other. A data model consists of entities—high-level buckets like products or customers—and attributes—details about those entities, like customer addresses or product descriptions.

Three flavors of data models

In general, at a high level, we can think of three categories of data models:

  • A conceptual model organizes the data in a way that’s independent of how it might be used or of specific technologies. This provides an overview of the business’s data, as well as allowing the data to be harvested and used for other purposes.
  • A logical model codifies the structure of the data in terms of features like relational tables or XML tags. Thus, unlike the conceptual model, the logical model is not technology agnostic. You can think of this as a graphical way of representing data relationships.
  • A physical model describes and organizes the data using database components. It includes things like column names and how different tables relate to one another. Essentially, it includes all of the details that the data needs to live in a database structure.

Another way to think about types of data models

In addition to the above (a more abstract way of thinking about the details that data includes in different kinds of models), we can also explore the models that businesses use based on what they want to accomplish.

  • A hierarchical model organizes the data into a tree-like structure. This model proposes that each child record has only one parent, while each parent can have many child records. It’s a simple, rigid structure, but it can help illuminate relationships between different data points.
  • A network model is similar in structure to a hierarchical model but, unlike hierarchical models, records can have more than one parent.
  • A relational model represents data in tables and allows users to describe the data with their own queries, while the database management system takes care of storing and retrieving data.
  • An object-oriented model is a collection of ‘objects’ that use real world terminology to describe the features of the data, just as object-oriented programming does.
  • An entity-relationship model represents entities and the relationships between those entities. You can think of it as being similar to a hierarchical or network model but describing more than just the parent-child relationship.

4 Key Data Modeling Techniques

So how do you actually use these different kinds of models in your organization? Here are the top four techniques business analysts and data modelers use to help drive impact.

  1. Entity relationship diagrams show the relationships between the entities, and, in some cases, help to illustrate how business-related concepts interact. By looking at the relationships as a diagram, database modelers can develop a deeper understanding of the system.
  2. A data dictionary provides information about the attributes of the data model, including names, definitions, and other elements captured in a database. It can be used to better understand where the data is coming from, what it looks like, and how it might be used.
  3. Data mapping, which can be thought of as a specialized version of a data dictionary, is used during a data migration project or when merging two disparate existing systems. In a data mapping scenario, data attributes are mapped across different data sets so that differently labeled attributes can easily be merged (a small illustrative sketch follows this list).
  4. A glossary acts as the source of business truth, ensuring that people are on the same page when discussing the various data attributes and entities of the model. In general, a data glossary is owned by the business, not the IT-focused team, and it’s a good practice to have only one glossary, even if there is more than one data dictionary.
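
To make the data-mapping idea in point 3 concrete, here is a small hypothetical sketch: two source systems label the same customer attributes differently, and a mapping table lets them be renamed to the model's canonical names and merged. All column names and values are invented for illustration.

import pandas as pd

# Two hypothetical source systems describing the same customer entity
# with different attribute labels.
crm = pd.DataFrame({"cust_id": [1, 2], "full_name": ["Ada Lovelace", "Alan Turing"]})
billing = pd.DataFrame({"customer_no": [1, 2], "addr": ["12 King St", "7 Queen Rd"]})

# The data map: source attribute -> canonical attribute in the data model.
crm_map = {"cust_id": "customer_id", "full_name": "customer_name"}
billing_map = {"customer_no": "customer_id", "addr": "customer_address"}

# Apply the mappings, then merge on the shared canonical key.
merged = crm.rename(columns=crm_map).merge(
    billing.rename(columns=billing_map), on="customer_id"
)
print(merged)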

How to Prepare for Better Data Modeling

If you’re about to embark on a data modeling mission, here are some ways to get prepared.

  • Develop a deep understanding of the business, the requirements and goals, and the planned use of the results of the data modeling mission. The first parts of this are pretty straightforward—meet regularly with business-focused colleagues and leaders to learn and understand their drivers, obstacles, and success metrics. How the data will be used is trickier, as requirements and goals may change. This means that it’s incumbent on the data modeler to design the model in a way that makes it agile and able to adapt to shifting conditions.
  • Rather than a huge spreadsheet or otherwise flat screen of data, many data modelers and the businesspeople they’re working with find that it’s often easier to visualize the data in a graphical format. This can help ensure that the data is clean, complete, and consistent.
  • Filter your data for the initial modeling to only use what you need, rather than everything that’s available. As Einstein is reputed to have said, “everything should be made as simple as possible, but not simpler.” The larger the dataset, the slower the query responses will be, and the more likely you’ll get lost in irrelevant minutiae.
  • When you’re setting up your model, make sure that each step in the process is checked prior to moving on to the next step, because once the data grows, it’s harder to correct errors.

Wrapping up

The underlying goal of building a data model is to make your organization more successful. With that in mind, it requires a deep understanding of key business objectives, as well as technical and database knowledge. Armed with that, your data modeling mission should achieve great success.

If you’d like help getting started with a new data project, sign up for a free AI Assessment—we can help you figure out what would have the most impact on your business.


RapidMiner
