Business Intelligence Info

Tag Archives: graph

How to automatically add the edge width of a hand-drawn graph to the respective imported morphological graph in Mathematica?

March 24, 2020   BI News and Info

I am still relatively new to Mathematica, but with the great help of this forum, I managed to transform a doodled sketch into a graph, see attached file. I have used this code:

doodle = Import["filename.jpg"];
graph =
 MorphologicalGraph@
  SkeletonTransform@
   Thinning@Closing[ColorNegate@Binarize@doodle, DiskMatrix[2]];

However, what I would now like to do is output a value for each edge that reflects how thick or thin that edge was (on average over its length) in the original doodle. Is this possible at all? If so, how do I do it?


Recent Questions – Mathematica Stack Exchange


How to find graph data for a path graph

February 14, 2020   BI News and Info

I have recently updated to Mma 12 (from Mma 9).
I input GraphData[{"Path", 5}] and I get nonsense. Is this a bug? Is there a workaround?


Recent Questions – Mathematica Stack Exchange


TigerGraph raises $32 million for graph database tools

September 25, 2019   Big Data

TigerGraph, a Redwood City, California-based software developer providing a suite of enterprise graph database tools, today announced that it’s secured $32 million in series B funding led by private equity firm SIG. The infusion of capital comes after a $31 million series A in September 2017 and nearly doubles the startup’s haul to $60 million, as it continues to attract marquee clients like Zillow, Intuit, Amgen, Wish, Kickdynamic, and China Mobile.

“Today’s vast amount of data, together with increasingly powerful processing capabilities enabled by the cloud, means it is now possible to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries,” said CEO and founder Yu Xu, a former IBM, Teradata, and Twitter engineer and systems architect who founded TigerGraph in 2011. “The funding will fuel a new wave of growth and expansion for TigerGraph to make deep link analysis accessible to virtually every organization in the world and help users unleash the power of interconnected data.”

Graph databases and graph-oriented databases leverage graph structures for semantic queries, with nodes, edges, and properties that store and represent data. They’re a type of non-relational technology that depicts the relationships connecting various entities (like two people in a social network, for instance) and that can analyze interconnected data.

TigerGraph says its cloud-hosted and pay-as-you-go service — which is now generally available — simplifies graph management and configuration organization-wide, even for departments lacking the technical prowess to produce graph databases from scratch. To this end, its three-step graph-generating tool ostensibly gets apps up and running within minutes to hours. Plus, TigerGraph delivers a dozen starter kits addressing use cases like fraud detection, personalized real-time recommendation, computation, explainable AI, machine learning, and supply chain analysis.

TigerGraph’s eponymous TigerGraph Cloud scales up to tens of terabytes, 100 billion vertices, and 600 billion edges on the high end. On a single machine, it can support more than 100,000 real-time deep link analytics queries and 50GB to 150GB of data per second. On a cluster of 20 commodity machines, it’s capable of streaming over 2 billion daily events in real time.

TigerGraph’s SQL-like graph query language enables ad-hoc data exploration and analysis, while its architecture makes use of compression to minimize memory overhead. Graphs are structured such that vertices and edges act as parallel storage and computation units, each of which can hold any amount of arbitrary information. This allows TigerGraph Cloud to run multiple engines hosting graphs with different partitioning algorithms, queries to which a front-end server can automatically route based on type.

“At Kickdynamic we know that compelling, individualized experiences are the most effective way to create customer loyalty and drive revenue,” said Kickdynamic chief product officer Gabriele Corti. “Having tried various other solutions, we found that TigerGraph offered the best combination of performance and advanced, real-time, analytical capabilities. TigerGraph’s scalable graph database will enhance our platform and enable us to continue to achieve our vision of delivering advanced personalization in email.”

Markets and Markets anticipates the graph database market will reach $2.4 billion by 2023 from $821.8 million in 2018, and analysts at Gartner expect that enterprise graph processing and graph databases will grow 100% annually through 2022. Startups like Neo4j, MongoDB, Cambridge Semantics, DataStax, and others have risen to meet the need, in addition to incumbents like Microsoft and Oracle. Even Amazon threw its hat in the ring in November 2017 with the launch of Neptune, a fully managed graph database powered by its Amazon Web Services division.



Big Data – VentureBeat


SQL Server 2019 Graph Database and SHORTEST_PATH

September 18, 2019   BI News and Info

My crystal ball seems to be working again: the new addition to SQL Server 2019, SHORTEST_PATH, is a feature I called out as missing from SQL Server Graph Database in many of the technical sessions I delivered.

I will use an example that is similar to the first article I wrote on this topic. You can create the database and tables needed for this article using this script. Maybe you would also like to read the previous blog post about some recent improvements in Graph Databases.

Since the data is essentially a graph, a way to show the results visually is helpful. An application like Gephi can be used to view the graph. (You can download this application here.) To use Gephi with SQL Server, make sure that the TCP/IP protocol and mixed authentication are enabled. The app only supports SQL authentication, so you will need a SQL Server login to connect.

Using Gephi

Follow these steps to view the graph in Gephi.

  • After opening Gephi, select New Project on the Welcome screen
  • On the File menu, select Import Database->Edge List. This will open the Database settings screen.

[Image: the Database settings screen in Gephi]

  • In the Configuration Name field, create a name for the configuration, such as SQLServer2019.
  • In the Driver field, select SQL Server.
  • In the Host field, insert the machine/instance name of your SQL Server.
  • In the Port field, enter the port used, typically 1433. If you’re not sure, look for the port number in the SQL Server Configuration Manager program.
  • In the Database field, insert the name of the database that contains the graph tables, which is probably GraphDemo if you have been following along.
  • In the Username field, enter the username of a SQL Server login with SELECT permission to the tables in the GraphDemo database.
  • In the Password field, insert the password for the SQL login.

In addition to the configuration required to connect to SQL Server, the Database settings screen also requires two queries: One to retrieve the list of nodes from the server and another to retrieve the list of edges from the server. I’ll explain those queries next.

The top of the dialog lists the columns the queries need to return. It’s important to notice that these column names are case-sensitive.

The nodes table in SQL Server has the pseudo-column $node_id, which is a JSON column. A good option is to extract the id from the JSON and use the MemberName as the label for the node. The query will look like this:

SELECT JSON_VALUE($node_id, '$.id') AS id,
       membername                   AS label
FROM   forummembers

The edge query also has to extract the id from $to_id and $from_id. Besides that, you need to filter the edges returned, because the Likes edge table from the previous article holds relations between two members as well as between members and messages. For this example, return only the relations between two members. Here is the query:

SELECT JSON_VALUE($from_id, '$.id') AS source,
       JSON_VALUE($to_id, '$.id')   AS target
FROM   likes
WHERE  JSON_VALUE($from_id, '$.table') = 'ForumMembers'
       AND JSON_VALUE($to_id, '$.table') = 'ForumMembers'

It’s important to note that the source and target ids in the edge query need to match the ids in the node query. There are some additional columns that could be used, but for this example, you’ll only need these.
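To make the role of the JSON_VALUE calls more concrete, here is a minimal Python sketch of the same extraction and filtering. The sample strings are assumptions: only the `table` and `id` fields come from the queries above, while the other fields and the `ForumPosts` table name are hypothetical placeholders rather than values read from the GraphDemo database.

```python
import json

# Hypothetical $from_id / $to_id values. Only the "table" and "id" fields
# are taken from the queries in the article; everything else is illustrative.
edges = [
    {"from": '{"type":"node","schema":"dbo","table":"ForumMembers","id":0}',
     "to":   '{"type":"node","schema":"dbo","table":"ForumMembers","id":1}'},
    {"from": '{"type":"node","schema":"dbo","table":"ForumMembers","id":1}',
     "to":   '{"type":"node","schema":"dbo","table":"ForumPosts","id":0}'},
]

def member_edges(rows):
    """Keep only member-to-member edges and return (source, target) id pairs."""
    result = []
    for row in rows:
        src = json.loads(row["from"])
        dst = json.loads(row["to"])
        if src["table"] == "ForumMembers" and dst["table"] == "ForumMembers":
            result.append((src["id"], dst["id"]))
    return result
```

Calling `member_edges(edges)` keeps the first sample edge and drops the second, which mirrors what the WHERE clause of the edge query does.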

The final configuration will look similar to the following image:

[Image: the completed Database settings screen]

After you click OK, the next window, Import Report, shows many details that this article will not cover. What matters here is the number of nodes and edges found, which confirms that the queries are correct. There is only a single step to complete on this screen:

Select the option Append to existing workspace.

[Image: the Import Report window]

After this first screen, you will see the Workspace1 and Preview tabs and Preview Settings window. It’s time to build the graph.

[Image: the Gephi workspace with the Preview tab and Preview Settings window]

Click on the Refresh button inside the Preview Settings pane. This will result in an image similar to the one below. Note that yours may look different, but in the next step, you can verify that the nodes are connected correctly.

[Image: the initial graph preview]

Improve this graph just a bit by making these changes in the Preview Settings tab.

  • Change the Opacity property to 0 under Nodes.
  • Mark the Show Labels property under Node Labels.
  • Change the Outline opacity under Node Labels to 0

After clicking refresh, the graph will change to something like the image below:

[Image: the graph preview with node labels]

As you may notice, even in a simple graph with a small amount of data, it can be quite hard to identify information such as the shortest path between two nodes. That’s why you need a tool to calculate this, and SQL Server 2019 can make this kind of calculation.

Calculating the Path

When you think about a function to calculate the shortest path between two points, you may think that it will be a simple function. To make this calculation work, however, you need much more than a simple function. You must establish paths among the graph data. The data has many paths, and each one has many nodes with a beginning and an end. Each path is a group of nodes, in some ways like a group produced by the GROUP BY clause.

One similarity, for example, is that you can’t read the column directly from the path. You need to apply aggregation functions to the set of nodes that are part of the path to read the information.

The syntax to build this query resembles the GROUP BY clause, requiring you to use aggregate functions to make calculations on every (shortest) path of nodes, getting grouped results.

To make this simpler, start to build the query over the model piece by piece. First, the FROM clause:

FROM
    ForumMembers P1,
    ForumMembers FOR PATH AS P2,
    Likes FOR PATH AS IPO

The FOR PATH expression in the edge and node tables indicates these tables will be used to calculate paths, a grouping, in other words. Over the columns of these particular tables you can only apply aggregation functions; you can’t retrieve the columns directly.

You may have noticed the node table appears twice in the query, one using the FOR PATH and another without using the FOR PATH. Since you can’t retrieve columns directly from the FOR PATH tables, you can include the nodes table twice to retrieve individual values of the node columns, usually for the start node of the path.

Now analyse the list of columns and expressions in the SELECT list:

SELECT
    P1.MemberID,
    P1.MemberName,
    STRING_AGG(P2.MemberName, '->') WITHIN GROUP (GRAPH PATH) AS [MemberName],
    LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,
    COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels

The first two columns come from the P1 table, which is not marked as FOR PATH. The MemberID and MemberName columns are from the first node of the path. The WHERE clause will also expose this.

The other three expressions use aggregate functions. They use the following functions:

COUNT: a well-known aggregate function.

STRING_AGG: introduced in SQL Server 2017; it concatenates string values.

LAST_VALUE: a windowing function, but it can be used with any kind of aggregation, including a graph path.

After each aggregation function, you have the WITHIN GROUP (GRAPH PATH) clause, a special clause created for the grouping generated by the SHORTEST_PATH function.

Finally, here’s the WHERE clause:

WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+))

It’s like a regular MATCH clause that you already know about, but using the new function, SHORTEST_PATH. The first table, which is not part of the grouping, is related to the edge and node tables, which are part of the grouping. The edge and second node tables appear between parentheses. The SHORTEST_PATH function understands this as an instruction to use recursion, creating the groups.

The '+' symbol indicates that you would like information about the entire path between each member of P1 and P2, without limiting the number of hops.
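As a rough mental model of what such a query returns, the following Python sketch runs a breadth-first search from one start node and then derives three values per path that loosely correspond to STRING_AGG, LAST_VALUE and COUNT. It is only an illustration under stated assumptions: the `LIKES` data and the function name are hypothetical, and nothing here describes how SQL Server implements SHORTEST_PATH internally.

```python
from collections import deque

# Hypothetical "likes" edges between member names (not the article's data).
LIKES = {
    "Mike": ["Carl"],
    "Carl": ["Paul", "Steve"],
    "Paul": ["Steve"],
    "Steve": [],
}

def shortest_paths(graph, start):
    """Breadth-first search: one shortest path per node reachable from start."""
    paths = {}                       # end node -> list of nodes after the start
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in paths:     # the first visit is along a shortest path
                paths[nxt] = path + [nxt]
                queue.append((nxt, paths[nxt]))
    # Rough analogues of STRING_AGG, LAST_VALUE and COUNT over each path
    return [("->".join(p), p[-1], len(p)) for p in paths.values()]
```

Each returned tuple plays the role of one result row: the concatenated path (without the start node), the final member, and the number of hops. Because the start node is not pre-marked as visited, a cycle that leads back to it is also reported as a path.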

Here’s the final query:

SELECT
    P1.MemberID,
    P1.MemberName,
    STRING_AGG(P2.MemberName, '->') WITHIN GROUP (GRAPH PATH) AS [MemberName],
    LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,
    COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels
FROM
    ForumMembers P1,
    ForumMembers FOR PATH AS P2,
    Likes FOR PATH AS IPO
WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+));

The image below gives an idea of the result. You have the information about the first member of the path, the entire path (not including the first member) built by the STRING_AGG function, the last member of the path returned by the LAST_VALUE function, and the number of levels in the path computed by the COUNT function.

[Image: the query results]

You may be wondering about how Carl connects to Carl. This image shows the path:

[Image: the path that connects Carl back to Carl]

Performance

The execution plan is big. There is no doubt that you will need to take care when using it in large environments. The full execution plan doesn’t fit here, but you may notice by the piece below that the plan makes extensive use of tempdb.

You may notice the following details in the execution plan:

  • It starts with the edge table
  • It creates three temporary tables from the edge table. You will see many table operations in the plan, but checking the names, you will notice there are only three

[Image: excerpt of the execution plan]

  • It joins one of the temporary tables with the edge table, the nodes table and a second temporary table, aggregating the result and inserting in the temporary tables
  • It has a Sequence operator, which then is joined with the node table to get the results

[Image: excerpt of the execution plan showing the Sequence operator]

Filtering

By adding one more predicate, you can view just the paths starting from one forum member:

SELECT
    P1.MemberID,
    P1.MemberName,
    STRING_AGG(P2.MemberName, '->') WITHIN GROUP (GRAPH PATH) AS [MemberName],
    LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,
    COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels
FROM
    ForumMembers P1,
    ForumMembers FOR PATH AS P2,
    Likes FOR PATH AS IPO
WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+))
    AND P1.MemberID = 7;

[Image: the results filtered by start member]

The execution plan changes. You may notice the following:

  • The sequence is still the middle of the execution plan
  • The plan starts by the node, not the edge
  • After the sequence, the Query Optimizer is able to use an index for one of the nodes

[Image: excerpt of the execution plan for the filtered query]

  • There is a fourth temporary table, called Source
  • The number of paths before the sequence increases

[Image: excerpt of the execution plan showing the Source temporary table]

If you would like to see only the path between John and Steve, you may be surprised that you have to wrap the query in a CTE or subquery and filter the results. One reason is that a windowing function cannot be directly filtered. The MATCH clause doesn’t offer a solution either: you can’t filter by a P2 column, for example, because P2 is marked as FOR PATH.

WITH qry AS (
    SELECT
        P1.MemberID,
        P1.MemberName,
        STRING_AGG(P2.MemberName, '->') WITHIN GROUP (GRAPH PATH) AS [Path],
        LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,
        COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels
    FROM
        ForumMembers P1,
        ForumMembers FOR PATH AS P2,
        Likes FOR PATH AS IPO
    WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+))
        AND P1.MemberID = 7
)
SELECT * FROM qry
WHERE FinalMemberName = 'Steve';

It’s not difficult to predict: the query plan is bad. You are calculating the paths from John to all other members, and only after this does the filter kick in to get the path between John and Steve. In the image below, you may notice that the filter for Steve appears only after the sequence, after all the paths from John have been calculated.

[Image: excerpt of the execution plan with the filter applied after the sequence]
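To illustrate why filtering only after the calculation is wasteful, the Python sketch below shows the alternative that the query cannot express: a breadth-first search that stops as soon as the target is reached. This is an illustration of the general idea only. The data, the member names and the `path_to` function are hypothetical, and the article does not describe any SQL Server feature that behaves this way.

```python
from collections import deque

# Hypothetical edges; the names are illustrative only.
LIKES = {
    "Mike": ["Carl", "Paul"],
    "Carl": ["Steve"],
    "Paul": [],
    "Steve": ["Jennifer"],
    "Jennifer": [],
}

def path_to(graph, start, target):
    """BFS that returns as soon as the target is found.

    Returns (path_after_start, nodes_expanded); path is None if unreachable.
    """
    seen = {start}
    queue = deque([(start, [])])
    expanded = 0
    while queue:
        node, path = queue.popleft()
        expanded += 1
        for nxt in graph.get(node, []):
            if nxt == target:
                return path + [nxt], expanded
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None, expanded
```

With this sample data, `path_to(LIKES, "Mike", "Steve")` stops after expanding two nodes, whereas computing every path from the start and filtering afterwards would visit the whole reachable graph first.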

You may have already noticed the + symbol in the MATCH clause. It means you allow an unlimited number of hops. However, you can use a slightly different syntax to find all the forum members that are, for example, up to two hops away from others:

SELECT
    P1.MemberID,
    P1.MemberName,
    STRING_AGG(P2.MemberName, '->') WITHIN GROUP (GRAPH PATH) AS [MemberName],
    LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,
    COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels
FROM
    ForumMembers P1,
    ForumMembers FOR PATH AS P2,
    Likes FOR PATH AS IPO
WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2){1,2}));

However, the lower bound always needs to be 1. If you want to see the people who are an exact number of hops away from others, once again you will need to filter the result of the query:

WITH qry AS (
    SELECT
        P1.MemberID,
        P1.MemberName,
        STRING_AGG(P2.MemberName, '->') WITHIN GROUP (GRAPH PATH) AS [path],
        LAST_VALUE(P2.MemberName) WITHIN GROUP (GRAPH PATH) AS FinalMemberName,
        COUNT(P2.MemberId) WITHIN GROUP (GRAPH PATH) AS Levels
    FROM
        ForumMembers P1,
        ForumMembers FOR PATH AS P2,
        Likes FOR PATH AS IPO
    WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2){1,2}))
)
SELECT * FROM qry WHERE levels = 2;
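The same two-stage pattern (bound the number of hops, then filter for an exact count) can be sketched in Python. The example is a simplified illustration with hypothetical data and names; it models the {1,2} bound as a limit on how far a breadth-first search expands and is not a description of SQL Server's internals.

```python
from collections import deque

# Hypothetical chain of "likes" edges; the names are illustrative only.
LIKES = {"Mike": ["Carl"], "Carl": ["Paul"], "Paul": ["Steve"], "Steve": []}

def paths_within(graph, start, max_hops):
    """Hop count of the shortest path to each node within max_hops of start."""
    levels = {}                        # end node -> number of hops
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue                   # do not expand beyond the bound
        for nxt in graph.get(node, []):
            if nxt not in levels:
                levels[nxt] = hops + 1
                queue.append((nxt, hops + 1))
    return levels

# Stage 1: everything up to two hops away (like {1,2} in the MATCH clause)
within_two = paths_within(LIKES, "Mike", 2)
# Stage 2: keep only the exact hop count (like WHERE levels = 2)
exactly_two = {name: hops for name, hops in within_two.items() if hops == 2}
```

The second stage discards part of the work done in the first, which is the cost the outer filter introduces in the query.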

The shortest_path function is a great new feature for the SQL Server graph database, but being unable to filter by the end node or by an exact number of hops without performing the entire calculation and only then filtering the result is still a problem for query performance.

Indexing

You can create indexes on the pseudo-columns of the edge and nodes. Considering the execution plans you saw, you can create the following clustered index for the edge:

CREATE CLUSTERED INDEX indLikes ON likes($from_id, $to_id);

For the nodes, on the other hand, an index didn’t help as much. If you create a clustered index, it will result in a scan operation. If you create a non-clustered index, you need to include all the columns involved in the operation. It may sound a bit strange, but one of those columns is the graph_id. There is no pseudo-column for the graph_id in the nodes table. There is a function to retrieve it, GRAPH_ID_FROM_NODE_ID, but you can’t use this function to create a computed column. The statement below, for example, will fail:

ALTER TABLE forumMembers ADD graph_id
    AS (GRAPH_ID_FROM_NODE_ID($node_id))
    PERSISTED;

On the other hand, for the queries where you are filtering for the start member, a non-clustered index on the member id helps:

CREATE NONCLUSTERED INDEX indMembers ON ForumMembers(memberid);

There is not much news about the more complex situations filtering by levels or end node: It’s an additional filter after all the calculations of shortest_path.

Conclusion

The function shortest_path is an excellent addition to the graph database features; however, the two limitations it has may create heavy queries in the environment:

  • You can’t specify an end node.
  • You can’t specify a minimum number of hops other than 1, which would allow selecting an exact number of hops by using the same minimum and maximum.


SQL – Simple Talk


How can I plot a graph without getting a long result with inexact coefficients?

February 5, 2019   BI News and Info

wn = 100;
aap = 0.2;
ome = 9;
Solve[{(2 x + 2) ((dco - wn aap x)^2 + 1) + 4*ome*x == 0}, {x}]
Plot[{x /. %}, {dco, -40, 0}]

I am using this code and getting the right graph, but the problem is that I do not want the result in this form. It is displayed like this:
{{x -> 0.0333333 (-10. + dco) + ((0.0000208333 -
0.0000360844 I) (-137200. - 16000. dco -
400. dco^2))/(-6560. - 1371. dco - 60. dco^2 - 1. dco^3 +
5.19615 Sqrt[
99259. + 143320. dco + 24722. dco^2 + 1160. dco^3 +
19. dco^4])^(
1/3) - (0.00833333 + 0.0144338 I) (-6560. - 1371. dco -
60. dco^2 - 1. dco^3 +
5.19615 Sqrt[
99259. + 143320. dco + 24722. dco^2 + 1160. dco^3 +
19. dco^4])^(1/3)}, {x ->
0.0333333 (-10. + dco) + ((0.0000208333 +
0.0000360844 I) (-137200. - 16000. dco -
400. dco^2))/(-6560. - 1371. dco - 60. dco^2 - 1. dco^3 +
5.19615 Sqrt[
99259. + 143320. dco + 24722. dco^2 + 1160. dco^3 +
19. dco^4])^(
1/3) - (0.00833333 - 0.0144338 I) (-6560. - 1371. dco -
60. dco^2 - 1. dco^3 +
5.19615 Sqrt[
99259. + 143320. dco + 24722. dco^2 + 1160. dco^3 +
19. dco^4])^(1/3)}, {x ->
0.0333333 (-10. + dco) - (
0.0000416667 (-137200. - 16000. dco - 400. dco^2))/(-6560. -
1371. dco - 60. dco^2 - 1. dco^3 +
5.19615 Sqrt[
99259. + 143320. dco + 24722. dco^2 + 1160. dco^3 +
19. dco^4])^(1/3) +
0.0166667 (-6560. - 1371. dco - 60. dco^2 - 1. dco^3 +
5.19615 Sqrt[
99259. + 143320. dco + 24722. dco^2 + 1160. dco^3 +
19. dco^4])^(1/3)}}
I want my answer in interpolation form.
I think you can help me.


Recent Questions – Mathematica Stack Exchange


SQL Server Graph Databases – Part 5: Importing Relational Data into a Graph Database

May 3, 2018   BI News and Info

The series so far:

  1. SQL Server Graph Databases – Part 1: Introduction
  2. SQL Server Graph Databases – Part 2: Querying Data in a Graph Database
  3. SQL Server Graph Databases – Part 3: Modifying Data in a Graph Database
  4. SQL Server Graph Databases – Part 4: Working with hierarchical data in a graph database
  5. SQL Server Graph Databases – Part 5: Importing Relational Data into a Graph Database

With the release of SQL Server 2017, Microsoft introduced graph database features to support data sets that contain complex relationships between entities. The graph capabilities are integrated into the database engine and require no special configurations or installations. You can use these features in addition to or independently of the traditional relational structures. For example, you might implement a graph database for a new master data management solution that could benefit from both graph and relational tables.

When creating a graph database, you might be working with new data, existing data, or a combination of both. In some cases, the data might already exist in relational tables, which do not support the graph features. Only node and edge tables in a graph database allow you to use the new capabilities, in which case, you must either copy the data over to the graph tables or forget about using the graph features altogether.

For those interested in the first option, this article demonstrates how to move from a relational structure to a graph structure, using data from the AdventureWorks2017 sample database. The database might not represent the type of data you had in mind, but it provides a handy way to illustrate how to migrate to a graph structure, using a relational schema already familiar to many of you. Such a recognizable structure also helps demonstrate various ways to query the data once it’s in the graph tables.

Moving from Relational Tables to Graph Tables

The AdventureWorks2017 database includes transactional data related to the fictitious company Adventure Works, which sells bicycles and related equipment to retail outlets and online customers. For this article, we’ll focus on the retail outlets that ordered the products, the sales reps who sold the products, and the vendors who supplied the products, along with such details as the number of items ordered and the amount paid for those items.

To retrieve this type of data from the AdventureWorks2017 database as it exists in its current state, you would be accessing different combinations of the tables shown in the following figure.

[Image: the AdventureWorks2017 relational tables used in this article]

For those who’ve been around SQL Server documentation for a while, such tables as SalesOrderHeader, SalesOrderDetail, Product, and Person should be quite familiar because they’re included in countless examples that demonstrate various ways to work with relational data. However, suppose that you now want to pull some of this information into a graph database, in which case, the data model might look more like the one shown in the next figure.

[Image: the graph data model with its nodes and edges]

The data model includes only four nodes (Stores, SalesReps, Vendors, and Products) and only three edges (Purchases, Sells, and Supplies). Together these nodes and edges define the following relationships:

  • Stores purchase products
  • Sales reps sell products
  • Vendors supply products

You’ll define these relationships within the edge tables by mapping the originating node to the terminating node for each relationship, as you saw in the first article in this series. The implication here is that you should first populate the node tables before the edge tables so you can reference the originating and terminating node IDs when defining your relationships.
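The ordering constraint (nodes first, then edges) can be pictured with a tiny Python sketch. It is a conceptual illustration only: the dictionaries stand in for node tables, the integer values stand in for the node IDs that SQL Server generates, and all names and rows are hypothetical.

```python
# Hypothetical rows; the integers stand in for generated node IDs.
stores = {"Store A": 0}                     # node table: business key -> node id
products = {"Road Bike": 0, "Helmet": 1}    # node table: business key -> node id

purchases = []                              # edge table: (from_node, to_node)

def add_purchase(store_name, product_name):
    """Record an edge; both endpoint nodes must already exist."""
    # A missing node raises KeyError, which is why nodes are loaded first.
    purchases.append((stores[store_name], products[product_name]))

add_purchase("Store A", "Helmet")
```

Trying to add a purchase for a store or product that has not been loaded fails immediately, which is the situation the load order avoids.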

Creating and Populating the Node Tables

Before you can create and populate your node tables, you must determine where to put the tables. For the examples in this article, I created the graph schema within the AdventureWorks2017 database, using the following T-SQL code:

  USE AdventureWorks2017;
  GO
  CREATE SCHEMA graph;
  GO

You do not need to locate the graph tables in the graph schema or even in the AdventureWorks2017 database. However, if you plan to try out the examples to follow and want to locate the graph tables elsewhere, be sure to update the T-SQL code accordingly.

With the graph schema in place (or wherever you locate the tables), you can then create and populate the Stores node table, which includes two user-defined columns, StoreID and StoreName, as shown in the following CREATE TABLE statement:

  DROP TABLE IF EXISTS graph.Stores;
  GO
  CREATE TABLE graph.Stores (
    StoreID INT PRIMARY KEY,
    StoreName NVARCHAR(50) NOT NULL
  ) AS NODE;
  INSERT INTO graph.Stores (StoreID, StoreName)
  SELECT c.CustomerID, s.Name
  FROM Sales.Customer c INNER JOIN Sales.Store s
    ON c.StoreID = s.BusinessEntityID;

The example follows the same procedures used in the first article to create and populate node tables, so be sure to refer back to the article if you’re unsure about what we’re doing here. Keep in mind that you must include the AS NODE clause in the CREATE TABLE statement. You can also add whatever other user-defined columns you want to include. SQL Server will automatically generate the table’s $node_id column.

You can then use an INSERT…SELECT statement to populate the Stores table, as you would with any SQL Server table. In this case, you must join the Sales.Customer table to the Sales.Store table to get the store name. In addition, when supplying values for the StoreID column in the Stores table, you should use the CustomerID value in the Customer table, rather than use the StoreID value in that table, because the SalesOrderHeader table uses the CustomerID value. This approach helps to keep things simpler when populating the edge tables. SQL Server automatically populates the $node_id column.
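For readers without an AdventureWorks2017 instance at hand, the join logic of the INSERT…SELECT can be tried out with Python's built-in sqlite3 module. This is only a sketch under clear assumptions: SQLite has no AS NODE clause or $node_id column, the tables are stripped down to the columns the join needs, and the rows are made up rather than taken from AdventureWorks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for Sales.Customer, Sales.Store and graph.Stores
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, StoreID INTEGER);
    CREATE TABLE Store (BusinessEntityID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Stores (StoreID INTEGER PRIMARY KEY, StoreName TEXT NOT NULL);

    -- Made-up rows, not AdventureWorks data
    INSERT INTO Store VALUES (100, 'Bike Shop A'), (200, 'Bike Shop B');
    INSERT INTO Customer VALUES (1, 100), (2, 200), (3, NULL);

    -- Same shape as the article's statement: CustomerID becomes StoreID
    INSERT INTO Stores (StoreID, StoreName)
    SELECT c.CustomerID, s.Name
    FROM Customer c INNER JOIN Store s
      ON c.StoreID = s.BusinessEntityID;
""")

stores = conn.execute(
    "SELECT StoreID, StoreName FROM Stores ORDER BY StoreID"
).fetchall()
```

The customer without a store (CustomerID 3) drops out of the inner join, so only customers that represent stores end up in the Stores table, keyed by their CustomerID.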

That’s all there is to setting up the Stores table, and creating and populating the SalesReps table is even easier:

  DROP TABLE IF EXISTS graph.SalesReps;
  GO
  CREATE TABLE graph.SalesReps (
    SalesRepID INT PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL
  ) AS NODE;
  INSERT INTO graph.SalesReps (SalesRepID, FirstName, LastName)
  SELECT BusinessEntityID, FirstName, LastName
  FROM Person.Person
  WHERE PersonType = 'SP';

For this example, you can pull all the data directly from the Person table, limiting the results to those rows with a PersonType value of SP (for salesperson). If you want to include such information as sales quotas or job titles in the table, you must join the Person table to the SalesPerson or Employee table (or both). For this example, however, the Person table is enough.

The next table to create and populate is Products. For this, you can pull all the data from the Production.Product table:

  DROP TABLE IF EXISTS graph.Products;
  GO
  CREATE TABLE graph.Products (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(50) NOT NULL,
    StandardCost MONEY NOT NULL
  ) AS NODE;
  INSERT INTO graph.Products (ProductID, ProductName, StandardCost)
  SELECT ProductID, Name, StandardCost
  FROM Production.Product
  WHERE FinishedGoodsFlag = 1;

For this example, when retrieving data from the Product table, you should include a WHERE clause that filters the data so that only rows with a FinishedGoodsFlag value of 1 are included. This ensures that you include only salable products in the Products table.

The final node table is Vendors, which gets all its data from the Purchasing.Vendor table:

  DROP TABLE IF EXISTS graph.Vendors;
  GO
  CREATE TABLE graph.Vendors (
    VendorID INT PRIMARY KEY,
    AccountNumber NVARCHAR(15) NOT NULL,
    VendorName NVARCHAR(50) NOT NULL
  ) AS NODE;
  INSERT INTO graph.Vendors (VendorID, AccountNumber, VendorName)
  SELECT BusinessEntityID, AccountNumber, Name
  FROM Purchasing.Vendor;

That’s all there is to creating and populating the node tables. Once they’re in place, you can start in on your edge tables.

Creating and Populating the Edge Tables

Creating an edge table is just as simple as creating a node table, with a few notable differences. For the edge table, the table definition requires an AS EDGE clause, rather than an AS NODE clause, and the user-defined columns are optional. (Node tables require at least one user-defined column.) In addition, SQL Server automatically generates the $edge_id column, rather than the $node_id column.

The first edge table is Orders, which includes three user-defined columns, as shown in the following CREATE TABLE statement:

  DROP TABLE IF EXISTS graph.Orders;
  GO
  CREATE TABLE graph.Orders (
    OrderDate DATETIME NOT NULL,
    OrderQty SMALLINT NOT NULL,
    LineTotal MONEY NOT NULL
  ) AS EDGE;

After you create the Orders table, you can add the data, which relies on the SalesOrderHeader and SalesOrderDetail tables to supply the values for the user-defined columns and, more importantly, to provide the structure for defining the relationships between the Stores and Products nodes:

  INSERT INTO graph.Orders ($from_id, $to_id, OrderDate, OrderQty, LineTotal)
  SELECT s.node1, p.node2, h.OrderDate, d.OrderQty, d.LineTotal
  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
    INNER JOIN (SELECT $node_id AS node1, StoreID FROM graph.Stores) s
    ON h.CustomerID = s.StoreID
    INNER JOIN (SELECT $node_id AS node2, ProductID FROM graph.Products) p
    ON d.ProductID = p.ProductID;

After joining the SalesOrderHeader and SalesOrderDetail tables, the SELECT statement joins the SalesOrderHeader table to the Stores table, based on the CustomerID and StoreID values. The join uses a subquery to retrieve only the $node_id and StoreID columns from the Stores table and to rename the $node_id column to node1. The query will fail if you try to use $node_id in the SELECT list. You can then join the SalesOrderDetail table to the Products table, using the same logic as when joining to the Stores table.

The node1 and node2 columns returned by the SELECT statement provide the values for the $from_id and $to_id columns in the edge table. As you’ll recall from the first article, you must specifically provide these values when inserting data into an edge table. The values are essential to defining the relationships between the originating and terminating nodes. SQL Server automatically populates the $edge_id column.
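
To make the mechanics of that INSERT…SELECT easier to see outside SQL Server, here is a minimal Python sketch that uses the standard sqlite3 module. It is only an illustration under stated assumptions: the OrderLine table and all of its values are hypothetical, and the plain node_id, from_id, and to_id columns merely stand in for the $node_id, $from_id, and $to_id pseudo-columns that SQL Server generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relational source rows (hypothetical values).
    CREATE TABLE OrderLine (CustomerID INTEGER, ProductID INTEGER, OrderQty INTEGER);
    INSERT INTO OrderLine VALUES (1, 100, 5), (2, 100, 3), (3, 200, 1);

    -- Node tables: node_id is a stand-in for SQL Server's $node_id.
    CREATE TABLE Stores   (node_id INTEGER PRIMARY KEY, StoreID INTEGER);
    CREATE TABLE Products (node_id INTEGER PRIMARY KEY, ProductID INTEGER);
    INSERT INTO Stores   VALUES (11, 1), (12, 2);
    INSERT INTO Products VALUES (21, 100), (22, 200);

    -- Edge table: from_id and to_id stand in for $from_id and $to_id.
    CREATE TABLE Orders (from_id INTEGER, to_id INTEGER, OrderQty INTEGER);

    -- Look up the node ID on each side of the relationship and store the pair.
    INSERT INTO Orders (from_id, to_id, OrderQty)
    SELECT s.node_id, p.node_id, ol.OrderQty
    FROM OrderLine ol
    JOIN Stores s   ON ol.CustomerID = s.StoreID
    JOIN Products p ON ol.ProductID  = p.ProductID;
""")

edges = conn.execute(
    "SELECT from_id, to_id, OrderQty FROM Orders ORDER BY from_id"
).fetchall()
print(edges)
```

In this sketch, customer 3 has no matching row in Stores, so the inner join drops that order line and no edge is created for it. The same thing happens in the T-SQL above for customers that are not stores.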

The next step is to create and populate the Sells edge table, which works much the same way as the Orders table, even when it comes to the user-defined columns. The main difference is that the relationships originate with the SalesReps table, as shown in the following T-SQL code:

  DROP TABLE IF EXISTS graph.Sells;
  GO
  CREATE TABLE graph.Sells (
    OrderDate DATETIME NOT NULL,
    OrderQty SMALLINT NOT NULL,
    LineTotal MONEY NOT NULL
  ) AS EDGE;
  INSERT INTO graph.Sells ($from_id, $to_id, OrderDate, OrderQty, LineTotal)
  SELECT s.node1, p.node2, h.OrderDate, d.OrderQty, d.LineTotal
  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesOrderDetail d
    ON h.SalesOrderID = d.SalesOrderID
    INNER JOIN (SELECT $node_id AS node1, SalesRepID FROM graph.SalesReps) s
    ON h.SalesPersonID = s.SalesRepID
    INNER JOIN (SELECT $node_id AS node2, ProductID FROM graph.Products) p
    ON d.ProductID = p.ProductID;

The fact that the Orders and Sells tables include the same user-defined columns points to the possibility that you could create a fifth node table for sales orders and then include columns such as OrderDate in there. However, this approach could make your schema and queries unnecessarily complicated, while providing little benefit. On the other hand, this approach helps to eliminate duplicate data. As with any database, the exact layout of your graph model will depend on the type of data you’re storing and how you plan to query that data.

The last step is to create and populate the Supplies table. In this case, the structure for the relationships is available through the ProductVendor table:

  DROP TABLE IF EXISTS graph.Supplies;
  GO
  CREATE TABLE graph.Supplies (
    StandardPrice MONEY NOT NULL
  ) AS EDGE;
  INSERT INTO graph.Supplies ($from_id, $to_id, StandardPrice)
  SELECT v.node1, p.node2, pv.StandardPrice
  FROM Purchasing.ProductVendor pv
    INNER JOIN (SELECT $node_id AS node1, VendorID FROM graph.Vendors) v
    ON pv.BusinessEntityID = v.VendorID
    INNER JOIN (SELECT $node_id AS node2, ProductID FROM graph.Products) p
    ON pv.ProductID = p.ProductID;

The ProductVendor table does all the product-vendor mapping for you and includes the StandardPrice values. You need only join this table to the Vendors and Products tables to get the originating and terminating node IDs.

Retrieving Store Sales Data

With the graph tables now defined and populated, you’re ready to start querying them, just like you saw in the second article in this series. For example, you can use the following SELECT statement to return information about the products that each store has ordered:

  SELECT s.StoreName, p.ProductName, o.OrderQty, o.LineTotal,
    CAST(o.OrderDate AS DATE) AS OrderDate
  FROM graph.Stores s, graph.Orders o, graph.Products p
  WHERE MATCH(s-(o)->p)
  ORDER BY s.StoreID;

The SELECT statement uses the MATCH function to specify what data to retrieve. As described in the second article, the function lets you define a search pattern based on the relationships between nodes. You can use the function only in the WHERE clause of a query that targets node and edge tables. The following table shows part of the results that the SELECT statement returns. (The statement returns over 60,000 rows.)

[Figure: partial query results]

In the above example, the MATCH clause specifies the relationship store orders product. If you were to retrieve the same data directly from the relational tables, you could not use the MATCH clause. Instead, your query would look similar to the following:

  SELECT c.CustomerID, c.StoreID, st.Name AS StoreName, p.Name AS ProductName,
    d.OrderQty, d.LineTotal, s.OrderDate
  FROM Sales.SalesOrderHeader s INNER JOIN Sales.SalesOrderDetail d
    ON s.SalesOrderID = d.SalesOrderID
    INNER JOIN Sales.Customer c ON s.CustomerID = c.CustomerID
    INNER JOIN Sales.Store st ON c.StoreID = st.BusinessEntityID
    INNER JOIN Production.Product p ON d.ProductID = p.ProductID
  WHERE p.FinishedGoodsFlag = 1
  ORDER BY st.BusinessEntityID;

Although this query is more complex than the previous one, you can use it without having to create and populate graph tables. As with any data, you’ll have to determine on a case-by-case basis when a graph database will be useful to your circumstances and which structure will deliver the best-performing queries.

Returning now to the graph tables, you can modify the preceding example by grouping the data based on the stores and products, as shown in the following example:

  SELECT s.StoreName, p.ProductName,
    SUM(o.OrderQty) AS TotalItems, SUM(o.LineTotal) AS TotalAmount
  FROM graph.Stores s, graph.Orders o, graph.Products p
  WHERE MATCH(s-(o)->p)
  GROUP BY s.StoreName, p.ProductName
  HAVING SUM(o.OrderQty) > 100
  ORDER BY s.StoreName;

As you can see, you can use the MATCH function in conjunction with other clauses, including the HAVING clause, which, in this case, limits the results to rows with a total quantity greater than 100. The following figure shows the data now returned by the SELECT statement.

[Figure: query results]

When implementing a graph database based on existing relational data, you might want to copy only part of the data set into the graph tables, in which case, you’ll likely need to create queries that can retrieve data from both the graph and relational tables. One way to achieve this is to define a common table expression (CTE) that retrieves the graph data and then use the CTE when retrieving the relational data, as shown in the following example:

  WITH StoreOrders AS
  (
    SELECT s.StoreName, p.ProductID, p.ProductName,
      SUM(o.OrderQty) AS TotalItems, SUM(o.LineTotal) AS TotalAmount
    FROM graph.Stores s, graph.Orders o, graph.Products p
    WHERE MATCH(s-(o)->p)
    GROUP BY s.StoreName, p.ProductID, p.ProductName
    HAVING SUM(o.OrderQty) > 100
  )
  SELECT so.StoreName, so.ProductName, ps.Name AS ProductSubcategory,
    pc.Name AS ProductCategory, so.TotalItems, so.TotalAmount
  FROM StoreOrders so INNER JOIN Production.Product pr
    ON so.ProductID = pr.ProductID
    INNER JOIN Production.ProductSubcategory ps
    ON pr.ProductSubcategoryID = ps.ProductSubcategoryID
    INNER JOIN Production.ProductCategory pc
    ON ps.ProductCategoryID = pc.ProductCategoryID
  ORDER BY so.StoreName;

In this case, the outer SELECT statement joins the data from the CTE to the Product, ProductSubcategory, and ProductCategory tables in order to include the product categories and subcategories in the results, as shown in the following figure.

[Figure: query results]

Being able to access both graph and relational data makes it possible to implement a graph database for those complex relationships that can justify the additional work, while still retaining the basic relational structure for all other data.

Retrieving Sales Rep and Vendor Data

Of course, once you have your graph tables in place, you can run a query against any of them. For example, the following query returns a list of sales reps and the products they have sold, along with details about the orders:

  SELECT r.LastName + ', ' + r.FirstName AS FullName, p.ProductName,
    CAST(s.OrderDate AS DATE) AS OrderDate, s.OrderQty, s.LineTotal
  FROM graph.SalesReps r, graph.Sells s, graph.Products p
  WHERE MATCH(r-(s)->p)
  ORDER BY r.LastName, r.FirstName;

As you can see, retrieving information about the Sells relationships works just like returning data about the Orders relationships, but now the results are specific to each sales rep, as shown in the following figure.

[Figure: partial query results]

The results shown here are only a small portion of the returned data. The statement actually returns over 60,000 rows. However, you can aggregate the data just as you saw earlier:

  SELECT r.LastName + ', ' + r.FirstName AS FullName, YEAR(s.OrderDate) AS OrderYear,
    SUM(s.OrderQty) AS TotalItems, CAST(SUM(s.LineTotal) AS INT) AS TotalSales
  FROM graph.SalesReps r, graph.Sells s, graph.Products p
  WHERE MATCH(r-(s)->p)
  GROUP BY r.LastName, r.FirstName, YEAR(s.OrderDate)
  ORDER BY r.LastName, r.FirstName, YEAR(s.OrderDate);

Now the SELECT statement returns only 58 rows, with the first 10 shown below.

[Figure: first 10 rows of the query results]

There’s little difference between returning data based on the Orders relationships or the Sells relationships, except that the originating nodes are different. You can also take the same approach to retrieve vendor data. Just be sure to update the table alias references as necessary, as shown in the following example:

  WITH VendorProducts AS
  (
    SELECT v.VendorName, p.ProductID, p.ProductName,
      AVG(s.StandardPrice) AS AvgPrice
    FROM graph.Vendors v, graph.Supplies s, graph.Products p
    WHERE MATCH(v-(s)->p)
    GROUP BY v.VendorName, p.ProductID, p.ProductName
  )
  SELECT vp.VendorName, vp.ProductName, ps.Name AS ProductSubcategory,
    pc.Name AS ProductCategory
  FROM VendorProducts vp INNER JOIN Production.Product pr
    ON vp.ProductID = pr.ProductID
    INNER JOIN Production.ProductSubcategory ps
    ON pr.ProductSubcategoryID = ps.ProductSubcategoryID
    INNER JOIN Production.ProductCategory pc
    ON ps.ProductCategoryID = pc.ProductCategoryID
  WHERE pc.Name = 'Components'
  ORDER BY vp.VendorName, vp.ProductName;

This should all look familiar to you. The SELECT statement uses a CTE to join the graph and relational data together. The following table shows the first 10 rows of the 32 that the statement returns.

[Figure: first 10 rows of the query results]

As you can see, the results include the vendor and product names, along with the product subcategories and categories.

Digging into the Graph Data

Once you have the basics of querying your graph tables down, you can come up with other ways to understand the relationships between the nodes. For example, the following SELECT statement attempts to identify sales reps who might be focusing too heavily on certain vendors:

  SELECT r.LastName + ', ' + r.FirstName AS FullName, v.VendorName,
    CAST(SUM(sl.LineTotal) AS INT) AS TotalAmount
  FROM graph.SalesReps r, graph.Sells sl, graph.Products p,
    graph.Supplies sp, graph.Vendors v
  WHERE MATCH(r-(sl)->p<-(sp)-v)
  GROUP BY r.LastName, r.FirstName, v.VendorName
  HAVING SUM(sl.LineTotal) >
    (SELECT AVG(LineTotal) FROM graph.Sells) * 50
  ORDER BY r.LastName, r.FirstName, v.VendorName;

The statement groups the data by sales rep and then by vendor. The HAVING clause calculates an amount 50 times the average line total in the Sells table and compares it to the total sales for each combination of sales rep and vendor. Only combinations that exceed the calculated amount are included in the results, as shown in the following figure.

[Figure: query results]
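
The threshold logic in that HAVING clause can be sketched in a few lines of Python. This is a simplified illustration, not the query itself: the reps_over_threshold function, the rep and vendor names, and the amounts are all hypothetical, and the multiplier is set to 2 instead of 50 so that a four-row sample produces a result.

```python
from collections import defaultdict
from statistics import mean

def reps_over_threshold(sales, multiplier):
    """Return {(rep, vendor): total} for groups whose summed line totals
    exceed multiplier times the average line total across all rows."""
    threshold = mean(line_total for _, _, line_total in sales) * multiplier
    totals = defaultdict(float)
    for rep, vendor, line_total in sales:
        totals[(rep, vendor)] += line_total  # SUM(...) per GROUP BY key
    return {key: total for key, total in totals.items() if total > threshold}

# Hypothetical (sales rep, vendor, line total) rows.
sales = [
    ("Rep A", "Vendor X", 900.0),
    ("Rep A", "Vendor Y", 50.0),
    ("Rep B", "Vendor X", 100.0),
    ("Rep B", "Vendor Y", 150.0),
]

flagged = reps_over_threshold(sales, multiplier=2)
print(flagged)
```

The average line total here is 300, so the threshold is 600, and only the Rep A and Vendor X combination (900) exceeds it.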

By being able to return this type of information, you can identify patterns that point to anomalies or specific trends in the data set. For instance, suppose you now want to identify the products that stores have bought based on a specified product that they also bought (a scenario sometimes referred to as “customers who bought this also bought that”).

One way to get this information is to use a CTE to retrieve the IDs of the stores that ordered the specified product and then, for each store, return the list of other products that the store ordered. To achieve this, use the CTE to qualify your query so it returns only the other products that the stores bought:

  WITH StoreIDs AS
  (
    SELECT s.StoreID
    FROM graph.Stores s, graph.Orders o, graph.Products p
    WHERE MATCH(s-(o)->p)
      AND p.ProductName = 'Sport-100 Helmet, Blue'
    GROUP BY s.StoreID, s.StoreName
    HAVING SUM(o.OrderQty) > 75
  )
  SELECT s.StoreName, p.ProductName
  FROM graph.Stores s, graph.Orders o, graph.Products p
  WHERE MATCH(s-(o)->p)
    AND s.StoreID IN (SELECT StoreID FROM StoreIDs)
    AND p.ProductName <> 'Sport-100 Helmet, Blue'
  GROUP BY s.StoreName, p.ProductName
  HAVING SUM(o.OrderQty) > 75
  ORDER BY StoreName, ProductName;

The outer SELECT statement returns the list of products that each of the three stores has ordered. The key is to use the IN operator in a WHERE clause condition to compare the StoreID value to the list of store IDs returned by the CTE. You should also include a WHERE clause condition to exclude the product Sport-100 Helmet, Blue. The SELECT statement returns the results shown in the following figure.

[Figure: query results]
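
The two-step logic (first find the qualifying stores, then list their other products) can also be sketched in plain Python. The also_bought function, store names, product names, and quantities below are hypothetical and exist only to show the shape of the computation.

```python
from collections import defaultdict

def also_bought(orders, target, min_qty):
    """Return sorted (store, product) pairs for stores that ordered more than
    min_qty of target, listing their other products ordered above min_qty."""
    totals = defaultdict(int)
    for store, product, qty in orders:
        totals[(store, product)] += qty  # SUM(OrderQty) per store and product

    # Step 1 (the CTE): stores that bought enough of the target product.
    stores = {s for (s, p), q in totals.items() if p == target and q > min_qty}

    # Step 2 (the outer query): other products those stores bought in quantity.
    return sorted(
        (s, p) for (s, p), q in totals.items()
        if s in stores and p != target and q > min_qty
    )

# Hypothetical (store, product, quantity) rows.
orders = [
    ("Store A", "Helmet", 80),
    ("Store A", "Gloves", 90),
    ("Store A", "Socks", 10),
    ("Store B", "Helmet", 20),
    ("Store B", "Gloves", 200),
]

result = also_bought(orders, "Helmet", 75)
print(result)
```

Store B ordered plenty of gloves but too few helmets, so it never enters the store list and its gloves are not reported.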

There are other ways you can get at “customers who bought this also bought that” information, such as using Python or R, but this approach provides a relatively simple way to get the data from a graph database, without having to jump through too many hoops.

Making the Best of Both Worlds

Because the graph database features are integrated with the database engine, there’s no reason you can’t work with graph and relational data side by side, depending on your application requirements and the nature of your data. This integration also gives you the flexibility to incorporate graph tables into an existing relational structure or make them both part of the design when planning for a new application. Keep in mind, however, that the graph features are still new to SQL Server and lack some of the advanced capabilities available in more established graph products. Perhaps after a couple more releases, SQL Server will be a more viable contender in the graph database market, at least when used in conjunction with relational data.

SQL – Simple Talk

SQL Server Graph Databases – Part 2: Querying Data in a Graph Database

April 8, 2018   BI News and Info

The series so far:

  1. SQL Server Graph Databases – Part 1: Introduction
  2. SQL Server Graph Databases – Part 2: Querying Data in a Graph Database

Microsoft incorporated the graph database in SQL Server 2017, providing a logical structure for storing and querying data sets that contain complex many-to-many or hierarchical relationships. The first article in this series introduced you to the basics of graph databases and described how to define node and edge tables and populate them with data. In this article, we turn to the querying side of the equation, with a focus on retrieving related data in multiple node tables.

For the examples in this article, I used the FishGraph database from the first article in this series, except that I added more sample data. The database is based on a fictitious fish-lovers forum and includes three node tables (FishSpecies, FishLover, and FishPost) and three edge tables (Likes, Posts, and LinksTo). The following figure shows the data model used to build the database.

[Figure: FishGraph data model]

The rectangles represent the nodes, and the arrows connecting the nodes represent the edges, with the arrows pointing in the direction of the relationships. You can download the T-SQL script used to create and populate the database at the bottom of this article.

Introducing the MATCH Function

For the most part, querying tables in a graph database works much the same way as querying regular relational tables, although there are some limitations, such as not being able to perform cross-database queries on graph tables. (For more information about graph database limitations, refer to the Microsoft document SQL Graph Architecture.)

Despite the limitations, you should find that most queries work as expected. For example, the following SELECT statement joins the FishLover, Likes, and FishSpecies tables in order to retrieve a list of users who like certain fish species:

SELECT fl.Username, fs.CommonName, fs.ScientificName
FROM FishLover fl INNER JOIN Likes lk
    ON fl.$node_id = lk.$from_id
  INNER JOIN FishSpecies fs
    ON lk.$to_id = fs.$node_id;

The SELECT statement uses the $node_id, $from_id, and $to_id column aliases to join the tables and return the data shown in the following figure.

[Figure: query results]

As you can see, querying graph tables is a fairly straightforward process, especially when you take advantage of the graph column aliases. Of course, if you’re trying to retrieve data based on more complex relationships, the query itself also becomes more complex and can even get a bit unwieldy. For this reason, Microsoft has added the MATCH function for retrieving related data from graph tables.

The MATCH function lets you define a search pattern based on the relationships between nodes. You can use the function only in the WHERE clause of a SELECT statement that queries node and edge tables. The following syntax shows the elements that go into defining the search pattern:

MATCH(<search_pattern>)

<search_pattern> ::=
    <node_alias>
    { -(<edge_alias>)-> | <-(<edge_alias>)- }
    <node_alias>
    [ { AND <search_pattern> } [ ...n ] ]

<node_alias> ::=
    node_table_name | node_alias

<edge_alias> ::=
    edge_table_name | edge_alias

A search pattern can define one or more relationships. For each relationship, you must identify the originating and terminating nodes, as well as the edge that ties the two nodes together. You must also specify the direction of the relationship, using dashes and arrows, with the edge situated between the two nodes. For example, if you want to define a single relationship that originates with node1 and terminates with node2, you would use the following syntax for your WHERE clause:

WHERE MATCH(node1-(edge)->node2)

Notice that the edge is enclosed in parentheses, with a dash preceding the edge, and a dash and right arrow following the edge. This defines a relationship that moves from left-to-right. You can reverse this order by specifying node1 on the right side of the search pattern and node2 on the left side, with the arrow pointing in the opposite direction:

WHERE MATCH(node2<-(edge)-node1)

In either case, the search pattern indicates that the WHERE clause should return only those rows in which a relationship exists between node1 (the originating node) and node2 (the terminating node), as defined in the edge element.
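
One way to think about a single-relationship pattern is as a pair of joins through the edge table, from the originating node to the edge and from the edge to the terminating node. The Python sketch below shows that reading with the standard sqlite3 module. It is an approximation under stated assumptions: SQLite has no MATCH function, the node_id, from_id, and to_id columns stand in for SQL Server's pseudo-columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of two node tables and one edge table.
    CREATE TABLE FishLover   (node_id INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE FishSpecies (node_id INTEGER PRIMARY KEY, CommonName TEXT);
    CREATE TABLE Likes       (from_id INTEGER, to_id INTEGER);
    INSERT INTO FishLover   VALUES (1, 'hooked'), (2, 'deepdive');
    INSERT INTO FishSpecies VALUES (10, 'Clownfish'), (11, 'Blue tang');
    INSERT INTO Likes       VALUES (1, 10), (1, 11), (2, 11);
""")

# Join-based reading of MATCH(Lover-(Likes)->Species): the edge's from_id
# must point at the lover and its to_id at the species.
rows = conn.execute("""
    SELECT fl.Username, fs.CommonName
    FROM FishLover fl
    JOIN Likes lk       ON fl.node_id = lk.from_id
    JOIN FishSpecies fs ON lk.to_id   = fs.node_id
    ORDER BY fl.Username, fs.CommonName
""").fetchall()
print(rows)
```

Writing the pattern right to left changes nothing in this reading: the edge still runs from the lover to the species, so the same joins apply.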

With these basics in mind, you can rewrite the SELECT statement above to simplify the query:

SELECT Lover.Username, Species.CommonName, Species.ScientificName
FROM FishLover Lover, Likes, FishSpecies Species
WHERE MATCH(Lover-(Likes)->Species);

In the FROM clause, you simply list the participating tables, without the ON clause, providing table aliases where appropriate. You can then reference the aliases in the search pattern of the MATCH function. In this case, the search pattern defines the relationship fish lover likes a fish species. The statement will return the same results as those returned by the SELECT statement above.

You can also reverse the order of the relationship so that it is defined from right-to-left:

SELECT Lover.Username, Species.CommonName, Species.ScientificName
FROM FishLover Lover, Likes, FishSpecies Species
WHERE MATCH(Species<-(Likes)-Lover);

Again, the SELECT statement returns the same results as the previous two SELECT statements.

You can include other elements in the WHERE clause, in addition to the MATCH function. For example, the following WHERE clause adds a search condition specifying that only rows with a Username value of hooked should be returned:

SELECT Species.CommonName, Species.ScientificName
FROM FishLover Lover, Likes, FishSpecies Species
WHERE MATCH(Lover-(Likes)->Species)
  AND Lover.Username = 'hooked';

The statement now returns the results shown in the following figure.

[Figure: query results]

As you can see, using the MATCH function to define a search pattern based on a single relationship is a fairly straightforward process. In many cases, however, you’ll want to return data based on multiple relationships, which is where the function can be particularly handy.

Creating Compound MATCH Expressions

If you refer back to the syntax for the MATCH function, you’ll notice that you can use the AND operator when defining your search pattern, allowing you to string together multiple relationships within a single expression. Note, however, that the MATCH function does not support the OR operator or the NOT operator, so the logic you can define is somewhat limited. Even so, the AND operator can still be very useful. For example, the following search pattern uses the operator to string together two relationships:

SELECT Lover.Username, Post.Title, Species.CommonName
FROM FishLover Lover, Likes, FishPost Post, LinksTo, FishSpecies Species
WHERE MATCH(Lover-(Likes)->Post AND Post-(LinksTo)->Species);

For the SELECT statement to return a row, a fish lover must like a fish post and the fish post must link to a fish species. In this case, the statement returns only two rows, as shown in the following figure.

[Figure: query results]

By being able to link together multiple relationships, you can dig more deeply into how the nodes in a graph database are interconnected. For example, the preceding SELECT statement might help to determine whether users are more inclined to like a post that specifically links to a fish species.

In some cases, you can link together relationships without using the AND operator, as long as your search pattern defines the same logic. For instance, you can rewrite the preceding SELECT statement by eliminating the AND operator and one of the references to the Post table alias:

SELECT Lover.Username, Post.Title, Species.CommonName
FROM FishLover Lover, Likes, FishPost Post, LinksTo, FishSpecies Species
WHERE MATCH(Lover-(Likes)->Post-(LinksTo)->Species);

The SELECT statement returns the same results as the preceding one, even though the search pattern has been simplified. You can also define a search pattern that contains two relationships terminating with the same node, as in the following example:

SELECT Lover.Username, Species.CommonName, Post.Title
FROM FishLover Lover, Likes, FishSpecies Species, FishPost Post, LinksTo
WHERE MATCH(Lover-(Likes)->Species<-(LinksTo)-Post);

The first relationship (fish lover likes a fish species) is defined from left to right, and the second relationship (fish post links to a fish species) is defined from right to left. As a result, the SELECT statement returns only those rows in which a fish species is both liked and linked to, giving us the results shown in the following figure.

[Figure: query results]
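
The effect of two relationships terminating at the same node can be sketched in plain Python by treating each edge table as a list of (from, to) pairs. The usernames, post titles, and species in this sketch are hypothetical sample values.

```python
# Hypothetical edge data as (from, to) pairs.
likes = [("hooked", "Clownfish"), ("deepdive", "Blue tang")]
links_to = [("Reef tank tips", "Clownfish"), ("Feeding guide", "Guppy")]

# Lover-(Likes)->Species<-(LinksTo)-Post: a row qualifies only when
# both edges end at the same species node.
rows = sorted(
    (lover, species, post)
    for lover, species in likes
    for post, linked_species in links_to
    if species == linked_species
)
print(rows)
```

Blue tang is liked but never linked to, and Guppy is linked to but never liked, so neither appears in the output.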

You can also define search patterns that include more than two relationships, using the AND operator where appropriate. For example, the search pattern in the following SELECT statement includes three relationships but only one instance of the AND operator:

SELECT Lover.Username, Post.Title, Species.CommonName
FROM FishLover Lover, Likes Likes1, FishPost Post,
  LinksTo, FishSpecies Species, Likes Likes2
WHERE MATCH(Lover-(Likes1)->Post-(LinksTo)->Species
  AND Lover-(Likes2)->Species);

Because the Likes table is referenced twice within the search pattern, it must be included twice in the FROM clause, with a different alias assigned to each instance. The search pattern then uses these aliases when defining the three relationships (fish lover likes a fish post, fish post links to a fish species, and fish lover likes a fish species). The SELECT statement will return only those rows in which a fish species is both liked and linked to from a post that is liked, as shown in the following figure.

In this case, the user underwatercasey likes both the fish post and fish species, and the fish post links to the fish species. You can also rewrite the search pattern to eliminate the AND operator altogether:

SELECT Lover.Username, Post.Title, Species.CommonName
FROM FishLover Lover, Likes Likes1, FishPost Post,
  LinksTo, FishSpecies Species, Likes Likes2
WHERE MATCH(Lover-(Likes1)->Post-(LinksTo)->Species<-(Likes2)-Lover);

The SELECT statement returns the same results as the preceding statement but simplifies the search pattern. In some cases, however, you might find that the AND operator makes it easier to read and troubleshoot your code as you heap on more and more relationships.

Defining a Self-referencing Query

Because of the way in which a graph database is structured, it is just as easy to perform a self-referencing query as any other type of query. For example, the following SELECT statement returns a list of fish posts that link to other posts:

SELECT Post1.Title Title1, Post2.Title Title2
FROM FishPost Post1, LinksTo, FishPost Post2
WHERE MATCH(Post1-(LinksTo)->Post2);

To define the relationship in the search pattern, you must include two instances of the FishPost table in the FROM clause, assigning a different alias to each instance, similar to how you included multiple instances of the Likes edge table in the preceding two examples. The SELECT statement returns the results shown in the following figure.

[Figure: query results]

You can use the same logic to determine which fish lovers like other fish lovers:

SELECT Lover1.Username User1, Lover2.Username User2
FROM FishLover Lover1, Likes, FishLover Lover2
WHERE MATCH(Lover1-(Likes)->Lover2);

This time the FROM clause includes two instances of the FishLover table, with a unique alias assigned to each one, giving us the results shown in the following figure.

[Figure: query results]

You can also use this approach when defining more than two relationships in your search pattern. For example, the following SELECT statement returns a list of users that are liked by user hooked, along with the users that they like:

SELECT Lover2.Username User2, Lover3.Username User3
FROM FishLover Lover1, Likes Likes1, FishLover Lover2,
  Likes Likes2, FishLover Lover3
WHERE MATCH(Lover1-(Likes1)->Lover2-(Likes2)->Lover3)
  AND Lover1.Username = 'hooked';

The FROM clause now includes three instances of the FishLover table and two instances of the Likes table. The search pattern uses these instances to define the relationships fish lover1 likes fish lover2 and fish lover2 likes fish lover3, giving us the results shown in the following figure.

[Figure: query results]

Essentially what we have here is a friend-of-a-friend type scenario:

  • The user hooked likes the users powerangler and deepdive.
  • The user powerangler likes the user deepdive.
  • The user deepdive likes the user underwatercasey.

Self-referencing queries make it easy to discover how data is related wherever it resides. Because the relationships are stored within the edge tables, it does not matter if the relationships cross multiple nodes, are contained within a single node, or are a combination of both.
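
The friend-of-a-friend traversal described above can be sketched as a two-hop walk over a list of edges. The Python below uses only the four likes listed in the bullets; the actual Likes table may contain more rows, and the liked_by and friends_of_friends helpers are hypothetical names introduced for this illustration.

```python
# Edges taken from the bullet list above, as (from, to) pairs.
likes = [
    ("hooked", "powerangler"),
    ("hooked", "deepdive"),
    ("powerangler", "deepdive"),
    ("deepdive", "underwatercasey"),
]

def liked_by(user, edges):
    """Users that `user` likes (one hop along the Likes edge)."""
    return [to for frm, to in edges if frm == user]

def friends_of_friends(user, edges):
    """(User2, User3) pairs for Lover1-(Likes1)->Lover2-(Likes2)->Lover3."""
    return [
        (user2, user3)
        for user2 in liked_by(user, edges)
        for user3 in liked_by(user2, edges)
    ]

pairs = friends_of_friends("hooked", likes)
print(pairs)
```

Each pair corresponds to one row of the User2 and User3 columns: powerangler leads to deepdive, and deepdive leads to underwatercasey.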

Combining the MATCH Function with Other T-SQL Features

So far in this article, the SELECT statements we’ve covered have included only the SELECT, FROM, and WHERE clauses. However, you can use the MATCH function with other query types, such as queries that group and aggregate data. For example, the following SELECT statement includes the GROUP BY and ORDER BY clauses:

SELECT Lover.Username, COUNT(*) AS TotalLikes
FROM FishLover Lover, Likes, FishSpecies Species
WHERE MATCH(Lover-(Likes)->Species)
GROUP BY Lover.Username
ORDER BY Username;

The GROUP BY clause groups the data by the Username values in the FishLover table in order to provide a count of total likes per user, as shown in the following results.

[Figure: query results]
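
In plain Python terms, the grouping amounts to counting edges per originating node. The sketch below assumes a hypothetical list of (username, species) pairs standing in for the matched Likes rows.

```python
from collections import Counter

# Hypothetical matched rows: one (username, species) pair per Likes edge.
likes = [
    ("hooked", "Clownfish"),
    ("hooked", "Blue tang"),
    ("deepdive", "Blue tang"),
]

# GROUP BY Username with COUNT(*), then ORDER BY Username.
total_likes = sorted(Counter(username for username, _ in likes).items())
print(total_likes)
```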

Although this is a fairly basic example, it demonstrates how you can build various types of queries based on data from a graph database, just like queries against relational tables. You can even use a graph database in conjunction with Microsoft Machine Learning Services (MLS). For example, the following T-SQL script retrieves data from a graph database and then uses it in a Python script to generate a bar chart:

DECLARE @pscript NVARCHAR(MAX);
SET @pscript = N'
# import matplotlib modules
import matplotlib
matplotlib.use("PDF")
import matplotlib.pyplot as plt

# define df data frame
df = InputDataSet

# create bar chart object
pt = df.plot.barh()
pt.set_yticklabels(labels=df.Username)

# save bar chart to PDF file
plt.savefig("c:\\datafiles\\TotalFishLikes.pdf",
  bbox_inches="tight", pad_inches=.5)';

DECLARE @sqlscript NVARCHAR(MAX);
SET @sqlscript = N'
  SELECT Lover.Username, COUNT(*) AS TotalLikes
  FROM FishLover Lover, Likes, FishSpecies Species
  WHERE MATCH(Lover-(Likes)->Species)
  GROUP BY Lover.Username
  ORDER BY Username DESC;';

EXEC sp_execute_external_script
  @language = N'Python',
  @script = @pscript,
  @input_data_1 = @sqlscript;
GO

I won’t go into too much detail about running the Python script because I covered it in my series on Python and MLS. (The series starts with the article SQL Server Machine Learning Services – Part 1: Python Basics.) Basically, the T-SQL script does the following:

  • Declares the @pscript variable and assigns the Python script to the variable. The script creates a data frame based on the data returned from the graph database and then generates a bar chart based on the data.
  • Declares the @sqlscript variable and assigns the SELECT statement to the variable. This is the same SELECT statement shown in the previous example, except that the ORDER BY clause sorts the usernames in descending order.
  • Passes in the two variables as parameter values to the sp_execute_external_script stored procedure, which retrieves the graph data and runs the Python code.

When you run the sp_execute_external_script stored procedure, the Python engine generates the bar chart shown in the following figure.

[Figure: bar chart of total likes per user]

As these examples demonstrate, graph databases have been well integrated into the SQL Server ecosystem, providing you with a great deal of flexibility for working with graph data. MLS in particular opens up a wide range of possibilities for analyzing and visualizing the data in a variety of ways.

Conclusion

As with storing and populating graph tables, querying those tables is a relatively painless process once you understand the fundamentals. Where applicable, you can use the MATCH function to streamline your queries. When that’s not practical, you can revert to regular queries to get the data you need. The only trick is to make sure you’re working against an instance of SQL Server 2017 and not an earlier version. Fortunately, you can download the Developer edition for free, which comes with all the features available in the Enterprise edition, including support for graph databases. You can then use what you’ve learned so far in this series to start exploring graph databases on your own.

SQL – Simple Talk

How to Map Relational Data to a Graph Database

March 31, 2018   TIBCO Spotfire

Organizations, and especially newer roles such as business analysts, data scientists, and marketing strategists, are trying to gain better insight from the relationships among the data records of their enterprise. To better serve these new ways of searching for information, it is becoming critical to find a way to expose existing enterprise data in a graph database format. Unfortunately, most of today’s data is stored in relational database management systems (RDBMS).

In the relational data storage model, there is no physical link to indicate the relationship between data records. Records are linked only logically, by a special attribute called a foreign key, which makes searching for related records less efficient. To find relationships in a relational data storage model, you have to build complex queries that repeat the same join operations over and over again. While relational databases are widely used, they are not ideal for storing and querying highly interconnected data.

On the other hand, the graph data storage model keeps data attributes together naturally (in nodes). Physical links called edges connect related nodes, highlighting the relationships between data sets. This focus on relationships makes a graph database a far better home for today’s interconnected data than a traditional relational database.

But since most of today’s data is still in relational database format, how do you convert your current data to a graph database format? Mapping data from a relational database to a graph database is fundamentally a task of converting one representation of relationships into the other. More specifically, we can use the foreign keys of the relational data model to build edges, thus transforming loosely coupled data records into a tightly connected group of nodes. Nodes form the basis of graph database architecture.
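As a minimal illustration of the foreign-key-to-edge idea, the following Python sketch converts a few rows into nodes and edges. Everything in it is hypothetical: the customers and orders tables, their columns, the PLACED_BY edge label, and the rows_to_graph helper are made up for this example and are not taken from the brief.

```python
# Hypothetical relational rows: each dict is one record from a made-up table.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.0},
]


def rows_to_graph(customers, orders):
    """Turn each row into a node and each foreign key into an edge."""
    nodes = {}
    edges = []
    for row in customers:
        nodes[("Customer", row["customer_id"])] = {"name": row["name"]}
    for row in orders:
        order_key = ("Order", row["order_id"])
        nodes[order_key] = {"total": row["total"]}
        # The foreign key column becomes an explicit edge, not a node attribute.
        edges.append((order_key, "PLACED_BY", ("Customer", row["customer_id"])))
    return nodes, edges


nodes, edges = rows_to_graph(customers, orders)
for source, label, target in edges:
    print(source, label, target)
```

The non-key columns stay on the node as attributes, while the customer_id foreign key disappears from the order node and reappears as a link between two nodes.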

For a detailed walk-through of how to do this, download the brief: “How to Map Relational Data to a Graph Database”. In it, we go step by step and use a well-known example to demonstrate an approach to map information from a relational database to a graph database.

The TIBCO Blog

Graph Databases: The Next Generation of Fraud Detection Technology

December 12, 2017   TIBCO Spotfire

This is a follow-up blog post in our Graph Database series. Learn what a graph database is and why it’s important here.

Today’s approach to fraud detection: Discrete Analysis

With its ability to help users spot patterns quickly and easily, graph database has applications in a number of areas, including fraud detection. Fraud accounts for billions of dollars lost per year, and each year fraudsters seem to get more sophisticated in outsmarting the banks. Unfortunately, the most common tactics of fraud are misdirection and burying patterns in large volumes of data. With traditional data storage techniques, it really isn’t possible to see beyond individual data points to the connections between them. That broader, pattern-level view has now become more feasible with graph database. The connected data view in graph database can more easily uncover these larger, complex patterns and make it harder for fraud to evade detection. To understand how useful an addition graph database can be in detecting fraud, it’s best to first understand Gartner’s layered approach to fraud detection.

Gartner analysts say, “No single layer of fraud prevention or authentication is enough to keep determined fraudsters out of enterprise systems.” They say that you need a combination of monitoring and controls to combat fraud, and therefore recommend a five-layered fraud prevention approach. This approach recommends that companies use all five layers, ranging from the most basic of security measures (secure browsing) to the most complex (analysis of relationships), and encourages companies to face the fact that some determined fraudsters will break through the first few layers of security.

Most of today’s available fraud prevention solutions address only the first four layers of Gartner’s recommended approach. For instance, banks mainly use transaction monitoring systems (TMS), which rely on relational databases. Due to their linear, discrete analysis approach, most of today’s solutions can usually spot trends and incidents only after they’ve happened, and only on a limited basis.

While the discrete approach is an easy one that helps users spot patterns and catch fraudsters acting alone, it doesn’t necessarily detect patterns between all the different data endpoints and is therefore not very useful in detecting elaborate crime rings. Further, modern fraud rings have become very familiar with the ways of discrete data analysis and know how to avoid detection with this approach. This leads us to the next frontier of fraud detection, which revolves around connected analysis.

Tomorrow’s approach to fraud detection: Connected Analysis

This is where graph database can really add value. Graph database addresses Gartner’s fifth layer of fraud prevention: entity link analysis. Graph database enables banks to look beyond the individual data points of discrete analysis to the connections that link them. With graph database, banks can see their data in “graphs” and more easily visualize patterns and opportunities to better predict when and where fraud might occur.
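To sketch what looking at the connections can mean in practice, the following Python example groups accounts that are linked through shared contact details. It is only an illustration under stated assumptions: the accounts, phone numbers, and addresses are made up, linked_groups is a hypothetical helper, and a real entity link analysis would run inside a graph database rather than in an in-memory dictionary.

```python
from collections import defaultdict

# Made-up accounts and the contact details they registered with.
accounts = {
    "acct_A": {"phone:555-0100", "addr:12 Elm St"},
    "acct_B": {"phone:555-0100", "addr:9 Oak Ave"},
    "acct_C": {"phone:555-0199", "addr:9 Oak Ave"},
    "acct_D": {"phone:555-0142", "addr:3 Pine Rd"},
}


def linked_groups(accounts):
    """Group accounts that are connected through shared identifiers."""
    # Build an undirected graph whose edges join accounts and identifiers.
    neighbors = defaultdict(set)
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            neighbors[account].add(identifier)
            neighbors[identifier].add(account)

    seen = set()
    groups = []
    for start in accounts:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this account.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in accounts:
                component.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(sorted(component))
    return groups


for group in linked_groups(accounts):
    print(group)
```

In this made-up data, acct_A and acct_C share no detail directly, yet both are tied to acct_B, so they land in the same group. Looking at each record on its own would miss that indirect link.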

Another important trait that makes graph database a valuable addition to any fraud prevention solution is its inherent speed in calculating relationships. Because the relationships in graph database are treated as being as valuable as the database records themselves, the engine that navigates the connections between nodes can do so efficiently, traversing millions of connections per second. Graph database enables quick extraction of new insight from large and complex databases to help uncover unknown interactions and relationships. This means that with a graph database, banks can process data and compute risks more quickly than with today’s relational databases, so they can spot opportunities and threats before the competition.

According to Forbes, graph database reduces false positives, improves false negative detection, eases investigations, and reduces overall fraud investigation costs. Fundamentally, fraud detection depends on the ability to analyze the relationships between customers and transactions and to recognize patterns or trends, all at the speed of today’s transactions. Graph database provides the speed and the ability to detect large patterns, making it the ideal addition to any fraud prevention solution.

The TIBCO Blog

Is there a Bond Graph toolbox or Package?

November 22, 2017   BI News and Info

It seems like there was one, but it doesn’t exist anymore.

Recent Questions – Mathematica Stack Exchange

© 2021 Business Intelligence Info