Sunday 24 September 2017

September 24, 2017

How to Hierarchical Query, Parents, Children and Their Cycles

This is a three-part Series
  1. How to Hierarchical Query, Parents, Children and Their Cycles
  2. How To Create Report in Oracle APEX based on Hierarchical Query
  3. How to use Hierarchical Query in Tree Plugin

Part-1 How to Hierarchical Query, Parents, Children and Their Cycles

Almost every one on this planet like trees right. Some of us like them as essential part of the world ecological system while others have created a very peculiar taste for it, like the timber mafia and the likes.

Usually you would imagine a tree standing up , however , programmers usually think of trees up-side-down,weird hmm... till it all make sense.😇

An Example of a Hierarchy is an Organization chart representing a hierarchy of employees.

First, Know your terminology.
  • You can have multiple trees in a hierarchical table. 
  • Node
    • A row in a table that represents a specific entry in a hierarchical tree structure. 
  • Root [node]
    • The uppermost node/Starting Point in a hierarchical structure. 
    • It has no parent. and only one root in any given tree
  • Parent [node]
    • A node that is one level up in a tree. It has a child node below it.
  • Child [node]
    • A node that is one level down in a tree. It has a parent node above it.
  • Leaf [node]
    • A node at the lowest levels with no children. As is analogous with the leaf of a tree there is nothing beyond it.
    • Leaf Nodes do not all need to be at the same level, but must be without children.
  • Level
    • A layer of nodes. From the Root and Traversing(hopping from one node to another) through Its children, on each hop of Node downards is the next Level.


Here is what we gonna do:

  1. The Foremost Clauses, Operators and The Template Query
  2. Research into Data to find How to Construct your Hierarchy
  3. A Little Extra Detail

The Foremost Clauses, Operators and The Template Query

The Foremost Clauses and Operators:

  • The START WITH condition1
    • All rows that satisfy START WITH condition1 are considered root rows.
    • If you don't specify the START WITH clause, all rows are considered root rows.
  • CONNECT BY condition2 clause and The PRIOR operator
    • condition2 must contain the PRIOR operator, which is used to identify columns from the parent row, 
    • condition2 cannot contain a subquery.
    • PRIOR is used to connect each child row to its parent row by connecting manager_emp_id in the child to emp_id in the parent
  • The LEVEL pseudocolumn
    • Shows the Level of the current row in the Hierarchy
  • ORDER SIBLINGS BY Column;
    • Sorting at each level while ensuring that child nodes remain underneath their parents, at the same time preserving the hierarchy

The Template Query

The Template Query will cover the following:

  • Tree Traversal Using ANSI SQL, 
  • Identifying the Number of Levels
  • Identifying Leaf Nodes
  • Finding the Path to a Node
  • Ordering Hierarchical Data


If You Use the Template Query You would at least know what is going on with the Hierarchy, You will get Enough Information to see the Hierarchy.
Code
Select LEVEL, LPAD('  ',3*(LEVEL - 1)) || Column AS  Alias,
  DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
 TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(Column, '=>'))) As Path
from
 Table t
START WITH Condition1
CONNECT BY Condition2
ORDER SIBLINGS BY Column;
Uses

  • LEVEL  => display which Level is the Current row on.
  • CONNECT_BY_ISLEAF => pseudocolumn that returns 1 if the current row is a leaf, and returns 0 if the current row is not a leaf
  • LPAD('  ',3*(LEVEL - 1)) ||  => to have a nice Indented Display with each level
  • TRIM(LEADING '>' FROM  TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(Column, '=>'))) => to have a nice display From the Root Node to the Current Node.
  • ORDER SIBLINGS BY Column; => for Ordering on a Column and ensuring that child nodes remain underneath their parent
I have used the scott[y] from Star Trek 😉 , schema throughout the article
Just plugin the Table and Columns Names to see the Result


Select LEVEL,
       DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
       LPAD('  ', 3 * (LEVEL - 1)) || t.ename Name,
       TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(t.ename, '=>'))) Path,
            CONNECT_BY_ROOT t.ename as "Top Manager"
           
  from emp t
 START WITH t.mgr is NULL
CONNECT BY PRIOR t.empno = t.mgr
 ORDER SIBLINGS BY t.ename;


PRIOR is used to connect each child row to its parent row by connecting manager_empno in the child to empno in the parent

Research into Data to find How to Construct your Hierarchy

Now it's time play around and research into data to see how will you construct your hierarchy.

  • Finding Root Nodes
  • Finding the Connection/Relationship between Nodes
  • Finding a Node's Immediate Parent
  • Finding Leaf Nodes


Finding Root Nodes

For Finding Root Nodes, We know that Root [node] The uppermost node/Starting Point in a hierarchical structure which has no parent
So I need employee which has no manager and hence t.mgr is NULL

Select t.empno ,t.ename , t.mgr
 FROM emp t
 where
  t.mgr is NULL

This will form the Starting base for my Hierarchical query.
As
START WITH t.mgr is NULL


Finding the Connection/Relationship between Nodes

As we know that each employee has a manager ,  except the root node ,  and the for each employee the manager(empno) is stored in the mgr Column, so we can use this information to construct our relatioship of parent-child.

CONNECT BY PRIOR t.empno = t.mgr

Finding a Node's Immediate Parent


We use a simple self-join to find the immediate parent, i.e. each employee's Manager.

Select e.ename "Employee" , m.ename "Manager"
 FROM emp e JOIN emp m
 ON e.mgr = m.empno
 Order by m.ename

Finding Leaf Nodes

Here is a Co-related Subquery using the EXISTS operator, to get the employees who does not have children, i.e. who are only employees but not a manager and thus whose empno is not in the mgr Column.

Select *
 FROM emp e
 WHERE NOT EXISTS (SELECT 1 FROM emp t where e.empno = t.mgr)
Remember I used the CONNECT_BY_ISLEAF  in the Template Query , however , here we are talking about research in to data for preparing and comparison for validaty.


A Little Extra Detail

  • Finding the Number of Levels
  • Checking for Ascendancy/Finding in the Sub-Tree
  • Listing Multiple Root Nodes
  • Listing the Top Few Levels of a Hierarchy
  • Aggregating a Hierarchy
  • Identifying Cycles and ignoring it

Finding the Number of Levels

Here are a few queries to find

The Maximum depth of the Hierarchy
Select MAX(LEVEL)        
  from emp t
 START WITH t.mgr is NULL
CONNECT BY PRIOR t.empno = t.mgr

The No of Nodes Per Level
Select LEVEL, COUNT(t.ename) NO_OF_NODES      
  from emp t
 START WITH t.mgr is NULL
CONNECT BY PRIOR t.empno = t.mgr
 GROUP BY LEVEL;

Checking for Ascendancy/Finding in the Sub-Tree

To see if a  Node is Direct or Indirect Ascendant(Parent) of  a [Child/Descendant] Node.Let's say If we want to see if a Manager is a Direct or Indirect Manager of an Employee. For example, to see if Jones is a Manager(Direct/Indirect) of  Smith or Ward
 Select LEVEL,
       DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
       LPAD('  ', 3 * (LEVEL - 1)) || t.ename Name,
       TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(t.ename, '=>'))) Path,
            CONNECT_BY_ROOT t.ename as "Top Manager"
         
  from emp t
  WHERE t.ename = 'SMITH'
START WITH t.ename = 'JONES'
CONNECT BY PRIOR t.empno = t.mgr
 ORDER SIBLINGS BY t.ename;
Here we START WITH t.ename = 'JONES'  ANd Look for 'SMITH' in the Sub-tree , WHERE t.ename = 'SMITH'


Listing Multiple Root Nodes

For Example Finding the Topmost Manager of each department
Select LEVEL,
       DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
       LPAD('  ', 3 * (LEVEL - 1)) || t.ename Name,
       TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(t.ename, '=>'))) Path,
            CONNECT_BY_ROOT t.ename as "Top Manager"
         
  from emp t
 START WITH t.mgr is NULL
CONNECT BY t.mgr = PRIOR t.empno
and t.deptno != PRIOR t.deptno
 ORDER SIBLINGS BY t.ename;

by adding and t.deptno != PRIOR t.deptno found the top-most managers.

Listing the Top Few Levels of a Hierarchy


So Listing on the Top-Managers
Select LEVEL,
       DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
       LPAD('  ', 3 * (LEVEL - 1)) || t.ename Name,
       TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(t.ename, '=>'))) Path,
            CONNECT_BY_ROOT t.ename as "Top Manager"
         
  from emp t
  Where LEVEL <= 2
 START WITH t.mgr is NULL
CONNECT BY t.mgr = PRIOR t.empno
 ORDER SIBLINGS BY t.ename;

We used the Where LEVEL <= 2 to limit the query to only the First  2 Levels.

Aggregating a Hierarchy


For example, you may want to sum the salaries of all employees reporting to a specific employee
Select SUM(t.sal)    
  from emp t
 START WITH t.ename = 'JONES'
CONNECT BY PRIOR t.empno = t.mgr
You may want to consider each employee as a root/Manager, and for each employee print out the sum of the salaries of all subordinate employees.

For Each Manager , The Sum of the Manager and  Subordinates Salary
 WITH
   T
AS
(
   SELECT
      CONNECT_BY_ROOT ENAME ENAME,
      SAL
   FROM
      EMP
   CONNECT BY
      MGR=PRIOR EMPNO
)
SELECT
   ENAME,
   SUM(SAL)
FROM
   T
GROUP BY
   ENAME;


 Select LEVEL,
       DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
       LPAD('  ', 3 * (LEVEL - 1)) || t.ename Name,
       TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(t.ename, '=>'))) Path,
            CONNECT_BY_ROOT t.ename as "Top Manager"
         
  from emp t

 --START WITH t.mgr is NULL
CONNECT BY PRIOR t.empno = t.mgr
 ORDER SIBLINGS BY t.ename;

For Each Manager , The Sum of its Subordinates Salary
WITH
   T
AS
(
  Select LEVEL,
       DECODE(CONNECT_BY_ISLEAF, 1, 'TRUE', 'FALSE') IS_LEAF,
       t.ename Name,
       CONNECT_BY_ROOT t.ename as Manager ,
       t.sal SALARY,
       TRIM(LEADING '>' FROM
            TRIM(LEADING '=' FROM SYS_CONNECT_BY_PATH(t.ename, '=>'))) Path      
  from emp t
 --START WITH t.mgr is NULL
Where   CONNECT_BY_ROOT t.ename != t.ename
CONNECT BY PRIOR t.empno = t.mgr
 ORDER SIBLINGS BY t.ename
)
SELECT
   Manager,
   SUM(SALARY)
FROM
   T
GROUP BY
   Manager;


Here are the important Points

  •  --START WITH t.mgr is NULL , is commented, Remember when I said, If you don't specify the START WITH clause, all rows are considered root rows. Here is exactly the case , we are considering all rows and their Mangers
  • Where   CONNECT_BY_ROOT t.ename != t.ename, therefore , employee will consider itself in the salary computation.
  • If you find that the With Clause is not applicable in your situation, Just replace it with an Inline View in the from Clause.


Identifying Cycles and Ignoring it

When a node's child is also its parent , then their is a cycle in the Hierarchy.

  • To Ignore Cycle you can use the CONNECT BY NOCYCLE  clause.
  • To identify a Cycle you can use CONNECT_BY_ISCYCLE 
    • pseudocolumn returns 1 if the current row has a child that is also its ancestor; otherwise, it returns 0
    • can be used only in conjunction with the CONNECT BY NOCYCLE  clause 

September 24, 2017

How To Treat The NULL Vacuum

It make some sense as the vacuum is something devoid of matter, something we do not really know about its contents, As does the NULL in a database.

Here are a few points that need consideration as NULL plays the role of a double-edged sword in a database, providing a very efficient way to store unknown values , however not treated wisely could have a nastier effect especially, during data manipulation:

  • The Term null value is inaccurate as null indicates the lack of a value.
  • A null is as being a marker for a missing value that might never be known, might be determined later, or is not applicable. 
  • Nulls can be stored in columns of any data type that are not restricted by NOT NULL or PRIMARY KEY
  • Nulls aren't equal (or unequal) to each other. You can't determine whether a null matches any other value, including another null, so the expressions NULL=any_value, NULL<>any_value, NULL=NULL, and NULL<>NULL are neither true nor false but unknown;
  • Use IS [NOT] NULL to detect a null;
  • Oracle treats an empty string ('') as a null
Select DECODE('',NULL, 'TRUE', 'FALSE') IS_NULL
From dual

With test as (Select '' as col from dual)
Select 'TRUE' IS_NULL
From test t
Where t.col is NULL

  • DISTINCT Treats all the nulls in a particular column as duplicates
  • For Sorting you can specify Order by col [NULLS (FIRST|LAST)]. If you don not specify and Sort by a column that contains nulls, the nulls will be either NULLS LAST if ASC , also remember ASC is the Default if not specified, and NULL FIRST if desc
  • Any computations involving a Null can result in Null
Select 12+3+NULL Sum
from dual
so better to use always
Select 12+3+NVL(NULL,0) Sum
from dual
  • Aggregate functions, such as COUNT(COLUM_HAVING_NULLS),SUM, MAX, MIN, AVG, etc., all ignore NULLs while  COUNT(*) and GROUPING doesn't ignore NULLS 
  • If the grouping column in a GROUP BY clause contains nulls, all the nulls are put in a single group
  • Nulls affect the results of joins, the Inner Joins and The Left and Right Outer Joins. 
  • Nulls can cause problems in subqueries. I  suggest to do some googling on the topic as you can get some weird results just by placing keywords in different positions in your query. For example,
    WHERE NOT col IN (SELECT col FROM table1); can yeild different results from WHERE  col NOT IN (SELECT col FROM table1);  
  • In Set Operations, UNION,MINUS,INTERSECT will treat nulls as duplicates, Except for the UNION ALL which treats each null as different.
select 1, NULL from dual 
Union 
select 2, NULL from dual
Union
Select 1, NULL from dual;

select 1, NULL from dual 
Union
select 2, NULL from dual
Minus
Select 1, NULL from dual;

select 1, NULL from dual 
Union 
select 2, NULL from dual
Intersect
Select 1, NULL from dual;

  • Sometimes we call the same function twice in our query without any need thereof.Here is a tip when NVL is used in the where clause.
NVL(PRIMARY_QTY,0) > 0
Means 3 Things
  1. Nulls  converted to Zero(0) and
  2. PRIMARY_QTY is > 0
  3. so in the Select we can just use  PRIMARY_QTY no need to use NVL(PRIMARY_QTY,0) PRIMARY_QTY , since it's already filtered out

Some Resources