Tuesday, April 24, 2012

T-SQL Programming - Useful Tips - Text Search

SQL Server 2008 R2

TEXT SEARCH

Case Sensitive Search
1. CHARINDEX('ExpressionToFind', 'ExpressionToSearch' COLLATE Latin1_General_CI_AS)

    Here CHARINDEX function returns the position of the string found in  the search. Returns 0 if not found.

Case Insensitive Search
1. Using CHARINDEX

CHARINDEX('ExpressionToFind', 'ExpressionToSearch' COLLATE Latin1_General_CS_AS)

 - Here CHARINDEX function returns the position of the string found in  the search. Returns 0 if not found.

2. Using Like 
IF EXISTS(SELECT * FROM (SELECT 'This is a test' a) tab WHERE a like '%test')

- 'Like' does case insensitive search in T-SQL






Wednesday, April 18, 2012

How to retrieve last IDENTITY using JDBC

Background:
I was working on an existing module which was using JDBC for all the database operations. There was a scenario where front-end UI has to update a set of tables, meaning data will be inserted into parent table first, than in child tables using parent's ID to establish the referential integrity. Existing code was using database MAX function to fetch the ID which I felt not a good idea and might have concurrency issues when multiple users will be using the application. My investigation came up with few alternates which I have listed below.
Right now the application is pointing to the MS SQL Server database and all the solutions are for that database only. Plan is to do similar investigation in future to incorporate other databases like Oracle.

Solutions for MS SQL Server:

Solution-01: Using JDBC built-in functionality

Assumptions:
  1. Table where the insertion is happening has a auto-generated key
Steps: (see code-01)
  1. Use Statement.RETURN_GENERATED_KEYS during PreparedStatement creation
  2. Execute insert statement
  3. Get list of generated keys as ResultSet
  4. Get the first value from ResultSet
Solution-02Using MS SQL server OUTPUT clause feature in the query

Assumptions:
  1. The query should use OUTPUT clause feature of SQL Server (I have another post, which explains how to use OUTPUT clause)
Steps: (see code-02)
  1. Create a query with OUTPUT clause
  2. Execute the query using ' executeQuery' method of Statement
  3. Get list of generated keys as ResultSet
  4. Get the first value from ResultSet
Resources:

  1. How to use OUTPUT clause, this is my another post on this topic - http://peerprogrammer.blogspot.com/2012/04/how-to-retrieve-identity-of-table_18.html



Sample Codes:
--------------------
--- Code-01 - use JDBC feature to get IDENTITY
--------------------
public int getLastIdentity(String sql) {  
PreparedStatement pStmt;  
int lastIdentity = 0;  
try {  
pStmt = conn.prepareStatement( sql, Statement.RETURN_GENERATED_KEYS);  
   
pStmt.executeUpdate();  
ResultSet identities =  pStmt.getGeneratedKeys();  
  
identities.next();  
lastIdentity = identities.getInt(1);  
identities.close();  
pStmt.close();  
conn.close();  -- connection is created outside of this method
} catch (Exception e) { e.printStackTrace(); }  
return  lastIdentity;  
}

--------------------
--- Code-02 - use SQL Server 2005 feature OUTPUT clause
--------------------
Input SQL should look like below and assuming the table's auto increment column name is 'id':
INSERT INTO 'tablename' OUTPUT inserted.id VALUES('First Column', 'Second Column');

public int getLastIdentity(String sql) {  
PreparedStatement pStmt;  
int lastIdentity = 0;  
try {  
pStmt = conn.prepareStatement(sql);  

ResultSet identities =   pStmt.executeQuery();  
identities.next();  
lastIdentity = identities.getInt(1);  
identities.close();  
pStmt.close();  
conn.close();  -- connection is created outside of this method
} catch (Exception e) { e.printStackTrace(); }  
return  lastIdentity;  
}

How to retrieve IDENTITY of a table

Background:
I have a Stored Procedure which need to insert data into a set of tables by maintaining the referential integrity. Meaning parent table ID value (ID is a Identity column) need to be fetched and used in child tables during insertion. There is a trigger defined in the parent table which inserts a record in another table. Logic in the SP is like 1. inserts a record in parent table 2. immediate statement retrieves last identity using @@IDENTITY global variable. As the trigger inserts record in another table, the @@IDENTITY variable returns the identity from second table. See Query-03 which represents the same scenario.


This problem drove me to investigate in detail about the approaches and their pros and cons to retrieve IDENTITY value of a table. 


Factors need to consider: (mostly for @@IDENTITY, SCOPE_IDENTITY(), IDENT_CURRENT())
Different scopes:
  1. The session scope - the connection which produced the IDENTITY value
  2. The statement scope - where is the statement that produced the IDENTITY value
  3. The table scope -the able produced the IDENTITY value
What is scope?
A scope is a module: a stored procedure, trigger, function, or batch. Therefore, two statements are in the same scope if they are in the same stored procedure, function, or batch.


What is @@IDENTITY?
  1. @@IDENTITY is a system global variable
  2. Returns the last IDENTITY value produced on a connection
  3. Regardless of the table that produced the value
  4. Regardless of the scope of the statement that produced the value
  5. Returns IDENTITY of current session, ignores the scope
Note: From the attributes one thing is clear that developer has to be certain where the IDENTITY comes from and the scope of it unless same problem might occur which I explained above. (see Query-03 for sample code)


What are the alternatives?
  1. Use  SCOPE_IDENTITY()
    1. Attributes:
      1. Returns the last IDENTITY value produced on a connection and by a statement in the same scope
        1. Regardless of the table that produced the value
        2. Returns IDENTITY of current session and current scope
    2. Notes:
      1. IDENTITY created in Trigger or User Defined Function for other table(s) won't affect actual table's IDENTITY
      2. (see Query-03 for sample code)
  2. Use  IDENT_CURRENT('tablename')
    1. Attributes:
      1. Returns last IDENTITY produced in a table
      2. Regardless of the connection that created the value
      3. regardless of the scope of the statement that produced the value
    2. Notes:
      1. IDENTITY created in Trigger or User Defined Function for other table(s) won't affect actual table's IDENTITY
      2. If IDENTITY created in current scope is needed, this might not return correct value as some other connection might have already entered record in between
      3. (see Query-03 for sample code)
  3. Use OUTPUT clause:
    1. Attributes:
      1. Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, DELETE (see Query-01)
      2. Information can be retrieved in a tabular form only, cannot be retrieved in a variable (see Query-02
      3. Columns returned from OUTPUT reflect the data as it is after the INSERT, UPDATE, or DELETE statement has completed but before triggers are executed
    2. Notes:
      1. The Output clause is a new to SQL Server 2005
      2. An UPDATE, INSERT, or DELETE statement that has an OUTPUT clause will return rows to the client even if the statement encounters errors and is rolled back. The result should not be used if any error occurs when you run the statement
  4. Use SEQUENCE feature in column (similar as Oracle Sequence)
    1. Notes:
      1. Available in SQL Server 2011 only
      2. I couldn't come up with sample code as not used SQL Server 2011 yet
  5. Use custom sequence (a table to store sequence value and a Stored Procedure to modify the value)
    1. Notes:
      1. Custom logic
      2. Concurrency need to be handled manually
Resources:
  1. How to use OUTPUT clause -  http://msdn.microsoft.com/en-us/library/ms177564.aspx
  2. Problem in SCOPE_IDENTITY -  (1) https://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value (2)  http://support.microsoft.com/kb/2019779
Sample Queries:


-------------
-- Query-01 - OUTPUT clause 
-------------
-- Data table
CREATE TABLE test(id INT IDENTITY(1,1), value VARCHAR(100));
-- table for output
DECLARE @outputTable TABLE (id INT);


INSERT INTO test(value) 
 OUTPUT inserted.id 
 INTO @outputTable
 VALUES('One');

SELECT * FROM test;
SELECT * FROM @outputTable;


-- Test OUTPUT with UPDATE
INSERT INTO test(value) VALUES('Two');
UPDATE test
 SET value = 'One Changed'
 OUTPUT deleted.id
 INTO @outputTable
 WHERE id=1
;
SELECT * FROM test;
SELECT * FROM @outputTable;


-- Test OUTPUT with DELETE
DELETE test
 OUTPUT deleted.id
 INTO @outputTable
 WHERE id=2;
SELECT * FROM test;
SELECT * FROM @outputTable;
GO


-------------
-- Query-02 - Output of OUTPUT clause in a variable
-- Result: it fails because OUTPUT returns data in tabular format
-------------
-- Data table
CREATE TABLE test(id INT IDENTITY(1,1), value VARCHAR(100));
DECLARE @outputVariable INT;


INSERT INTO test(value) 
 OUTPUT inserted.id 
 INTO @outputVariable
 VALUES('One');

SELECT * FROM test;
SELECT 'Variable= '+CONVERT(VARCHAR(100), @outputVariable);
GO


Error:
Msg 1087, Level 16, State 1, Line 6
Must declare the table variable "@outputVariable".


-------------
-- Query-03 - Usage of @@IDENTITY, SCOPE_IDENTITY(), IDENT_CURRENT()
-------------
-- Data tables
CREATE TABLE one(id INT IDENTITY(1,1), value VARCHAR(100));
CREATE TABLE two(id INT IDENTITY(1,1), value VARCHAR(100));
GO
-- Create trigger in table 'one'
CREATE TRIGGER TROne
 ON one
 AFTER INSERT
AS
DECLARE @value VARCHAR(100), @id INT;
SELECT @value=value, @id=id FROM inserted;

INSERT INTO two(value) VALUES('ID='+CONVERT(VARCHAR(100), @id) + ' is inserted in table one');
INSERT INTO two(value) VALUES('Value='+CONVERT(VARCHAR(100), @value) + ' is inserted in table one');
GO


INSERT INTO one(value) VALUES('Record one');
SELECT '@@IDENTITY= '+CONVERT(VARCHAR(100), @@IDENTITY);
SELECT '@@SCOPE_IDENTITY= '+CONVERT(VARCHAR(100), SCOPE_IDENTITY());
SELECT '@@IDENT_CURRENT for table one= '+CONVERT(VARCHAR(100), IDENT_CURRENT('one'));
SELECT '@@IDENT_CURRENT for table two= '+CONVERT(VARCHAR(100), IDENT_CURRENT('two'));
GO

Tuesday, April 3, 2012

Database Coding Convention


  1. Each database object must have a short description

  2. If needed, each object should have ‘Implementation Flow’ (bullet point) which will describe the logic implemented within it. Comment should point to proper bullet point

  3. Each logic should have proper comment and should refer to appropriate  ‘Implementation Flow’ point

  4. Each database object must have ‘Version History’. Every time the object is modified an entry should be added. The format is as below-
    1. Format - [Serial #] - [Author] - [Date] - [Short Description]
    2. Example – 1 – Tom Brady – 03/20/2012 – Created
    3. Example – 2 – Tom Brady – 03/21/2012 – Unit tested

  1. Database object Naming Convention as follow -
    1. Stored Procedure – prefix SP
                                                              i.      Example - SPStoredProcedureName
    1. Function – prefix FN
                                                              i.      Example – FNFunctionName
    1. Trigger – prefix TR
                                                              i.      Example – TRTriggerName
    1. Variable – camel case (first letter of first word in small case, first letter of subsequent words in upper case)
                                                              i.      Example - @firstVariableName
    1. Temp table – camel case

  1. NULL checks need to be added for each mandatory input. If value is missing, error need to be raised
    1. Example -

-- NULL check for inputs

IF @inputOne IS NULL OR @inputTwo IS NULL
BEGIN
            RAISERROR('All inputs are required', 10, 1);
            RETURN;
END;

  1. TRY – CATCH block need to be added
[Not complete]

Monday, April 2, 2012

Reason a Stored Procedure is recompiled



Background:
I was investigating the fact that 'Table variables result in fewer re-compilations of a stored procedure as compared to temporary tables' which is mentioned in my another post 'Temp Table in SQL Server'. Than found two very good references (References-1 & 2) and felt like consolidating into fewer lines.

Query Plan related to Stored Procedure – Steps:
  1. The Stored Procedure is executed
  2. Check in cache to see it’s available
  3. If not available in cache,
    1. Load the Stored Procedure
    2. Compile the query plan
    3. Save compiled plan in cache
  4. Reuse by subsequent calls until the plan is invalidated and forced to recompile

Reason a Stored Procedure is recompiled:
  1. Use of a WITH RECOMPILE clause in the CREATE PROCEDURE or EXECUTE statement.
  2. Schema changes to any of the referenced objects, including adding or dropping constraints, defaults, or rules.
  3. Running sp_recompile for a table referenced by the procedure.
  4. Restoring the database containing the procedure or any of the objects the procedure references (if you are performing cross-database operations).
  5. Sufficient server activity causing the plan to be aged out of cache
  6. A sufficient percentage of data changes in a table that is referenced by the stored procedure
    1. Ways to counteract recompilation
                                                              i.      Execute the statement using sp_executesql - Statements executed using the sp_executesql stored procedure are not compiled as part of the stored procedure plan
                                                             ii.      Use sub-procedures to execute the statements that are causing the recompilations - the statement may still cause a recompilation, but instead of recompiling the large calling stored procedure, it will only recompile the small sub-procedure
                                                           iii.      Use the KEEP PLAN option
  1. The procedure interleaves Data Definition Language (DDL) and Data Manipulation Language (DML) operations.
  2. The procedure performs certain operations on temporary tables
    1. Ways to counteract recompilation
                                                              i.      All statements that contain the name of a temporary table refer to a temporary table created in the same stored procedure, and not in a calling or called stored procedure, or in a string executed using the EXECUTE statement or sp_executesql stored procedure
                                                             ii.      All statements that contain the name of a temporary table appear syntactically after the temporary table in the stored procedure or trigger.
                                                           iii.      There are no DECLARE CURSOR statements whose SELECT statements reference a temporary table
                                                          iv.      All statements that contain the name of any temporary table precede any DROP TABLE statement that references a temporary table



References:
  1. URL - http://support.microsoft.com/kb/243586
  2. URL -  http://support.microsoft.com/kb/305977

Sunday, April 1, 2012

Temp Table in SQL Server


Background:
Recently my team started using Microsoft SQL Server 2008 R2 as the RDBMS in one of the project and was lucky to be part of developing two critical functionalities which required extensive database development (writing SPs, functions, queries etc). Because of the nature of the problem, I had to use something like temp table a lot. While doing the actual implementation, I wanted to dig further to make sure the concepts, pros and cons are fully clear, also realized to document my learning for larger audience.
The things I have considered during writing this post are – (1) explanation has to be short, simple and easy to remember (2) structure of information has to be well segmented (3) as much example as possible. Hence created three major sections: concepts (facts, pros/cons, short explanation), best practices and examples (tested code)

CONCEPTS:
  1. Table variable:
    1. Syntax: DECLARE @tableName TABLE (id INT);
    2. Facts:
                                                              i.      Behave like local variables in their scoping rules and will be disposed as soon as it’s out of scope
                                                             ii.      Table variables are stored in ‘TempDB’
                                                           iii.      Available constraints that can be used with table variables
1.      CHECK
2.      PRIMARY KEY
3.      UNIQUE KEY
4.      NULL/NOT NULL
    1. Advantages
                                                              i.      Considered as ‘Local/Private’ to the process and has less locking to the resources
                                                             ii.      Transaction rollback doesn’t affect table variable as it exists for shorter span and not stored in persistent database
                                                           iii.      Less logging necessary as not stored in persistent database
                                                          iv.      Table variables result in fewer recompilations of a stored procedure as compared to temporary tables (reference-1)

    1. Disadvantages
                                                              i.      Local tables often disposed of before can be investigated
                                                             ii.      Cannot be used in dynamic SQL. In EXECUTE and sp_ExecuteSQL (example-1)
                                                           iii.      Cannot be used in ‘SELECT INTO’ to create them dynamically
                                                          iv.      User defined functions cannot be called from CHECK constraints, DEFAULT values
                                                            v.      Cannot be changed after declaration
                                                          vi.      Cannot be truncated
                                                         vii.      Cannot be dropped. Have to let go out of scope
                                                       viii.      Must be referenced by an alias if needed to join

  1. Table valued Parameter:
    1. Facts:
                                                              i.      A way to send multiple rows of data to a Stored Procedure or function without creating a temporary table or passing so many parameters
                                                             ii.      ‘Table variable’ or ‘table type’ can be passed as ‘Table Valued Parameter’
                                                           iii.      When table variables are passed as parameters below things happen behind the scene
1.      In the TempDB, table variable is materialized
2.      Reference is passed
    1. Advantages:
                                                              i.      Can be used to send client table to server
                                                             ii.      Can be used to pass as Table Valued Parameter to SP or function
    1. Disadvantages:
                                                              i.      Table Valued Parameter is passed as Read-Only. No INSERT, UPDATE or DELETE operations are allowed
                                                             ii.      Cannot be used in ‘SELECT INTO’ or ‘INSERT EXEC’ because of read only feature

  1. Local Temporary Table:
    1. Syntax: Table name starts with #
    2. Facts:
                                                              i.      Local Temporary tables are private to the process that created it
                                                             ii.      If the same stored procedure is executed simultaneously by several processes, the Database Engine distinguishes the same tables created by the different processes
                                                           iii.      Local temp tables are stored in ‘TempDB’
                                                          iv.      Local temporary tables are dropped automatically at the end of the current session or procedure (Best Practices-4)
                                                            v.      A session specific identifier is added to the name, to make sure that temp tables from other sessions don’t use the same name [same for Global Temp Table]
                                                          vi.      To allow for the suffix, the table name specified for a local temporary name must be less than 116 characters [same for Global Temp Table]
                                                         vii.      Within a stored procedure, a temporary table cannot be created, dropped, and then recreated a new temporary table with the same name (example-4 & 5) [same for Global Temp Table]
    1. Advantages:
                                                              i.      Can be referred within batches or in dynamic SQL (example-2 & 6)
                                                             ii.      Can be referred by any nested stored procedures executed by the stored procedure that created the table (example-3)
                                                           iii.      Reduction of required locks as only user has access to it
                                                          iv.      Less logging necessary
    1. Disadvantages:
                                                              i.      Cannot be used in views and cannot associate triggers with them
                                                             ii.      Table name limited by 116 characters

  1. Global Temporary table:
    1. Syntax: table name starts with ##
    2. Facts:
                                                              i.      Global temp tables work almost similar to local temp tables
                                                             ii.      global tables aren’t private to the process that created it
                                                           iii.      Global temp tables are visible to all sessions, until the session created it goes out of scope and referring sessions to the table go out of scope
    1. Advantages:
                                                              i.      Reduction of required locks as only user has access to it
                                                             ii.      Less logging necessary
    1. Disadvantages:
                                                              i.      Usage is very limited
                                                             ii.      Hard to manage data/flow consistency

Notes:
1.      Table variables are completely isolated to the batch that creates them so no 're-resolution' has to occur when a CREATE or ALTER statement takes place, which may occur with a temporary table. Temporary tables need this 're-resolution' so the table can be referenced from a nested stored procedure. Table variables avoid this completely so stored procedures can use plan that is already compiled, thus saving resources to process the stored procedure (References-2)

References:

  1. SP recompilation- I have found this link a superb one which explains in detail when a SP is recompiled (cover temp table related recompilation as well)
    URL - http://support.microsoft.com/kb/243586
  2. http://support.microsoft.com/kb/30597 
  3. http://www.sqlservercentral.com/articles/Temporary+Tables/66720/


BEST PRACTICES:
  1. Need to put extra caution when using Local Temp Table in the called SP which is defined in calling SP because called SP might throw exception when called from any other places
  2. Instead of Global Temp table, try to use permanent table if possible
  3. Always use alias for tables (permanent/temp)
  4. Don’t need to drop temp tables explicitly at the end of the Stored Procedure/block as SQL Server engine checks and drops any temp tables declared in the scope when the SP execution is ended

CODING:

Example-1: will not compile. Table variable is called in dynamic SQL

-- DROP PROCEDURE SPFirst
CREATE PROCEDURE SPFirst
AS
DECLARE @tableValue TABLE (id INT);
EXECUTE('SELECT * FROM @tableValue');
GO

Example-2: Will work where temp table is used in dynamic SQL

CREATE PROCEDURE SPFirst
AS
CREATE TABLE #tempTable (id INT);
EXECUTE('SELECT * FROM # tempTable’);
GO

Example-3: First SP calls second SP which uses the local temp table created in first. (To test, compile both SPs and execute last line)

-- DROP PROCEDURE SPFirst
CREATE PROCEDURE SPFirst
AS
          CREATE TABLE #tempTable (id INT);
          INSERT INTO #tempTable VALUES(1);
          INSERT INTO #tempTable VALUES(2);

          EXECUTE SPSecond;
GO

-- DROP PROCEDURE SPSecond
CREATE PROCEDURE SPSecond
AS
          SELECT * FROM #tempTable;
GO
EXECUTE SPFirst;


Example-4: Works fine. Global temp table created and dropped from same SP

-- DROP PROCEDURE SPFirst
CREATE PROCEDURE SPFirst
AS
          CREATE TABLE ##tempTable (id INT);
          INSERT INTO ##tempTable VALUES(1);
          INSERT INTO ##tempTable VALUES(2);
         
          DROP TABLE ##tempTable;
GO

Example-5: Will not compile. Temp table (global/local) is created>dropped>recreated with same name. The reason is Temp Table works as local variable in a scope where multiple variable declaration with same name will throw compile time error

CREATE PROCEDURE SPFirst
AS
          CREATE TABLE ##tempTable (id INT);
          INSERT INTO ##tempTable VALUES(1);
          INSERT INTO ##tempTable VALUES(2);
         
          DROP TABLE ##tempTable;
         
          CREATE TABLE ##tempTable (id INT);
GO


Example-6: Will compile but will fail during execution because temp table is defined in child session which is not available in parent session

-- DROP PROCEDURE SPFirst
CREATE PROCEDURE SPFirst
AS
EXECUTE('CREATE TABLE #tempTable (id INT);INSERT INTO #tempTable VALUES(1);INSERT INTO #tempTable VALUES(2);');

SELECT * FROM #tempTable;
GO

EXECUTE SPFirst;





Summary (Reference-03)

FeatureTable VariablesTemporary Tables
ScopeCurrent batchCurrent session, nested stored procedures. Global: all sessions.
UsageUDFs, Stored Procedures, Triggers, Batches.Stored Procedures, Triggers, Batches.
CreationDECLARE statement only.
CREATE TABLE statement.
SELECT INTO statement.
Table nameMaximum 128 characters.Maximum 116 characters.
Column data types
Can use user-defined data types.
Can use XML collections.
User-defined data types and XML collections must be in tempdb to use.
CollationString columns inherit collation from current database.String columns inherit collation from tempdb database.
IndexesCan only have indexes that are automatically created with PRIMARY KEY & UNIQUE constraints as part of the DECLARE statement.Indexes can be added after the table has been created.
ConstraintsPRIMARY KEY, UNIQUE, NULL, CHECK, but they must be incorporated with the creation of the table in the DECLARE statement. FOREIGN KEY not allowed.PRIMARY KEY, UNIQUE, NULL, CHECK. Can be part of the CREATE TABLE statement, or can be added after the table has been created. FOREIGN KEY not allowed.
Post-creation DDL (indexes, columns)Statements are not allowed.Statements are allowed.
Data insertionINSERT statement (SQL 2000: cannot use INSERT/EXEC).
INSERT statement, including INSERT/EXEC.
SELECT INTO statement.
Insert explicit values into identity columns (SET IDENTITY_INSERT).The SET IDENTITY_INSERT statement is not supported.The SET IDENTITY_INSERT statement is supported.
Truncate tableNot allowed.Allowed.
DestructionAutomatically at the end of the batch.Explicitly with DROP TABLE statement. Automatically when session ends. (Global: also when other sessions have no statements using table.)
TransactionsLast only for length of update against the table variable. Uses less than temporary tables.Last for the length of the transaction. Uses more than table variables.
Stored procedure recompilationsNot applicable.Creating temp table and data inserts cause procedure recompilations.
RollbacksNot affected (Data not rolled back).Affected (Data is rolled back).
StatisticsOptimizer cannot create any statistics on columns, so it treats table variable has having 1 record when creating execution plans.Optimizer can create statistics on columns. Uses actual row count for generation execution plan.
Pass to stored proceduresSQL 2008 only, with predefined user-defined table type.Not allowed to pass, but they are still in scope to nested procedures.
Explicitly named objects (indexes, constraints).Not allowed.Allowed, but be aware of multi-user issues.
Dynamic SQLMust declare table variable inside the dynamic SQL.Can use temporary tables created prior to calling the dynamic sql.