Selecting Data Types in SQL Server 2012

Upload: abuihsan

Post on 04-Mar-2016



TRANSCRIPT

Slide 1

Module 3: Designing a Physical Database Model
Presentation: 75 minutes
Lab: 60 minutes

After completing this module, students will be able to:
- Design column data types
- Design database tables
- Design data integrity

Required materials
To teach this module, you need the Microsoft Office PowerPoint file 50401B-ENU_Powerpnt_03.ppt.

Important: It is recommended that you use PowerPoint 2002 or a later version to display the slides for this course. If you use PowerPoint Viewer or an earlier version of PowerPoint, all the features of the slides might not be displayed correctly.

Preparation tasks
To prepare for this module:
- Read all of the materials for this module.
- Practice performing the demonstrations and the lab exercises.
- Work through the Module Review and Takeaways section, and determine how you will use this section to reinforce student learning and promote knowledge transfer to on-the-job performance.

Make sure that students are aware that additional information and resources for the module are available on the Course Companion CD.

Module 3: Designing a Physical Database Model (Course 50401B)

Module Overview
- Selecting Data Types
- Designing Database Tables
- Designing Data Integrity

Lesson 1: Selecting Data Types
- Considerations for Selecting Standard Column Data Types
- Considerations for Selecting New SQL Server 2008 Data Types
- Considerations for Using CLR User-Defined Data Types
- Considerations for Using Spatial Data Types
- Guidelines for Using the XML Data Type
- Establishing Naming Standards for Database Objects
- Discussion: Working with Data Tables

Considerations for Selecting Standard Column Data Types
- Integer versus GUID primary keys
- Fixed versus variable length columns
- VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types
- Character column collations and Unicode and non-Unicode data types
- Transact-SQL user-defined data types

Discuss the following tradeoffs in selecting column data types, which might reflect performance or security considerations:
- Integer versus GUID primary keys. It is often helpful to use a surrogate key as a primary key in your base tables. When working with surrogate keys, you commonly have two choices: either define the key as an integer, such as smallint, int, or bigint, with an identity property, or use the GUID data type (uniqueidentifier) with a NEWID default constraint.
- Fixed versus variable length columns. If all values in a column are of the same or similar length, you should use the CHAR or NCHAR data type. However, if column data entries vary considerably in length, you should choose the VARCHAR or NVARCHAR data type.
- VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types. Using the MAX extensions of the VARCHAR, NVARCHAR, and VARBINARY data types, you can create columns that can contain more than 8,000 bytes of data. These extensions eliminate the need for the text and image data types.
- Character column collations and Unicode and non-Unicode data types. Character column collations determine how data is sorted and compared. These are rules based on the norms of languages and locales, case sensitivity, accent marks, and kana (Japanese character) rules. You can also use a more general collation, such as the LATIN1_GENERAL collation.
- Transact-SQL user-defined data types. By using a Transact-SQL user-defined data type (UDDT), you can obtain an extra level of abstraction by declaring the type in one place and using it in several columns.

Question: What are the advantages and disadvantages of using GUID data types for key values?

Answer: A GUID is unique across computers and databases. If a key value needs to exist in multiple storage containers, a GUID is the better option. However, inappropriate use of a GUID may cause unnecessary page I/O, decreasing query performance.
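As an illustrative sketch (table and constraint names are hypothetical), the two surrogate-key choices might be declared as follows:

```sql
-- Option 1: integer surrogate key with an identity property
CREATE TABLE dbo.CustomerInt
(
    CustomerID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

-- Option 2: GUID surrogate key with a NEWID() default constraint
CREATE TABLE dbo.CustomerGuid
(
    CustomerID   uniqueidentifier NOT NULL
        CONSTRAINT DF_CustomerGuid_ID DEFAULT NEWID()
        CONSTRAINT PK_CustomerGuid PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
```

Because NEWID() values are random, inserts land on arbitrary pages of a clustered primary key; SQL Server also provides NEWSEQUENTIALID() as a default, which mitigates the page splits behind the extra page I/O mentioned in the answer above.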

Considerations for Selecting New SQL Server 2008 Data Types

Date / Time
- Date: from 01/01/0001 to 12/31/9999; no time component
- Time: based on a 24-hour clock, no date component; accurate to 100 ns

Datetime2 / Datetimeoffset
- Same date precision as date
- Same time precision as time
- Datetimeoffset is time zone aware

FILESTREAM
- When a table contains a FILESTREAM column
- When objects are larger than 1 MB
- Where fast read access is important
- Where you use a middle tier

Hierarchyid
- To create tables with a hierarchical structure
- To query and perform work with hierarchical data by using T-SQL

Discuss in brief the following new SQL Server 2008 data types.

Transact-SQL Date and Time Data Types

Date
- Format: YYYY-MM-DD
- Range: 0001-01-01 through 9999-12-31
- Storage size (bytes): 3

Time
- Format: hh:mm:ss[.nnnnnnn]
- Range: 00:00:00.0000000 through 23:59:59.9999999
- Accuracy: 100 nanoseconds
- Storage size (bytes): 3 to 5
- User-defined fractional second precision: Yes
- Time zone aware: No

Datetime2
- Format: YYYY-MM-DD hh:mm:ss[.nnnnnnn]
- Range: 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999
- Accuracy: 100 nanoseconds
- Storage size (bytes): 6 to 8
- User-defined fractional second precision: Yes
- Time zone aware: No

Datetimeoffset
- Format: YYYY-MM-DD hh:mm:ss[.nnnnnnn] [+|-]hh:mm
- Range: 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999 (in UTC)
- Accuracy: 100 nanoseconds
- Storage size (bytes): 8 to 10
- User-defined fractional second precision: Yes
- Time zone aware: Yes

Hierarchyid
Use hierarchyid as a data type to create tables with a hierarchical structure or to reference the hierarchical structure of data in another location. You can use hierarchyid functions to query and perform work with hierarchical data by using T-SQL.

By using the new hierarchyid data type, it is possible to capture parent-child relationships without relying on foreign keys to a parent table. Hierarchyid indicates node position in a tree either by width or depth; it encodes the path from the root of the tree to the node. At the cost of reduced performance with data maintenance (inserts, updates, deletes), using hierarchyid typically improves recursion with tree data. A value of the hierarchyid data type represents a position in a tree hierarchy, but the application must manage the value; there is nothing inherently automatic about representing position in the tree.
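A minimal sketch of the hierarchyid idea described above, using a hypothetical organization-chart table (the paths are managed explicitly by the application, as the notes warn):

```sql
-- Org chart captured without a self-referencing foreign key
CREATE TABLE dbo.Employee
(
    OrgNode      hierarchyid NOT NULL PRIMARY KEY,
    EmployeeName nvarchar(100) NOT NULL
);

INSERT INTO dbo.Employee VALUES
    (hierarchyid::GetRoot(),      N'CEO'),        -- path: /
    (hierarchyid::Parse('/1/'),   N'VP Sales'),   -- child of the root
    (hierarchyid::Parse('/1/1/'), N'Sales Rep');  -- grandchild

-- Tree recursion without a recursive CTE: the VP Sales subtree
SELECT EmployeeName
FROM dbo.Employee
WHERE OrgNode.IsDescendantOf(hierarchyid::Parse('/1/')) = 1;
```

Note that IsDescendantOf() also returns 1 for the node itself, so the query above returns both the VP and the rep.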

FILESTREAM
FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. You can use Transact-SQL statements to insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.

In FILESTREAM, the NT system cache is used for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage.

Consider using FILESTREAM storage when:
- Objects that are being stored are, on average, larger than 1 MB.
- Fast read access is important.
- Applications that use a middle tier for application logic are being developed.

Also note the following:
- Tables containing a FILESTREAM column must have a non-NULL unique row ID for each row.
- FILESTREAM data containers cannot be nested.
- FILESTREAM filegroups must be stored on shared disk resources to use failover clustering.
- FILESTREAM filegroups can be on compressed volumes.
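A sketch of a FILESTREAM-enabled table, assuming the instance has FILESTREAM enabled and the database already has a FILESTREAM filegroup (table and column names are hypothetical). It shows the required non-NULL unique row ID noted above:

```sql
CREATE TABLE dbo.Document
(
    DocumentID   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE
                     DEFAULT NEWID(),          -- required unique row ID
    DocumentName nvarchar(260) NOT NULL,
    Content      varbinary(max) FILESTREAM NULL -- BLOB stored as an NTFS file
);
```

T-SQL reads and writes Content like any varbinary(max) column, while Win32 streaming APIs can open the underlying file directly.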

Question: What is the advantage of using datetimeoffset?

Answer: The time zone offset from UTC is stored as metadata of the datetime value.
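Because the offset travels with the value, datetimeoffset comparisons are made on the UTC instant. A small illustration (the sample timestamps are invented):

```sql
DECLARE @ny datetimeoffset = '2016-03-04 09:00:00 -05:00';  -- New York time
DECLARE @ld datetimeoffset = '2016-03-04 14:00:00 +00:00';  -- London time

-- Both literals denote 14:00 UTC, so the values compare as equal
SELECT CASE WHEN @ny = @ld THEN 'Same instant' ELSE 'Different' END;

-- SWITCHOFFSET re-expresses the same instant at another offset
SELECT SWITCHOFFSET(@ny, '+00:00');
```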

Considerations for Using CLR User-Defined Data Types
- Use common language runtime (CLR) user-defined data types for nonstandard or proprietary data types
- Avoid excessively complex data types
- Consider the risks of tightly coupling a CLR user-defined data type and the database
- Consider the overhead of row-by-row processing

Discuss how the common language runtime (CLR) user-defined data type, a new data type, can help developers program their own data types in SQL Server 2008.

Discuss the following considerations for using CLR user-defined data types:
- Use CLR user-defined data types for nonstandard or proprietary data types. CLR user-defined data types are useful for solving problems in which you need to use data types that are specific to your applications and different from built-in data types. Most data captured by business applications can be stored in built-in SQL Server data types.
- Avoid creating excessively complex data types. When building CLR user-defined data types, remember to avoid creating complex data types that can hinder server performance. For example, you might not want to use a CLR user-defined data type to store a telephone array type, because there are better standard solutions that address this requirement.
- Consider the overhead of row-by-row processing. When writing a class that supports a CLR user-defined data type, consider that the code defining the data type will be executed frequently, one row at a time. Therefore, it is critical that the CLR user-defined data type code be fully optimized. However, even fully optimized code cannot perform as well as the built-in SQL Server data types, so it is important to use CLR user-defined data types only when the cost in performance is small or at least acceptable.
- Consider the risks of tightly coupling a CLR user-defined data type and the database. CLR user-defined data types are tightly coupled to the database, meaning that the code and the database are intrinsically joined and cannot be separated easily. Consider this factor when deciding to implement CLR user-defined data types, because using them might negatively affect reusability and maintainability.

Considerations for Using Spatial Data Types
Spatial data:
- Represents information on the location and shape of geometric objects
- Can be of two types: the geometry and geography data types
- Is implemented as .NET common language runtime data types in SQL Server
- Supports eleven spatial data objects, or instance types

Spatial data represents information about the physical location and shape of geometric objects. These objects can be point locations or more complex objects such as countries, roads, or lakes.

Types of Spatial Data
There are two types of spatial data:
- Geometry data type
- Geography data type

Both data types are implemented as .NET common language runtime (CLR) data types in SQL Server. The geometry data type supports planar, or Euclidean (flat-earth), data and conforms to the Open Geospatial Consortium (OGC) Simple Features for SQL Specification version 1.1.0. In addition, SQL Server supports the geography data type, which stores ellipsoidal (round-earth) data, such as GPS latitude and longitude coordinates.

The geometry and geography data types support eleven spatial data objects, or instance types. However, only seven of these instance types are instantiable; you can create and work with these instances in a database. These instances derive certain properties from the data types that distinguish them as Points, LineStrings, Polygons, or multiple geometry or geography instances in a GeometryCollection. The figure depicts the geometry hierarchy upon which the geometry and geography data types are based; the instantiable types of geometry and geography are indicated in blue.

Data Storage in Spatial Data
The two types of spatial data often behave quite similarly, but there are some key differences in how the data is stored and manipulated.

In the planar, or flat-earth, system, measurements of distances and areas are given in the same unit of measurement as coordinates. Using the geometry data type, the distance between (2, 2) and (5, 6) is 5 units, regardless of the units used. In the ellipsoidal, or round-earth, system, coordinates are given in degrees of latitude and longitude, while lengths and areas are usually measured in meters and square meters, though the measurement may depend on the spatial reference identifier (SRID) of the geography instance. The most common unit of measurement for the geography data type is meters.

In the planar system, the ring orientation of a polygon is not important. For example, a polygon described by ((0, 0), (10, 0), (0, 20), (0, 0)) is the same as a polygon described by ((0, 0), (0, 20), (10, 0), (0, 0)). The OGC Simple Features for SQL Specification does not dictate a ring ordering, and SQL Server does not enforce ring ordering.

In an ellipsoidal system, a polygon has no meaning, or is ambiguous, without an orientation. For example, does a ring around the equator describe the northern or the southern hemisphere? If we use the geography data type to store the spatial instance, we must specify the orientation of the ring and accurately describe the location of the instance.

SQL Server 2008 places the following restrictions on using the geography data type:
- Each geography instance must fit inside a single hemisphere. No spatial objects larger than a hemisphere can be stored.
- Any geography instance from an Open Geospatial Consortium (OGC) Well-Known Text (WKT) or Well-Known Binary (WKB) representation that produces an object larger than a hemisphere throws an ArgumentException.
- Any geography data type method that requires the input of two geography instances, such as STIntersection(), STUnion(), STDifference(), and STSymDifference(), will return NULL if the results from the methods do not fit inside a single hemisphere. STBuffer() will also return NULL if the output exceeds a single hemisphere.
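The planar-versus-ellipsoidal distinction above can be shown in a few lines (the city coordinates are illustrative):

```sql
-- Geography: GPS points in SRID 4326 (WGS 84); STDistance returns meters
DECLARE @seattle  geography = geography::Point(47.6062, -122.3321, 4326);
DECLARE @portland geography = geography::Point(45.5152, -122.6784, 4326);
SELECT @seattle.STDistance(@portland) AS DistanceInMeters;

-- Geometry: planar units, so the distance between (2, 2) and (5, 6) is 5
SELECT geometry::Point(2, 2, 0).STDistance(geometry::Point(5, 6, 0)) AS PlanarDistance;
```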

Question: What is the orientation limitation of spatial data?

Answer: Spatial data must exist in a single hemisphere.

Guidelines for Using the XML Data Type
- Use the XML data type for data that is not frequently updated
- Use typed XML columns
- Use the XML data type for data that is not relationally structured
- Use the XML data type for configuration information
- Use the XML data type for data with recursive structures

Discuss how the built-in XML data type offers database designers the ability to store unstructured data that was not natively supported in previous versions of SQL Server.

Discuss the following considerations for using the XML data type:
- Use the XML data type for data that is not frequently updated. The XML model does not support redundancy reduction, and it does not handle inserts, updates, and deletes as effectively as the relational model. Therefore, if data is updated frequently, you should work with a relational model, which supports many insert, update, and delete operations.
- Use the XML data type for data that is not relationally structured. When you have structured data, it is better to use the relational model and store the data in relational tables. When data is not relationally structured, consider using the XML data type.
- Use the XML data type for configuration information. Configuration information is semistructured data that is often stored in .ini files or in the Microsoft Windows registry. You can easily store configuration information in an XML data type and ensure that the data is well formed, validate the schema, and use XQuery and XPath to reference specific values stored inside the attributes.
- Use the XML data type for data with recursive structures. Sometimes, you may need to store data that requires a varying recursive structure. For this, you may use recursive queries, which are difficult to manage in the relational model. But XML can easily handle these recursive structures because it is hierarchical.

Also, explain how to use typed XML columns for storing schema information. Explain that SQL Server 2008 allows designers to use XML Schema definition language (XSD) documents to create typed XML columns, parameters, and variables. A typed XML value is an XML element value that has been assigned a data type in an XML schema. To create typed XML columns, you use the XmlSchemaCollection object to include one or more XSD documents that will validate the column. Use typed XML columns when you have the schema information for the column. The server validates typed XML columns to support data integrity in the XML documents; typed XML columns also allow the server to optimize storage and queries based on this information. Do not use typed XML columns when the schema is not available, or when the application validates the XML documents, because the additional overhead of the typed XML column would be unnecessary.
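In T-SQL, a typed XML column is created by registering the XSD in a schema collection and binding the column to it. A minimal sketch (the schema, collection, and table names are hypothetical):

```sql
-- Register the XSD that will validate the column
CREATE XML SCHEMA COLLECTION dbo.ConfigSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="config">
      <xs:complexType>
        <xs:attribute name="timeout" type="xs:int" use="required"/>
      </xs:complexType>
    </xs:element>
  </xs:schema>';

-- The column is typed: the server validates every document on write
CREATE TABLE dbo.AppSettings
(
    SettingID int IDENTITY PRIMARY KEY,
    Config    xml (dbo.ConfigSchema) NOT NULL
);

-- Valid:   INSERT INTO dbo.AppSettings (Config) VALUES (N'<config timeout="30"/>');
-- A document that does not conform to the schema is rejected by the engine.
```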

Discuss the implications of the overhead involved in querying for elements while working with the XML data type. When using the XML data type, take into consideration that you can incur significant overhead if the server has to extract values from elements embedded in an XML column. Do not expect the same level of performance obtained from querying attributes that are stored directly in a relational table's columns.

Establishing Naming Standards for Database Objects
- Use names that comply with the rules for forming SQL Server 2008 identifiers
- Use descriptive terms
- Use only standard abbreviations in names
- Use models for naming standards
- Name intersection tables consistently
- Be consistent across all objects
- Record and communicate naming standards
- Use prefixes only when they provide value
- Use policy-based management

Explain how developers can clearly communicate the contents of a table or the purpose of a stored procedure with the help of a naming convention for database objects. Accentuate that by adhering to a naming standard, both the development team and the operations team can minimize errors caused by misleading object names.

Discuss the following guidelines for establishing database object naming standards:
- Use names that comply with the rules for forming SQL Server 2008 identifiers. Microsoft SQL Server 2008 refers to database names as identifiers, such as delimited and regular.
- Use descriptive terms. Your database names should be brief and intuitive. With this simple and descriptive naming style, the identification of the contents of a table or other database objects becomes easier.
- Use only standard abbreviations in names. When naming objects, use well-known abbreviations consistently and avoid using nonstandard abbreviations.
- Use models for naming standards. ISO 11179-5 and Integration Definition for Information Modeling (IDEF1X) are some of the models that you can use for naming standards.
- Name intersection tables consistently. When establishing a naming standard, you must specify how to name intersection tables.
- Be consistent across all objects. It is critical that, after defining the naming standard, you use it consistently across all objects.
- Record and communicate naming standards. The best way to enforce a selected set of naming standards is to document them carefully and then communicate them to all team members and stakeholders.
- Use prefixes only when they provide value. You can use a prefix to clearly describe the object that you are naming, such as a table, a view, or a stored procedure.
- Use policy-based management. You can use policy-based management to enforce naming conventions.

Question: Why should you establish naming conventions?

Answer: Naming conventions help to reduce code duplication.

Discussion: Working with Data Tables
Scenario:
- Some developers argue for using a simple table structure containing only four columns: a unique row identifier, a field identifier (analogous to a column name), a data type indicator, and the data itself stored as a sql_variant.
- Some say that the application could reconstitute all data fields and pivot them into a virtual table.
- The argument often goes further, to whether developers should be involved in working with the database schema at all, since they can create new data properties (fields) as needed.

Question: What is the fallacy in these arguments?

Discuss the question displayed on the slide about identifying the correct method for working with data tables.

Suggested answer for the conversation:
- It would be difficult to use primary keys or unique indexes to maintain uniqueness. Besides, it would be extremely difficult to engage the database engine in managing primary key-foreign key relationships. Column and table constraints would be useless.
- The application would have to completely manage the data stored. The database engine could not be employed as a fail-safe system to ensure that a field contains only the expected type of data. Any mistake in the application code could expose the data to significant errors. Attempts to employ the database engine to assist with data type constraints would be unnecessarily complex and operationally expensive.
- Constantly reconstituting the data into virtual tables will have increasingly large performance impacts. Searching the data will be unnecessarily burdensome.
- Using specific data types allows the database engine to ensure that the value stored is correct for the field. Data types offer inexpensive, low-level constraints that help to make sure that the data stored is the data expected. Since a sql_variant can store any type of data, there would be no easy way to employ type checking. Most programmers understand the need for strong types in their programs, so you need to help them understand the need for strong types in the data.

Lesson 2: Designing Database Tables
- Guidelines for Determining Table Width
- What Are Sparse Columns?
- Demonstration: How To Create a Table By Using Sparse Columns
- Guidelines for Using Computed Columns
- Discussion: Using Computed Columns

Guidelines for Determining Table Width

- Large Object Types
- Data Overflow
- Capacity Planning

Discuss the following guidelines for determining table width:
- The impact of using the Large Object (LOB) data type as row data. You can set the text in row option for tables that contain LOB data type columns. You can also specify a text in row option limit, from 24 through 7,000 bytes. The database engine needs 72 bytes of space in the row to store five pointers for an in-row string. If there is not sufficient space in the row to hold the pointers when the text in row option is ON or the large value types out of row option is OFF, the database engine may have to allocate an 8-KB page to hold them. If the data length of the value exceeds 40,200 bytes, more than five in-row pointers are required. At this point, only 24 bytes are stored in the main row and an additional data page is allocated on the LOB storage space.
- Considerations for data overflow. Exceeding the 8,060-byte row-size limit can affect performance because SQL Server maintains a limit of 8 KB per page. When you design a table with multiple varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns, consider the percentage of rows that can flow over and the frequency with which the overflow data will be queried. The sum of the other data type columns, including char and nchar data, must fall within the 8,060-byte row limit. Large object data is exempt from the 8,060-byte row limit.
- Considerations for capacity planning. You can include columns that contain row-overflow data as key or non-key columns of a nonclustered index. The record-size limit for tables that use sparse columns is 8,018 bytes. When the converted data and the existing record data exceed 8,018 bytes, MSSQLSERVER error 576 occurs.

What Are Sparse Columns?
- Sparse columns are ordinary columns that have optimized storage for NULL values. They can be used with column sets and filtered indexes.
- The SQL Server Database Engine uses the SPARSE keyword in a column definition to optimize the storage of values in that column.
- Catalog views for a table that has sparse columns are the same as for a typical table.
- The COLUMNS_UPDATED function returns a varbinary value to indicate all the columns that were updated during a DML action.

Data types that you cannot specify as sparse are geography, geometry, image, ntext, text, timestamp, and user-defined data types.

Using the slide, explain sparse columns.

Then, tell the students that sparse columns can be used with column sets and filtered indexes, as follows:
- Column sets. INSERT, UPDATE, and DELETE statements can reference sparse columns by name. However, you can also view and work with all the sparse columns of a table combined into a single XML column. This column is called a column set.
- Filtered indexes. Because sparse columns have many NULL-valued rows, they are especially appropriate for filtered indexes. A filtered index on a sparse column can index only the rows that have populated values. This creates a smaller and more efficient index.
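A sketch tying the pieces together: sparse columns, a column set, and a filtered index (all object names are hypothetical):

```sql
CREATE TABLE dbo.Product
(
    ProductID      int IDENTITY PRIMARY KEY,
    ProductName    nvarchar(100) NOT NULL,
    Voltage        int SPARSE NULL,            -- populated only for electrical goods
    ScreenSize     decimal(4,1) SPARSE NULL,   -- populated only for displays
    -- All sparse columns also exposed together as one XML column:
    SpecProperties xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- The filtered index covers only rows that actually carry a value
CREATE INDEX IX_Product_Voltage
    ON dbo.Product (Voltage)
    WHERE Voltage IS NOT NULL;
```

Statements can still reference Voltage and ScreenSize by name; selecting SpecProperties returns the populated sparse values as a single XML fragment.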

Then, using the slide, explain the properties of the sparse columns.

Finally, using the slide, mention the data types that cannot be specified as sparse.

Demonstration: How To Create a Table By Using Sparse Columns
In this demonstration, you will see how to:
- Create a table by using sparse columns

Demonstrate the following steps to create a table by using sparse columns:
1. Open the Microsoft SQL Server Management Studio window and connect to the NYC-SQl1 server.
2. Using the Computer window, open the Mod03_Demo1.sql file from the Allfiles (D:)\Demofiles location.
3. Using the Query Editor pane, run a query to use sparse columns.

Guidelines for Using Computed Columns
- Use computed columns to derive results from other columns
- Use persisted computed columns for performance
- Avoid the overhead of complex functions in computed columns
- Avoid persisted computed columns on active data
- Protect against numeric overflow and divide by zero errors

Discuss the following guidelines for using computed columns:
- Use computed columns to derive results from other columns. You can use computed columns that derive their results from other columns to optimize the available space for data.
- Use persisted computed columns for performance. You can optimize the performance of a computed column by making it persisted if the column values are rarely updated.
- Consider a few exceptions in column data:
  - Avoid the overhead of complex functions in computed columns. You should avoid using complex functions in computed columns that require heavy processor utilization.
  - Avoid persisted computed columns on active data. You can minimize the overhead of updates on frequently updated tables by avoiding the use of persisted computed columns on active data.
  - Protect against numeric overflow and divide by zero errors. You must ensure that all operations and functions are arithmetically correct when creating computed columns, to prevent numeric overflows and divide by zero errors.
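The guidelines above can be sketched in one hypothetical table, contrasting a regular computed column with a persisted one:

```sql
CREATE TABLE dbo.OrderLine
(
    OrderLineID int IDENTITY PRIMARY KEY,
    Quantity    int NOT NULL,
    UnitPrice   money NOT NULL,
    OrderDate   date NOT NULL,
    -- Recalculated whenever it is read; nothing stored:
    LineTotal   AS (Quantity * UnitPrice),
    -- Stored on disk and maintained on every write; suits rarely updated data:
    OrderMonth  AS (MONTH(OrderDate)) PERSISTED
);
```

If Quantity or UnitPrice changed constantly, persisting LineTotal would shift the computation cost onto every update, which is exactly the "active data" exception above.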

Question: When should you not use a persisted computed column?

Answer: A persisted computed column should not be used with data that will require frequent recalculation due to changes in the underlying data.

Discussion: Using Computed Columns
- What kind of problems do computed columns solve?
- When is a computed column actually computed?

Discuss the questions displayed on the slide about using computed columns.

Suggested answers for the conversation:

Question: What kind of problems do computed columns solve?

Answer: Computed columns:
- Are often used to calculate data from the same row. For example, you can use a computed column to calculate the total amount by using the quantity and the unit price stored in a product table.
- Calculate extensions, e.g., quantity * price.
- Deconstruct data into a readily usable format, e.g., create a month column from a date field.

Question: When is a computed column actually computed?

Answer: A computed column is calculated when needed, such as when the data is returned as the result of a query. If the results are persisted by indexing, the computed column is calculated in order to be indexed.

Lesson 3: Designing Data Integrity
- Guidelines for Designing Column Constraints
- Guidelines for Designing Table Constraints
- Guidelines When Implementing DDL Triggers
- Discussion: Identifying the Best Options for Column Data Types and Data Constraints

Guidelines for Designing Column Constraints
- Declare columns as NOT NULL, e.g. (HireDate int NOT NULL...)
- Use ANSI default constraints, e.g. (CONSTRAINT Qty DEFAULT 0...)
- Use column CHECK constraints, e.g. (CONSTRAINT chkQty CHECK (Amount > 0)...)
- Use CHECK constraints instead of bound rules

Discuss the following guidelines for designing column constraints:
- Declare columns as NOT NULL. NULL columns add complexity to a database. If the database design requires columns with unknown values, you can use a specific value to indicate an unknown value.
- Use ANSI default constraints. ANSI default constraints are the standard method for adding default values to a column. ANSI defaults are easier to declare and maintain than bound defaults.
- Use column CHECK constraints. CHECK constraints enforce domain integrity. Domain integrity enhances data quality by limiting the values that a column can contain. In SQL Server, CHECK constraints are the built-in method of enforcing domain integrity.
- Use CHECK constraints instead of bound rules. CHECK constraints are an ANSI-standard method for placing explicit limits on the range of values stored in a column. Bound rules are a nonstandard alternative to CHECK constraints.
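One hypothetical table applying the column-constraint guidelines above (NOT NULL, an ANSI default, and a column CHECK):

```sql
CREATE TABLE dbo.Sale
(
    SaleID   int IDENTITY PRIMARY KEY,
    SaleDate date NOT NULL,                              -- declare columns NOT NULL
    Qty      int NOT NULL
                 CONSTRAINT DF_Sale_Qty DEFAULT 0,       -- ANSI default constraint
    Amount   money NOT NULL
                 CONSTRAINT chkAmount CHECK (Amount > 0) -- column CHECK constraint
);
```

An insert with a negative Amount fails the CHECK constraint at the engine level, before any application logic runs.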

Guidelines for Designing Table Constraints
Use DRI for data integrity in a database
Use triggers to enforce referential integrity

Use ANSI-standard options Use table-level CHECK constraints

Check Constraint

ON DELETE

Specify cascading levels and options

Discuss the following guidelines for designing table constraints:

Use DRI for data integrity in a database. You can use Declarative Referential Integrity (DRI) to maintain data integrity within a database. You can enforce DRI by using foreign key constraints.

Specify cascading levels and options. You can specify cascading levels and options to determine the actions to perform when a row referenced by another table is deleted or updated.

Use ANSI-standard options. You can use the ANSI-standard ON DELETE and ON UPDATE cascading options to enforce more complex forms of DRI.

Use triggers to enforce referential integrity across databases. You can use triggers to maintain data integrity between multiple databases. This is useful because you cannot establish DRI between databases in SQL Server 2008.

Use table-level CHECK constraints. You can use table-level CHECK constraints to enforce domain integrity at the table level. When the valid values of a column depend on values from another column in the same table, you can use a table-level CHECK constraint to enforce domain integrity.
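The DRI, cascading, and table-level CHECK guidelines above can be sketched in a pair of tables. The table names and the choice of cascade options are illustrative, not the only valid design.

```sql
-- Hypothetical parent/child tables showing DRI with ANSI cascading options
-- and a table-level CHECK constraint.
CREATE TABLE dbo.Product
(
    ProductID int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.OrderItem
(
    OrderItemID int NOT NULL PRIMARY KEY,
    ProductID   int NOT NULL
        CONSTRAINT fkOrderItemProduct REFERENCES dbo.Product (ProductID)
        ON DELETE NO ACTION   -- refuse deletes that would orphan rows
        ON UPDATE CASCADE,    -- propagate key changes to this table
    StartDate   date NOT NULL,
    EndDate     date NOT NULL,
    -- Table-level CHECK: valid values of one column depend on another
    -- column in the same table.
    CONSTRAINT chkDates CHECK (EndDate >= StartDate)
);
```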

Question: What is the difference between a column constraint and a table constraint?
Answer: A column constraint is part of the definition of the column and constrains the values of a single column. A table constraint is part of the table definition and may affect one or more columns.

Guidelines for Designing Database Constraints by Using DDL Triggers
Use DDL triggers for auditing
Use DDL triggers to prevent database changes
Use DDL triggers to support security

Discuss the following guidelines for designing database constraints by using DDL triggers:

Use DDL triggers for auditing. You can use a data definition language (DDL) trigger to maintain an audit log. You can then use the audit log to track people who make changes in the database.

Use DDL triggers to support security. You can use DDL triggers to prevent users from using the DROP TABLE statement. This helps prevent unauthorized changes to database schemas.

Use DDL triggers to prevent database changes. Developers can review all DDL statements issued against the development database based on the auditing DDL trigger previously defined. Developers can also copy captured statements to create the script needed to deploy a new version of the database in the production environment.
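The security guideline above, preventing DROP TABLE, can be sketched as a database-scoped DDL trigger. The trigger name and error message are illustrative.

```sql
-- Minimal sketch: block DROP TABLE in the current database.
CREATE TRIGGER trgBlockDropTable
ON DATABASE
FOR DROP_TABLE
AS
BEGIN
    RAISERROR ('Tables cannot be dropped in this database.', 16, 1);
    -- The DDL statement and the trigger run in the same transaction,
    -- so rolling back here cancels the DROP TABLE itself.
    ROLLBACK TRANSACTION;
END;
```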

Question: What is a DRI constraint?
Answer: A Declarative Referential Integrity (DRI) constraint helps to maintain the relationship between primary keys and foreign keys. When DRI is configured, the database engine ensures that every foreign key value has a matching primary key, and that any changes to the primary key are handled in a predetermined manner.

Question: Why would you use a trigger for a DRI constraint?
Answer: You should use a trigger in place of a DRI constraint to manage referential integrity between separate databases, because DRI works only within a single database.

Guidelines When Implementing DDL Triggers
Use DDL triggers with transactions
Use DDL trigger scope to control the database operations or objects that activate the trigger
Avoid creating a DDL trigger on both the CREATE_SCHEMA and CREATE_TABLE events
Use value() instead of query() when querying data returned from EVENTDATA

Discuss the following guidelines for implementing DDL triggers:

Use DDL triggers with transactions. Like DML triggers, you can create more than one DDL trigger on the same Transact-SQL statement. Also, a DDL trigger and the statement that fires it run within the same transaction. This transaction can be rolled back from within the trigger. Serious errors can cause a whole transaction to be automatically rolled back. DDL triggers that are run from a batch and explicitly include the ROLLBACK TRANSACTION statement will cancel the whole batch.

Use DDL trigger scope to control the database operations or objects that activate the trigger. DDL triggers can fire in response to a Transact-SQL event processed in the current database or on the current server. The scope of the trigger depends on the event. For example, a DDL trigger created to fire in response to a CREATE_TABLE event can do so whenever a CREATE_TABLE event occurs in the database, or on the server instance. A DDL trigger created to fire in response to a CREATE_LOGIN event can do so only when a CREATE_LOGIN event occurs on the server.

Avoid creating a DDL trigger on both the CREATE_SCHEMA and CREATE_TABLE events. EVENTDATA captures the data of CREATE_SCHEMA events, as well as the schema element of the corresponding CREATE SCHEMA definition, if any exists. Additionally, EVENTDATA recognizes the schema element definition as a separate event. Therefore, a DDL trigger created on both a CREATE_SCHEMA event and an event represented by the schema element of the CREATE SCHEMA definition may return the same event data twice, such as the TSQLCommand data.
Use value() instead of query() when querying data returned from EVENTDATA. To return event data, we recommend that you use the XQuery value() method instead of the query() method. The query() method returns XML and ampersand-escaped carriage return and line-feed (CRLF) instances in the output, while the value() method renders CRLF instances invisible in the output.
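The auditing and value() guidelines above can be combined into one sketch. The audit table dbo.DDLAudit and the trigger name are hypothetical; the EVENTDATA() XPath expressions follow the documented EVENT_INSTANCE schema.

```sql
-- Sketch of an auditing DDL trigger that extracts event data with the
-- XQuery value() method, as recommended above.
CREATE TRIGGER trgAuditDDL
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    DECLARE @event xml = EVENTDATA();
    INSERT INTO dbo.DDLAudit (EventType, TSQLCommand, EventTime)
    VALUES
    (
        -- value() returns typed scalar values; query() would return XML
        -- with ampersand-escaped CRLF instances in the output.
        @event.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(128)'),
        @event.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)'),
        SYSDATETIME()
    );
END;
```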


Discussion: Identifying the Best Options for Column Data Types and Data Constraints

Scenario: QuantamCorp is a local company conducting business only in the USA and Canada. You need to create a single table to contain the following information:
Time of day for a 24-hour clock
Company name
E-mail address
Postal code
Product information
Product descriptive brochure
Telephone number

Question:What are the best options for column data types and data constraints?

Ask the students to read the scenario given in the Student Handbook and then answer the questions given on the slide.

Question: What are the best options for column data types and data constraints?
Answer: Suggested answers for the discussion:
Time of day: time data type
Company name: varchar()
E-mail address: varchar(); use a CLR RegEx function to constrain the e-mail address to an appropriate form
Postal code: char(9) or char(10), depending on whether you store the dash; use a column constraint to ensure 5 or 9 numeric characters (plus the dash, if used)
Product information: xml data type to store product-specific characteristics
Product descriptive brochure: store as a FILESTREAM object
Telephone number: varchar(10), or varchar(12) for ten-digit phone numbers with dashes; use a column constraint to ensure 10 numeric characters (plus dashes, if used)
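One possible CREATE TABLE that follows the suggested answers is sketched below. The table name, column names, lengths, and the simple ZIP-code check are illustrative choices, not the only valid ones; the brochure column is shown as plain varbinary(max) because the FILESTREAM attribute also requires a FILESTREAM-enabled database and a ROWGUIDCOL column.

```sql
-- Hypothetical table matching the suggested data type answers.
CREATE TABLE dbo.CompanyContact
(
    ContactTime time(0)      NOT NULL,   -- 24-hour clock, no fractional seconds
    CompanyName varchar(100) NOT NULL,
    Email       varchar(254) NOT NULL,   -- format validated via a CLR RegEx function
    PostalCode  char(10)     NOT NULL
        CONSTRAINT chkPostal
        CHECK (PostalCode LIKE '[0-9][0-9][0-9][0-9][0-9]%'),
    ProductInfo xml            NULL,     -- product-specific characteristics
    Brochure    varbinary(max) NULL,     -- FILESTREAM in a FILESTREAM-enabled database
    Phone       varchar(12)  NOT NULL    -- ten digits, plus dashes if stored
);
```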

Lab 3: Designing a Physical Database Model
Exercise 1: Specifying Database Object Naming Standards
Exercise 2: Converting a Logical Database Model into a Physical Database Model
Estimated time: 40 minutes
Logon Information: User name: Administrator; Password: Pa$$w0rd

In this lab, students will build a physical database model based on a logical model.

Exercise 1In this exercise, students will analyze the NamingStandardsTemplate document.

Exercise 2
In this exercise, students will:
Save the LogicalModel.vsd file as PhysicalModel.vsd.
Update Visio to use SQL Server data types.
Update the physical name for each entity.
Modify column settings for entities.

Before the students begin the lab, read the scenario associated with each exercise to the class. This will reinforce the broad issue that the students are troubleshooting and will help to facilitate the lab discussion at the end of the module. Remind the students to complete the discussion questions after the last lab exercise.

Note: The lab exercise answer keys are provided on the Course Companion CD. To access the answer key, click the link located at the bottom of the relevant lab exercise page.

Lab Scenario
You are a lead database designer at QuantamCorp. You are working on the Human Resources Vacation and Sick Leave Enhancement (HR VASE) project, which is designed to enhance the current HR system of your organization. This system is based on the QuantamCorp2008 sample database in SQL Server 2008. In this lab, you will build a physical database model based on the logical model created earlier.

The main goals of the HR VASE project are as follows:
Provide managers with current and historical information about employee vacation and sick-leave data.
Provide permission to individual employees to view their vacation and sick-leave balances.
Provide permission to selected employees in the HR department to view and update employee vacation and sick-leave data.
Provide permission to the HR manager to view and update all the data.
Standardize employee job titles.

Lab Review
Can you explain the purpose of creating naming standards and having a Naming Standards policy?
What kind of issues may arise if a Naming Standards policy does not exist?

Use the questions on the slide to guide the debriefing after students have completed the lab exercises.

Review Questions:
Question: Can you explain the purpose of creating naming standards and having a Naming Standards policy?
Answer: Naming standards provide consistency and reduce code duplication. Having a policy guides all parties in all aspects of the project when an object is created and has to be named.

Question: What kind of issues may arise if a Naming Standards policy does not exist?
Answer: Without a Naming Standards policy, different developers may inadvertently create code objects with virtually identical code. For example, they may create stored procedures with the following names:
GetClientData
RetrieveClientData
SelectClientData
All these stored procedures may contain duplicate code. In a project with a large number of procedures, a developer thinking of 'Get' may not consider looking for a 'Select' and may therefore create a duplicate procedure.

Module Review and Takeaways
Review Questions
Real-world Issues and Scenarios

Review Questions:

Question: Is it possible to have a Sales database containing a Sales table, a Sale user, as well as a Sales schema? Is it advisable? Why? Answer: Yes, it is possible to have a Sales database containing a Sales table, a Sale user, as well as a Sales schema. However, it is NOT advisable. It could be quite confusing to identify which object a statement is referring to.

Question: What considerations would you take into account when using a GUID as the primary key of a table that is referenced by multiple foreign key tables?
Answer: You should consider key length and index fragmentation when you use a GUID as a primary key. A GUID requires 16 bytes, which can increase page I/O and negatively impact performance. Using GUIDs for key values can also increase index fragmentation.

Question: Can you store a spatial object that is bigger than a hemisphere?
Answer: Yes, you can store a spatial object of any size. However, the tools at hand may be inadequate for interpreting the data: SQL Server spatial functions are unable to decode an object greater than a hemisphere.

Question: What is the impact of using non-ANSI-standard objects, such as bound rules and defaults?
Answer: Non-ANSI-standard objects, such as bound rules and defaults, are slated for deprecation in a future version of SQL Server, so there could be a negative impact on any deployed code that uses them.

Question: When you change a column from sparse to non-sparse, how does SQL Server resolve it?
Answer: SQL Server copies the row to the data page. Rows larger than 4,009 bytes cause an error while the existing row is present. In that case, it is necessary to copy the data to a new table, remove the old table, and then rename the new table with the name of the old table.
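The change described above is made with ALTER TABLE; restating the column definition without the SPARSE keyword converts it back to ordinary storage. The table and column names here are hypothetical.

```sql
-- Convert a sparse column back to non-sparse storage; each affected row
-- is rewritten, which is why oversized rows can raise an error.
ALTER TABLE dbo.Survey
    ALTER COLUMN Comments nvarchar(500) NULL;  -- SPARSE omitted
```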

Real-world Issues and Scenarios:

Question: Are there any standard conventions being followed in your company? What are some of the barriers to implementing standard conventions?
Answer: Answers may vary. Some standard conventions could be as follows:
Consider naming conventions
Explore general application programming conventions
Explore how the programming conventions may apply to SQL Server objects
Some barriers may include the following:
Lack of a standard policy addressing the conventions
Lack of perceived value
Lack of experience

Question: Does your company use DRI or triggers to maintain data integrity? How can you check? Are there any tools available to help you do the same?
Answer: Explore why and how triggers may be employed for DRI purposes: perhaps multiple databases, different vendors, a lack of understanding of how SQL Server handles native DRI, or specialized needs, such as using INSTEAD OF triggers, multiple PK-FK relationships, cyclical relationships, and self-joins.

Question: In what situation is a fixed-length column better than a variable-length column, or should all developers create columns with variable length?
Answer: For data that is subject to frequent changes with widely varying lengths, a fixed-length column eliminates any data movement caused by new values not fitting in the existing data location. For example, a field that contains the user name of the user who last modified the data should be fixed length.

Question: DRI is used to maintain integrity between the primary table and the foreign table. There are four different settings for both UPDATE and DELETE. What is a good setting and why?
Answer: The options are NO ACTION | CASCADE | SET NULL | SET DEFAULT. CASCADE for updates ensures that key anomalies do not occur, and NO ACTION for deletes ensures that orphaned data does not occur.

Additional Reading Material:
Data Compression: Strategy, Capacity Planning and Best Practices
Filestream Storage in SQL Server 2008
Introduction to Spatial Coordinate Systems: Flat Maps for a Round Planet
