Interview Topics on SQL
8/9/2019

Interview Topics for SQL & MSBI
Author: Vinay Kotha, CSC
11/5/2009
Table of Contents

Recovery Models
    Simple Recovery Model
    Full Recovery Model
    Bulk-Logged
Back-ups
    Back-up Scopes
        A) Database Backups
        B) Partial Back-ups
        C) File Back-ups
Back-Up Types
    A) Full Backups
    B) Differential Backups
SQL SERVER REPLICATION
    A) Load Balancing
    B) Offline Processing
    C) Redundancy
    A) Publishers
    B) Subscribers
    A) Snapshot Replication
    B) Transactional Replication
    C) Merge Replication
    A) Express Edition
    B) Workgroup Edition
    C) Standard Edition
    D) Enterprise Edition
Difference between Temp Tables and Table Variables in SQL Server
    Suggestions for choosing between these two
Stored Procedures
    Advantages of Stored Procedures
Differences between User-Defined Functions and Stored Procedures
SSAS
    Different Dimension Types by Microsoft Available in Analysis Services
Different Types of Dimensions
    Conformed Dimension
    Junk Dimension
    Degenerate Dimension
    Slowly Changing Dimensions
    There are 10 types of dimension tables
Differences between Analysis Services 2005 and 2008
Define temporary and extended stored procedures
Differences between SSRS 2005 and SSRS 2008
Performance Tuning of SSRS: Handling a Large Workload
    Steps to Improve Performance
        Control the Size of Your Reports
        Use Cache Execution
        Configure and Schedule Your Reports
        Deliver Rendered Reports for Non-browser Formats
        Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports
        Back to Report Catalogs
        Tuning with Web Service
    Memory Limits in SQL Server Reporting Services 2008
        Memory Limit
        Maximum Memory Limit
Performance Tuning of SQL Server
    Section A
    Microsoft Tips on Performance Tuning
        Not knowing the performance and scalability characteristics of your system
        Retrieving too much data
        Misuse of transactions
        Misuse of indexes
        Mixing OLTP, OLAP and reporting workloads
        Inefficient schemas
        Using an inefficient disk sub-system
SSIS 10 Best Practices
SSIS Performance Tuning
    Data Flow Optimization Modes
    Buffers
    Buffer Sizing
    Buffer Tuning
    Parallelism
    Extraction Tuning
    Transformation Tuning
    Merge-Join Transformation
    Slowly Changing Dimensions
    Data Types
    Miscellaneous
    Load Tuning
Differences between SSIS 2005 and SSIS 2008
    Look-up
    Cache Transformation
    Data Profiling Task
    Script Task and Transformation
Recovery Models:
There are three recovery models in SQL Server:
1) Simple
2) Full
3) Bulk-Logged

Simple Recovery Model: The simple recovery model allows you to recover data only to the most recent full database or differential back-up. Transaction log back-ups are not available because the contents of the transaction log are truncated each time a checkpoint is issued for the database.

Or
The simple recovery model is just that: simple. In this approach, SQL Server maintains only a minimal amount of information in the transaction log. SQL Server truncates the transaction log each time the database reaches a transaction checkpoint, leaving no log entries for disaster-recovery purposes.

In databases using the simple recovery model, you may restore full or differential back-ups only. It is not possible to restore such a database to a given point in time; you may only restore it to the exact time when a full or differential back-up occurred. Therefore, you will automatically lose any data modifications made between the time of the most recent full/differential back-up and the time of failure.
Full Recovery Model: The full recovery model uses database back-ups and transaction log back-ups to provide complete protection against failure. Along with being able to restore a full or differential back-up, you can recover the database to the point of failure or to a specific point in time. All operations, including bulk operations such as SELECT INTO, CREATE INDEX and bulk-loading data, are fully logged and recoverable.

Or

The full recovery model also bears a self-descriptive name. In this model, SQL Server preserves the transaction log until you back it up. This allows you to design a disaster-recovery plan using database back-ups in conjunction with transaction log back-ups.

In the event of a database failure, you have the most flexibility restoring databases using the full recovery model. In addition to preserving data modifications stored in the transaction log, the full recovery model allows you to restore a database to a specific point in time. For example, if an erroneous modification corrupted your data at 2:36 AM on Monday, you could use SQL Server's point-in-time restore to roll your database back to 2:35 AM, wiping out the effects of the erroneous modification.

Bulk-Logged: The bulk-logged recovery model provides protection against failure combined with the best performance. In order to get better performance, the following operations are minimally logged and not fully recoverable: SELECT INTO and bulk-load operations.

Or

The bulk-logged recovery model is a special-purpose model that works in a similar manner to the full recovery model. The only difference is in the way it handles bulk data modification operations. The bulk-logged model records these operations in the transaction log using a technique known as minimal logging. This saves significantly on processing time but prevents you from using the point-in-time restore option.

Microsoft recommends that the bulk-logged recovery model be used only for short periods of time. Best practice dictates that you switch a database to the bulk-logged recovery model immediately before conducting bulk operations and return it to the full recovery model when those operations complete.
Back-ups
One of the major advantages that enterprise-class databases offer over their desktop counterparts is a robust back-up and recovery feature set. Microsoft SQL Server provides database administrators with the ability to customize a database backup and recovery plan to the business and technical requirements of an organization.

In this article, we explore the process of backing up data with Microsoft SQL Server. When you create a backup plan, you will need to create an appropriate mix of backups with varying backup scopes and backup types that meet the recovery objectives of your organization and are suitable for your technical environment.

Back-up Scopes: The scope of a back-up defines the portion of the database covered by the backup. It defines the database, file and/or file-group that SQL Server will back up. There are three different types of back-up scope available in Microsoft SQL Server:
A) Database backups: These cover the entire database, including all structural schema information, the entire data contents of the database, and any portion of the transaction log necessary to restore the database from scratch to its state at the time of the backup. Database backups are the simplest way to restore your data in the event of a disaster, but they consume a large amount of disk space and time to complete.

B) Partial back-ups: These are good alternatives to database back-ups for very large databases that contain significant quantities of read-only data. If you have read-only file-groups in your database, it probably doesn't make sense to back them up frequently, as they do not change. Therefore, the scope of a partial back-up includes all files in the primary file-group, all read/write file-groups, and any read-only file-groups that you explicitly specify.

C) File back-ups: These allow you to individually back up files and/or file-groups from your database. They may be used to complement partial back-ups by creating one-time-only backups of your read-only file-groups. They may also play a role in complex back-up models.
Back-Up Types
The second decision you need to make when planning a SQL Server database backup model is the type of each backup included in your plan. The backup type describes the temporal coverage of the database backup. SQL Server supports two different back-up types:

A) Full Backups: A full backup includes all data within the backup scope. For example, a full database backup will include all data in the database, regardless of when it was last created or modified. Similarly, a full partial backup will include the entire contents of every file and file-group within the scope of the partial backup.

B) Differential Backups: A differential backup includes only the portion of data that has changed since the last full backup. For example, if you perform a full database backup on Monday morning and then perform a differential backup on Monday evening, the differential backup will be a much smaller file that takes much less time to create, because it includes only the data changed during the day on Monday.

You should keep in mind that the scope and type of a backup are two independent decisions made when creating your backup plan. As described above, each type and scope allows you to customize the amount of data included in the backup and, therefore, the amount of time required to back up and restore the database in the event of a disaster.
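The scope and type choices above map directly onto options of the BACKUP statement. A minimal weekly-full-plus-nightly-differential sketch (database name and file paths are placeholders):

```sql
-- Monday morning: full database backup (scope = database, type = full).
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks_full.bak'
WITH INIT;

-- Monday evening: differential backup; captures only the extents
-- changed since the last full backup.
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks_diff.bak'
WITH DIFFERENTIAL;

-- Partial scope: the primary file-group plus all read/write file-groups,
-- skipping read-only file-groups that rarely change.
BACKUP DATABASE AdventureWorks READ_WRITE_FILEGROUPS
TO DISK = 'C:\Backups\AdventureWorks_partial.bak';
```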
SQL SERVER REPLICATION
SQL Server replication allows database administrators to distribute data to various servers throughout an organization. You may wish to implement replication in your organization for a number of reasons, such as:

A) Load Balancing: Replication allows you to disseminate your data to a number of servers and then distribute the query load among those servers.

B) Offline Processing: You may wish to manipulate data from your database on a machine that is not always connected to the network.

C) Redundancy: Replication allows you to build a fail-over database server that's ready to pick up the processing load at a moment's notice.

In any replication scenario there are two main components:

A) Publishers have data to offer to the other servers. Any given replication scheme may have one or more publishers.

B) Subscribers are database servers that wish to receive updates from the publisher when the data is modified.

There's nothing preventing a single system from acting in both of these capacities; in fact, this is often done in large-scale distributed database systems. Microsoft SQL Server supports three types of database replication:

A) Snapshot Replication: It acts in the manner its name implies. The publisher simply takes a snapshot of the entire replicated database and shares it with the subscribers. Of course, this is a very time- and resource-intensive process. For this reason, most administrators don't use snapshot replication on a recurring basis for databases that change frequently. There are two scenarios where snapshot replication is commonly used. First, it is used for databases that rarely change. Second, it is used to set a baseline to establish replication between systems, while future updates are propagated using transactional or merge replication.

B) Transactional Replication: This offers a more flexible solution for databases that change on a regular basis. With transactional replication, the replication agent monitors the publisher for changes to the database and transmits those changes to the subscribers. This transmission can take place immediately or on a periodic basis.
C) Merge Replication: It allows the publisher and subscriber to independently make changes to the database. Both entities can work without an active network connection. When they are reconnected, the merge replication agent checks for changes on both sets of data and modifies each database accordingly. If changes conflict with each other, it uses a predefined conflict-resolution algorithm to determine the appropriate data. Merge replication is commonly used by laptop users and others who cannot be constantly connected to the publisher.

Each one of these replication techniques serves a useful purpose and is well-suited to particular database scenarios.

If you are working with SQL Server 2005, you'll need to choose your edition based upon your replication needs. Each edition has differing capabilities:

A) Express edition has extremely limited replication capabilities. It is able to act as a replication client only.

B) Workgroup edition adds limited publishing capabilities. It is able to serve five clients using transactional replication and up to 25 clients using merge replication. It can also act as a replication client.

C) Standard edition has full, unlimited replication capabilities with other SQL Server databases.

D) Enterprise edition adds a powerful tool for those operating in a mixed database environment: it's capable of replication with Oracle databases.

As you have undoubtedly recognized by this point, SQL Server's replication capabilities offer database administrators a powerful tool for managing and scaling databases in an enterprise environment.
Difference between Temp Tables and Table Variables in SQL Server
1) Transaction logs are not recorded for table variables, so they are transactionally neutral; you can say that they are out of scope of the transaction mechanism. Temp tables, by contrast, participate in transactions just like normal tables.

2) Table variables cannot be altered, meaning no DDL action is allowed on them, whereas temp tables can be altered.

3) Stored procedures with a temporary table cannot be pre-compiled, while an execution plan of procedures with table variables can be statically compiled in advance. Pre-compiling a script gives a major advantage to its speed of execution. This advantage can be dramatic for long procedures, where recompilation can be too pricey.

4) Unlike temp tables, table variables are memory resident, but not always. Under memory pressure, the pages belonging to a table variable can be pushed out to tempdb.

5) There can be big performance differences between using table variables and temporary tables. In most cases, temporary tables are faster than table variables. Although queries using table variables didn't generate parallel query plans on a large SMP box, similar queries using temporary tables (local or global) running under the same circumstances did generate parallel plans.

6) Table variables use internal metadata in a way that prevents the engine from using a table variable with a parallel query. SQL Server maintains statistics for queries that use temporary tables but not for queries that use table variables. Without statistics, SQL Server might choose a poor processing plan for a query that contains a table variable. No statistics are maintained on a table variable, which means that any changes in data impacting the table variable will not cause recompilation of queries accessing it. Queries involving table variables don't generate parallel plans.
Suggestions for choosing between these two:
1) Use a table variable where you want to pass a table to an SP as a parameter, because there is no other choice.

2) It's found that table variables are slower in SQL Server 2005 than in 2000 on similar data and circumstances, so if you have used table variables extensively in your database and are planning to migrate from 2000 to 2005, make your choice carefully.

3) Table variables are OK if used in small queries and for processing small amounts of data; otherwise go for temp tables.

4) If you are using very complex business logic in your SP, it's better to use temp tables than table variables.
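The two constructs being compared above look like this side by side (a minimal sketch; table and column names are invented):

```sql
-- Temporary table: lives in tempdb, supports DDL, statistics, transactions.
CREATE TABLE #Orders (OrderID int PRIMARY KEY, Amount money);
INSERT INTO #Orders VALUES (1, 19.99);
CREATE INDEX IX_Amount ON #Orders (Amount);  -- DDL is allowed (point 2)
DROP TABLE #Orders;

-- Table variable: no statistics, no DDL after declaration,
-- and not rolled back by a surrounding transaction (point 1).
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, Amount money);
INSERT INTO @Orders VALUES (1, 19.99);
SELECT OrderID, Amount FROM @Orders;
```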
Stored Procedures
A stored procedure is a group of SQL statements that form a logical unit and perform a particular task. Stored procedures are used to encapsulate a set of operations or queries to execute on a database server. For example, operations on an employee database (hire, fire, promote, lookup) could be coded as stored procedures executed by application code. Stored procedures can be compiled and executed with different parameters and results, and they may have any combination of input, output, and input/output parameters.

Advantages of Stored Procedures:
A) Precompiled execution: SQL Server compiles each stored procedure once and then reutilizes the execution plan. This results in tremendous performance boosts when stored procedures are called repeatedly.

B) Reduced client/server traffic: If network bandwidth is a concern in your environment, you'll be happy to learn that stored procedures can reduce long SQL queries to a single line that is transmitted over the wire.

C) Efficient re-use of code and programming abstraction: Stored procedures can be used by multiple users and client programs. If you utilize them in a planned manner, you'll find the development cycle takes less time.

D) Enhanced security controls: You can grant users permission to execute a stored procedure independently of underlying table permissions.
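Points A) and D) can be sketched together: a procedure for the employee example, plus an EXECUTE grant that bypasses table permissions (all object, column, and role names here are invented for illustration):

```sql
-- Encapsulate the "promote" operation from the employee example.
CREATE PROCEDURE dbo.PromoteEmployee
    @EmployeeID int,
    @NewTitle   nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.Employees
    SET Title = @NewTitle
    WHERE EmployeeID = @EmployeeID;
END;
GO

-- Callers need only EXECUTE on the procedure,
-- not UPDATE permission on dbo.Employees itself.
GRANT EXECUTE ON dbo.PromoteEmployee TO HRClerks;

EXEC dbo.PromoteEmployee @EmployeeID = 42, @NewTitle = N'Senior Analyst';
```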
Differences between User-Defined Functions and Stored Procedures
Stored procedures are very similar to user-defined functions, but there are subtle differences. Both allow you to create bundles of SQL statements that are stored on the server for future use. This offers you a tremendous efficiency benefit, as you can save programming by:

A) Reusing code from one program to another, cutting down on program development time
B) Hiding the SQL details, allowing database developers to worry about SQL and application developers to deal only in higher-level languages
C) Centralizing maintenance, allowing you to make business-logic changes in a single place that automatically affect all dependent applications

At first glance, functions and stored procedures seem identical. However, there are several subtle, yet important differences between the two:

A) Stored procedures are called independently, using the EXEC command, while functions are called from within another SQL statement.

B) Stored procedures allow you to enhance application security by granting users and applications permission to use stored procedures, rather than permission to access the underlying tables. Stored procedures provide the ability to restrict user actions at a much more granular level than standard SQL Server permissions. For example, if you have an inventory table that cashiers must update each time an item is sold (to decrement the inventory for that item by 1 unit), you can grant cashiers permission to use a "decrement item" stored procedure, rather than allowing them to make arbitrary changes to the inventory table.

C) Functions must always return a value (either a scalar value or a table). Stored procedures may return a scalar value, a table value, or nothing at all.

Overall, stored procedures are one of the greatest treasures available to SQL Server developers. The efficiency and security benefits are well worth the upfront investment in time.
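The calling-convention difference in A) and the return-value rule in C) can be sketched as follows (function, table, and procedure names are invented):

```sql
-- A scalar function: must return a value and is callable inside a SELECT.
CREATE FUNCTION dbo.NetPrice (@Price money, @TaxRate decimal(4, 2))
RETURNS money
AS
BEGIN
    RETURN @Price * (1 + @TaxRate);
END;
GO

-- Used from within another SQL statement:
SELECT ProductID, dbo.NetPrice(ListPrice, 0.08) AS PriceWithTax
FROM dbo.Products;

-- A procedure, by contrast, is invoked independently with EXEC and is
-- free to return nothing at all (dbo.ArchiveOldOrders is hypothetical).
EXEC dbo.ArchiveOldOrders;
```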
SSAS
Different Dimension Types by Microsoft Available in Analysis Services
1) Regular
2) Time
3) Organization
4) Geography
5) Bill of Materials
6) Accounts
7) Customers
8) Products
9) Scenario
10) Quantitative
11) Utility
12) Currency
13) Rates
14) Channel
15) Promotion

Regular: A dimension whose type has not been set to a special dimension type.

Time: A dimension whose attributes represent time periods, such as years, semesters, quarters, months and days.

Organization: A dimension whose attributes represent organizational information, such as employees or subsidiaries.

Geography: A dimension whose attributes represent geographic information, such as cities or postal codes.

Bill of Materials: A dimension whose attributes represent inventory or manufacturing information, such as parts lists for products.

Accounts: A dimension whose attributes represent a chart of accounts for financial reporting purposes.

Customers: A dimension whose attributes represent customer or contact information.

Products: A dimension whose attributes represent product information.

Scenario: A dimension whose attributes represent planning or strategic analysis information.

Quantitative: A dimension whose attributes represent quantitative information.

Utility: A dimension whose attributes represent miscellaneous information.

Currency: A dimension whose attributes represent currency information.

Rates: A dimension whose attributes represent currency rate information.

Channel: A dimension whose attributes represent channel information.

Promotion: A dimension whose attributes represent marketing promotion information.
Different Types of Dimensions
1) Conformed Dimension
2) Junk Dimension
3) Degenerate Dimension
4) Slowly Changing Dimensions

Conformed Dimension: These dimensions are built once in your model and can be reused multiple times with different fact tables. For example, consider a model containing multiple fact tables representing different data marts. Now look for a dimension that is common to these fact tables. In this example, let's consider that the product dimension is common and hence can be reused by creating shortcuts and joining it to the different fact tables. Some examples are the time dimension, customer dimension and product dimension.

Junk Dimension: Instead of having hundreds of small dimensions that each hold only a few records, cluttering your database with mini identifier tables, you consolidate them: all records from all these small dimension tables are loaded into ONE dimension table, and we call this a JUNK dimension table (since we are storing all the "junk" in this one table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and we can consolidate them into one dimension table called a junk dimension table.

Degenerate Dimension: An item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is referred to as a degenerate dimension. Since it looks like a dimension but really lives in the fact table and has been "degenerated" of its description, it is called a degenerate dimension.

Slowly Changing Dimensions: These are dimensions where the key value remains static but the description might change over a period of time.

There are 10 types of dimension tables (this is not the case in most instances):
1) Primary Dimensions
2) Secondary Dimensions
3) Degenerate Dimensions
4) Conformed Dimensions
5) Slowly Changing Dimensions
6) Rapidly Changing Dimensions
7) Large Dimensions
8) Rapidly Changing Monster Dimensions
9) Junk Dimensions
10) Role-Playing Dimensions
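The slowly-changing-dimension behavior described above is commonly implemented as Type 1 (overwrite the description) or Type 2 (keep history by adding a new row with validity dates). A minimal Type 2 sketch, with all table and column names invented:

```sql
-- Type 2 SCD: the business key stays static; each description change
-- closes the current row and opens a new one.
CREATE TABLE dbo.DimCustomer (
    CustomerSK int IDENTITY PRIMARY KEY,  -- surrogate key
    CustomerID int,                       -- static business key
    City       nvarchar(50),              -- attribute that may change
    ValidFrom  date,
    ValidTo    date NULL                  -- NULL = current row
);

-- Customer 7 moves from Austin to Denver: close the old row...
UPDATE dbo.DimCustomer
SET ValidTo = '2009-11-05'
WHERE CustomerID = 7 AND ValidTo IS NULL;

-- ...and insert the new current row.
INSERT INTO dbo.DimCustomer (CustomerID, City, ValidFrom, ValidTo)
VALUES (7, N'Denver', '2009-11-05', NULL);
```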
Differences between Analysis Services 2005 and 2008
A) Real-time best-practice design warnings. These warnings are implemented in AMO, exposed in the UI via blue squiggly lines, and can be dismissed individually (a single occurrence) or turned off altogether. To disable/re-enable, build the project and then, in the warning window, select the warning message and right-click to choose disable or enable.

B) New Dimension Design Wizard
C) New Cube Design Wizard
D) Attribute relationship tab in the dimension designer. Makes attribute relationships easier to define and understand.
E) CREATE MEMBER syntax extensions to support defining the caption, display folders, and associated measure group.
F) CREATE SET syntax extensions to support defining the caption and display folders, as well as the ability to define dynamic named sets.
G) CREATE KPI command is added.
H) Backup performance improvements. In SSAS 2005, backup time for big databases grew
exponentially. In SSAS 2008, backup time grows linearly, and the redesigned backup storage removes
backup size limits.
I) Write-back to MOLAP. Analysis Services 2008 removes the requirement to query ROLAP partitions when performing write-backs, which results in huge performance gains.
J) Scale-out Analysis Services. A single read-only copy of an Analysis Services database can be shared between many Analysis Services instances through a virtual IP address. This creates a highly scalable
deployment option for an Analysis Services solution.
K) UPDATE MEMBER new statement. The UPDATE MEMBER statement updates an existing calculated member while preserving the relative precedence of that member with respect to
other calculations. Therefore, you cannot use the UPDATE MEMBER statement to change
SOLVE_ORDER. An UPDATE MEMBER statement cannot be specified in the MDX script for a cube.
L) Block computation. This eliminates unnecessary aggregation calculations (for example, when the values to be aggregated are NULL) and provides a significant improvement in cube
performance, which enables users to increase the depth of their hierarchies and the complexity of
computations.
M) Aggregation Designer changes. The algorithm that builds aggregations is improved, there is support for manually creating, editing, and deleting aggregations, and you can see which
aggregates were designed. The Aggregation Designer also has built-in validations for optimal
design assistance.
N) Dynamic Management Views (DMVs). These DMVs allow writing SELECT-type statements against an SSAS instance to get performance and statistics information.
O) SSAS database attach/detach.
P) Analysis Services personalization extensions.
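Hedged sketches of the new CREATE SET and UPDATE MEMBER syntax described above (the cube, measure, and member names are illustrative, loosely modeled on the Adventure Works sample; exact property placement may vary):

```mdx
-- A dynamic named set with a caption and display folder.
CREATE DYNAMIC SET CURRENTCUBE.[Top 10 Products] AS
    TopCount([Product].[Product].[Product].Members,
             10, [Measures].[Sales Amount]),
    CAPTION = 'Top 10 Products by Sales',
    DISPLAY_FOLDER = 'Sets';

-- A calculated member defined on a session ...
CREATE MEMBER [Measures].[Margin] AS
    [Measures].[Sales Amount] - [Measures].[Total Cost];

-- ... and later updated in place. The member keeps its precedence
-- relative to other calculations, so SOLVE_ORDER cannot change here.
UPDATE MEMBER [Measures].[Margin] AS
    ([Measures].[Sales Amount] - [Measures].[Total Cost])
    / [Measures].[Sales Amount];
```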
Define temporary and extended stored procedures.
Answer - A temporary stored procedure is stored in the tempdb database. It is volatile and is deleted once
the connection is terminated or the server is restarted. An extended stored procedure is a routine implemented in an external DLL (conventionally prefixed xp_) that SQL Server loads and executes like a regular stored procedure.
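A quick sketch of a local temporary procedure (the table name is illustrative):

```sql
-- The # prefix stores the procedure in tempdb; it is visible only to the
-- current connection and dropped when that connection closes.
CREATE PROCEDURE #GetOrderCount
AS
    SELECT COUNT(*) AS OrderCnt FROM dbo.Orders;
GO

EXEC #GetOrderCount;
```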
Differences between SSRS 2005 and SSRS 2008
1) SSRS 2005 required Internet Information Services (IIS) to run, whereas SSRS 2008 no
longer requires IIS. 2008 uses the http.sys driver and listens for report requests through http.sys.
Not only does this reduce deployment headaches, it also reduces server overhead.
2) SSRS 2005 used more memory and was extremely resource intensive, so much so that many
companies would install it on a machine apart from SQL Server, but 2008 utilizes memory
more efficiently, especially when working with reports that contain large sets of data.
Additionally, SSRS 2008 will often load the first page of a report much faster than 2005.
Performance Tuning of SSRS: Handling a Large Workload
To get the highest performance when handling large workloads that include user requests for large
reports, implement the following recommendations.
Steps to Improve Performance
1) Control the Size of Your Reports
2) Use Cache Execution
3) Configure and Schedule Your Reports
4) Deliver Rendered Reports for Non-browser Formats
5) Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports
6) Back to Report Catalogs
7) Tuning the Web Service
Control the Size of Your Reports
You will first want to determine the purpose of these reports and whether a large multi-page report is
even necessary. If a large report is necessary, how frequently will it be used? If you provide users with
smaller summary reports, can you reduce the frequency with which users attempt to access this large
multi-page report? Large reports place a significant processing load on the report server, the report
server catalog, and report data, so it is necessary to evaluate each report on a case-by-case basis.
Some common problems with these large reports are that they contain data fields that are not used in
the report, or they contain duplicate datasets. Often users retrieve more data than they really need. To
significantly reduce the load placed on your Reporting Services environment, create summary reports
that use aggregates created at the data source, and include only the necessary columns. If you want to
provide data feeds, you can do this asynchronously using more appropriate tools, such as SSIS, to provide
the data feed.
Use Cache Execution
If the reports do not need to have live execution, enable the cache execution setting for each of your
appropriate reports. This setting causes the report server to cache a temporary copy of those reports in
memory.
Configure and Schedule Your Reports
For your large reports, use the report execution timeout setting to control how long a report can
execute before it times out. Some reports simply need a long time to run, so timeouts will not help you
there, but if reports are based on bad or runaway queries, execution timeouts ensure that resources are
not being inappropriately utilized.
If you have large reports that create data processing bottlenecks, you can mitigate resource contention
issues by using scheduled snapshots. Instead of the report data itself, a regularly scheduled report
execution snapshot is used to render the report. The scheduled snapshot can be executed during off-
peak hours, leaving more resources available for live reports for users during peak hours.
Deliver Rendered Reports for Non-browser Formats
Rendering performance of non-browser formats such as PDF and XLS has improved in SQL Server 2008
Reporting Services. Nevertheless, to reduce the load on your SQL Server Reporting Services
environment, you can place non-browser-format reports onto a file share and/or SharePoint, so users
can access the file directly instead of continually regenerating the report.
Populate the Report Cache by Using Data-Driven Subscriptions for
Parameterized Reports
For your large parameterized reports, you can improve performance by pre-populating the report cache
using data-driven subscriptions. Data-driven subscriptions enable easier population of the cache for set
combinations of parameter values that are frequently used when the parameterized report is executed.
Note that if you choose a set of parameters that are not used, you take on the cost of running the cache
with little value in return. Therefore, to identify the more frequent parameter value combinations,
analyze the ExecutionLog2 view. Ultimately, when a user opens the report, the report server can now
use a cached copy of the report instead of creating the report on demand. You can schedule and
populate the report cache by using data-driven subscriptions.
Back to Report Catalogs
You can also increase the size of your report server catalogs, which allows the database to store more of
the snapshot data.
Tuning the Web Service
IIS and http.sys tuning helps get the last incremental performance out of the report server computer.
The low-level options allow you to change the length of the HTTP request queue, the duration that
connections are kept alive, and so on. For large concurrent reporting loads, it may be necessary to
change these settings to allow your server computer to accept enough requests to fully utilize the server
resources.
You should consider this only if your servers are at maximum load and you do not see full resource
utilization, or if you experience connection failures to Reporting Services.
Memory Limits in SQL Server Reporting Services 2008
Memory Limit
This configuration is similar to WorkingSetMinimum in SQL Server 2008. Its default is 60% of physical
memory. Increasing the value helps Reporting Services handle more requests. After this threshold is
reached, no new requests are accepted.
Maximum Memory Limit
This configuration is similar to WorkingSetMaximum in SQL Server 2008. Its default is 80% of physical
memory. But unlike the SQL Server 2008 version, when its threshold is reached, it starts aborting
processes instead of rejecting new requests.
Performance Tuning of SQL Server
Section A:
Increase the min memory per query option to improve the performance of queries that use hashing or sorting operations, if your SQL Server has a lot of memory available and there are
many queries running concurrently on the server. The default min memory per query value is 1024 KB.
Increase the max async IO option if SQL Server runs on a high-performance server with a high-speed intelligent disk subsystem (such as hardware-based RAID with more than 10 disks).
Change the network packet size option to an appropriate value. By default the packet size is 4096 bytes; for queries moving large amounts of data, the packet size can be increased accordingly.
You can increase the recovery interval value.
Increase the priority boost option for SQL Server to 1. By default it is set to 0.
Set the max worker threads option to the maximum number of user connections to your SQL
Server box.
The default setting for the max worker threads option is 255. If the number of user
connections is less than the max worker threads value, a separate operating system
thread is created for each client connection, but if the number of user connections
exceeds this value, thread pooling is used. For example, if the maximum number of
user connections to your SQL Server box is 50, you can set the max worker threads
option to 50; this frees up resources for SQL Server to use elsewhere. If the maximum number of
user connections to your SQL Server box is 500, you can set the max worker
threads option to 500; this can improve SQL Server performance because thread pooling will
not be used.
Specify the min server memory and max server memory options.
Specify the set working set size SQL Server option to reserve the amount of physical memory
space for SQL Server.
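The server-level settings above are changed through sp_configure; a sketch (the values shown are illustrative, not recommendations for any particular server):

```sql
-- Most of these are advanced options, so expose them first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'min memory per query (KB)', 2048;
EXEC sp_configure 'max worker threads', 500;
EXEC sp_configure 'min server memory (MB)', 1024;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;
```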
Microsoft Tips on Performance Tuning:
Not knowing the performance and scalability characteristics of your system: If performance and scalability of a system are important to you, the biggest
mistake that you can make is to not know the actual performance and scalability
characteristics of important queries, and the effect the different queries have on each other
in a multiuser system. You achieve performance and scalability when you limit resource use
and handle contention for those resources. Contention is caused by locking and by physical
contention. Resource use includes CPU utilization, network I/O, disk I/O, and memory use.
Retrieving too much data: A common mistake is to retrieve more data than you
actually require. Retrieving too much data leads to increased network traffic and increased
server and client resource use. This can include both the columns and the rows.
Misuse of transactions: Long-running transactions, transactions that depend on user
input to commit, transactions that never commit because of an error, and non-transactional
queries inside transactions cause scalability and performance problems because they lock
resources longer than needed.
Misuse of indexes: If you do not create indexes that support the queries issued
against your server, the performance of your application suffers as a result. However, if you
have too many indexes, then the insert and update performance of your application suffers. You
have to find a balance between the indexing needs of the writes and the reads, based on
how your application is used.
Mixing OLTP, OLAP, and reporting workloads: OLTP workloads are characterized by many small transactions, with an expectation of very quick response time
from the user. OLAP and reporting workloads are characterized by a few long-running
operations that might consume more resources and cause more contention. The long-
running operations are caused by locking and by the underlying physical sub-system. You
must resolve this conflict to achieve a scalable system.
Inefficient schemas: Adding indexes can help improve performance; however, their
impact may be limited if your queries are inefficient because of poor table design that
results in too many join operations or in inefficient join operations. Schema design is a key
performance factor. It also provides information to the server that may be used to optimize
query plans. Schema design is largely a tradeoff between good read performance and good
write performance. Normalization helps write performance. De-normalization helps read
performance.
Using an inefficient disk sub-system: The physical disk sub-system must provide a
database server with sufficient I/O processing power to permit the database server to run
without disk queuing or long I/O waits.
SSIS 10 Best Practices:
1) SSIS is an in-memory pipeline, so ensure all transformations occur in memory
2) Plan for capacity by understanding resource utilization
3) Baseline source system extract speed
4) Optimize SQL data sources, lookup transformations, and destinations
5) Tune your network
6) Use data types wisely
7) Change the design
8) Partition the problem
9) Minimize logged operations
10) Schedule and distribute it correctly
SSIS Performance Tuning
The SSIS architecture has two engines, the run-time engine and the data flow engine. The run-time engine is a highly
parallel control flow engine that coordinates the execution of tasks or units of work within SSIS and
manages the engine threads that carry out those tasks. The data flow engine manages the data pipeline
within a data flow task.
Data Flow Optimization Modes
The data flow task has a property called RunInOptimizedMode. When this property is enabled,
any downstream component that doesn't use any of the source component's columns is
automatically disabled, and unused columns are also automatically removed. The net result of
enabling the RunInOptimizedMode property is that the performance of the entire data flow task is
improved.
SSIS projects also have a RunInOptimizedMode property. This indicates that the
RunInOptimizedMode property of all the data flow tasks in the project is overridden at design
time, and that all data flow tasks in the project run in optimized mode during debugging.
Buffers:
A buffer is an in-memory dataset object utilized by the data flow engine to transform data. The
data flow task has a configurable property called DefaultBufferMaxRows, which is set to 10,000
by default. The data flow task also has a configurable property called DefaultBufferSize, which is
set to 10 MB by default. Additionally, the data flow task has an internal limit called MaxBufferSize,
which is set to 100 MB and cannot be changed.
Buffer Sizing:
When performance-tuning a data flow task, the goal should be to pass as many records as
possible through a single buffer while efficiently utilizing memory. This begs the question: what
does efficiently utilizing memory mean? SSIS estimates the size of a buffer row by examining
the data source metadata at design time. Optimally, the buffer row size should be as small as
possible, which can be accomplished by employing the smallest possible data type for each
column. SSIS automatically multiplies the estimated buffer row size by the
DefaultBufferMaxRows setting to determine how much memory to allocate to each buffer in
the data flow engine. If this amount of memory exceeds MaxBufferSize (100 MB), SSIS
automatically reduces the number of buffer rows to fit within the 100 MB boundary.
The data flow task has another property called MinBufferSize, which is 64 KB and cannot be
changed. If the amount of memory SSIS estimates should be allocated for each buffer is below 64
KB, SSIS will automatically increase the number of buffer rows per buffer in order to exceed
the MinBufferSize memory boundary.
Buffer Tuning:
The data flow task has a property called BufferSizeTuning. When the value of this property is set
to true, SSIS will add information to the SSIS log indicating where SSIS has adjusted the buffer
size. While buffer tuning, the goal should be to fit as many rows into a buffer as possible. Thus,
the value for DefaultBufferMaxRows should be as large as possible without exceeding a total
buffer size of 100 MB.
Parallelism:
SSIS natively supports the parallel execution of packages, tasks, and transformations. Therefore,
parallelism can greatly improve the performance of a package when it is configured within the
constraints of system resources. A package has a property called MaxConcurrentExecutables,
which can be configured to set the maximum number of threads that can execute in parallel per
package. By default this is set to -1, which translates to the number of logical machine
processors plus 2. All or some of the operations in a package can execute in parallel.
Additionally, the data flow task has a property called EngineThreads, which defines how many
threads the data flow engine can create and run in parallel. This property applies equally to both
the source threads that the data flow engine creates for sources and the worker threads that
the engine creates for transformations and destinations. For example, setting the EngineThreads
property to 10 indicates that the data flow engine can create up to 10 source threads and 10
worker threads.
Extraction Tuning
a) Increase the connection manager's packet size property: Use separate connection
managers for bulk loading, and a smaller packet size for OLE DB Command transformations.
b) Affinitize network connections: This can be accomplished if a machine has multiple
cores and multiple NICs.
c) Tune queries:
--Select only needed columns
--Use a hint to specify that no shared locks be used during the select (the query can potentially
read uncommitted data). Use this only if the query must have the best performance.
d) Look-ups
-- Select only needed columns
--Use the Shared Look-up Cache (available in 2008)
e) Sorting
The Merge and Merge-Join transformations require sorted inputs. Source data for these
transformations that is already sorted obviates the need for an upstream Sort transformation
and improves data flow performance. The following properties must be configured on a source
component if the source data is already sorted:
a) IsSorted: The output of a source component has a property called IsSorted. The value of
this property must be true.
b) SortKeyPosition: Each output column of a source component has this property, which
indicates whether a column is sorted, the column's sort order, and the sequence in which
multiple columns are sorted. This property must be set for each column of sorted data.
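The extraction-tuning points above can be combined into one source query sketch (table and column names are illustrative): select only needed columns, skip shared locks where dirty reads are acceptable, and sort at the database so the data flow can skip an SSIS Sort transformation (setting IsSorted = True and SortKeyPosition = 1 on OrderID at the source output).

```sql
-- Tuned extraction query: narrow column list, no shared locks,
-- sorted at the source for a downstream Merge-Join.
SELECT OrderID, CustomerID, OrderAmount
FROM   dbo.Orders WITH (NOLOCK)   -- dirty reads possible; use with care
WHERE  OrderDate >= '2009-01-01'
ORDER  BY OrderID;                -- matches SortKeyPosition = 1 on OrderID
```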
Transformation Tuning
Partially blocking (asynchronous): Merge, Merge-Join, and Union All can possibly be optimized in the
source query.
Use SSIS 2008:
--Improved data flow task scheduler
--Union All transforms are no longer necessary to split up and parallelize execution trees
Blocking transformations (asynchronous): Aggregate, Sort, Pivot, and Unpivot should be limited
to one per data flow on the same data.
Aggregate transformation: This transformation includes the Keys, KeyScale, CountDistinctKeys,
and CountDistinctScale properties, which improve performance by enabling the transformation
to pre-allocate the amount of memory that it needs for the data that it
caches. If the exact or approximate number of groups expected to result
from a Group By operation is known, then set the Keys and KeyScale properties, respectively. If
the exact or approximate number of distinct values expected to result from a Distinct
Count operation is known, then set the CountDistinctKeys and CountDistinctScale properties,
respectively.
If the creation of multiple aggregations in a data flow is necessary, then consider creating
multiple aggregations within one Aggregate transformation instead of creating multiple
transformations. Performance is improved with this approach because, when one aggregation is
a subset of another aggregation, the transformation's internal storage is optimized by scanning
the incoming data only once. For example, if an aggregation uses a Group By clause and an AVG
aggregation, then performance can be improved by combining them into one transformation.
However, aggregation operations are serialized when multiple aggregations are performed
within one Aggregate transformation. Therefore, performance might not be improved when
multiple aggregations must be computed independently.
Merge-Join Transformation
MaxBuffersPerInput: This property specifies the maximum number of buffers that can be
active for each input at one time. This property can be used to tune the amount of memory that
buffers consume, and consequently the performance of the transformation. As the number of
buffers increases, the more memory the transformation uses, which improves performance. The
default value of this property is 5, which is the number of buffers that works well in most
scenarios. Performance can be tuned by using a slightly different number of buffers, such as 4 or
6. Using a very small number of buffers should be avoided if possible. For example, there is a
significant impact on performance when MaxBuffersPerInput is set to 1 instead of 5.
Additionally, MaxBuffersPerInput shouldn't be set to 0 or less; throttling doesn't occur within this
range of values, and depending on the data load and the amount of memory available, the package
may not complete.
Slowly Changing Dimensions
This wizard creates a set of data flow transformation components that work together with the
Slowly Changing Dimension transformation component. The wizard creates OLE DB Command
transformation components that perform updates against a single row at a time. Performance
can be improved by replacing these transformation components with destination components
that save all rows to be updated to a staging table. Then, an Execute SQL Task can be added that
performs a single set-based T-SQL UPDATE statement against all rows at the same time.
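The set-based statement that replaces the row-by-row OLE DB Command might look like this (dimension and staging table names are illustrative):

```sql
-- One set-based update against all staged changes, instead of one
-- OLE DB Command execution per row.
UPDATE d
SET    d.City    = s.City,
       d.Segment = s.Segment
FROM   dbo.DimCustomer AS d
JOIN   dbo.Staging_CustomerUpdates AS s
       ON s.CustomerID = d.CustomerID;
```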
Data Types
1) Use the smallest possible data types in the data flow.
2) Use the CAST or CONVERT functions in the source query if possible.
Miscellaneous
1) Sort in the query if possible.
2) If possible, use the T-SQL MERGE statement instead of the SCD transformation.
3) If possible, use the T-SQL INSERT INTO statement instead of the data flow task.
4) A data reload may perform better than a delta refresh.
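For point 2 above, a minimal Type 1 style load done with a single MERGE (table names illustrative): matching rows are updated in place, new rows are inserted.

```sql
MERGE dbo.DimCustomer AS d
USING dbo.Staging_Customer AS s
      ON d.CustomerID = s.CustomerID
WHEN MATCHED THEN
    UPDATE SET d.City = s.City          -- overwrite the description (Type 1)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, City)
    VALUES (s.CustomerID, s.City);
```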
Load Tuning
Use the SQL Server Destination
1) Only helps if the data flow and the destination database are on the same machine
2) Weaker error handling than the OLE DB Destination
3) Set Commit Size = 0
Use the OLE DB Destination
1) Set Commit Size = 0
Drop indexes based on the expected % load growth
1) Don't drop an index if it's the only clustered index: Data in a table is sorted by the clustered
index, and primary keys are clustered indexes by default. Loading will always be faster than dropping and
recreating a primary key, and will usually be faster than dropping and recreating a clustered index.
2) Drop a non-clustered index if the load will cause a 100% increase: This is the rule of thumb.
3) Don't drop a non-clustered index if the load increase is under 10%: Not a rule of thumb;
experiment to find the optimal value.
Use partitions if necessary
1) Use the SQL Server Profiler to trace the performance
2) See The Data Load Performance Guide
3) Use the TRUNCATE statement instead of the T-SQL DELETE statement. DELETE is a logged
operation, which performs slower than TRUNCATE.
4) Affinitize the network
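The TRUNCATE vs. DELETE point in practice (staging table name illustrative):

```sql
-- TRUNCATE deallocates pages with minimal logging, so it is much faster
-- than a fully logged, row-by-row DELETE of every row.
TRUNCATE TABLE dbo.Staging_Orders;
-- Slower alternative: DELETE FROM dbo.Staging_Orders;
```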
Differences between SSIS 2005 and SSIS 2008
There is no difference between the architecture of SSIS 2005 and SSIS 2008. 2008 has some
additional features which 2005 did not have; it can be said that 2008 is an enhancement of the features of
the 2005 version.
Look-up
In 2005, for error output, look-ups had only 3 options: Fail Component, Ignore Failure, and Redirect Row.
But 2008 has an additional option, No Match Output.
2005 did not have cache modes, while 2008 has 3 different cache modes: Full Cache, Partial Cache,
and No Cache.
2005 didn't have the connection manager types, while 2008 has the OLE DB Connection Manager and the Cache
Connection Manager.
Cache Transformation
2005 did not have this transformation; it was introduced in the 2008 version. This is a data flow
transformation. The Cache transformation writes data from a connected data source in the data flow to a
Cache Connection Manager; the Look-up transformation in a package can then perform lookups on that data.
In a single package, only one Cache transformation can write data to the same connection manager. If
the package contains multiple Cache transforms, the first Cache transform called when the
package runs writes the data to the connection manager, and the write operations of subsequent Cache
transforms fail. The Cache transformation can be configured in the following way:
1) Specify the connection manager
2) Map the input columns in the Cache transform to destination columns in the Cache
Connection Manager
Data Profiling Task
2005 did not have this task; it was introduced in 2008. This is a control flow task. It lets you analyze
data in a SQL Server database and, from the results of that analysis, generate XML reports that can be
saved to a file or an SSIS variable. By configuring one or more of the task's profile types, you can
generate a report that provides details such as a column's minimum and maximum values, or the
number and percentage of null values.
Script Task and Transformation
2008 gives the option of writing scripts in either VB or C#, whereas 2005 only enabled users to
write scripts in VB.