dba mysteries

36
Ten Database  Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com [email protected]

Upload: testksp4916

Post on 08-Apr-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 1/36

Ten Database  Mysteries

Chris LawsonDatabase Specialists, Inc.

www.dbspecialists.com

[email protected]

Page 2: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 2/36

Focus of Presentation

� Explore some ³strange´ database problems that have

 baffled some DBAs

� Most of the mysteries occurred on critical productionsystems, although some were on development systems

� ALL of the mysteries were eventually explained

� Depending on your personal experience, some of these

³mysteries´ will seem trivial or commonplace; otherswill indeed seem mysterious

� Most mysteries have a simple explanation

� Most mysteries have a simple fix

Page 3: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 3/36

Why Spend Time on These

Database Mysteries?� Each DBA has a unique set of experiences and biases.

What one DBA thinks is obvious, another will not.

� An Oracle ³detective´ is part scientist, part artist.Many solutions require creativity, not just logic.

� A superior DBA will look for ways to ³stretch´ and

learn ways to handle difficult problems.

� Without working out difficult problems, you will notadvance as a DBA.

� You will be the ³hero´ if you encounter a mystery and

solve it; remember the solution--you may see it again!

Page 4: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 4/36

AW

ord About Oracle Versions� This presentation was originally written in

1998

� Most of these mysteries involve Oracle 7

databases

� Although some of the mysteries might not

apply directly to Oracle 8i, they still offer insight into the problem-solving process

Page 5: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 5/36

The Case of the

Berserk ApplicationClue #1: ³Big Phone Company´ 1997

� Using HPUX, Oracle 7.3.2.3

� Help desk application (Vantive) that connects to Oracledatabase suddenly goes berserk, creating thousands of 

connections

� Program had worked normally for many months

� DBAs watch helplessly as CPU load driven from 1 to 50� As DBAs kill extra processes, more take their place

� Alert log and recent trace files show nothing unusual

� DBAs are united in accusing the application as the culprit

1

Page 6: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 6/36

Berserk Application

(continued )Clue #2: ³Big Publisher Ltd.´ 1998

� Running Sun Solaris, Oracle 7.3.2.3

� Users complain that performance has degraded in recentmonths

� Manager states that ³something must be wrong with thenetwork´

� Application is CORIS , a document management/printing

application� DBA investigates. Discovers that time to connect in

SQL*Plus is 30-45 seconds, even though server load is low

� Connect time is bad whether remote (PC) or directly on server 

� Server load (file I/O and CPU) is generally low

Page 7: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 7/36

Berserk Application: Solution

� OTRACE is the culprit. It is active by default on

many 7.3 Oracle versions

� Excerpt from Oracle Corporation Alert:

³Pr oblems descr ibed her e can occur when Or acle

Tr ace is not configur ed and is widely enabled.´

Page 8: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 8/36

Berserk Application: Solution

(continued )To Detect:

� Check directory ORACLE_HOME/rdbms/otrace: As

size of files process.dat and regid.dat approach 10mb, problems arise

-rw-r--r-- 1 oracle dba 3161424 Jun 05 09:43 process.dat

-rw-r--r-- 1 oracle dba 263808 Jun 05 09:43 regid.dat

� To correct: simply remove these two files, then issuecommand otrccref 

Page 9: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 9/36

Berserk Application: Solution

(continued )To Prevent:

� Add line to listener.ora for each database (after 

ORACLE_HOME):

( ENVS=¶EPC_ DISABLED=TRUE¶)

� Set and export environment variable

EPC_ DISABLED=TRUE for all users.Put standard profile in /etc directory

� Restart all databases and restart listener 

Page 10: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 10/36

The Case of the Reluctant PatchBackground:

� To correct several bugs, decision is made to

upgrade from 7.3.2.2 to 7.3.2.3 (HPUX)

� Patch is obtained from Oracle and applied to test

server. DBA notes that patch ran very quickly and

runs again ³just to be sure´� Bug is now gone on test server 

2

Page 11: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 11/36

The Case of the Reluctant Patch

(continued )Problem:

� Patch is similarly applied to production server--

same operating system and version.

� Production application is tested, but bug is still

there!

� Another DBA reviews patch file, location, etc.All seem correct.

Page 12: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 12/36

The Case of the Reluctant Patch:

Solution

� DBA happens to notice that upon SQL*Plus

startup, database is 7.3.2.2!

� The patch was really only applied on the second

run. This is apparently a quirk in the patch r eadme

file.

� The command what or acle (then grep for patch#)can be used to determine which patches are

applied

Page 13: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 13/36

The Case of the Sleazy SQL� ³Big Publisher Ltd.´ runs an MRP system called

³AVALON,´ similar to Oracle Manufacturing.

Database stores inventory, part information,vendors, etc.

� Server is ATT3555, running NCR UNIX.

Database is Oracle 7.1.6

� Issue: Users report that certain common operations

are very slow

3

Page 14: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 14/36

The Case of the Sleazy SQL

(continued )� DBA investigates and queries v$sqlarea using:

SELECT sql_text FROM v$sqlarea

 WHERE disk_reads/executions > 1000;

� Query yields troublesome SQL statement, with

these stats:

DISK  _ READS PER EXECUTION =5,000BUFFER  _GETS PER EXECUTION =5,100

Page 15: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 15/36

The Case of the Sleazy SQL

(continued )� Statement has been accidentally designed to ensure wor  st 

 possible per  for mance by making index usage impossible:SELECT * FROM ABC WHERE

NVL (COL _W) = NVL (:1) AND

NVL (COL _ X)  = NVL (:2) AND

NVL (COL _Y)  = NVL (:3) AND

NVL (COL _Z)  = NVL (:4);

� TABLE ABC SIZE = 3mb, about 25,000 rows

� DBA requests developers to alter statement to eliminate NVL (COL_N) functions

� DBA advised that no resources available to make change

� Problem: If code can¶t be changed, what can be done to

improve performance?

Page 16: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 16/36

The Case of the Sleazy SQL:

Solution� CACHE the table! For example:

alter table xyz cache;

� Normally, blocks from full-table scans are designated for 

rapid age-out; otherwise, they would ³wipe-out´ the db

cache. Cache of table causes blocks to be treated ³normally.´

� Caching table disables rapid age-out of this table

� Logical reads will not be reduced, but disk reads approachzero!

� Note: DB _ BLOCK  _ BUFFER S was slightly increased to

compensate for the cached table that now consumes a few

megabytes of database cache

Page 17: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 17/36

The Case of the

 Non-Optimal Optimizer 

� A large software company based in ³Cedar Shores´ has

designed a large financials application. Program has been

tuned for  Rule Based O ptimizer .

� The application runs very well, is a mature product, which

is used in thousands of companies around the world.

� Some users clamor for new features: more horns and

whistles

4

Page 18: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 18/36

The Case of the

 Non-Optimal Optimizer (continued )� The new development team, afraid to become obsolete,

wants to convert to Cost-Based Optimizer (CBO). They

also wisely consider that Oracle recommends using CBOon new projects.

� The older developers, now nearing peaceful retirement,

 predict disaster if the database is switched to CBO,

 because the execution plans will change.� Issue: How can Optimizer be selectively switched to CBO

without changing the code?

Page 19: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 19/36

The Case of the Non-Optimal

Optimizer: Solution� Simply substitute a view having a ³hint´ for the table

needing CBO

For example:

rename DEPT to DEPT_ORIG

create view DEPT as select / * + ALL_ROWS */

* from DEPT_ORIG;

� Now, application will use the VIEW when it looks for 

DEPT

� All queries using DEPT will use CBO

� Note: Upon renaming a table, the indexes and constraintswill ³move´ with the table; however, synonyms and grants

may need to be reset.

Page 20: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 20/36

The Case of the

Forgetful Memory� A new internet-transaction application, ECX  per t , and its

database have been installed on a Sun Ultra Enterprise Server 

� Sun Solaris 2.5.1, Oracle 7.2.3� Application appears to run smoothly for several months,

although it occasionally creates large dump files

� Trace files appear occasionally with ORA-4030 ³Out of Process

Memory´ and recommends ³increase process memory quota´

� Server seems to hang occasionally. Server reboot fixes

� SysAdmin checks kernel parameters related to memory. All

correct and match other servers. Not using any large stored

 procedures

� Problem: What is causing memory/hang problems?

5

Page 21: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 21/36

The Case of the

Forgetful Memory: Solution� DBA checks /tmp (swap area on server) and notes 99%

consumed 

� Investigation reveals that application occasionally goes berserk and consumes ENTIRE SWAP area with log

files

� Deletion of log files does not return disk space, since

application is still ³holding´ the files� Reboot of server cleaned up /tmp area, thereby

correcting problem

� Suggestion: If memory-related error messages exist,

check swap area first

Page 22: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 22/36

The Reluctant Index Affair Background:

� DBA asked to analyze and tune Australia manufacturing

database. Database is running CBO. One particularly bothersome SQL statement is identified

� The WHERE condition is perfect for a new index,

 because of its excellent selectivity

� Index is quickly created. Table is also analyzedProblem:

� Even though index is a ³perfect´ solution to the query, a

full table scan is used instead

6

Page 23: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 23/36

Page 24: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 24/36

Mystery of the

Hanging Database� At random intervals, a 7.3.2.3 database hangs. No tracefiles, and nothing unusual in the alert log.

� When problem occurs, no response to new connectionsrequests; over 1200 existing connections ³hang.´

� Oracle Support is alerted to priority 1 problem; expertsacross the world investigate for weeks

� DBA is using OEM Lock Manager tool and notices user who is blocking about 25 other users. The hang occurs

soon after.� Oracle Australia recommends checking indexes. This

suggestion led to the solution.

Problem:

� How did index problems hang database?

7

Page 25: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 25/36

Mystery of the

Hanging Database: Solution� Application design flaw

� There are hundreds of foreign keys in the database; 99%

had indexes. A few did not, violating good design practice.When batch program began updates, locking increased

rapidly.

� Without FK  index, updates on par ent table completely

 block updates on child (vice versa for 7.1.6)

� Reference: Server Application Developers Guide

� Although not admitted as database ³bug,´ database was

overwhelmed by the locks

� Once indexes on all FK ¶s created, problems disappeared

Page 26: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 26/36

The Case of the

Mysterious Package� Manufacturing application was installed on a Sun Ultra

3000 server. A small database was created for testing

 purposes. Oracle version 7.2.3. Shared pool size about 60mb.

� At first, all went well. Then, seemingly randomly, when

the users began to try new features, they would receive a

³funny´ error message and the application failed.

� A trace file recommended increasing shared pool

Problem:

� How can application fail with such a sizable shared pool?

� Aside from massive increase in shared pool, what can be

done?

8

Page 27: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 27/36

The Case of the

Mysterious Package: Solution� The application uses about 20 massive PL/SQL packages.

Some are 5x the SYS.STANDARD package. When a

 package load is attempted, it will not fit in the shared pool.

� Memory-intensive packages should be ³pinned´ or ³kept´

in shared pool after database startup

EXECUTE SYS.dbms_shared_pool.keep ('OBJ ECT_  NA  ME');

� But first, must find the ³big´ packages (will also listSYS.STANDARD):

SELECT owner, name, sharable_mem FROM v$db_object_cache

 WHERE sharable_mem > 100000;

Page 28: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 28/36

The Case of the

Mysterious Package: Solution(continued )� Example script to find ³big´ packages and generate

SQL script to ³pin´ them in memorySELECT 'EXECUTE SYS.dbms_shared_pool.keep('''||

owner||'.'||name||''');'

FROM v$db_object_cache

WHERE sharable_mem > 100000

AND type NOT IN ('VIEW', 'SYNONYM', 'TABLE')AND name NOT LIKE '%SHARED_POOL%'

AND owner IS NOT NULL;

Page 29: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 29/36

The Case of the

Uncooperative Rollback � In mid-afternoon, DBA (running ³OEM Top Sessions´)

notices many users ³ACTIVE´ but showing 0 file I/O. Lock 

Manager reveals one user performing big update blocking all.

� Culprit tracked down--agrees to be terminated. DBA

disconnects session.

� Locks are not released, but user is ³marked for kill.´

� Very little file I/O activity. Alert log shows very slow

switching of redo logs.

� DBA performs shutdown abort then startup. Database starts

up after 2 minutes. All is well.

Problem:

� Why did user not rollback and release locks?

9

Page 30: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 30/36

The Case of the Uncooperative

Rollback: Solution� If session is terminated, speed of rollback is proportional

to init.ora parameter 

CLEANUP _ ROLLBACK  _ ENTRIES� If default value (20) is used, rollback of killed session

can take 50x time of original update. Alternatively,

shutdown abort/startup cleans up database much faster.

� Rationale: Parameter prevents rollback of one user fromhogging all the resources on a busy system

� Solution: Increase parameter to reduce rollback time

(since shutdown abort is usually not an option)

Page 31: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 31/36

The Singular Case of the

Phantom Users� A manufacturing database in Sydney, Australia, needed

 performance tuning. SQL tuning on US databases had

yielded good results.

� The table v$sqlarea was queried to find resource-intensive

SQL statements. Several commonly-run statements were

isolated. Performance was improved through index additions.

� Statistics were re-examined over the next 4 hours, in order to

confirm improvements.

� However, repeated looks at execution statistics showed no

change.

� DBA puzzles over enigma for several hours, then realizes

that NOTHING is WRONG! What did he finally realize?

10

Page 32: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 32/36

The Singular Case of the

Phantom Users: Solution� Nothing is wrong because the users were

still asleep. It was only 5:00am in Sydney!

Page 33: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 33/36

The Case of the Slow Physician

(Bonus Mystery)� Health application is experiencing slow run times. Analysis shows

following SQL statement causing 3000 disk reads

� COSIGN_VIEW is a join of 2 tables (DOCS + COSIGN),

 joined on patient_id (indexed)

� Search criteria µDR. MCK ENZIE¶ is ver  y selective; thus,

nested loop IS expected choice for optimizer, with DOCSas Driving table.

11

SELECT * from COSIG N _VIEW WHERE doctor_id = 'DR. MCKE NZIE'

Page 34: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 34/36

The Case of the Slow Physician

(continued )� Even with index on DOCS(doctor _ id), optimizer (CBO)

insists on using hash-join, and r efuses to ever use index on

doctor_ id !

� Repeated analyze table commands do not correct

� Substituting query not using a view yields expected NL

result

Problem:

� Why does using the view cause optimizer to make the

³wrong´ choice?

Page 35: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 35/36

The Case of the Slow Physician:

Solution� Everything seemed to point to a problem with the view,

 because all worked normally as long as the view was

excluded

� Finally, DBA compared view definition (in OEM Schema

Manager) to definition seen using ³describe table´ syntax.

The columns did NOT match!

� Examining the object-create script revealed that the view

 switched column names, so that column DOC_ID in theview did NOT match DOC_ID in the table!

� Once the correct column was indexed, a Nested-Loop Join

was selected by the optimizer 

Page 36: DBA Mysteries

8/7/2019 DBA Mysteries

http://slidepdf.com/reader/full/dba-mysteries 36/36

Contact InformationChris Lawson

[email protected]

http://www.dbspecialists.com

Database Specialists, Inc.388 Market Street, Suite 400

San Francisco, CA 94111