Advanced Active Directory Design and Troubleshooting

Ed Whittington

Principal Software Engineer

HP Business Critical Call Center

Oct. 06, 2002

Topics

Troubleshooting Basics

Troubleshooting Tools

DNS Troubleshooting

Troubleshooting Replication

Troubleshooting DCPromo

Troubleshooting FRS Replication and DFS

Troubleshooting Group Policy

Troubleshooting in .NET

Troubleshooting Basics

Basic Troubleshooting Steps

Define the problem (make sure there is one)

• What’s failing?

• Client authentication and security

• Group policy application.

• Replication.

• Name resolution.

• Errors and warnings in event logs.

• FRS/DFS

• Application

• How is the problem replicated?

• One or multiple machines?

• Narrow the variables

Basic Troubleshooting Steps

MPSReports_DS (from HP or Microsoft)

Get the Log files

• Event logs

– http://www.eventid.net

• %windir%\debug\usermode\Userenv.log

• %windir%\debug\DCPromo*.log

Turn on Verbose Logging

Run NetDiag, DCDiag (verbose)

Get status report from Replication Monitor.

Basic Troubleshooting Steps

• Check DNS.

• Resolver on ALL computers.

• Name Server Properties (forwarding, etc.).

• Monitoring tab – test name resolution.

• Nslookup, ping to test name resolution.

• Ping SRV records.

• Check Replication.

• Force replication.

• Identify who isn’t replicating to whom.

• Outbound vs. inbound.

Basic Troubleshooting Steps

If all else fails, try demoting.

• Really cleans up a lot of problems… If problem is isolated to one DC.

• If replication isn’t working, demotion won’t work.

• Reinstall to remove the AD, then clean up AD

• Ntdsutil to remove server object.

• Delete server object from Sites & Services.

• Delete FRS server object from System container.

• Can manually demote a DC.

Manual Demotion of a DC

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ProductOptions

ProductType =

– ServerNT (when the computer is a Member Server)

– LanManNT (when the computer is a Domain Controller)

• Change from LanManNT to ServerNT

It’s now a “dirty” member server

Clean server objects from the AD (Ntdsutil)

Clean up the disk and Registry

1. Create new Forward Lookup Zone – Bogus.com

2. Run DCpromo – create new forest for Bogus.com

3. Demote and eliminate Bogus.com

4. Wait for Replication

5. Promote back into domain – use same name if desired
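The ProductType check at the start of this procedure can be sketched in Python. This is a minimal illustration, not part of the original deck: the function names are hypothetical, and the WinNT (workstation) value is not on the slide but is the third value this registry entry is known to take. The registry read only works on Windows.

```python
# Map the ProductType registry value to the role it indicates.
# "winnt" (workstation) is an addition; the slide lists only the two server values.
PRODUCT_TYPE_ROLES = {
    "winnt": "workstation",
    "servernt": "member server",
    "lanmannt": "domain controller",
}


def role_from_product_type(value):
    """Return the machine role for a ProductType string (case-insensitive)."""
    return PRODUCT_TYPE_ROLES.get(value.strip().lower(), "unknown")


def read_product_type():
    """Read ProductType from the local registry (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    key_path = r"SYSTEM\CurrentControlSet\Control\ProductOptions"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "ProductType")
    return value
```

Calling role_from_product_type(read_product_type()) before and after the change is a quick way to confirm which role the machine currently reports.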

Tool in Windows .NET

Troubleshooting Tools

Gathering Information

Netdiag.exe

NETDIAG.EXE

/v - verbose – always turn this on.

/l - log – writes netdiag.log to default directory.

/d:DomainName – finds a DC in the specified domain.

/test: - runs only specified tests.

/skip: - skips specified tests.

Can’t execute remotely.

C:\>netdiag /v /l

Netdiag.exe

Domain Controller Discovery

Bindings, IP address, Default Gateway tests

DNS tests

NBTstat and WINS ping

Netstat

Route

Trust

Kerberos

Dcdiag.exe

DCdiag /v

Domain controller functions of netdiag

More domain-specific

FSMO roles

Connectivity

Replications

Domain controller locator

Intersite “health”

Topology integrity

Nltest.exe

/server:servername Sets default server

/dsgetdc:domainname Dsgetdcname API

[ /gc /timeserv /ldap ]

/dclist:domainname Lists DCs in domain

/parentdomain Lists parent domain

/dsgetsite Lists site of server

/dsgetsitecov Lists DC “covering” site

/dcname:domainname Lists PDC for domain

/dcpromo Tests potential success of DCPromo

/whowill:domain user Returns name of DC that will authenticate user

Netdom.exe

/join

/add

/reset

/resetpwd

/query FSMO

/trust

NTDSUtil

• Built-in utility.

• Directly accesses Active Directory.

• Authoritative Restore.

– Can restore an older version of the AD and force it on all DCs to correct variety of problems.

– Entire AD or single tree.

– Can’t restore the schema.

• FSMO Roles.

– List, Transfer, Seize roles.

– Better than UI – can manipulate all roles in forest and all domains from one utility.

NTDSUtil

Metadata Cleanup

– Delete orphaned objects.

– Servers

– Domains

– The UI can and will lie to you! Don’t trust it.

Useful tool for listing contents of the AD

– Sites, domains, servers, FSMO role holders.

– Domains in site.

– Servers in domain, servers in site.

Q216364, Q216498, Q230306

Gpresult.exe

Run on client

Returns:

• Security group membership

• User and Computer policy info

• GPOs applied to each

• Registry settings set in the GPO

• Client-side extensions set

– Scripts applied

Remember

• Policy is cached – reboot / login to clear

• Note which server authenticated the client

– Environment variable LOGONSERVER

Much Improved in .NET!

GPOtool.exe

Run on domain controller.

Returns:

• Analysis of all GPOs in domain.

• GUID and friendly name of all GPOs.

• DS and Sysvol versions.

• Errors encountered.

Good group policy troubleshooting tool.

May take a long time to process (#GPOs)

ADSIedit.exe

GUI much like Users & Computers snap-in /Advanced features.

Graphical view of AD.

Like LDP.exe but:

• Easier to browse.

• Can modify attribute values

Don’t confuse with Users & Computers!

LDP.exe

Takes time to set up:

• Connect

• Bind

• View – Tree

• Enter DN to start (blank for default)

Exposes attributes quickly, easy to see.

Faster than ADSIedit – no GUI to traverse.

LDAP searches.

Can delete and modify, but not as easy as ADSIedit.

Can execute remotely.

DCPromo.log, DCPromoui.log

Located in %systemroot%\debug.

Logged every time dcpromo runs.

DCPromo.log

• Shorter.

• Appended (read bottom up).

DCPromoUI.log and DCPromoUI.xxxx.log

• Results of what is seen in the UI – longer.

• Find: Results of DsGetDcName, DNS query, Time service sync, authentication, replication, Site info.

• Error (0x0) = success – no error.

Error reporting different – read both logs.

Userenv.log

Located: %systemroot%\debug\usermode

User environment info:

• Group policy (registry)

• Client side extensions– Scripts

– Security

Increase verbose logging (Q221833)

Take time – read and study and you may be surprised at what you can find!

Additional User Mode Logs

Client-side extensions

• Registry: see Q216357

– HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\GPExtensions

• Errors created in %windir%\debug\usermode

– Named after the .dll

– Scripts = Gptext.dll = gptext.log

– Folder Redirection = fdeploy.dll = fdeploy.log

– Security = scecli.dll = winlogon.log

– Q245422

– Produced automatically on error (except winlogon.log)

– Check User Mode directory for these files

• Invaluable in debugging. Use them!

Client Side Extensions (registry)

Windows .NET Troubleshooting Tools

Remote Desktop Resource Redirection

Client Resources Available when using Terminal Services Remote Desktop

• File System – Local drives and Network drives on Local Machine available on Remote machine

• Audio – Audio streams such as .wav and .mp3 files can be played through the client sound system.

• Port – Applications have access to the serial and parallel ports

• Printer – The default local or network printer on the client becomes the default-printing device for the Remote Desktop.

• Clipboard – The Remote Desktop and client computer share a clipboard

• Terminal Services Virtual Channel Application Programming Interfaces (APIs) are provided to extend client resource redirection for custom applications.

WMI

Computer management

Active Directory

• Provider: MicrosoftActiveDirectory

• Classes:

– Replication: see replprov.mof in %windir%\system32

Trust health

• Provider: MicrosoftHealthMonitor

• Classes: see system32\wbem\trusthm.mof

DNS

• Provider: MicrosoftDNS

• Classes: system32\wbem\dnsprov.mof

Cluster

• MSCluster

Also look in CIM Studio in MSDN

WMIC Sample Commands

Look in %windir%\system32\wbem *.mof files for names of providers, classes, etc.

Active Directory

• Provider: MicrosoftActiveDirectory

• wmic /namespace:\\root\microsoftactivedirectory PATH msad_replneighbor

(shows replication partners)

• wmic /namespace:\\root\rsop\user PATH RSOP_GPO

(lists GPOs with User settings)

Admin Tool Improvements

Users and Computers snap-in

• Drag and drop.

• Multi-select and edit user objects.

• Heavily revised object picker.

Users and Computers, Sites and Services, DNS Snap-ins

• Saved queries.

• Viewing Saved DS, DNS, FRS eventlogs on non-DCs!

.NET Adminpak (only on XP)

Command Line Tools

GPresult

• Enhanced reporting

DCDiag

• dcdiag /test:DCPromo

Repadmin – enhanced reporting

Netdom – computername for DCrename

Others

Shipped on

• Service Pack 2 CD (install manually)

• .NET Server, AdvSvr CD

Windows .NET Improvement to NTDSUtil

Change Offline, DS Repair Mode Password While Online!

NTDSUtil

• Set DSRM Password (main menu)

Increases server up-time limited by password change interval in Win2K.

• (Had to reboot to DS Repair mode to change.)

• Q223301 (Win2K limit)

Cool error message!

Setting password failed.

WIN32 Error Code: 0x6ba

Error Message: The RPC server is unavailable.

See Microsoft Knowledge Base article Q271641 at

http://support.microsoft.com for more information.

Errors in Windows .NET Kinder, Gentler and Report to Microsoft

Active Directory Load Balancing Tool

Does the job of branch office deployment.

• KCC chooses the BHS for connection objects and always chooses the same one.

• Tool allows you to spread the load to other DCs in the site (that have that NC).

• ADLB tool modifies the Hub DC’s replication schedules to spread it out over time.

• Generates a log – like replmon’s status log.

• For deployments with hundreds of branch offices all replicating to a single hub.

• No benefit to sites with only one DC per domain.

Future: Graphical Replication Monitoring Tool

Very much like ‘Age of Directories’

Ability to make configuration changes

Not in .NET - maybe Longhorn or Blackcomb?

Troubleshooting DNS

DNS Resolver Configuration

Win2K clients, servers point to Win2K DNS Name Server that is SOA for their zone.

• Don’t point to ISP, other Internal NS.

(even as “additional”.)

• Keep it simple.

Win2K Name Servers forward to ISP or internal name server hosting registered domain.

DNS Name Server Configuration Basics

• Dynamic updates = Yes.

• Active Directory Integrated Zone

• Select one “Primary”

– All other ADI Primary NS point to it for DNS

• Win2K Name Servers can:

– Forward to ISP or Internal NS.

– Use root hints (or modify root hints).

• Reverse Lookup Zones NOT required

– Needed only for tools - NSLookup

ADI Primary and Standard Secondary mixed zone

• Only a DC can host an ADI primary zone

• Member Servers can host Secondary zone

– Synch off of an ADI Primary

[Diagram: three ADI Primary name servers and two Secondary name servers]

DNS Case Study

[Diagram: corp.net with child domains na.corp.net, sa.corp.net and eu.corp.net; labels: Zone xfers, Forwarding, Secondary zones]

DNS Case Study

[Diagram: corp.net with child domains na.corp.net, sa.corp.net and eu.corp.net; label: find na.corp.net]

With Conditional Forwarding Feature in Windows .NET Server…

[Diagram: corp.net with child domains na.corp.net, sa.corp.net and eu.corp.net; label: find na.corp.net]

Problem: SRV records only in Root domain

[Diagram: corp.com, w2k.net, NA.w2k.net and EU.w2k.net; legend: Forwarder, Zone Xfer]

Location of SRV:

PDC

GC

Cname

Solution: Delegate _msdcs zone

[Diagram: corp.com (_msdcs, _tcp, _sites, _udp), w2k.net, NA.w2k.net, EU.w2k.net, _msdcs; legend: Forwarder, Delegation]

Location of SRV:

PDC

GC

Cname

DNS Hotfix

Symptom: Replication breaks

Configuration: Using Secondary Zones for root _msdcs at child domains.

Problem: Serial Number of Secondary zone is higher than the primary – zone transfers stop.

Hotfix Q304653 • The Serial Number Is Decremented in DNS When You Reboot

• Solved in .Net

DNS Troubleshooting Basics

• Check DNS event log (and others).

• Check Location of DNS servers.

– Usually want Name Server in remote sites.

• Check population of SRV records.

– _msdcs; _tcp; _udp; _sites

– Need Kerberos, LDAP records for each DC.

– Correct address, etc.

– Can delete, repopulate by restarting netlogon.

• Check Delegations – correct names, IP.
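When checking SRV record population, it helps to have the expected record names written out. The sketch below builds a few of the standard DC locator names for a domain and site; the function name and the site name in the usage note are hypothetical, and the list is a partial illustration rather than everything Netlogon registers.

```python
from typing import List


def expected_srv_names(domain: str, site: str) -> List[str]:
    """Return a partial list of SRV owner names a DC is expected to register."""
    d = domain.strip(".").lower()
    return [
        f"_ldap._tcp.{d}",                # LDAP, domain-wide
        f"_kerberos._tcp.{d}",            # Kerberos KDC over TCP
        f"_kerberos._udp.{d}",            # Kerberos KDC over UDP
        f"_ldap._tcp.dc._msdcs.{d}",      # DCs for the domain
        f"_kerberos._tcp.dc._msdcs.{d}",  # KDCs for the domain
        f"_ldap._tcp.{site}._sites.{d}",  # site-specific LDAP
        f"_ldap._tcp.pdc._msdcs.{d}",     # PDC emulator
    ]
```

Each name can then be queried with Nslookup (set type=SRV), for example the names returned by expected_srv_names("compaq.com", "Houston").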

DNS Troubleshooting Basics

• Use of Active Directory Integrated (ADI) zones.

– Put standard secondary zones on member servers.

– Can clear problems by switching to Standard Primary.

• Ping DC by SRV record:

– ping <guid>.site._msdcs.compaq.com.

• Clear the server cache.

– Negative Caching problems.

• Test – Server Properties – Monitoring tab.

• Test – Ping names, NSLookup.

Troubleshooting AD Replication

Replication Troubleshooting Tools

Event logs – Directory Services, System

Sites and Services snap-in

Age of Directories (AOD) – HP

Replication Monitor

Aelita Event Admin

NetPro Directory Analyzer

Command Line (Support Tools & Res Kit)

DCdiag, Netdiag

Repadmin.exe

Event Logs for Replication Troubleshooting

Directory Services Log

• 5778 - Subnets not mapped.

– Will break client’s “site awareness.”

• 1311 - serious - Not enough connectivity.

– Connectivity, traffic issue.

– Sites with DCs and no site links.

– Site topology incorrectly defined.

• DNS Lookup failure.

• 1772 – RPC Server is unavailable.

– Physical connectivity.

– DNS.

Event Logs for Replication Troubleshooting

System Log

• Netlogon errors

– Authentication

– Trusts

– Secure channel

• w32Time errors

– Kerberos authentication required for replication

– DCs must be no more than five minutes out of sync.

– Watch time zones!
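The five-minute rule and the time zone warning can be illustrated together. This is a minimal sketch with a hypothetical function name: it compares two timezone-aware timestamps, so two DCs showing different wall-clock times in different zones are compared on the same absolute time.

```python
from datetime import datetime, timedelta, timezone

# Five-minute tolerance, as stated on the slide.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(a, b, max_skew=MAX_SKEW):
    """Return True if two timezone-aware timestamps differ by at most max_skew."""
    if a.tzinfo is None or b.tzinfo is None:
        # Naive timestamps hide exactly the time zone mistakes this check is for.
        raise ValueError("timestamps must be timezone-aware")
    return abs(a - b) <= max_skew


# Example: 09:02 at UTC-5 is 14:02 UTC, two minutes away from 14:00 UTC.
dc1 = datetime(2002, 10, 6, 9, 2, tzinfo=timezone(timedelta(hours=-5)))
dc2 = datetime(2002, 10, 6, 14, 0, tzinfo=timezone.utc)
print(within_kerberos_skew(dc1, dc2))
```

A clock that shows the right local time but is set to the wrong time zone fails this check, which is the trap the slide warns about.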

Sites and Services Snap-in

Check for duplicate connection objects.

• KCC generating >1 connection between 2 DCs.

• Delete all connections and select “check replication topology” option to regenerate them.

• If they come back, find out why.

– Usually a DNS problem.

• Breaks FRS and AD replication.

Sites and Services Snap-in

Check for sites with no DCs…

• OK to have a site with no servers if you plan it that way.

• If there should be a server in that site, find it and move it there.

Make sure all subnets are mapped to correct sites.

• Keep up on IP addressing changes.

Sites and Services Snap-in

Make sure site links are correct.

• Link correct sites per design (need a drawing).

• Cost, schedule, replication frequency.

Force replication between DCs.

• All connections are inbound.

• Use “check replication topology.”

• Create new site, user named for the DC.

– Checks Configuration NC and Domain NC.

– Force Replication Between Replication Partners.

– On DC1 from DC2 and on DC2 from DC1.

Sites and Services Snap-in

• Validate inbound, outbound replication on all DCs.

– Create new site, user named for the DC.

– Checks Configuration NC and Domain NC.

– Wait for replication (don’t force it).

– Check each DC for copy of these users, sites.

[Diagram: DC1, DC2 and DC3, each with a User and Site list showing which of the test users and sites (named DC1, DC2, DC3) have replicated to it]

Check Cname DNS Records

• In root _msdcs zone (only), alias record mapping DC’s FQDN to its server GUID.

Only one record.

– Delete duplicates.

Match GUID in alias record to GUID reported by Repadmin /showreps.

If in doubt, delete DC’s Alias record(s) and re-start netlogon on broken DC to re-register .

Age Of Directories Tool - Demo

If interested, contact me ed.whittingtonn@HP.com

Replication Monitor

Status report (replication health report)

List of all GCs, BHS, Trusts

List of all replication errors on all DCs in domain

Changes not replicated

Replication partners

Force push/pull replication

Meta-data

Group Policy Object status

FSMO validation

Inbound connections (including reason)

Replication Monitor

Command-Line Utilities

RepAdmin

• In Support Tools.

• Perhaps the most useful tool for troubleshooting replication.

• /showreps - lists inbound, outbound connections.

– Only one to list outbound connections.

– Lists Server GUID (used for replication).

– Lists successful replication messages.

– Lists replication errors.

– Lists Replication partner used to replicate every naming context – inbound and outbound.

NTDS Diagnostic Logging

HKLM\System\CurrentControlSet\Services\NTDS\Diagnostics

• Set value = 0-5

– 0 = off 5=very verbose

– Start with 3

– Reported in Event log

• Important Values

1 Knowledge Consistency Checker

13 Name Resolution

5 Replication Events

8 Directory Access

9 Internal Processing

18 Global Catalog
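Setting several diagnostic values by hand is error-prone, so one option is to generate the text of a .reg file and review it before importing. The sketch below is a hedged illustration: the function name is hypothetical, and it assumes the registry value names match the labels on the slide exactly (number, space, name).

```python
# Key from the slide, written out in full.
DIAGNOSTICS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"
)

# Assumption: value names are identical to the slide's labels.
CATEGORIES = [
    "1 Knowledge Consistency Checker",
    "5 Replication Events",
    "8 Directory Access",
    "9 Internal Processing",
    "13 Name Resolution",
    "18 Global Catalog",
]


def build_reg_file(level, categories=CATEGORIES):
    """Return .reg file text that sets each category to the given level (0-5)."""
    if not 0 <= level <= 5:
        raise ValueError("level must be between 0 (off) and 5 (very verbose)")
    lines = ["Windows Registry Editor Version 5.00", "", f"[{DIAGNOSTICS_KEY}]"]
    for name in categories:
        lines.append(f'"{name}"=dword:{level:08x}')
    return "\r\n".join(lines) + "\r\n"


print(build_reg_file(3))
```

Generating a second file with level 0 gives a matching way to turn the logging back off once the problem is found.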

Things that break Replication (or indicate that it’s broken)

Duplicate connection objects

Orphaned objects

• Esp. DC objects, caused by a DC being removed from the domain without successful DCPromo.

• Garbage Collection initiated manually before all DCs and GCs are fully replicated.

• Reported in event logs.

Things that break Replication (or indicate that it’s broken)

DC unavailable

• Down

• Name Resolution

• Network problem

DNS misconfigured

• TCP/IP addresses change

– Delegation

– Client resolver configuration (including name servers)

– DHCP scope configuration for DNS registration

• Failure to Contact a DNS server (for SRV records)

Things that break Replication (or indicate that it’s broken)

KCC doesn’t do its job

• Routes around inaccessible DCs by creating duplicate connection objects.

• When DCs come back on line, KCC should clean up the duplicate connection objects.

– Usually doesn’t…

– Causes replication errors.

– Events in the DS Log.

– Need to clean them up manually.

Lingering Object Behavior

Basics

Scenarios

Object Deletions

Deleted objects turn into tombstones

• Tombstones replicated to other DCs

• This is how replication partners learn that an object was deleted

Tombstones purged from local database after tombstone lifetime has expired

• AD: 60 days, adjustable (2 days minimum)

• Sysvol: 60 days

If tombstone does not replicate to a DC, object deletion is not replicated

• Object not deleted on this DC

• Object is now a Lingering Object

• Can be on DC or GC

Rule: tombstone lifetime =

• Max time DC can be disconnected

• Max lifetime of Backup tape

Lingering Objects – Scenarios

Deleted object re-appears on all domain controllers in a domain and on all GCs

Deleted account does not disappear from Exchange GAL

Object was moved between domains and disconnected GC is brought online

Replication error on GC when new object is created

• Lingering object still holds attribute where uniqueness is enforced (samAccountName)

• Exchange cannot create mailbox because object already exists

Why does this Happen????

DCs disconnected for more than tombstone lifetime

• Left in storage room for long time

• Replication failures

– I.e., bridgehead servers overloaded, no monitoring in place

• WAN connections down for a long time

Tombstone lifetime abuse

– “Somebody” changed time on a DC to garbage collect an object

– Tombstone lifetime was changed to garbage collect objects on single servers

Can this be avoided?

• YES, monitor KCC topology and replication

• Do not set tombstone lifetime to less than 60 days

• DCs offline > tombstone lifetime must be re-promoted

Lingering Objects

Strict vs. Loose Replication Behavior

Replication Behavior

• Defines how DC reacts if an update for an object is replicated in, and the object does not exist on DC

Loose Behavior

• DC requests full copy from replication source

• Logs event ID: 1388

Strict Behavior

• DC stops replication from offending replication source

• Logs error code 8240 (ERROR_DS_NO_SUCH_OBJECT) embedded in event ID 1084

• Requires logging level 1

Behavior can be set via registry key

• HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters\Strict Replication Consistency

• Introduced in Q314282

Deleting Lingering Objects

If found on a DC

• In loose behavior: Delete the object via Users and Computers

• In strict behavior: Follow procedures outlined in Q314282

On GC (in read-only NC)

• Object cannot be changed or deleted on GC

• Solution 1: Delete object on writeable replica (if possible)

• Solution 2: Use ldp to delete the object on the GC

– Support to remove lingering objects from GC added in Q314282

– Follow procedures outlined in Q314282

You might have to set loose behavior temporarily

Best Practice Recommendations

DC has not replicated for more than 60 days

• Tombstone lifetime default (60 days)

– Do not replicate, re-install OS

• Tombstone lifetime adjusted to > 60 days

– 60 days < time DC disconnected < tombstone lifetime

– Re-connect DC, restore sysvol

– Time DC disconnected > tombstone lifetime

– Do not replicate, re-install OS
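The decision rules above can be written as a small function. This is a sketch with a hypothetical function name; the final branch (a DC offline for 60 days or less) is not covered by the slide and is an assumption, as is treating a disconnect exactly equal to the tombstone lifetime as still safe.

```python
# Default AD tombstone lifetime and the Sysvol limit, both 60 days per the slides.
DEFAULT_TOMBSTONE_DAYS = 60


def reconnect_advice(days_disconnected, tombstone_lifetime_days=DEFAULT_TOMBSTONE_DAYS):
    """Return the recommended action for a DC that has been offline."""
    if days_disconnected > tombstone_lifetime_days:
        # Tombstones may already be purged elsewhere: lingering objects are likely.
        return "do not replicate; re-install OS"
    if days_disconnected > DEFAULT_TOMBSTONE_DAYS:
        # Only possible when the tombstone lifetime was raised above 60 days.
        return "re-connect DC; restore sysvol"
    # Assumption: not stated on the slide, treated here as the normal case.
    return "re-connect DC"


print(reconnect_advice(90))        # default lifetime of 60 days
print(reconnect_advice(90, 180))   # lifetime raised to 180 days
```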

If you have to disconnect a DC

• Make sure that it replicates successfully before you take it off-line

New deployments

• Add registry key to enforce strict replication behavior at DC OS installation time

More Best Practice Recommendations

Existing deployments

• Default setting: Loose replication (even on SP3)

• Goal: Get to strict mode asap

• Set registry key to strict mode on all DCs

• Watch event logs on DCs

– If you get many replication errors on single DCs, re-promote DC

– For small number of replication errors, clean-up the DC

– Delete lingering objects if necessary

– Follow procedures outlined in Q314282

• If you were monitoring…

– Then don’t worry, you won’t see any replication errors

Don’t lower tombstone lifetime to less than 60 days

Monitor!

Lingering Object Fix

Q317097 (good instructions)

HKLM\System\CurrentControlSet\Services\NTDS\Parameters…

• Add Value Name = Correct Missing Object

• Data Type =REG_DWORD

• Value = 1 (tight)

0 (loose)

Allows or Restricts AD replication when lingering objects are discovered.

• Tight when you want to know.

• Loose to inventory and remove the objects.

WNT: Object Replication

• change to attribute or value

W2K: Attribute level replication

• Better than NT (more efficient)

• Change to attribute replicates attribute

• Change to value replicates attribute

• Problem: Multi-Valued Attributes

– Group = Attribute

– Member = Value

– Change Member = replicate attribute with all members

– Impacts network traffic

– Limit (per Microsoft) of 5,000 users/group

.NET: Value Level Replication

• Replicates values – not attributes

• Eliminates 5,000 user/group limit
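The difference between attribute-level and value-level replication is easiest to see by counting what has to cross the wire when one member is added to a large group. The sketch below is a deliberately simplified model with a hypothetical function name; real replication also carries metadata that is ignored here.

```python
def replication_payload(before, after, value_level):
    """Return the set of member values that must be replicated for a change.

    before, after: sets of group members before and after the change.
    value_level:   True for value-level (.NET), False for attribute-level (W2K).
    """
    if before == after:
        return set()
    if value_level:
        # Only the values that were added or removed are sent.
        return before ^ after
    # Attribute-level: the whole member attribute is sent again.
    return set(after)


members = {f"user{i}" for i in range(4999)}
changed = members | {"newuser"}
print(len(replication_payload(members, changed, value_level=False)))
print(len(replication_payload(members, changed, value_level=True)))
```

In this model the attribute-level case resends every member of the group for a single addition, which is the network impact the slide describes.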

Value Level Replication

Domain Limit

There is a limit of about 800 child domains to a single parent

Child domains are unlinked, multi-valued attribute – stored in the crossref attribute of the domain object

Jet database limits the data that can be stored. No way to patch – must change Jet

“Might” be improved in Longhorn (not Whistler)

Domain Limit

One customer got to 900 domains

• Replication failed

• Authentication failed

• Mission critical application failed

Temporary Repair

• Demote all domains in reverse order of creation to return to 800

• Fixed Replication

Solution

• Redesigned and redeployed to a single domain

DCPromo Troubleshooting

DCPromo Basics

First Test of:

• DNS registration and resolution.

• LDAP query and response.

• Kerberos authentication.

• Active Directory replication.

• FRS replication.

• Application of group policy.

Validation and Flow …

• Chapter 2, Active Directory Data Storage in the Windows 2000 Resource Kit

DCPromo Logs

%windir%\debug

• Dcpromo.log

• Dcpromoui.log

• Dcpromoui.xxx.log

Set verbosity on dcpromoui.log

• HKLM\Software\Microsoft\Windows\CurrentVersion\AdminDebug

• Values: DCpromo and DCPromoui

• Data

– 380001 = Default

– 0xFF003 – full file and debugger logging output

– 0xFF001 – maximum detail to DCPromoui.log

DCPromo Phases

Initialization

• UI Input - DNS Name resolution

• LDAP Query/resp - Kerberos Authentication

AD Replication

FRS Replication

Wrap Up

• Apply policy - Upgrade Trusts

• Publish new DC in the DS

Initialization Phase

Authorization error

• Enterprise Admin required to create new domain (or to remove the last one).

• Domain Admin required to add replica DC (or demote a replica).

Can’t find DNS with Dynamic Updates.

• Prompt to let DCPromo configure DNS.

– Creating domain.

– Answer NO!

Replicas, Child – must find DNS server to locate a “sourcing DC.”

Errors Creating the Computer Account

Need privileges to create the account.

First creates the account, puts it in domain/computers container.

Then puts it in domain controller’s OU.

Source DC identified in DCPromo logs.

DCPromo Initialization Checklist

Privileges required

• Enterprise Admin if creating new domain.

• Domain Admin if creating a replica.

System time configured properly

• Kerberos requires sync within five minutes.

• All parent, child domain DCs.

Sufficient free disk space.

• ~850 MB

Domain Naming Master FSMO required if creating new domain.

DCPromo Initialization Checklist

Everyone or Enterprise DC group has “Access this computer from network”

Enterprise DC group rights:

• Manage Replication Topology.

• Replicating Directory Changes.

• Replication Synchronization.

Sourcing DC

• Security policy applied.

• Enable Computer and user account to be trusted for delegation.

DCPromo Initialization Checklist

Target DC has valid Kerberos tickets.

• Kerbtray.exe utility from Resource Kit.

GC must be contacted.

• Nltest /dsgetdc:compaq.com /gc

Able to contact a functional existing DC.

• Uses UDP (watch for firewall issues).

– Can use TCP but it’s a Microsoft Secret!

• Use Ping, NLTest, Nslookup to find a DC.

If Source DC not Reachable...

See if one responds.

• Ping FQDN of domain (Ping compaq.com).

• NLTest /dsgetdc:compaq.com /ds

– Other: /gc /pdc /timeserv

• Check Site mapping for this computer.

– Nltest /server:<name> /dsgetsite

Check Dcpromoui.log to see source.

Force DCPromo to use a specific source

• Q224390

• Turn off Netlogon on other DCs.

Join the Server to the domain then DCPromo.

Info to Collect for Debug

Netdiag /v

• Problem DC

• Source DC (see dcpromo.log)

DCDiag /v

• Source DC

Replication working? (other DC in site)

AD & FRS Replication Phases

Initially inbound connection created to replicate from source DC.

• Machine acct (DC1$) moved to DC OU.

– UserAccountControl Attribute set

– 4096 (1000 hex) = Workstation/Server

– 532480 (82000 hex) = DC

– Account is moved.

• Error: DC1$ not found, access denied, etc.

– Credentials of account running Dcpromo

– Source must have computer object.

– Source must have security policy applied to itself.

– Q250874
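The two UserAccountControl values on this slide are bit flags, which explains why the DC value is 0x82000 rather than a simple counter. A minimal decoder, with hypothetical function names, covering only the three flags involved here:

```python
# Subset of the UserAccountControl bit flags relevant to this slide.
UAC_FLAGS = {
    0x1000: "WORKSTATION_TRUST_ACCOUNT",   # workstation or member server
    0x2000: "SERVER_TRUST_ACCOUNT",        # domain controller
    0x80000: "TRUSTED_FOR_DELEGATION",
}


def decode_uac(value):
    """Return the names of the known flags set in a UserAccountControl value."""
    return [name for bit, name in sorted(UAC_FLAGS.items()) if value & bit]


def is_dc_account(value):
    """True if the account is flagged as a domain controller."""
    return bool(value & 0x2000)


print(decode_uac(4096))
print(decode_uac(532480))
```

A machine account that still decodes as a workstation trust account on the source DC after promotion points to the failure described in the error bullet above.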

AD & FRS Replication Phases

After first reboot…

• Outbound connection created.

• AD changes for new DC replicated to source.

– Including UserAccountControl attribute.

– Server (Replication) object.

– Replicated to other DCs.

• Sysvol is populated (policies copied to new DC).

• Sysvol and Netlogon Shares created.

Troubleshooting Missing Sysvol, Netlogon Shares

Outbound connection failed

• Look in Sites and Services or Repadmin

• UserAccountControl still 4096 on source

[Q257338] – Good but …

• Build manual “outbound” connection

• Force KCC to “Check Replication Topology”

• Check UDP traffic if in a remote site.

Missing Sysvol and Netlogon Shares

Create replication “links” manually then force replication:

• Repadmin /add (adds outbound link)

• Repadmin /sync (forces replication)

Can’t create them manually. When Replication is fixed, they’ll get created.

Tracking Down a GUID

Problem: GUID referenced in event log. What is it?

Solution: (Q216359)

• LDP – search for the GUID

• Search.vbs in Support tools

Orphaned Object (will kill replication)

• Turn up NTDS diagnostic logging

– Internal processing

– Replication

• Find object (GUID) in event logs

• Delete it via LDP

DCPromo Improvements in Windows .NET

Install From Media (IFM)

Source Replica AD from Media in DCPromo

• GCs or DCs (Replica only).

• No initial replication from a DC.

– Faster (no searching for a DC).

– Less network impact (No full sync on the WAN).

– Easy branch office installation.

• After initial load, replicates changes.

• Network connectivity still required.

• Unattended Answer File Support:

– ReplicateFromMedia

– ReplicationSourcePath

Install From Media (IFM)

Unattended Answer File Support

• ReplicateFromMedia

• ReplicationSourcePath

Media must be local drive.

Media useful life < 60 days.

How?

Use Backup Files/Media

• Create first DC in domain.

• Back up DC.

• Restore to Media (local disk, CD, …).

• C:\>dcpromo /adv

• Wizard produces an additional screen…

DCPromo Answer File

See Q223757

[Unattended]

Unattendmode=fullunattended

[DCINSTALL]

UserName=administrator

Password=Password3

UserDomain=corp.net

DatabasePath=c:\windows\ntds

LogPath=c:\windows\ntds

SYSVOLPath=c:\windows\sysvol

SafeModeAdminPassword=Password2

CriticalReplicationOnly

SiteName=Seattle

ReplicaOrNewDomain=Replica

ReplicaDomainDNSName=corp.net

ReplicationSourceDC= ; Leave this blank for IFM

ReplicateFromMedia=yes

ReplicationSourcePath=e:\DSrestore

RebootOnSuccess=yes
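An answer file like the one above can be sanity-checked before it is handed to DCPromo. The sketch below uses Python's configparser to check only the IFM rules stated in this deck (media path present, ReplicationSourceDC left blank); the function name is hypothetical and it assumes semicolon comments in the file. It is not a full validator for the format.

```python
import configparser


def check_ifm_answer_file(text):
    """Return a list of problems found in the IFM settings of an answer file."""
    parser = configparser.ConfigParser(
        allow_no_value=True,            # e.g. CriticalReplicationOnly has no value
        inline_comment_prefixes=(";",),
        interpolation=None,             # passwords may contain '%'
    )
    parser.read_string(text)

    # Section names are matched case-insensitively; option names are
    # lower-cased by configparser by default.
    section = next(
        (parser[s] for s in parser.sections() if s.upper() == "DCINSTALL"), None
    )
    if section is None:
        return ["missing [DCINSTALL] section"]

    problems = []
    if (section.get("replicatefrommedia") or "").strip().lower() == "yes":
        if not (section.get("replicationsourcepath") or "").strip():
            problems.append("ReplicationSourcePath is required for IFM")
        if (section.get("replicationsourcedc") or "").strip():
            problems.append("ReplicationSourceDC should be blank for IFM")
    return problems
```

An empty list means the IFM-related entries are consistent with each other; it says nothing about whether the rest of the file is valid.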

File Replication Service (FRS) Basics

FRS Background

File Replication Service

• Replicates file system portion of policy

• Optional replication engine for DFS

Concepts

Challenges

• Journal wraps

• Staging File backlog

• Reconciliation / Morphed Directories

Concepts

Objects in DS

• Members, Subscribers, Conn. objects, filters

• Depends on AD replication

• Determines partners and schedule

NTFS USN Journal

• Used by FRS to track changes to NTFS volumes

Staging File and Directory

• Rename safe

• Compression support

Database

• Record of incoming, outgoing & existing files

File Replication Service (FRS)

Replaces NT 3.X\4.0 LMREPL service

Replicates SYSTEM Policy, Group Policy, DFS

• Group policy templates

• Ntconfig.pol & logon scripts for down-level clients

– NETLOGON Share

• DFS share contents

Multi-threaded replication engine

• Replicate different files to different computers simultaneously.

Terminology

• Computer A and B replicate DFS+SYSVOL

• B is computer A’s outbound partner

• A is B’s inbound partner.

• A is B’s “upstream” partner

• Changes flow “downstream” to B

[Diagram: Computer A (upstream) replicates to Computer B (downstream); labels: A’s Outbound partner, B’s Inbound partner]

Replication

Basic Operation

1. Change created on DC1 (GPO)

2. Temp file moved to staging directory

3. Notify replication partners (replicas) of changes

4. Partners (DC2) pull changes from DC1

File and Folder Filters

Excluded from FRS Replication:

• Computer specific EFS files/folders

• File names beginning with ~

• Files with .bak or .tmp extensions

• NTFS Mount Points

• Reparse points

Configurable for DFS shares

The Replication Process

[Diagram: GPO on DC1; paths shown: \winnt\sysvol\sysvol\compaq.com\policies, \winnt\sysvol\staging\domain, \winnt\sysvol\staging areas\compaq.com; labels: Notify Partners, AD Object version updated]

The Replication Process

[Diagram: DC2 pulls from DC1; paths shown: \winnt\sysvol\sysvol\compaq.com\policies (GPT.ini), \winnt\sysvol\sysvol\DO_NOT_REMOVE_ntfrs_PreInstall_Domain; label: Sysvol version of GPO updated]

FRS Replication

Observe File Replication Process

• Edit a group policy – modify and save it.

• Copy of changed file goes to staging and staging areas directories.

• Copied to staging/staging areas directories on other DCs.

• Moved to sysvol\sysvol directory on the DC.

• Group policy file is updated.

Distributed File System (DFS)

DFS Basics

Domain-based (Win2K) vs Standalone (NT)

Root

• Must be on a DC.

• Contains PKT.

• DFS service.

Replica

• PKT from DC, stored locally.

• DC or Member Server.

FRS Replicates Data between DCs

• Member servers DFS replicate data to share via DFS service.

Site Aware (clients locate “closest” DFS Replica)

The DFS Replication Process

[Diagram: DC1 (Root) with replicas DC2, SVR1 and SVR2; data flows labeled FRS and DFS service]

DFS Troubleshooting

Symptom: Shared folders not in sync.

Make Sure DFS service is started on all servers and DCs.

Make sure AD Replication is working.

Make sure FRS is working.

DFSUtil.exe.

Watch for applications that keep files open.

• Anti-virus.

• Defragmenters.

FRS Troubleshooting Techniques

Basics

Remember…

• You MUST install latest service pack and hot fix.

– Post SP2 (SP3) Hot fix Q307319

– Don’t go any further until this is installed.

• “Multi-master” replication spreads changes (and problems) quickly. Turn off the FRS service to get control.

• FRS depends on AD Replication, which depends on DNS.

Diagnostic Tools

Event Viewer: FRS log, DS log

NTFRSutl.exe

• outlog – outbound logs

• inlog – inbound logs

• ds – directory service

NTFRSxxx.log in \winnt\debug

NTFRS Health Check utility

• HP, Microsoft

Netdiag, DCDiag

AD replication tools

FRS Replication

What happens if it breaks?

• Changes not replicated to all DCs, resulting in inconsistent AD

• Group policy gets out of sync and may not get applied.

– GPOTool: Version mismatch

• Logon scripts don’t get applied.

• DFS shares out of sync.

FRS Replication

How to tell if it’s broken

• Events in FRS log

– Event 1000, 1001 in app log every five minutes.

• Files backed up in staging areas

– Get size of staging directories (MB).

– Get date of oldest file (how long it has been broken).

• Group Policy not applied (new changes)

Ensure DNS is working.

• DNS Lookup Failures in events (description).

• Ping, Nslookup to resolve names.

– Domain name

– DC, Server names

Ensure AD Replication is working.

• Create New Objects and see if they replicate.

• Repadmin /showreps and /showconn

• DS Event Log

• DCDiag

Replication Problems

Staging Areas should have no files

• Common FRS problem.

• Check size of dir, date of files.
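
The size and date check above can be scripted. The sketch below is a minimal example, assuming it runs on the DC itself with read access to the staging directory. The function name `staging_backlog` is hypothetical, and the path passed in must be the staging path used in your environment.

```python
import os
from datetime import datetime


def staging_backlog(staging_dir):
    """Return (file_count, total_bytes, oldest_mtime) for files under staging_dir.

    oldest_mtime is a datetime, or None when the directory holds no files.
    """
    count = 0
    total = 0
    oldest = None
    for root, _dirs, files in os.walk(staging_dir):
        for name in files:
            info = os.stat(os.path.join(root, name))
            count += 1
            total += info.st_size
            if oldest is None or info.st_mtime < oldest:
                oldest = info.st_mtime
    oldest_dt = datetime.fromtimestamp(oldest) if oldest is not None else None
    return count, total, oldest_dt
```

A non-zero count with an old `oldest_mtime` suggests how long replication has been stalled.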

Ensure FRS is working.

• Create text file on each DC, named for the DC.

• Put it in \winnt\sysvol\sysvol\<domain name>.

• All DCs should have a copy of all DCs’ text files.
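
Once the test files have been created, the comparison can be automated. The sketch below assumes you have already collected, per DC, the set of test file names found in its SYSVOL domain folder. The `<DC>.txt` naming convention and the function name `missing_canaries` are assumptions made for this example.

```python
def missing_canaries(files_seen_by_dc):
    """Map each DC to the test files it is missing.

    files_seen_by_dc: dict of DC name -> set of test file names found in
    that DC's SYSVOL domain folder. Each DC is expected to hold one file
    per DC, named "<DC>.txt" (a naming convention assumed here).
    """
    expected = {dc + ".txt" for dc in files_seen_by_dc}
    return {
        dc: sorted(expected - set(seen))
        for dc, seen in files_seen_by_dc.items()
        if expected - set(seen)
    }
```

An empty result means every DC holds every file. A DC listed with missing files is not receiving inbound changes from the DCs named in those files.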

Replication Problems

FRS Event Log

• 13508 – Normal…but watch them

• 13509 – success after having 13508s

• 13514 – Sysvol share not yet created: “FRS preventing computer from becoming a DC”

• 13553, 13554 – FRS successfully added computer to replica set (DCPromo successful)

• 13557 – Duplicate connection objects

• 13522 – Staging area full (Q264822)

• Lots of KB Articles: Search for “FRS and Event”

Replication Problems

\WINNT\DEBUG

Identify errors, warning messages and milestone events in the log files

Very difficult to interpret

Interpreting the Logs NTFRS_000x.log

NTFRSutl.exe

Ntfrsutl inlog = Lists inbound log

Ntfrsutl outlog = Lists outbound log

Ntfrsutl sets = Lists replica sets

Ntfrsutl DS = FRS’s view of the DS

Can execute remotely:

Ntfrsutl sets DC1

Group Policy Troubleshooting

Group Policy Troubleshooting Basics

Policy isn’t getting applied

• Set something easy – Admin Templates

– User Settings: Log off/on

– Computer Settings: Reboot

• Client-side extensions act as separate policies – debug separately from Admin Templates

– Folder Redirection

– Scripts

– Disk Quotas

– Security

– IE Branding

– EFS Recovery

– IPSec

– Application Management

Group Policy Troubleshooting Basics

Policy applied, but settings not effective.

• Userenv.log (verbose) Q221833

• Set Diagnostic logging Q186454

HKLM\Software\Microsoft\Windows NT\CurrentVersion\Diagnostics

Value: RunDiagnosticLoggingGroupPolicy

Value Type: REG_DWORD

Value Data: 3 (range 0-5; 0 = off)

– Change One setting in GPO

– Logoff/on or reboot

– Verbose info in Application log

– Lists all registry settings applied to user

– Turn it off afterward – fills the event log fast!

Gpresult.exe

Resource Kit command-line utility.

Reports applied policy for user, computer.

• DN

• Security groups

Verbose mode – gpresult /v

• Registry settings

• Computer: Client-side extensions.

WATCH:

• Logon server.

• Cached policy on client may mask solution.

• Refresh Policy – make sure it’s applied.

GPOtool

Resource Kit command-line utility.

Run on DC only.

• Version Comparison: AD vs. Sysvol.

– AD version set immediately on change.

– Sysvol version set after FRS Replication.

• Friendly name / GUID association

Policy {08FAB736-9628-41D5-B5A8-37A0F98D7E43}

Policy OK

Details:

------------------------------------------------------------

DC: Qtest-DC2.qtest.cpqcorp.net

Friendly name: Folder Redirection Policy

Solving Version Mismatch

Small mismatch is normal.

• After change until FRS Replication completes.

• Be patient – see if it resolves.

Big mismatch is bad.

• Prevents application of policy.

• Unreplicated changes.

• Manually set FRS version = AD version.

– %windir%\sysvol\sysvol\<domain>\policies\{guid}\gpt.ini

– Will lose changes.
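
To judge whether a mismatch is small or big, it helps to decode the version number. The sketch below assumes gpt.ini carries a `Version` value in a `[General]` section, and that the upper 16 bits count user-side changes while the lower 16 bits count computer-side changes. All function names are hypothetical.

```python
import configparser


def read_gpt_version(gpt_ini_text):
    """Return the Version value from the [General] section of gpt.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(gpt_ini_text)
    return parser.getint("General", "Version")


def split_version(version):
    """Split a GPO version number into (user_version, computer_version)."""
    return version >> 16, version & 0xFFFF


def version_gap(ad_version, sysvol_version):
    """Return per-half differences (AD minus Sysvol) as (user, computer)."""
    ad_user, ad_comp = split_version(ad_version)
    sv_user, sv_comp = split_version(sysvol_version)
    return ad_user - sv_user, ad_comp - sv_comp
```

A gap of one or two shortly after an edit is the "small mismatch" case. A large or persistent gap points at FRS.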

Resetting Default Domain Policy or Default DC Policy

These policies always have the same GUIDs.

• Default Domain: {31B2F340-016D-11D2-945F-00C04FB984F9}

• Default DC: {6AC1786C-016F-11D2-945F-00C04FB984F9}

Changes are a mess – need to restore default.

To restore security defaults only, import the BasicDC.inf template (Q258595).

If settings are hosed, copy an original copy of the policy to winnt\sysvol\sysvol\<domain>\policies.

• Copying policies only supported for these two cases.

• Other policies will have different GUIDs.

• Can’t copy other policies from one forest to another for debug.

How to copy the Default Domain and Default DC policy

1. Get a copy of a clean, default policy folder.

– Restore the policy folder (GUID) from backup.

– Create new domain and copy the GUID folder from that machine.

– Don’t zip it.

2. Delete existing policy.

3. Wait for replication.

4. Copy new policy folder to winnt\sysvol\sysvol\<domain>\policies.

5. Wait for replication.

6. Run GPOtool to make sure it shows up on all DCs.

Unable to Edit Group Policy

Group policy changed on PDC by default.

If PDC is not available.

• Dialog: Change on any DC, current DC or not.

• Error: Unable to contact Domain (no DC).

Solution: Transfer or seize the PDC role to another DC.

Can set policy to NOT use PDC …. Don’t!

Using Userenv.log to solve Group Policy problems

Turn on Verbose Logging Q221833

Interpreting group policy information in Userenv.log

Debugging Logon Scripts (script doesn’t apply)

Configure it via group policy snap-in.

Make sure policy is applied.

• Set a desktop setting.

• Use Gpresult /v.

• Enable verbose logging for Userenv.log.

Turn on “Run logon scripts visible.”

Create simple logon script as a .bat file to make sure it’s not the script failing.

Example: Using Userenv.log to find script errors.

Can’t find FSMO Role Holder

Problem: Operation trying to contact a FSMO role holder – PDC Emulator or…?

• Can ping by name – seems to be ok

• Operation can’t find it

Solution:

• Find out who has that role:

netdom query fsmo

(returns a quick list)

• Transfer the role to a local DC

Group Policy Refresh Anomaly

Users complain of a 5-25 second “hang” intermittently in any application – Outlook, Word, 3rd party apps. Keystrokes are buffered and they can continue to work

Noticed direct correlation between the 1704 events (GP Refresh) and the “hang”.

Change refresh interval via group policy and the frequency of the “hang” changed.

Group Policy Refresh Anomaly

Cause: SceCli applies group policy every 16 hrs (default) if no gpo changes have occurred. (DCs are every 5 minutes)

• Broadcasts WM_SETTINGCHANGE to all top level windows

• Wakes up sleeping processes causing massive paging in/out of memory – causing hangs

• More pronounced on “slower” computers

Solution: Configure Policy Refresh Interval in Group Policy so refresh occurs every 12 hrs at midnight/noon so users don’t notice it.

Account Lockout

Background

Finding locked out user accounts

Client Bugs and Fixes

Server Bugs and Fixes

Resolution and Futures

Lockout Reasons & Options

Prevent spoofing or hijacking account

Optional event logging in Audit Policy

Account Lockout Options

• Timed lockout

– Account enabled after admin defined time

• Hard lockout

– Account disabled until reset by admin

• Lockout policy defined in group policy

– Single lockout and password policy per domain

– Location: default domain policy

Account Lockout on DC’s

Each DC records # of bad password attempts

BDC check PDC for latest password

All Bad password attempts seen by PDC

• PDC always 1st to lock out account

• PDC urgently replicates lockout when threshold reached

• Bad password attempts not replicated by DC

BadPasswordCount reset to 0 on 1st good password

PDC chaining operations

If BDC fails authentication with:

• STATUS_WRONG_PASSWORD

• STATUS_PASSWORD_EXPIRED

• STATUS_PASSWORD_MUST_CHANGE

• STATUS_ACCOUNT_LOCKED_OUT

• Referred to as “BadPasswordStatus”

BDC chains authentication to PDC

• Return status from PDC if status = success or listed above

• Otherwise, ignore PDC status and use local status

Exception to PDC chaining

• AvoidPDCOnWan enabled and PDC in remote site (Q225511)

• 10 “BadPasswordStatus” events logged in 10 minutes

– NegativeCache enhancement Q263821

– Cache reset after good password entered
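
The chaining rule on this slide can be written as a small decision function. This is a sketch of the logic as described here, not the actual Netlogon implementation. The function name, the `ask_pdc` callback and the `avoid_pdc` flag are hypothetical.

```python
# Statuses that the slide groups together as "BadPasswordStatus".
BAD_PASSWORD_STATUSES = frozenset({
    "STATUS_WRONG_PASSWORD",
    "STATUS_PASSWORD_EXPIRED",
    "STATUS_PASSWORD_MUST_CHANGE",
    "STATUS_ACCOUNT_LOCKED_OUT",
})


def final_status(local_status, ask_pdc, avoid_pdc=False):
    """Return the status a BDC reports for one logon attempt.

    local_status: status from the BDC's own check.
    ask_pdc: zero-argument callable returning the PDC's status (hypothetical hook).
    avoid_pdc: True when an exception to chaining applies (for example
               AvoidPDCOnWan with the PDC in a remote site).
    """
    # No chaining unless the local failure is a bad-password style status.
    if local_status not in BAD_PASSWORD_STATUSES or avoid_pdc:
        return local_status
    pdc_status = ask_pdc()
    # The PDC's answer is used only for success or a bad-password status.
    if pdc_status == "STATUS_SUCCESS" or pdc_status in BAD_PASSWORD_STATUSES:
        return pdc_status
    return local_status
```

This shows why the PDC sees every bad password attempt unless one of the exceptions applies.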

Troubleshooting account lockouts

Your goal: Answer the 4 W’s

• Who, Where, When and Why

Environment setup

• Enable Auditing in domain policy

– Account Logon Events – Failure

– Account Management – Success

– Logon Events – Failure

– Security Event log on DC’s: 10K events + over-write

• Enable netlogon logging (ntlm clients)

– NLTEST /DBFLAG:2080FFFF (no reboot)

• Enable Kerberos Logging

– Q262177: Kerberos logging (kerb clients)

Account Lockout – Where

DC Resources

• NTLM Clients

– Search DC & CLIENT NETLOGON.LOG for lockouts

– 0xC000006A = bad password

– 0xC0000234 = account lockout

• NTLM + Kerberos Clients

– Search DS Event Logs

– Q230254, Q299475, Q273499 and Q301677 for description

– 644: NTLM + Kerberos Lockout Event

– 675: Kerberos bad password

– 681: NTLM bad password

– 529: Failed logon

– 531: Account disabled
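
Searching NETLOGON.LOG for the two status codes is easy to script. The sketch below makes no assumption about the log line layout. It only looks for the hexadecimal codes named on this slide anywhere in a line. The function and constant names are hypothetical.

```python
import re
from collections import Counter

# Status codes from the slide, keyed in upper case for lookup.
STATUS_MEANINGS = {
    "0XC000006A": "bad password",
    "0XC0000234": "account lockout",
}

_STATUS_RE = re.compile(r"0x[0-9a-f]{8}", re.IGNORECASE)


def count_lockout_statuses(lines):
    """Count lines that mention each status code of interest."""
    counts = Counter()
    for line in lines:
        # set() so a code repeated within one line is counted once.
        for code in set(_STATUS_RE.findall(line)):
            meaning = STATUS_MEANINGS.get(code.upper())
            if meaning:
                counts[meaning] += 1
    return counts
```

Pass it an open file object, for example `count_lockout_statuses(open(path, errors="replace"))`, where `path` is wherever your copy of the log is stored.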

Tools

• EVENTCOMB

• AL.EXE

• NETMON.EXE

EVENTCOMB

AL.EXE

Account Lockout: Why

Attack, “Pilot Error” or Bug

• Wrong password entered, mis-configured service account

Scenario

• Account type: user, computer or service account

• Lockout trigger? (logon, drive access, following p/w change)

Drill Down: Look at TOD, pattern & frequency

• Process related lockouts

– Structured pattern

– Logged when users not present

– Look for: common services, applications, client configuration

• User related lockouts

– Random pattern

– Fewer events logged

– Look at: shortcuts, mapped drives, logon scripts, applications

Account Lockout – Client

Win9X

• Q278558: Access denied to a mapped drive after disconnect

• Q272594: Client can't log on after log off w/o reboot

• Q293793: VREDIR loses file tracking structures

• Q271496: One unsuccessful logon attempt triggers lockout (1:3)

– Net use + dsgetdc + logon attempt.

• Q266772: Logon fails if Unicode string password to NTLM SSPI

DS Client on Win95, Windows 98, 98 Second Ed

• DSCLIENT MUST be installed before any hotfixes!

– Q301344, Q283261

– DS Client lets WIN98 account lockout fixes work on Win95

Win2K

• Q275508: User locked when accessing home dir after changing p/w

• Hotfix or SP2

Windows XP

• None

Account Lockout: Server Fixes

Read server side KB articles

• Q287639: Win9x Clients Locked Out after unlock

– MSV1 package does password check against BDC with old password during 2nd phase of logon

• Q278299: Bad p/w count not reset to 0 (ntlm)

– Original hotfix had regression. Confirm latest version deployed.

• Q263821: Bad p/w count not reset to 0 (kerb)

• Q292573: DSA.MSC and ADSI may not use same DC to WinSERaid:16662 (post SP2 hotfix)

Resolution

• Windows 2000 DC’s: Install SP2 + Q314282

– Same QFE as lingering object and other good DC fixes

• Service Pack 3

PDC FSMO Load Reduction

Windows 2000 domains are much larger than their NT 4 predecessors

• i.e. > 50,000 clients

NT 4 and WIN9X clients still deployed and target PDC only for updates

Windows 2000 / XP clients use Windows 2000 DCs in mixed mode domains (Q284937)

Older applications select PDC only rather than any DC

Applications may enumerate whole domain (NT 4 usrmgr, srvmgr)

Result: PDC gets more load

Symptoms of Overload

High CPU utilization for long period

• Greater than 70%

• High average disk queue

– Disk queue > number spindles

• Timeout of requests

– Password changes
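
The two numeric thresholds above can be turned into a simple check over averaged performance counter samples. This sketch assumes you already have the averages. The function name and parameter names are hypothetical, and request timeouts are not modeled.

```python
def overload_symptoms(avg_cpu_percent, avg_disk_queue, spindles):
    """Return a list of overload symptoms suggested by the slide thresholds."""
    symptoms = []
    if avg_cpu_percent > 70:
        symptoms.append("sustained CPU above 70%")
    if avg_disk_queue > spindles:
        symptoms.append("disk queue exceeds spindle count")
    return symptoms
```

An empty list does not prove the PDC is healthy. It only means neither threshold was crossed in the sampled period.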

Steps to Optimize PDC

Optimize hardware and software

Hide PDC from DNS clients

Implement WINS optimizations

Block down-level enumeration

PDC in dummy site

Optimize Hardware & Software

Run Windows 2000 Advanced Server with /3GB switch

• Enables ESE cache of 1.5 GB

4 Processor Server is optimal

2 GB RAM

Disk

• RAID 1 set for OS and Page File

• RAID 1 set for Log Files

• RAID 0+1 for NTDS.DIT and sysvol

Run only core DC services

Hiding Techniques (DNS)

Lower PDC SRV Priority

• Reduce chance of DS aware clients selecting PDC before other DCs

• HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\LdapSrvPriority=1000

• Data type: Reg_DWORD

PDC only Site

• Clients will use it only as last resort

• Create a site-link to real site

Disable AutoSiteCoverage on PDC

• HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\AutoSiteCoverage=0

Hiding Techniques (WINS)

Down-level clients locate DCs through 1C queries

WINS always adds PDC first in 1C list

Remove PDC from top of list (SP2) Q269424

– HKLM\System\CCS\Services\WINS\Parameters

– Value name: Add1Bto1CQueries

– Data type: Reg_DWORD

– Value data: 0 = disabled, 1 = Enabled (default)

Randomize 1C list for general load balancing

– HKLM\System\CCS\Services\WINS\Parameters

– Value name: Randomize1cList

– Data type: Reg_DWORD

– Value data: 0 = disabled, 1 = Enabled

– Q231305 (NT4 SP4 and later)

Block Enumeration

Old (non DS enabled) applications often call SAM APIs to enumerate entire domain

Hard to control

Block unauthorized users from seeing more than 100 objects per call

• New access control right determines access

• HKLM\System\CCS\Control\Lsa\SamDoExtendedEnumerationAccessCheck=1

• Q268339 

Misc. – Server Applications

Server based applications can create frequent changes in the directory

• Agent based systems

– Create and delete accounts

– Grant accounts rights in the domain

Changes create replication

• AD replication for frequent group changes

• FRS changes for policy changes

Apply SMS hot fixes

• Q311127, Q278345

• Read articles, configuration necessary

Distributed Link Tracking

Purpose

• Used to track moves of linked files across volumes and servers (shell shortcuts)

• Uses AD objects to track files and volumes

Objects stored in DS

• linkTrackVolEntry object for each NTFS volume in the domain

• linkTrackOMTEntry created for each linked item that is moved

• Clients query service when a shell shortcut or OLE link can’t be resolved

Clients refresh links every 30 days

DCs scavenge objects older than 90 days

Distributed Link Tracking

DLT is an optional service

• Enabled by default

Typically not included in DS capacity planning

Best Practices

• Disable on all DCs

– Reduces AD replication traffic

– Reduces AD database size

• Use Group Policy to disable DLT server service on DCs

• Remove objects from DS

– Use staggered approach

• Q312403

DC/GC Promotion Consideration

DC Promotion / Demotion

Process to cleanup after failed promotion

GC Promotion

GC Demotion

DC Promotion / Demotion

Create proper sites beforehand

Failed promotion or removing server

• Manually clean out metadata from any failed attempt

– When replacing a failed DC

– When a DCPROMO has failed

– To clean metadata, use NTDSUTIL

– Also remove FRS member / subscriber objects

– Also remove machine account in domain

• Allow replication to all DCs before promoting again

GC Promotion

First GC in site may go online before all partitions are replicated

• Default: GC will advertise after all partitions in site replicate

• Exchange may use GC before ready

• Mail may bounce

Best Practice

• Stop Netlogon

• Mark DC as GC

• Use repadmin to monitor success

• Start Netlogon once all NCs have replicated

SP3 will wait for all partitions to replicate before advertising

GC Demotion

GC removal requires time for object removal

The KCC removes 500 objects per default 15 min cycle

Best Practice

• Monitor for event 1069 to record progress

• Forced GC removal when needed (Q297935)

– Remove each partition with repadmin

– repadmin /delete DC=globalit,DC=unity,DC=com %destgc% /nosource
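
The removal rate above gives a rough way to estimate how long a GC demotion will take. The sketch assumes a steady 500 objects per 15-minute cycle, as stated on this slide. The function name is hypothetical, and real timings depend on load.

```python
import math


def gc_removal_minutes(object_count, per_cycle=500, cycle_minutes=15):
    """Estimate minutes needed to remove object_count objects."""
    cycles = math.ceil(object_count / per_cycle)
    return cycles * cycle_minutes
```

For example, 100,000 objects need 200 cycles, or 3,000 minutes (50 hours), which is why forced removal is sometimes needed.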

Container Inheritable ACEs

ACE that applies to either all objects or objects of a specific class in a container

• Example: Delegate right to reset user passwords in one OU

Security Descriptor propagation copies ACE to all objects

• Makes access check very fast

– All information is on directory object

• Also class specific ACEs are copied to all objects

– Example: ACE used to delegate right to reset user passwords also copied to computer and container objects

Increases object size – database size

• Increase proportional to size of subtree

– If set on domain root: Highest impact

– If set on OU: Lower impact (depends on number of objects in OU)

• Low impact if set on schema or configuration container

SD propagation is asynchronous

• Takes time to propagate (e.g., 3 hours in 50,000 user domain)

Container Inheritable ACEs

Best Practices

Don’t add container inheritable ACEs to domain root

Add on OUs as appropriate

• Best Practice Documentation recommends OUs for

– Users

– Groups

– Computers

• Container inheritable ACEs on these OUs have small impact only

Watch SD propagator events

• SD propagation running: 1257 (Level 2)

• SD propagation report (objects touched): 1258 (Level 2)

• SD propagation terminated abnormally: 1262 (Level 0)

Always leave sufficient disk space on database partition

• 20% of database size, at least 500 MB

• Monitor!
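
The free-space guideline above is easy to monitor with a small check. The sketch reads the rule as "free space of at least 20% of the database size, and never less than 500 MB". That reading and the function names are assumptions of this example.

```python
def required_free_mb(database_mb):
    """Free space to keep on the database partition, in MB."""
    # 20% of the database size, with a 500 MB floor.
    return max(database_mb / 5, 500.0)


def has_headroom(database_mb, free_mb):
    """Return True if free_mb satisfies the guideline for this database size."""
    return free_mb >= required_free_mb(database_mb)
```

For a 10,000 MB NTDS.DIT this asks for 2,000 MB free. For a 1,000 MB database the 500 MB floor applies.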

Test ACL changes in lab or pilot domain to bracket size increase

Container Inheritable ACEs

The Future

Windows .NET will have single-instance store for Security Descriptors

• Objects have links to security descriptors

• If container inheritable ACE changes, only one SD changes

– No impact on disk size

Does not require .NET only forest

• SD propagation happens on local DC

• Transparent to other DCs

• Feature available immediately

Monitor SD prop events after upgrading a DC

• SD propagator will build single instance store after the domain controller boots .NET for the first time

Database will shrink after OS upgrade

• Need to off-line defrag database to see changes

Forest Recovery

Imagine the unthinkable

• All domain controllers crash and won’t reboot

• Data corruption replicates through the forest

• Schema becomes unavailable

• Somebody made changes to the schema that prevent standard applications from installing

• Malicious administrator performs irreparable damage to the schema that replicates through the forest

• You lose your root domain

• You win the lottery

So far, this has never happened

• But you want to be prepared

Forest Recovery

Rolling back in time

Diagram: a timeline of changes with periodic backups. A catastrophic event occurs, the root cause is identified, and a restore from an earlier backup loses the changes made after that backup.

Forest Business Recovery

High Level Steps

Shutdown all domain controllers in forest

In each domain

• Restore one DC from good backup tape

• Re-install OS on all other domain controllers

• Re-promote all other domain controllers

Start with root domain first

Forest Recovery

Shutdown all DCs

Restore one DC per domain (off-network)

Break replication

Seize FSMO roles

Disable GC service

Increase RID by 100,000

Bring restored DCs back on the network

Enable GC on at least one root DC

Forest Recovery

Re-install OS on all other DCs

Promote all other DCs

Enable GC service as needed

Move FSMOs as needed

Forest Recovery

Detailed steps available very soon in white paper on microsoft.com

• Best Practice for Recovering your Active Directory Forest

FRS Concepts revisited

Objects in DS

• Members, Subscribers, Conn. objects, filters

• Depends on AD replication

• Determines partners and schedule

NTFS USN Journal

• Used by FRS to track changes to NTFS volumes

Staging File and Directory

• Rename safe

• Compression support

Database

• Record of incoming, outgoing & existing files

FRS Replication Operation

Diagram: flow of a change from the originating NTFS drive to a replica.

• Create / modify file on NTFS drive

• FRS learns of file changes from the NTFS “USN change journal”

• Age cache waits 3 s; filter out unwanted files

• Write entry in FRS ID table; write to inbound and ID log

• Build staging file; write OB log

• Send change order to partner

• Partner requests change

• Replica copies file to staging dir; writes to OB log for other replicas

• Copy file into pre-install area

• Rename + move file to final location

Journal Wraps / Staging backlog

NTFS USN Journal is a fixed-size log of file changes

• FRS Service must run to keep up with these changes

• Last ∆ in FRS DB must exist in NTFS journal

– If not, FRS cannot know all changes. Called ‘journal wrap’

• Resolution

– Keep Service running (especially during bulk modifications)

– Increase size of USN journal (automatic in SP3 rollup)

Staging File backlog

• Before SP3, staging files stored until all direct partners receive the staged files

– Associated with connections

• Common causes of backlogs:

– Offline downstream partners

– Full SYNCS by Administrators or applications

– Antivirus, Disk Optimizers, File system policy

• Sharing violations / Move-In problems

Reconciliation & Morphed Directories

Files: Last-writer wins

• All change orders have event times (UTC)

• Event time of CO compared to ID Table

– Event times differ by more than 30 minutes: last writer wins

– Event times within 30 minutes: highest version wins
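
The file reconciliation rule can be sketched as a comparison function. This example assumes "event time > 30 minutes" means the two event times differ by more than 30 minutes. It also assumes that a tie on version within the window keeps the existing file. The function name is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def incoming_wins(incoming_time, incoming_version, existing_time, existing_version):
    """Return True if the incoming change order should replace the existing file."""
    if abs(incoming_time - existing_time) > WINDOW:
        # Times are far apart: last writer wins.
        return incoming_time > existing_time
    # Times are close: highest version wins (a tie keeps the existing file).
    return incoming_version > existing_version
```

Because event times are compared across machines, clock skew between DCs can change which side wins.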

Folders: Last-writer wins

• Conflicting change gets morphed name

– Preserves files associated with directory

– First-writer wins for name conflicts of folders

• Causes

– BURFLAGS abuse

– Conflicting creates on replication failure

FRS Enhancements (Q319473)

QFE roll-up of coming Service Pack 3 changes

Increases NTFS USN journal: 128 MB

Dynamic staging file relocation

LRU staging files deleted: 60 / 90 rule

Staging files for offline partners deleted

SYSTEM = Full Control / NTFS bug

Duplicate changes not sent on wire + event

Office XP (Excel) data deletion fix

Topology Enhancements

DFSGUI from .NET Server

• Runs on XP clients in Windows 2000 domains

• Available on microsoft.com now: Q304718

New topology options

• Full Mesh, Ring, Simple Hub & Spoke

• Custom Topologies

• Connection Tuning

– Enable / disable individual connections

– Change orders are associated with connections

– Disabling connections deletes associated backlog

Connection Priority (may pull this)

• Bit on options attribute of connection object

• Defines partners used during initial / recovery sync

– High: “Must” source all connections in class

– Medium: Source from at least 1 connection in class

– Low: “best effort” sync

FRS best practices

Run Q307319 + new NTFS.SYS

Keep service running

• Avoids journal wraps

Join empty replica sets

Don’t place DFS targets on OS partition

DFS: enable replication on child links

• Targets can be taken offline

• Incremental sourcing & advertisement of data

• Replica set specific burflags

Properly size staging dir

• 128 largest files + 50% or 650 MB minimum

Don’t delete files from staging directory

• Change orders, # of VV joins, file size
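
The staging directory sizing rule can be worked out from a list of file sizes in the replica tree. The sketch reads the rule as "sum of the 128 largest files, plus 50%, with a 650 MB floor". That reading is an assumption, since the slide is terse. The function name and parameters are hypothetical.

```python
def recommended_staging_mb(file_sizes_mb, top_n=128, margin=0.5, floor_mb=650.0):
    """Suggest a staging directory size in MB from per-file sizes in MB."""
    largest = sorted(file_sizes_mb, reverse=True)[:top_n]
    # Sum of the largest files plus the margin, never below the floor.
    return max(sum(largest) * (1 + margin), floor_mb)
```

With 200 files of 10 MB each, the 128 largest total 1,280 MB, so the suggestion is 1,920 MB. A tree of only small files falls back to the 650 MB floor.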

FRS best practices

Topology management

• No full mesh

• SYSVOL: requires 1 in / outbound CO

Forceful deletion of FRS members

• Delete member and subscriber objects

Tools

NTFRSUTL

• NTFRSUTL DS

– Repadmin /showconn for FRS

– DS Object inventory + topology review

• NTFRSUTL SETS

– Repadmin /showreps for FRS

– Sync status of downstream partners

• NTFRSUTL INLOG | OUTLOG | IDTABLE

– Inbound + outbound changes + tree inventory

Debug Logs: %systemroot%\debug\ntfrs_*.log

• Two way conversation between partners

Summary

All deployments should run SP2

Deploy SP3 when available

Q314282 provides roll-up fix for many issues

• Lingering objects

• Account lockouts

• PDC overload situations

Monitor Active Directory

New Documentation

Available on microsoft.com

• Best Practices for Active Directory Delegation

– http://www.microsoft.com/windows2000/techinfo/planning/activedirectory/addeladmin.asp

Coming soon

• Active Directory Monitoring Guidelines and Key Indicators

• Active Directory Forest Recovery

Eventcomb

– http://download.microsoft.com/download/win2000adserv/secops/RTM/NT5/EN-US/SecOps.exe
