
Plain English

Tim Loughran
Mendoza College of Business
University of Notre Dame
Notre Dame, IN 46556-5646
574.631.8432 voice
[email protected]

Bill McDonald
Mendoza College of Business
University of Notre Dame
Notre Dame, IN 46556-5646
574.631.5137 voice
[email protected]

March 26, 2008

Abstract: In October of 1998 the SEC implemented a rule requiring firms to use “plain English” in their prospectus filings, with explicit encouragement to incorporate this presentation style in all disclosure documents. We perform a textual analysis on 56,079 10-K electronic filings from 1994 to 2006 and provide empirical evidence that the rule has produced its intended effect. We document significant increases in participation by small investors for firms showing greater improvement in plain English compliance, even after controlling for the market-wide increase in 100-lot trades. In addition, we find that firms issuing seasoned equity exhibit a significant positive change in their 10-K writing style in the year prior to issuance. Firms with high plain English attributes also have corporate governance policies that are significantly more shareholder friendly. Our results suggest that regulators can impact market outcomes simply by using their “bully pulpit.”

We thank Robert Battalio, Margaret Foster, and Jennifer Marietta-Westberg for helpful comments.


The SEC sampled expense information for 1,500 registration statements filed in 1995 and found

that about 48% of the documented preparatory hours were attributable to legal and technical

writing, with the remaining 52% of hours billed to accounting. Interestingly, in spite of this

approximately 50/50 mix in preparing financial disclosures, the vast majority of financial and

accounting research has focused on the accounting numbers while spending significantly less

time looking at the context and framing of these numbers. Only recently has the finance literature

focused on identifying relations between the text and numbers, as technology has improved our

ability to parse disclosure documents and any other related text (see, for example, Hanley and

Hoberg (2007), Li (2007), Tetlock (2007), and Tetlock, Saar-Tsechansky, and Macskassy

(2007)).

In our research, we examine the SEC’s plain English rule of October 1998. While the

central theme of both the 1933 Securities Act and the 1934 Securities Exchange Act was

disclosure, the refined point of the plain English rule is making the disclosures truly accessible to

the “average” investor. Average investors will be less able to assess and less likely to invest in

companies whose financial disclosures are buried in legal jargon and abstruse language.

The plain English rule is precise in mandating that firms’ prospectuses “must use plain

English principles in the organization, language, and design of the front and back cover pages,

the summary, and the risk factors section.” The rule becomes somewhat less precise, however,

when it requires that the writing in these sections of the prospectus “substantially complies with”

a list of plain English principles.


To measure the notion of disclosure style, we create a standardized statistic that

aggregates word length, word commonality, and a series of writing components specifically

identified by the SEC. In the communications surrounding the development and implementation

of the plain English rule, the SEC clearly encourages firms to adopt these principles in all their

filings and communications with shareholders.1 We consider a broad sample of 56,079 10-K

filings during the period 1994-2006.

We find that our measure of plain English does “improve” after the regulation is enacted,

but not in a single leap around the rule date. Throughout this paper we will use the term

“improve” to denote an increase in the plain English measure. We use yearly dummy variables to

control for the gradual increase in plain English usage. Interestingly, both the mean and median

number of words in the 10-Ks increase sharply during the sample period. The median number of

words per document increases from about 25,000 in 1996 to over 40,000 by 2006.

We link the text-based measures to trade lot sizes and use the proportion of 100-lot trades

as a measure of “average” investors. Additionally, we control for the impact of industry (as

defined by Fama and French (1997)) and auditor. We find substantial differences in compliance

across industries and only limited differences across the big 5 auditors.

Using period-to-period differences to control for market structure changes occurring over

this time interval, we find that the plain English rule has its intended effect. That is, there is a

clear positive relation between the improvement in a firm’s plain English measure and changes

in the proportion of 100-lot trades.

We then apply our plain English measure to a logit model predicting seasoned equity

offerings (SEOs). We find that firms showing a higher year-to-year change in plain English

usage are more likely to issue seasoned equity in the following year. Within the subsample of

1 See page 4 of the Plain English Handbook and page 68 of SEC Release #34-38164.

firms issuing seasoned equity, companies are more likely to do so in a year when they have also

improved the plain English compliance of their 10-K. Thus, in addition to our evidence

supporting the impact of legislation based on the 100-lot trade results, the SEO results indicate

that managers value transparency in the form of more readable documents when issuing

additional shares. Finally, we link the 10-K data to the Gompers, Ishii, and Metrick (2003)

corporate governance index and find that firms with shareholder friendly governance structures

are more likely to have 10-K filings that score high on our plain English measure.

Although often ignored by the academic literature, the technical writing of a 10-K has

real importance. Our paper’s contribution is in documenting how subsequent trading by small

investors and issuance of seasoned equity are facilitated by the writing style of the 10-K. Firm

managers have a choice in their writing style. In public documents, managers can elect to make

their 10-Ks incoherent to all but highly trained lawyers by the overuse of legalese, uncommon

words, and superfluous language. Or, firm managers can improve the transparency of their firms

by creating documents that typical retail investors can more easily comprehend. Following

prodding by the SEC, firms have measurably improved the writing style of their 10-Ks.

More generally, we show that regulators can impact market outcomes simply by using the

“bully pulpit” of their office. The plain English rule only requires changes in prospectus filings,

not 10-Ks. The impact of plain English we observe in the 10-K sample is simply an artifact of

encouragement by the SEC to adopt the plain English guidelines across all filings.

Section I of the paper reviews the history of disclosure requirements and the plain

English rule. Section II describes the process of parsing 10-K documents available on the SEC’s

Edgar web site. Our measure of plain English is defined in Section III. Section IV of the paper


reports the summary statistics. Section V presents the empirical results. Concluding comments

are provided in Section VI.

I. Security Regulation and the Plain English Rule

A. Legislative History and Disclosure Requirements

As the United States underwent rapid industrialization in the late 19th and early 20th

centuries there was a corresponding increase in the demand for capital, and with it an increase in

dealers selling securities representing fraudulent or, at best, ephemeral firms. The Securities Act

of 1933 and the Securities Exchange Act of 1934 evolved from “Blue Sky” laws which began to

multiply across the states beginning in 1910 with Rhode Island and the better-known

example of Kansas in 1911.

These laws became popular across the states as a means of mitigating the problem of

fraudulent offerings. The Supreme Court upheld the constitutionality of these laws in 1917 and

by 1933 every state except Nevada had some form of a “Blue Sky” law. Reed (1920) notes,

however, that the laws were considered by some to be “hopelessly crude and unworkable,” and

many of the “Blue Sky” laws were subsequently repealed or ignored. The need for security

regulation at the national level was generally acknowledged, but the move to federal legislation

was slowed by a divergence of opinion on how the security markets should be controlled.

“Blue Sky” laws created a norm where state officials passed judgment on a firm’s

financial viability and decided whether investors would derive a “fair” return.2 Many argued that

the government should not be responsible for assessing the financial viability of securities and

should focus only on assuring full disclosure (see, for example, Ellenberger and Mahar (1973)).

2 See for example Ohio Rev Code Ann. § 1707.09, page 1953, Section 9.


The 1929 market crash focused attention on national regulation, and after many failed attempts

by Congress to pass legislation, the 1933 Act was finally enacted.

One of the concerns throughout the history of implementing these regulations was the

readability of the materials that were disclosed. While the SEC through interpretive advice and

other means tried to improve the readability of the mandated filings, there did not appear to be

notable change (see SEC Release 33-7497, p. 66). In the late 1990s, Arthur Levitt, as

Chairman of the SEC, championed the cause of improving disclosure documents:

“Investors need to read and understand disclosure documents to benefit fully from the protections offered by our federal securities laws. Because many investors are neither lawyers, accountants, nor investment bankers, we need to start writing disclosure documents in a language investors can understand: plain English.” (A Plain English Handbook, p. 3.)

B. Is Federal Regulation of Securities Markets Effective?

In the academic literature, there has been enormous debate over whether federal

regulation is actually effective. Stigler (1964) was among the first to test the measurable impact

of disclosure requirements. In a pointed response to the 1963 Cohen report (a report which

deemed security regulation and the SEC a resounding success), Stigler criticizes both its

qualitative and quantitative conclusions.3 Stigler finds no evidence of significant differences in

new issue performance across the regulatory regimes, and provides similar conclusions for a

separate sample of preferred stocks.

Jarrell (1981), like Stigler (1964), focuses on the 1933 Act and finds similar evidence of

lower post-regulation risk. As an alternative explanation for this finding, he suggests that higher

regulatory costs have simply pushed higher risk ventures out of the public markets. Smith (1981)

3 The Cohen report refers to the Report of the Special Study of the Securities Markets of the Securities and Exchange Commission (88th Congress, 1st session, House Document 95, 1963, Washington, D.C.: Government Printing Office). Milton Cohen chaired the committee that produced the report.


summarizes Jarrell’s assessment as “According to Jarrell, SEC regulation of new security issues

has been an abysmal failure.”

Simon (1989) also documents a decrease in abnormal returns following the 1933 Act,

with the effect being larger for unseasoned non-NYSE issues. Since the regulation should have

the greatest effect on securities where private information costs were highest, she argues that her

evidence supports the notion of reductions in investor forecast errors attributable to the better

information environment produced by the 1933 Act. She acknowledges that her evidence does

not preclude the argument that higher risk issuers simply moved to unregulated markets. Further,

many “confounding factors” such as the 1929 crash and Great Depression make it difficult to

convincingly attribute any changes in return characteristics specifically to the advent of federal

regulations in 1933 and 1934.

Benston (1973) finds little evidence of changes in the occurrence of fraud before and

after the 1934 Act. Looking at pre-regulation delistings and comparing stocks that did or did not

disclose financial data, Benston finds that investors were actually better off owning

securities of firms that did not disclose.

More recently, Bushee and Leuz (2005) and Greenstone, Oyer, and Vissing-Jorgensen

(2006) examine the economic impact of mandatory disclosure. Bushee and Leuz (2005) find that

over 76% of sample firms were removed from the Over-The-Counter (OTC) Bulletin Board

rather than comply with SEC mandatory disclosure requirements relating to the 1999 “eligibility

rule.” Their evidence strongly suggests that mandated disclosure has a high cost for smaller

firms.

Greenstone, Oyer, and Vissing-Jorgensen (2006) provide evidence that mandatory

disclosure can dramatically increase stock market values. Focusing on the OTC firms most


affected by the 1964 Securities Acts Amendments, the three authors find an 11.5% to 22.1%

increase in market values over a 23-month period from when the law was proposed to when it

went into force.

In summary, major events, like the Great Depression, bracketing the Securities Acts

make definitive conclusions from prior research hard to draw. By considering incremental

regulation associated with the plain English rule, we can more precisely examine the debate

concerning the effectiveness of government regulation versus the self-regulation of market

discipline.

C. The Plain English Rule

The plain English rule became effective October 1, 1998. The SEC Staff Legal Bulletin

No. 7 provides a summary of the rule and corresponding amendments:

“… companies filing registration statements under the Securities Act of 1933 must:

• write the forepart of these registration statements in plain English;

• write the remaining portions of these registration statements in a clear, understandable manner; and

• design these registration statements to be visually inviting and easy to read.”

Rule 421(d) specifically requires that issuers must:

“… substantially comply with these plain English principles:

• short sentences;

• definite, concrete everyday language;

• active voice;

• tabular presentation of complex information;

• no legal jargon; and

• no multiple negatives.”

Additionally, Rule 421(b) was amended, prescribing stylistic approaches that should be avoided

such as “legal and highly technical business terminology” or “legalistic or overly complex

presentations that make the substance of the disclosure difficult to understand.”


Although the plain English rule is mandated only for prospectuses, in documentation

surrounding the rule’s release the SEC clearly encourages firms’ conformance with the rule in all

filings. Arthur Levitt, as then Chairman of the SEC, in his foreword to A Plain English Handbook

concludes with: “I urge you—in long and short documents, in prospectuses and shareholder

reports—to speak to investors in words they can understand.” (p. 4) The SEC in their proposed

rules document states: “Our ultimate goal is to have all disclosure documents written in plain

English …” (release #34-38164, p. 24) and later in the document “We also encourage you to use

these techniques for drafting your other disclosure documents.” Thus, we focus on the sample of

annual 10-K reports, which provides us a large sample of firms over an extended time interval

and allows us to test a broader range of hypotheses.

In the subsequent tests where we focus on the impact of regulation, notice that our focus

on 10-K filings tests an even more subtle relation between markets and regulation. In

discussions of the president’s role in the United States’ monetary policy, economists frequently

refer to the process of coercion through public comments from the platform of governmental

office as the “bully pulpit” (see Havrilesky (1988)). Similarly, the chairman of the Federal

Reserve is assumed to impact policy expectations from his “bully pulpit.” (See “Bush and Fed

Step Toward a Mortgage Rescue,” March 5, 2008, The New York Times, or Dudley (2006).) The

plain English mandate for 10-Ks is not based on an SEC regulation or specific legislation, but is

simply an artifact of the SEC’s use of their “bully pulpit” to encourage broader adoption of a rule

mandated for prospectuses.


II. Data

A. The 10-K Sample

Although electronic filing was not required by the SEC until May 1996, a significant

number of forms are available on EDGAR beginning in 1994.4 Until 2003, a box on the front

page of the 10-K form was to be checked if a “disclosure of delinquent filers pursuant to

Item 405” was not contained in the current filing, nor anticipated to be disclosed in statements

incorporated by reference or amendments. If this box was checked, the form was filed as a 10-

K405. In 2001, almost one-third of the 10-K filings were 10-K405 forms.

According to the SEC, because there was confusion and inconsistency in making this

choice, the 405 provision was eliminated after 2002. Because this choice has no impact on the

focus of our study, we include both 10-K and 10-K405 forms in our sample and make no

distinction in subsequent analysis. We do not include amended documents, 10-K/A or 10-

K405/A, in the sample.

The initial 10-K sample covering 1994-2006 contains 104,621 documents. For our tests

we link the 10-K sample to both the Center for Research in Security Prices (CRSP) and NYSE

Trade and Quote (TAQ) databases. We use the WRDS CIK file to link the SEC’s CIK identifier

to a CRSP PERMNO. We then use CRSP ticker symbols to link to the TAQ database.

4 The earlier work of Asthana, Balsam, and Sankaraguruswamy (2004) reports that small trades (i.e., average investors) are more likely to reflect the information disclosed in a 10-K than large trades after the filings became freely available on EDGAR.


B. Parsing the 10-K documents

The EDGAR web site contains quarterly master files listing a filename for each

document filed during that quarter. We use this master index file to identify the relevant filings,

which are programmatically downloaded and parsed.
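
The selection step can be sketched as follows. This is a minimal illustration, not the program used for the paper: it assumes each index row is pipe-delimited in the order CIK, company name, form type, filing date, filename, and the function name and sample rows are hypothetical.

```python
# Form types kept in the sample; amended forms (10-K/A, 10-K405/A) are excluded.
KEEP_FORMS = {"10-K", "10-K405"}


def select_filings(index_lines):
    """Return (cik, form_type, date_filed, filename) for each kept form.

    Assumes rows of the form CIK|Company Name|Form Type|Date Filed|Filename.
    Rows that do not split into exactly five fields (for example a dashed
    separator line) are skipped; the column-header row is dropped because
    its form-type field is not one of the kept forms.
    """
    selected = []
    for line in index_lines:
        fields = line.strip().split("|")
        if len(fields) != 5:
            continue
        cik, _name, form_type, date_filed, filename = fields
        if form_type in KEEP_FORMS:
            selected.append((cik, form_type, date_filed, filename))
    return selected


# Hypothetical rows for illustration only.
sample_index = [
    "CIK|Company Name|Form Type|Date Filed|Filename",
    "--------------------------------------------",
    "1000001|EXAMPLE CORP|10-K|1999-03-30|edgar/data/1000001/a.txt",
    "1000002|SAMPLE INC|10-K405|1999-03-31|edgar/data/1000002/b.txt",
    "1000003|DEMO LTD|10-K/A|1999-04-15|edgar/data/1000003/c.txt",
    "1000004|OTHER CO|10-Q|1999-05-10|edgar/data/1000004/d.txt",
]
selected_filings = select_filings(sample_index)
```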

Many of the variables we use to examine the plain English initiative are based on parsing

the 10-K documents into a list of words. Most of the parsing is done using regular expression

search patterns. We first remove from the document all ASCII-encoded graphics, carriage-

returns/line feeds, and punctuation. We remove all HTML coding. (The quantity of HTML code

embedded in the documents increased exponentially over the sample interval.) All remaining

tokens bounded by spaces are then compared to a word list to determine if the token is a word.

To identify a word, we use release 4.0 of the 2of12inf word list, available at

http://wordlist.sourceforge.net/12dicts-readme.html, which contains a word list originally based

on twelve source dictionaries, subsequently expanded to include other sources. The list contains

81,520 words but does not include abbreviations, acronyms, or names. The “inf” version

includes word inflections. Once the document is parsed into a vector of words, we then tabulate

the specific words and phrases identified as good or bad examples based on the SEC

documentation relating to the plain English initiative.
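
The word-parsing steps described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the regular expressions are our own guesses at reasonable patterns, punctuation and digits are replaced by spaces rather than deleted, the removal of ASCII-encoded graphics is not shown, and the small word set stands in for the 2of12inf list.

```python
import re


def parse_words(raw_text, word_list):
    """Strip markup and punctuation, then keep only tokens in word_list."""
    text = re.sub(r"<[^>]+>", " ", raw_text)    # drop HTML tags
    text = re.sub(r"&[#\w]+;", " ", text)       # drop HTML entities such as &nbsp;
    text = re.sub(r"[\r\n]+", " ", text)        # drop carriage returns / line feeds
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # drop punctuation and digits
    tokens = text.lower().split()               # tokens bounded by whitespace
    return [token for token in tokens if token in word_list]


# Stand-in for the 2of12inf word list (hypothetical subset).
toy_word_list = {"the", "company", "shall", "hereinafter", "file", "its"}
toy_text = "<p>The Company&nbsp;shall, hereinafter,\r\nfile its 10-K.</p>"
toy_words = parse_words(toy_text, toy_word_list)
```

Storing the word list as a set keeps each membership test constant-time, which matters when every token in tens of thousands of documents is checked.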


C. Control Variables

In addition to controlling for yearly variation in the data, we control for the impact of

both industry and auditor. For industry classifications we use the 48 industry grouping of Fama

and French (1997). SIC codes were parsed from the 10-K filings and are self-reported by the

firms.

Auditor variables are based on a text search of the 10-K. The documents are searched for

the big-5 auditing firms: Arthur Andersen, Deloitte & Touche, Ernst & Young, KPMG, and

PricewaterhouseCoopers. From 1998-2002, observations of Price Waterhouse and Coopers &

Lybrand are both classified as PricewaterhouseCoopers. Arthur Andersen drops out of the

sample after its collapse in 2002. If none of these auditor names is found in the 10-K filing

then the auditor is classified as Auditor Other. If multiple names are found in the document, then

the auditor is classified as Switch. In reviewing the sampling results we found that in most cases

where there were multiple auditor names, the firm had changed auditors in the recent past.

Although this is not always the case, we wanted to distinguish this case from Auditor Other.
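
The classification rule can be illustrated with a short sketch. It is a simplification under stated assumptions: matching is a plain case-insensitive substring search, the alias table is our own construction, and the predecessor names are mapped to PricewaterhouseCoopers in every year rather than only from 1998-2002.

```python
# Hypothetical alias table: canonical auditor name -> search strings.
AUDITOR_ALIASES = {
    "Arthur Andersen": ["arthur andersen"],
    "Deloitte & Touche": ["deloitte & touche"],
    "Ernst & Young": ["ernst & young"],
    "KPMG": ["kpmg"],
    "PricewaterhouseCoopers": [
        "pricewaterhousecoopers",
        "price waterhouse",
        "coopers & lybrand",
    ],
}


def classify_auditor(text):
    """Return the single auditor found, 'Switch' if several, else 'Auditor Other'."""
    lowered = text.lower()
    found = [
        name
        for name, aliases in AUDITOR_ALIASES.items()
        if any(alias in lowered for alias in aliases)
    ]
    if not found:
        return "Auditor Other"
    if len(found) > 1:
        return "Switch"
    return found[0]
```

Note that a search for "arthur andersen" would miss the misspelled "Arthur Anderson" discussed next, so any real implementation would need to decide how to treat such variants.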

As an interesting artifact of our auditor classification procedure, our textual search

identified 245 unique times in which Arthur Andersen was misspelled (i.e., ending in

“-son” instead of “-sen”) within a 10-K document where Arthur Andersen was the auditor. As an

example, in the 1994, 1996, 1997, and 2002 letters to the shareholders of the International Paper

Company, the failed accounting firm listed its name as “Arthur Anderson LLP.” Similarly, the

1994, 1997, and 1998 10-Ks filed for ALLTEL made the same mistake. Notably, in both of these

cases where Arthur Andersen was misspelled, the error occurred in the signature line to the

“Report of Independent Public Accountants.”


D. Sample summary

Table I documents the sample formation process. Requiring a CRSP match with data to

calculate market capitalization and only including ordinary common equity firms (CRSP share

type code of 10 or 11), substantially reduces the original sample of 10-Ks. For example, Asset-

Backed Securities had over 10,000 observations in the original 10-K sample, primarily

attributable to filings for security offerings such as Exchange Traded Funds. These funds were

removed from the sample by applying the ordinary common equity filter.

A small number of firms, particularly in the early years, filed 10-Ks that were unusually

short and might, for example, simply incorporate documents by reference. Thus we eliminate 88

firms with 10-Ks containing fewer than 5,000 words. We also include only the first filing in a

given year for a firm and require at least 180 days between filings. After applying these filters

the final sample is 56,079.
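
These document-level filters can be sketched as below. The sketch is an assumption-laden illustration: it applies the word-count filter to individual filings before the timing filters, and the tuple layout and function name are hypothetical.

```python
from datetime import date

MIN_WORDS = 5000      # drop unusually short 10-Ks
MIN_GAP_DAYS = 180    # minimum spacing between a firm's retained filings


def filter_filings(filings):
    """Apply the length, first-in-year, and 180-day filters.

    filings: iterable of (firm_id, filing_date, word_count) tuples.
    Returns the retained tuples, ordered by firm and filing date.
    """
    kept = []
    last_kept = {}  # firm_id -> date of the most recent retained filing
    for firm_id, filed, n_words in sorted(filings, key=lambda f: (f[0], f[1])):
        if n_words < MIN_WORDS:
            continue
        previous = last_kept.get(firm_id)
        if previous is not None:
            same_year = filed.year == previous.year
            too_close = (filed - previous).days < MIN_GAP_DAYS
            if same_year or too_close:
                continue
        kept.append((firm_id, filed, n_words))
        last_kept[firm_id] = filed
    return kept
```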

Figure 1 presents the distribution of sample size and firm market capitalization by the 10-

K filing month. Approximately 57% of the 10-Ks are filed in the month of March. Most firms

have December 31st fiscal year-ends and will wait to file until the latest possible date. The

substantially larger median market capitalization in February is partly an artifact of a recent SEC

rule requiring large public float firms to file within 60 days of their fiscal year end, with smaller

firms allowed 70 days. (See SEC release #33-8644.) On average, 63%, 80%, and 90% of the 10-

Ks are filed by the end of the first, second and third quarter, respectively. Because the sample

size and composition are so heterogeneous across months, in subsequent analysis our unit of

analysis for time series will be years.

Figure 2 compares the annual number of firms in our final sample with the annual

number of firms having a share type code of 10 or 11 in the CRSP database. In all of our


analysis, we define year as the calendar year in which the 10-K was filed. So, Google’s

December 31, 2004 10-K which was filed on March 30, 2005, would be classified as being a

2005 observation. Additionally Figure 2 shows the median market capitalization of firms in the

sample by year. The implementation phase of electronic filing is apparent in the first three years

of the sample. In 1997, the first full year when electronic filing was required, the median market

capitalization of the sample reaches its lowest point of $145 million. Larger firms

dominated the sample in years prior to the requirement of electronic filing.

Figure 2 also shows that both the number of firms on CRSP and firms in our sample

steadily fell after peaking in 1997 as the number of IPOs failed to keep pace with the volume of

mergers and distressed delistings. The difference between the potential universe of firms and

firms included in our sample is mostly due to a failure to match the CIK identifier with the CRSP

PERMNO or failure to match with the TAQ data. Every year since 1997, the number of firms

that appear in CRSP but not in our sample shrinks.

III. A Measure of Plain English

Just as mandating writing style is difficult, so is measuring the degree of compliance.

Without deep parsing, which is itself subject to substantial error, at what point does a document

meet the threshold of being written in active voice? What is “clear and understandable”?

We use specific examples provided in the SEC documentation to create seven

components we include in our aggregate measure of plain English. These provide concrete

examples which we tabulate for each document.

• Legalese: A count of the 14 words and phrases identified in Staff Legal Bulletin No. 7 (http://www.sec.gov/interps/legal/cfslb7a.htm) as inappropriate legal jargon (e.g., “by such forward looking” or “hereinafter so surrendered”).


• Weak Verb: Weak verbs can take many forms. To avoid the ambiguities of deeply parsing the document into word types and then attempting to identify context for weak verbs, we tabulate only the two examples cited on page 19 of the Plain English Handbook, “to have” and “to be.”

• Negative Phrase: A count of 11 negative compound phrases identified on page 27 of the Plain English Handbook (e.g., “does not have” or “not certain”).

• Personal Pronoun-We: A count of the personal pronouns, which the handbook on page 22 indicates will “dramatically” improve the clarity of writing. “We” counts occurrences of “we,” “us,” “our,” and “ours.”

• Personal Pronoun-You: “You” counts occurrences of “you,” “your” and “yours.”

• Respectively: A count of the word “respectively,” which according to page 34 of the handbook is to be avoided.

• Superfluous: A count of the eight phrases identified as superfluous on page 25 of the handbook (e.g., “because of the fact that” or “in order to”).
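
The count-based components listed above can be tabulated with a simple phrase search. The sketch below is illustrative only: the phrase lists contain just the examples quoted in the bullets (not the full SEC lists), the Weak Verb component is omitted, and whole-word, case-insensitive matching is our assumption.

```python
import re

# Partial lists: only the examples quoted in the text, not the full SEC lists.
COMPONENT_PHRASES = {
    "Legalese": ["by such forward looking", "hereinafter so surrendered"],
    "Negative Phrase": ["does not have", "not certain"],
    "Personal Pronoun-We": ["we", "us", "our", "ours"],
    "Personal Pronoun-You": ["you", "your", "yours"],
    "Respectively": ["respectively"],
    "Superfluous": ["because of the fact that", "in order to"],
}


def count_components(text):
    """Count whole-word occurrences of each component's phrases in text."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    counts = {}
    for component, phrases in COMPONENT_PHRASES.items():
        total = 0
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            total += len(re.findall(pattern, normalized))
        counts[component] = total
    return counts


toy_counts = count_components(
    "In order to proceed, we sent you our report. "
    "Our plan does not have risks, respectively."
)
```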

In addition, we use measures of word length and word commonality to capture the notion

of “definite, concrete everyday language.”

• Average Word Length: The average number of characters per word in a given document.

• Word Commonality: Using the entire 10-K sample, we tabulate for each word the number of documents where a given word appears. Word Commonality is the average of this number across all words in a given document divided by the total number of documents. Thus, if Word Commonality=80%, the words in the current document appeared, on average, at least once in 80% of all documents in the total sample.
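
The two language measures can be made concrete with a small sketch. It assumes that the average in Word Commonality is taken over every word occurrence in a document (rather than over distinct words); the function names and toy documents are hypothetical.

```python
from collections import Counter


def average_word_length(words):
    """Average number of characters per word in one document."""
    return sum(len(word) for word in words) / len(words)


def document_frequencies(documents):
    """For each word, the number of documents in which it appears at least once."""
    frequencies = Counter()
    for words in documents:
        frequencies.update(set(words))  # count each word once per document
    return frequencies


def word_commonality(words, frequencies, n_documents):
    """Mean document frequency of a document's words, as a share of all documents."""
    mean_frequency = sum(frequencies[word] for word in words) / len(words)
    return mean_frequency / n_documents


# Three toy "documents", already parsed into word lists.
toy_documents = [
    ["we", "sell", "cars"],
    ["we", "sell", "bonds"],
    ["hereinafter", "aforesaid"],
]
toy_frequencies = document_frequencies(toy_documents)
```

In this toy sample the third document, built from words that appear nowhere else, receives the lowest Word Commonality, which is the behavior the measure is meant to capture.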

We choose not to include sentence length, one of the items specifically mentioned in the

rule. A simple punctuation-based heuristic can parse sentences with about 90% accuracy (see

Riley (1989)), with more sophisticated approaches achieving even higher levels (see Mikheev

(1998)). These rates are achieved, however, with traditional text, where the incorporation of

tables, lists and numbers is less frequent. Given the content structure of financial filings, we felt

that sentence parsing could create frequent and substantial errors. The variables Average Word

Length and Word Commonality should provide comparable constructs for “everyday language.”


We then need to combine the nine measures described above into an aggregate measure

of plain English. Two characteristics of word measures dictate the approach that we choose.

First, the first seven variables listed above are highly correlated with the total number of

words in a document. Obviously the likely magnitude of the word count variables increases with

the number of words. Average word length and word commonality also are likely to be impacted

by document length.

Second, the distribution of words in a document corresponds to what is labeled as a Large

Number of Rare Events distribution. Hapax legomena is the term used to describe words that

occur only once in a document. These singular occurrences produce what is by far the most

common frequency in word counts, one, which creates a highly skewed distribution. The use of a

log transformation on word counts is common in natural language processing and substantively

mitigates the skewness problem (see Baayen (2001)). Thus, for the components of our plain

English measure based on word counts we use log transformations of one plus the word count.

We also use a log transform of Average Word Length and Word Commonality.

To combine these measures into a single metric, we separately regress the log transform

of each of the nine variables on the log of the number of words occurring in the document. The

regressions for each component of the plain English measure are reported in Table II. Average

Word Length declines as the number of words increases, indicating that larger documents are not

necessarily more complex. The Word Commonality regression has a negative coefficient; however,

the r-square of 3.8% is by far the lowest among the regressions. The remaining variables are

significant and positively related to the number of words with r-squares ranging from 12.5% to

87.4%.


For each firm, we then sum the standardized residuals from the nine regressions, where

the standardized residuals based on the Word Commonality and both Personal Pronoun

regressions are positively signed (i.e., common words and personal pronouns are positive

attributes), and standardized residuals for the remaining six variables are subtracted from the

total. This combination is then standardized, providing our variable labeled Plain English, where

more positive values represent documents that better conform to the writing standards

promulgated by the SEC.
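
The aggregation can be sketched as follows. This is a minimal illustration rather than the estimation code behind Table II: it fits each regression by simple least squares, standardizes residuals by their sample standard deviation (an assumption), and uses hypothetical function names and toy inputs.

```python
import math
import statistics

# Components whose standardized residuals enter with a positive sign.
POSITIVE = {"Word Commonality", "Personal Pronoun-We", "Personal Pronoun-You"}
# Components that are levels rather than counts: log(v) instead of log(1 + v).
LEVELS = {"Average Word Length", "Word Commonality"}


def standardized_residuals(y, x):
    """Residuals from a simple OLS fit of y on x, scaled to unit standard deviation."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    scale = statistics.stdev(residuals)
    return [r / scale for r in residuals]


def plain_english_scores(components, total_words):
    """Signed sum of standardized residuals, itself standardized across documents.

    components: dict mapping component name -> list of raw values per document.
    total_words: list of document word counts, in the same document order.
    """
    log_words = [math.log(n) for n in total_words]
    scores = [0.0] * len(total_words)
    for name, values in components.items():
        if name in LEVELS:
            logged = [math.log(v) for v in values]
        else:
            logged = [math.log(1 + v) for v in values]
        z = standardized_residuals(logged, log_words)
        sign = 1.0 if name in POSITIVE else -1.0
        scores = [s + sign * zi for s, zi in zip(scores, z)]
    mean, scale = statistics.fmean(scores), statistics.stdev(scores)
    return [(s - mean) / scale for s in scores]
```

Regressing on the log of total words first means each component enters the score only through the part of its variation that document length does not explain.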

A. Descriptive Results for the Plain English Measure

The mean and median for the plain English measure are reported by year in Figure 3.

With more than 90 percent of the 10-Ks filed in 1998 occurring before the date the rule became

effective in October of that year, the rule’s impact should potentially become apparent in the

1999 averages.

The measure decreases from -0.11 to -0.47 from 1994 to 1998. Recall from Figure 2 that

the market capitalization of the reporting firms also drops substantially over these first five years.

In a regression of the plain English measure on the natural log of market capitalization, the

coefficient is significant and negative.5 Thus as the average market capitalization of the sample

declined in the first four years, we would expect the plain English measure to actually increase

slightly. Instead we see the strong downward trend in plain English in the first five years with a

sharp reversal in the first full year under the new rule. There is a continuing positive trend in the

plain English measure from the time of implementation. This result indicates that even in the 10-

5 The t-statistic for log(size) in a regression on Plain English with year and industry dummies is -14.27. The coefficient on log(size) is significant and negative if the year and industry dummies are not included in the regression. The simple correlation between Plain English and both size and log(size) also is negative.


K sample, whose style mandate was only a “bully pulpit” artifact of a rule restricted to

prospectuses, the plain English rule had a substantial impact on the textual presentation.

B. A Berkshire Hathaway Anecdote

Warren Buffett is considered the poster boy for plain English, authoring the preface to the

SEC’s Plain English Handbook. Buffett’s famous letter preceding his annual reports is the

epitome of folksy and nontechnical writing. Because we can easily benchmark the filings of

Buffett’s Berkshire Hathaway, we briefly consider this anecdote in Figure 4.

Interestingly, although Buffett’s shareholder letters might be targeted toward “Doris and

Bertie,” his two sisters with non-business backgrounds, the time-series of his firm’s performance

on the plain English measure would suggest that until he was approached by the SEC to

champion the plain English cause, his record was at best mixed, with the 1995 and 1996 filings

substantially below the average score for the universe of all firms or for all firms in the same

industry as Berkshire Hathaway (SIC of 6331).

Berkshire’s 10-Ks show a dramatic change in writing style immediately after the plain

English initiative, but have reverted to average in the past few years. By 2006, the plain English

measure for Berkshire is the same as the average for all firms and for firms within its industry.

Although some of Berkshire’s below average performance in the early years might be

rationalized as an artifact of the insurance industry’s legal complexity, the insurance industry

average also plotted in Figure 4 does not support this contention.

IV. Summary Statistics and Control Variable Results

A. Summary Statistics

Summary statistics for the sample variables are reported in Table III. The sample is

divided into two periods: prior to the October 1, 1998 plain English rule (column 1) and after

(column 2). The last column of the table lists the summary statistics for the entire period. The

number of observations, the average plain English measure, the average market values (as of the

10-K filing date), and the average number of words contained in the 10-K have larger values

during the second time period.

Figure 5 presents the mean and median number of words per document over the 1994-

2006 period. The dip that occurs in the first few years reflects the tendency for early adopters of

electronic filing to be bigger firms filing larger documents. Clearly, 10-K filings have become

more verbose, with the median number of words rising from 26,000 in 1997, the first full year of

mandatory electronic filing, to well over 40,000 in the final sample year of 2006. The passing of

Sarbanes-Oxley in 2002 could account for the substantial shift in word count apparent in Figure

5 from the years 2001 to 2004, with stabilization in the subsequent years.

Although the number of words in a 10-K has increased, Table III reports that the average word

length and word commonality are quite similar between the two periods. As an example, the

average word length was 5.44 letters prior to October 1998 compared to 5.47 letters after the

plain English rule.

Table III also reports that the two time periods differ substantially in terms of the

proportion of trades within a trade size category. From the TAQ data we tabulate the proportion

of trades within a given trade size category. We consider the following five categories:

Variable                          Shares traded (s)
Proportion Trades 1-100           s <= 100
Proportion Trades 101-500         100 < s <= 500
Proportion Trades 501-1,000       500 < s <= 1,000
Proportion Trades 1,001-10,000    1,000 < s <= 10,000
Proportion Trades >10,000         s > 10,000

We tabulate this proportion for the period beginning on the document filing date and for

the subsequent 20 days, creating a 21-day sample window. Firms must have at least one day of

trading in the 21-day window to be included in the sample.
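
The tabulation described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run on the TAQ data: the function names and the sample trades are hypothetical, and the only inputs taken from the text are the five category boundaries.

```python
from collections import Counter

# Inclusive upper bounds of the five trade-size categories defined in the text.
CATEGORIES = [
    ("1-100", 100),
    ("101-500", 500),
    ("501-1,000", 1_000),
    ("1,001-10,000", 10_000),
    (">10,000", float("inf")),
]


def categorize(shares):
    """Return the category label for a single trade of `shares` shares."""
    if shares < 1:
        raise ValueError("a trade must involve at least one share")
    for label, upper in CATEGORIES:
        if shares <= upper:
            return label


def trade_proportions(trade_sizes):
    """Proportion of trades (a count of trades, not share volume) per category."""
    if not trade_sizes:
        raise ValueError("no trades in the window")
    counts = Counter(categorize(s) for s in trade_sizes)
    n = len(trade_sizes)
    return {label: counts[label] / n for label, _ in CATEGORIES}


# Hypothetical trades for one firm in its 21-day window (filing date plus 20 days).
window_trades = [100, 50, 300, 500, 1_000, 2_500, 10_000, 25_000, 100, 75]
props = trade_proportions(window_trades)
for label, share in props.items():
    print(f"Proportion Trades {label}: {share:.1%}")
```

Because every trade falls in exactly one category, the five proportions for a firm sum to one.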

During 1994 to September 1998, 15.1% of all trades were for between 1 and 100 shares.

In the second period (October 1998 to 2006), that proportion jumped to 39.2%. In the earlier

period, 23.4% of all trades were in the 1,001-10,000 share category. In the later period, only

11.8% of trades, roughly half the earlier proportion, were in the 1,001-10,000 share category.

After October 1998, across the three major trading venues, almost 75% of all trades were

for 500 shares or less. Figure 6 reports the proportions of trades within each of the five lot

categories by each calendar year of our sample. The proportion of trades for 100 shares or less

actually reaches 60% in 2006 after being only 15% in 1997.

As the NYSE, Amex, and Nasdaq moved toward quoting stock prices in decimals, the

quoted depth reduced in size. Investors received better prices (i.e., closer to the mid-point) while

simultaneously being able to trade fewer shares at the improved price. Starting on January 29,

2001, all NYSE-listed stocks could be priced in decimals. For Nasdaq, all listed firms could be

priced in decimals by April 9, 2001.

Following decimalization and the advent of electronic communication networks (ECNs),

large investors increasingly split up their order for trade execution. So instead of submitting an

order to buy 10,000 shares of Microsoft, investors might break the order into 20 different

segments of 500 shares. Additionally, when retail investors submitted market orders, the

brokerage house might execute trades at prices that differ by one penny. These factors are the

major drivers in the increase in 100-lot trades observed over the sample interval.

There was slightly more seasoned equity issuance in the later time period. After October

1998, on average, 5.5% of firms had an SEO compared to 4.3% in the earlier time period. A

slightly larger percentage of the sample universe lists on Nasdaq versus the Amex or the NYSE

in the later period.

Lastly, Table III reports the proportion of big 5 auditors and the Gompers, Ishii, and

Metrick (2003) Governance Index. Over the entire time period, less than 10% of the firms used a

non-big 5 auditor. PricewaterhouseCoopers (PWC) audited the highest percentage of firms

(19.1%) prior to the plain English rule while Ernst & Young had the highest share (18.8%) in the

later period. The largest drop in the proportion of firms audited was for Arthur Andersen. As

noted earlier, Andersen collapsed in 2002. The Gompers, Ishii, and Metrick (2003)

Governance Index is a measure of shareholder rights for 9,615 firms during our sample period.

The index, as defined, can range from 1 to 24—democratic to dictatorship, respectively, using

the terminology of the authors—and averages approximately 9 in each period.

B. Industry and Auditor Results

Does the plain English measure differ across industry and auditor? Figure 7 documents

the variability of our plain English measure across the Fama and French (1997) 48 industries.

The worst industry in terms of the measure is Smoke. This is most likely due to the litigation

discussion in the tobacco industry during our time period. As an example, Reynolds American

(formerly R. J. Reynolds Tobacco) had, in 2006, one of the most extreme percentages of legal

words in a 10-K (over 1.8% of all words were legal). The best three industries for the plain

English measure are Financials, Banks, and Soda.

To examine the statistical significance of the differences in plain English usage across

auditors, we should control for industry effects, as suggested by our prior industry

results and the tendency for auditors to specialize in certain industries (see, for example, Hogan

and Jeter (1999)). We test auditors’ use of plain English and the change in their style from pre- to

post- regulation by estimating a regression of plain English on the following independent

variables: auditor dummy variables (with PWC the excluded auditor), a dummy variable

indicating when the plain English regulation was in effect, the cross-products of auditor dummies

and regulatory period dummy, the log of market capitalization, a Nasdaq dummy, calendar year

dummies, and dummy variables for the 48 Fama-French industries.
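
One way to write this specification compactly is shown below. The notation is our own shorthand and is an assumption, not taken from the tables: PE is the plain English measure for firm i in filing year t, A_a is the dummy for auditor group a (with PWC excluded), Post equals one for filings after October 1, 1998, and the last two fixed-effect terms stand for the calendar year and Fama-French industry dummies.

```latex
\begin{equation*}
PE_{it} = \alpha
  + \sum_{a \neq \mathrm{PWC}} \beta_a \, A_{a,it}
  + \gamma \, Post_{it}
  + \sum_{a \neq \mathrm{PWC}} \delta_a \left( A_{a,it} \times Post_{it} \right)
  + \theta \, \ln(Size_{it})
  + \phi \, Nasdaq_{it}
  + \mu_{t} + \eta_{j(i)}
  + \varepsilon_{it}
\end{equation*}
```

Under this reading, each beta compares an auditor group with PWC before the regulation, and each delta measures how that gap shifts once the regulation is in effect.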

The results of the regression are reported in Table IV. Column (1) of the table reports the

regression without the industry and year dummies, while column (2) includes both. The

post-October 1998 dummy and its interactions with the auditor dummies in many cases go

from significant in the first column to insignificant when the year dummies are included.

This simply reflects the ability of the annual dummy variables to better capture the trend in Plain

English shown in Figure 3.

Thus we will focus on the coefficients in column (2) where the industry and year

dummies are included. Pre-regulation, only Deloitte and “Other” show significantly greater plain

English measures relative to PWC. Andersen, Ernst & Young, and KPMG are not significantly different

from PWC in the pre-regulatory period. Only the groups labeled “Other” and “Switch” have

changes in plain English usage in the post regulatory period that are significantly different from

the PWC control. The positive shift in plain English usage documented for the Switch group is

consistent with auditing firms improving compliance in cases where they are not simply updating

prior years’ reports.

To control for the year-to-year changes in Plain English documented in Figure 3, the

large differences in plain English across industries, and the differences across auditor, our

subsequent regressions will include year, Fama-French industry, and auditor dummies.

V. Empirical Results

A. Plain English and the Average Investor

Because of decimalization and an increasing role of ECNs, we expect the proportion of

100-lot trades to increase for all firms over the sample period. Note we use “100-lot” to refer to

trades of 100 shares or less. Thus we focus on the change in plain English relative to the change

in the proportion of 100-lot trades, pre and post regulation.

We first provide descriptive results for firms partitioned into deciles based on the

magnitude of the difference between their average pre and average post plain English value. The

corresponding average change in 100-lot trades for each decile is plotted in Figure 8. The

relation shows a clear trend with firms in the lowest change in plain English decile having a

corresponding change in 100-lot trades of less than 15%. Firms in the highest decile of plain

English change averaged approximately a 22% increase in 100-lot trades.

We test this relation at the level of individual firms in the regressions reported in Table

V. For each firm we regress the change in the average proportion of 100-lot trades between the pre

and post regulatory periods on the corresponding change in the average value of plain English. The

firm must have at least one observation in each period to be included in the sample.
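
The collapse from firm-years to one observation per firm can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the firm identifiers, and the numbers are hypothetical, and the function name is ours.

```python
from statistics import mean

# Hypothetical firm-year records:
# (firm id, filed after Oct. 1, 1998?, plain English measure, proportion of 100-lot trades)
records = [
    ("A", False, -0.40, 0.10),
    ("A", False, -0.20, 0.14),
    ("A", True, 0.30, 0.35),
    ("B", False, 0.10, 0.20),
    ("B", True, 0.00, 0.30),
    ("B", True, 0.20, 0.34),
    ("C", True, 0.50, 0.45),  # no pre-period filing, so firm C is dropped
]


def firm_changes(rows):
    """Post-period mean minus pre-period mean, per firm, for both variables."""
    by_firm = {}
    for firm, post, plain_english, lot100 in rows:
        by_firm.setdefault(firm, {False: [], True: []})[post].append(
            (plain_english, lot100)
        )
    changes = {}
    for firm, periods in by_firm.items():
        pre, post = periods[False], periods[True]
        if not pre or not post:
            continue  # require at least one observation in each period
        d_plain = mean(p for p, _ in post) - mean(p for p, _ in pre)
        d_lot = mean(l for _, l in post) - mean(l for _, l in pre)
        changes[firm] = (d_plain, d_lot)
    return changes


changes = firm_changes(records)
for firm, (d_plain, d_lot) in sorted(changes.items()):
    print(firm, round(d_plain, 3), round(d_lot, 3))
```

Each surviving firm contributes one pair of changes, and those pairs are the observations entering the firm-level regressions.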

Since we have now collapsed the sample on firms, there are only 5,030 observations. For

control variables we also include size, which is the average market capitalization in the post

period, and industry, which is the most frequent industry classification in the post period. The Nasdaq

and auditor dummy variables are now proportions indicating the number of times in the pre or

post period that the corresponding dummy was equal to one.

From this, the coefficient on “Pre and Post 1998 Change in Plain English” reflects the

impact of the change in the average level of plain English on the corresponding change in the

average level of 100-lot trades across the pre and post regulatory period, after accounting for the

control variables. We first consider the change variable by itself in column (1), then also include

Log(average size) and the Nasdaq dummy in column (2) and finally in column (3) we append the

auditor and industry dummies. The signs and significance of the variables remain stable across

the three regressions so we will focus on the results of the full specification in column (3).

The results indicate that larger firms experienced greater increases in the proportion of

100-lot trades. As the exchanges moved to decimalization in 2001, large firms, some with spreads

hovering around one penny, became more likely to have their quote depth dispersed over a

broader range of incremental prices. Thus large firms are more likely to experience trades that

are sweeping the books and taking out any 100-lot quotes. Because ECNs historically have

played a much bigger role on Nasdaq than on the NYSE, 100-lot trades are more predominant

for Nasdaq-listed firms.

In all cases the results show a positive and significant relation between the change in

plain English and the corresponding change in 100-lot trades. Thus, although firms on average

experienced a substantial increase in 100-lot trades, those with greater improvement in writing

style experienced even greater growth in small trades. The coefficient on the change in Plain

English variable is 0.018 with a t-statistic of 8.98. Since there is little reason to expect large

institutional traders to be breaking up trades based on a firm’s writing style, the results indicate

that small investor participation increases with positive changes in writing style. Increased

participation by “average” investors was the explicit intent of the plain English regulation.

B. Plain English and Seasoned Equity Offerings

If managers view the 10-K as a vehicle to increase the transparency of their firms, one

should see improvements in writing style prior to equity issuance. That is, firms might be

expected to use more common words and better style in an attempt to lower information

asymmetries between managers and outside investors. On the other hand, if managers are

indifferent to communicating clearly with their shareholders, one would not expect to see any

improvement in the plain English measure.

About 5% of our sample had a seasoned equity offering (SEO) in the year after the 10-K

filing date. We use Thomson Financial Securities Data (also known as Securities Data Co.) to

identify all firms having an SEO during our sample period. To examine the relation between our

plain English measure and equity issuance, Table VI reports logit regressions. The dependent

variable, Equity Issuance Dummy, takes the value of one if the firm issued seasoned equity in the

year following the 10-K filing; otherwise the variable takes a value of zero.

The key control variable will be prior stock performance. Korajczyk, Lucas, and

McDonald (1990) show that stock performance in the prior year is a highly significant

determinant of the likelihood of equity issuance. Loughran and Ritter (1995) report that their

SEO sample had average raw returns of over 72% in the year prior to offering. In CFO survey

results, Graham and Harvey (2001) find that recent stock price performance is the third most

important factor in determining firms’ equity issuance decisions. Since Nasdaq is the trading

venue of choice for younger, more growth-oriented stocks, it is also added as a control

variable.

The independent variables are the year-to-year change in plain English, the raw buy-and-

hold returns in the year before the filing, the log of market value, and Nasdaq, auditor, Fama-

French industry, and calendar year dummies. Because we require the change in the plain English

variable, the sample size drops to 46,109 observations; for example, a firm must have both a 1994 and a

1995 plain English variable to be included for year 1995.

In all four logit regressions, heteroskedasticity-adjusted z-statistics are in parentheses

while the odds ratios are in brackets. The first two columns include all firms while columns (3)

and (4) report results when the sample is restricted to firms that issued equity at least once

in the sample period.

Table VI reports that the coefficient on the year-to-year change in plain English is

positive and statistically significant at conventional levels. In column (2), the coefficient is 0.121

with a z-statistic of 4.81. The odds ratio is 1.129. This odds ratio implies that when the change in

plain English variable increases by one standard deviation the odds of issuing equity in the next

year increase by 12.9%. As expected, the coefficient on the prior year return variable is positive

and highly significant. The higher the prior year’s return, the more likely the firm would issue

equity. Being listed on Nasdaq also substantially increases the likelihood of having an SEO.
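
The step from the logit coefficient to the odds ratio is simple arithmetic and can be checked directly. The sketch below assumes the coefficient is expressed per one standard deviation of the change in plain English, which is our reading of why exp(0.121) reproduces the reported ratio; the function names and the 5% baseline probability are illustrative only.

```python
import math


def odds_ratio(coef):
    """Odds ratio implied by a logit coefficient for a one-unit increase."""
    return math.exp(coef)


def pct_change_in_odds(coef):
    """Percentage change in the odds for a one-unit increase."""
    return (math.exp(coef) - 1.0) * 100.0


def shifted_probability(base_prob, ratio):
    """Probability after multiplying the baseline odds by `ratio`."""
    odds = base_prob / (1.0 - base_prob) * ratio
    return odds / (1.0 + odds)


coef = 0.121  # column (2) coefficient on the change in plain English
print(round(odds_ratio(coef), 3))          # 1.129
print(round(pct_change_in_odds(coef), 1))  # 12.9

# Illustrative only: effect on a hypothetical 5% baseline issuance probability.
print(shifted_probability(0.05, odds_ratio(coef)))
```

Note that a 12.9% increase in the odds is not a 12.9 percentage point increase in the probability; at a low baseline probability the change in probability is well under one percentage point.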

The last two columns of Table VI restrict the sample to firms issuing seasoned equity at

least once during the sample period. This introduces a look-ahead bias. That is, in 1996, one

could not know which firms would subsequently issue equity over the next decade. Yet, even in

this restricted sample, the year-to-year change in plain English has a positive and economically

significant relationship with equity issuance. In column (3), the odds ratio implies a one standard

deviation increase in the change in plain English raises the odds of subsequently having an SEO

by 13%.

The evidence in this table is consistent with managers attempting to reduce information

asymmetries with outside investors. As the overall writing quality of the 10-K increases, so do

the odds of issuing equity even after controlling for various factors.

C. Plain English and Corporate Governance

Is there a relationship between our plain English measure and corporate governance? Do

firms with strong shareholder rights produce more readable 10-Ks? In Table VII, we report

regression results with our plain English measure as the dependent variable. The independent

variables are the Gompers, Ishii, and Metrick (2003) Corporate Governance Index, log of market

value on the filing date, and dummies for Nasdaq, auditor, Fama-French industry, and calendar

year.

We obtain the Gompers, Ishii, and Metrick (2003) Corporate Governance Index from

http://finance.wharton.upenn.edu/~metrick/data. The three authors use 24 different governance

rules to assign scores ranging from 1 to 24. Data are available only for the years 1995, 1998, 2000,

2002, 2004, and 2006. The higher the governance index, the more dictatorial the firm’s

policies (that is, the weaker its shareholder rights). The lower the index score, the more democratic the

company’s policies are. In the Table VII regressions, the sample is reduced to 9,615 observations

due to data availability of the Governance Index.

The coefficient on the Governance Index variable is negative and statistically significant

in each of the three regressions. This implies that the higher the index (i.e., the more dictatorial the firm),

the lower the plain English measure. Firms with more shareholder rights have significantly better

measures of 10-K readability. In the first regression, the Governance Index is the only

explanatory variable. The coefficient on the variable is -0.020 with a t-statistic of -5.17.

When the control variables are added in the second and third regressions, the coefficient

on the Governance Index remains significant. The last column reports that firms with strong

shareholder rights, small firms and those listed on Nasdaq have better plain English values after

controlling for auditor, industry, and calendar year.

VI. Conclusion

After performing a textual analysis on a sample of 56,079 10-Ks during 1994-2006, we

present evidence that a trend toward less readable 10-K filings was reversed with the SEC’s plain

English rule of October 1998. We find several pieces of evidence that the plain English rule has

been beneficial. We create a plain English variable that is an aggregate statistic which

standardizes word length, word commonality, and a series of writing components specifically

identified by the SEC.

The first finding is that our plain English variable reverses a downward trend and

gradually improves after the enactment of the October 1998 rule. Second, small investors have

much higher participation levels in trading following the 10-K filing for firms with improved

writing quality as measured by our plain English measure. This is consistent with the SEC’s goal

to make disclosure truly accessible to the “average” investor.

Third, greater improvement in plain English relates to increased odds of issuing seasoned

equity to outside investors. After controlling for factors including prior return, listed exchange,

and industry, we find a one standard deviation increase in the change of the plain English

variable increases the odds of issuing equity in the next year by 12.9%. Managers appear to be

lowering the information differences between themselves and outside investors through the

writing of their 10-K documents.

Lastly, we find that companies with more democratic corporate governance policies have

much higher plain English measures than companies with poor governance policies. Firms

whose management is shareholder friendly also create 10-Ks that are more readable.

In sum, our results indicate that the plain English rule produced a measurable impact on

participation of small investors to the extent management followed the SEC’s style guidelines for

writing. In addition to the regulation, managers consider writing style of sufficient importance to

improve their prose in anticipation of seeking additional equity funding. And, as might be

expected, shareholder friendly managers produce 10-Ks that are more user friendly. Importantly,

all of these changes were observed in 10-K filings where the change in style was not directly

mandated by an SEC rule. The changes appear to be a simple artifact of the SEC encouraging

firms to use plain English even where it was not required.

REFERENCES

A Plain English Handbook: How to create clear SEC disclosure documents, 1998, Office of Investor Education and Assistance, U.S. Securities and Exchange Commission, http://www.sec.gov/pdf/handbook.pdf.

Asthana, Sharad, Balsam, Steven, and Sankaraguruswamy, Srinivasan, 2004, Differential response of small versus large investors to 10-K filings on EDGAR, Accounting Review 79, 571-589.

Baayen, R. Harald, 2001, Word frequency distributions, Kluwer Academic Publishers, The Netherlands.

Benston, George, 1973, Required disclosure and the stock market: An evaluation of the Securities Act of 1934, The American Economic Review 63, 132-155.

Bushee, B. and C. Leuz, 2005, Economic consequences of SEC disclosure regulation: Evidence from the OTC bulletin board, Journal of Accounting and Economics 39, 233-264.

Dudley, William, 2006, How should central banks respond to asset bubbles, NBER Conference on Asset Prices and Monetary Policy, May.

Ellenberger, J. S. and Ellen P. Mahar, 1973, Legislative history of the securities exchange act of 1933 and Securities Exchange Act of 1934, F. B. Rothman, New Jersey.

Fama, E. and French, Kenneth, 1997, Industry costs of equity, Journal of Financial Economics 43, 153-193.

Gompers, Paul, Joy Ishii and Andrew Metrick, 2003, Corporate governance and equity prices, Quarterly Journal of Economics 118, 107-155.

Graham, J., Harvey, C., 2001, The theory and practice of corporate finance: Evidence from the field, Journal of Financial Economics 60, 187-243.

Greenstone, M., Oyer, P., and Vissing-Jorgensen, A., 2006, Mandated disclosure, stock returns and the 1964 Securities Acts amendments, Quarterly Journal of Economics 121, 399-460.

Hanley, Kathleen Weiss and Hoberg, Gerard, 2008, Strategic disclosure and the pricing of initial public offerings, Working paper, University of Maryland.

Havrilesky, Thomas, 1988, Monetary policy signaling from the administration to the Federal Reserve, Journal of Money, Credit and Banking 20, 83-101.

Hogan, C.E. and D.C. Jeter, 1999, Industry specialization by auditors, Auditing: A Journal of Practice and Theory 18, 1-17.

Jarrell, Gregg A., 1981, The economic effects of federal regulation of the market for new security issues, Journal of Law and Economics 24, 613-675.

Korajczyk, R., Lucas, D., McDonald, R., 1990, Understanding stock price behavior around the time of equity issues, in R. Glenn Hubbard, Ed.: Asymmetric Information, Corporate Finance, and Investment (University of Chicago Press, Chicago).

Li, Feng, 2007, Annual report readability, current earnings, and earnings persistence, Working paper, University of Michigan.

Loughran, T., Ritter, J., 1995, The new issues puzzle, Journal of Finance 50, 23-51.

Mikheev, Andrei, 1998, Feature lattices for maximum entropy modeling, Proceedings for the 36th Annual Meeting of the Association of Computational Linguistics, 848-854.

Reed, Robert R., 1920, “Blue Sky” laws, Annals of the American Academy of Political and Social Science 88, 177-187.

Riley, Michael D., 1989, Some applications of tree-based modeling to speech and language indexing, Proceedings of the DARPA Speech and Natural Language Workshop, 339-352.

SEC Release #33-7497, http://www.sec.gov/rules/final/33-7497.txt.

SEC Release #34-38164, http://www.sec.gov/rules/proposed/34-38164.txt.

Simon, Carol J., 1989, The effect of the 1933 Securities Act on investor information and the performance of new issues, American Economic Review 79, 295-318.

Smith, Rodney T., 1981, Comments on Jarrell, Journal of Law and Economics 24, 677-686.

Stigler, George J., 1964, Public regulation of the securities markets, Journal of Business 37, 117-142.

Tetlock, Paul C., 2007, Giving content to investor sentiment: The role of media in the stock market, Journal of Finance 62, 1139-1168.

Tetlock, Paul C., Maytal Saar-Tsechansky, and Sofus Macskassy, 2007, More than words: Quantifying language to measure firms’ fundamentals, Journal of Finance, forthcoming.

White, Halbert, 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.

Figure 1. Number of 10-Ks in the sample and median market capitalization in millions of dollars by month.

Figure 2. Annual number of firms with 10-K filings included in the sample, annual number of firms with CRSP (share type code of 10 or 11) data and median market capitalization in dollars for the sample, 1994-2006. Electronic filing was required for all firms by the SEC beginning in May, 1996.

Figure 3. Mean and median of Plain English Measure, 1994-2006. The plain English rule took effect in October, 1998.

Figure 4. Berkshire Hathaway and Plain English. Plain English values for Berkshire Hathaway, average values for all firms in the sample excluding Berkshire Hathaway (BRK), and average values for all firms with SIC=6331 excluding BRK, for the 1994-2006 time interval.

Figure 5. 10-K mean and median number of words per document, 1994-2006.

Figure 6. Proportion of trades within lot categories, 1994-2006.

Figure 7. Plain English measure across Fama and French (1997) 48 industries.

Figure 8. Change in the proportion of 100-lot trades relative to the change in Plain English decile. Changes are based on the mean value of the variables for each firm before and after the plain English initiative. Decile ten contains firms with the largest positive change in the plain English measure from the pre and post period.

Table I Sample Creation

This table reports the impact of various data filters on the sample size. Requiring availability of certain information on the Center for Research in Security Prices (CRSP) and the NYSE Trade and Quote (TAQ) databases largely reduced the sample to 56,079 firms with 10-Ks.

Source/Filter                                         Sample Size   Observations Removed
Edgar 10-K 1994-2006 Complete Sample                  104,621
CRSP Permno Match                                     66,103        38,518
CRSP Market value available                           60,731        5,372
Reported on CRSP as an Ordinary Common Equity Firm    56,690        4,041
TAQ Match                                             56,414        276
Number of words in 10-K > 5,000                       56,326        88
Include only first filing in a given year             56,116        210
At least 180 days between filings                     56,079        37
Final Sample                                          56,079
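
The filter sequence in Table I can be replayed as a simple running subtraction, which confirms that the removals reconcile the starting sample with the final one. The sketch below uses only the counts reported in the table; the variable names are ours.

```python
# Starting sample and the observations removed by each successive filter (Table I).
start = 104_621
filters = [
    ("CRSP Permno Match", 38_518),
    ("CRSP Market value available", 5_372),
    ("Reported on CRSP as an Ordinary Common Equity Firm", 4_041),
    ("TAQ Match", 276),
    ("Number of words in 10-K > 5,000", 88),
    ("Include only first filing in a given year", 210),
    ("At least 180 days between filings", 37),
]

remaining = start
sizes = []
for name, removed in filters:
    remaining -= removed          # observations surviving this filter
    sizes.append(remaining)
    print(f"{name}: {remaining:,}")

total_removed = sum(removed for _, removed in filters)
```

The CRSP match accounts for most of the reduction; the remaining filters together remove fewer than 11,000 observations.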

Table II Calculation of Plain English Measure

The table reports the estimated coefficients for each of nine dependent variables on the independent variable Log(# of words), which is the natural logarithm of the number of words in the 10-K document. Note that all of the dependent variables are also log transforms. The standardized residuals from these regressions are aggregated to create the plain English measure.

     Dependent variable        Log(# of words)      Constant               Observations   R-squared
(1)  Average Word Size         -0.022 (-160.34)     1.929 (1,333.74)       56,079         31.4%
(2)  Common Word               -0.006 (-46.73)      -0.100 (-68.87)        56,079         3.8%
(3)  Legalese                  1.050 (269.86)       -7.410 (-182.34)       56,079         56.5%
(4)  Weak Verb                 1.197 (623.30)       -8.422 (-420.09)       56,079         87.4%
(5)  Negative Phrase           0.665 (169.61)       -5.901 (-144.10)       56,079         33.9%
(6)  Personal Pronoun – We     0.918 (89.42)        -5.210 (-48.62)        56,079         12.5%
(7)  Personal Pronoun – You    0.987 (109.64)       -9.011 (-95.84)        56,079         17.7%
(8)  Respectively              0.481 (121.20)       -1.841 (-44.45)        56,079         20.8%
(9)  Superfluous               1.307 (465.05)       -10.28 (-350.46)       56,079         79.4%
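
The construction summarized in Table II (regress each log component on log word count, standardize the residuals, then aggregate) can be sketched as follows. This is a hedged illustration, not the exact procedure: the aggregation is assumed to be a signed sum, the signs are assumptions chosen so that higher scores mean better conformance, only two of the nine components are shown, and the four documents are hypothetical.

```python
import math
from statistics import mean, pstdev


def ols_residuals(x, y):
    """Residuals from a one-regressor OLS fit of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def standardize(values):
    """Rescale values to mean zero and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def plain_english_scores(word_counts, components, signs):
    """Signed sum of standardized residuals across components (assumed rule)."""
    log_words = [math.log(n) for n in word_counts]
    scores = [0.0] * len(word_counts)
    for name, raw in components.items():
        # Each component is log-transformed, so raw values must be positive.
        resid = ols_residuals(log_words, [math.log(v) for v in raw])
        for i, z in enumerate(standardize(resid)):
            scores[i] += signs[name] * z
    return scores


# Hypothetical inputs for four documents.
word_counts = [20_000, 35_000, 50_000, 80_000]
components = {
    "legalese": [150, 400, 420, 900],
    "pronoun_we": [300, 350, 800, 700],
}
signs = {"legalese": -1, "pronoun_we": +1}  # assumed direction of each component
scores = plain_english_scores(word_counts, components, signs)
print([round(s, 3) for s in scores])
```

Because each set of standardized residuals has mean zero, an aggregate built this way also has mean zero over the estimation sample, which is consistent with the full-period average of 0.00 reported in Table III.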

Table III Sample Summary Statistics

The 10-K sample is 56,079 firm year observations over the 1994-2006 time period. The sample is also divided into sub-periods around the SEC’s plain English rule of October 1998. The Plain English variable is an aggregate statistic that standardizes word length, word commonality, and a series of writing components specifically identified by the SEC. The market values from CRSP are as of the 10-K filing date. Average word length is the average character length across all words used in a given 10-K. Word commonality measures the average frequency of use for words appearing in a given document. Using the NYSE Trade and Quote (TAQ) data over a 21-day period starting on the filing date, we tabulate the proportion of trades within a given trade size category. The SEO dummy variable is set to one if the firm issued equity in the subsequent year to the 10-K filing, else zero. The Nasdaq dummy is set to one if the firm is listed on Nasdaq at the time of the filing, else zero. Five auditor dummies are created. Thus, the Andersen dummy takes the value of one if Arthur Andersen was the auditor (zero otherwise). Other auditor dummy is set to one if a non-big 5 auditor is used, else zero. Auditor Switch Dummy is set equal to one if more than one of the five major auditors appears in the 10-K. The Gompers, Ishii, and Metrick (2003) Governance Index is only available for 9,615 firms.

                                      (1)                (2)               (3)
Time Period                           1994-Sept. 1998    Oct. 1998-2006    1994-2006
10-K Observations                     17,620             38,459            56,079
Plain English Measure                 -0.33              0.15              0.00
Average Market Value (in millions)    $1,687.2           $2,642.6          $2,342.4
Average Word Length                   5.44               5.47              5.46
Number of Words in 10-K               36,414.1           44,669.4          42,075.6
Word Commonality                      0.85               0.85              0.85
Proportion Trades 1-100               15.1%              39.2%             31.7%
Proportion Trades 101-500             33.7%              34.0%             33.9%
Proportion Trades 501-1,000           26.1%              14.3%             18.0%
Proportion Trades 1,001-10,000        23.4%              11.8%             15.4%
Proportion Trades > 10,000            1.7%               0.6%              1.0%
SEO Dummy                             4.3%               5.5%              5.1%
Nasdaq Dummy                          57.0%              61.5%             60.1%
Andersen Dummy                        15.5%              8.0%              10.4%
Deloitte Dummy                        11.5%              12.9%             12.5%
Ernst Dummy                           16.1%              18.8%             18.0%
KPMG Dummy                            14.0%              14.9%             14.6%
PWC Dummy                             19.1%              17.8%             18.2%
Other Auditor Dummy                   7.0%               10.2%             9.2%
Auditor Switch Dummy                  16.9%              17.5%             17.3%
GIM (2003) Governance Index           9.06               9.05              9.05

Table IV Relation between Plain English and Auditor

The table reports the estimated coefficients of a regression with plain English as the dependent variable. The tabulated independent variables are dummy variables for each auditor, with PWC the excluded auditor dummy, the cross-product of each auditor dummy and Post Oct. 1998 dummy, a Post Oct. 1998 dummy that is one after October 1, 1998, otherwise zero, the natural log of market capitalization (Log(size)), and a Nasdaq dummy. Included in the regression but not tabulated are an intercept, industry dummies based on the Fama and French 48 SIC categories, and year dummies. Standard errors are clustered for individual firms. The t-statistics (in parentheses) are calculated using White’s (1980) heteroskedasticity consistent methodology.

Independent Variables                        (1)                (2)
Andersen Dummy                         -0.008  (-0.22)     0.007   (0.19)
Andersen*Post Oct. 1998 Dummy          -0.162  (-4.06)     0.021   (0.53)
Deloitte Dummy                          0.124   (3.20)     0.109   (2.91)
Deloitte*Post Oct. 1998 Dummy          -0.023  (-0.54)    -0.048  (-1.15)
Ernst Dummy                            -0.040  (-1.18)    -0.024  (-0.73)
Ernst*Post Oct. 1998 Dummy              0.105   (2.62)     0.063   (1.61)
KPMG Dummy                              0.069   (2.03)     0.046   (1.39)
KPMG*Post Oct. 1998 Dummy               0.025   (0.61)    -0.001  (-0.02)
Other Auditor Dummy                     0.325   (7.47)     0.280   (6.57)
Other Auditor*Post Oct. 1998 Dummy     -0.058  (-1.23)    -0.154  (-3.37)
Switch Dummy                           -0.038  (-1.25)    -0.033  (-1.10)
Switch*Post Oct. 1998 Dummy             0.076   (2.07)     0.085   (2.36)
Post Oct. 1998 Dummy                    0.443  (15.90)     0.030   (0.61)
Log(size)                               0.005   (1.04)    -0.017  (-3.75)
Nasdaq Dummy                            0.114   (6.63)     0.085   (4.73)

Intercept                                 Yes                Yes
Fama-French Industry Dummies              No                 Yes
Year Dummies                              No                 Yes

Observations                            56,079             56,079
Adjusted R2                               6.2%              12.7%
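
One way to read the interaction terms in Table IV: PWC is the excluded category and the Post Oct. 1998 dummy applies to every auditor, so the post-period gap between an auditor's clients and PWC's clients, holding the other regressors fixed, is the auditor dummy coefficient plus its interaction coefficient. The sketch below does that arithmetic for the column (1) estimates. The dictionary and function names are illustrative and do not come from the paper.

```python
# Column (1) coefficients from Table IV:
# (auditor dummy, auditor * Post Oct. 1998 dummy). PWC is the excluded category.
COLUMN_1 = {
    "Andersen": (-0.008, -0.162),
    "Deloitte": (0.124, -0.023),
    "Ernst": (-0.040, 0.105),
    "KPMG": (0.069, 0.025),
    "Other": (0.325, -0.058),
}


def gap_vs_pwc(auditor, post):
    """Plain English gap relative to PWC clients, other regressors held fixed.

    Before October 1998 only the auditor dummy applies; afterwards the
    interaction coefficient is added to it.
    """
    main, interaction = COLUMN_1[auditor]
    return main + interaction if post else main


for name in COLUMN_1:
    pre = gap_vs_pwc(name, post=False)
    post = gap_vs_pwc(name, post=True)
    print(f"{name:9s} pre={pre:+.3f}  post={post:+.3f}")
```

Under this reading, Andersen clients sit roughly level with PWC clients before the rule and about 0.17 below them afterwards in column (1).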


Table V

Regressions with the Change in the Proportion of 100-lot Trades as the Dependent Variable

The table reports the estimated coefficients of a regression with the change in the proportion of 100-lot trades (100 shares or less) as the dependent variable. All change variables are based on the difference between the mean value of the variable for a given firm before and after the plain English initiative on October 1, 1998. Log(size) is the natural logarithm of the average market capitalization in the post period. Nasdaq in this table represents the proportion of periods the firm was listed on the Nasdaq in the post-Plain English time period. The Fama-French Industry Dummies are based on the most frequent classification occurring in the post-period. The Auditor Proportions variables are the proportion of post-periods that the firm was associated with each of the auditor classifications. The t-statistics (in parentheses) are calculated using White’s (1980) heteroskedasticity consistent methodology.

Independent Variables                            (1)         (2)         (3)
Pre and Post 1998 Change in Plain English      0.028       0.017       0.018
                                             (12.25)      (8.48)      (8.98)
Log(average size)                                          0.019       0.020
                                                         (19.58)     (19.35)
Nasdaq                                                     0.156       0.159
                                                         (41.10)     (39.43)

Intercept                                        Yes         Yes         Yes
Auditor Proportions                              No          No          Yes
Fama-French Industry Dummies                     No          No          Yes

Observations                                   5,030       5,030       5,030
Adjusted R2                                     2.8%       26.4%       29.6%
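
The change variables in Table V are described as a firm's post-period mean minus its pre-period mean. A minimal sketch of that construction follows. The filing dates and values are made up for illustration. Treating October 1, 1998 itself as part of the post period, and dropping firms that lack observations on either side, are assumptions rather than details taken from the paper.

```python
from datetime import date
from statistics import mean

# Effective date of the plain English rule, as given in the table caption.
RULE_DATE = date(1998, 10, 1)


def pre_post_change(observations):
    """Return the post-period mean minus the pre-period mean for one firm.

    observations: list of (filing_date, value) pairs.
    Returns None when the firm has no observation on one side of the
    rule date (an assumption about how such firms are handled).
    """
    pre = [value for day, value in observations if day < RULE_DATE]
    post = [value for day, value in observations if day >= RULE_DATE]
    if not pre or not post:
        return None
    return mean(post) - mean(pre)


# Hypothetical plain English scores for a single firm.
filings = [
    (date(1996, 3, 29), -0.40),
    (date(1997, 3, 28), -0.20),
    (date(2000, 3, 30), 0.10),
    (date(2001, 3, 29), 0.30),
]
change = pre_post_change(filings)
print(f"change in plain English: {change:+.2f}")
```

The same function applies to any firm-level series, such as the proportion of 100-lot trades used as the dependent variable.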


Table VI Logit Regression of the Probability of Issuing Seasoned Equity in the Subsequent Year

The dependent variable, Equity Issuance Dummy, has a value of one if the firm issued equity in the year after the 10-K filing, zero otherwise. Change in Plain English is the difference in the Plain English measure from the prior year’s filing. Prior return is the raw buy-and-hold return for the firm in the year prior to the 10-K filing. Nasdaq Dummy is equal to one if the firm is listed on Nasdaq, zero if the firm is listed on NYSE or Amex. Log(size) is the natural log of the market value at the time of 10-K filing. Included in the regression but not tabulated are an intercept, auditor dummies, industry dummies based on the Fama and French 48 categories, and year dummies. Standard errors are clustered for individual firms. White’s (1980) heteroskedasticity-adjusted z-statistics are in parentheses. The odds ratios (in brackets) are given for a one standard deviation increase in the independent variable. Columns (3) and (4) restrict the sample to include only firms that issued an SEO at least once during our time period.

Independent Variables                        (1)         (2)         (3)         (4)
Year-to-Year Change in Plain English       0.137       0.121       0.122       0.096
                                          (5.41)      (4.81)      (4.37)      (3.40)
                                         [1.147]     [1.129]     [1.130]     [1.100]
Prior Year Return                                      0.233                   0.296
                                                     (10.80)                 (10.31)
                                                     [1.262]                 [1.344]
Log(size)                                              0.202                   0.049
                                                     (14.71)                  (3.01)
                                                     [1.224]                 [1.050]
Nasdaq Dummy                                           0.442                   0.166
                                                      (6.40)                  (2.82)
                                                     [1.556]                 [1.181]

Intercept                                    Yes         Yes         Yes         Yes
Auditor Dummies                              Yes         Yes         Yes         Yes
FF Industry Dummies                          Yes         Yes         Yes         Yes
Year Dummies                                 Yes         Yes         Yes         Yes
Only firms with SEO                          No          No          Yes         Yes

Observations                              46,109      46,109      12,883      12,833
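
The bracketed odds ratios in Table VI match the exponential of the corresponding logit coefficient. That is consistent with regressors scaled so that one unit equals one standard deviation, although this is inferred from the tabulated numbers and the caption does not say so directly. The short check below reproduces the column (2) odds ratios. The names used are illustrative.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in a regressor."""
    return math.exp(coefficient)


# Column (2) of Table VI: variable -> (coefficient, reported odds ratio).
COLUMN_2 = {
    "Year-to-Year Change in Plain English": (0.121, 1.129),
    "Prior Year Return": (0.233, 1.262),
    "Log(size)": (0.202, 1.224),
    "Nasdaq Dummy": (0.442, 1.556),
}

for name, (coef, reported) in COLUMN_2.items():
    implied = odds_ratio(coef)
    print(f"{name:38s} exp({coef:.3f}) = {implied:.3f}  (reported {reported:.3f})")
```

Read this way, the column (2) estimate of 0.121 implies that a one-unit rise in the change variable raises the odds of a seasoned equity issue in the following year by about 13%.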


Table VII Regressions of the Plain English Variable on the Gompers, Ishii, and Metrick (2003) Corporate Governance Index and Other Variables

The dependent variable, Plain English, is an aggregate statistic that standardizes word length, word commonality, and a series of writing components specifically identified by the SEC. Governance Index is from Gompers, Ishii, and Metrick (2003). The Nasdaq dummy variable is equal to one if the firm is listed on Nasdaq, zero if the firm is listed on NYSE or Amex. Log(size) is the natural log of the market value at the time of 10-K filing. Included in the regression but not tabulated are an intercept, auditor dummies, industry dummies based on the Fama and French 48 categories, and year dummies. White’s (1980) heteroskedasticity-adjusted t-statistics are in parentheses. The Gompers, Ishii, and Metrick Governance Index is available only for years 1995, 1998, 2000, 2002, 2004, and 2006.

Independent Variables          (1)         (2)         (3)
Governance Index            -0.020      -0.012      -0.012
                           (-5.17)     (-3.23)     (-3.03)
Log(size)                                0.006      -0.019
                                        (0.88)     (-2.73)
Nasdaq Dummy                             0.210       0.102
                                        (9.20)      (4.19)

Intercept                      Yes         Yes         Yes
Auditor Dummies                No          No          Yes
FF Industry Dummies            No          No          Yes
Year Dummies                   No          No          Yes

Observations                 9,615       9,615       9,615
Adjusted R2                   0.3%        1.1%       11.2%

