Plain English
Tim Loughran
Mendoza College of Business
University of Notre Dame
Notre Dame, IN 46556-5646
574.631.8432 voice
[email protected]

Bill McDonald
Mendoza College of Business
University of Notre Dame
Notre Dame, IN 46556-5646
574.631.5137 voice
[email protected]
March 26, 2008
Abstract: In October of 1998 the SEC implemented a rule requiring firms to use “plain English” in their prospectus filings, with explicit encouragement to incorporate this presentation style in all disclosure documents. We perform a textual analysis on 56,079 10-K electronic filings from 1994 to 2006 and provide empirical evidence that the rule has produced its intended effect. We document significant increases in participation by small investors for firms showing greater improvement in plain English compliance, even after controlling for the market-wide increase in 100-lot trades. In addition, we find that firms issuing seasoned equity exhibit a significant positive change in their 10-K writing style in the year prior to issuance. Firms with high plain English attributes also have corporate governance policies that are significantly more shareholder friendly. Our results suggest that regulators can impact market outcomes simply by using their “bully pulpit.”
We thank Robert Battalio, Margaret Foster, and Jennifer Marietta-Westberg for helpful comments.
The SEC sampled expense information for 1,500 registration statements filed in 1995 and found
that about 48% of the documented preparatory hours were attributable to legal and technical
writing, with the remaining 52% of hours billed to accounting. Interestingly, in spite of this
approximately 50/50 mix in preparing financial disclosures, the vast majority of financial and
accounting research has focused on the accounting numbers while spending significantly less
time looking at the context and framing of these numbers. Only recently has the finance literature
focused on identifying relations between the text and numbers, as technology has improved our
ability to parse disclosure documents and any other related text (see, for example, Hanley and
Hoberg (2007), Li (2007), Tetlock (2007), and Tetlock, Saar-Tsechansky, and Macskassy
(2007)).
In our research, we examine the SEC’s plain English rule of October 1998. While the
central theme of both the 1933 Securities Act and the 1934 Securities Exchange Act was
disclosure, the refined point of the plain English rule is making the disclosures truly accessible to
the “average” investor. Average investors will be less able to assess and less likely to invest in
companies whose financial disclosures are buried in legal jargon and abstruse language.
The plain English rule is precise in mandating that firms’ prospectuses “must use plain
English principles in the organization, language, and design of the front and back cover pages,
the summary, and the risk factors section.” The rule becomes somewhat less precise, however,
when it requires that the writing in these sections of the prospectus “substantially complies with”
a list of plain English principles.
To measure the notion of disclosure style, we create a standardized statistic that
aggregates word length, word commonality, and a series of writing components specifically
identified by the SEC. In the communications surrounding the development and implementation
of the plain English rule, the SEC clearly encourages firms to adopt these principles in all their
filings and communications with shareholders.1 We consider a broad sample of 56,079 10-K
filings during the period 1994-2006.
We find that our measure of plain English does “improve” after the regulation is enacted,
but not in a single leap around the rule date. Throughout this paper we will use the term
“improve” to denote an increase in the plain English measure. We use yearly dummy variables to
control for the gradual increase in plain English usage. Interestingly, both the mean and median
words contained in the 10-Ks sharply increase during the time period. The median number of
words per document increases from about 25,000 in 1996 to over 40,000 by 2006.
We link the text-based measures to trade lot sizes and use the proportion of 100-lot trades
as a measure of “average” investors. Additionally, we control for the impact of industry (as
defined by Fama and French (1997)) and auditor. We find substantial differences in compliance
across industries and only limited differences across the Big 5 auditors.
Using period-to-period differences to control for market structure changes occurring over
this time interval, we find that the plain English rule has its intended effect. That is, there is a
clear positive relation between the improvement in a firm’s plain English measure and changes
in the proportion of 100-lot trades.
We then apply our plain English measure to a logit model predicting seasoned equity
offerings (SEOs). We find that firms showing a higher year-to-year change in plain English
usage are more likely to issue seasoned equity in the following year. Within the subsample of
firms issuing seasoned equity, companies are more likely to do so in a year when they have also
improved the plain English compliance of their 10-K. Thus, in addition to our evidence
supporting the impact of legislation based on the 100-lot trade results, the SEO results indicate
that managers value transparency in the form of more readable documents when issuing
additional shares. Finally, we link the 10-K data to the Gompers, Ishii, and Metrick (2003)
corporate governance index and find that firms with shareholder friendly governance structures
are more likely to have 10-K filings that score high on our plain English measure.
1 See page 4 of the Plain English Handbook and page 68 of SEC Release #34-38164.
Although often ignored by the academic literature, the technical writing of a 10-K has
real importance. Our paper’s contribution is in documenting how subsequent trading by small
investors and issuance of seasoned equity is facilitated by the writing style of the 10-K. Firm
managers have a choice in their writing style. In public documents, managers can elect to make
their 10-K’s incoherent to all but highly trained lawyers by the overuse of legalese, uncommon
words, and superfluous language. Or, firm managers can improve the transparency of their firms
by creating documents that typical retail investors can more easily comprehend. Following
prodding by the SEC, firms have measurably improved the writing style of their 10-Ks.
More generally, we show that regulators can impact market outcomes simply by using the
“bully pulpit” of their office. The plain English rule only requires changes in prospectus filings,
not 10-Ks. The impact of plain English we observe in the 10-K sample is simply an artifact of
encouragement by the SEC to adopt the plain English guidelines across all filings.
Section I of the paper reviews the history of disclosure requirements and the plain
English rule. Section II describes the process of parsing 10-K documents available on the SEC’s
EDGAR web site. Our measure of plain English is defined in Section III. Section IV of the paper
reports the summary statistics. Section V presents the empirical results. Concluding comments
are provided in Section VI.
I. Security Regulation and the Plain English Rule
A. Legislative History and Disclosure Requirements
As the United States underwent rapid industrialization in the late 19th and early 20th
centuries there was a corresponding increase in the demand for capital, and with it an increase in
dealers selling securities representing fraudulent or, at best, ephemeral firms. The Securities Act
of 1933 and the Securities Exchange Act of 1934 evolved from “Blue Sky” laws, which
multiplied across the states beginning in 1910 with Rhode Island and the better-known
example of Kansas in 1911.
These laws became popular across the states as a means of mitigating the problem of
fraudulent offerings. The Supreme Court upheld the constitutionality of these laws in 1917 and
by 1933 every state except Nevada had some form of a “Blue Sky” law. Reed (1920) notes,
however, that the laws were considered by some to be “hopelessly crude and unworkable,” and
many of the “Blue Sky” laws were subsequently repealed or ignored. The need for security
regulation at the national level was generally acknowledged, but the move to federal legislation
was slowed by a divergence of opinion on how the security markets should be controlled.
“Blue Sky” laws created a norm where state officials passed judgment on a firm’s
financial viability and decided whether investors would derive a “fair” return.2 Many argued that
the government should not be responsible for assessing the financial viability of securities and
should focus only on assuring full disclosure (see, for example, Ellenberger and Mahar (1973)).
The 1929 market crash focused attention on national regulation, and after many failed attempts
by Congress to pass legislation, the 1933 Act was finally enacted.
2 See for example Ohio Rev Code Ann. § 1707.09, page 1953, Section 9.
One of the concerns throughout the history of implementing these regulations was the
readability of the materials that were disclosed. While the SEC through interpretive advice and
other means tried to improve the readability of the mandated filings, there did not appear to be
notable change (see SEC Release 33-7497, p. 66). In the late 1990s, Arthur Levitt, as the
Chairman of the SEC, championed the cause of improving disclosure documents:
“Investors need to read and understand disclosure documents to benefit fully from the protections offered by our federal securities laws. Because many investors are neither lawyers, accountants, nor investment bankers, we need to start writing disclosure documents in a language investors can understand: plain English.” (A Plain English Handbook, p. 3.)
B. Is Federal Regulation of Securities Markets Effective?
In the academic literature, there has been enormous debate over whether federal
regulation is actually effective. Stigler (1964) was among the first to test the measurable impact
of disclosure requirements. In a pointed response to the 1963 Cohen report (a report which
deemed security regulation and the SEC a resounding success), Stigler criticizes both its
qualitative and quantitative conclusions.3 Stigler finds no evidence of significant differences in
new issue performance across the regulatory regimes, and provides similar conclusions for a
separate sample of preferred stocks.
Jarrell (1981), like Stigler (1964), focuses on the 1933 Act and finds similar evidence of
lower post-regulation risk. As an alternative explanation for this finding, he suggests that higher
regulatory costs have simply pushed higher risk ventures out of the public markets. Smith (1981)
summarizes Jarrell’s assessment as “According to Jarrell, SEC regulation of new security issues
has been an abysmal failure.”
3 The Cohen report refers to the Report of the Special Study of the Securities Markets of the Securities and Exchange Commission (88th Congress, 1st session, House Document 95, 1963, Washington, D.C.: Government Printing Office). Milton Cohen chaired the committee that produced the report.
Simon (1989) also documents a decrease in abnormal returns following the 1933 Act,
with the effect being larger for unseasoned non-NYSE issues. Since the regulation should have
the greatest effect on securities where private information costs were highest, she argues that her
evidence supports the notion of reductions in investor forecast errors attributable to the better
information environment produced by the 1933 Act. She acknowledges that her evidence does
not preclude the argument that higher risk issuers simply moved to unregulated markets. Further,
many “confounding factors” such as the 1929 crash and Great Depression make it difficult to
convincingly attribute any changes in return characteristics specifically to the advent of federal
regulations in 1933 and 1934.
Benston (1973) finds little evidence of changes in the occurrence of fraud before and
after the 1934 Act. Looking at pre-regulation delistings and comparing stocks that did or did not
disclose financial data, Benston finds, interestingly, that investors were actually better off owning
securities of firms that did not disclose.
More recently, Bushee and Leuz (2005) and Greenstone, Oyer, and Vissing-Jorgensen
(2006) examine the economic impact of mandatory disclosure. Bushee and Leuz (2005) find that
over 76% of sample firms were removed from the Over-The-Counter (OTC) Bulletin Board
rather than comply with SEC mandatory disclosure requirements relating to the 1999 “eligibility
rule.” Their evidence strongly suggests that mandated disclosure has a high cost for smaller
firms.
Greenstone, Oyer, and Vissing-Jorgensen (2006) provide evidence that mandatory
disclosure can dramatically increase stock market values. Focusing on the OTC firms most
affected by the 1964 Securities Acts Amendments, the three authors find an 11.5% to 22.1%
increase in market values over a 23-month period from when the law was proposed to when it
went into force.
In summary, major events bracketing the Securities Acts, like the Great Depression,
make definitive conclusions from prior research hard to draw. By considering incremental
regulation associated with the plain English rule, we can more precisely examine the debate
concerning the effectiveness of government regulation versus the self-regulation of market
discipline.
C. The Plain English Rule
The plain English rule became effective October 1, 1998. The SEC Staff Legal Bulletin
No. 7 provides a summary of the rule and corresponding amendments:
“… companies filing registration statements under the Securities Act of 1933 must:
• write the forepart of these registration statements in plain English;
• write the remaining portions of these registration statements in a clear, understandable manner; and
• design these registration statements to be visually inviting and easy to read.”
Rule 421(d) specifically requires that issuers must:
“… substantially comply with these plain English principles:
• short sentences
• definite, concrete everyday language;
• active voice;
• tabular presentation of complex information;
• no legal jargon; and
• no multiple negatives.”
Additionally, Rule 421(b) was amended, prescribing stylistic approaches that should be avoided
such as “legal and highly technical business terminology” or “legalistic or overly complex
presentations that make the substance of the disclosure difficult to understand.”
Although the plain English rule is mandated only for prospectuses, in documentation
surrounding the rule’s release the SEC clearly encourages firms’ conformance with the rule in all
filings. Arthur Levitt, as then Chairman of the SEC, in his foreword to A Plain English Handbook
concludes with: “I urge you—in long and short documents, in prospectuses and shareholder
reports—to speak to investors in words they can understand.” (p. 4) The SEC in their proposed
rules document states: “Our ultimate goal is to have all disclosure documents written in plain
English …” (release #34-38164, p. 24) and later in the document “We also encourage you to use
these techniques for drafting your other disclosure documents.” Thus, we focus on the sample of
annual 10-K reports, which provides us a large sample of firms over an extended time interval
and allows us to test a broader range of hypotheses.
In the subsequent tests where we focus on the impact of regulation, notice that our focus
on 10-K filings tests an even more subtle relation between markets and regulation. In
discussions of the president’s role in the United States’ monetary policy, economists frequently
refer to the process of coercion through public comments from the platform of governmental
office as the “bully pulpit” (see Havrilesky, 1988). Similarly, the chairman of the Federal
Reserve is assumed to impact policy expectations from his “bully pulpit.” (See “Bush and Fed
Step Toward a Mortgage Rescue,” March 5, 2008, The New York Times, or Dudley (2006).) The
plain English mandate for 10-Ks is not based on an SEC regulation or specific legislation, but is
simply an artifact of the SEC’s use of their “bully pulpit” to encourage broader adoption of a rule
mandated for prospectuses.
II. Data
A. The 10-K Sample
Although electronic filing was not required by the SEC until May 1996, a significant
number of forms are available on EDGAR beginning in 1994.4 Until 2003, a box on the front
page of the 10-K form was to be checked if a “disclosure of delinquent filers pursuant to
Item 405” was not contained in the current filing, nor anticipated to be disclosed in statements
incorporated by reference or amendments. If this box was checked, the form was filed as a 10-
K405. In 2001, almost one-third of the 10-K filings were 10-K405 forms.
According to the SEC, because there was confusion and inconsistency in making this
choice, the 405 provision was eliminated after 2002. Because this choice has no impact on the
focus of our study, we include both 10-K and 10-K405 forms in our sample and make no
distinction in subsequent analysis. We do not include amended documents, 10-K/A or 10-
K405/A, in the sample.
The initial 10-K sample covering 1994-2006 contains 104,621 documents. For our tests
we link the 10-K sample to both the Center for Research in Security Prices (CRSP) and NYSE
Trade and Quote (TAQ) databases. We use the WRDS CIK file to link the SEC’s CIK identifier
to a CRSP PERMNO. We then use CRSP ticker symbols to link to the TAQ database.
4 The earlier work of Asthana, Balsam, and Sankaraguruswamy (2004) reports that small trades (i.e., average investors) are more likely to reflect the information disclosed in a 10-K than large trades after the filings became freely available on EDGAR.
B. Parsing the 10-K documents
The EDGAR web site contains quarterly master files listing a filename for each
document filed during that quarter. We use this master index file to identify the relevant filings,
which are programmatically downloaded and parsed.
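As an illustration of the selection step, the following is a minimal sketch rather than the code used for the paper. It assumes a pipe-delimited index with five fields (CIK, company name, form type, filing date, filename); the field layout, the names `IndexEntry` and `parse_master_index`, and all sample rows are assumptions made for the example.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    filename: str


# Form types kept in the sample; amended filings (10-K/A, 10-K405/A) are excluded.
KEEP_FORMS = {"10-K", "10-K405"}


def parse_master_index(text: str) -> list[IndexEntry]:
    """Return the 10-K and 10-K405 entries from pipe-delimited index text."""
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip header and separator lines: a data row has five fields
        # and a numeric CIK in the first position.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        entry = IndexEntry(*parts)
        if entry.form_type in KEEP_FORMS:
            entries.append(entry)
    return entries


# Made-up rows in the assumed layout (not real filings).
sample_index = """CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------
1000|EXAMPLE CORP|10-K|1997-03-28|edgar/data/1000/example-1.txt
1000|EXAMPLE CORP|10-K/A|1997-05-02|edgar/data/1000/example-2.txt
2000|SAMPLE INC|10-K405|1997-03-31|edgar/data/2000/example-3.txt
2000|SAMPLE INC|8-K|1997-04-15|edgar/data/2000/example-4.txt
"""
filings = parse_master_index(sample_index)
```

Each retained entry carries the filename that a downloader would then fetch; the download itself is omitted here.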
Many of the variables we use to examine the plain English initiative are based on parsing
the 10-K documents into a list of words. Most of the parsing is done using regular expression
search patterns. We first remove from the document all ASCII-encoded graphics, carriage-
returns/line feeds, and punctuation. We remove all HTML coding. (The quantity of HTML code
embedded in the documents increased exponentially over the sample interval.) All remaining
tokens bounded by spaces are then compared to a word list to determine if the token is a word.
To identify a word, we use release 4.0 of the 2of12inf word list, available at
http://wordlist.sourceforge.net/12dicts-readme.html, which contains a word list originally based
on twelve source dictionaries, subsequently expanded to include other sources. The list contains
81,520 words but does not include abbreviations, acronyms, or names. The “inf” version
includes word inflections. Once the document is parsed into a vector of words, we then tabulate
the specific words and phrases identified as good or bad examples based on the SEC
documentation relating to the plain English initiative.
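The parsing steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' parser: the regular expressions are crude stand-ins, punctuation is replaced by a space (so a possessive such as "firm's" splits into two tokens), and `WORD_LIST` is a tiny hypothetical substitute for the 2of12inf list.

```python
import re

# Hypothetical stand-in for the 2of12inf word list (lower-case entries).
WORD_LIST = {"we", "believe", "that", "our", "results", "are", "strong"}

TAG_RE = re.compile(r"<[^>]+>")          # crude HTML tag pattern (assumption)
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # punctuation, digits, other symbols


def parse_words(raw: str, word_list: set[str]) -> list[str]:
    """Strip markup and punctuation, then keep only tokens found in word_list."""
    text = TAG_RE.sub(" ", raw)
    text = text.replace("\r", " ").replace("\n", " ").lower()
    text = NON_LETTER_RE.sub(" ", text)
    # Tokens are bounded by whitespace; anything not in the list is dropped.
    return [tok for tok in text.split() if tok in word_list]


doc = "<p>We believe that our 2005 results\r\nare strong.</p> XYZCorp"
words = parse_words(doc, WORD_LIST)
```

Names, acronyms, and numbers fall out automatically because they are absent from the word list.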
C. Control Variables
In addition to controlling for yearly variation in the data, we control for the impact of
both industry and auditor. For industry classifications we use the 48 industry grouping of Fama
and French (1997). SIC codes were parsed from the 10-K filings and are self-reported by the
firms.
Auditor variables are based on a text search of the 10-K. The documents are searched for
the Big 5 auditing firms: Arthur Andersen, Deloitte & Touche, Ernst & Young, KPMG, and
PricewaterhouseCoopers. From 1998-2002, observations of Price Waterhouse and Coopers &
Lybrand are both classified as PricewaterhouseCoopers. Arthur Andersen drops out of the
sample following its collapse in 2002. If none of these auditor names are found in the 10-K filing
then the auditor is classified as Auditor Other. If multiple names are found in the document, then
the auditor is classified as Switch. In reviewing the sampling results we found that in most cases
where there were multiple auditor names, the firm had changed auditors in the recent past.
Although this is not always the case, we wanted to distinguish this case from Auditor Other.
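The classification rule described above can be sketched as follows. The search patterns and the function name `classify_auditor` are assumptions for illustration; the sketch also maps the predecessor names to PricewaterhouseCoopers in every year, whereas the text applies that mapping to 1998-2002.

```python
import re

# Assumed search patterns, applied to lower-cased raw text (before
# punctuation is stripped, so that "&" is still present).
AUDITOR_PATTERNS = {
    "Arthur Andersen": r"arthur\s+andersen",
    "Deloitte & Touche": r"deloitte\s*&\s*touche",
    "Ernst & Young": r"ernst\s*&\s*young",
    "KPMG": r"\bkpmg\b",
    "PricewaterhouseCoopers": (
        r"pricewaterhousecoopers|price\s+waterhouse|coopers\s*&\s*lybrand"
    ),
}


def classify_auditor(text: str) -> str:
    """Return an auditor name, 'Switch' for multiple names, or 'Auditor Other'."""
    lowered = text.lower()
    found = [name for name, pattern in AUDITOR_PATTERNS.items()
             if re.search(pattern, lowered)]
    if not found:
        return "Auditor Other"
    if len(found) > 1:
        return "Switch"
    return found[0]


single = classify_auditor("Report of KPMG LLP, independent accountants")
switched = classify_auditor("Ernst & Young replaced Arthur Andersen LLP")
```

Because the pattern for Arthur Andersen requires the -sen spelling, a filing that contains only a misspelled name would be labeled Auditor Other under this sketch.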
As an interesting artifact of our auditor classification procedure, our textual search
identified 245 unique instances in which Arthur Andersen was misspelled (i.e., ending in -son
instead of -sen) within a 10-K document where Arthur Andersen was the auditor. As an
example, in the 1994, 1996, 1997, and 2002 letters to the shareholders of the International Paper
Company, the failed accounting firm listed its name as “Arthur Anderson LLP.” Similarly, the
1994, 1997, and 1998 10-Ks filed for ALLTEL made the same mistake. Notably, in both of these
cases where Arthur Andersen was misspelled, the error occurred in the signature line to the
“Report of Independent Public Accountants.”
D. Sample summary
Table I documents the sample formation process. Requiring a CRSP match with data to
calculate market capitalization and including only ordinary common equity firms (CRSP share
type code of 10 or 11) substantially reduces the original sample of 10-Ks. For example,
Asset-Backed Securities had over 10,000 observations in the original 10-K sample, primarily
attributable to filings for security offerings such as Exchange Traded Funds. These funds were
removed from the sample by applying the ordinary common equity filter.
A small number of firms, particularly in the early years, filed 10-Ks that were unusually
short and might, for example, simply incorporate documents by reference. Thus we eliminate 88
firms with 10-Ks containing fewer than 5,000 words. We also include only the first filing in a
given year for a firm and require at least 180 days between filings. After applying these filters
the final sample is 56,079.
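One possible reading of the length and timing filters is sketched below. The text does not say whether the 180-day gap is measured from the previous retained filing or from any earlier filing; this sketch assumes the former. The function name, the tuple layout, and the toy records are hypothetical.

```python
from datetime import date

MIN_WORDS = 5000      # filings with fewer words are dropped
MIN_GAP_DAYS = 180    # minimum spacing between retained filings of one firm


def filter_filings(filings):
    """filings: iterable of (firm_id, filing_date, word_count) tuples."""
    kept = []
    last_kept = {}  # firm_id -> date of that firm's most recent retained filing
    for firm, filed, n_words in sorted(filings, key=lambda f: (f[0], f[1])):
        if n_words < MIN_WORDS:
            continue
        prev = last_kept.get(firm)
        if prev is not None:
            same_year = filed.year == prev.year
            too_close = (filed - prev).days < MIN_GAP_DAYS
            if same_year or too_close:
                continue
        kept.append((firm, filed, n_words))
        last_kept[firm] = filed
    return kept


toy_filings = [
    ("A", date(2000, 3, 30), 30000),
    ("A", date(2000, 6, 15), 28000),   # second filing in the same year
    ("A", date(2001, 3, 29), 31000),
    ("B", date(2000, 12, 20), 4000),   # too short
    ("B", date(2001, 2, 1), 25000),
    ("C", date(2000, 11, 15), 20000),
    ("C", date(2001, 2, 1), 20000),    # only 78 days after the prior filing
]
kept = filter_filings(toy_filings)
```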
Figure 1 presents the distribution of sample size and firm market capitalization by the 10-
K filing month. Approximately 57% of the 10-Ks are filed in the month of March. Most firms
have December 31st fiscal year-ends and will wait to file until the latest possible date. The
substantially larger median market capitalization in February is partly an artifact of a recent SEC
rule requiring large public float firms to file within 60 days of their fiscal year end, with smaller
firms allowed 70 days. (See SEC release #33-8644.) On average, 63%, 80%, and 90% of the 10-Ks
are filed by the end of the first, second, and third quarter, respectively. Because the sample
size and composition are so heterogeneous across months, in subsequent analysis our unit of
analysis for time series will be years.
Figure 2 compares the annual number of firms in our final sample with the annual
number of firms having a share type code of 10 or 11 in the CRSP database. In all of our
analysis, we define year as the calendar year in which the 10-K was filed. So, Google’s
December 31, 2004 10-K, which was filed on March 30, 2005, would be classified as a
2005 observation. Additionally, Figure 2 shows the median market capitalization of firms in the
sample by year. The implementation phase of electronic filing is apparent in the first three years
of the sample. In 1997, the first full year when electronic filing was required, the median market
capitalization of the sample reaches its lowest point of $145 million. Larger firms
dominated the sample in years prior to the requirement of electronic filing.
Figure 2 also shows that both the number of firms on CRSP and firms in our sample
steadily fell after peaking in 1997 as the number of IPOs failed to keep pace with the volume of
mergers and distressed delistings. The difference between the potential universe of firms and
firms included in our sample is mostly due to a failure to match the CIK identifier with the CRSP
PERMNO or failure to match with the TAQ data. In every year since 1997, the number of CRSP
firms missing from our sample has shrunk.
III. A Measure of Plain English
Just as mandating writing style is difficult, so is measuring the degree of compliance.
Without deep parsing, which is itself subject to substantial error, at what point does a document
meet the threshold of being written in active voice? What is “clear and understandable”?
We use specific examples provided in the SEC documentation to create seven
components we include in our aggregate measure of plain English. These provide concrete
examples which we tabulate for each document.
• Legalese: A count of the 14 words and phrases identified in Staff Legal Bulletin No. 7 (http://www.sec.gov/interps/legal/cfslb7a.htm) as inappropriate legal jargon (e.g., “by such forward looking” or “hereinafter so surrendered”).
• Weak Verb: Weak verbs can take many forms. To avoid the ambiguities of deeply parsing the document into word types and then attempting to identify context for weak verbs, we tabulate only the two examples cited on page 19 of the Plain English Handbook, “to have” and “to be.”
• Negative Phrase: A count of 11 negative compound phrases identified on page 27 of the Plain English Handbook (e.g., “does not have” or “not certain”).
• Personal Pronoun-We: A count of the personal pronouns, which the handbook on page 22 indicates will “dramatically” improve the clarity of writing. “We” counts occurrences of “we,” “us,” “our,” and “ours.”
• Personal Pronoun-You: “You” counts occurrences of “you,” “your” and “yours.”
• Respectively: A count of the word “respectively,” which according to page 34 of the handbook is to be avoided.
• Superfluous: A count of the eight phrases identified as superfluous on page 25 of the handbook (e.g., “because of the fact that” or “in order to”).
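The count-based components above can be tabulated along the following lines. This is a hedged sketch: the phrase lists contain only the examples quoted in the text (the full SEC lists are longer), the weak-verb component is left out because its exact matching rule is not spelled out here, and the names `PHRASES` and `count_components` are assumptions.

```python
import re

# Only the examples quoted in the text; the complete SEC lists are longer.
PHRASES = {
    "legalese": ["by such forward looking", "hereinafter so surrendered"],
    "negative_phrase": ["does not have", "not certain"],
    "superfluous": ["because of the fact that", "in order to"],
    "respectively": ["respectively"],
    "pronoun_we": ["we", "us", "our", "ours"],
    "pronoun_you": ["you", "your", "yours"],
}


def count_components(words: list[str]) -> dict[str, int]:
    """Count whole-word occurrences of each component's words and phrases."""
    text = " ".join(words)
    counts = {}
    for name, phrases in PHRASES.items():
        total = 0
        for phrase in phrases:
            # Word boundaries keep "our" from matching inside "your".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            total += len(re.findall(pattern, text))
        counts[name] = total
    return counts


example_words = (
    "in order to grow we believe our company does not have "
    "debt and you and your advisers agree respectively"
).split()
counts = count_components(example_words)
```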
In addition, we use measures of word length and word commonality to capture the notion
of “definite, concrete everyday language.”
• Average Word Length: The average number of characters per word in a given document.
• Word Commonality: Using the entire 10-K sample, we tabulate for each word the number of documents where a given word appears. Word Commonality is the average of this number across all words in a given document divided by the total number of documents. Thus, if Word Commonality=80%, the words in the current document appeared, on average, at least once in 80% of all documents in the total sample.
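The two language measures can be computed as in the sketch below. It assumes that Word Commonality averages over every word token in a document rather than over unique words; the text does not settle this, and the function names and the toy corpus are illustrative only.

```python
from collections import Counter


def average_word_length(words):
    """Mean number of characters per word in one document."""
    return sum(len(w) for w in words) / len(words)


def document_frequencies(corpus):
    """Fraction of documents in which each word appears at least once."""
    doc_counts = Counter()
    for words in corpus:
        doc_counts.update(set(words))  # count each word once per document
    n_docs = len(corpus)
    return {word: count / n_docs for word, count in doc_counts.items()}


def word_commonality(words, doc_freq):
    """Average document frequency across the word tokens of one document."""
    return sum(doc_freq[w] for w in words) / len(words)


# Toy corpus of four already-parsed documents.
corpus = [
    ["we", "sell", "widgets"],
    ["we", "sell", "services"],
    ["we", "indemnify", "hereinafter"],
    ["we", "sell", "sell", "widgets"],
]
doc_freq = document_frequencies(corpus)
commonality = [word_commonality(doc, doc_freq) for doc in corpus]
```

In this toy corpus the third document, which uses two words found nowhere else, receives the lowest commonality.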
We choose not to include sentence length, one of the items specifically mentioned in the
rule. A simple punctuation heuristic can parse sentences with about 90% accuracy (see
Riley (1989)), with more sophisticated approaches achieving even higher levels (see Mikheev
(1998)). These rates are achieved, however, with traditional text, where the incorporation of
tables, lists and numbers is less frequent. Given the content structure of financial filings, we felt
that sentence parsing could create frequent and substantial errors. The variables Average Word
Length and Word Commonality should provide comparable constructs for “everyday language.”
We then need to combine the nine measures described above into an aggregate measure
of plain English. Two characteristics of word measures dictate the approach that we choose.
First, the first seven variables listed above are highly correlated with the total number of
words in a document. Obviously the likely magnitude of the word count variables increases with
the number of words. Average word length and word commonality also are likely to be impacted
by document length.
Second, the distribution of words in a document corresponds to what is labeled as a Large
Number of Rare Events distribution. Hapax legomena is the term used to describe words that
occur only once in a document. These singular occurrences produce what is by far the most
common frequency in word counts, one, which creates a highly skewed distribution. The use of a
log transformation on word counts is common in natural language processing and substantively
mitigates the skewness problem (see Baayen (2001)). Thus, for the components of our plain
English measure based on word counts we use log transformations of one plus the word count.
We also use a log transform of Average Word Length and Word Commonality.
To combine these measures into a single metric, we separately regress the log transform
of each of the nine variables on the log of the number of words occurring in the document. The
regressions for each component of the plain English measure are reported in Table II. Average
Word Length declines as the number of words increases, indicating that larger documents are not
necessarily more complex. The Word Commonality regression has a negative coefficient; however,
the r-square of 3.8% is by far the lowest among the regressions. The remaining variables are
significant and positively related to the number of words with r-squares ranging from 12.5% to
87.4%.
For each firm, we then sum the standardized residuals from the nine regressions, where
the standardized residuals based on the Word Commonality and both Personal Pronoun
regressions are positively signed, (i.e., common words and personal pronouns are positive
attributes), and standardized residuals for the remaining six variables are subtracted from the
total. This combination is then standardized, providing our variable labeled Plain English, where
more positive values represent documents that better conform to the writing standards
promulgated by the SEC.
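The aggregation can be sketched in a few lines. This is an illustration under assumptions, not the estimation code behind Table II: residuals are standardized as simple z-scores using the population standard deviation, each regression has a single regressor plus an intercept, and the dictionary keys and function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Components measured as counts use log(1 + count); the others use log(value).
COUNT_VARS = {"legalese", "weak_verb", "negative_phrase", "pronoun_we",
              "pronoun_you", "respectively", "superfluous"}

# +1: the component counts toward plain English; -1: it counts against it.
SIGNS = {"word_commonality": +1, "pronoun_we": +1, "pronoun_you": +1,
         "legalese": -1, "weak_verb": -1, "negative_phrase": -1,
         "respectively": -1, "superfluous": -1, "avg_word_length": -1}


def standardize(values):
    """Z-scores (assumes the values are not all identical)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ols_residuals(y, x):
    """Residuals from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def plain_english_scores(docs, signs=SIGNS):
    """docs: list of dicts holding 'n_words' and one value per component."""
    log_n = [math.log(d["n_words"]) for d in docs]
    total = [0.0] * len(docs)
    for name, sign in signs.items():
        if name in COUNT_VARS:
            y = [math.log(1 + d[name]) for d in docs]
        else:
            y = [math.log(d[name]) for d in docs]
        z = standardize(ols_residuals(y, log_n))
        total = [t + sign * zi for t, zi in zip(total, z)]
    return standardize(total)


# Toy data with two components only, to keep the example small.
toy_docs = [
    {"n_words": 10000, "pronoun_we": 200, "legalese": 2},
    {"n_words": 10000, "pronoun_we": 20, "legalese": 20},
    {"n_words": 40000, "pronoun_we": 800, "legalese": 8},
    {"n_words": 40000, "pronoun_we": 80, "legalese": 80},
]
scores = plain_english_scores(toy_docs, {"pronoun_we": +1, "legalese": -1})
```

Regressing on document length first means that the two long toy documents are compared with each other rather than with the short ones, so within each length group the document with more pronouns and less legalese scores higher.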
A. Descriptive Results for the Plain English Measure
The mean and median for the plain English measure are reported by year in Figure 3.
With more than 90 percent of the 10-Ks filed in 1998 occurring before the date the rule became
effective in October of that year, the rule’s impact should potentially become apparent in the
1999 averages.
The measure decreases from -0.11 to -0.47 from 1994 to 1998. Recall from Figure 2 that
the market capitalization of the reporting firms also drops substantially over these first five years.
In a regression of the plain English measure on the natural log of market capitalization, the
coefficient is significant and negative.5 Thus as the average market capitalization of the sample
declined in the first four years, we would expect the plain English measure to actually increase
slightly. Instead we see the strong downward trend in plain English in the first five years with a
sharp reversal in the first full year under the new rule. There is a continuing positive trend in the
plain English measure from the time of implementation. This result indicates that even in the 10-K
sample, whose style mandate was only a “bully pulpit” artifact of a rule restricted to
prospectuses, the plain English rule had a substantial impact on the textual presentation.
5 The t-statistic for log(size) in a regression on Plain English with year and industry dummies is -14.27. The coefficient on log(size) is significant and negative if the year and industry dummies are not included in the regression. The simple correlation between Plain English and both size and log(size) also is negative.
B. A Berkshire Hathaway Anecdote
Warren Buffett is considered the poster boy for plain English, authoring the preface to the
SEC’s Plain English Handbook. Buffett’s famous letter preceding his annual reports is the
epitome of folksy and nontechnical writing. Because we can easily benchmark the filings of
Buffett’s Berkshire Hathaway, we briefly consider this anecdote in Figure 4.
Interestingly, although Buffett’s shareholder letters might be targeted toward “Doris and
Bertie,” his two sisters with non-business backgrounds, the time-series of his firm’s performance
on the plain English measure would suggest that until he was approached by the SEC to
champion the plain English cause, his record was at best mixed, with the 1995 and 1996 filings
substantially below the average score for the universe of all firms or for all firms in the same
industry as Berkshire Hathaway (SIC of 6331).
Berkshire’s 10-Ks show a dramatic change in writing style immediately after the plain
English initiative, but have reverted to average in the past few years. By 2006, the plain English
measure for Berkshire is the same as the average for all firms and for firms within its industry.
Although some of Berkshire’s below average performance in the early years might be
rationalized as an artifact of the insurance industry’s legal complexity, the insurance industry
average also plotted in Figure 4 does not support this contention.
IV. Summary Statistics and Control Variable Results
A. Summary Statistics
Summary statistics for the sample variables are reported in Table III. The sample is
divided into two periods: prior to the October 1, 1998 plain English rule (column 1) and after
(column 2). The last column of the table lists the summary statistics for the entire period. The
number of observations, the average plain English measure, the average market values (as of the
10-K filing date), and the average number of words contained in the 10-K have larger values
during the second time period.
Figure 5 presents the mean and median number of words per document over the 1994-
2006 period. The dip that occurs in the first few years reflects the tendency for early adopters of
electronic filing to be bigger firms filing larger documents. Clearly, 10-K filings have become
more verbose, with the median number of words rising from 26,000 in 1997, the first full year of
mandatory electronic filing, to well over 40,000 in the final sample year of 2006. The passing of
Sarbanes-Oxley in 2002 could account for the substantial shift in word count apparent in Figure
5 from the years 2001 to 2004, with stabilization in the subsequent years.
Although the number of words in a 10-K has increased, Table III reports that the average word
length and word commonality are quite similar between the two periods. As an example, the
average word length was 5.44 letters prior to October 1998 compared to 5.47 letters after the
plain English rule.
Table III also reports that the two time periods differ substantially in terms of the
proportion of trades within a trade size category. From the TAQ data we tabulate the proportion
of trades within a given trade size category. We consider the following five categories:
Variable                          Shares traded (s)
Proportion Trades 1-100           s <= 100
Proportion Trades 101-500         100 < s <= 500
Proportion Trades 501-1,000       500 < s <= 1,000
Proportion Trades 1,001-10,000    1,000 < s <= 10,000
Proportion Trades >10,000         s > 10,000
We tabulate this proportion for the period beginning on the document filing date and for
the subsequent 20 days, creating a 21-day sample window. Firms must have at least one day of
trading in the 21-day window to be included in the sample.
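As an illustration only, the tabulation described above might be sketched as follows. The function names and the sample trades are hypothetical, and pooling all trades in the 21-day window before taking proportions (rather than, say, averaging daily proportions) is an assumption of the sketch, not a detail taken from the TAQ processing itself.

```python
from collections import Counter

# Inclusive upper bounds (in shares) of the first four size categories.
# The labels mirror the variable names listed above.
CATEGORIES = [
    ("1-100", 100),
    ("101-500", 500),
    ("501-1,000", 1_000),
    ("1,001-10,000", 10_000),
]
LARGEST = ">10,000"


def size_category(shares):
    """Return the label of the trade-size category for one trade."""
    if shares <= 0:
        raise ValueError("trade size must be positive")
    for label, upper in CATEGORIES:
        if shares <= upper:
            return label
    return LARGEST


def trade_proportions(trade_sizes):
    """Proportion of trades (by count, not by volume) in each category."""
    sizes = list(trade_sizes)
    if not sizes:
        raise ValueError("need at least one trade in the window")
    counts = Counter(size_category(s) for s in sizes)
    labels = [label for label, _ in CATEGORIES] + [LARGEST]
    return {label: counts[label] / len(sizes) for label in labels}


# Hypothetical trades pooled over one firm's 21-day window.
window_trades = [100, 100, 50, 300, 500, 1_000, 2_500, 20_000]
proportions = trade_proportions(window_trades)
```

Counting trades rather than shares matters here: a single 20,000-share block carries the same weight as a 50-share order, which is what makes the 1-100 proportion a plausible proxy for small-investor activity.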
During 1994 to September 1998, 15.1% of all trades were for between 1 and 100 shares.
In the second period (October 1998 to 2006), that proportion jumped to 39.2%. In the earlier
period, 23.4% of all trades were in the 1,001-10,000 share category. In the later period, that
proportion fell by roughly half, to only 11.8%.
After October 1998, across the three major trading venues, almost 75% of all trades were
for 500 shares or less. Figure 6 reports the proportions of trades within each of the five lot
categories by each calendar year of our sample. The proportion of trades for 100 shares or less
actually reaches 60% in 2006 after being only 15% in 1997.
As the NYSE, Amex, and Nasdaq moved toward quoting stock prices in decimals, the
quoted depth reduced in size. Investors received better prices (i.e., closer to the mid-point) while
simultaneously being able to trade fewer shares at the improved price. Starting on January 29,
2001, all NYSE-listed stocks could be priced in decimals. For Nasdaq, all listed firms could be
priced in decimals by April 9, 2001.
Following decimalization and the advent of electronic communication networks (ECNs),
large investors increasingly split up their orders for trade execution. So instead of submitting an
order to buy 10,000 shares of Microsoft, investors might break the order into 20 different
segments of 500 shares. Additionally, when retail investors submitted market orders, the
brokerage house might execute trades at prices that differ by one penny. These factors are the
major drivers in the increase in 100-lot trades observed over the sample interval.
There was slightly more seasoned equity issuance in the later time period. After October
1998, on average, 5.5% of firms had an SEO compared to 4.3% in the earlier time period. A
slightly larger percentage of the sample universe lists on Nasdaq versus the Amex or the NYSE
in the later period.
Lastly, Table III reports the proportion of big 5 auditors and the Gompers, Ishii, and
Metrick (2003) Governance Index. Over the entire time period, less than 10% of the firms used a
non-big 5 auditor. PricewaterhouseCoopers (PWC) audited the highest percentage of firms
(19.1%) prior to the plain English rule while Ernst & Young had the highest share (18.8%) in the
later period. The largest drop in the proportion of firms audited was for Arthur Andersen. As
noted earlier, Andersen went bankrupt in 2002. The Gompers, Ishii, and Metrick (2003)
Governance Index is a measure of shareholder rights for 9,615 firms during our sample period.
The index, as defined, can range from 1 to 24—democratic to dictatorship, respectively, using
the terminology of the authors—and averages approximately 9 in each period.
B. Industry and Auditor Results
Does the plain English measure differ across industry and auditor? Figure 7 documents
the variability of our plain English measure across the Fama and French (1997) 48 industries.
The worst industry in terms of the measure is Smoke. This is most likely due to the litigation
discussion in the tobacco industry during our time period. As an example, Reynolds American
(formerly R. J. Reynolds Tobacco) had, in 2006, one of the most extreme percentages of legal
words in a 10-K (over 1.8% of all words were legal). The best three industries for the plain
English measure are Financials, Banks, and Soda.
To examine the statistical significance of the differences in plain English usage across
auditors, we must control for industry effects, as suggested by our prior industry
results and the tendency for auditors to specialize in certain industries (see, for example, Hogan
and Jeter (1999)). We test auditors’ use of plain English and the change in their style from pre- to
post- regulation by estimating a regression of plain English on the following independent
variables: auditor dummy variables (with PWC the excluded auditor), a dummy variable
indicating when the plain English regulation was in effect, the cross-products of auditor dummies
and regulatory period dummy, the log of market capitalization, a Nasdaq dummy, calendar year
dummies, and dummy variables for the 48 Fama-French industries.
The results of the regression are reported in Table IV. Column (1) of the table reports the
regression without the industry and year dummies, while column (2) includes both. The post
October 1998 dummy and some of its interactions with the auditor dummies go from
significant in the first column to insignificant when the yearly dummies are included.
This simply reflects the ability of the annual dummy variables to better capture the trend in Plain
English shown in Figure 3.
Thus we will focus on the coefficients in column (2) where the industry and year
dummies are included. Pre-regulation, only Deloitte and “Other” show significantly greater plain
English measures relative to PWC. Andersen, Ernst & Young, and KPMG are not significantly different
from PWC in the pre-regulatory period. Only the groups labeled “Other” and “Switch” have
changes in plain English usage in the post regulatory period that are significantly different from
the PWC control. The positive shift in plain English usage documented for the Switch group is
consistent with auditing firms improving compliance in cases where they are not simply updating
prior years’ reports.
To control for the year-to-year changes in Plain English documented in Figure 3, the
large differences in plain English across industries, and the differences across auditor, our
subsequent regressions will include year, Fama-French industry, and auditor dummies.
V. Empirical Results
A. Plain English and the Average Investor
Because of decimalization and an increasing role of ECNs, we expect the proportion of
100-lot trades to increase for all firms over the sample period. Note we use “100-lot” to refer to
trades of 100 shares or less. Thus we focus on the change in plain English relative to the change
in the proportion of 100-lot trades, pre and post regulation.
We first provide descriptive results for firms partitioned into deciles based on the
magnitude of the difference between their average pre and average post plain English value. The
corresponding average change in 100-lot trades for each decile is plotted in Figure 8. The
relation shows a clear trend with firms in the lowest change in plain English decile having a
corresponding change in 100-lot trades of less than 15%. Firms in the highest decile of plain
English change averaged approximately a 22% increase in 100-lot trades.
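The decile partition behind Figure 8 can be illustrated with a short sketch. It assumes equal-count groups formed after sorting on the change in plain English; the function name and the twenty data points are hypothetical and serve only to show the mechanics.

```python
from statistics import mean


def decile_means(firms, n_groups=10):
    """Sort firms by change in plain English, split them into equal-count
    groups, and return the mean change in the 100-lot proportion per group.

    `firms` is a list of (change_in_plain_english, change_in_100_lot) pairs.
    """
    if len(firms) < n_groups:
        raise ValueError("need at least one firm per group")
    ranked = sorted(firms, key=lambda pair: pair[0])
    n = len(ranked)
    means = []
    for g in range(n_groups):
        # Integer arithmetic spreads any remainder across the groups.
        lo, hi = g * n // n_groups, (g + 1) * n // n_groups
        means.append(mean(change for _, change in ranked[lo:hi]))
    return means


# Hypothetical data: 20 firms whose small-trade change rises with the
# change in plain English.
firms = [(0.1 * i - 1.0, 0.14 + 0.004 * i) for i in range(20)]
by_decile = decile_means(firms)
```

With these made-up inputs the first element of `by_decile` is the lowest-change decile and the last is the highest, matching the ordering of the horizontal axis described for Figure 8.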
We test this relation at the level of individual firms in the regressions reported in Table
V. Across firms, we regress the change in the average proportion of 100-lot trades between the pre
and post regulatory periods on the corresponding change in the average value of plain English. A firm
must have at least one observation in each period to be included in the sample.
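A minimal sketch of this firm-level design follows, with the change in the 100-lot proportion as the dependent variable, as in Table V. It is a one-regressor least-squares fit without the control variables or White standard errors; the helper names, the `(year, month)` date encoding, and all firm data are hypothetical.

```python
from statistics import mean

CUTOFF = (1998, 10)  # (year, month) in which the plain English rule took effect


def firm_change(observations, cutoff=CUTOFF):
    """Post-period mean minus pre-period mean for one firm.

    `observations` is a list of ((year, month), value) pairs. Returns None
    when the firm lacks an observation in either period.
    """
    pre = [value for date, value in observations if date < cutoff]
    post = [value for date, value in observations if date >= cutoff]
    if not pre or not post:
        return None
    return mean(post) - mean(pre)


def ols_fit(x, y):
    """Slope and intercept of a one-regressor least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical firms: (plain English series, 100-lot proportion series).
firms = {
    "A": ([((1996, 3), -0.4), ((2001, 3), 0.2)],
          [((1996, 3), 0.10), ((2001, 3), 0.312)]),
    "B": ([((1997, 3), -0.1), ((2002, 3), 0.1)],
          [((1997, 3), 0.15), ((2002, 3), 0.354)]),
    "C": ([((1995, 3), -0.5), ((2003, 3), 0.5)],
          [((1995, 3), 0.12), ((2003, 3), 0.34)]),
    "D": ([((1996, 3), -0.2)], [((1996, 3), 0.11)]),  # no post filing: dropped
}

d_plain, d_lots = [], []
for plain_series, lot_series in firms.values():
    dp, dl = firm_change(plain_series), firm_change(lot_series)
    if dp is not None and dl is not None:
        d_plain.append(dp)
        d_lots.append(dl)

slope, intercept = ols_fit(d_plain, d_lots)
```

Collapsing each firm to a single pair of differences is what reduces the panel to one observation per firm; firm "D" shows how a firm without filings in both periods falls out of the sample.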
Since we have now collapsed the sample on firms, there are only 5,030 observations. For
control variables we also include size, which is the average market capitalization in the post
period, and industry, which is the median industry classification in the post period. The Nasdaq
and auditor dummy variables are now proportions indicating the number of times in the pre or
post period that the corresponding dummy was equal to one.
From this, the coefficient on “Pre and Post 1998 Change in Plain English” reflects the
impact of the change in the average level of plain English on the corresponding change in the
average level of 100-lot trades across the pre and post regulatory period, after accounting for the
control variables. We first consider the change variable by itself in column (1), then also include
Log(average size) and the Nasdaq dummy in column (2) and finally in column (3) we append the
auditor and industry dummies. The signs and significance of the variables remain stable across
the three regressions so we will focus on the results of the full specification in column (3).
The results indicate that larger firms experienced greater increases in the proportion of 100-lot
trades. As the exchanges moved to decimalization in 2001, large firms, some with spreads
hovering around one penny, became more likely to have their quote depth dispersed over a
broader range of incremental prices. Thus large firms are more likely to experience trades that
are sweeping the books and taking out any 100-lot quotes. Because ECNs historically have
played a much bigger role on Nasdaq than on the NYSE, 100-lot trades are more predominant
for Nasdaq-listed firms.
In all cases the results show a positive and significant relation between the change in
plain English and the corresponding change in 100-lot trades. Thus, although firms on average
experienced a substantial increase in 100-lot trades, those with greater improvement in writing
style experienced even greater growth in small trades. The coefficient on the change in Plain
English variable is 0.018 with a t-statistic of 8.98. Since there is little reason to expect large
institutional traders to be breaking up trades based on a firm’s writing style, the results indicate
that small investor participation increases with positive changes in writing style. Increased
participation by “average” investors was the explicit intent of the plain English regulation.
B. Plain English and Seasoned Equity Offerings
If managers view the 10-K as a vehicle to increase the transparency of their firms, one
should see improvements in writing style prior to equity issuance. That is, firms might be
expected to use more common words and better style in an attempt to lower information
asymmetries between managers and outside investors. On the other hand, if managers cared
little about clearly communicating with their shareholders, one would not expect to see any
improvement in the plain English measure.
About 5% of our sample had a seasoned equity offering (SEO) in the year after the 10-K
filing date. We use Thomson Financial Securities Data (also known as Securities Data Co.) to
identify all firms having an SEO during our sample period. To examine the relation between our
plain English measure and equity issuance, Table VI reports logit regressions. The dependent
variable, Equity Issuance Dummy, takes the value of one if the firm issued seasoned equity in the
year following the 10-K filing; otherwise the variable takes a value of zero.
The key control variable will be prior stock performance. Korajczyk, Lucas, and
McDonald (1990) show that stock performance in the prior year is a highly significant
determinant of the likelihood of equity issuance. Loughran and Ritter (1995) report that their
SEO sample had average raw returns of over 72% in the year prior to offering. In CFO survey
results, Graham and Harvey (2001) find that recent stock price performance is the third most
important factor in determining firms’ equity issuance decisions. Since Nasdaq is the trading
venue of choice for younger, more growth-oriented stocks, it is also added as a control
variable.
The independent variables are the year-to-year change in plain English, the raw buy-and-
hold returns in the year before the filing, the log of market value, and Nasdaq, auditor, Fama-
French industry, and calendar year dummies. Because we require the change in the plain English
variable, the sample size drops to 46,109 observations. For example, a firm must have both a 1994 and a
1995 plain English value to be included for year 1995.
In all four logit regressions, heteroskedasticity-adjusted z-statistics are in parentheses
while the odds ratios are in brackets. The first two columns include all firms while columns (3)
and (4) report results when the sample is restricted to only firms that issued equity at least once
in the sample period.
Table VI reports that the coefficient on the year-to-year change in plain English is
positive and statistically significant at conventional levels. In column (2), the coefficient is 0.121
with a z-statistic of 4.81. The odds ratio is 1.129. This odds ratio implies that when the change in
plain English variable increases by one standard deviation the odds of issuing equity in the next
year increase by 12.9%. As expected, the coefficient on the prior year return variable is positive
and highly significant. The higher the prior year’s return, the more likely the firm would issue
equity. Being listed on Nasdaq also substantially increases the likelihood of having an SEO.
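The link between a logit coefficient and its odds ratio can be checked with a few lines of arithmetic. For a change of delta in a regressor, the odds are multiplied by exp(coefficient × delta). The sketch below takes delta = 1 in whatever units the regressor enters the model, which is an assumption on our part; under it, the bracketed odds ratios printed in Table VI for the plain English variable are reproduced to within rounding of the three-decimal coefficients.

```python
import math

# (logit coefficient, bracketed odds ratio) for the year-to-year change in
# plain English, as printed in columns (1) through (4) of Table VI.
PLAIN_ENGLISH_ROWS = [
    (0.137, 1.147),
    (0.121, 1.129),
    (0.122, 1.130),
    (0.096, 1.100),
]


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds for a change of `delta` in x."""
    return math.exp(coefficient * delta)


def percent_change_in_odds(coefficient, delta=1.0):
    """Same quantity expressed as a percentage change in the odds."""
    return 100.0 * (odds_ratio(coefficient, delta) - 1.0)


# Largest gap between exp(coefficient) and the printed odds ratio.
max_gap = max(abs(odds_ratio(b) - printed) for b, printed in PLAIN_ENGLISH_ROWS)

# Column (2): the figure quoted in the text.
column_2_percent = round(percent_change_in_odds(0.121), 1)
```

Note that an odds ratio of 1.129 describes a 12.9% rise in the odds of issuance, not a 12.9 percentage point rise in the probability of issuance.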
The last two columns of Table VI restrict the sample to firms issuing seasoned equity at
least once during the sample period. This introduces a look-ahead bias. That is, in 1996, one
could not know which firms would subsequently issue equity over the next decade. Yet, even in
this restricted sample, the year-to-year change in plain English has a positive and economically
significant relationship with equity issuance. In column (3), the odds ratio implies a one standard
deviation increase in the change in plain English raises the odds of subsequently having an SEO
by 13%.
The evidence in this table is consistent with managers attempting to reduce information
asymmetries with outside investors. As the overall writing quality of the 10-K increases, so do
the odds of issuing equity even after controlling for various factors.
C. Plain English and Corporate Governance
Is there a relationship between our plain English measure and corporate governance? Do
firms with strong shareholder rights produce more readable 10-Ks? In Table VII, we report
regression results with our plain English measure as the dependent variable. The independent
variables are the Gompers, Ishii, and Metrick (2003) Corporate Governance Index, log of market
value on the filing date, and dummies for Nasdaq, auditor, Fama-French industry, and calendar
year.
We obtain the Gompers, Ishii, and Metrick (2003) Corporate Governance Index from
http://finance.wharton.upenn.edu/~metrick/data. The three authors use 24 different governance
rules to assign scores ranging from 1 to 24. Data is only available for the years 1995, 1998, 2000,
2002, 2004, and 2006. The higher the governance index, the more dictatorial the firm’s
policies (that is, the weaker the shareholder rights). The lower the index score, the more democratic the
company’s policies are. In the Table VII regressions, the sample is reduced to 9,615 observations
due to data availability of the Governance Index.
The coefficient on the Governance Index variable is negative and statistically significant
in each of the three regressions. This implies that the higher the index (i.e., the more dictatorial the firm),
the lower the plain English measure. Firms with more shareholder rights have significantly better
measures of 10-K readability. In the first regression, the Governance Index is the only
explanatory variable. The coefficient on the variable is -0.020 with a t-statistic of -5.17.
When the control variables are added in the second and third regressions, the coefficient
on the Governance Index remains significant. The last column reports that firms with strong
shareholder rights, small firms and those listed on Nasdaq have better plain English values after
controlling for auditor, industry, and calendar year.
VI. Conclusion
After performing a textual analysis on a sample of 56,079 10-Ks during 1994-2006, we
present evidence that a trend toward less readable 10-K filings was reversed with the SEC’s plain
English rule of October 1998. We find several pieces of evidence that the plain English rule has
been beneficial. We create a plain English variable that is an aggregate statistic which
standardizes word length, word commonality, and a series of writing components specifically
identified by the SEC.
The first finding is that our plain English variable reverses a downward trend and
gradually improves after the enactment of the October 1998 rule. Second, small investors have
much higher participation levels in trading following the 10-K filing for firms with improved
writing quality as measured by our plain English measure. This is consistent with the SEC’s goal
to make disclosure truly accessible to the “average” investor.
Third, greater improvement in plain English relates to increased odds of issuing seasoned
equity to outside investors. After controlling for factors including prior return, listed exchange,
and industry, we find a one standard deviation increase in the change of the plain English
variable increases the odds of issuing equity in the next year by 12.9%. Managers appear to be
lowering the information differences between themselves and outside investors through the
writing of their 10-K documents.
Lastly, we find that companies with more democratic corporate governance policies have
much higher plain English measures than companies with poor governance policies. Firms
whose management is shareholder friendly also create 10-Ks that are more readable.
In sum, our results indicate that the plain English rule produced a measurable impact on
participation of small investors to the extent management followed the SEC’s style guidelines for
writing. In addition to the regulation, managers consider writing style of sufficient importance to
improve their prose in anticipation of seeking additional equity funding. And, as might be
expected, shareholder friendly managers produce 10-Ks that are more user friendly. Importantly,
all of these changes were observed in 10-K filings where the change in style was not directly
mandated by an SEC rule. The changes appear to be a simple artifact of the SEC encouraging
firms to use plain English even where it was not required.
REFERENCES
A Plain English Handbook: How to create clear SEC disclosure documents, 1998, Office of Investor Education and Assistance, U.S. Securities and Exchange Commission, http://www.sec.gov/pdf/handbook.pdf.
Asthana, Sharad, Balsam, Steven, and Sankaraguruswamy, Srinivasan, 2004, Differential response of small versus large investors to 10-K filings on EDGAR, Accounting Review 79, 571-589.
Baayen, R. Harald, 2001, Word frequency distributions, Kluwer Academic Publishers, The Netherlands.
Benston, George, 1973, Required disclosure and the stock market: An evaluation of the Securities Act of 1934, The American Economic Review 63, 132-155.
Bushee, B. and C. Leuz, 2005, Economic consequences of SEC disclosure regulation: Evidence from the OTC bulletin board, Journal of Accounting and Economics 39, 233-264.
Dudley, William, How should central banks respond to asset bubbles, NBER Conference on Asset Prices and Monetary Policy, May, 2006.
Ellenberger, J. S. and Ellen P. Mahar, 1973, Legislative history of the securities exchange act of 1933 and Securities Exchange Act of 1934, F. B. Rothman, New Jersey.
Fama, E. and French, Kenneth, 1997, Industry costs of equity, Journal of Financial Economics 43, 153-193.
Gompers, Paul, Joy Ishii and Andrew Metrick, 2003, Corporate governance and equity prices, Quarterly Journal of Economics 118, 107-155.
Graham, J., Harvey, C., 2001, The theory and practice of corporate finance: Evidence from the field, Journal of Financial Economics 60, 187-243.
Greenstone, M., Oyer, P., and Vissing-Jorgensen, A., 2006, Mandated disclosure, stock returns and the 1964 Securities Acts amendments, Quarterly Journal of Economics 121, 399-460.
Hanley, Kathleen Weiss and Hoberg, Gerard, 2008, Strategic disclosure and the pricing of initial public offerings, Working paper, University of Maryland.
Havrilesky, Thomas, 1988, Monetary policy signaling from the administration to the Federal Reserve, Journal of Money, Credit and Banking 20, 83-101.
Hogan, C.E. and D.C. Jeter, 1999, Industry specialization by auditors, Auditing: A Journal of Practice and Theory 18, 1-17.
Jarrell, Gregg A., 1981, The economic effects of federal regulation of the market for new security issues, Journal of Law and Economics 24, 613-675.
Korajczyk, R., Lucas, D., McDonald, R., 1990, Understanding stock price behavior around the time of equity issues, in R. Glenn Hubbard, Ed.: Asymmetric Information, Corporate Finance, and Investment (University of Chicago Press, Chicago).
Li, Feng, 2007, Annual report readability, current earnings, and earnings persistence, Working paper, University of Michigan.
Loughran, T., Ritter, J., 1995, The new issues puzzle, Journal of Finance 50, 23-51.
Mikheev, Andrei, 1998, Feature lattices for maximum entropy modeling, Proceedings for the 36th Annual Meeting of the Association of Computational Linguistics, 848-854.
Reed, Robert R., 1920, “Blue Sky” laws, Annals of the American Academy of Political and Social Science 88, 177-187.
Riley, Michael D., 1989, Some applications of tree-based modeling to speech and language indexing, Proceedings of the DARPA Speech and Natural Language Workshop, 339-352.
SEC Release #33-7497, http://www.sec.gov/rules/final/33-7497.txt.
SEC Release #34-38164, http://www.sec.gov/rules/proposed/34-38164.txt.
Simon, Carol J., 1989, The effect of the 1933 Securities Act on investor information and the performance of new issues, American Economic Review 79, 295-318.
Smith, Rodney T., 1981, Comments on Jarrell, Journal of Law and Economics 24, 677-686.
Stigler, George J., 1964, Public regulation of the securities markets, Journal of Business 37, 117-142.
Tetlock, Paul C., 2007, Giving content to investor sentiment: The role of media in the stock market, Journal of Finance 62, 1139-1168.
Tetlock, Paul C., Maytal Saar-Tsechansky, and Sofus Macskassy, 2007, More than words: Quantifying language to measure firms’ fundamentals, Journal of Finance, forthcoming.
White, Halbert, 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
Figure 1. Number of 10-Ks in the sample and median market capitalization in millions of dollars by month.
Figure 2. Annual number of firms with 10-K filings included in the sample, annual number of firms with CRSP (share type code of 10 or 11) data and median market capitalization in dollars for the sample, 1994-2006. Electronic filing was required for all firms by the SEC beginning in May, 1996.
Figure 3. Mean and median of Plain English Measure, 1994-2006. The plain English rule took effect in October, 1998.
Figure 4. Berkshire Hathaway and Plain English. Plain English values for Berkshire Hathaway, average values for all firms in the sample excluding Berkshire Hathaway (BRK), and average values for all firms with SIC=6331 excluding BRK, for the 1994-2006 time interval.
Figure 8. Change in the proportion of 100-lot trades relative to the change in Plain English decile. Changes are based on the mean value of the variables for each firm before and after the plain English initiative. Decile ten contains firms with the largest positive change in the plain English measure from the pre and post period.
Table I
Sample Creation
This table reports the impact of various data filters on the sample size. Requiring availability of certain information on the Center for Research in Security Prices (CRSP) and the NYSE Trade and Quote (TAQ) databases largely reduced the sample to 56,079 firms with 10-Ks.

Source/Filter                                          Sample Size   Observations Removed
Edgar 10-K 1994-2006 Complete Sample                       104,621
CRSP Permno Match                                           66,103                 38,518
CRSP Market value available                                 60,731                  5,372
Reported on CRSP as an Ordinary Common Equity Firm          56,690                  4,041
TAQ Match                                                   56,414                    276
Number of words in 10-K > 5,000                             56,326                     88
Include only first filing in a given year                   56,116                    210
At least 180 days between filings                           56,079                     37
Final Sample                                                56,079
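The filter cascade in Table I is a running subtraction, which the following sketch reproduces from the printed counts. The function name is hypothetical; the starting size and the removed counts are copied from the table.

```python
START = 104_621  # Edgar 10-K 1994-2006 complete sample

# (filter, observations removed), in the order listed in Table I.
FILTERS = [
    ("CRSP Permno Match", 38_518),
    ("CRSP Market value available", 5_372),
    ("Reported on CRSP as an Ordinary Common Equity Firm", 4_041),
    ("TAQ Match", 276),
    ("Number of words in 10-K > 5,000", 88),
    ("Include only first filing in a given year", 210),
    ("At least 180 days between filings", 37),
]


def apply_filters(start, filters):
    """Return (filter, remaining sample size) after each successive filter."""
    remaining = start
    trail = []
    for name, removed in filters:
        remaining -= removed
        trail.append((name, remaining))
    return trail


trail = apply_filters(START, FILTERS)
final_sample = trail[-1][1]
```

Each intermediate value in `trail` can be compared against the Sample Size column, which makes the dominant role of the CRSP match in shrinking the sample easy to see.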
Table II
Calculation of Plain English Measure
The table reports the estimated coefficients for each of nine dependent variables on the independent variable Log(# of words), which is the natural logarithm of the number of words in the 10-K document. Note that all of the dependent variables are also log transforms. The standardized residuals from these regressions are aggregated to create the plain English measure.

Dependent variable               Log(# of words)       Constant              Observations   R-squared
(1) Average Word Size            -0.022 (-160.34)       1.929 (1,333.74)     56,079         31.4%
(2) Common Word                  -0.006 (-46.73)       -0.100 (-68.87)       56,079          3.8%
(3) Legalese                      1.050 (269.86)       -7.410 (-182.34)      56,079         56.5%
(4) Weak Verb                     1.197 (623.30)       -8.422 (-420.09)      56,079         87.4%
(5) Negative Phrase               0.665 (169.61)       -5.901 (-144.10)      56,079         33.9%
(6) Personal Pronoun – We         0.918 (89.42)        -5.210 (-48.62)       56,079         12.5%
(7) Personal Pronoun – You        0.987 (109.64)       -9.011 (-95.84)       56,079         17.7%
(8) Respectively                  0.481 (121.20)       -1.841 (-44.45)       56,079         20.8%
(9) Superfluous                   1.307 (465.05)      -10.28 (-350.46)       56,079         79.4%
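The construction summarized in Table II, regressing each log component on Log(# of words), standardizing the residuals, and aggregating them, can be sketched as below. This is not the authors' code. The aggregation rule (a signed sum), the sign attached to each component, the use of the population standard deviation, and all of the data are assumptions made for illustration.

```python
from statistics import mean, pstdev


def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on x with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - (y_bar + slope * (xi - x_bar)) for xi, yi in zip(x, y)]


def standardize(values):
    """Rescale to mean zero and unit (population) standard deviation."""
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - centre) / spread for v in values]


def plain_english_measure(log_words, components, signs):
    """Signed sum of standardized residuals, one value per document.

    `components` maps a component name to its log values per document;
    `signs` maps the same names to +1 (plainer) or -1 (less plain).
    """
    total = [0.0] * len(log_words)
    for name, values in components.items():
        z = standardize(ols_residuals(log_words, values))
        for i, z_i in enumerate(z):
            total[i] += signs[name] * z_i
    return total


# Hypothetical inputs for four documents and two of the nine components.
log_words = [9.0, 10.0, 11.0, 12.0]
components = {
    "legalese": [1.0, 2.2, 2.8, 4.0],
    "common_word": [-0.10, -0.12, -0.11, -0.13],
}
signs = {"legalese": -1, "common_word": +1}  # assumed sign convention

measure = plain_english_measure(log_words, components, signs)
```

Working with residuals rather than raw counts removes the mechanical effect of document length, so a long filing is not penalized simply for containing more legal terms in absolute number.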
Table III
Sample Summary Statistics
The 10-K sample is 56,079 firm year observations over the 1994-2006 time period. The sample is also divided into sub-periods around the SEC’s plain English rule of October 1998. The Plain English variable is an aggregate statistic that standardizes word length, word commonality, and a series of writing components specifically identified by the SEC. The market values from CRSP are as of the 10-K filing date. Average word length is the average character length across all words used in a given 10-K. Word commonality measures the average frequency of use for words appearing in a given document. Using the NYSE Trade and Quote (TAQ) data over a 21-day period starting on the filing date, we tabulate the proportion of trades within a given trade size category. The SEO dummy variable is set to one if the firm issued equity in the subsequent year to the 10-K filing, else zero. The Nasdaq dummy is set to one if the firm is listed on Nasdaq at the time of the filing, else zero. Five auditor dummies are created. Thus, the Andersen dummy takes the value of one if Arthur Andersen was the auditor (zero otherwise). Other auditor dummy is set to one if a non-big 5 auditor is used, else zero. Auditor Switch Dummy is set equal to one if more than one of the five major auditors appears in the 10-K. The Gompers, Ishii, and Metrick (2003) Governance Index is only available for 9,615 firms.

                                        (1)                (2)               (3)
Time Period                             1994-Sept. 1998    Oct. 1998-2006    1994-2006
10-K Observations                       17,620             38,459            56,079
Plain English Measure                   -0.33              0.15              0.00
Average Market Value (in millions)      $1,687.2           $2,642.6          $2,342.4
Average Word Length                     5.44               5.47              5.46
Number of Words in 10-K                 36,414.1           44,669.4          42,075.6
Word Commonality                        0.85               0.85              0.85
Proportion Trades 1-100                 15.1%              39.2%             31.7%
Proportion Trades 101-500               33.7%              34.0%             33.9%
Proportion Trades 501-1,000             26.1%              14.3%             18.0%
Proportion Trades 1,001-10,000          23.4%              11.8%             15.4%
Proportion Trades > 10,000              1.7%               0.6%              1.0%
SEO Dummy                               4.3%               5.5%              5.1%
Nasdaq Dummy                            57.0%              61.5%             60.1%
Andersen Dummy                          15.5%              8.0%              10.4%
Deloitte Dummy                          11.5%              12.9%             12.5%
Ernst Dummy                             16.1%              18.8%             18.0%
KPMG Dummy                              14.0%              14.9%             14.6%
PWC Dummy                               19.1%              17.8%             18.2%
Other Auditor Dummy                     7.0%               10.2%             9.2%
Auditor Switch Dummy                    16.9%              17.5%             17.3%
GIM (2003) Governance Index             9.06               9.05              9.05
Table IV
Relation between Plain English and Auditor
The table reports the estimated coefficients of a regression with plain English as the dependent variable. The tabulated independent variables are dummy variables for each auditor, with PWC the excluded auditor dummy, the cross-product of each auditor dummy and Post Oct. 1998 dummy, a Post Oct. 1998 dummy that is one after October 1, 1998, otherwise zero, the natural log of market capitalization (Log(size)), and a Nasdaq dummy. Included in the regression but not tabulated are an intercept, industry dummies based on the Fama and French 48 SIC categories, and year dummies. Standard errors are clustered for individual firms. The t-statistics (in parentheses) are calculated using White’s (1980) heteroskedasticity consistent methodology.

Independent Variables                   (1)         (2)
Andersen Dummy                          -0.008      0.007
                                        (-0.22)     (0.19)
Andersen*Post Oct. 1998 Dummy           -0.162      0.021
                                        (-4.06)     (0.53)
Deloitte Dummy                          0.124       0.109
                                        (3.20)      (2.91)
Deloitte*Post Oct. 1998 Dummy           -0.023      -0.048
                                        (-0.54)     (-1.15)
Ernst Dummy                             -0.040      -0.024
                                        (-1.18)     (-0.73)
Ernst*Post Oct. 1998 Dummy              0.105       0.063
                                        (2.62)      (1.61)
KPMG Dummy                              0.069       0.046
                                        (2.03)      (1.39)
KPMG*Post Oct. 1998 Dummy               0.025       -0.001
                                        (0.61)      (-0.02)
Other Auditor Dummy                     0.325       0.280
                                        (7.47)      (6.57)
Other Auditor*Post Oct. 1998 Dummy      -0.058      -0.154
                                        (-1.23)     (-3.37)
Switch Dummy                            -0.038      -0.033
                                        (-1.25)     (-1.10)
Switch*Post Oct. 1998 Dummy             0.076       0.085
                                        (2.07)      (2.36)
Post Oct. 1998 Dummy                    0.443       0.030
                                        (15.90)     (0.61)
Log(size)                               0.005       -0.017
                                        (1.04)      (-3.75)
Nasdaq Dummy                            0.114       0.085
                                        (6.63)      (4.73)
Intercept                               Yes         Yes
Fama-French Industry Dummies            No          Yes
Year Dummies                            No          Yes
Observations                            56,079      56,079
Adjusted R2                             6.2%        12.7%
Table V
Regressions with the Change in the Proportion of 100-lot Trades as the Dependent Variable
The table reports the estimated coefficients of a regression with the change in the proportion 100-lot trades (100 shares or less) as the dependent variable. All change variables are based on the difference between the mean value of the variable for a given firm before and after the plain English initiative on October 1, 1998. Log(size) is the natural logarithm of the average market capitalization in the post period. Nasdaq in this table represents the proportion of periods the firm was listed on the Nasdaq in the post-Plain English time period. The Fama-French Industry Dummies are based on the most frequent classification occurring in the post-period. The Auditor Proportions variables are the portion of post-periods that the firm was associated with each of the auditor classifications. The t-statistics (in parentheses) are calculated using White’s (1980) heteroskedasticity consistent methodology.
Independent Variables                             (1)       (2)       (3)
Pre and Post 1998 Change in Plain English       0.028     0.017     0.018
                                              (12.25)    (8.48)    (8.98)
Log(average size)                                         0.019     0.020
                                                        (19.58)   (19.35)
Nasdaq                                                    0.156     0.159
                                                        (41.10)   (39.43)
Intercept                                         Yes       Yes       Yes
Auditor Proportions                                No        No       Yes
Fama-French Industry Dummies                       No        No       Yes
Observations                                    5,030     5,030     5,030
Adjusted R2                                      2.8%     26.4%     29.6%
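The change variables described in the caption can be sketched as follows: average each firm's observations before and after October 1, 1998, then take the difference. This is a minimal illustration, not the authors' code. It assumes the change is measured as the post-period mean minus the pre-period mean and that firms without observations on both sides of the date are dropped; neither detail is stated in the caption. The function name, record layout, and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

CUTOFF = date(1998, 10, 1)  # plain English initiative date from the caption


def pre_post_change(records):
    """records: iterable of (firm_id, filing_date, value).

    Returns {firm_id: post-period mean minus pre-period mean}.
    """
    pre, post = defaultdict(list), defaultdict(list)
    for firm, filed, value in records:
        # Assumption: a filing dated exactly on the cutoff counts as pre-period.
        (post if filed > CUTOFF else pre)[firm].append(value)
    # Assumption: keep only firms observed in both periods.
    return {f: mean(post[f]) - mean(pre[f]) for f in pre if f in post}


# Hypothetical sample values, for illustration only.
sample = [
    ("A", date(1997, 3, 31), 0.10),
    ("A", date(1998, 3, 31), 0.20),
    ("A", date(2000, 3, 31), 0.45),
    ("B", date(2001, 3, 31), 0.30),  # no pre-period filing, so firm B is dropped
]
print(pre_post_change(sample))
```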
Table VI
Logit Regression of the Probability of Issuing Seasoned Equity in the Subsequent Year
The dependent variable, Equity Issuance Dummy, has a value of one if the firm issued equity in the year after the 10-K filing, zero otherwise. Change in Plain English is the difference in the Plain English measure from the prior year’s filing. Prior return is the raw buy-and-hold return for the firm in the year prior to the 10-K filing. Nasdaq Dummy is equal to one if the firm is listed on Nasdaq, zero if the firm is listed on NYSE or Amex. Log(size) is the natural log of the market value at the time of 10-K filing. Included in the regression but not tabulated are an intercept, auditor dummies, industry dummies based on the Fama and French 48 categories, and year dummies. Standard errors are clustered for individual firms. White’s (1980) heteroskedasticity-adjusted z-statistics are in parentheses. The odds ratios (in brackets) are given for a one standard deviation increase in the independent variable. Columns (3) and (4) restrict the sample to include only firms that issued an SEO at least once during our time period.
Independent Variables                        (1)       (2)       (3)       (4)
Year-to-Year Change in Plain English       0.137     0.121     0.122     0.096
                                          (5.41)    (4.81)    (4.37)    (3.40)
                                         [1.147]   [1.129]   [1.130]   [1.100]
Prior Year Return                                    0.233               0.296
                                                   (10.80)             (10.31)
                                                   [1.262]             [1.344]
Log(size)                                            0.202               0.049
                                                   (14.71)              (3.01)
                                                   [1.224]             [1.050]
Nasdaq Dummy                                         0.442               0.166
                                                    (6.40)              (2.82)
                                                   [1.556]             [1.181]
Intercept                                    Yes       Yes       Yes       Yes
Auditor Dummies                              Yes       Yes       Yes       Yes
FF Industry Dummies                          Yes       Yes       Yes       Yes
Year Dummies                                 Yes       Yes       Yes       Yes
Only firms with SEO                           No        No       Yes       Yes
Observations                              46,109    46,109    12,883    12,833
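For reference, in a logit model the odds ratio for an increase of a given size in a regressor is the exponential of the coefficient times that increase. A minimal sketch follows. The function name is hypothetical, and the default step of 1.0 corresponds to a one standard deviation increase only if the regressor is scaled to unit standard deviation, which is an assumption not stated in the caption. With that step, the first Change in Plain English coefficient of 0.137 gives about 1.147, the bracketed value reported beside it.

```python
import math


def odds_ratio(coefficient, step=1.0):
    """Odds ratio implied by a logit coefficient for an increase of `step`."""
    return math.exp(coefficient * step)


# Coefficients copied from the table above.
for label, beta in [
    ("Year-to-Year Change in Plain English (first estimate)", 0.137),
    ("Prior Year Return (first estimate)", 0.233),
    ("Nasdaq Dummy (first estimate)", 0.442),
]:
    print(f"{label}: {odds_ratio(beta):.3f}")
```

An odds ratio above one means the variable is associated with higher odds of issuing seasoned equity in the following year; a ratio of 1.147 corresponds to odds roughly 15% higher.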
Table VII
Regressions of the Plain English Variable on the Gompers, Ishii, and Metrick (2003) Corporate Governance Index and Other Variables
The dependent variable, Plain English, is an aggregate statistic that standardizes word length, word commonality, and a series of writing components specifically identified by the SEC. Governance Index is from Gompers, Ishii, and Metrick (2003). The Nasdaq dummy variable is equal to one if the firm is listed on Nasdaq, zero if the firm is listed on NYSE or Amex. Log(size) is the natural log of the market value at the time of 10-K filing. Included in the regression but not tabulated are an intercept, auditor dummies, industry dummies based on the Fama and French 48 categories, and year dummies. White’s (1980) heteroskedasticity-adjusted t-statistics are in parentheses. The Gompers, Ishii, and Metrick Governance Index is available only for years 1995, 1998, 2000, 2002, 2004, and 2006.
Independent Variables        (1)       (2)       (3)
Governance Index          -0.020    -0.012    -0.012
                         (-5.17)   (-3.23)   (-3.03)
Log(size)                            0.006    -0.019
                                    (0.88)   (-2.73)
Nasdaq Dummy                         0.210     0.102
                                    (9.20)    (4.19)
Intercept                    Yes       Yes       Yes
Auditor Dummies               No        No       Yes
FF Industry Dummies           No        No       Yes
Year Dummies                  No        No       Yes
Observations               9,615     9,615     9,615
Adjusted R2                 0.3%      1.1%     11.2%