the professionals guide to pagerank optimization
8/9/2019 The Professionals Guide to Pagerank Optimization
The Professionals Guide To PageRank
Optimization
Contents
Introduction
PART I - Understanding the Theory behind PageRank Optimization
    What is PageRank?
        Who Invented PageRank?
        What is the purpose of PageRank?
    How does PageRank affect rankings?
    How is PageRank measured?
        Toolbar PageRank vs. Real PageRank
    Assumptions about PageRank
        PageRank is Still a Measurement of Importance
        PageRank Still Doesn't Measure Relevance
        PageRank is Still a Relative Measurement
        Pages Don't Vote for Themselves
        Each Page Can Only Vote for another Page Once
        The Damping Factor is Constant
    Calculating PageRank
        Introducing the PageRank Function
        Accumulating PageRank vs. Distributing PageRank
        Iterative Calculations and Convergence of PageRank
    PageRank Behavior
        Maximum PageRank per System
        Site PageRank vs. Page PageRank
        Add Links to Important Pages
        Subtract Links from Unimportant Pages
        Add Content
        Subtract Content
    Ideal PageRank Distribution
        Natural Distribution from a Hierarchical Structure
        Site Architecture Definitions
        Depth of Content on a Website
PART II - Tools and Techniques for Sculpting PageRank
    Introduction to PageRank Sculpting
        Natural vs. Unnatural Linking
    Link Level PageRank Controls
        rel="nofollow"
        JavaScript
        Flash
    Summary Chart of Link Level PageRank Controls
    Page Level PageRank Controls
        Robots Meta Tag
        robots.txt
        301 Permanent Redirects
        302 Temporary Redirects
    Summary Chart of Page Level PageRank Controls
    Conclusion
    Resources
Introduction
The purpose of this guide is to provide experienced SEO consultants and web
developers with a high-level understanding of PageRank optimization. The guide is
broken down into two parts. Part 1 covers the theory behind PageRank and provides
simple illustrations that will help you understand how PageRank "flows" throughout
a website. Part 2 provides practical strategies and solutions for optimizing the
PageRank on your site through linking and page controls, and it also offers tips on
how to get the most out of your inbound links.
PART I - Understanding the Theory behind
PageRank Optimization
What is PageRank?
Most Web developers and SEOs think they know everything they need to about
PageRank, but in reality, very few people have a solid understanding of how it is
calculated or how it affects rankings. This section will discuss all of those important
details.
Who Invented PageRank?
PageRank was invented by (and derives its name from) Larry Page. While at
Stanford University in the late 1990s, Page and his fellow Google cofounder, Sergey
Brin, wanted to create a search engine that could outperform the existing search
engines at that time. The other search engines relied heavily on text analysis to
calculate relevance, but Page and Brin were confident that Google could return
higher-quality search results by calculating relevance and importance. PageRank
made it possible for Google to calculate the relative importance of webpages.
What is the purpose of PageRank?
The purpose of PageRank is to help Google return search results that human visitors
consider important. It does this by assigning a numerical value to every
webpage/URL it finds. This value is often referred to as "real PageRank" or "internal
PageRank," but this guide will refer to it simply as "PageRank." Every webpage
starts with a small amount of PageRank, which increases as other pages link to it. It
is assumed that each page on the internet is controlled by a human, and that
humans link to important pages. Therefore, pages with the highest PageRank
should represent what humans consider to be the most important.
How does PageRank affect rankings?
One of the most common misconceptions about PageRank concerns how it influences
rankings. Many SEOs will have various opinions about the importance of PageRank,
but the best answer would be one from Google itself:
The heart of our software is PageRank, a system for ranking web pages
developed by our founders Larry Page and Sergey Brin at Stanford University.
And while we have dozens of engineers working to improve every aspect of
Google on a daily basis, PageRank continues to play a central role in many of
our web search tools. (Corporate Information: Technology Overview, Google)
Here, Google officially states that PageRank is important, but neglects to explain
how PageRank affects rankings. According to Google:
Traditional search engines rely heavily on how often a word appears on a
web page. We use more than 200 signals, including our patented PageRank
algorithm, to examine the entire link structure of the web and determine
which pages are most important. We then conduct hypertext-matching
analysis to determine which pages are relevant to the specific search being
conducted. By combining overall importance and query-specific relevance,
we're able to put the most relevant and reliable results first. (Corporate
Information: Technology Overview, Google)
Webpages are ranked using 200+ signals, but the majority of these signals fall
under two main categories:
1. Relevance - when a user types in a query, Google finds all the documents in
its index that contain the words from that query.
2. Importance - after Google has fetched the relevant documents, it sorts them
by importance.
In other words, we can increase the importance of a page by increasing its
PageRank, but if the page is NOT relevant to a user's query, it still won't rank for
that query. This is a key point to remember when sculpting PageRank. The
techniques described in this guide can increase importance--but not relevance.
http://www.google.com/corporate/tech.html
How is PageRank measured?
Toolbar PageRank vs. Real PageRank
Throughout this guide, we will use the term "PageRank" to mean real PageRank:
the PageRank value that starts at 0.15 and can go into the millions. However, the
most commonly used PageRank scale is the "toolbar PageRank." The toolbar
PageRank value is shown on the Google Toolbar, which is an add-on feature of most
modern Web browsers. Here is an example of the Google toolbar in Firefox 2.0.
The toolbar PageRank scale goes from 0 to 10. Google takes all the real PageRank
values of every page in its index and separates them into these eleven different
ranges. It is important to note that the toolbar scale is not linear; it is logarithmic.
So a page that has a toolbar PageRank of 6 does NOT actually have twice as much
real PageRank as a page with a toolbar PageRank of 3. In reality, it would have
many more times as much real PageRank. Here is a graph to help you visualize this
concept.
Only Google knows what the actual base value would be (or if it is indeed a
logarithmic scale), but to keep things simple, we are using a base value of 2 in this
example. If the base value really were 2, then these would be the ranges of real
PageRank that each toolbar PageRank value represents:
Toolbar PageRank    Real PageRank
0                   1 - 2
1                   2 - 4
2                   4 - 8
3                   8 - 16
4                   16 - 32
5                   32 - 64
6                   64 - 128
7                   128 - 256
8                   256 - 512
9                   512 - 1,024
10                  1,024 +
Some "PageRank enthusiasts" estimate that the actual base would be around 5 or
6. Here is another example that uses 5.5 as a base:
Toolbar PageRank    Real PageRank
0                   1 - 6
1                   6 - 30
2                   30 - 166
3                   166 - 915
4                   915 - 5,033
5                   5,033 - 27,681
6                   27,681 - 152,244
7                   152,244 - 837,339
8                   837,339 - 4,605,367
9                   4,605,367 - 25,329,516
10                  25,329,516 +
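The two tables above can be reproduced with a few lines of Python. This is only a sketch of the logarithmic bucketing described in this section: the function name toolbar_pagerank and its base parameter are hypothetical, and the real base (or whether Google uses a logarithmic scale at all) is unknown.

```python
def toolbar_pagerank(real_pr, base=2.0):
    """Return the 0-10 toolbar value for a real PageRank value.

    Assumes toolbar value k covers real PageRank from base**k up to
    base**(k + 1), as in the tables above; the true base is unknown.
    """
    value = 0
    upper = base  # upper bound of the range for toolbar value 0
    while real_pr >= upper and value < 10:
        value += 1
        upper *= base
    return value


# The same real PageRank lands in different toolbar buckets
# depending on the assumed base.
print(toolbar_pagerank(100))            # assumed base of 2
print(toolbar_pagerank(100, base=5.5))  # assumed base of 5.5
```

Under the base-2 assumption, the lowest toolbar-6 value (64) is eight times the lowest toolbar-3 value (8), not twice as much; with a base of 5.5 the gap is far larger.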
Regardless of what the base is, here are a few things you should know about the
toolbar PageRank and real PageRank values:
Each toolbar PageRank value corresponds to a wide range of real
PageRank values. This means that even if two pages have the same toolbar
PageRank value, their real PageRank values can vary greatly. Conversely, if
one page has a toolbar PageRank of 5 and another page has a 6, then these
two pages might have real PageRank values that are actually very close.
Every toolbar PageRank value is exponentially greater than the one
before it, and is thus more difficult to achieve. After a page is created,
it can easily increase its toolbar PageRank from a 0 to a 1. However, an
increase from a 6 to a 7 would be much more difficult and require
significantly more inbound links.
The toolbar PageRank value is not frequently updated. We speculate,
with a high degree of certainty, that the real PageRank values are what
Google actually uses in their algorithm, and these values are constantly
changing. However, the toolbar PageRank values are only updated every 3 -
4 months, and their value is based on the current real PageRank value at that
time. In other words, if a page is created and it immediately acquires a large
number of inbound links, the toolbar PageRank value isn't going to reflect
that until the next time Google updates their toolbar PageRank numbers, but
the page will still receive the ranking benefits from those links, whether the
toolbar shows it yet or not.
Assumptions about PageRank
One of the challenges of writing a detailed PageRank sculpting guide is overcoming
the limited availability of recent or reliable information on PageRank. The original
paper about PageRank was written before Google was a highly profitable company,
so we can assume the information it contains was at least true at that time, but
Google has undergone many changes and modifications over the last decade.
Therefore, in order to make this guide as complete and as accurate as possible, we
need to establish a set of assumptions from which to build.
PageRank is Still a Measurement of Importance
Even though Google has publicly admitted to making changes to the PageRank
algorithm over the years, one thing we will assume is that it still serves the same
basic purpose. In other words, Google may have tweaked certain details about how
PageRank is calculated, but we can be almost certain that PageRank still uses
the linking structure of the web to measure the relative importance of every page.
PageRank Still Doesn't Measure Relevance
As we explained in the previous section, PageRank does not measure relevance; it
measures importance. Therefore, the on-page content of a given webpage does not
affect its PageRank value. In other words, you can increase the PageRank of
your homepage by getting more people to link to it, but not by adding
more keywords to the page.
PageRank is Still a Relative Measurement
The PageRank function doesn't just measure importance; it measures relative
importance. The function literally "ranks" every page on the Web, by placing them
in a specific order of most-important to least-important. Therefore, unless every
page on the internet links to every other page on the internet*, they can't all be
equally important. Some pages are inevitably going to be more important than
others. PageRank may have changed since it was originally invented, but it can't
stop performing its primary purpose of ranking each page, relative to the rest. It is
for this reason that PageRank sculpting is possible. By telling Google which pages
are NOT important, we can make the rest appear more important by comparison.
*SEOmoz does not recommend placing several billion links on each of your pages.
Pages Don't Vote for Themselves
One way that Google describes PageRank is by comparing it to a system of votes.
One of the original characteristics of PageRank that made it an appealing option for
ranking webpages was that it relies on the democratic nature of the web. In other
words, it is resistant to manipulation because it requires other pages to "vote" for a
page by linking to it. In this analogy, it makes sense that Google would ignore
instances where a page votes for itself by linking to its own URL. Even if Google
does count links from a page to itself (i.e. even if our assumption is false), this
would not have a significant effect on the outcome of PageRank optimization
techniques described in this guide. For example, consider a page that links to 50
other pages. Each link would receive 1/50th of the distributed PageRank. If the
linking page links to itself (and we count that link as part of the PageRank
calculation), this would only reduce the other pages' share of PageRank from 1/50
to 1/51 - or a difference of 1/2550 of the linking page's PageRank. In any case, we
will assume Google does not count them.
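The arithmetic in the example above can be checked with exact fractions. This is a small sketch; the helper name share_per_link is hypothetical.

```python
from fractions import Fraction


def share_per_link(outbound_links, count_self_link=False):
    """Fraction of a page's distributed PageRank that each link receives."""
    total_links = outbound_links + (1 if count_self_link else 0)
    return Fraction(1, total_links)


without_self = share_per_link(50)                     # 1/50
with_self = share_per_link(50, count_self_link=True)  # 1/51
difference = without_self - with_self

print(without_self, with_self, difference)
```

The difference per link is 1/2550 of the linking page's distributed PageRank, which is why the assumption has little effect either way.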
Each Page Can Only Vote for another Page Once
In this guide, we are going to assume that Google's graph of linked pages can be
represented by a square graph, in which all pages are assigned to one column and
one row. We are also going to assume that each square of the graph can only
contain one of two values: Y or N. In other words, each square is an intersection of a
row (linking page) and a column (linked page), and the square answers the
question: "Does this page link to this page?" See the following illustration.
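As a rough sketch of this assumption, the Y/N grid can be modeled as a table of booleans. The function link_matrix and the three sample pages are hypothetical; repeated links collapse into a single Y, and self-links are skipped in line with the previous assumption.

```python
def link_matrix(pages, links):
    """Build a square Y/N grid: rows are linking pages, columns are linked pages."""
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[False] * len(pages) for _ in pages]
    for source, target in links:
        if source == target:
            continue  # assumption: a page does not vote for itself
        matrix[index[source]][index[target]] = True  # a second link changes nothing
    return matrix


pages = ["A", "B", "C"]
links = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "C"), ("C", "A")]
matrix = link_matrix(pages, links)

for page, row in zip(pages, matrix):
    print(page, " ".join("Y" if cell else "N" for cell in row))
```

Page A links to page B twice in the sample data, but the grid records only one vote.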
The Damping Factor is Constant
A common way to describe PageRank is the "random surfer" analogy. Imagine
someone surfing the internet by clicking a link at random on every page they visit.
PageRank represents the probability that the surfer will be on a certain page at any
given point in time. In other words:
More inbound links -> Higher PageRank -> Higher chance that the random
surfer lands on the page
Part of the PageRank function is the damping factor, d. In our random surfer
analogy, the damping factor represents the probability that the surfer will keep
clicking random links on the pages they visit, while (1 - d) represents the probability
that the surfer will "get bored" with the current page and type in a new URL (that
they somehow know exists), instead of clicking a link. By setting the damping factor
as a constant, we are also setting (1 - d) as a constant, which means we're saying
that when a surfer gets bored, the URL they type in is totally random and unbiased.
Even though there have been papers written over the last several years that
propose methods for weighting certain URLs, depending on the surfer's history and
preferences (e.g. personalized search), we are still going to proceed with this guide
under the assumption that URLs are not biased in this way. The illustrations and
examples in this guide are going to use nonspecific webpages that are assumed to
be equal in quality, trust, content, age, etc.
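The random surfer analogy can be simulated directly. The sketch below is illustrative only: the three-page graph, the function name simulate_random_surfer, and the rule that a surfer on a page without links always jumps to a random page are assumptions, not anything Google has published.

```python
import random


def simulate_random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate how often a random surfer is found on each page."""
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            # With probability d, keep clicking a random link on this page.
            current = rng.choice(links[current])
        else:
            # With probability (1 - d), get bored and type in an unbiased random URL.
            current = rng.choice(pages)
    return {page: count / steps for page, count in visits.items()}


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
frequencies = simulate_random_surfer(links)
print(frequencies)
```

Page B has a single inbound link that it shares with C, so the surfer spends the least time there.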
Calculating PageRank
The calculation of PageRank can quickly become a very complex topic. In the paper,
The PageRank Citation Ranking: Bringing Order to the Web, Lawrence Page, Sergey
Brin, Rajeev Motwani, and Terry Winograd wrote 17 pages about it, but hopefully we
can cover the basics in fewer words.
Introducing the PageRank Function
According to The Anatomy of a Large-Scale Hypertextual Web Search Engine, the
original paper written by Sergey Brin and Larry Page, the formula for determining
PageRank is given as follows:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The
parameter d is a damping factor which can be set between 0 and 1. We
usually set d to 0.85. There are more details about d in the next section. Also
C(A) is defined as the number of links going out of page A. The PageRank of a
page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
For most of us, that description probably isn't too helpful, and the function itself
looks intimidating. However, it's actually much simpler than it looks. Let's break it
down into smaller pieces, and analyze each piece, one at a time. You can also use
the illustrations to help you visualize how each piece of the function fits together.
http://dbpubs.stanford.edu:8090/pub/1999-66
http://infolab.stanford.edu/~backrub/google.html
PR(A) - This basically means "the PageRank of page A."
(1-d) - This is the small amount of PageRank that every webpage starts with.
The variable d represents a value determined by Google and is often referred
to as the damping factor. The paper suggests a value of .85, so that's what
we'll work with in this guide. (1-d) would then be equal to .15. In other words,
every new webpage starts with an initial PageRank value of .15.
d( ) - This is the second appearance of d. First we subtracted it from 1 to get
the initial PageRank value for page A. Now we are multiplying it by
everything between the ( ). This multiplication is what causes the "damping."
Assuming that d is .85, this means that if we add up all the PageRank coming
to page A through inbound links, page A will only get 85 percent of it. Without
this damping effect, PageRank calculations would create an infinite loop of
increasing values.
PR(T1)/C(T1) - This whole thing can be understood to mean "the PageRank
coming from webpage T1." The numerator, PR(T1), represents T1's PageRank,
and the denominator, C(T1), represents the number of crawlable links on T1.
So there are two distinct ways to increase the amount of PageRank
that page A receives from page T1: increase the PageRank of T1, or
decrease the links on T1. This is the foundation of PageRank sculpting! For
a more in-depth look at how this piece of the PageRank function works, be
sure to check out the illustration that follows.
+ ... + PR(Tn)/C(Tn) - This fancy-looking string of characters basically means
"plus the PageRank coming from all the other pages that link to you."
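The PageRank function can be written as a one-line Python function. This is a minimal sketch of the published formula with made-up input values; the name pagerank_of_page and the sample numbers are illustrative only.

```python
def pagerank_of_page(inbound, d=0.85):
    """PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    inbound is a list of (PR(T), C(T)) pairs, one for each page T linking to A.
    """
    return (1 - d) + d * sum(pr / links for pr, links in inbound)


# Hypothetical case: T1 has a PageRank of 4 and two links, T2 has 1 and one link.
before = pagerank_of_page([(4.0, 2), (1.0, 1)])

# Sculpting: T1 drops its other link, so page A receives all of T1's vote.
after = pagerank_of_page([(4.0, 1), (1.0, 1)])

print(round(before, 4), round(after, 4))
```

Nothing about T1's own PageRank changed between the two calls; only the number of links sharing it did.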
Accumulating PageRank vs. Distributing PageRank
One of the key features of PageRank is that a page distributes it over its outbound
links and accumulates it from inbound links. However, when a page "votes" for
other pages through its outbound links, it isn't giving away its own PageRank. In
other words, if a lot of pages link to page A, then page A is considered to be
important and the pages page A links to are assumed to be important by
association. But when page A votes for another page, page A isn't giving away its
importance--it's simply passing it on. A page does not directly lose PageRank by
linking out.
Iterative Calculations and Convergence of PageRank
Now that we understand the basic elements of the PageRank function, the next
question is: how do we use this function to calculate actual PageRank numbers?
After all, if the PageRank of every page depends on the PageRank of pages that link
to it, wouldn't this create an infinite loop of calculations? In a way, yes, the
calculation of PageRank is never-ending, but remember that the function has a
damping factor built in. Again, the damping factor makes sure that every time a
page distributes PageRank to another page, the other page only receives 85% of it.
In a simple example of two pages linking to each other, they will pass PageRank
back and forth, indefinitely. Each time Google makes a calculation of the new
PageRank totals, it is called an iteration. As you can see in the following illustration,
each iteration brings the PageRank totals closer and closer to a specific value or
limit. When the totals for each page stabilize (i.e. they do not change significantly
after each additional iteration), they are said to have converged. There are certain
strategies for choosing starting PageRank values that converge quickly (i.e. after
the fewest number of iterations), but for practical purposes, we can assume that
Google's stored values of PageRank have all converged and are stable.
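The two-page example can be iterated in a few lines of Python. This is a sketch under this guide's assumptions (d = 0.85, every page starting at 0.15, all pages updated at once in each iteration); the function name iterate_pagerank is hypothetical, and Google's actual procedure is not public.

```python
def iterate_pagerank(links, d=0.85, iterations=100):
    """Return the PageRank values after each iteration, starting at (1 - d)."""
    pr = {page: 1 - d for page in links}
    history = [dict(pr)]
    for _ in range(iterations):
        # Each page's new value is computed from the previous iteration's values.
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
        history.append(dict(pr))
    return history


# Two pages that link only to each other.
history = iterate_pagerank({"A": ["B"], "B": ["A"]})

for step in (0, 1, 2, 5, 10, 25, 50, 100):
    print(step, round(history[step]["A"], 4))
```

Each iteration moves both pages closer to a PageRank of 1, and the size of each step shrinks by a factor of 0.85, which is why the values converge instead of growing forever.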
PageRank Behavior
Maximum PageRank per System
In the example above, the first three systems are well-linked, meaning every page
has at least one inbound link and one outbound link. This allows the PageRank to
"flow" through all the pages and achieve the maximum system PageRank. (If you
are puzzled to see a PageRank 11, remember that we are dealing with real
PageRank values here--not Toolbar PageRank values.)
The fourth system contains a dangling link, which means it links to a page that
doesn't link out to anything (kind of like a dead-end). A dangling link prevents the
system from achieving its maximum total PageRank because it stops the flow of
PageRank. The gray page in the above example takes PageRank from the green
page, but it doesn't distribute anything back into the system.
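The effect of a dangling link on total system PageRank can be reproduced numerically. The two three-page systems below are hypothetical stand-ins for the lost illustration, and the pagerank function is a sketch of the iterative calculation described earlier.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a dict of page -> outbound links."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


# Well-linked: every page has at least one inbound and one outbound link.
well_linked = pagerank({"A": ["B"], "B": ["C"], "C": ["A"]})

# Dangling: C receives a link from A but links out to nothing.
dangling = pagerank({"A": ["B", "C"], "B": ["A"], "C": []})

total_well = sum(well_linked.values())
total_dangling = sum(dangling.values())
print(round(total_well, 3), round(total_dangling, 3))
```

The well-linked system reaches a total equal to its number of pages, while the system with the dead-end page falls well short of that maximum.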
Site PageRank vs. Page PageRank
The examples above focus on total system PageRank. We assume that a system of
pages is a single collection of pages that start with an initial PageRank of 0.15 on
each page. We also assume that the system includes all existing pages, and
therefore, all possible PageRank is accounted for. In other words, a system of pages
can be thought of as a website that doesn't link out and doesn't have inbound links.
In reality, you are not likely to encounter a site such as that on the Internet (and it
probably wouldn't be indexed), but it's a good way to illustrate the fluctuation of
page-level PageRank within a single system whose total PageRank remains
constant. The following sections will briefly describe how certain changes to a
website can affect page-level PageRank values, as well as the total site PageRank.
Add Links to Important Pages
Many SEOs and developers focus too heavily on inbound links from external sites
and neglect the links that they have complete control over--their internal links! It is
no coincidence that a website's home page usually has the highest PageRank value.
The high PR is not just from external inbound links. It's also because every page on
a given website usually links to the home page. However, most websites have
several important pages that deserve high rankings for relevant terms--not just the
home page. The following example shows how the PageRank value of a single page
can be increased by adding more links to it.
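A small closed system shows the effect of extra internal links. This is a sketch only: the four-page site and the page names are hypothetical, and pagerank implements the iterative calculation under this guide's assumptions.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a dict of page -> outbound links."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


# Before: the home page links to three pages, and each links back to Home only.
before = pagerank({
    "Home": ["Landing", "Page B", "Page C"],
    "Landing": ["Home"],
    "Page B": ["Home"],
    "Page C": ["Home"],
})

# After: Page B and Page C also link to the landing page.
after = pagerank({
    "Home": ["Landing", "Page B", "Page C"],
    "Landing": ["Home"],
    "Page B": ["Home", "Landing"],
    "Page C": ["Home", "Landing"],
})

print(round(before["Landing"], 3), round(after["Landing"], 3))
```

The site total stays at 4 in both cases; the extra links only change how that total is divided, shifting some of it from the home page to the landing page.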
Subtract Links from Unimportant Pages
If you ask any website owner to show you the unimportant pages on his or her site,
they will probably try to tell you that all their pages are important. Naturally, they
are using a definition of "importance" that caters to their on-site users, but probably
not to people who begin at search engines. For the purposes of sculpting PageRank,
we must use a definition of importance that revolves around a search engine,
specifically Google. Google wants to rank pages that are independently valuable to
Google's users. In other words, when a user types in a query, they are searching for
something specific and Google wants to satisfy that user after one click. For
example, if a user searches for [iphone], Google wants to deliver Apple's page about
iPhones--not the Apple home page or their sitemap or any other page that would
require the user to click unnecessarily. So our definition of an important page is one
that contains unique content that is the most-relevant to one of our keywords. For
our purposes here, unimportant pages are those which provide little use for search
engine users, such as Contact pages, About pages, etc.
The following example shows how removing links from unimportant pages can
increase the PageRank going to important pages.
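The same kind of closed system can illustrate link removal. In this sketch, a link that no longer passes PageRank is modeled simply as a removed link; the site layout and page names are hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a dict of page -> outbound links."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


# Before: Home splits its vote three ways between Landing, Contact, and About.
before = pagerank({
    "Home": ["Landing", "Contact", "About"],
    "Landing": ["Home"],
    "Contact": ["Home"],
    "About": ["Home"],
})

# After: Home's links to Contact and About no longer count.
after = pagerank({
    "Home": ["Landing"],
    "Landing": ["Home"],
    "Contact": ["Home"],
    "About": ["Home"],
})

print(round(before["Landing"], 3), round(after["Landing"], 3))
```

Contact and About fall back to the 0.15 that every page starts with, and the PageRank they used to receive now flows to the landing page.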
Add Content
SEOs often say, "content is king," but what exactly qualifies as "content"? In terms
of PageRank sculpting, content refers to pages. The more pages you have on your
site, the higher the maximum total PageRank. Remember, creating new pages also
creates new PageRank, and if your site is well-linked, then every new page adds a
PageRank value of 1 to the site's total. (As a side note, don't forget that Google only
gives PageRank to pages that it knows about and that meet certain criteria of
quality. In other words, creating a million blank pages isn't going to increase your
site's total PageRank, because Google isn't going to index a million blank pages.
When we talk about "adding a page," it is assumed that the new page will be
valuable and indexable, according to Google.) The example below shows how
adding supporting content can increase the PageRank of a landing page.
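Adding pages to a well-linked system can be sketched as follows. The site layout and the names Support 1 and Support 2 are hypothetical, and the pages are assumed to be indexed.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a dict of page -> outbound links."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


# Before: a two-page site where Home and Landing link to each other.
before = pagerank({"Home": ["Landing"], "Landing": ["Home"]})

# After: two supporting pages are added, linked from and linking to Landing.
after = pagerank({
    "Home": ["Landing"],
    "Landing": ["Home", "Support 1", "Support 2"],
    "Support 1": ["Landing"],
    "Support 2": ["Landing"],
})

print(round(sum(before.values()), 3), round(sum(after.values()), 3))
print(round(before["Landing"], 3), round(after["Landing"], 3))
```

Each new page adds 1 to the site total, and because the supporting pages link only to the landing page, much of that new PageRank accumulates there.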
Subtract Content
Subtracting content is by far the most challenging PageRank sculpting concept to
understand, but the following example should help. We have already learned that
more pages mean more total PageRank, so what would be the benefit of subtracting
content? To answer this question, we have to have a clear understanding of total
site PageRank vs. page-level PageRank. Once you realize that Google ranks pages--
not sites--then you can understand why we would be willing to sacrifice some site-
level PageRank for the sake of increasing a single page's PageRank.
When we remove a page from a system, as long as that system remains well-linked,
then the total system PageRank will only decrease by 1. This is the same amount of
PageRank that would be added to a system when we add a new page. In certain
cases, the page we remove might have a PageRank value that's higher than 1, but
it makes no difference: The rest of its PageRank will redistribute to the remaining
pages. As long as there is an increase in PageRank to the remaining important
pages, the total system decrease is justified.
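The trade-off can be checked on a tiny system. This is a sketch with a hypothetical three-page site, where Extra stands for the unimportant page being removed.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a dict of page -> outbound links."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


# Before: Home links to Landing and to an unimportant page, Extra.
before = pagerank({
    "Home": ["Landing", "Extra"],
    "Landing": ["Home"],
    "Extra": ["Home"],
})

# After: Extra is removed and the remaining system is still well-linked.
after = pagerank({"Home": ["Landing"], "Landing": ["Home"]})

print(round(sum(before.values()), 3), round(sum(after.values()), 3))
print(round(before["Landing"], 3), round(after["Landing"], 3))
```

The site total drops by exactly 1, yet the landing page ends up with more PageRank than it had before the removal.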
Ideal PageRank Distribution
Natural Distribution from a Hierarchical Structure
The Google Webmaster Guidelines include the following recommendation when
creating a hierarchy for a website:
Make a site with a clear hierarchy and text links. Every page should be
reachable from at least one static text link.
A hierarchical data structure is formed when the pages of a website are organized
by topical categories. The home page typically targets broad, single-word keyword
phrases, and the landing pages typically target more specific keyword phrases that
are a subset of the main topic. Here is a simple example:
In the example above, the PageRank distribution and the choice of categorization
are both aligned with the search activity of users. In other words, we are assuming
that "cats" would be more difficult to rank for than "Persian cats," and therefore,
this structure naturally channels more PageRank to the "cats" page. Choosing how
to categorize the content of your site is not an exact science, but sorting the pages
by keyword-relevant topics is usually a good place to start.
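As a rough numerical sketch of how a hierarchy channels PageRank upward, the code below models a "cats" page that links to three breed pages, each of which links back to "cats". The formula (PR(p) = (1 - d) + d * sum(PR(q) / C(q)), d = 0.85), the page names other than "Persian cats", and the link structure are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Repeatedly apply PR(p) = (1 - d) + d * sum(PR(q) / C(q)).

    `links` maps each page to the pages it links to; C(q) is the number of
    outbound links on q. Every page must have at least one outbound link.
    """
    ranks = dict.fromkeys(links, 1.0)
    for _ in range(iterations):
        ranks = {
            page: (1 - damping) + damping * sum(
                ranks[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks


# Hypothetical hierarchy: one broad topic page and three narrower pages.
hierarchy = {
    "cats": ["persian-cats", "siamese-cats", "bengal-cats"],
    "persian-cats": ["cats"],
    "siamese-cats": ["cats"],
    "bengal-cats": ["cats"],
}

ranks = pagerank(hierarchy)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:14s} {value:.2f}")
```

Under these assumptions the "cats" page settles at roughly 1.92 and each breed page at roughly 0.69, so the broader and more competitive page ends up with the most PageRank.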
Site Architecture Definitions
In order to effectively communicate the techniques of PageRank sculpting, we must
first establish some definitions for related terms. The sections that follow will use
these definitions:
Landing Page - when we are talking about PageRank sculpting, a landing page is a
page that we are trying to get ranked for certain keywords. This is the page that we
want to show up in the results of search engines, and therefore, it is a page that we
want to focus PageRank on.
Supporting Page - a supporting page is a page that is somewhat optimized for
certain keywords, but it isn't the landing page for those keywords. An example of
this is the "Persian cats" page from the previous illustration. This page would
inevitably discuss "cats" in its content, but we wouldn't expect it to rank for "cats."
It would be the landing page for "Persian cats," but the supporting page for "cats."
Global Navigation - this refers to the links that appear on every page of a website.
These are usually located at the top and bottom of a page, but they can also be a
left navigation or right navigation. Most sites manage the global navigation code
through some kind of content management system or server includes, so the
webmaster can change one file and that change will appear on every page of the
site. Global links (those that appear on every page on a site) are extremely
important for sculpting PageRank, as they are the quickest way to make significant
changes to a site's linking structure.
Secondary Navigation - for the purposes of PageRank sculpting, we will refer to a
site's secondary navigation as the set of links that can be found on every page in a
specific section of a website. This is similar to a global navigation, because the code
is typically managed through a single file that populates every page in a certain
section of the site. A secondary navigation may not affect every page on a website,
but it should still have a considerable impact on PageRank sculpting.
Depth of Content on a Website
We already discussed the relationship between accumulated PageRank and
distributed PageRank, in a previous section, but let's review. A page accumulates
PageRank from its inbound links and distributes PageRank evenly across its
outbound links. Due to the damping factor, every page only receives 85% of the
PageRank that was sent to it. As you navigate from your home page to your
important pages, imagine each new page's PageRank decreasing by 15% after each
click. Because of this damping effect, it is in your best interest to make sure that
your important pages are accessible after the fewest number of clicks possible.
Here is an illustration to help you visualize this concept:
Unless the yellow pages in this example contain important content, they should not
stand between the home page and the important page in this site's linking
structure.
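The effect of click depth can be put into numbers with a minimal sketch. It assumes the simplest possible case: a single chain of pages in which each page has exactly one outbound link, so the only loss at each hop is the 15% damping described above. The function name is our own.

```python
DAMPING = 0.85  # share of PageRank that survives each link hop, per the text


def share_after_clicks(clicks, damping=DAMPING):
    """Fraction of the starting PageRank left after `clicks` hops.

    Assumes each page in the chain has exactly one outbound link, so no
    PageRank is split between competing links along the way.
    """
    return damping ** clicks


for clicks in range(1, 5):
    print(f"{clicks} click(s): {share_after_clicks(clicks):.1%} remains")
```

Under this assumption, a page three clicks from the home page receives about 61% of what a page linked directly would receive. On a real site each intermediate page also splits its PageRank among all of its outbound links, so the loss is usually larger.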
PART II - Tools and Techniques for Sculpting PageRank
Introduction to PageRank Sculpting
Building a website from scratch makes it pretty easy to incorporate best practices
like topical categories and a hierarchical structure, but most SEOs deal with sites
that are already up and running. For these sites, we can use various techniques to
redistribute PageRank to important pages, a process called PageRank sculpting. The
goal of PageRank sculpting is to increase our current search engine rankings by
simply altering the crawlable linking structure of our site.
Natural vs. Unnatural Linking
Before making any significant changes to a site's navigation structure, an SEO
should consider how search engines might interpret those changes. Google
continues to advance their algorithms, but the fact remains that your content is
being interpreted by a machine and it needs to be machine-readable. However,
there are certain ways to write code that a web browser understands, but a search
engine doesn't. An example of this is JavaScript. Web browsers have JavaScript
interpreters built into them, so they can read the code and create links from it, but
search engines have a very limited understanding of JavaScript, so chances are they
won't recognize JavaScript links. This gives us the opportunity to distinguish
between user navigation and search engine navigation. For instance, we can rewrite
the code for a link, using JavaScript instead of HTML. This would prevent that page
from distributing PageRank through that link, but the link itself would still function
for users.
We will soon discuss several ways to code links that search engines can't crawl, but
keep in mind that the only foolproof way to be sure a link isn't getting
PageRank is to not have it there at all. The more natural your link code is, the
less you have to worry about search engines crawling links you didn't want them to.
So yes, we will be making assumptions about what types of links count towards
PageRank distribution, but we don't know what the future holds. Google is
constantly making improvements to their ability to crawl JavaScript URLs, forms,
and even links in Flash files, but we may never fully understand how or if those links
affect PageRank. The bottom line is: the only way to know for sure that a link isn't
passing PageRank is to not put it on your page at all.
Link Level PageRank Controls
In order to control the flow of PageRank on a website, we must understand how to
control the following:
Which pages accumulate PageRank
Which pages distribute PageRank
Which links pass PageRank
In the next section, we will list the most common options (i.e. tools) for sculpting
PageRank, and we will discuss how they affect these important details.
rel="nofollow"
This is by far the most popular choice for sculpting PageRank, and for good reasons.
First, it is officially supported and endorsed by Google as a way to prevent the flow
of PageRank through links. Second, it is easy to implement: all you need to do is
add the attribute to any anchor tag to stop the flow of PageRank through that link.
Third, it allows precise control over PageRank flow, since the nofollow attribute can
be applied at the single-link level (as opposed to affecting the entire page).
Google has publicly informed the SEO community that all paid links should include
the nofollow attribute, so we have good reason to believe that the attribute really
does block the flow of PageRank. Additionally, Google has said that they do not use
nofollowed links for discovery, and nofollowed links do not pass anchor text either.
In other words, we will assume that nofollowed links are essentially invisible to
Googlebot.
Here is a simple example of a link, before and after adding the nofollow attribute to
the HTML code. Both of these links would appear and function the same to a user,
regardless of whether or not they contain the rel="nofollow" attribute.
Distributes PageRank:
<a href="http://www.seomoz.org">SEOmoz</a>
Does NOT distribute PageRank:
<a href="http://www.seomoz.org" rel="nofollow">SEOmoz</a>
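To audit which links on a page carry the attribute, a short script can sort anchors by their rel value. The following is a minimal sketch using Python's standard html.parser module; the class name and the sample markup (with example.com URLs) are hypothetical, and it says nothing about how Google itself parses pages.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sort anchor hrefs by whether their rel attribute contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow external".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical markup: one ordinary link and one nofollowed link.
SAMPLE_HTML = """
<a href="http://www.example.com/plain">Plain link</a>
<a href="http://www.example.com/paid" rel="nofollow">Paid link</a>
"""

classifier = LinkClassifier()
classifier.feed(SAMPLE_HTML)
print("followed:", classifier.followed)
print("nofollowed:", classifier.nofollowed)
```

Running a check like this across a site's templates is one way to confirm that the nofollow attribute landed only on the links we meant to block.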
JavaScript
For a long time, Google completely ignored JavaScript, because it was too difficult or
too costly to interpret all the scripts on every webpage Google indexed. However,
rumors from the webmaster community suggest that this may no longer be the
case. There is a growing body of evidence that Google has developed at least a
fundamental understanding of JavaScript code, and that it can interpret simple
functions and parse (i.e. find and extract) URLs and file names, in an attempt to
discover new pages and new content.
This is great news for sites that are unknowingly preventing Googlebot from
indexing their pages, but it is not-so-great for sites that intentionally use JavaScript
links to prevent Googlebot from finding or indexing certain pages on the site. At the
time of writing this document, Google is still telling webmasters that its spiders
ignore JavaScript entirely. Since there is no way to know for sure, we recommend
that you avoid using JavaScript as your sole mechanism for controlling Googlebot or
PageRank. If you do use JavaScript to show your users links that you don't want
Google to crawl or distribute PageRank to, then here are a couple of tips to keep in
mind:
1. Don't include complete URLs (or the HTML code for links) in your JavaScript
code. If you do, Google will have a much easier time finding the URLs you
don't want found.
2. Externalize as much as possible. Putting your JavaScript code into an external
.js file is a web design/SEO best practice and it also has the added benefit of
removing "user-only" links and URLs from your on-page HTML code. This
means that Google would have to fetch and read your external .js file, if it
wanted to figure out which of your JavaScript functions insert links onto the
page. As an additional safeguard, you can also disallow Googlebot from
accessing that external .js file, using your robots.txt file.
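The first tip is easy to appreciate once we see how little effort it takes to pull a complete URL out of script text. The sketch below uses a deliberately simple regular expression; the pattern, the function name, and the sample strings are our own assumptions, and a real crawler is likely to be far more capable than this.

```python
import re

# A deliberately naive pattern: "http://" or "https://" followed by anything
# that is not whitespace, a quote, an angle bracket, or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)]+")


def urls_in_script(script_text):
    """Return every complete URL that appears literally in the script text."""
    return URL_PATTERN.findall(script_text)


# Hypothetical inline script that writes a full HTML link into the page.
inline_script = """document.write('<a href="http://www.example.com/cart">View Cart</a>');"""

# On-page code that only calls a function defined in an external file.
external_call = "homePage();"

print(urls_in_script(inline_script))
print(urls_in_script(external_call))
```

The inline script gives up its URL to a one-line pattern match, while the bare function call exposes nothing on the page itself. That difference is the reasoning behind both tips.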
Here are 3 examples of links that rely on JavaScript. Each of the following examples
represents a complete webpage that contains nothing more than a single link to
SEOmoz.org. All 3 pages appear exactly the same to users (assuming they have
JavaScript enabled), but the actual page code itself is different from one example to
the next. The important sections of code have been highlighted, and each example
includes a brief explanation.
Example 1:
This example is a single page of XHTML code. It doesn't rely on any external files,
because the JavaScript is coded directly onto the page (between the <script> tags).
The document.write() method is a built-in function of JavaScript that inserts
additional code when the page is rendered by a web browser. The additional code in
this example is highlighted. It is the plain HTML code that creates the link that users
see. This example shows how easy it would be for Google to recognize that link,
despite the fact that it's technically part of a JavaScript script. Therefore, we would
assume that this example would not be an effective way to prevent PageRank from
flowing through this link.
Link 1
Example 2:
This example is another single page of XHTML code. Again, it doesn't rely on any
external files, because the JavaScript is coded directly onto the page. However, this
example doesn't use <script> tags like Example 1 did. Instead, it assigns JavaScript
code to the href attribute of a regular HTML link. When a user clicks this link, their
web browser executes the JavaScript code, instead of taking them to a new URL. In
this example, the JavaScript code tells the browser to change the current window
location to SEOmoz.org, so it essentially performs the same basic function that a
pure-HTML link would. The only difference is this example requires users to have
JavaScript enabled. So in theory, Google shouldn't recognize this link because it
requires a JavaScript interpreter, but in reality, chances are that Google would have
no trouble seeing the URL and understanding the intent of this code. Therefore, we
would assume that this is not an effective technique for preventing the flow of
PageRank through this link.
Link 2
SEOmoz
Example 3:
This example is by far our best option for preventing the flow of PageRank through
this link. This example is similar to example 2, except that it removes the JavaScript
code that contains the link URL and places it in an external file, named
javascript.js. The external file is then disallowed in the robots.txt file. This means
that the only code Google ever sees is the reference to a function called
homePage(), but Google has no way of knowing what that function does unless it can
access the external JavaScript. Even though this example uses two more files than
the other examples, it is the only way we can be confident that PageRank is not
flowing through this link.
Link 3
SEOmoz
External JavaScript (javascript.js):
function homePage() {location.href = 'http://www.seomoz.org';}
robots.txt:
User-agent: *
Disallow: /javascript.js
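Before relying on a robots.txt rule, it is worth confirming that it blocks what we think it blocks. The sketch below uses Python's standard urllib.robotparser module as a stand-in for a crawler; the example.com URLs are placeholders, and the rule is written with a leading slash, which robots.txt path rules normally require. Python's parser is not Google's, so treat the result as a sanity check rather than proof of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser

# The rules under test: block every user-agent from the external script file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /javascript.js
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs on a hypothetical site.
script_allowed = parser.can_fetch("Googlebot", "http://www.example.com/javascript.js")
page_allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")

print("javascript.js fetch allowed:", script_allowed)
print("index.html fetch allowed:", page_allowed)
```

Here the script file is reported as blocked while the ordinary page remains fetchable, which is the outcome Example 3 depends on.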
Iframes
Iframes can be a very convenient way to externalize large portions of HTML code
into a separate file. In most practical applications of PageRank sculpting, iframes
are used to display global content and navigations. An iframe element is usually just
one or two lines of HTML code that allows you to view another webpage's content
by referencing its URL in the iframe's src attribute. An iframe is basically a window
that we can embed in our webpage and view another webpage through. We control
the iframe's dimensions, border, and its ability to use a scrollbar or not, but the
content that appears (to users) in the iframe "window" can only be changed by
editing the content of the external page that we referenced in our src attribute.
Since the iframe content (and HTML code) is not actually on our webpage, we don't
have to worry about distributing PageRank through those links. When Google crawls
our page, all it sees is our iframe element with a src attribute pointing to another
webpage; it doesn't see the links that users see in their browser. Note the following
example.
Example - Before:
As an example of using an iframe for PageRank sculpting, imagine that your website
uses a global header navigation that links to a bunch of unimportant pages that are
wasting PageRank. In this case, you could cut-and-paste the header's HTML code
into its own separate webpage, and add an iframe element to your original
webpage where the header code used to be.
The highlighted div element here shows the links to unimportant pages, as they
would normally appear in the HTML code:
Before Iframe
Contact Us | About Us | Privacy Policy | View Cart | Calculate Shipping | Returns Policy
Home | Landing Page 1 | Landing Page 2 | Landing Page 3
Page Heading
Page content.
Example - After:
This is the HTML code after the "badlinks" div has been moved into its own external
file. Now this page is only distributing PageRank through "goodlinks," but the
appearance and functionality would not change for users. (The HTML code for the
newly-created "header.html" page follows.)
After Iframe
Home | Landing Page 1 | Landing Page 2 | Landing Page 3
Page Heading
Page content.
Header Page:
The following HTML code represents the external file that we reference in our iframe
tag. In other words, this is the content that we view through our iframe "window."
The HTML code for the "badlinks" div element has basically been cut from the
original example and pasted into a new page, but with some necessary tweaks
added to it:
The meta robots tag has been added to prevent Google from indexing this
page.
The body tag has been styled to remove the default spacing that would
otherwise change the appearance for users.
Each link now includes the target attribute to maintain the original
functionality for users.
Header
Contact Us | About Us | Privacy Policy | View Cart | Calculate Shipping | Returns Policy
Now, instead of Google seeing our unimportant links on every page, it only sees the
iframe src URL. One thing to keep in mind is that Google will still parse the src URL
from iframe tags, and it will add that URL to its list of pages to crawl. It is uncertain
whether or not Google would treat the iframe URL reference as a link to the header
page, but we do know that Google will crawl and index the header page, just as it
would any other webpage. For this reason, you may want to consider taking some of
the following precautions, depending on your specific needs:
Add the rel="nofollow" attribute to the unimportant links in the header
navigation.
Add the Meta robots tag to your header page, and set it to "noindex" or
"none".
Disallow the header page URL in your robots.txt file.
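To see what a source-level parser finds on a page that uses this technique, the sketch below walks a page's markup and records anchor hrefs and iframe src values separately. The sample markup is a hypothetical reconstruction of the "after" page (the header.html file name and the "goodlinks" id come from the text above; the link URLs are invented), and Python's html.parser only approximates a crawler that reads raw HTML.

```python
from html.parser import HTMLParser


class CrawlView(HTMLParser):
    """Record what a simple source-level parser finds on a single page."""

    def __init__(self):
        super().__init__()
        self.link_hrefs = []
        self.iframe_srcs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.link_hrefs.append(attributes["href"])
        elif tag == "iframe" and attributes.get("src"):
            self.iframe_srcs.append(attributes["src"])


# Hypothetical "after" page: the unimportant links now live in header.html.
PAGE_AFTER = """
<iframe src="header.html" scrolling="no" frameborder="0"></iframe>
<div id="goodlinks">
  <a href="/">Home</a>
  <a href="/landing-page-1.html">Landing Page 1</a>
</div>
"""

view = CrawlView()
view.feed(PAGE_AFTER)
print("links on this page:", view.link_hrefs)
print("iframe sources:", view.iframe_srcs)
```

Only the "goodlinks" anchors show up as links on the page, but the header.html reference is still plainly visible. That visibility is why the precautions listed above are worth considering.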
Flash
Since Google has announced that they have improved their ability to read Flash
content, using Flash for PageRank sculpting is not the reliable tool it once was.
Chances are that Google's primary goal concerning Flash files is to find new content
contained in them, effectively negating Flash's usefulness as a means by which to
hide content. Since it is possible to build an entire site using Flash, and then embed
the Flash file on a single URL, this raises questions about how Google would index
such a site. Despite the fact that Google continues to improve its understanding of
Flash content, we will still assume that Flash links do not distribute PageRank.
Forms
Just like JavaScript and Flash, forms are not the foolproof "spider blockers" that they
once were. Google has announced that they are trying out new methods of crawling
through forms. Presumably, Googlebot would test various combinations of input
parameters and analyze the resulting pages for unique content. However, using the
test URLs for discovery doesn't mean Google distributes PageRank to them. In this
guide, we assume that PageRank does not flow through forms.
Summary Chart of Link Level PageRank Controls
The following chart summarizes the functionality of the PageRank controls
discussed in the previous section. These controls can be used at the link level,
meaning you can use them to block the flow of PageRank through specific links on a
page, while still allowing the remaining links to function as usual.
                  Does Google       Does Google use         Does Google
                  see these         these links for         distribute PageRank
                  links?            discovering new         through these
                                    pages?                  links?
rel="nofollow"    yes               no                      no
JavaScript        maybe             maybe                   no
Flash             maybe             maybe                   no
Forms             yes               yes                     no
Iframes*          no                no                      no
*Assuming Google is blocked from the src file
Page Level PageRank Controls
Controlling PageRank distribution at the link level is fairly straightforward: either the
link distributes PageRank or it doesn't. Page-level controls are a bit trickier, because
we have to consider whether or not a given page accumulates PageRank, in
addition to whether or not it distributes PageRank.
Robots Meta Tag
Most SEOs and webmasters should already be familiar with the robots meta tag. It is
placed in the <head> section of a webpage, and it tells search engines whether or
not they can crawl, index, or cache the content of that page. Before we explain how
this tag can be used to sculpt PageRank, let's define these three search engine
processes.
Indexing - this is the process where Google "reads" the content of your
webpage and transforms it into a representation of content--one that is easily
sorted in Google's index and processed for search results. The process of
indexing a webpage includes things like removing HTML code and stop
words, reducing words to their root (stemming), determining term
frequencies, and assigning weighted values to certain terms (depending on
how the terms appeared in the content). In other words, when Google
indexes a document, it determines the document's keyword relevancy. If you
include the noindex attribute in your page's robots Meta tag, you are basically
telling Google not to consider that page relevant to any query, and therefore,
not to list it in any search results.
Crawling - this is the process of a search engine identifying links in your
content and recording them. Google uses this information to discover new
pages and to calculate PageRank. Adding the nofollow attribute to the robots
Meta tag will tell Google to ignore all the links on the page. This has the same
effect as adding the rel="nofollow" attribute to every link on the page.
Caching - this is when Google stores a local copy of your page on its own
computers. This copy is essentially a snapshot of your page's code, from the
time that Google last saw it. Webmasters who don't want Google to cache
their page can add the noarchive attribute to the robots Meta tag. Because
caching is for archival purposes only, we assume that this attribute does not
have any applications in PageRank sculpting.
Adding the nofollow attribute to a robots Meta tag would have obvious implications
for PageRank sculpting. It completely prevents that page from distributing
PageRank. However, the effect of adding the noindex attribute is not as intuitive.
Most webmasters would assume that a page must be indexed in order to
accumulate PageRank, but that isn't entirely true. Preventing Google from indexing
your page doesn't prevent other pages from linking to it. We have to assume that
when Google finds links pointing to your page, it records them for calculating
PageRank. Therefore, as far as PageRank sculpting is concerned, the noindex
attribute does not prevent a page from accumulating PageRank--it only prevents it
from showing up in the search results.
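The three directives discussed in this section can be summarized in a few lines of code. The sketch below turns the content value of a robots meta tag into three yes/no answers; the function name and the dictionary keys are our own, and the mapping simply mirrors the assumptions stated above rather than any documented Google behavior.

```python
def robots_directives(content):
    """Interpret a robots meta `content` value as three yes/no answers.

    index  - may the page be listed in search results?
    follow - may the page's links be recorded (and so distribute PageRank)?
    cache  - may a cached copy of the page be stored?
    """
    tokens = {
        token.strip().lower() for token in content.split(",") if token.strip()
    }
    if "none" in tokens:  # "none" is shorthand for "noindex, nofollow"
        tokens |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
        "cache": "noarchive" not in tokens,
    }


for value in ("noindex", "nofollow", "none", "noarchive"):
    print(f"{value:10s} -> {robots_directives(value)}")
```

Note that the "noindex" row leaves "follow" set to yes: under the assumptions in this guide, such a page stays out of the search results but still accumulates and distributes PageRank.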
robots.txt
In contrast to the noindex tag, exclusion via robots.txt does not prevent a page
from showing up in Google's search results. The purpose of a website's robots.txt
file is to block certain user-agents from accessing certain files or directories of your
site. However, many SEOs and webmasters make the false assumption that
disallowing a page in the robots.txt file will prevent it from accumulating PageRank.
The truth is, a page that has been disallowed can still be linked to by other pages,
and Google is still going to consider those links when calculating PageRank. This
would create the same result that the robots Meta tag noindex attribute has, aside
from the fact that a page excluded in a robots.txt file can still appear in rankings.
301 Permanent Redirects
We should all be familiar by now with the 301 redirect and its role in preserving
"link juice." This isn't so much a page-level PageRank control as it is a best practice.
If you need to remove a page of content from your site for whatever reason, you
should configure your server to return a 301 response that redirects to the new URL
location or another page with similar content.
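How the redirect is configured depends on your server, but the response itself is simple: a 301 status plus a Location header. The sketch below shows the idea with Python's standard http.server module. The MOVED_PAGES mapping, the paths, and the function and class names are hypothetical, and a production site would normally do this in its web server or CMS configuration instead.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of removed URLs to the pages that replace them.
MOVED_PAGES = {
    "/old-cats.html": "/cats.html",
}


def resolve(path):
    """Return (status, location) for a request path."""
    if path in MOVED_PAGES:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED_PAGES[path]
    return HTTPStatus.OK, None


class RedirectingHandler(BaseHTTPRequestHandler):
    """Answer removed URLs with a 301 and everything else with a stub page."""

    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            # The Location header tells browsers and crawlers where to go.
            self.send_header("Location", location)
            self.end_headers()
            return
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Page content.\n")


def serve(port=8000):
    """Start the demo server on localhost (call this manually to try it)."""
    HTTPServer(("127.0.0.1", port), RedirectingHandler).serve_forever()
```

Calling serve() and requesting /old-cats.html should return a 301 response pointing at /cats.html, which is the signal that the move is permanent.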
302 Temporary Redirects
These types of redirects will not forward the PageRank of the old URL to the new
URL (like the 301 does). Therefore, we avoid using these for sculpting PageRank.
302 redirects are useful in a number of situations, but not for the purpose of
distributing PageRank.
http://www.google.com/support/webmasters/bin/answer.py?answer=93633
Summary Chart of Page Level PageRank Controls
                            Can Google index    Does this page     Does this page    Does this page
                            content from        show up in         accumulate        distribute
                            this page?          search results?    PageRank?         PageRank?
Meta robots "noindex"       no                  no                 yes               yes
Meta robots "nofollow"      yes                 yes                yes               no
Disallowed in robots.txt    no                  yes                yes               no
301 redirects               no                  no                 no                no
302 redirects               no                  yes                yes               no
Conclusion
We have found that there is a widespread misunderstanding of PageRank: how it is
obtained, how it is distributed and how best to structure a website for optimal
PageRank optimization. This guide has focused quite heavily on the PageRank
model because there is no good way to effectively optimize something without
having a good understanding of how it operates.
Beware of the tendency to think about PageRank in terms of the green bar at the
bottom of a web browser, as difficult as it can be to forget years of conditioning. By
developing a good understanding of real PageRank, we are certain that you can
improve your linking structure for maximum PageRank benefit.
Resources
Google's Webmaster Guidelines: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769
Robotstxt.org: http://www.robotstxt.org/
The PageRank Citation Ranking: Bringing Order to the Web: http://dbpubs.stanford.edu:8090/pub/1999-66
The Anatomy of a Large-Scale Hypertextual Web Search Engine: http://infolab.stanford.edu/~backrub/google.html