Designing Rich Internet Applications For Search Engine Accessibility
Introduction

Rich Internet Applications create new opportunities. The most fundamental of these is the ability to create Single Page Interfaces (SPIs). A SPI is an interface that consists of a single HTML page. Additional information that is required, when the user clicks on a link or when some other event occurs, is not supplied by means of a traditional full page reload, but is instead retrieved via an XML message. The original page remains intact; its contents or state is simply updated by the contents of the XML message. JavaScript is used to facilitate this whole process. Although it is not mandatory to create a SPI when using Backbase's software, a SPI provides a more intuitive user interface and a smoother user experience.
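To make the partial-update idea concrete, the general pattern, sketched here in plain JavaScript and independent of Backbase's own machinery (the URL and element id are illustrative, not taken from the article), looks something like this:

function loadFragment(sUrl, sTargetId){
    //retrieve a fragment from the server without reloading the page
    var oRequest = new XMLHttpRequest();
    oRequest.open("GET", sUrl, true);
    oRequest.onreadystatechange = function(){
        if (oRequest.readyState == 4 && oRequest.status == 200){
            //update only the relevant part of the existing page
            document.getElementById(sTargetId).innerHTML = oRequest.responseText;
        }
    };
    oRequest.send(null);
}

//e.g. loadFragment("tab2.html", "contentPanel");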
There are a few questions that need to be answered, however, when you make use of this new paradigm. One of the main questions is that of search engine accessibility and deep linking.
The web sites that have been created up until now consist almost entirely of Multi Page Interfaces (MPIs). These web sites and applications consist of multiple unique pages, which may or may not have been dynamically generated. Since each page, and for dynamic pages every page state, has a unique URI, it is very easy to link to any page or state within the site. Navigation between pages is done by the user clicking on links or submitting forms, both of which contain the location and state information for the new page. It is these unique URIs that make deep linking possible. A deep link does not just point to a particular web site, but links directly to a specific page within the site. It is this MPI paradigm which informs the robots that are used by search engines such as Google or Yahoo to index the information in web sites. Search bots are software agents that crawl through web sites; they start at the index page and, after categorizing all of the information on the page, they follow the links on this page to other pages on the site. In this way they crawl through the entire web site, visiting any page that has been linked to using a link tag of the type:
<a href="page2.html">Next Page</a>
However, in an SPI the linked page structure that the search bot is expecting has been extended with BXML commands, which indicate the use of include files, load commands and form submissions that only partially update the page, instead of causing a full reload as is the case with normal forms. Since search bots aren't proper web browsers, they don't understand or execute any JavaScript. This means that a Backbase SPI needs to be specifically designed to work with these search bots.

This article puts forward a set of guidelines, which you can use to design your SPI for maximal search engine accessibility, and shows you techniques to allow for deep linking into your SPI.
Making SPIs Search Engine Accessible

Several approaches are available for making your web site accessible to search engines; these approaches differ in the level of indexing which is obtainable and in how this is achieved. For certain sites, it is not necessarily a requirement that every part of the site can be indexed by search engines. For example, a site which provides a web-based e-mail service does not require every single piece of information on the site to be indexed by a search bot. Other sites, however, do require that every piece of information can easily be found and indexed by search engines. For example, a web site with information about the courses provided by a university is such a case. Backbase has identified the following strategies for getting a SPI indexed by search engines:

Lightweight Indexing: no structural changes are made to your site; existing tags such as meta, title and h1 are leveraged.

Extra Link Strategy: extra links are placed on the site, which search bots can follow and thereby index the whole site.

Secondary Site Strategy: a secondary site is created, which is fully accessible to the search engine.

For each of these strategies the following questions will be answered:

To what extent is the content of the page indexed?

Can links on the page be followed (e.g. a elements or s:include elements)?

When a link is followed by the search bot, what is the status of the URL that is being indexed? Can this URL be displayed by browsers, or will some type of redirection be required?
Lightweight Indexing
This strategy should be used if only certain key information needs to be indexed by search engines. In this case it is recommended that you take the following steps when designing your SPI:

Use a title element in the document head, preferably containing one or more keywords that specifically relate to the contents of the site, for example "BXML WebMail Sign In" (a combined example covering the title and the two meta elements below is given after this list).

Use a keywords meta element with a content attribute containing some appropriate keywords.

Use a description meta element with a content attribute which contains a relevant description of the web page. The value of this element is often printed as part of a search result by Google.

Place key content within the main HTML structure and not in an include file or some other dynamically loaded content. If possible, place this important content within an h1, h2 or h3 element, since search bots deem these to contain more important information. Remember that these tags can be styled in any way you want using CSS.
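Pulling these points together, the head of the index page might contain markup like the following; the title text comes from the example above, while the keyword and description values are purely illustrative:

<head>
    <title>BXML WebMail Sign In</title>
    <meta name="keywords" content="webmail, e-mail, sign in, BXML">
    <meta name="description" content="Sign in to the BXML WebMail service.">
</head>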
It should be noted that these points can also be put to good use in the design of your SPI, in conjunction with the extra link strategy or the secondary site strategy.

In summary, by using this lightweight indexing strategy only the content supplied by the title and meta elements, and those elements that are directly located on the index page, is indexed. No links of type s:include are followed; therefore there is no requirement to deal with redirection. This is not a very full indexing scheme, but it is extremely simple to apply to your site.
The Extra Link Strategy
There are two main approaches to making a site fully indexable by search engines: the extra link strategy and the secondary site strategy. The extra link strategy is the easier of the two to implement and it can make the site entirely indexable by search engines, but it does not create a secondary site in normal HTML and is therefore not accessible to older browsers which are incompatible with BXML. The essence of this strategy is to create an extra link on the main SPI index page for each include file whose contents you wish to be indexed. Some experimentation has revealed that the extra links must be of the type:
<a href="include1.html">include 1</a>
The following points must be followed if you want Google to index these pages:

The link must be made by an a element and the include file must be indicated by the href attribute.

The include file must have the .html or .htm file extension. This is a bit of a workaround, since in reality include files aren't proper HTML files but are instead XML files. However, if you use a div element or a similar HTML element as the root tag, then all modern browsers will be able to read the file as if it were HTML, and Google will index it. As far as the BPC (Backbase Presentation Client) is concerned, it merely stipulates that an include file should be well-formed XML and isn't interested in which file-type extension it uses. (A minimal sketch of such an include file is given after this list.)

NB: The include files should not have an XML declaration or a document type definition, otherwise Internet Explorer will be unable to accept .html or .htm files as include files.

The link tag must have some text content. Without this Google will simply ignore it.
No attempt should be made at using HTML to hide these links, since Google frowns on this and may not index such pages. You can, however, use BXML to remove or hide these links by way of a construct event handler; the original example showed two such extra links, labelled "Left Panel" and "Right Panel", being hidden in this way.
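As an illustration of the points above (the file name and contents here are made up for this sketch), such an include file could be as simple as:

<!-- include1.html: no XML declaration, no doctype, a div as the root element -->
<div id="include1">
    <h2>Include 1</h2>
    <p>Content that should be indexed by the search engine goes here.</p>
</div>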
It is not necessary to detect the user agent of the search bots (see the appendix at the end of this article for full details of this process), since they will simply follow the extra links that are provided for them. However, it is necessary to do some detection when these include files are being served up. This is tricky, since these include files can be requested in two different ways. When a user is directed to one of these pages through a search engine, they need to be redirected to the main index page. On the other hand, when the BPC requests these pages as include files, no redirection should occur. Because both search bots and the BPC ignore meta refresh tags, it is possible to solve this problem. Such a meta refresh tag must be included directly inside the body of the include file. Even though these tags are normally placed inside the head element, they will still be executed anywhere in the body by all BXML-compatible browsers. Below is an example of such a meta refresh tag:
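Assuming the SPI index page is called index.html (an assumption for this sketch), it would look like this:

<!-- redirects normal browsers to the SPI index page; "index.html" is an assumed name -->
<meta http-equiv="refresh" content="0;url=index.html">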
Once the browser has been redirected to the SPI index page, this page must parse out the referrer and trigger an event handler which will update the state of the SPI accordingly. This process of detecting deep linking and updating the page state is explained in much more detail in the appendix at the end of this document.

In summary, the extra link strategy makes the whole site fully indexable. By adding extra link elements, search bots are able to index all pages of the site. However, since the URLs of the pages that get indexed point to include files, which aren't fully BXML-capable pages, it is necessary to redirect normal browsers back to the SPI version of the site and then update the state of this SPI accordingly.
The Secondary Site Strategy
The secondary site strategy is the most complete of all of the indexing strategies. It is also the most labor intensive. The secondary site should be made out of plain HTML and contain a linked, multi-paged structure. Though this may seem laborious, having a secondary site to fall back upon makes your site available to people who are using older browsers which aren't supported by Backbase, as well as to browsers on mobile devices and to disabled people. This gives you a chance to make your site accessible to all users, not just search engines. This strategy has three important components:

1. Generating the secondary site's pages.

2. User-agent detection of both the search bots and BXML-compatible browsers.

3. Redirection of browsers and the detection of this redirection, which allows the state of the SPI to be updated to reflect this deep linking.
Generating the Search Engine Accessible Pages
The search engine accessible pages can be generated in several ways. It is possible to manually generate the secondary fall-back site. It is also possible to automate this process using XSLT.
Manual Site Generation. This is a simple, low-tech solution, but it is also labor-intensive, since you have to build two versions of your web site. There is also a danger that when you update your site with new information, you will forget to update the secondary pages. This will cause the two versions of the site to be out of sync with each other, and the information found on search engines will not be up to date.
XSLT-Driven Generation. An alternative strategy, which is especially effective if you use a content management system (CMS), is to store all of the information, or at least the copy for your site, as plain XML. This can be in a format defined by yourself or your CMS. This XML is then transformed into BXML using an XSLT. A second, much simpler, XSLT is used to transform the XML into the secondary, search-engine accessible site. Although this approach requires a little more effort when you initially develop the site, once both XSLTs are ready, new content can easily be added to the XML data source and then both versions will be generated automatically.

User-Agent Detection

A vital component of this two-site strategy is browser detection. Techniques that can be used for user-agent detection are discussed in the appendix at the end of the article. Once the user agent has been detected, it is necessary to make sure that the BXML-compatible browsers get sent to the BXML site and that the search bots and non-BXML-compatible browsers get sent to the accessible site.

Deep Linking and Browser Redirection

This section looks at an issue that arises from having a secondary multi-paged version of your site that is indexed by search engines. The solution to this problem also immediately offers a solution to the issue of deep linking in an SPI. The issue boils down to the fact that a site with multiple pages is being used to represent a site that consists of only a single page. Let's take an example to illustrate this problem: a simple SPI which consists of a main index page containing a tabbed interface with three different tabs. The contents of each of these tabs are stored in a separate include file and loaded into the SPI as and when they are required. Therefore, to make this site indexable by a search engine, an MPI version of this site would presumably have been made with one index page (e.g. index.html) and three separate HTML pages representing the include files for each of the tabs (e.g. tab1.html, tab2.html and tab3.html). Now if a user's search term closely matched something indexed on the third tab, then the search engine would point the user to tab3.html. However, in reality, you do not want your user to be directed to tab3.html. Instead, you want them to be sent to the index.html page of the SPI version of your site and, when this page is opened, the third tab, which correlates to tab3.html, should be selected.
The full solution to this problem consists of two parts. Firstly, BXML-compatible browsers need to be redirected to the SPI version of the site. And secondly, the SPI version needs to detect that it has been redirected from one of these deep-linked pages and then update the state of the page accordingly, so that the information relevant to this link is shown.

Browser Redirection. When one of the MPI pages intended for the search engine is requested, the user agent must be detected again. However, in this case, when a BXML-compatible browser is detected, it is redirected and not the search bot. The browser is sent to the index page of the SPI version of the site.

Detecting Deep Linking. The BXML version of index.html needs to ascertain from which page it was referred. This must be done as soon as the page is loaded, so that the transition appears seamless to the user. Full details of how to detect deep linking and how to update the page state can be found in the appendix at the end of this article.

In summary, the secondary site strategy makes the whole site fully indexable. Since the search bot is redirected to a normal HTML site, all links are followable by the search bot. However, since the URLs of the pages which get indexed when the links are followed point to non-BXML pages, it is necessary to redirect normal browsers back to the SPI version of the site and then update the state of this SPI accordingly.
Ethics

Google especially, and presumably other search engines, deeply frown upon any attempts to unfairly manipulate search results. Any site that is caught willfully trying to manipulate Google will be banned from Google's index. Redirection to another site with different content, based on the user agent, is technically called cloaking and is frowned upon. Therefore, you should make sure that the information conveyed by any secondary web sites, which have been set up with the intention of making your site indexable by Google and other search engines, is exactly the same as the information contained in your BXML site.
Appendix

User-Agent Detection
A vital component of both the secondary site strategy and the extra link strategy is browser detection. The technical term for a web browser, a search robot or any other piece of software that approaches a web site is a user agent. When a user agent requests a particular page, it supplies details of itself by way of one of the HTTP headers that are sent along with the request. The Firefox browser, for instance, sends the following request header:
User-Agent: Mozilla/5.0 (Windows; U; Windows
NT 5.1; en-US; rv:1.7.8) Gecko/20050511
Firefox/1.0.4
It is therefore relatively straightforward to write a script which determines what the user agent is, and then redirects the user agent to the appropriate version of the site. The most straightforward technique is not to try to find the search bots or other incompatible browsers, since this group is relatively large and hard to qualify. It is easier to determine whether the user agent is a BXML-compatible browser and then assume that, if the user agent isn't one of these, it is either a search bot or an incompatible browser. The following browsers are BXML compatible:
Internet Explorer .0 and newer
Mozilla 1. and newer
Fireox 1.0 and newer
Netscape . and newer
User-agent detection can be done on the server using a PHP, ASP or JSP script. There are standard libraries which help take care of this. Alternatively, if you cannot or do not wish to use server-side scripts to determine the user agent, it is possible to do this in JavaScript. If you take this approach, you should be aware of the fact that search bots cannot be expected to execute any JavaScript. Therefore, if you are using the secondary site strategy in conjunction with JavaScript-based detection, the default page provided by the initial page request must be the non-BXML site which is intended for the search engine bot. When you ascertain that the user agent is a BXML-compatible browser, then JavaScript should redirect the browser to the BXML version of your site. The following code fragment shows a simple JavaScript function which tests whether a BXML-compatible Mozilla-based browser is in use and then redirects the browser based on this.
function testUA(){
    var bCompatible = false;
    var sUA = window.navigator.userAgent;
    //Test if the User-Agent string contains
    //the string "Gecko"
    var iIOGecko = sUA.indexOf("Gecko");
    if (iIOGecko >= 0){
        //extract the string directly after "rv:"
        //and check its value
        var iIOrv = sUA.indexOf("rv:");
        var sRv = sUA.substr(iIOrv + 3, 3);
        if (parseFloat(sRv) >= 1.5)
            bCompatible = true;
    }
    //now if compatible redirect
    if (bCompatible)
        window.location.href = "bxmlIndex.html";
}
This function is relatively straightforward, but certain parts may need explaining. Firstly, both Netscape and Firefox browsers use the same Gecko core as Mozilla does. They also have similar User-Agent strings. Therefore, the function above first searches for a "Gecko" sub-string, which all of their User-Agent strings will contain. Once this sub-string has been found, the function searches for the "rv:" sub-string. This is short for revision, and it is followed by the version number of the Gecko engine. If this number is 1.5 or higher, then the Gecko engine is BXML compatible. Therefore, this relatively simple function is able to test for all compatible Netscape, Firefox and Mozilla browsers.
Obviously, it is also necessary to test for compatible versions of Internet Explorer too. This can be done in a similar way, but there is one added complication. All compatible versions of Internet Explorer have a User-Agent string that contains the sub-string "MSIE", which is directly followed by the version number. Below is an example of such a header from an Internet Explorer browser:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; .NET CLR 1.1.4322)
However, unfortunately, Opera browsers have a very similar User-Agent string:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; en) Opera 8.00
Therefore, you must first test that the User-Agent string doesn't contain the "Opera" sub-string and, once this has been ascertained, simply parse out the version number which follows the "MSIE" sub-string.
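A sketch of such a test, written here as an illustration rather than taken from the original article (the minimum version to pass in depends on which Internet Explorer releases your Backbase version supports), could look like this:

function isCompatibleIE(fMinVersion){
    var sUA = window.navigator.userAgent;
    //Opera also includes the "MSIE" sub-string, so rule it out first
    if (sUA.indexOf("Opera") >= 0)
        return false;
    var iIOMSIE = sUA.indexOf("MSIE");
    if (iIOMSIE < 0)
        return false;
    //the version number directly follows the "MSIE " sub-string
    var fVersion = parseFloat(sUA.substr(iIOMSIE + 5));
    return fVersion >= fMinVersion;
}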
Detecting Deep Linking and Updating the Page's State

This section looks at how redirection based on deep linking can be detected, and then at how the state of a page can be updated using this information. Deep linking can be detected on the server by reading the Referer HTTP request header using a server-side script. Once the referrer has been read, an appropriate construct event handler must be created which updates the initial state. Alternatively, if you do not have access to server-side scripting, you can use a JavaScript function to do this. The js action is a special BXML action which is used to call JavaScript functions. The following behavior takes care of calling this function when the page is loaded:
... Other event handlers go here ...
The updateState function, which this action calls, then needs to parse out the referrer. Once this value has been found, the JavaScript function triggers an appropriate BXML event, thereby passing control back to the BPC. This is done by calling the execute method of the bpc object with a BXML string. A simple version of such a function looks like this:
function updateState(){
    //first parse out the value of the referrer
    var sReferrer = document.referrer;
    //do a quick test to make sure that the referrer
    //is from the same host
    if (sReferrer.indexOf(window.location.hostname) >= 0){
        var iLastSlash = sReferrer.lastIndexOf("/");
        var sValue = sReferrer.substr(iLastSlash + 1);
        //trigger an event with the same name as the referring
        //page; sExecute should contain the BXML command string
        //(elided here) that fires an event named sValue
        var sExecute = "";
        bpc.execute(sExecute);
    }
}
You should note that this is a very simplistic implementation of such a referrer-parsing function. For a more complicated web site structure, it is important that it is totally unambiguous which page the referrer was, otherwise mistakes can be made. For such cases, more complicated JavaScript will be required to verify this.
Now, finally, let's look at an example of the type of event handler that could be triggered by such an updateState function:
... Other event handlers go here ...
This behavior contains an event handler for the custom event tab3.html, which is triggered by the JavaScript function when redirection has occurred from the tab3.html page. All it does is perform a select action on a target with an id of tab. If this corresponds to the appropriate tab, then simply by selecting this tab, the tab should be loaded and become visible.