
    Designing Rich Internet Applications For Search Engine Accessibility

    Introduction

    Rich Internet Applications create new opportunities. The most fundamental of these is the ability to create Single Page Interfaces (SPIs). An SPI is an interface that consists of a single HTML page. Additional information that is required, when the user clicks on a link or when some other event occurs, is not supplied by means of a traditional full page reload, but is instead retrieved via an XML message. The original page remains intact; its contents or state is simply updated with the contents of the XML message. JavaScript is used to facilitate this whole process. Although it is not mandatory to create an SPI when using Backbase's software, an SPI provides a more intuitive user interface and a smoother user experience. There are a few questions that need to be answered, however, when you make use of this new paradigm. One of the main questions is that of search engine accessibility and deep linking.

    The web sites that have been created up until now consist almost entirely of Multi Page Interfaces (MPIs). These web sites and applications consist of multiple unique pages, which may or may not have been dynamically generated. Since each page, and for dynamic pages every page state, has a unique URI, it is very easy to link to any page or state within the site. Navigation between pages is done by the user clicking on links or submitting forms, both of which contain the location and state information for the new page. It is these unique URIs that make deep linking possible. Deep linking does not just link to a particular web site, but links directly to a specific page within the site. It is this MPI paradigm which informs the robots used by search engines such as Google or Yahoo to index the information in web sites. Search bots are software agents that crawl through web sites; they start at the index page and, after categorizing all of the information on the page, they follow the links on this page to other pages on the site. In this way they crawl through the entire web site, visiting any page that has been linked to using a link tag of the type:

        <a href="...">Next Page</a>

    However, in an SPI, the linked page structure that the search bot is expecting has been extended with BXML commands, which indicate the use of include files, load commands and form submissions that only partially update the page, instead of causing a full reload as is the case with normal forms. Since search bots aren't proper web browsers, they don't understand or execute any JavaScript. This means that a Backbase SPI needs to be specifically designed to work with these search bots.

    This article puts forward a set of guidelines, which you can use to design your SPI for maximal search engine accessibility, and shows you techniques to allow for deep linking into your SPI.
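    The partial-update mechanism described above can be illustrated with a generic sketch. This uses plain XMLHttpRequest rather than Backbase's own machinery, and the names panel.xml and content are hypothetical:

        // Generic illustration of an SPI partial update: fetch a
        // fragment and merge it into the page without a full reload.
        function loadPanel() {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', 'panel.xml', true);   // hypothetical URL
            xhr.onreadystatechange = function () {
                if (xhr.readyState === 4 && xhr.status === 200) {
                    // Update only the targeted region of the page
                    document.getElementById('content').innerHTML =
                        xhr.responseText;
                }
            };
            xhr.send(null);
        }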

    Making SPIs Search Engine Accessible

    Several approaches are available for making your web site accessible to search engines; these approaches differ in the level of indexing that is obtainable and in how this is achieved. For certain sites, it is not necessarily a requirement that every part of the site can be indexed by search engines. For example, a site which provides a web-based e-mail service does not require every single piece of information on the site to be indexed by a search bot. Other sites, however, do require that every piece of information can easily be found and indexed by search engines. A web site with information about the courses provided by a university is such a case. Backbase has identified the following strategies for getting an SPI indexed by search engines:

    Lightweight Indexing: no structural changes are made to your site; existing tags such as meta, title and h1 are leveraged.

    Extra Link Strategy: extra links are placed on the site, which search bots can follow and thereby index the whole site.

    Secondary Site Strategy: a secondary site is created, which is fully accessible to the search engine.

    For each of these strategies the following questions will be answered:

    To what extent is the content of the page indexed?

    Can links on the page be followed (e.g. link (a) elements or s:include elements)?

    When a link is followed by the search bot, what is the status of the URL that is being indexed? Can this URL be displayed by browsers, or will some type of redirection be required?

    Lightweight Indexing

    This strategy should be used if only certain key information needs to be indexed by search engines. In this case it is recommended that you take the following steps when designing your SPI (a combined sketch follows this list):

    Use a title element in the document head, preferably containing one or more keywords that specifically relate to the contents of the site. For example:

        <title>BXML WebMail Sign In</title>

    Use a keywords meta element with a content attribute containing some appropriate keywords. For example:

        <meta name="keywords" content="..." />

    Use a description meta element with a content attribute which contains a relevant description of the web page. The value of this element is often printed as part of a search result by Google. For example:

        <meta name="description" content="..." />

    Place key content within the main HTML structure and not in an include file or some other dynamically loaded content. If possible, place this important content within an h1, h2 or h3 element, since search bots deem these to contain more important information. Remember that these tags can be styled in any way you want using CSS.
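    Taken together, and with placeholder keyword and description values, the relevant parts of the SPI index page might look like the following sketch, with the key content sitting directly in the main page rather than in an include file:

        <head>
            <title>BXML WebMail Sign In</title>
            <meta name="keywords" content="..." />
            <meta name="description" content="..." />
        </head>
        <body>
            <h1>Key content that should be indexed</h1>
            ...
        </body>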

    It should be noted that these points can also be put to good use in the design of your SPI in conjunction with the extra link strategy or the secondary site strategy.

    In summary, by using this lightweight indexing strategy, only the content supplied by the title and meta elements, plus those elements that are directly located on the index page, is indexed. No links of type s:include are followed; therefore there is no requirement to deal with redirection. This is not a very full indexing scheme, but it is extremely simple to apply to your site.

    The Extra Link Strategy

    There are two main approaches to making a site fully indexable by search engines: the extra link strategy and the secondary site strategy. The extra link strategy is the easier of the two to implement and it can make the site entirely indexable by search engines, but it does not create a secondary site in normal HTML and is therefore not accessible to older browsers that are incompatible with BXML. The essence of this strategy is to create an extra link on the main SPI index page for each include file whose contents you wish to be indexed. Some experimentation has revealed that the extra links must be of the type:

        <a href="include1.html">include 1</a>



    The following points must be followed if you want Google to index these pages:

    The link must be made by an a element and the include file must be indicated by the href attribute.

    The include file must have the .html or .htm file extension. This is a bit of a workaround, since in reality include files aren't proper HTML files but are instead XML files. However, if you use a div element or a similar HTML element as the root tag, then all modern browsers will be able to read the file as if it were HTML and Google will index it. As far as the BPC (Backbase Presentation Client) is concerned, it merely stipulates that an include file should be well-formed XML and isn't interested in which file-type extension it uses. (A sketch of such an include file follows this list.)

    NB: The include files should not have an XML declaration or a document type definition, otherwise Internet Explorer will be unable to accept .html or .htm files as include files.

    The link tag must have some text content. Without this, Google will simply ignore it.

    No attempt should be made at using HTML to hide these links, since Google frowns on this and may not index such pages. You can, however, use BXML to remove or hide these links by way of a construct event handler attached to links such as:

        <a href="...">Left Panel</a>
        <a href="...">Right Panel</a>
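    A minimal sketch of such an include file (the file name and contents here are illustrative): it uses a div as its root element, has no XML declaration or document type definition, and is well-formed XML saved with a .html extension, e.g. include1.html.

        <div>
            <h2>Section heading</h2>
            <p>Content that should be indexed by the search engine.</p>
        </div>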

    It is not necessary to detect the user agent of the search bots (see the appendix at the end of this article for full details of this process), since they will simply follow the extra links that are provided for them. However, it is necessary to do some detection when these include files are being served up. This is tricky, since these include files can be requested in two different ways. When a user is directed to one of these pages through a search engine, they need to be redirected to the main index page. On the other hand, when the BPC requests these pages as include files, no redirection should occur. Because both search bots and the BPC ignore meta refresh tags, it is possible to solve this problem with such a tag. The meta refresh tag must be included directly inside the body of the include file. Even though these tags are normally placed inside the head element, they will still be executed anywhere in the body by all BXML-compatible browsers. Below is an example of such a meta refresh tag:
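    The following sketch assumes the SPI index page is named index.html; the URL should point at your own SPI index page:

        <meta http-equiv="refresh" content="0; url=index.html" />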

    Once the browser has been redirected to the SPI index page, this page must parse out the referrer and trigger an event handler, which will update the state of the SPI accordingly. This process of detecting deep linking and updating the page state is explained in much more detail in the appendix at the end of this document.

    In summary, the extra link strategy makes the whole site fully indexable. By adding extra link elements, search bots are able to index all pages of the site. However, since the URLs of the pages that get indexed point to include files, which aren't fully BXML-capable pages, it is necessary to redirect normal browsers back to the SPI version of the site and then update the state of this SPI accordingly.

    The Secondary Site Strategy

    The secondary site strategy is the most complete of all of the indexing strategies. It is also the most labor-intensive. The secondary site should be made out of plain HTML and contain a linked, multi-paged structure. Though this may seem laborious, having a secondary site to fall back upon makes your site available to people using older browsers that aren't supported by Backbase, to browsers on mobile devices and to users with disabilities. This gives you a chance to make your site accessible to all users, not just search engines.

    This strategy has three important components:

    1. Generating the secondary site's pages.

    2. User-agent detection of both the search bots and BXML-compatible browsers.

    3. Redirection of browsers and the detection of this redirection, which allows the state of the SPI to be updated to reflect the deep linking.

    Generating the Search Engine Accessible Pages

    The search engine accessible pages can be generated in several ways. It is possible to manually generate the secondary fall-back site. It is also possible to automate this process using XSLT.

    Manual Site Generation. This is a simple, low-tech solution, but it is also labor-intensive, since you have to build two versions of your web site. There is also a danger that, when you update your site with new information, you will forget to update the secondary pages. This will cause the two versions of the site to be out of sync with each other and the information found on search engines to be out of date.


    XSLT-Driven Generation. An alternative strategy, which is especially effective if you use a content management system (CMS), is to store all of the information, or at least the copy for your site, as plain XML. This can be in a format defined by yourself or by your CMS. This XML must then be transformed into BXML using an XSLT. A second, much simpler, XSLT is used to transform the XML into the secondary, search-engine accessible site. Although this approach requires a little more effort when you initially develop the site, once both XSLTs are ready, new content can easily be added to the XML data source and both versions will be generated automatically.
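    A minimal sketch of the second, simpler XSLT, assuming a hypothetical source format in which each page of copy is stored as a page element with title and content children:

        <?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html"/>
          <!-- Run once per source page to produce one plain HTML page
               for the secondary, search-engine accessible site -->
          <xsl:template match="/page">
            <html>
              <head>
                <title><xsl:value-of select="title"/></title>
              </head>
              <body>
                <h1><xsl:value-of select="title"/></h1>
                <xsl:copy-of select="content/node()"/>
              </body>
            </html>
          </xsl:template>
        </xsl:stylesheet>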

    User-Agent Detection

    A vital component of this two-site strategy is browser detection. Techniques that can be used for user-agent detection are discussed in the appendix at the end of the article. Once the user agent has been detected, it is necessary to make sure that BXML-compatible browsers get sent to the BXML site and that search bots and non-BXML-compatible browsers get sent to the accessible site.

    Deep Linking and Browser Redirection

    This section looks at an issue that arises from having a secondary, multi-paged version of your site that is indexed by search engines. The solution to this problem also immediately offers a solution to the issue of deep linking in an SPI. The issue boils down to the fact that a site with multiple pages is being used to represent a site that consists of only a single page. Let's take an example to illustrate this problem: a simple SPI whose main index page consists of a tabbed interface containing three different tabs. The contents of each of these tabs are stored in a separate include file and loaded into the SPI as and when they are required. Therefore, to make this site indexable by a search engine, an MPI version of this site would presumably have been made with one index page (e.g. index.html) and three separate HTML pages representing the include files for each of the tabs (e.g. tab1.html, tab2.html and tab3.html). Now, if a user's search term closely matched something indexed on the third tab, the search engine would point the user to tab3.html. However, in reality, you do not want your user to be directed to tab3.html. Instead, you want them to be sent to the index.html page of the SPI version of your site and, when this page is opened, the third tab, which correlates to tab3.html, should be selected.

    The full solution to this problem consists of two parts. Firstly, BXML-compatible browsers need to be redirected to the SPI version of the site. Secondly, the SPI version needs to detect that it has been redirected from one of these deep-linked pages and then update the state of the page accordingly, so that the information relevant to this link is shown.

    Browser Redirection. When one of the MPI pages intended for the search engine is requested, the user agent must be detected again. However, in this case, when a BXML-compatible browser is detected, it is redirected and not the search bot. The browser is sent to the index page of the SPI version of the site.

    Detecting Deep Linking. The BXML version of index.html needs to ascertain from which page it was referred. This must be done as soon as the page is loaded, so that the transition appears seamless to the user. Full details of how to detect deep linking and how to update the page state can be found in the appendix at the end of this article.

    In summary, the secondary site strategy makes the whole site fully indexable. Since the search bot is directed to a normal HTML site, all links can be followed by the search bot. However, since the URLs of the pages which get indexed when the links are followed point to non-BXML pages, it is necessary to redirect normal browsers back to the SPI version of the site and then update the state of this SPI accordingly.

    Ethics

    Google especially, and presumably other search engines as well, deeply frowns upon any attempt to unfairly manipulate search results. Any site that is caught willfully trying to manipulate Google will be banned from Google's index. Redirection to another site, with different content, based on the user agent is technically called cloaking and is frowned upon. Therefore, you should make sure that the information conveyed by any secondary web site, which has been set up with the intention of making your site indexable by Google and other search engines, is exactly the same as the information contained in your BXML site.


    Appendix

    User-Agent Detection

    A vital component of both the secondary site strategy and the extra link strategy is browser detection. The technical term for a web browser, a search robot or any other piece of software that approaches a web site is a user agent. When a user agent requests a particular page, it supplies details of itself by way of one of the HTTP headers that are sent along with the request. The Firefox browser, for instance, sends the following request header:

        User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1;
        en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

    It is therefore relatively straightforward to write a script which determines what the user agent is and then redirects the user agent to the appropriate version of the site. The most straightforward technique is not to try to find the search bots or other incompatible browsers, since this group is relatively large and hard to qualify. It is easier to determine whether the user agent is a BXML-compatible browser and then assume that, if the user agent isn't one of these, it is either a search bot or an incompatible browser. The following browsers are BXML compatible:

    Internet Explorer .0 and newer

    Mozilla 1.5 and newer

    Firefox 1.0 and newer

    Netscape . and newer

    User-agent detection can be done on the server using a PHP, ASP or JSP script. There are standard libraries which help take care of this. Alternatively, if you cannot or do not wish to use server-side scripts to determine the user agent, it is possible to do this in JavaScript. If you take this approach, you should be aware of the fact that search bots cannot be expected to execute any JavaScript. Therefore, if you are using the secondary site strategy in conjunction with JavaScript-based detection, the default page provided by the initial page request must be the non-BXML site, which is intended for the search engine bot. When you ascertain that the user agent is a BXML-compatible browser, JavaScript should redirect the browser to the BXML version of your site. The following code fragment shows a simple JavaScript function, which tests whether a BXML-compatible, Mozilla-based browser is in use and then redirects the browser based on this.

    function testUA(){
        var bCompatible = false;
        var sUA = window.navigator.userAgent;
        //Test if the User-Agent string contains
        //the string Gecko
        var iIOGecko = sUA.indexOf('Gecko');
        if (iIOGecko >= 0){
            //extract the string directly after rv:
            //and check its value
            var iIOrv = sUA.indexOf('rv:');
            var sRv = sUA.substr(iIOrv + 3, 3);
            if (parseFloat(sRv) >= 1.5)
                bCompatible = true;
        }
        //now if compatible redirect
        if (bCompatible)
            window.location.href = 'bxmlIndex.html';
    }

    This function is relatively straightforward, but certain parts may need explaining. Firstly, both Netscape and Firefox browsers use the same Gecko core as Mozilla does. They also have similar User-Agent strings. Therefore, the function above first searches for a Gecko sub-string, which all of their User-Agent strings will contain. Once this sub-string has been found, the function searches for the rv: sub-string. This is short for revision and is followed by the version number of the Gecko engine. If this number is 1.5 or higher, then the Gecko engine is BXML compatible. Therefore, this relatively simple function is able to test for all compatible Netscape, Firefox and Mozilla browsers.
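    In practice the function has to be invoked when the default, non-BXML page is loaded; one minimal way of doing this (a sketch, not the only option) is to call it from a script block on that page:

        <script type="text/javascript">
            testUA();   //redirect BXML-compatible browsers
        </script>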

    Obviously, it is also necessary to test for compatible versions of Internet Explorer too. This can be done in a similar way, but there is one added complication. All compatible versions of Internet Explorer have a User-Agent string that contains the sub-string MSIE, which is directly followed by the version number. Below is an example of such a header from an Internet Explorer browser:

        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
        Windows NT 5.1; SV1; .NET CLR 1.1.4322)

    However, unfortunately, Opera browsers have a very similar User-Agent string:

        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
        Windows NT 5.1; en) Opera 8.00

    Therefore, you must first test that the User-Agent string doesn't contain the Opera sub-string and, once this has been ascertained, simply parse out the version number which follows the MSIE sub-string.
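    A sketch of such a test, following the same pattern as the testUA function above (the minimum MSIE version checked here is an assumption, since the exact Internet Explorer version requirement is not given):

    function testUAIE(){
        var bCompatible = false;
        var sUA = window.navigator.userAgent;
        //Opera mimics the MSIE token, so rule it out first
        if (sUA.indexOf('Opera') < 0){
            var iIOMSIE = sUA.indexOf('MSIE');
            if (iIOMSIE >= 0){
                //extract the version number directly after "MSIE "
                var sVersion = sUA.substr(iIOMSIE + 5, 3);
                if (parseFloat(sVersion) >= 6.0)  //assumed minimum version
                    bCompatible = true;
            }
        }
        //now if compatible redirect
        if (bCompatible)
            window.location.href = 'bxmlIndex.html';
    }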


    Detecting Deep Linking and Updating the Page's State

    This section looks at how redirection based on deep linking can be detected, and then at how the state of a page can be updated using this information. Deep linking can be detected on the server by reading the Referer HTTP request header using a server-side script. Once the referrer has been read, an appropriate construct event handler must be created, which updates the initial state. Alternatively, if you do not have access to server-side scripting, you can use a JavaScript function to do this. The js action is a special BXML action, which is used to call JavaScript functions. The following behavior takes care of calling this function when the page is loaded:

    ... Other event handlers go here ...
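    (Outside of a BXML behavior, and without any guarantee that the BPC has finished constructing the page, the closest plain-JavaScript way of calling the function on page load would simply be window.onload = updateState;.)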

    The updateState function, which this action calls, then needs to parse out the referrer. Once this value has been found, the JavaScript function triggers an appropriate BXML event, thereby passing control back to the BPC. This is done by calling the execute method of the bpc object with a BXML string. A simple version of such a function looks like this:

    function updateState(){
        //first parse out the value of the referrer
        var sReferrer = document.referrer;
        //do a quick test to make sure that the referrer
        //is from the same host
        if (sReferrer.indexOf(window.location.hostname) >= 0){
            var iLastSlash = sReferrer.lastIndexOf('/');
            var sValue = sReferrer.substr(iLastSlash + 1);
            //trigger an event with the same name as the referring
            //page (sValue, e.g. "tab3.html") by building a BXML
            //command string; the exact BXML markup is omitted here
            var sExecute = '...';
            bpc.execute(sExecute);
        }
    }

    You should note that this is a very simplistic implementation of such a referrer-parsing function. For a more complicated web site structure, it is important that it is totally unambiguous which page the referrer was, otherwise mistakes can be made. For such cases, more complicated JavaScript will be required to verify this.

    Now, finally, let's look at an example of the type of event handler that could be triggered by such an updateState function:

    ... Other event handlers go here ...

    This behavior contains an event handler for the custom event tab3.html, which is triggered by the JavaScript function when redirection has occurred from the tab3.html page. All it does is perform a select action on a target with an id of tab3. If this corresponds to the appropriate tab, then simply by selecting this tab, the tab should be loaded and become visible.