bloorreport philiphoward sybaseiq 15.4 ar


  • 7/30/2019 BloorReport PhilipHoward SybaseIQ 15.4 AR

    1/17

    Sybase IQ 15.4

    An InDetail Paper by Bloor Research
    Author: Philip Howard
    Publish date: February 2012



    Sybase IQ is a columnar database that was designed to support data warehousing from its inception. This approach has now been copied many times and has been vindicated in organisations worldwide

    Philip Howard


    Executive summary

    Fast facts

    Sybase IQ is a column-based relational database that has been designed specifically for analytics and business intelligence applications. With this release the product has moved squarely into the big data market. However, it is clear that this is no recent decision and it is worth highlighting the developments that Sybase IQ has gone through in the last several releases that have enabled it to reach this point.

    In March 2009 Sybase IQ 15.0 was released. The main architectural feature of this version was that, for the first time, the product could support multiple write nodes as well as multiple read nodes. This release has formed the bedrock upon which subsequent releases have been built.

    In July 2009 version 15.1 appeared. It included support for in-database analytics for the first time.

    In June 2010 release 15.2 introduced text analytics capabilities.

    In June 2011 version 15.3 brought the PlexQ architecture. This is a shared everything MPP (massively parallel processing) architecture. Previously, you could parallelise a query within a node and you could have queries running in parallel on different nodes but now you can also have a single query distributed across multiple nodes. This architecture also supports the concept of logical servers (virtual data marts).

    The current release, introduced in November 2011, sees in-database support for MapReduce and R as well as federated and other support for environments where Hadoop is to be run alongside Sybase IQ.

    The key point to note is that the current release, version 15.4, would not have been feasible without the various features introduced in previous releases. This suggests that Sybase was thinking about big data, and planning how it was going to support it, long before it became the fashion that it is today.

    The benefits associated with Sybase IQ are predicated upon its column-based approach, its scalable grid architecture, and the performance benefits that it can offer, while requiring fewer hardware resources. This is especially true where queries are complex or require large table scans and, in the latter case, this has the knock-on advantage that you do not have to pre-aggregate data, which represents both a performance and a management saving when compared to traditional approaches to data warehousing. The reduced size of Sybase IQ data warehouses (along with other features of the product) also means that Sybase IQ has the potential to offer significant performance advantages when scaling for large numbers of users.

    In other words, Sybase aims to provide better performance with a lower total cost of ownership. Moreover, apart from the fact that data is stored by column, in all other respects Sybase IQ acts exactly like a conventional relational database. For instance, you use standard SQL, hardware and operating systems; database schemas are (or may be) the same, as are applications; and training requirements are similar.

    Key findings

    In the opinion of Bloor Research, the following represent the key facts of which prospective users should be aware:

    In addition to its column-based storage, Sybase IQ delivers a number of specialised indexes in order to further accelerate ad hoc query performance. These include indexes for low cardinality data, grouped data, range data, joined columns, real-time comparisons for Web applications, date, and time analysis. In addition, there are textual analysis indexes (providing analytics on unstructured data that may be combined with structured analysis).

    Sybase IQ provides a highly available solution. In particular, separate read and write nodes allow for procedures to be executed in parallel, without affecting one another. Read nodes are particularly useful for data aggregators offering multi-client analytics services because a node can be assigned to an individual account for later chargeback.

    The PlexQ architecture is augmented by the Sybase IQ optimiser, which will automatically recognise when a query will benefit from being distributed and to which nodes the query should be distributed.

    While it supports both normalised and star schema architectures, Sybase IQ also supports Rcube flat schemas that can provide major benefits when compared to conventional star schemas. In particular, Rcubes can significantly speed up implementation, as well as improve run-time performance and provide increased flexibility. In addition, Sybase IQ allows on-the-fly changes to schema attributes (columns); that is, you can add/delete columns in a table while the Sybase IQ server is up and running.

    Sybase provides column-based encryption capabilities as well as database-level encryption. This is particularly important for data aggregators with multi-client services where you want to be able to encrypt different customers' data using different algorithms. Encryption is supported for both data-at-rest and data-in-flight.

    With this release Sybase IQ supports in-database MapReduce, in-database R (an open source statistical programming language) processing and PMML (predictive modelling mark-up language), which is the industry standard for portable data mining models, so that you can score in-database.

    In-database analytics, which is supported by Sybase IQ, provides much better performance than traditional approaches to analytics. It is supported by means of user-defined functions that take full advantage of Sybase IQ's optimiser and parallel capabilities. The company has partnered with Fuzzy Logix, which provides a library with hundreds of analytic functions (especially financial functions) that exploit these capabilities. This library has now been implemented using Sybase IQ's native MapReduce API.

    Sybase IQ provides standard ODBC/JDBC/OLE-DB connections to its query engine, thereby enabling access from any standards-based front-end BI tool. Sybase IQ is certified to work with most industry leading tools such as SAP BusinessObjects, IBM Cognos, MicroStrategy, QlikView, iDashboards, SAS, SPSS, KXEN and others. As part of SAP, Sybase IQ has been optimised for use with SAP BusinessObjects.

    Sybase IQ offers optimised ETL (and ELT) capabilities as a separate add-on product enabling developers to quickly build and deploy their data sets for analysis on Sybase IQ. Alternatively, Sybase IQ is also certified to work with leading third party ETL tools and, again, there are specific optimisations built for ETL functionality inside SAP BusinessObjects Data Services. Sybase IQ also supports loading data directly from a client, bypassing the DBA and removing the need for him or her to intervene in the loading process. This is important for environments where, for security or confidentiality reasons, the DBA should not be able to see the data.

    For situations where operational data that you want to query cannot (for compliance or other reasons) be loaded into the data warehousing environment, Sybase IQ supports query federation with data held in Sybase ASE, Oracle, MySQL, SQL Anywhere, IBM DB2, and SQL Server environments. Federation (both data and query federation) may also be useful when using Sybase IQ in conjunction with Hadoop, and four different ways of integrating the two are supported.

    Sybase IQ does not (yet) support columnar storage for spatial data. The federation with SQL Anywhere is thus particularly important because it enables geo-spatial analytics within Sybase IQ, leveraging spatial indexes within SQL Anywhere.

    Sybase IQ can be loaded on a continuous (near) real-time basis using an infrastructure comprised of Sybase Replication Server (which has real-time loading capabilities), and a set of scripts generated by Sybase PowerDesigner. Sybase IQ can also be loaded on a real-time basis with event data via Sybase ESP (Event Streaming Platform). Simultaneous loading and querying is provided via Sybase IQ's versioning capability: a new version is created for the load process while the queries run on the older version until the new load is committed.

    Information lifecycle management, which supports the archival of data from front-line to near-line to historic storage, was introduced with Sybase IQ 15. The ability to formally build data retention rules is supported by means of PowerDesigner, a leading modelling and metadata management tool, and WorkSpace Data Analytics, which is an Eclipse-based development environment that supports database, reporting and analytic development in conjunction with Sybase IQ.
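    The versioning behaviour noted in the findings above (queries keep running on the older version until the new load is committed) can be pictured with a toy sketch. This is plain Python written for illustration; the class and method names are hypothetical and say nothing about how Sybase IQ implements versioning internally.

```python
class VersionedTable:
    """Toy snapshot versioning: readers always see the last committed version."""

    def __init__(self, rows=()):
        self._committed = tuple(rows)  # immutable committed version
        self._pending = None           # version being built by a load

    def begin_load(self):
        # The load works on its own copy, so readers are not disturbed.
        self._pending = list(self._committed)

    def load(self, new_rows):
        self._pending.extend(new_rows)

    def commit(self):
        # Publishing the new version is a single reference swap.
        self._committed = tuple(self._pending)
        self._pending = None

    def query(self):
        return self._committed


table = VersionedTable([("2012-01-01", 10)])
table.begin_load()
table.load([("2012-01-02", 12)])
during_load = table.query()   # still the old version
table.commit()
after_commit = table.query()  # now includes the loaded row
print(len(during_load), len(after_commit))
```

    The point of the sketch is only that a reader never observes a half-loaded table: it sees either the version before the load or the version after the commit.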


    The bottom line

    It is difficult to spot weaknesses in Sybase IQ, either from a technical perspective or from the point of view of marketing. Certainly, we would like to see spatial capabilities built into the database but the company is already addressing this issue. Other than that we are impressed with Sybase's support for big data, either in conjunction with Hadoop or directly, and we are pleased to see the implementation of both MapReduce and R within the database, as well as support for PMML. We expect the company to continue to grow its user base at a significant rate.


    The product

    The current version of Sybase IQ is 15.4. It is now ANSI SQL 2008 compliant. With this release the company has introduced an Express Edition, which is free for developers. Supporting products, including Sybase IQ InfoPrimer (formerly Sybase ETL), Sybase PowerDesigner, and Sybase Replication Server all leverage the features available within Sybase IQ. In the latest release of PowerDesigner there is a new feature that allows you to feed it with a warehousing workload and the software will automatically generate a bill of materials and appropriate configurations for Sybase IQ. This will be useful, not just in live environments but also for proofs of concept.

    In the case of Sybase IQ InfoPrimer, this is specifically optimised to be used in conjunction with Sybase IQ and only supports Sybase IQ as a target. For companies not using InfoPrimer, the company has introduced a specialised ODBC/JDBC bulk insert capability for large datasets. While still not as fast as using a native API this offers significantly improved performance compared to previous releases.

    The product runs under Windows, Linux (Red Hat and SuSE) and the leading UNIX operating systems from HP, IBM and Oracle. Language support for analytic developers includes Perl, Python, PHP, ADO.Net, OLE-DB, and Ruby on Rails, amongst others.

    Logical architecture

    Sybase IQ has been designed specifically for data warehousing. That is to say, it is not optimised for transaction processing and, therefore, it does not include the sort of facilities you would need for transaction processing, as opposed to data warehousing. This is important because the leading merchant databases offer both sets of capabilities, so Sybase IQ has a smaller footprint and is less complex than these offerings.

    Sybase IQ also differs from merchant databases in that it is a column-based relational database rather than a row-based relational database. The latter is required for transaction processing where individual records (rows) are constantly being inserted into the database and updated. Conversely, column-based databases have significant advantages when it comes to query processing because each column is, effectively, an index but without any of the overheads associated with defining and storing those indexes. Moreover, every column is indexed in this sense, something that would never normally be possible when using a row-based approach. That said, Sybase also supports a number of index types that you can optionally implement: these are discussed later.

    Another major advantage of a column-based approach is simply the amount of data that needs to be read for each query. Whenever you access data for a query from a conventional database, you read each row in its entirety, regardless of the actual fields that you are interested in for that specific query. In practice, this might mean reading a 3000 byte record to retrieve just 20 characters of data but by reading data on a columnar basis, you only have to read what is specifically needed for the query at hand. Of course the difference in performance when you are reading a single record will be negligible, but many queries require full table scans. Multiply that single read by a few million rows per table and the performance difference is very significant.
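    The arithmetic behind this comparison can be worked through in a few lines. The 3000-byte record and 20-byte field are the figures used above; the row count of five million is an assumed stand-in for "a few million rows", and the sketch deliberately ignores compression, page-level I/O and caching.

```python
# Illustrative arithmetic only. ROW_BYTES and FIELD_BYTES come from the text;
# ROWS is an assumption standing in for "a few million rows".
ROW_BYTES = 3000
FIELD_BYTES = 20
ROWS = 5_000_000


def bytes_read_row_store(rows: int, row_bytes: int) -> int:
    """A row store reads every row in full during a table scan."""
    return rows * row_bytes


def bytes_read_column_store(rows: int, field_bytes: int) -> int:
    """A column store reads only the one column the query needs."""
    return rows * field_bytes


row_total = bytes_read_row_store(ROWS, ROW_BYTES)
col_total = bytes_read_column_store(ROWS, FIELD_BYTES)
ratio = row_total / col_total

print(f"row store:    {row_total / 1e9:.1f} GB")
print(f"column store: {col_total / 1e9:.1f} GB")
print(f"reduction:    {ratio:.0f}x")
```

    Under these assumptions the scan touches 15 GB in the row-based case against 0.1 GB in the columnar case, a factor of 150, which is simply the ratio of record width to field width.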

    A further consequence of using a column-based approach is that you typically do not employ conventional horizontal partitioning, which is predicated upon a row-based approach. Instead, Sybase IQ implements vertical partitioning: partitioning by column rather than by row. One of the advantages of this approach is that partitions can never become unbalanced, since there will always be the same number of fields in each column of a table. This significantly reduces the maintenance requirement of managing partitions and should eliminate the database re-organisation that may become necessary when conventional partitions become unbalanced and start to impair performance.

    In addition, columns are easy to compress because you can have different algorithms for different datatypes. As a result, Sybase has had an historic advantage over the merchant databases in this area. This has been whittled away now, though Sybase IQ still has advantages. In particular, Sybase has enhanced its compression capabilities in this release by introducing the ability to remove spaces within fields (say, a 50 character string in a 200 byte address field) rather than just operating at the page level. Sybase claims that a Sybase IQ data warehouse will never exceed the size of the raw data: this is by no means the case with its merchant rivals.


    Physical architecture

    The architecture of Sybase IQ is illustrated in Figure 1. It is a shared everything massively parallel architecture, with every node connected to every other node in a full mesh provided by the interconnect. This reduces I/O issues and improves resilience. The only exception to the shared everything approach is that each node can have its own, local, temporary storage. The big advantage of offering shared everything, and shared disks in particular, is that you do not have to distribute your data across the various disks in use, thereby removing what can be a significant administrative headache.

    Each node in the Sybase IQ environment is designated as either a read/write node or a read only node. In the case of the former, each node can be flexibly designated as a read or a write node, as required. Thus, if you are running a large overnight batch update you might want all of your read/write nodes to operate as write nodes, but have them running as read nodes during the day. In addition, you can add new nodes as needed, dynamically, so that you can scale up incrementally.

    Nodes (servers) can be linked into a logical server, as shown. In addition, one logical server can loan nodes to other logical servers on a scheduled basis, for example to support overnight batch loading.

    This approach to logical servers supports mixed query workloads because you can assign particular queries to a logical server and that query can then only use the resources available to that logical server. How many logical servers, and the number of nodes within each group, is entirely up to you. A graphical administration tool (see later) is provided to support the creation of these logical groupings, add or remove nodes, designate read only or read/write nodes and so on.

    Each logical server can have its own logins so that specific users, departments or queries can always be directed to a designated logical server.

    When a query is received, the receiving node is designated as the leader node and, if the optimiser (see below) decides that this query can benefit from distribution, then the other nodes to be involved in processing this query are designated as worker nodes. Any node can be a leader or worker but only one node can be a leader for any particular query.

    Figure 1: Sybase IQ PlexQ Architecture

    There are a number of functions specifically to support high speed loading. The first is that the product supports pipeline parallelism so that where indexes have separate data structures (which apply to Word and High Group indexes: see later) these can be updated at the same time as data is being loaded.

    It was historically the case that, in many environments, the data was loaded to the server and then the database administrator pushed the data to the warehouse. However, this means that the DBA can see the data, which is not acceptable in many environments (for example, if the warehouse is outsourced), and many clients want to load the data directly. In order to support this, Sybase provides a load from client option that supports the loading of both data and LOBs (large objects) via DBLib.

    Another major feature is support for information lifecycle management (ILM). There are specific features to support different forms of storage for archival purposes: adding near-line and historical storage capabilities to active data. These can be designated as read-only stores for compliance purposes, if required, and, similarly, you can apply different security policies to each store. There are specific facilities provided to support time-based retention periods with data first being marked as read-only and then dropped.
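    A time-based retention rule of the kind described (data is first marked read-only, then dropped) reduces to comparing the age of the data against two thresholds. The sketch below is a minimal illustration in plain Python; the two periods and the function name are assumptions made for the example, not Sybase IQ settings.

```python
from datetime import date, timedelta

# Hypothetical retention periods, chosen only for illustration.
READ_ONLY_AFTER = timedelta(days=365)
DROP_AFTER = timedelta(days=7 * 365)


def retention_state(loaded_on: date, today: date) -> str:
    """Return the lifecycle state of data loaded on a given date."""
    age = today - loaded_on
    if age >= DROP_AFTER:
        return "drop"
    if age >= READ_ONLY_AFTER:
        return "read-only"
    return "active"


today = date(2012, 2, 1)
for loaded in (date(2011, 6, 1), date(2010, 1, 1), date(2004, 1, 1)):
    print(loaded, retention_state(loaded, today))
```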

    Another point to note about this architecture is the advantage that it offers to data aggregators and resellers (which represent a target market for Sybase), because it means that each subscriber can have its own read and read/write nodes, separate from anyone else, which obviously has beneficial security as well as chargeback implications; and it also allows you to define different service levels for different users. Further, Sybase IQ allows you to encrypt data on a column-by-column basis, which further reinforces this message. In fact, the product supports three levels of encryption: RSA and strong encryption (ECC) for data in flight plus RSA and RSA with FIPS 140-2 strong encryption for data at rest.

    Should any node fail, you can switch users or responsibilities to another node. Hot standby, failover and load-balancing are possible across nodes. These functions are not automated but are under the DBA's control, which allows the DBA to dynamically allocate resources based upon business needs. In addition, there is an OpenSwitch load balancing application available, if required, that operates at the application server level. Sybase IQ InfoPrimer also has load-balancing capabilities for Sybase IQ data loading tasks. It is further worth noting the company's partnerships with a number of storage hardware vendors to further ensure high availability and disaster recovery. There is support for range partitioning and you can partition, re-partition, join, rename, split and drop table partitions as required.

    There is also a NonStopIQ HA-DR methodology, which typically employs a local SAN and a remote one, with either synchronous or asynchronous communications between them. The big advantage of this is not just that it provides disaster recovery but also that it eliminates the need to take the system down, even for planned outages. Note that as more and more companies adopt operational BI and embed query capability into operational applications then the warehouse increasingly becomes as mission-critical as those applications, for which you need a solution such as this.

    There is also support for query federation. This is intended for environments where operational data cannot be moved or copied from its source systems for compliance or other reasons but which you may want to include in queries or reports. In this sort of environment, the amount of data to be sourced from the operational system is typically small so near real-time support can be achieved. The query federation technology supports Sybase ASE, Oracle and Microsoft SQL Server.

    One element of the physical architecture that is not illustrated above is that the product now has a built-in web server to enable participation in web services.

    Query performance

    The Sybase IQ optimiser knows about and will leverage the MPP-based capabilities of the product. With a massively parallel architecture you have the ability to distribute queries across nodes (within a logical server if you have one).

    So the first thing that the optimiser does is to determine whether the query will benefit from this sort of parallelism. Not all queries do. For example, if the query is I/O bound then extra processing capacity may make an insignificant impact on query performance. Similarly, the overhead involved in distributing a query may be deleterious for, say, a short running query. As another example, if a query cannot make use of all the resources available on a single server then, again, it will probably not make sense to distribute the query. The optimiser may decide that no part of a particular query should be distributed, that the whole query could usefully be distributed or that a part or parts (fragments) can be distributed.
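    The three situations listed above in which distribution does not pay off can be restated as a simple rule-of-thumb check. This is a sketch for illustration only: the field names and thresholds are hypothetical, it treats a query as a single unit rather than as fragments, and it is not a description of the actual Sybase IQ optimiser.

```python
from dataclasses import dataclass


@dataclass
class QueryEstimate:
    est_runtime_s: float    # estimated runtime on a single node
    io_bound: bool          # True if I/O, not CPU, is the bottleneck
    cpu_utilisation: float  # fraction of one node's CPU the query can use


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_RUNTIME_S = 5.0
MIN_CPU_UTILISATION = 0.9


def should_distribute(q: QueryEstimate) -> bool:
    """Apply the three rules of thumb from the text."""
    if q.io_bound:
        return False  # extra processing capacity would not help
    if q.est_runtime_s < MIN_RUNTIME_S:
        return False  # distribution overhead outweighs the gain
    if q.cpu_utilisation < MIN_CPU_UTILISATION:
        return False  # cannot even saturate a single server
    return True


short_query = QueryEstimate(est_runtime_s=0.4, io_bound=False, cpu_utilisation=1.0)
big_scan = QueryEstimate(est_runtime_s=120.0, io_bound=False, cpu_utilisation=1.0)
print(should_distribute(short_query), should_distribute(big_scan))
```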

    Within each node that is executing a query, threads are allocated dynamically with threads added or removed as the query executes. Threads are scaled up or down according to workload and resource availability. As noted previously, physical servers can be dynamically allocated to a logical grouping.

    Other notable features include the ability to determine how much parallelism to apply to which tasks and sub-query correlations, optimised use of temp space, operations that can run directly off compressed data without requiring decompression, and concurrent workload management.

    Indexes

    Although every column is, in effect, its own index, there are substantial advantages to using specific indexes in a number of situations. This is one area where Sybase has a major advantage over appliance vendors. Indeed, one of the strengths of Sybase IQ is its indexing capabilities. As Sybase customers discover new needs for analysis, Sybase can simply create new index types to meet those needs. The beauty of this approach is that new indexes can be added to the data warehouse with little, if any, impact on the data warehouse architecture or the analytical applications using the warehouse. Sybase IQ offers a number of different indexing techniques:

    Low Fast indexes: these are low cardinality indexes (typically used for fields that have fewer than 1,500 unique values) that use a process known as tokenisation. Using this process, non-integer data is converted into a token (an integer; an existing integer becomes its own token) and then the tokens are stored rather than the data. This is particularly useful for reducing the quantity of redundant data and saving on disk space. Once the tokens are established (an automated process), a bitmapped index is created to reference these tokens.

    Bit-Wise indexes: for high cardinality fields, where the number of possible values exceeds 1,500 (for example, monetary values) Sybase IQ uses a patented technology known as Bit-Wise indexing. This is particularly useful where you want to combine calculations with range searches, for example to find the total revenue and number of units sold where the price was less than 50.

    High Group indexes: these are, in fact, B-trees. However, the principle here is that the user only defines these indexes when several columns are likely to be used in a group, in particular to combine low and high-cardinality searches. An example here might be an inquiry about product item sales and value (high cardinality) by store (low cardinality). High Group indexes are multi-threaded.

    Fast Projection indexes: the default index is simply the column store itself. If a user always plans to retrieve an entire column of data, then the fact that storage is columnar means the column can be projected into a report or inquiry without having to explicitly define any index at all. This is useful, for example, in WHERE clauses.

    Text indexes: these support full text search. The text index stores complete positional information for every instance of every term in the indexed columns. Some of the functions that are possible with the text index are discussed in the next section.

    Compare indexes: this indexing technique allows data column comparisons that are effectively equivalent to an if then else statement. For example, if expenses are greater than revenue, then ... This type of index is particularly useful for real-time comparisons in web applications.

    Join indexes: as the name implies, these are designed to obviate the need for table joins. Like a number of the supported indexes, these will be most useful when query requirements can be predicted in advance. The product supports parallelism for both column scans and joins.

    Time Analytic indexes: these offer the option to create indexes based on a date, time, or date and time. It should be noted that time-based queries tend to be particularly difficult for conventional relational databases to handle. They are also increasingly important for certain sorts of applications such as those involving smart meters, for example.
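    The tokenisation and bitmapping described for Low Fast indexes in the list above can be illustrated with a small sketch. It is a generic dictionary-encoding and bitmap example in plain Python, not Sybase IQ's internal format: the function names are hypothetical, tokens are assigned densely in order of first appearance (whereas the text notes that an existing integer becomes its own token), and each bitmap is held as a Python integer.

```python
from collections import defaultdict


def build_low_fast_index(values):
    """Tokenise a low-cardinality column and build one bitmap per token.

    Returns (dictionary, tokens, bitmaps). Each bitmap is an int whose
    bit i is set when row i holds that token.
    """
    dictionary = {}             # value -> integer token
    tokens = []                 # the column stored as tokens
    bitmaps = defaultdict(int)  # token -> bitmap
    for row, value in enumerate(values):
        token = dictionary.setdefault(value, len(dictionary))
        tokens.append(token)
        bitmaps[token] |= 1 << row
    return dictionary, tokens, dict(bitmaps)


def rows_matching(bitmap):
    """Decode a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


stores = ["Leeds", "York", "Leeds", "Bath", "York", "Leeds"]
dictionary, tokens, bitmaps = build_low_fast_index(stores)

# store = 'Leeds' OR store = 'Bath' becomes a bitwise OR of two bitmaps.
hits = bitmaps[dictionary["Leeds"]] | bitmaps[dictionary["Bath"]]
print(tokens)
print(rows_matching(hits))
```

    Storing small integer tokens in place of repeated strings is what saves the disk space, and combining predicates becomes a bitwise operation on the bitmaps rather than a scan of the data.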

    A number of extended facilities are supported to allow the use of these indexes in a variety of circumstances. These include index compression to reduce disk (or memory: bitmaps may be cached) requirements, the ability to use different types of index in combination, pipeline parallelism for GROUP BY and ORDER BY as well as hash and merge joins, and the facility to filter bit arrays using Boolean operators. These features mean that the indexing in Sybase IQ overcomes a number of the traditional drawbacks of bitmapping, namely, that it is not suitable for joining tables or aggregating data. While on this topic it is also worth noting that while Sybase IQ is fast enough that you do not need to pre-aggregate data for OLAP-based processing (which is a significant advantage in administrative terms) Sybase IQ does support OLAP capabilities with features such as rankings, partition windows, percentiles and averaging. It is also noteworthy that Sybase IQ includes an Index Advisor that will advise administrators as to when it would be useful to add a new index and of what type.

    Other features provided to improve performance include predicate pushdown and sub-query optimisations (correlated sub-queries, sub-query disjunction and automatic query flattening).

    In-database analytics

    In conventional environments, data mining functions have to be performed outside the database: the data is extracted from the warehouse and then processed by the relevant software in a conventional manner. The problems with this approach are twofold. First, performance is degraded because there is an extraction process and because the application server will not have the same sort of parallel capabilities as provided in the analytics server. Second, in order to offset this performance loss, analysts performing data mining will typically only sample the data. While this can offset performance consequences to some degree, it means that accuracy is sacrificed. In particular, population outliers can easily be missed.

    In-database analytics resolves both of these problems: because mining algorithms are run inside the database you are not limited to sampling the data, thereby maximising accuracy and, for the same reason, you can take advantage of parallelism and other characteristics of the analytics server in order to optimise performance.

    Sybase has implemented in-database analytics by means of user-defined functions (UDFs) that are executed as SQL functions. When this facility was first introduced the company only made it available to specific partners such as Fuzzy Logix. This was because Sybase was concerned about the development of UDFs that might adversely impact on implementations. As a safeguard against this happening, with this release it has introduced a UDF simulator that allows you to test UDFs without actually impacting on the live warehousing environment.

    Fuzzy Logix provides DB Lytix for Sybase IQ, which is available on Linux, Solaris, Windows and AIX, with support for algorithms including neural networks, k-means clustering, Monte Carlo simulation, linear and logistic regressions, and so on. Details can be found on the Sybase website at http://www.sybase.com/detail?id=1065214. In conjunction with the latest release, Fuzzy Logix has re-written the functions it provides to make use of the in-database deployment of MapReduce (see next) that Sybase now offers.

    In the case of R, the open source statistical language, you can either run an R job that accesses Sybase data or you can run a SQL query that fires off an R process. In the latter case the user does not have to know anything about the R functionality that is being used.

    Finally, Sybase IQ now supports in-database scoring of data mining models, regardless of whether these have been developed within the database or elsewhere. This is achieved through support for PMML (predictive modelling mark-up language), which is the industry standard for model portability. The company has not yet implemented PMML support in Sybase ESP, which we would like to see: it would enable such things as real-time fraud detection.

    Going beyond structured analytics, Sybase supports significant text analytics capabilities, which also operate in-database. These include proximity search, support for Boolean operators, searching for phrases as well as terms, and scoring both within and across documents. There are also plug-in APIs for multi-media analysis functions and libraries.


    Figure 2: Methods for integrating Sybase with Hadoop

    Big Data

    With in-database support for both data and text analytics, as well as the availability of Time Analytic indexes (useful for applications involving smart meters, for example), Sybase IQ was already positioned, prior to this release, to exploit big data analytics once relevant additional features were added. With the introduction of in-database native MapReduce (which includes the provision of pre-certified data mining functions) you can now run relevant analytics against big data directly within a logical server. This operates in one of two ways: either as an in-process UDF running in the database kernel or as an out-of-process function (outside the kernel but still in-database) with its own memory space. The latter can support MapReduce tasks that have been written externally (in Hadoop, for example). These do not typically offer fault tolerant capabilities when run within a Hadoop environment and by bringing them inside the Sybase IQ database you get that as an automatic benefit.

    A major feature of the native MapReduce implementation is that (for in-process UDFs) it effectively makes MapReduce declarative rather than procedural. That is, you specify what has to be done but not how you have to do it. This is achieved because you can embed MapReduce code within a SQL statement. Moreover, you can specify what partitions you would like to work with and then the software will automatically assign processing according to the scale of the task (based on the disjoint sets being used).
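
The declarative flavour described here can be sketched in a few lines of Python. This is an analogy, not Sybase IQ syntax: the caller states the partitioning column and the map and reduce functions, and a toy engine (the hypothetical `run_map_reduce` below) decides how the disjoint partitions are processed. The smart-meter readings are invented sample data.

```python
from collections import defaultdict
from functools import reduce


def run_map_reduce(rows, partition_key, mapper, reducer):
    """Group rows into disjoint partitions, then map and reduce each one.

    The caller only declares the partitioning column and the two functions;
    how partitions are scheduled is left to this (trivial, sequential) engine.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    return {
        key: reduce(reducer, (mapper(row) for row in part))
        for key, part in partitions.items()
    }


# Invented smart-meter readings.
readings = [
    {"meter_id": "m1", "kwh": 1.5},
    {"meter_id": "m2", "kwh": 0.5},
    {"meter_id": "m1", "kwh": 2.0},
    {"meter_id": "m2", "kwh": 1.0},
]

totals = run_map_reduce(
    readings,
    partition_key="meter_id",
    mapper=lambda row: row["kwh"],
    reducer=lambda left, right: left + right,
)
```

Because the partitions are disjoint, a real engine is free to run them on as many threads or nodes as the scale of the task warrants without the caller changing anything.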

    However, it will often be the case that you want to deploy Hadoop for your haystacks and use it in conjunction with Sybase IQ for your needles. This is because Hadoop is much less costly. However, it is not suitable (or at least not yet) where compliance or security are issues, or where exceptional performance is required or if you need to perform ad hoc queries. So there will often be a requirement to use Hadoop in conjunction with your data warehouse, for example, where you want information from Hadoop to participate in ad hoc queries, even though those are not supported by Hadoop per se. In order to enable this, Sybase offers four different ways of integrating with Hadoop, as illustrated in Figure 2.

    The first method illustrated shows results from Sybase IQ and Hadoop being collated by the Toad product from Quest Software (a partner of Sybase).

    The second method involves straightforward ETL processes, moving HDFS subsets (Sybase does not currently support other stores such as GPFS) into Sybase IQ for analysis. Sqoop is shown here but you could use the data integration tool of your choice, though it may not have been optimised for use with Sybase IQ.

    Thirdly, you can federate data from Hadoop into Sybase IQ. Here, HDFS subsets are materialised in Sybase IQ as in-memory virtual tables. This process can be triggered on the fly as a part of a query.

    Finally, you can federate Hadoop processes (jobs) into Sybase IQ whereby you trigger MapReduce tasks in both Sybase IQ and Hadoop and then join the results on the fly.
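
The last of these methods boils down to joining two independently produced result sets at query time. The Python sketch below illustrates that step with an in-memory hash join. The two lists are stand-ins for job output, and every name in it is hypothetical: it does not show how Sybase IQ or Hadoop actually exchange data.

```python
def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key` using an in-memory hash table."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Stand-ins for the output of two independently run jobs (invented data).
warehouse_result = [
    {"customer_id": 1, "lifetime_value": 1200},
    {"customer_id": 2, "lifetime_value": 300},
]
hadoop_result = [
    {"customer_id": 1, "clicks_last_week": 42},
    {"customer_id": 3, "clicks_last_week": 7},
]

combined = hash_join(warehouse_result, hadoop_result, "customer_id")
```

Only customer 1 appears in both inputs, so the joined result contains a single row carrying attributes from each side.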

    Database operations

    Sybase IQ includes a SQL API that allows SQL-based access. This is SQL-2008 compliant and is the same SQL that is used in Sybase Adaptive Server Anywhere and (with a few exceptions) is also compatible with the syntax employed in Sybase ASE (that is, T-SQL) so that Sybase IQ can natively use most Sybase ASE stored procedures. In this context it is also worth noting that both Sybase IQ and Sybase ASE have the same look and feel. Within the product, Sybase IQ includes a graphical SQL Editor.

    Sybase IQ supports both ODBC and JDBC (2.0) call-level interfaces. Alternatively, Sybase IQ also provides Java 2 capability, and this language can be used for writing stored procedures and for creating user-defined functions. However, Java objects are not supported in the database.

    There is also support for XML, including the ability to store and retrieve XML documents as well as the ability to export query results in XML format (with an embedded DTD). In addition, it is important to appreciate the web services functionality that is available in Sybase IQ. There is an HTTP(S) web server built directly into the database, which supports the retrieval of data in XML format as well as via standards such as SOAP. There is also direct integration with Microsoft Visual Studio .NET via an ADO.NET provider, as well as drivers for major Web languages such as PHP, Python, and Ruby.

    Database administration

    Sybase IQ supports conventional relational schemas, including the normalised schemas used for transaction processing as well as the star, snowflake, and constellation (a collection of stars) schemas that are used in data warehousing. In addition, the product also supports flat schemas (known as Rcubes) that have a number of advantages such as fewer tables (therefore reducing the number of joins and thereby improving performance), reduced complexity and greater manageability.
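
The trade-off between a star schema and a flat schema can be seen in a small, self-contained example. The sketch below uses Python's built-in SQLite module purely as a convenient stand-in (it is not Sybase IQ, and the table and column names are invented): the same aggregate is computed once with a join against a dimension table and once against a single flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: one fact table plus a dimension table.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, amount INTEGER);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
    INSERT INTO fact_sales VALUES (1, 10), (1, 15), (2, 7);

    -- Flat schema: the dimension attribute is stored alongside each fact.
    CREATE TABLE flat_sales (product_id INTEGER, category TEXT, amount INTEGER);
    INSERT INTO flat_sales VALUES (1, 'books', 10), (1, 'books', 15), (2, 'music', 7);
""")

# The star-schema query needs a join to reach the category attribute.
star_totals = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# The flat-schema query reads a single table and needs no join.
flat_totals = conn.execute("""
    SELECT category, SUM(amount)
    FROM flat_sales
    GROUP BY category
    ORDER BY category
""").fetchall()

conn.close()
```

Both queries return the same totals. The flat version simply avoids the join, at the price of repeating the category value on every row, a cost that column-based storage is generally well placed to absorb.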

    As far as the actual process of administration is concerned, this is provided through Sybase Central, illustrated in Figure 3, here showing the topology of the environment.

    Figure 3: Sybase Central

    Sybase Central supports one-click cluster management; in-flight maintenance operations (including adding columns on the fly); and graphical displays for CPU, thread utilisation and timing, in order to support problem resolution. This is also where you define and manage both resource management and security. A web-enabled companion to Sybase Central is the Sybase Control Center for Sybase IQ that helps monitor the Sybase IQ infrastructure remotely using web interfaces. An extensive library of system metrics, along with historical metrics records, can be displayed in a rich graphical format for ease of use. This is to be extended to incorporate the functionality of Sybase Central but for the time being both products remain available.

    In terms of security the product provides support for Kerberos authentication, which enables common user IDs and passwords (user settable) to be used across the database and operating system environments. LDAP is not directly supported at this time.

    Further, role-based security not only applies to users but also administrators, so that you can have different roles for system monitoring, login and permission management, backup and restore administration, and multiplex grid management. This segregation of duties is important where you have multi-domain data warehouses.
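
Segregation of administrative duties of this kind can be pictured as a simple mapping from roles to permitted actions. The Python sketch below is illustrative only: the role and permission names are hypothetical, loosely following the four duties listed above, and are not Sybase IQ's actual role names.

```python
# Hypothetical role and permission names, loosely following the four
# administrative duties listed in the text. Not Sybase IQ's real roles.
ROLE_PERMISSIONS = {
    "monitor": {"view_metrics"},
    "security_admin": {"manage_logins", "grant_permissions"},
    "backup_admin": {"backup", "restore"},
    "multiplex_admin": {"add_node", "remove_node"},
}


def is_allowed(user_roles, action):
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


# An administrator who may monitor and run backups but not manage security.
alice = ["monitor", "backup_admin"]
can_restore = is_allowed(alice, "restore")
can_grant = is_allowed(alice, "grant_permissions")
```

Here the administrator can restore a backup but cannot grant permissions, which is the separation the text describes.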

    The vendor

    Sybase IQ is based on technology that Sybase (which is an SAP company) acquired when it purchased Expressway in 1995. It has always had an emphasis on scaling incrementally all the way up to very large data warehouses (VLDW). In this context it is worth noting Sybase's 2007 implementation (using Sybase IQ 12.x) of a reference one petabyte (1,000 TB) VLDW in conjunction with Sun (now Oracle) and BMMsoft, which was the first independently audited, petabyte scale warehouse.

    There are now more than 4,500 Sybase IQ installations worldwide in more than 2,000 organisations. It is also particularly pertinent to note that a significant percentage of the product's sales have been to organisations that do not use Sybase ASE (the company's flagship transactional database). While this has always been the case we would expect to see an acceleration in this trend now that Sybase is an SAP company. Indeed, while Sybase previously targeted IQ at very specific markets, the presence of SAP, combined with Business Objects and the product's own new features, means that Sybase IQ is now much more broadly targeted from a functional perspective: on marketing (focusing on digital channels), sales (focusing on correlation across media types), operational (focusing on machine data) and financial (focusing on simulation) analytics. Target sectors include banking, insurance, capital markets, telecommunications, retail, healthcare and, to a lesser extent, government.

    More generally, and with respect to SAP's other offerings, Sybase IQ is targeted at environments where users have heterogeneous or non-SAP application environments. Where SAP is the dominant provider of applications these accounts will be targeted by SAP BW and SAP's in-memory technology product for data management: HANA.

    Sybase has entered into a number of partnerships, focused on Sybase IQ, with vendors that include specialists in hardware, storage, data quality, business intelligence and other areas, as well as various VARs and system integrators. Business Objects, as a fellow subsidiary of SAP, is a close partner.

    Sybase IQ Web address: www.sybase.com/bi.

    Summary

    Sybase IQ is a columnar database that was designed to support data warehousing from its inception. This approach has now been copied many times and has been vindicated in organisations worldwide: its benefits have been well rehearsed and do not need repeating here. However, it is worth bearing in mind that Sybase IQ is, and always has been, the leading column-based vendor in the market. It now has the benefit of the reach and scale of its parent SAP as well as, in this release, an extensive big data capability. This leaves Sybase well positioned to continue to capture market share.

    Further Information

    Further information about this subject is available from

    http://www.BloorResearch.com/update/2119

    Bloor Research overview

    Bloor Research is one of Europe's leading IT research, analysis and consultancy organisations. We explain how to bring greater Agility to corporate IT systems through the effective governance, management and leverage of Information. We have built a reputation for telling the right story with independent, intelligent, well-articulated communications content and publications on all aspects of the ICT industry. We believe the objective of telling the right story is to:

    Describe the technology in context to its business value and the other systems and processes it interacts with.

    Understand how new and innovative technologies fit in with existing ICT investments.

    Look at the whole market and explain all the solutions available and how they can be more effectively evaluated.

    Filter noise and make it easier to find the additional information or news that supports both investment and implementation.

    Ensure all our content is available throughthe most appropriate channel.

    Founded in 1989, we have spent over two decades distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services, events and consultancy projects. We are committed to turning our knowledge into business value for you.

    About the author

    Philip Howard
    Research Director - Data Management

    Philip started in the computer industry way back in 1973 and has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.

    After a quarter of a century of not being his own boss Philip set up his own company in 1992 and his first client was Bloor Research (then ButlerBloor), with Philip working for the company as an associate analyst. His relationship with Bloor Research has continued since that time and he is now Research Director focused on Data Management.

    Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.

    In addition to the numerous reports Philip has written on behalf of Bloor Research, Philip also contributes regularly to IT-Director.com and IT-Analysis.com and was previously editor of both Application Development News and Operating System News on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and written a number of reports published by companies such as CMI and The Financial Times. Philip speaks regularly at conferences and other events throughout Europe and North America.

    Away from work, Philip's primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master), dining out and walking Benji the dog.

    2nd Floor, 145-157 St John Street

    LONDON, EC1V 4PY, United Kingdom

    Tel: +44 (0)207 043 9750

    Fax: +44 (0)207 043 9748

    Web: www.BloorResearch.com

    email: [email protected]
