nosql and hadoop: a new generation of databases - changing the game: monthly technology briefs


Changing the Game:Monthly Technology Briefs

December 2011

the way we see it

NoSQL and Hadoop: A New Generation of Databases

Read the Capgemini Chief Technology Officers’ Blog at www.capgemini.com/ctoblog


NoSQL and Hadoop: A New Generation of Databases

Oracle has recently joined IBM in offering alternatives to established players and market leading relational databases. Oracle is keen on adding a NoSQL database with the comment: “Oracle NoSQL Database is a key component of Oracle’s big data strategy.” Meanwhile, IBM positions the use of the NoSQL Database Apache Hadoop as the answer to holding data in their SmartCloud approach. Both are aiming for a share in what Gartner claims will be the huge rise in unstructured data from services on Clouds and the web as opposed to structured data from traditional Enterprise Applications. The analyst believes there will be a 650% growth in stored data by 2014 and that 85% of that data will be unstructured. For the same reason, Microsoft, EMC, Google and Amazon are all supporters of the NoSQL movement, and have mostly chosen Apache Hadoop as their preferred NoSQL database.

The industry standard, the relational database (RD), has more than 40 years of development behind it, which has made it an effective mechanism for storing data and finding it through Structured Query Language (SQL). Today, although it cannot accommodate every data storage requirement, the RD dominates the support of traditional applications from major vendors. However, it has limitations in both scale and performance, particularly in supporting ‘queries’ as opposed to the structured processing of Online Transaction Processing (OLTP). These limitations became visible in Data Warehouses, where queries across the whole of the stored data would be very slow, and so Data Marts, which divide the data into subsets that share similar characteristics, were devised.

This is why the RD is not the correct approach for unstructured data that can only be accessed through queries. In addition, its centralized deployment model and its difficulty in scaling are problematic in genuine cloud storage systems, as is its need to have a data model defined before any data can be stored. That is not to say that images of SQL databases cannot be run on virtual machines of the type that EMC, Microsoft, Amazon and Google provide online. They can be and are, but their use is limited by their scalability and long-term suitability. By contrast, NoSQL databases such as Apache Hadoop, CouchDB, and MongoDB are already being offered as cloud services to overcome this, as they can run natively on virtualized machines, scale up and down readily, and support heavy read/write query-based activity.
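The point about needing a data model before data can be stored can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a relational database and a plain list of dictionaries as a hypothetical stand-in for a document store; the table, field names and the find helper are invented for illustration and are not the API of CouchDB, MongoDB or any other product named above.

```python
import sqlite3

# Relational side: the table's columns must be declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")


def insert_with_unknown_field(connection):
    """Try to store a field the schema does not know; return the error text."""
    try:
        connection.execute(
            "INSERT INTO customers (id, name, twitter) VALUES (2, 'Bob', '@bob')"
        )
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


schema_error = insert_with_unknown_field(conn)

# Document side (hypothetical in-memory store): each record carries its own fields,
# so differently shaped records can sit side by side without a schema change.
documents = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob", "twitter": "@bob"},
    {"id": 3, "source": "web-log", "text": "clicked on offer"},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given key/value pair."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


bob = find(documents, name="Bob")
web_events = find(documents, source="web-log")
```

The relational insert fails until the table is altered, whereas the document list accepts the new field immediately; the trade-off is that every query has to tolerate records that lack a given field.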

The concept of NoSQL (or Not Only SQL) moved into the limelight in 2009 after some years of quiet development in response to the growing need to handle the increasing amount of unstructured data from web-based activities. The name was used to describe a number of new types of open source, non-relational databases and to indicate that SQL was not used to access them. At the time, others thought the significance was that these databases did not possess the standard features of a relational database, known as Atomicity, Consistency, Isolation and Durability (ACID for short), and therefore should be referred to as Non Relational (NonREL). In time, the term BASE, standing for Basically Available, Soft-state and Eventually consistent, was coined to describe the principal values of NoSQL. As the importance and use of NoSQL databases have grown in connection with cloud-based storage, so has the need for a unified and standardized query language for use on NoSQL databases. In 2011, work started under an open source project on Unstructured Query Language (UNSQL) though, ironically, it is seen as a superset of SQL.


Hadoop, or more accurately Apache Hadoop, is a major open source project to build a full-scale NoSQL database that can meet the coming years’ challenges of unstructured storage and distributed, huge-scale operations in an online cloud environment. The name Hadoop is taken from a soft toy belonging to the son of Doug Cutting, one of its key creators. It has two core inspirations: Google’s MapReduce, which demonstrates huge scalability but is not particularly fast, and the Google File System. Yahoo, one of the first and largest users of Hadoop, has also made a huge contribution to the project.

Hadoop was designed from the start as a fully distributed architecture, drawing on Google’s experience with MapReduce, an already proven approach to running work across many computers in parallel, and using Java to make it easier to integrate with existing services and APIs. A frequently quoted example of the power of NoSQL and Hadoop is the New York Times, which used Hadoop on multiple Amazon Elastic Compute Cloud (EC2) instances to convert 4 terabytes of Tagged Image File Format (TIFF) data into 11 million PDFs in 24 hours at a cost of $240.
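The MapReduce approach mentioned above can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps using a word count, written in Python for brevity; it is not Hadoop's Java API, and the function names and sample input are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by key across every mapper's output."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each map_phase call depends only on its own chunk, so a framework is free
    # to run the calls on separate machines; here they simply run one by one.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


chunks = ["big data big clusters", "data in the cloud", "big cloud"]
counts = word_count(chunks)
```

Because no map call needs to see another chunk, adding machines lets more chunks be processed at once, which is the scalability property the New York Times example relied on.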

The impact of this and other technologies is discussed in the Capgemini CTO Blog.


Movements by Industry Leaders

Cisco Telepresence Callway Service allows the buying/leasing of end-point Telepresence sets and has added a new MX300 unit offering high quality video at low frame rates. Cisco Jabber Video for Telepresence supports smartphones, tablets, and PCs, offering a video call from any of the devices. www.cisco.com

Oracle is acquiring RightNow Technologies to bring its Customer Experience Cloud Platform under the Oracle Public Cloud offerings. Oracle NoSQL Database for Big Data Enterprise Edition handles large amounts of unstructured data and is part of the Oracle Big Data Appliance system with an open source version promised to follow. Oracle SOA Suite for Healthcare supports specific sector messaging and data standards for ease of integration. Oracle Enhanced Identity Analytics part of Fusion Middleware 11g and Oracle Identity Management 11g simplifies understanding user work patterns and permissions. Oracle Solaris 11 sparc and x86 versions is aimed at being able to manage increased virtual machines and networks to increase its ability to work as a cloud platform. Oracle PeopleSoft HCM 9.1 feature pack II adds new employee self-service capabilities. Oracle Communications Service Controller and Policy Server for service providers aids the control of data and customer experiences. Oracle Agile Product Life Cycle Management (PLM) for Process 6.1 adds a new user interface, mobility and a library of processes and plug-ins. www.oracle.com

HP will keep its PC business and strategically invest in it, announced CEO Meg Whitman. HP Slate 2 Tablet running Windows 7 offers WiFi and 3G connectivity and a ‘stylus’ for translating handwritten notes. HP ZR Series Performance displays and HP DreamColor LP2480zx add new specialist workstations for graphic artists and media. HP Client Virtualization, Analysis and Modeling Service specializes in virtualizing Microsoft Windows 7 environments. HP Proliant Server Range has new high end units claimed to increase performance by up to 35% using new AMD Opteron 6200 chips. HP Folio 13 UltraBook is a new range for business use featuring up to nine hours of battery life plus low weight and a thin form. www.hp.com

Intel Solid State Drives (SSD) Toolbox 3.0 provides a new ease of use and management tool for administering SSDs. A new generation of Intel Wireless Display TV adaptors (WiDi) makes streaming video from the PC to the TV even easier and cheaper. Intel Cluster Studio XE Tools Suite is claimed as the first developer tool kit to allow high performance clusters and Intel high performance computing to be tuned to applications. Intel Cloud Access 360 provides new cloud sign on and user security functions together with the McAfee Cloud Security Platform to support cloud hosting and brokerage activities. Intel Embedded Software Development Tool Suite 2.3 adds more support for Atom Processors and new tools for the Linux Foundation Yocto project. www.intel.com

Leading Company Results (Revenues)

Q4: HP 3% @ $32.1bn

Q3: ARM 22% @ $192mn; Software AG 0% @ €274.6mn; Salesforce.com 36% @ $584mn; Amazon 44% @ $10.88bn; Cognizant 31% @ $1.6bn; Dell 0% @ $15.37bn; SAP 14% @ $3.41bn; Capgemini 13% @ €2.4bn; Alcatel-Lucent 7% @ $3.8bn

Q2: Lenovo 36% @ $7.8bn

Q1: Cisco 5% @ $11.3bn


IBM Cognos Mobile for iPad is a free Apple App Store download that allows on or offline use of Cognos reports optimized for an iPad interface. IBM InfoSphere BigInsights Platform adds a new big data analysis capability to IBM SmartCloud Services. IBM SPSS Statistics 20.0 gains mapping software to enable locations to be used as a further analysis feature. IBM zseries Servers are now able to support Windows-based applications with the claimed advantage of consolidation and ease of management. IBM Hosted Mobile Security Device Management provides an outsourcing service for enterprises who want to add the management of various mobile devices to their desktop management outsource. www.ibm.com

Microsoft Dynamics CRM Online update improves interactivity with customers via different websites and platforms with both outgoing and incoming capabilities. Microsoft Windows Phone 7.5 is shipping on various manufacturers’ phones. Microsoft Embedded Roadmap coded ‘v.next’ shows new versions to share the Windows 8 platform and will support x86 and ARM chips. Microsoft SQL Server 2012 Release Candidate is now available for qualified developers and is said to be a next generation cloud database. www.microsoft.com

SAP Business One 8.82 adds new capabilities around CRM and campaign management, new controls for serial numbers, asset management, pick-and-pack improvements, material planning, and long-term business planning, etc. SAP Innovation Services is a new professional services offering to help business managers apply SAP technology in new areas of their business. SAP NetWeaver Portal Content Management and SAP NetWeaver Portal Site Management, by OpenText in collaboration with SAP, add new content management capabilities. SAP CEO Jim Hagemann Snabe clarified that SAP would offer full cloud-based SAP on-demand globally as well as on-premise in a dual strategy; it appears that SAP already has more than 700 customers in the current 8 markets. He also stated that SAP Sybase IQ and SAP HANA will be maintained as two separate database families. In the same direction, SAP CTO Vishal Sikka said that he expects to have all SAP ERP applications running on HANA as the SAP alternative to a conventional database; in a further move, NetWeaver on HANA is just about ready to be released and will integrate the services orchestrations with the existing EAI application integration capabilities. There will be an SAP version of the Apple App Store and iCloud to allow full mobility between devices from a common set of services and user-centric data store. SAP NetWeaver Process Orchestration and SAP NetWeaver Gateway add a wide range of new capabilities for integration and orchestration. SAP Mobile Apps for Apple iOS, BlackBerry, Android, and Windows Mobile cover a wide range of knowledge and field-based workers. www.sap.com

Google supports integration between Apps and Google+ as an administrator controlled option to allow employees to use both sets of tools in support of their work. Google+ Pages for Business allows a business to profile itself and provide links to associated personal accounts. Google Online Music Store for Android devices, in partnership with 23 Music labels, provides streaming music from a library of 23 million songs. Google Wallet will now have Google Checkout Pay Platform combined into a single new full service system. Google Search for iPad offers claimed improvements, mainly directing search away from the Apple Safari based option. The Google ‘cleanup’ of low usage, or superseded, services continues with Google Knol, Google Wave and Google Friend Connect being added to the list. www.google.com

Apple iPhone 4S battery life has been reported as a significant problem, with users claiming that the battery discharges too fast in use. A downloadable fix, iOS 5.0.1, has apparently not improved the situation. www.apple.com


Open Source Update

Novell SUSE alpha of a Cloud Builder built using OpenStack and SUSE Linux Enterprise Server is now available as a preview download for comment. http://www.novell.com/home/

Red Hat is joining the Open Compute Project, aiming to ensure that Red Hat Enterprise Linux will be certified as compliant with the standards that emerge. www.redhat.com

Canonical Ubuntu Roadmap will see the operating system being able to be used on Tablets and Smartphones as a single version within 18 months. www.ubuntu.com

Mozilla Firefox 8 is now on final release after widespread betas. In a move that seems to make Firefox a continuous development and beta program, the first Firefox 9 beta is now available, adding optimization for Android Tablets to other features. www.mozilla.com

Standards Watch

The World Wide Web Consortium (W3C) Tracking Preference Expression and the Tracking Compliance and Scope Specification are two new draft standards aiming at controlling track and trace of website visitors to improve privacy. www.w3.org

OpenCL 1.2 update adds new development tools to the standard for Graphic Processor Units, which allow library tools to be built and shared without needing source code. www.khronos.org

More Noteworthy News

Salesforce.com is acquiring Mobile Metrics, whose analytic platform is used on Mobile and Social Networks to analyze effectiveness of apps and which is already a partner with Salesforce.com. www.salesforce.com

Gartner records Q3 2011 PC sales at $14.8mn up slightly from Q2 but down from over $16mn in Q2 2010. HP is at number one with 22.7% share of the market, down slightly after the uncertain future of HP PCs early in the quarter whilst Apple PC sales rose 19% against Q3 2011. www.gartner.com

Gigaspaces Cloudify is a combination of a middleware platform and tool set to migrate applications to cloud platforms and seems to be vendor neutral in terms of the cloud/virtualization services that it can work with. www.gigaspaces.com

BlackBerry Business Cloud Services now provides connectivity and support for Microsoft Office 365. Two new BlackBerry Smartphones running the new BlackBerry 7 OS have been launched, the Curve 9380 is the first to shift to a virtual keyboard, and the Bold 9790 with a keyboard. www.blackberry.com

Hitachi Data Systems Three Tier Storage architecture provides three separate clouds, each designed to fulfil a different purpose in what and how they support a specific requirement: an Infrastructure Cloud for basic services; a Content Cloud; and an Information Cloud. www.hitachidatasystems.com

Citrix HDX Platform could be a breakthrough as it is achieved with the thin client on a single chip, thus making it smaller and needing lower power than any previous thin client solution. Citrix Online GoToMeeting adds document collaboration to its existing web conferencing capabilities. www.citrix.com


SAS High Performance Analytics will ship in December offering analytics on huge volumes of data in minutes rather than a task that would have taken hours. SAS Conversation Center as part of the SAS Social Media Analytics Platform adds the capability to monitor comments on various social sites. www.sas.com

Nokia Lumia 800 and 710 powered by Microsoft Windows Phone 7 Mango launch is an important milestone as it moves Nokia onto the Microsoft Platform. www.nokia.com

Dell Latitude ST Tablet running Windows 7 on an Intel Atom processor is aimed at the Enterprise Business market and can be managed within the Windows 7 single user profile. Dell Kace K1110-ADV systems management and K2100-ADV systems deployment appliances offer management of up to 3000 end stations. Dell PowerEdge Server range has new models using the new AMD Opteron processor for increased performance. www.dell.com

Dropbox, the popular Online Storage and file exchange, is introducing Dropbox for Teams, a business service with full administration controls including billing and technical support. www.dropbox.com

Sony Ericsson changes makeup with Sony buying Ericsson out in a €1.05 bn-move and an attempt to reposition the brand within the Sony user devices marketplace. www.sonyericsson.com

ARM v8 chip set takes ARM into 64 bit processors and allows enterprise IT applications to be supported on tablets and mobile smartphones. www.arm.com

Juniper Networks Junos Pulse Security Platform has been chosen by Samsung to be included in its smartphone hardware to manage SSL Virtual Private Network connections including administration of the phone by the enterprise. www.junipernetworks.com

iPass Open Mobile Platform 2.0 has extended the WiFi hotspot aggregation service for wireless connection of mobile devices to add metering of usage of iPhone and Android devices together with improved endpoint verification and a new password manager that automates log-ons. www3.ipass.com

GitHub Enterprise is an on-premise version of the online hosted collaborative development platform for customers who want to have a closed and secure development environment. https://github.com

Opera Mobile 11.5 Browser update for iPhone, iPad, BlackBerry, and Symbian is claimed to track data usage in a manner that will give big cost savings on download charges. www.opera.com

AMD cuts 10% of its workforce of 12,000 to improve its competitive position against Intel. AMD Opteron 6200 Interlagos and 4200 Valencia chips are now shipping based on the new AMD Bulldozer architecture, which is to offer ‘revolutionary’ increases in performance of up to 88% over conventional architectures. www.amd.com

Samsung Bada 2.0 operating system for Smartphones Software Development Kit is now available and the Samsung Wave III smartphone will ship with Bada 2.0, which is Samsung’s new focus for all their smartphones. Samsung Galaxy Tab 10.1 N is a redesigned version intended to end the dispute with Apple over Samsung infringing Apple patents. Samsung SUR40 Microsoft Surface Tabletop based computer is now available. www.samsung.com


Cortado Corporate Server 5.3 upgrade will have full HTML 5.0 support, allowing PCs and laptops to enjoy the same capabilities as iPhone, iPad, BlackBerry and Android devices to remotely access and view content from files on the corporate network. www.cortado.com

Zenprise Bring Your Own Toolkit is designed to allow enterprises to determine the risk and manage the increasing number of personal devices that employees are using for their work. www.zenprise.com

McAfee Cloud Security Suite has gained three new products: Cloud Identity Manager to federate multiple sign-ons; Web Gateway Platform to manage what users may do online via their identities; and Cloud Security Data Loss Prevention Platform to manage and maintain data on virtual machines. www.mcafee.com

Huawei Media Tablet, based on Android 3.2 with a 7-inch screen, WiFi, 3G, Flash 10.3 plus front and rear cameras to support video conferencing, marks a move into new markets by the Chinese Telecommunications supplier. Huawei has bought out Symantec from their joint venture in a move to strengthen its security products and portfolio of integrated services. www.huawei.com

Rackspace Cloud Private Edition is a self-contained private cloud system using OpenStack that can be installed in a client’s data center and provide an on-premise host for the enterprises services as opposed to most private clouds that are merely virtualized client servers for applications. www.rackspace.com

Adobe is making 750 employees redundant in the USA and Europe to match trading conditions and maintain its profitability, said the press release. Adobe Flash for Mobiles is to be discontinued and work on HTML 5.0 is to be accelerated. www.adobe.com

Fujitsu Business Solutions App Store is a new business venture that allows their cloud-based clients to find and use common business applications. www.fujitsu.com

CA Executive Insight for Service Assurance takes business data feeds and allows users to design how they want to view the results in visual and graphical outcomes that can suit mobile devices as much as PCs. CA Cloud 360 provides a modeling and decision support environment to help optimize the correct places and ways to adopt cloud technology. www.ca.com

Tibco Spotfire 4.0 Business Intelligence and Analytics now integrates with Microsoft SharePoint and features a user-driven dashboard claimed to improve predictive analytics. www.tibco.com

Trend Micro SafeSync 5 online data storage service allows collaboration on the same data between different users or devices, or data storage externally for business but boasts that it is a European based service meeting EU requirements for data holding inside the EU. www.trendmicro.com

Aerohive Branch on Demand provides mobility devices and users with automatic VPN access to enterprise services via both wireless and wired connections. www.aerohive.com


Skype Video Chat beta has Facebook integrated so that calls can be made automatically to Facebook friends. www.skype.com

Storage Fusion Storage Resource Analysis Enterprise Edition now includes cloud-based storage in its comprehensive analysis of how storage is deployed. www.storagefusion.com

Aruba Networks is acquiring Avenda to add its authentication and authorization platform to the Aruba WiFi networks business. www.aruba.com


www.capgemini.com/ctoblog

Copyright © 2011 Capgemini. All rights reserved.

Andy Mulholland

Global Chief Technology Officer, Capgemini

Tel. +44 (0)207 434 2171

[email protected]

About Capgemini

With more than 115,000 people in 40 countries, Capgemini is one of the world’s foremost providers of consulting, technology and outsourcing services. The Group reported 2010 global revenues of EUR 8.7 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.

Learn more about us at www.capgemini.com.

Rightshore® is a trademark belonging to Capgemini