altic's big analytics stack, charly clairmont, altic
DESCRIPTION
For a long time Altic has been an active member of the OW2 BI Initiative. Since a few years, Altic has taken a deep interest in the Big Data technologies, like many others actors of the OW2 consortium. Some of them even have added new features related to Big Data in their offers. Altic and its partners, Talend and Engeeniring Informatica (SpagoBI), have decided to create a Big Data Stack using their own solutions. The magic thing : Altic hasn't changed the way its projects are done but only learnt how to store Big Data and compute them. In this presentation we will propose to discover our Big Data stack with : * Hadoop and Spark to store and compute data * Talend DI to create Big Data tasks published and scheduled in SpagoBI * SpagoBI to manage security and allow end users to access to data visualization. For a more user friendly Big Data stack we provide new components : * tMahout talend component which helps us to create Datamining job inside Hadoop without code development in MapReduce paradingm, * SpagoBID3Engine which provides easy manipulation of data to develop beautiful data visualizationsTRANSCRIPT
Twitter #ow2con @egwadawww.ow2.org
Our historical tools
• ETL : Talend
• Reporting : JasperReports, Birt
• OLAP : Mondrian, Palo
• BI platform : SpagoBI
Twitter #ow2con @egwadawww.ow2.org
Smart assembling Innovation & customers'needs
● Identify when applied research is an opportunity for us, our solutions and our customers.
● Understand the business process of our customer & assess the impact of Open IT on their activities
● Offer an approach of the project both a technical and a operative
➔ Altic projects
➔ Allows our customer to optimize their business process
➔ Takes the customer job into account
➔ Offers perennial solutions
➔ Follows the customer present needs and not the editors' agenda
Twitter #ow2con @egwadawww.ow2.org
Our first Big Data project at Altic
● eFraudBox project (2010 – 2013)● Goal : predict frauds on Internet● Context :
– Customer : GIE carte bancaire– European Research and Development project– Lot of industrial and academic partners
● Data :– Type : Banking transactions– Volume : One GB per day
Twitter #ow2con @egwadawww.ow2.org
« In data mining processing is done line by line » … [ there's not about a data volume issue ]
Twitter #ow2con @egwadawww.ow2.org
● Open Source
● MPP compute platform
● Distributed file system
● MapReduce processing
● Cost efficient
● Fault tolerant
● Infinite scale
● Enterprise Information System ready
● Continuous Improvement
● Growing community
Let's have a look at Hadoop ?
« Even transactions are possible on Hadoop - it's inevitable that ALL kinds of workloads will move there
in the future »
Doug CUTTINGHadoop Creator
Octobre 2013
Twitter #ow2con @egwadawww.ow2.org
How do we query Hadoop ?
● SQL like● Easy development
● Pig Latin● Easy syntax● Support unstructured data
● Java● Very optimised● Very customisable
Twitter #ow2con @egwadawww.ow2.org
How do we query Hadoop ?
● We already know SQL !
● Why not ?● Need to code evertything
Twitter #ow2con @egwadawww.ow2.org
Ok, we have our storage and computation engine, but how can we
manage data ?
By using our Swiss Army Knife !
Twitter #ow2con @egwadawww.ow2.org
Now our Hadoop / Hive platform is filled with Big Data,
but It's a little bit too slow to query for end users...
http://ih2.redbubble.net/image.13088996.5766/sticker,375x360.png
Twitter #ow2con @egwadawww.ow2.org
Processing data with Hive and store results in fast databases
Aggregate data
Twitter #ow2con @egwadawww.ow2.org
Ok, now we have our fast queryable datasets, but how can we visualize these ?
To manage users and visualizations
To quickly have a vision of your data
To go deeper in your visualizations
Twitter #ow2con @egwadawww.ow2.org
BigData and Datamining v2
● Spark : new InMemory data processing framework
● Very appropriate for Machine learning● MLBase : Machine learning library● Spark-clustering : Implementation of SOM algorithm● Proof Of Concept : Analysis of mobile
telecommunications
Twitter #ow2con @egwadawww.ow2.org
BI & Big Data for Altic
● Eventually, we still do BI as usual● Tools evolve :
– New storage and processing– We do not change our tools, fortunately THEY progress
for us and we contribute● Fundamental does not really change, only
technologies do– Hadoop– Spark
Twitter #ow2con @egwadawww.ow2.org
We improve our Big Data stack and its approach...
And support Big Analytic customer project
Our Big Data Stack Our Big Data Approach
Twitter #ow2con @egwadawww.ow2.org
Questions ?
Thanks !
Charly CLAIRMONTCTO at ALTIC
http://altic.org