How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Upload: filecatalyst

Post on 03-Jul-2015

Category: Technology


DESCRIPTION

Big data is growing in every sense of the word, and an increasing number of companies across a variety of industries are beginning to realize the benefits of adopting a big data strategy in the workplace. A recent Gartner survey found that 42% of IT leaders have invested in big data or plan to do so within 12 months.

When implementing big data within an organization, a strategy must be put in place to fully leverage its benefits. One extremely important and often overlooked aspect of that strategy is how to move big data from one geographic location to another. File transfer bottlenecks such as failed transfers and network delays are common when moving massive amounts of data, which can easily run into terabytes spread over millions of files.

This IP EXPO 2013 presentation provides an understanding of the challenges and solutions associated with the agile and reliable movement of big data, as well as an overview of file transfer technologies that optimize user networks for cost-efficient IT processes. Other takeaways include an understanding of the technology behind accelerated file transfer, its benefits over other methods of file transfer, and an in-depth look at why accelerated and managed file transfer should be included in every big data strategy. Also see a video recording of this presentation from IP EXPO 2013 at the end of the presentation slides.

TRANSCRIPT

Page 1: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Thursday at 14:30 in the Data Insight & Analytics Lab.

FileCatalyst Stand: G32

How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Presented by John Tkaczewski, President and Co-founder

Page 2: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Big Data

| © Copyright FileCatalyst, 2012

Many existing solutions propose to:

• Store it
• Mine it
• Search it

But what about moving it?

Big data usually comes from many geographical locations.

Should we still ship tapes or hard drives to move our big data?

Page 3: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

What We Do

• Provide software solutions for moving big data around the world using the Internet or dedicated WAN links

• Accelerate file transfers via our unique UDP-based approach

• Simplify transfer and distribution of big data for the end enterprise users

• Provide the tools for fast integration into existing enterprise infrastructure (Storage, LDAP/AD, AV and SNMP)

• Provide management, security and monitoring tools required to move big data across corporate networks and/or the Internet

• Integrate with all major cloud vendors, including Amazon, OpenStack and Windows Azure, to accelerate moving big data to the cloud

| © Copyright FileCatalyst, 2013

Page 4: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Underlying File Transfer Technology and Why FTP Is Not Enough

Page 5: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

TCP Overview

• Provides reliability, error checking, and ordered delivery of packets in a stream

• Congestion control built in

• The Internet could not survive without it

• Works well for most Internet traffic: email, web browsing, and small ad-hoc transfers (SFTP, HTTP, FTP, SMTP, SMB and more)

Page 6: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Solving Latency Issues

Every network suffers from latency

Latency is an expression of how much time it takes for a packet of data to get from one designated point to another.

Network latency is measured by sending a packet and receiving an acknowledgement. This round-trip time (RTT) is the “latency”.

Visualize the flow of packets like water running through a hose. With FTP, there is a kink in the hose restricting the water’s flow. With FileCatalyst, the kink is removed and water can rush through unrestricted.

FTP over TCP is a largely serial process. Only a limited window of data can be in flight before the sender must wait for acknowledgments = Decreased transfer speed

FileCatalyst completely saturates the pipe by sending multiple blocks of data = Increased transfer speed

[Diagram: data packets flowing from the source file to the destination file, with acknowledgments returning, shown once for FTP and once for FileCatalyst]
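The effect of latency on a windowed, acknowledgment-driven transfer can be estimated with simple arithmetic: if at most one window of data may be unacknowledged per round trip, throughput cannot exceed window size divided by RTT. The sketch below is an illustration only. The 64 KiB window is an assumption (a classic TCP window without window scaling), not a figure from the slides, and the 45 Mbps link rate is the T3 rate used elsewhere in this deck.

```python
def max_window_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window may be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds


LINK_BPS = 45_000_000        # T3 link rate
WINDOW_BYTES = 64 * 1024     # assumed window size (no window scaling)

for rtt_ms in (40, 200, 300):
    bound = max_window_throughput_bps(WINDOW_BYTES, rtt_ms / 1000)
    usable = min(bound, LINK_BPS)  # the link itself is also a ceiling
    print(f"RTT {rtt_ms} ms: at most {usable / 1e6:.1f} Mbps of {LINK_BPS / 1e6:.0f} Mbps")
```

The bound shrinks in direct proportion to RTT, which is why the same link delivers far less over intercontinental distances unless more data is kept in flight.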

Page 7: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

More Efficient Transport Protocol

Introducing FileCatalyst protocol for file transfers at full link speed:

• UDP with proprietary retransmission and congestion control

• Patent-pending algorithm built from the ground up in-house

• Transfer rates up to 10 Gbps (with encryption) using commodity hardware

• Not affected by latency; speed degrades linearly with packet loss

• Ability to fully leverage multi-core CPU architecture and virtual / cloud systems
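The FileCatalyst protocol itself is proprietary, so the following is only a generic sketch of the underlying idea: send all numbered blocks without waiting on each one, let the receiver report which sequence numbers are missing, and retransmit just those. It is an in-memory simulation with a deterministic, hypothetical `drop_plan` standing in for packet loss; it uses no real sockets and no FileCatalyst API.

```python
def transfer(blocks, drop_plan):
    """Simulate selective retransmission over a lossy channel.

    blocks:    list of bytes objects, indexed by sequence number
    drop_plan: set of (round_number, sequence_number) pairs that are "lost"
    Returns the reassembled data and the total number of blocks sent.
    """
    received = {}
    pending = list(range(len(blocks)))
    round_number = 0
    sent = 0
    while pending:
        # Sender pushes every pending block without waiting between them.
        for seq in pending:
            sent += 1
            if (round_number, seq) not in drop_plan:
                received[seq] = blocks[seq]
        # Receiver reports the sequence numbers it still lacks.
        pending = [seq for seq in range(len(blocks)) if seq not in received]
        round_number += 1
    data = b"".join(received[seq] for seq in range(len(blocks)))
    return data, sent


blocks = [b"aa", b"bb", b"cc", b"dd"]
data, sent = transfer(blocks, drop_plan={(0, 1), (0, 3), (1, 3)})
print(data, sent)
```

Only the lost blocks are resent, so the extra traffic grows with the loss rate rather than with the round-trip time.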

Page 8: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Speed gains with UDP

Results based on a T3 (45 Mbps) connection. NA = North America.

[Chart: FTP actual throughput, FileCatalyst throughput and bandwidth recovered for each transfer scenario]

• NA to NA (RTT 40 ms / 60 ms): FTP 6 Mbps, FileCatalyst 44.3 Mbps
• Europe to NA (RTT 200 ms / 220 ms): FTP 1 Mbps, FileCatalyst 44.3 Mbps
• Asia to NA (RTT 300 ms / 320 ms): FTP 700 Kbps, FileCatalyst 44.3 Mbps

Page 9: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Time Savings

Time Savings with UDP

6 GB file over a T3 (45 Mbps) connection

• LA to Hong Kong (RTT 250 ms / Loss 1.5%): FTP 44 hours, FileCatalyst 20 minutes
• LA to Auckland (RTT 200 ms / Loss 2%): FTP 35 hours, FileCatalyst 20 minutes
• LA to London (RTT 110 ms / Loss 1%): FTP 16 hours, FileCatalyst 20 minutes
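The figures on this slide can be sanity-checked with basic arithmetic. The sketch below computes how long a 6 GB file takes at the full T3 rate, and what average throughput the quoted FTP durations (44, 35 and 16 hours) imply. It assumes decimal units (1 GB = 10^9 bytes), which the slide does not specify.

```python
def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Time to move size_bytes at a sustained rate of rate_bps."""
    return size_bytes * 8 / rate_bps


def implied_rate_bps(size_bytes: float, seconds: float) -> float:
    """Average throughput implied by moving size_bytes in the given time."""
    return size_bytes * 8 / seconds


FILE_BYTES = 6e9   # 6 GB, assuming decimal gigabytes
T3_BPS = 45e6      # T3 link rate from the slide

print(f"At full T3 rate: {transfer_seconds(FILE_BYTES, T3_BPS) / 60:.1f} minutes")

ftp_hours = {"LA to Hong Kong": 44, "LA to Auckland": 35, "LA to London": 16}
for route, hours in ftp_hours.items():
    rate = implied_rate_bps(FILE_BYTES, hours * 3600)
    print(f"{route}: FTP averaged about {rate / 1e3:.0f} Kbps")
```

A transfer of roughly 20 minutes therefore corresponds to using nearly the whole link, while the multi-hour FTP times correspond to well under 1 Mbps of a 45 Mbps line.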

Page 10: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Other Acceleration Methods

• RSYNC - Ability to send deltas over the UDP protocol

• Bundling multiple small files into a single archive and sending it while it is still being created (Zip Chunking)

• Transferring multiple files simultaneously over multiple concurrent sessions

• Compression via ZIP or LZMA
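Bundling many small files into one compressed stream avoids paying per-file overhead for each of them. The sketch below shows the general idea using Python's standard `tarfile` module in stream mode. It substitutes tar plus gzip for the ZIP-based chunking named on the slide, and the helper names `bundle_small_files` and `unbundle` are hypothetical.

```python
import io
import tarfile


def bundle_small_files(files: dict, sink) -> None:
    """Write many small in-memory files into one compressed tar stream on sink."""
    # "w|gz" is tarfile's non-seekable stream mode, so the output can be
    # forwarded (for example to a network connection) while the archive
    # is still being built.
    with tarfile.open(fileobj=sink, mode="w|gz") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))


def unbundle(source) -> dict:
    """Read the stream back sequentially into a name -> bytes mapping."""
    result = {}
    with tarfile.open(fileobj=source, mode="r|gz") as archive:
        for member in archive:
            result[member.name] = archive.extractfile(member).read()
    return result


files = {f"small_{i}.dat": bytes([i]) * 250 for i in range(5)}
buffer = io.BytesIO()
bundle_small_files(files, buffer)
buffer.seek(0)
print(unbundle(buffer) == files)
```

Changing the mode strings to `"w|xz"` and `"r|xz"` switches the same stream to LZMA compression.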

Page 11: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Combined Speed Gains

• UDP combined with “other” methods
– Transferring directory structure of 7000 files (~6.96 GB)
– 110 ms latency link, 0% packet loss over 1 Gbps line

• 1 large 1.0 GB file
• 8 large 500 MB files
• 4 large 100 MB files
• 6988 small 250 KB files

* Note: FileZilla or standard FTP transfer was over 21 hours

Page 12: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Congestion Control

Network congestion occurs when a link is carrying so much data that its quality of service deteriorates

FileCatalyst traffic plays nicely with other traffic on the link:

• Congestion control monitors the server's receive rate and reacts immediately to changes

• Aggression (how much link capacity FileCatalyst tries to continuously recover) can be tuned to suit your network

• Shares bandwidth with concurrent TCP streams

• Slows down very quickly when congestion detected

• Quick ramp-up to reclaim unused bandwidth

• Zero tuning required for links from 1 Mbps up to 1 Gbps
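The behaviour listed above (slow down quickly on congestion, ramp up to reclaim unused bandwidth, with a tunable aggression) can be illustrated with a toy rate controller. This is not FileCatalyst's algorithm, which is proprietary. The class name, the 2% tolerance, the halving on congestion and the default aggression value are all assumptions chosen for the illustration.

```python
class RateController:
    """Toy rate controller: back off sharply on congestion, ramp up otherwise."""

    def __init__(self, link_bps: float, start_bps: float,
                 aggression: float = 0.10, backoff: float = 0.5,
                 floor_bps: float = 1_000_000):
        self.link_bps = link_bps
        self.rate_bps = start_bps
        self.aggression = aggression  # fraction of link capacity reclaimed per step
        self.backoff = backoff        # multiplier applied when congestion is seen
        self.floor_bps = floor_bps    # never drop below this rate

    def update(self, sent_bps: float, received_bps: float) -> float:
        """Adjust the send rate from one receiver report and return the new rate."""
        congested = received_bps < 0.98 * sent_bps  # assumed 2% tolerance
        if congested:
            self.rate_bps = max(self.floor_bps, self.rate_bps * self.backoff)
        else:
            step = self.aggression * self.link_bps
            self.rate_bps = min(self.link_bps, self.rate_bps + step)
        return self.rate_bps


controller = RateController(link_bps=100e6, start_bps=50e6)
print(controller.update(sent_bps=50e6, received_bps=50e6))  # no loss: rate increases
print(controller.update(sent_bps=60e6, received_bps=40e6))  # loss seen: rate drops
```

A larger `aggression` reclaims spare capacity faster at the cost of competing harder with other traffic, which is the trade-off the tuning bullet describes.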

Page 13: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Security Concerns

Security for both data transmission and user access:

• Encrypted with industry standards: SSL for control channel and AES for data
• Brute force attack protection
• Filter by IP address for either inclusion or exclusion
• Central Monitoring advises administrators about suspected threats
• Authenticate via LDAP, Active Directory or PAM (LDAP, LDAPS, AD, ADS)
• Ban IPs or block accounts in real time
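Filtering by IP address for inclusion or exclusion can be sketched with Python's standard `ipaddress` module. This is a generic illustration, not FileCatalyst's configuration or API. The function name `is_allowed` and the rule that deny entries take precedence over allow entries are assumptions.

```python
import ipaddress


def is_allowed(client_ip: str, allow: list, deny: list) -> bool:
    """Return True if client_ip passes the allow and deny lists.

    allow and deny hold network strings in CIDR notation.
    Assumed policy: a deny match always wins, and an empty allow list
    means every address that is not denied is accepted.
    """
    address = ipaddress.ip_address(client_ip)
    if any(address in ipaddress.ip_network(net) for net in deny):
        return False
    if not allow:
        return True
    return any(address in ipaddress.ip_network(net) for net in allow)


allow_list = ["10.0.0.0/8"]
deny_list = ["10.1.2.0/24"]
for ip in ("10.9.9.9", "10.1.2.3", "192.0.2.1"):
    print(ip, is_allowed(ip, allow_list, deny_list))
```

Banning an address in real time then amounts to appending its network to the deny list that the check consults.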

Page 14: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Use Scenarios / Verticals

• General IT
– DR/Backup
– Sending large email attachments
– Cloud deployments
– Replacement for FTP going over WAN Accelerators

• Broadcast / Media
– Moving large media production files
– Moving dailies

• Military and Law enforcement
– Faster communication with the fleet
– Moving intelligence and mapping data

• Oil and Gas
– Moving seismic data
– Moving well head operations data

Page 15: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

FileCatalyst Solution Portfolio

The FileCatalyst platform is designed to take the complexity out of high speed transfers, providing a flexible and comprehensive solution for almost any file transfer workflow.

Page 16: How to Share and Deliver Big Data Fast – Considerations When Implementing Big Data Infrastructure

Summary

• FileCatalyst provides software-based solutions designed to accelerate and optimize file transfers across global networks, featuring industry-leading acceleration technology at up to 10 Gbps

• FileCatalyst solutions are modular: purchase only what you need, and add functionality and capacity when required

• File transfers with FileCatalyst are secure, reliable, and unaffected by latency, with speed degrading only linearly under packet loss

• Visit us at our booth: G32 or online at www.filecatalyst.com

For more information, see our blog, webinars, monthly newsletter, white papers, YouTube channel and support portal.