
REAL-TIME STREAM PROCESSING IN A BIG DATA WITH HADOOP

Goutham R, Saravana Kumar K, Mr. R. Subash (Assistant Professor (O.G))
Computer Science & Engineering, SRM University, Chennai, India

Abstract- This paper presents Real-Time Stream Processing in a Big Data with Hadoop (RSPBH). Remote sensing in the digital world generates a large volume of real-time data every day, and this remotely collected data has to be analyzed, aggregated, and stored. The challenge in big data is to extract the useful information efficiently. The analytical architecture addresses this challenge with three main units: SDU, SDPU, and DMU-CA. In the Streaming Data Unit (SDU), the data is collected and placed in a local file on the host machine. In the Streaming Data Process Unit (SDPU), the main reduction of the data is carried out with the MapReduce technique provided by Hadoop. In the Decision Making Unit for Complex Analysis (DMU-CA), the data stored after MapReduce is copied back to the local host machine, because Hadoop does not provide an analysis option.

Keywords: Big data, SDU, SDPU, DMU-CA, RSPBH

I. INTRODUCTION

Interest in Big Data has risen recently, driven largely by the wide range of research problems closely tied to real applications and systems. Every day, data volumes grow from social media, videos, emails, online transactions, logs, scientific data, mobile phones, remote sensors, and other applications. This data is stored in databases and grows so rapidly that it becomes difficult to store, process, manage, and analyze. Advances in big data technology make it possible to collect, manage, analyze, and process remotely sensed data. Recently designed remote sensors used for earth observation stream data continuously. Work on remote sensing data from satellites includes gradient-based edge detection [4] and change detection [5]. This paper concentrates on high-speed, continuous, real-time streaming data as well as offline data, which poses a new challenge. Transforming remotely sensed data for scientific understanding is a critical task [3].

Data is collected from remote sensors, which generate a very large volume of raw data; this step is called data acquisition. The collected data has no meaning in itself, since the sensor simply records everything. The data therefore needs to be processed and filtered to extract the useful information. The main challenge is data accuracy: the information generated by remote sensors is not in the correct format for analysis. The useful or meaningful data has to be extracted and converted into a structured format for analysis. Sometimes the data may also be unclear or erroneous.

To address these needs, an architecture for remote sensing big data is introduced. The architecture can analyze both kinds of data, offline and real-time. First, the data is remotely preprocessed into a machine-readable format, and the useful data is then sent to the earth base station for further processing. The earth base station handles two kinds of data: offline data and real-time streaming data. Offline data is sent to an offline data storage device for later use. Real-time data is passed directly to the filtering and load balancing server. Filtering extracts the meaningful or useful data from the big data, and load balancing distributes the real-time data equally across the servers. Together, filtering and load balancing also improve system efficiency. Next, the data is sent to the data aggregation unit and then to the analysis and decision server. The proposed method is implemented on the Hadoop framework using MapReduce programming on remote sensing data.

Goutham R et al, International Journal of Computer Technology & Applications, Vol 8(3), 306-310. IJCTA, May-June 2017. ISSN: 2229-6093
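The filtering and load balancing step described above can be illustrated with a small sketch. It is not the authors' implementation: the record layout, the `is_useful` rule, and the server names are hypothetical, and round-robin is just one simple way to distribute records equally.

```python
from itertools import cycle

def is_useful(record):
    """Hypothetical filter: keep records in which every field has a value."""
    return all(value is not None for value in record.values())

def balance(records, servers):
    """Distribute records across servers in round-robin order."""
    assignment = {server: [] for server in servers}
    for server, record in zip(cycle(servers), records):
        assignment[server].append(record)
    return assignment

# Hypothetical sensor readings; the second one is erroneous.
stream = [
    {"sensor": "s1", "temp": 31.0},
    {"sensor": "s2", "temp": None},
    {"sensor": "s3", "temp": 29.5},
    {"sensor": "s4", "temp": 30.2},
]
useful = [record for record in stream if is_useful(record)]
assignment = balance(useful, ["server-a", "server-b"])
```

Here the erroneous reading is dropped before distribution, so the servers only spend time on records that can be analyzed.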

II. RSPBH

A. System Framework

Fig. i shows the system architecture, which contains three major units:

-Streaming Data Unit (SDU).

-Streaming Data Process Unit (SDPU).

-Decision Making Unit for Complex Analysis (DMU-CA).

In SDU, the data is collected and placed in a local file on the host machine. It is later stored in the Hadoop Distributed File System (HDFS), where the data is divided across many data nodes and the locations are recorded in the name node; in this project, however, only a single cluster was used.

In SDPU, the main reduction of the data is carried out with Hadoop's MapReduce technique, in which the work is divided among several workers and a master coordinates them. Once the tasks finish, the results are combined or sorted and written back to HDFS in a dedicated directory, so that their location is clearly known.
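To make the map, group, and reduce stages concrete, the following is a minimal in-memory sketch in plain Python. It only imitates the MapReduce pattern and does not use Hadoop; the input format `station,date,max_temp`, the station names, and the choice of "maximum temperature per station" as the reduction are all assumptions for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: parse 'station,date,max_temp' and emit (station, max_temp)."""
    station, _date, max_temp = line.split(",")
    yield station, float(max_temp)

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: keep the highest temperature seen for each station."""
    return key, max(values)

# Hypothetical input records.
lines = [
    "chennai,2016-08-16,35.1",
    "chennai,2016-08-17,36.4",
    "madurai,2016-08-16,37.0",
]
pairs = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(key, values) for key, values in sorted(shuffle(pairs).items()))
```

In a real job the mappers and reducers would run as separate tasks on the cluster; the sketch keeps everything in one process so that the data flow is easy to follow.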

In DMU-CA, the data stored after MapReduce is copied back to the local host machine, because Hadoop does not provide an analysis option. In the project it is then analyzed with JavaScript and plotted as graphs, i.e. line and bar graphs, so that anyone can make a decision without much difficulty in understanding the results.
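As a rough illustration of the hand-off from MapReduce to the analysis step, the sketch below parses reducer output into two lists that a line or bar graph could be drawn from. It assumes the job wrote tab-separated `key<TAB>value` text lines (the default text output layout in Hadoop); the sample values are hypothetical, and the project itself used JavaScript for this step rather than Python.

```python
def parse_output(text):
    """Parse tab-separated 'key<TAB>value' lines into labels and values."""
    labels, values = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        labels.append(key)
        values.append(float(value))
    return labels, values

# Hypothetical contents of an output file copied back to the local machine.
sample = "chennai\t36.4\nmadurai\t37.0\n"
labels, values = parse_output(sample)
```

The `labels` list would supply the graph's category axis and `values` the plotted quantity.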

The SDU is used for data collection and is therefore called the data collection centre. The SDPU is used for filtration and load balancing, and it can also store data in offline data storage. The DMU-CA is the key unit of the process: it collects the data provided by the SDPU and sends it to the earth base station using the MapReduce concept.

Fig. i. System Framework

B. Methodology

There is usually a slight variation in weather conditions, which may depend on the variation over the last seven days or so. Here, variation refers to the difference between the previous day's parameter and the current day's parameter. There is also a dependency between the weather conditions prevailing in the current week under consideration and those of previous years. A method is proposed that mathematically models these two types of dependency and uses them to predict future weather conditions. To predict a day's weather, the conditions prevailing in the past week, that is, the last seven days, are assumed to be known. In addition, the weather conditions for the seven preceding days and the seven following days in the previous year are considered. For example, to predict the weather for 10 February 2017, the conditions from 03 February 2017 to 09 February 2017 are considered, along with the conditions for previous years. To model these dependencies, the current year's variation over the week is matched against the previous year's data using a sliding window. The best-matched window is chosen for making the prediction. The chosen window and the current year's weekly variation together predict the weather condition. The sliding window algorithm is used because the weather conditions prevailing in one year may not fall on exactly the same dates as in the previous year, which is why both the preceding and the following days are taken into account.
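The date ranges involved can be sketched as follows. The function name `date_windows` and its parameters are hypothetical; the fortnight boundaries (starting seven days before the same calendar date in the earlier year and running for 14 days) are an assumption chosen to match the 16th to 29th August example given in the Methods used section.

```python
from datetime import date, timedelta

def date_windows(target, years_back=1):
    """Return (current_week, previous_fortnight) as lists of dates."""
    # The seven days immediately before the target date.
    current_week = [target - timedelta(days=d) for d in range(7, 0, -1)]
    try:
        anchor = target.replace(year=target.year - years_back)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to the 28th.
        anchor = target.replace(year=target.year - years_back, day=28)
    start = anchor - timedelta(days=7)
    previous_fortnight = [start + timedelta(days=d) for d in range(14)]
    return current_week, previous_fortnight

week, fortnight = date_windows(date(2017, 2, 10))
```

For the 10 February 2017 example, `week` runs from 03 to 09 February 2017 and `fortnight` covers 14 consecutive days of February 2016.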

C. Methods used

This work aims to predict a day's weather conditions. The weather of the previous seven days is considered together with a fortnight of weather conditions from previous years. For example, to predict the weather of 23rd August 2016, we consider the weather conditions from 16th August 2016 to 22nd August 2016, along with the weather conditions prevailing from 16th August to 29th August in past years. The day-to-day variation in the current year is then computed. The variation is also computed from the fortnight data of the previous year. In this work, four primary weather parameters are considered: maximum temperature, minimum temperature, humidity, and rainfall. Consequently, the current year's data can be represented by a matrix of size 7x4, and the previous year's data by a matrix of size 14x4. The first step is to divide the matrix of size 14x4 into sliding windows. The next step is to compare each window with the current year's data. The best-matched window is selected for making the prediction. The Euclidean distance is used for matching, because it represents similarity well despite its simplicity. The following parameters are used for the weather prediction:

(1) Mean: the mean of a day's weather conditions, that is, maximum temperature, minimum temperature, humidity, and rainfall, obtained by adding the values one after another and dividing by the total number of days.

(2) Variation: the day-to-day variation, calculated by taking the difference of each parameter between consecutive days. This tells how the next day's weather is related to the previous day's weather.

(3) Euclidean distance: compares the data variation of the current year with that of the previous year.

With these parameters, the dependencies described above can be modeled mathematically. The relationship between the previous year's data and the previous week's data, once described mathematically, can be used to predict future conditions. The sliding window is used for predicting the weather conditions. The algorithm is as follows,

step1: take matrix ‘ab’ of the last 7 days of the current year's data, of size 7x4.

step2: take matrix ‘pd’ of 14 days of the previous year's data, of size 14x4.

step3: make 8 sliding windows of size 7x4 each from ‘pd’ as w1, w2, w3,…w8.

step4: compute the Euclidean distance of each sliding window to the matrix ‘ab’ as ed1, ed2, ed3,…ed8.

step5: choose matrix wi as wi = corresponding-matrix(min(edi)), for i = 1 to 8.

step6: for k = 1 to n, where wck is the k-th weather condition:

(a) for wck compute the variation vector of matrix ‘ab’, of size 6x1, as ‘xy’.

(b) for wck compute the variation vector of the selected window of ‘pd’, of size 6x1, as ‘vp’.

(c) add1 = mean(xy)

(d) add2 = mean(vp)

(e) predicted variation ‘m’ = (add1 + add2)/2

(f) add ‘m’ to the previous day's weather condition to get the predicted condition.
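The steps above can be sketched in plain Python as follows. This is one possible reading rather than the authors' code: the names `ab` and `pd` follow the steps, the helper names are hypothetical, step 6(b) is assumed to use the best-matched window from step 5, and the toy data is invented so that the result can be checked by hand.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally sized matrices (lists of rows)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def mean_variation(matrix, k):
    """Mean of the 6 day-to-day differences of column k in a 7-row matrix."""
    column = [row[k] for row in matrix]
    diffs = [later - earlier for earlier, later in zip(column, column[1:])]
    return sum(diffs) / len(diffs)

def predict(ab, pd):
    """ab: 7x4 current-year matrix, pd: 14x4 previous-year matrix."""
    windows = [pd[i:i + 7] for i in range(8)]          # step 3
    distances = [euclidean(w, ab) for w in windows]    # step 4
    best = windows[distances.index(min(distances))]    # step 5
    prediction = []
    for k in range(len(ab[0])):                        # step 6
        add1 = mean_variation(ab, k)                   # (a), (c)
        add2 = mean_variation(best, k)                 # (b), (d), assumed window
        m = (add1 + add2) / 2                          # (e)
        prediction.append(ab[-1][k] + m)               # (f)
    return prediction

# Toy data: columns are max temp, min temp, humidity, rainfall.
ab = [[30.0 + i, 20.0 + i, 60.0, 0.0] for i in range(7)]
pd = [[28.0 + j, 18.0 + j, 60.0, 0.0] for j in range(14)]
forecast = predict(ab, pd)
```

In this toy data both temperatures rise by one degree per day in both years, so the predicted variation is one degree and the forecast is one degree above the last known day, with humidity and rainfall unchanged.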

The main idea behind the sliding window approach is that the weather conditions prevailing over some span of days in one year may not have existed in the same span of days in the previous year. For example, the weather conditions in the first week of February 2016 may not have existed in the first week of February 2015. Similar weather conditions may have prevailed in the previous year, not necessarily in the same week but in nearby days. The probability of finding similar weather conditions is therefore highest within the fortnight span under consideration.
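A small example shows why matching over a fortnight helps. The temperatures below are hypothetical, and for brevity only one weather parameter is used instead of four: the previous year contains the same warm spell as the current week, but three days later.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical maximum temperatures in degrees Celsius, one value per day.
current_week = [30.0, 31.0, 33.0, 36.0, 33.0, 31.0, 30.0]
# Previous-year fortnight: the same pattern, shifted by three days.
previous_fortnight = [29.0, 29.0, 29.0] + current_week + [29.0, 29.0, 29.0, 29.0]

# Compare the current week with each of the 8 possible 7-day windows.
distances = [distance(previous_fortnight[i:i + 7], current_week) for i in range(8)]
best_offset = distances.index(min(distances))
```

Comparing only the same calendar days (offset 0) would give a poor match, while the sliding window finds the exact match at an offset of three days.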

D. Process

1. Retrieving user information

The user requests a weather prediction recommendation through an online request. The input data consists of the name, age, number of persons, and family details of the individuals, and the destination for which the weather report has to be collected. During registration the user enters these details under the different categories, after which the report is delivered.

Fig. ii. Retrieving user information

2. Retrieving online weather report into HDFS

After the user information is obtained, the weather report for the chosen destination is collected online into HDFS. The report is collected based on the information provided by the user. The admin retrieves the data for the specific location online and dumps it into HDFS.

Fig. iii. Retrieving online weather report into HDFS

3. Mapping weather attributes with user attributes

After the user information and the weather attributes are obtained, they are mapped for prediction. This prediction is based on the age details of the persons going on the trip.

Fig. iv. Mapping weather attributes with user attributes

4. Prediction Recommendation

The prediction recommendation is provided briefly to the user. It is based on the family details of the individuals. For example, if the group includes many children and elderly people, a sunny place is recommended rather than a cool place.

Fig. v. Prediction Recommendation
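The rule in the example above could be expressed along the following lines. This is a sketch only: the function name `recommend` and every threshold in it (the ages that count as child or elderly, what "many" means, and the temperature below which a place counts as cool) are hypothetical and are not taken from the paper.

```python
def recommend(ages, predicted_max_temp, child_age=12, elder_age=60, cool_below=20.0):
    """Return True if the destination suits the group.

    All thresholds are hypothetical illustration values.
    """
    # Count travellers who are children or elderly.
    vulnerable = sum(1 for age in ages if age <= child_age or age >= elder_age)
    mostly_vulnerable = vulnerable > len(ages) / 2
    is_cool = predicted_max_temp < cool_below
    # A cool place is not recommended for a group of mostly children and elderly.
    return not (mostly_vulnerable and is_cool)

family = [5, 8, 35, 70]
cool_place_ok = recommend(family, predicted_max_temp=15.0)
sunny_place_ok = recommend(family, predicted_max_temp=30.0)
```

Keeping the thresholds as parameters makes it easy to adjust the rule once real user categories are defined.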

III. SUMMARY

High-speed continuous data streams or high-volume offline data are largely stored as dumps and used when required. We extend the existing architecture to make it more suitable for big data analysis. The aim of the project is to provide recommendations to the user based on online weather conditions. The system predicts the weather condition and recommends whether or not the place is suitable for the user. The user requests a weather prediction and recommendation for the place they want to visit with their family through an online request, submitting all the necessary details, including location, name, age, sex, total number of people, and details of any people with health issues.

After the user information and the weather attributes are obtained, they are mapped for prediction. The prediction is based on the details given by the user. The prediction recommendation is then provided briefly to the user.


IV. CONCLUSION

The remote sensing big data architecture efficiently processes and analyzes real-time and offline remote sensing big data for decision-making. It consists of three major units: 1) SDU; 2) SDPU; 3) DMU-CA. These units implement algorithms for each level of the architecture, depending on the required analysis. The real-time big data architecture is generic (application independent) and can be used for any type of remote sensing big data analysis. The capabilities of analyzing, dividing, and storing only the useful information are achieved by discarding all other extra data. These processes make the architecture a better choice for real-time remote sensing big data analysis. The architecture welcomes researchers and organizations to perform any form of remote sensing big data analysis by developing algorithms for each level of the architecture according to their analysis needs.

Further Process

Further study will aim to improve RSPBH's performance; for example, results can be altered by changing the size of the window. Accuracy for the unpredictable months can be increased by increasing the window size to one month. The recommendation process can also be extended to travel booking for flights, trains, buses, etc.

REFERENCES

[1] D. Agrawal, S. Das, and A. E. Abbadi, “Big Data and cloud computing: Current state and future opportunities,” in Proc. Int. Conf. Extending Database Technol. (EDBT), 2011, pp. 530–533.
[2] J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton, “Madskills: New analysis practices for Big Data,” PVLDB, vol. 2, no. 2, pp. 1481–1492, 2009.
[3] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113, 2008.
[4] H. Herodotou et al., “Starfish: A self-tuning system for Big Data analytics,” in Proc. 5th Int. Conf. Innovative Data Syst. Res. (CIDR), 2011, pp. 261–272.
[5] K. Michael and K. W. Miller, “Big Data: New opportunities and new challenges [guest editors’ introduction],” IEEE Comput., vol. 46, no. 6, pp. 22–24, Jun. 2013.
[6] C. Eaton, D. Deroos, T. Deutsch, G. Lapis, and P. C. Zikopoulos, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. New York, NY, USA: McGraw-Hill, 2012.
[7] R. D. Schneider, Hadoop for Dummies Special Edition. Hoboken, NJ, USA: Wiley, 2012.
[8] A. Cuzzocrea, D. Saccà, and J. D. Ullman, “Big Data: A research agenda,” in Proc. Int. Database Eng. Appl. Symp. (IDEAS’13), Barcelona, Spain, Oct. 09–11, 2013.
[9] R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing, 2nd ed. New York, NY, USA: Academic Press, 1997.
[10] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. Hoboken, NJ, USA: Wiley, 2003.
[11] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. Norwell, MA, USA: Kluwer, 2003.
[12] J. A. Richards and X. Jia, Remote Sensing Digital Image Analysis: An Introduction. New York, NY, USA: Springer, 2006.
[13] J. Shi, J. Wu, A. Paul, L. Jiao, and M. Gong, “Change detection in synthetic aperture radar image based on fuzzy active contour models and genetic motion estimation in H.264/AVC,” IEICE Electron. Express, vol. 7, no. 2, pp. 47–52, Jan. 2010.
[14] A. Paul, J. Wu, J.-F. Yang, and J. Jeong, “Gradient-based edge detection for motion estimation in H.264/AVC,” IET Image Process., vol. 5, no. 4, pp. 323–327, Jun. 2011.
[15] A. Paul, K. Bharanitharan, and J.-F. Wang, “Region similarity based edge detection for motion estimation in H.264/AVC,” IEICE Electron. Express, vol. 7, no. 2, pp. 47–52, Jan. 2010.
