
  • IBM Software

    Using HBase for Real-time Access to your BigData Running HBase operations using the Java client API

  • Copyright IBM Corporation, 2013

    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


    Contents

    Running HBase Operations using the Java client API
    2.1 Setting up your Eclipse Environment
    2.2 Coding the HBase Java classes
    2.3 Running the HBase Java classes
    2.4 Summary


    Running HBase Operations using the Java client API

    In this lab, we will go over actual Java code that demonstrates the usage of the different HBase operations. HBase is built using Java, and its native API is in Java. While you can use the HBase shell for simple or even administrative tasks, HBase applications require a programming language to make full use of everything HBase offers. However, Java is not the only language you can use. We will look at other client APIs in a different lesson.

    After completing this hands-on lab, you will be able to:

    Create Java classes that use the HBase client API to perform the HBase operations

    Solutions for these exercises can be found in the Lab_Files/LabSolutions directory of the files that you downloaded from the Big Data University's course web page.

    This lab assumes some familiarity with the Eclipse environment.

    Allow 60 to 90 minutes to complete this section of the lab.

    This version of the lab was designed using the InfoSphere BigInsights 2.1 Quick Start Edition. Throughout this lab you will be using the following account login information:

    Account                  Username   Password
    VM image setup screen    root       password
    Linux                    biadmin    biadmin


    2.1 Setting up your Eclipse Environment

    To prepare for this lab, you must set up your Eclipse environment.

    __1. Double-click the Eclipse icon on the desktop:

    __2. Select the default workspace:

    __3. Once the workspace has started, you will see this screen:


    __4. To get started, create a Java project. Give it a project name of HBase_Exercises. Everything else can be left as default:


    __5. Once the project has been created, you will configure the build path to include the HBase libraries. BigInsights makes this easier by providing a single library file.

    Right-click the project and select Build Path > Configure Build Path:

    __6. Select the Libraries tab and click Add Library:

    __7. Select the BigInsights Libraries


    Click Next and then Finish to add the library. Then click OK to close the Build Path properties. At this point, you should have added the BigInsights Libraries to your project's build path.

    __8. On your workspace, go ahead and close the Task Launcher for Big Data tab. You will not need this:


    __9. Now you will create the package for your Java classes.

    __10. Right-click on the src folder, then go to New and Package. Give the package name of: hbase.exercise2. Click Finish to create the package.


    __11. You should have downloaded the lab files from the Big Data University's course page prior to starting this course. If you did not download them, go back to the course page for the instructions to get your lab files.

    __12. You will import the files for exercise 2 into your Eclipse workspace. The files are partially completed classes where you will get a chance to fill in the code needed to complete the classes.

    Right-click on the hbase.exercise2 package that you created, and select Import. Then choose General > File System:

    Click Next.

    __13. Navigate to the Exercise2 directory, select all 3 files to import, and click Finish.


    2.2 Coding the HBase Java classes

    In this section, you will be writing a fair amount of Java code to access HBase. Fear not, there is a solution provided if you do get stuck. The solution is located along with the lab files that you downloaded.

    __14. Now that all 3 classes have been added, we will go over each one. First, open up the User.java class:

    This class is essentially our user object. We will create different User instances to load into HBase. There's nothing for you to do here except understand the fields of the User object.

    __15. It is important to emphasize again that in our exercises we are using human-readable column names, such as user, name, and email. This makes it easier for you to grasp the concept of columns and column families in HBase. In practice, you would not want to do this, because the column family and column qualifier names are stored as bytes alongside every cell in the physical HFiles. When each column spans multiple HFiles (which is very likely in a Big Data world) and this repeats for tens of thousands of columns, long names take up a lot of storage and memory and reduce the effectiveness of HBase. Be sure to understand this concept before moving forward.
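    To put a rough number on this, the following sketch (standard Java library only, Java 7 or later) counts only the bytes spent repeating family and qualifier names once per cell. The class and method names are hypothetical. The estimate deliberately ignores compression, block encoding, and every other per-cell field, so treat it as an illustration of scale rather than a sizing tool.

```java
import java.nio.charset.StandardCharsets;

// Rough estimate of the bytes that column names add when they are repeated
// in every stored cell. Ignores compression, block encoding, and all other
// per-cell fields (row key, timestamp, value): an illustration only.
class ColumnNameOverhead {

    // Bytes spent on the family name plus the qualifier name for one cell.
    static long namesPerCell(String family, String qualifier) {
        return family.getBytes(StandardCharsets.UTF_8).length
             + qualifier.getBytes(StandardCharsets.UTF_8).length;
    }

    // Total name bytes for `rows` rows, each holding one cell per qualifier.
    static long totalNameBytes(long rows, String family, String... qualifiers) {
        long perRow = 0;
        for (String q : qualifiers) {
            perRow += namesPerCell(family, q);
        }
        return rows * perRow;
    }

    public static void main(String[] args) {
        long rows = 10000000L;
        long readable = totalNameBytes(rows, "info", "user", "name", "email");
        long compact  = totalNameBytes(rows, "i", "u", "n", "e");
        System.out.println("readable names: " + readable + " bytes");
        System.out.println("compact names:  " + compact + " bytes");
    }
}
```

    For 10 million rows with the lab's three columns, the readable names (info:user, info:name, info:email) cost 250,000,000 bytes before compression. Single-letter names cost 60,000,000 bytes.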

    __16. Next, open AccessObject.java. This class is where the majority of our HBase coding will take place; it is the class that accesses the HBase tables. It contains partially completed snippets of code that you will complete yourself. Let's take a look at the overall structure of this class:


    This first part just declares the commonly used byte constants. Recall that HBase stores all data as bytes, so to work with anything we need the Bytes utility class provided by HBase (org.apache.hadoop.hbase.util.Bytes). Here we set up byte constants for our table name, column family, and column qualifiers. Later on, we can refer to these constants rather than doing the Bytes conversion each time.
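    The constants can be sketched without HBase on the classpath. HBase's Bytes.toBytes(String) encodes a string as UTF-8, so the stand-in helper below does the same with the standard library. INFO_FAMILY and USER_COL come from the lab. TABLE_NAME, NAME_COL, and EMAIL_COL are hypothetical names for the other constants. The main method also shows one pitfall of working in bytes: two byte arrays with the same contents are not equal under equals().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-in for the byte constants at the top of AccessObject. In the lab you
// would call HBase's Bytes.toBytes(String), which encodes the string as UTF-8;
// this stdlib helper does the same so the sketch runs without HBase.
class ByteConstants {
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String toStr(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // INFO_FAMILY and USER_COL appear in the lab; the other names are hypothetical.
    static final byte[] TABLE_NAME  = toBytes("users");
    static final byte[] INFO_FAMILY = toBytes("info");
    static final byte[] USER_COL    = toBytes("user");
    static final byte[] NAME_COL    = toBytes("name");
    static final byte[] EMAIL_COL   = toBytes("email");

    public static void main(String[] args) {
        byte[] fresh = toBytes("info");
        // byte[] uses identity equality, so compare contents with Arrays.equals.
        System.out.println(fresh.equals(INFO_FAMILY));         // false
        System.out.println(Arrays.equals(fresh, INFO_FAMILY)); // true
        System.out.println(toStr(EMAIL_COL));                  // email
    }
}
```

    In code that uses the HBase library, Bytes.equals(byte[], byte[]) performs the same content comparison as Arrays.equals.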

    After the constants declaration, we just declare the HTablePool that will be used for managing the table connections. Creating a table instance is a relatively expensive operation, so we will use the HTablePool to take care of it for us. There will be more on this topic later in the course. For now, we will just use HTablePool to manage our tables.
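    The idea behind pooling can be illustrated with a small, generic object pool. SimplePool is hypothetical and written with the standard library (Java 8 or later). It is not HTablePool's actual implementation. Here borrow() and release() play roughly the roles of HTablePool's getTable() and of calling close() on the table it hands out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// A minimal object pool illustrating the idea behind HTablePool: expensive
// handles are created once, handed out, and returned for reuse instead of
// being rebuilt for every request. This is NOT HTablePool's implementation.
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;
    private int created = 0;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Reuse an idle handle if one exists; otherwise create a new one.
    synchronized T borrow() {
        T t = idle.pollFirst();
        if (t == null) {
            t = factory.get();
            created++;
        }
        return t;
    }

    // Give a handle back; keep it for reuse only if the pool has room.
    synchronized void release(T t) {
        if (idle.size() < maxIdle) {
            idle.addFirst(t);
        }
    }

    // How many handles the factory has built so far.
    synchronized int createdCount() {
        return created;
    }
}
```

    Because release() keeps a returned handle and borrow() hands it out again, only the first request pays the creation cost. A second handle is created only when every pooled one is already in use.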

    Finally, we have our AccessObject constructor which just initializes the pool.

    __17. Each time we want to invoke the Get command, we need to create a Get object and pass in the appropriate qualifiers to tell HBase what to retrieve. The mkGet method creates and returns this Get object. We need to add the column family to narrow down what is returned to us. Add in the code to specify the column family for the Get object.


    __18. You will do something similar for the Put function. You will want to specify the user column, the name column, and the email column for each User object you put into HBase. Remember that each property of the User object you pass into the Put must first be converted to bytes. Here's the first one to get you started:

    p.add(INFO_FAMILY, USER_COL, Bytes.toBytes(u.user));

    Do the same for u.name and u.email.

    __19. The mkDelete method is simple for our example: it removes the User with the specified row key. There is nothing you need to do here except understand how the mkDelete method is coded.

    __20. For the Scan function, you need to add the column family to the mkScan method to make sure that you scan only from a particular column family.
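    The effect of adding the column family can be modeled with nested sorted maps. In the lab, the line you add is probably a call such as g.addFamily(INFO_FAMILY) in mkGet, and the scan equivalent, s.addFamily(INFO_FAMILY), in mkScan. Check the provided solution for the exact code. The sketch below is a stdlib-only model, not HBase code. The second family (stats) and the email address are hypothetical, and Strings stand in for byte arrays.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory model of one HBase row: family -> (qualifier -> value).
// It mimics what Get.addFamily / Scan.addFamily do: when a family is
// requested, only that family's cells come back. Strings replace byte[]
// purely for readability.
class FamilyFilterModel {

    static SortedMap<String, SortedMap<String, String>> sampleRow() {
        SortedMap<String, SortedMap<String, String>> row = new TreeMap<>();
        SortedMap<String, String> info = new TreeMap<>();
        info.put("user", "kim123");
        info.put("name", "kim");
        info.put("email", "kim@example.com");              // placeholder address
        row.put("info", info);
        SortedMap<String, String> stats = new TreeMap<>(); // hypothetical family
        stats.put("logins", "42");
        row.put("stats", stats);
        return row;
    }

    // Cells of the requested family, or every cell when family is null.
    static SortedMap<String, String> read(SortedMap<String, SortedMap<String, String>> row,
                                          String family) {
        SortedMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, SortedMap<String, String>> e : row.entrySet()) {
            if (family == null || family.equals(e.getKey())) {
                for (Map.Entry<String, String> cell : e.getValue().entrySet()) {
                    out.put(e.getKey() + ":" + cell.getKey(), cell.getValue());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(read(sampleRow(), "info"));
        System.out.println(read(sampleRow(), null));
    }
}
```

    With the family restriction, only the three info cells come back. Without it, the hypothetical stats:logins cell is returned as well, which matches what a Get or Scan with no family or column specified does in HBase.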

    __21. Now that you have created all the helper methods, it's time to create the actual methods that interact directly with our HBase tables. The 4 methods, getUser (get), addUser (put), deleteUser (delete), and getUsers (scan), will do just that for us.

    Let's start with getUser(). The first thing we need to do is get a handle to the table from the HTablePool object. The pool returns an HTableInterface that acts as a table handle. Then we invoke mkGet to create the Get object for the user. Once we have the Get object, we invoke the table's get command and store what it returns in a Result object.

    __a. You can do so with this command: Result result = users.get(g);

    __b. Next, create a User object from the result and return it to the calling method. We make this easier for you by letting you create the User object with the Result object as the constructor parameter:


    User(Result r)

    __c. Replace the return null statement with a return of the new User instance.
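    The getUser() flow looks up one row and turns its cells into a User, and it can be sketched without HBase. In this model a Map<String, byte[]> stands in for the Result object, and the row key is assumed to be the user ID. The User(Map) constructor is a hypothetical analogue of the lab's User(Result r). The sketch also adds a guard the steps above do not mention: HBase returns an empty Result, not an error, when the row does not exist.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Stdlib-only model of the getUser() flow: look up one row, then build a
// User from its cells. A Map<String, byte[]> stands in for HBase's Result;
// User(Map) is a hypothetical analogue of the lab's User(Result r).
class GetUserModel {

    static final class User {
        final String user, name, email;

        User(Map<String, byte[]> cells) {
            this.user  = str(cells.get("user"));
            this.name  = str(cells.get("name"));
            this.email = str(cells.get("email"));
        }

        // Decode a cell value; a missing cell stays null.
        private static String str(byte[] b) {
            return b == null ? null : new String(b, StandardCharsets.UTF_8);
        }
    }

    // rowKey -> (qualifier -> value); stands in for the "users" table.
    final Map<String, Map<String, byte[]>> table = new HashMap<>();

    User getUser(String rowKey) {
        Map<String, byte[]> result = table.get(rowKey);  // like users.get(g)
        // HBase answers a Get for a missing row with an empty Result, so check
        // first instead of returning a User full of nulls.
        if (result == null || result.isEmpty()) {
            return null;
        }
        return new User(result);
    }
}
```

    In the real getUser(), the same guard could be written as a check on result.isEmpty() before calling new User(result). Returning the table to the pool with users.close() before leaving the method is also worth doing.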

    __22. Create the addUser() method.

    __a. The first line to get the table handle from the HTablePool is the same as from the getUser() method in the previous step.

    __b. Recall that our mkPut method takes a User as an input. Create the Put object using mkPut(User u).

    Put p = mkPut(new User(user,name,email));

    __c. Invoke the table's put command to insert the object into the table.

    users.put(p);

    __d. Close the table's connection resource.

    users.close();
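    The addUser() steps can be modeled in plain Java. mkPut produces one (family, qualifier, value) cell per User field, and the put writes those cells into the row. The model below uses hypothetical names, converts Strings to UTF-8 bytes, and assumes the row key is the user ID. It also shows what happens when you add a user that already exists: HBase writes a newer version of each cell, and ordinary reads return the newest one, so a put behaves like an insert-or-update. How many old versions HBase keeps depends on the column family's settings; this model simply keeps them all.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Stdlib-only model of addUser(): mkPut turns each User field into a cell in
// the "info" family, and the put applies those cells to the row. Writing an
// existing cell adds a newer version; reads see the newest one.
class PutModel {
    static final String INFO = "info";

    // rowKey -> ("family:qualifier" -> versions of the value, newest first).
    final Map<String, Map<String, Deque<byte[]>>> rows = new HashMap<>();

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Analogue of users.put(mkPut(...)): one cell per User field.
    void addUser(String user, String name, String email) {
        Map<String, Deque<byte[]>> row = rows.computeIfAbsent(user, k -> new HashMap<>());
        String[][] cells = { {"user", user}, {"name", name}, {"email", email} };
        for (String[] c : cells) {
            row.computeIfAbsent(INFO + ":" + c[0], k -> new ArrayDeque<>())
               .addFirst(b(c[1]));
        }
    }

    // Newest value of one column, or null if it was never written.
    String latest(String rowKey, String column) {
        Map<String, Deque<byte[]>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return null;
        }
        return new String(row.get(column).peekFirst(), StandardCharsets.UTF_8);
    }
}
```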

    __23. Create the getUsers() method. This method is slightly different from the getUser() method (no s) that you already created: it gets all the users in the table.

    __a. Get the handle to the table.

    __b. Get the scanner object by invoking the mkScan() method

    ResultScanner results = users.getScanner(mkScan());

    __c. Then you want to iterate through the results and add each user to the List.


    ArrayList<User> al = new ArrayList<User>();

    for(Result r : results) { al.add(new User(r)); }

    __d. Finally, we need to return the list.
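    One detail worth knowing about getUsers() is the order in which the scanner returns rows. HBase keeps rows sorted by row key, compared as unsigned byte arrays. The stdlib-only sketch below reproduces that ordering with a TreeMap and a hand-written comparator, and the class and method names are hypothetical. Note that user10 sorts before user9, because the comparison is byte by byte rather than numeric.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only model of getUsers(): a scan walks rows in row-key order, and
// HBase orders row keys as unsigned byte arrays (lexicographically).
class ScanOrderModel {

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;  // a shorter prefix sorts first
    }

    final TreeMap<byte[], String> table =
        new TreeMap<byte[], String>(ScanOrderModel::compareUnsigned);

    void put(String rowKey, String name) {
        table.put(rowKey.getBytes(StandardCharsets.UTF_8), name);
    }

    // Like iterating a ResultScanner and collecting one entry per row.
    List<String> scanRowKeys() {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.entrySet()) {
            keys.add(new String(e.getKey(), StandardCharsets.UTF_8));
        }
        return keys;
    }
}
```

    In the real getUsers(), it is good practice to call results.close() on the ResultScanner, and users.close() on the table, once the loop finishes, because an open scanner holds resources on the region server.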

    __24. Create the deleteUser() method.
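    mkDelete builds the Delete from the row key alone, so deleteUser() removes every cell in that row, across all column families. The sketch below models that with a map, using hypothetical names and only the standard library. The comments note one difference from real HBase: HBase records the delete as tombstone markers and drops the data physically later, during compaction, although reads stop returning the row immediately.

```java
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only model of deleteUser(): a Delete built from just the row key
// (as mkDelete does) removes every cell in that row, across all families.
// Real HBase writes tombstone markers and purges the data at compaction;
// removing the map entry mimics what later reads observe.
class DeleteModel {
    // rowKey -> ("family:qualifier" -> value)
    final Map<String, Map<String, String>> table = new TreeMap<>();

    void put(String row, String column, String value) {
        table.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // Analogue of passing mkDelete's Delete object to users.delete().
    void deleteUser(String row) {
        table.remove(row);  // deleting a row that is not there is a no-op
    }
}
```

    In the lab, deleteUser() presumably follows the same pattern as addUser(). You get the table handle, pass the Delete object returned by mkDelete to users.delete(), and close the table. Check the provided solution for the exact code.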

    __25. The rest of the AccessObject class has been completed for you. It contains the User class's constructors.


    __26. Finally, our last class, HBaseTester.java, contains the main() method. This class has been completed for you, and it runs all of our methods. Take a few minutes to open the HBaseTester class and look at what it does.

    Essentially, HBaseTester.java accepts arguments that tell it what you want to do. For example, to add a user, you pass in the argument add followed by the user ID, name, and email address that you provide.
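    The argument handling described above can be sketched as a small dispatcher. TesterModel is hypothetical and is not the lab's HBaseTester. An in-memory TreeMap stands in for AccessObject and the users table, and the printed messages are invented. The first argument selects the operation (add, get, delete, or list), as in the runs later in this lab.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of argument dispatch in the style of HBaseTester's
// main(): the first argument picks the operation, the rest are its values.
// An in-memory TreeMap stands in for AccessObject and the "users" table.
class TesterModel {
    // rowKey -> {user, name, email}; TreeMap keeps rows sorted like a scan.
    final Map<String, String[]> users = new TreeMap<>();

    // Runs one command and returns the lines it would print.
    List<String> run(String... args) {
        List<String> out = new ArrayList<>();
        if (args.length == 0) {
            out.add("usage: add <user> <name> <email> | get <user> | delete <user> | list");
            return out;
        }
        switch (args[0]) {
            case "add":
                if (args.length != 4) { out.add("add needs <user> <name> <email>"); break; }
                users.put(args[1], new String[] {args[1], args[2], args[3]});
                out.add("Added user " + args[1]);
                break;
            case "get":
                if (args.length != 2) { out.add("get needs <user>"); break; }
                String[] u = users.get(args[1]);
                out.add(u == null ? "not found" : String.join(", ", u));
                break;
            case "delete":
                if (args.length != 2) { out.add("delete needs <user>"); break; }
                users.remove(args[1]);
                out.add("Deleted user " + args[1]);
                break;
            case "list":
                for (String[] row : users.values()) {
                    out.add(String.join(", ", row));
                }
                break;
            default:
                out.add("unknown command: " + args[0]);
        }
        return out;
    }
}
```

    Replaying the sequence from step 32 against this model, with placeholder email addresses, leaves kim123 and scottie789, and list prints them in row-key order.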


    2.3 Running the HBase Java classes

    __27. Before you run the classes, you need to create a table in HBase that you can use. We will look at how to create tables and schemas in a later lesson, so for now, create the table using the HBase shell. Open the HBase shell and type in the following command:

    create 'users', 'info'

    __28. Now that you have your table, go back to Eclipse to execute your classes. Remember, I have included the completed classes in the LabSolutions folder in the lab files if you need them.

    Go to Run > Run Configurations from the menu bar. Double-click Java Application to create a new run configuration:


    __29. Click the Arguments tab and add in the following arguments:

    __30. You are telling the program to add the user (kim123) name (kim) email ([email protected]) to the HBase table.

    __31. Click Run after you have added the arguments to run the program. The console output will confirm that the user has been added.


    __32. Run the program again with the following arguments and notice their outputs to see the different results.

    __a. add jay456 Jay [email protected]

    __b. add scottie789 scottie [email protected]

    __c. list

    __d. delete jay456

    __e. list

    __f. get kim123

    __33. You are done with this lab exercise. Go ahead and save and close your Eclipse and any other windows or terminals that you may have open.


    2.4 Summary

    Great job! You have created the Java classes to demonstrate the use of the HBase Get, Put, Delete, and Scan operations. These are the building blocks of HBase as you start to think about how to develop your Big Data applications. Most of the tasks we have seen here can be done more quickly using the HBase shell. You might ask: why should I use Java when the HBase shell can accomplish this more efficiently? Java (and other client APIs) allows you to build more complex applications than the shell alone can support.


  • Copyright IBM Corporation 2013.

    The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.

    IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
