oracle.braindumps.1z0-449.v2018-11-29.by.anna · hdfs transparent encryption protects hadoop data...

https://www.gratisexam.com/

1z0-449.exam.42q

Number: 1z0-449Passing Score: 800Time Limit: 120 min


1z0-449

Oracle Big Data 2017 Implementation Essentials


Exam A

QUESTION 1The hdfs_stream script is used by the Oracle SQL Connector for HDFS to perform a specific task to access data.

What is the purpose of this script?


A. It is the preprocessor script for the Impala table.

B. It is the preprocessor script for the HDFS external table.

C. It is the streaming script that creates a database directory.

D. It is the preprocessor script for the Oracle partitioned table.

E. It defines the jar file that points to the directory where Hive is installed.

Correct Answer: BSection: (none)Explanation

Explanation/Reference:The hdfs_stream script is the preprocessor for the Oracle Database external table created by Oracle SQL Connector for HDFS.

References: https://docs.oracle.com/cd/E37231_01/doc.20/e36961/start.htm#BDCUG107

QUESTION 2How should you encrypt the Hadoop data that sits on disk?

A. Enable Transparent Data Encryption by using the Mammoth utility.

B. Enable HDFS Transparent Encryption by using bdacli on a Kerberos-secured cluster.

C. Enable HDFS Transparent Encryption on a non-Kerberos secured cluster.

D. Enable Audit Vault and Database Firewall for Hadoop by using the Mammoth utility.

Correct Answer: BSection: (none)


Explanation

Explanation/Reference:HDFS Transparent Encryption protects Hadoop data that’s at rest on disk. When the encryption is enabled for a cluster, data write and read operations onencrypted zones (HDFS directories) on the disk are automatically encrypted and decrypted. This process is “transparent” because it’s invisible to the applicationworking with the data.

The cluster where you want to use HDFS Transparent Encryption must have Kerberos enabled.

Incorrect Answers:D: The cluster where you want to use HDFS Transparent Encryption must have Kerberos enabled.

References: https://docs.oracle.com/en/cloud/paas/big-data-cloud/csbdi/using-hdfs-transparent-encryption.html#GUID-16649C5A-2C88-4E75-809A-BBF8DE250EA3

QUESTION 3What two things does the Big Data SQL push down to the storage cell on the Big Data Appliance? (Choose two.)

A. Transparent Data Encrypted data

B. the column selection of data from individual Hadoop nodes

C. WHERE clause evaluations

D. PL/SQL evaluation

E. Business Intelligence queries from connected Exalytics servers

Correct Answer: ABSection: (none)Explanation

Explanation/Reference:

QUESTION 4Your customer has an older starter rack Big Data Appliance (BDA) that was purchased in 2013. The customer would like to know what the options are for growingthe storage footprint of its server.

Which two options are valid for expanding the customer’s BDA footprint? (Choose two.)

A. Elastically expand the footprint by adding additional high capacity nodes.

B. Elastically expand the footprint by adding additional Big Data Oracle Database Servers.

C. Elastically expand the footprint by adding additional Big Data Storage Servers.


D. Racks manufactured before 2014 are no longer eligible for expansion.

E. Upgrade to a full 18-node Big Data Appliance.

Correct Answer: DESection: (none)Explanation


QUESTION 5

What are three correct results of executing the preceding query? (Choose three.)

A. Values longer than 100 characters for the DESCRIPTION column are truncated.

B. ORDER_LINE_ITEM_COUNT in the HDFS file matches ITEM_CNT in the external table.


C. ITEM_CNT in the HDFS file matches ORDER_LINE_ITEM_COUNT in the external table.

D. Errors in the data for CUST_NUM or ORDER_NUM set the value to INVALID_NUM.

E. Errors in the data for CUST_NUM or ORDER_NUM set the value to 0000000000.

F. Values longer than 100 characters for any column are truncated.

Correct Answer: ACDSection: (none)Explanation

Explanation/Reference:com.oracle.bigdata.overflow: Truncates string data. Values longer than 100 characters for the DESCRIPTION column are truncated.

com.oracle.bigdata.overflow: Truncates string data. Values longer than 100 characters for the DESCRIPTION column are truncated.

com.oracle.bigdata.erroropt: Replaces bad data. Errors in the data for CUST_NUM or ORDER_NUM set the value to INVALID_NUM.

References: https://docs.oracle.com/cd/E55905_01/doc.40/e55814/bigsql.htm#BIGUG76679

QUESTION 6What does the following line do in Apache Pig?

products = LOAD ‘/user/oracle/products’ AS (prod_id, item);

A. The products table is loaded by using data pump with prod_id and item.

B. The LOAD table is populated with prod_id and item.

C. The contents of /user/oracle/products are loaded as tuples and aliased to products.

D. The contents of /user/oracle/products are dumped to the screen.

Correct Answer: CSection: (none)Explanation

Explanation/Reference:The LOAD function loads data from the file system.

Syntax: LOAD 'data' [USING function] [AS schema];Terms: 'data'The name of the file or directory, in single quote


References: https://pig.apache.org/docs/r0.11.1/basic.html#load

QUESTION 7What is the output of the following six commands when they are executed by using the Oracle XML Extensions for Hive in the Oracle XQuery for HadoopConnector?

1. $ echo "xxx" > src.txt2. $ hive --auxpath $OXH_HOME/hive/lib -i $OXH_HOME/hive/init.sql3. hive> CREATE TABLE src (dummy STRING);4. hive> LOAD DATA LOCAL INPATH 'src.txt' OVERWRITE INTO TABLE src;5. hive> SELECT * FROM src; OK xxx

6. hive> SELECT xml_query ("x/y", "<x><y>123</y><z>456</z></x>") FROM src;

A. xyz

B. 123

C. 456

D. xxx

E. x/y


Explanation/Reference:Using the Hive Extensions

To enable the Oracle XQuery for Hadoop extensions, use the --auxpath and -i arguments when starting Hive:

$ hive --auxpath $OXH_HOME/hive/lib -i $OXH_HOME/hive/init.sql

The first time you use the extensions, verify that they are accessible. The following procedure creates a table named SRC, loads one row into it, and calls thexml_query function.

To verify that the extensions are accessible: 1. Log in to an Oracle Big Data Appliance server where you plan to work.2. Create a text file named src.txt that contains one line:$ echo "XXX" > src.txt

3. Start the Hive command-line interface (CLI):$ hive --auxpath $OXH_HOME/hive/lib -i $OXH_HOME/hive/init.sql

The init.sql file contains the CREATE TEMPORARY FUNCTION statements that declare the XML functions.


4. Create a simple table:hive> CREATE TABLE src(dummy STRING);

The SRC table is needed only to fulfill a SELECT syntax requirement. It is like the DUAL table in Oracle Database, which is referenced in SELECT statements totest SQL functions.5. Load data from src.txt into the table:hive> LOAD DATA LOCAL INPATH 'src.txt' OVERWRITE INTO TABLE src;

6. Query the table using Hive SELECT statements:hive> SELECT * FROM src;OKxxx

7. Call an Oracle XQuery for Hadoop function for Hive. This example calls the xml_query function to parse an XML string:hive> SELECT xml_query("x/y", "<x><y>123</y><z>456</z></x>") FROM src; . . .["123"]

If the extensions are accessible, then the query returns ["123"], as shown in the example

References: https://docs.oracle.com/cd/E53356_01/doc.30/e53067/oxh_hive.htm#BDCUG693

QUESTION 8The NoSQL KVStore experiences a node failure. One of the replicas is promoted to primary.

How will the NoSQL client that accesses the store know that there has been a change in the architecture?

A. The KVLite utility updates the NoSQL client with the status of the master and replica.

B. KVStoreConfig sends the status of the master and replica to the NoSQL client.

C. The NoSQL admin agent updates the NoSQL client with the status of the master and replica.

D. The Shard State Table (SST) contains information about each shard and the master and replica status for the shard.

Correct Answer: DSection: (none)Explanation

Explanation/Reference:Given a shard, the Client Driver next consults the Shard State Table (SST). For each shard, the SST contains information about each replication node comprisingthe group (step 5). Based upon information in the SST, such as the identity of the master and the load on the various nodes in a shard, the Client Driver selects thenode to which to send the request and forwards the request to the appropriate node. In this case, since we are issuing a write operation, the request must go to themaster node.

Note: If the machine hosting the master should fail in any way, then the master automatically fails over to one of the other nodes in the shard. That is, one of the


replica nodes is automatically promoted to master.

References: http://www.oracle.com/technetwork/products/nosqldb/learnmore/nosql-wp-1436762.pdf

QUESTION 9Your customer is experiencing significant degradation in the performance of Hive queries. The customer wants to continue using SQL as the main query languagefor the HDFS store.

Which option can the customer use to improve performance?


A. native MapReduce Java programs

B. Impala

C. HiveFastQL

D. Apache Grunt


Explanation/Reference:Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.

Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase withoutrequiring data movement or transformation.

References: https://en.wikipedia.org/wiki/Cloudera_Impala

QUESTION 10Your customer’s Oracle NoSQL store has a replication factor of 3. One of the customer’s replica nodes goes down.

What will be the long-term performance impact on the customer’s NoSQL database if the node is replaced?

A. There will be no performance impact.


B. The database read performance will be impacted.

C. The database read and write performance will be impacted.

D. The database will be unavailable for reading or writing.

E. The database write performance will be impacted.


Explanation/Reference:The number of nodes belonging to a shard is called its Replication Factor. The larger a shard's Replication Factor, the faster its read throughput (because there aremore machines to service the read requests) but the slower its write performance (because there are more machines to which writes must be copied).

Note: Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and whichcopies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations.

References: https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/introduction.html#replicationfactor

QUESTION 11Your customer is using the IKM SQL to HDFS File (Sqoop) module to move data from Oracle to HDFS. However, the customer is experiencing performance issues.

What change should you make to the default configuration to improve performance?

A. Change the ODI configuration to high performance mode.

B. Increase the number of Sqoop mappers.

C. Add additional tables.

D. Change the HDFS server I/O settings to duplex mode.


Explanation/Reference:Controlling the amount of parallelism that Sqoop will use to transfer data is the main way to control the load on your database. Using more mappers will lead to ahigher number of concurrent data transfer tasks, which can result in faster job completion. However, it will also increase the load on the database as Sqoop willexecute more concurrent queries.

References: https://community.hortonworks.com/articles/70258/sqoop-performance-tuning.html


QUESTION 12What is the result when a flume event occurs for the following single node configuration?

A. The event is written to memory.

B. The event is logged to the screen.

C. The event output is not defined in this section.

D. The event is sent out on port 44444.

E. The event is written to the netcat process.

Correct Answer: BSection: (none)


Explanation

Explanation/Reference:This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sinkthat logs event data to the console.

Note: A sink stores the data into centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the destination. Thedestination of the sink might be another agent or the central stores.

A source is the component of an Agent which receives data from the data generators and transfers it to one or more channels in the form of Flume events.

Incorrect Answers:D: port 4444 is part of the source, not the sink.

References: https://flume.apache.org/FlumeUserGuide.html

QUESTION 13What kind of workload is MapReduce designed to handle?

A. batch processing

B. interactive

C. computational

D. real time

E. commodity

Correct Answer: ASection: (none)Explanation

Explanation/Reference:Hadoop was designed for batch processing. That means, take a large dataset in input all at once, process it, and write a large output. The very concept ofMapReduce is geared towards batch and not real-time. With growing data, Hadoop enables you to horizontally scale your cluster by adding commodity nodes andthus keep up with query. In hadoop Map-reduce does the same job it will take large amount of data and process it in batch. It will not give immediate output. It willtake time as per Configuration of system,namenode,task-tracker,job-tracker etc.

References: https://www.quora.com/What-is-batch-processing-in-hadoop

QUESTION 14Your customer uses LDAP for centralized user/group management.


How will you integrate permissions management for the customer’s Big Data Appliance into the existing architecture?

A. Make Oracle Identity Management for Big Data the single source of truth and point LDAP to its keystore for user lookup.

B. Enable Oracle Identity Management for Big Data and point its keystore to the LDAP directory for user lookup.

C. Make Kerberos the single source of truth and have LDAP use the Key Distribution Center for user lookup.

D. Enable Kerberos and have the Key Distribution Center use the LDAP directory for user lookup.


Explanation/Reference:Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository.The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server setup already, which willthen link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket.

References: https://www.rittmanmead.com/blog/2015/04/setting-up-security-and-access-control-on-a-big-data-appliance/

QUESTION 15Your customer collects diagnostic data from its storage systems that are deployed at customer sites. The customer needs to capture and process this data bycountry in batches.

Why should the customer choose Hadoop to process this data?

A. Hadoop processes data on large clusters (10-50 max) on commodity hardware.

B. Hadoop is a batch data processing architecture.

C. Hadoop supports centralized computing of large data sets on large clusters.

D. Node failures can be dealt with by configuring failover with clusterware.

E. Hadoop processes data serially.


Explanation/Reference:Hadoop was designed for batch processing. That means, take a large dataset in input all at once, process it, and write a large output. The very concept ofMapReduce is geared towards batch and not real-time. With growing data, Hadoop enables you to horizontally scale your cluster by adding commodity nodes andthus keep up with query. In hadoop Map-reduce does the same job it will take large amount of data and process it in batch. It will not give immediate output. It willtake time as per Configuration of system,namenode,task-tracker,job-tracker etc.


Incorrect Answers:A: Yahoo! has by far the most number of nodes in its massive Hadoop clusters at over 42,000 nodes as of July 2011.

C: Hadoop supports distributed computing of large data sets on large clusters

E: Hadoop processes data in parallel.

References: https://www.quora.com/What-is-batch-processing-in-hadoop

QUESTION 16Your customer wants to architect a system that helps to make real-time recommendations to users based on their past search history.

Which solution should the customer use?

A. Oracle Container Database

B. Oracle Exadata

C. Oracle NoSQL

D. Oracle Data Integrator


Explanation/Reference:Oracle Data Integration (both Oracle GoldenGate and Oracle Data Integrator) help to integrate data end-to-end between big data (NoSQL,Hadoop-based)environments and SQL-based environments. These data integration technologies are the key ingredient to Oracle’s Big Data Connectors. Oracle Big DataConnectors provide integration to from Oracle Big Data Appliance to relational Oracle Databases where in-Database analytics can be performed.

Oracle’s data integration solutions speed the loads of the Connecting Visibility to Value Oracle Exadata Database Machine by 500% while providing continuousaccess to business critical information across heterogeneous sources.

References: http://www.oracle.com/us/solutions/fastdata/fast-data-gets-real-time-wp-1927038.pdf

QUESTION 17How should you control the Sqoop parallel imports if the data does not have a primary key?

A. by specifying no primary key with the --no-primary argument

B. by specifying the number of maps by using the –m option

C. by indicating the split size by using the --direct-split-size option


D. by choosing a different column that contains unique data with the --split-by argument


Explanation/Reference:If the actual values for the primary key are not uniformly distributed across its range, then this can result in unbalanced tasks. You should explicitly choose adifferent column with the --split-by argument. For example, --split-by employee_id.

Note: When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default,Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrievedfrom the database, and the map tasks operate on evenly-sized components of the total range.

References: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hbase

QUESTION 18Your customer uses Active Directory to manage user accounts. You are setting up Hadoop Security for the customer’s Big Data Appliance.

How will you integrate Hadoop and Active Directory?

A. Set up Kerberos’ Key Distribution Center to be the Active Directory keystore.

B. Configure Active Directory to use Kerberos’ Key Distribution Center.

C. Set up a one-way cross-realm trust from the Kerberos realm to the Active Directory realm.

D. Set up a one-way cross-realm trust from the Active Directory realm to the Kerberos realm.


Explanation/Reference:If direct integration with AD is not currently possible, use the following instructions to configure a local MIT KDC to trust your AD server:1. Run an MIT Kerberos KDC and realm local to the cluster and create all service principals in this realm.2. Set up one-way cross-realm trust from this realm to the Active Directory realm. Using this method, there is no need to create service principals in Active

Directory, but Active Directory principals (users) can be authenticated to Hadoop.

Incorrect Answers:B: The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server setup already, which willthen link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket.


References: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_hadoop_security_active_directory_integrate.html#topic_15_1

QUESTION 19What is the main purpose of the Oracle Loader for Hadoop (OLH) Connector?

A. runs transformations expressed in XQuery by translating them into a series of MapReduce jobs that are executed in parallel on a Hadoop cluster

B. pre-partitions, sorts, and transforms data into an Oracle ready format on Hadoop and loads it into the Oracle database

C. accesses and analyzes data in place on HDFS by using external tables

D. performs scalable joins between Hadoop and Oracle Database data

E. provides a SQL-like interface to data that is stored in HDFS

F. is the single SQL point-of-entry to access all data


Explanation/Reference:Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. Itprepartitions the data if necessary and transforms it into a database-ready format.

References: https://docs.oracle.com/cd/E37231_01/doc.20/e36961/olh.htm#BDCUG140

QUESTION 20Your customer has three XML files in HDFS with the following contents. Each XML file contains comments made by users on a specific day. Each comment canhave zero or more “likes” from other users. The customer wants you to query this data and load it into the Oracle Database on Exadata.

How should you parse this data?


A. by creating a table in Hive and using MapReduce to parse the XML data by column

B. by configuring the Oracle SQL Connector for HDFS and parsing by using SerDe

C. by using the XML file module in the Oracle XQuery for Hadoop Connector

D. by using the built-in functions for reading JSON in the Oracle XQuery for Hadoop Connector




Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in Apache Hadoop in these formats:Data Pump files in HDFSDelimited text files in HDFSDelimited text files in Apache Hive tables

SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization and also interpreting theresults of serialization as individual fields for processing.A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

References:https://docs.oracle.com/cd/E53356_01/doc.30/e53067/osch.htm#BDCUG126 https://cwiki.apache.org/confluence/display/Hive/SerDe

QUESTION 21Identify two ways to create an external table to access Hive data on the Big Data Appliance by using Big Data SQL. (Choose two.)

A. Use Cloudera Manager's Big Data SQL Query builder.

B. You can use the dbms_hadoop.create_extdd1_for_hive package to return the text of the CREATE TABLE command.


C. Use a CREATE table statement with ORGANIZATION EXTERNAL and the ORACLE_BDSQL access parameter.

D. Use a CREATE table statement with ORGANIZATION EXTERNAL and the ORACLE_HIVE access parameter.

E. Use the Enterprise Manager Big Data SQL Configuration page to create the table.

Correct Answer: BDSection: (none)Explanation

Explanation/Reference:CREATE_EXTDDL_FOR_HIVE returns a SQL CREATE TABLE ORGANIZATION EXTERNAL statement for a Hive table. It uses the ORACLE_HIVE accessdriver.

References: https://docs.oracle.com/cd/E55905_01/doc.40/e55814/bigsqlref.htm#BIGUG76630

QUESTION 22


What access driver does the Oracle SQL Connector for HDFS use when reading HDFS data by using external tables?

A. ORACLE_DATA_PUMP

B. ORACLE_LOADER

C. ORACLE_HDP

D. ORACLE_BDSQL

E. HADOOP_LOADER

F. ORACLE_HIVE_LOADER


Explanation/Reference:Oracle SQL Connector for HDFS creates the external table definition for Data Pump files by using the metadata from the Data Pump file header. It uses theORACLE_LOADER access driver with the preprocessor access parameter. It also uses a special access parameter named EXTERNAL VARIABLE DATA, whichenables ORACLE_LOADER to read the Data Pump format files generated by Oracle Loader for Hadoop.

References: https://docs.oracle.com/cd/E37231_01/doc.20/e36961/sqlch.htm#BDCUG356

QUESTION 23You recently set up a customer’s Big Data Appliance. At the time, all users wanted access to all the Hadoop data. Now, the customer wants more control over thedata that is stored in Hadoop.

How should you accommodate this request?

A. Configure Audit Vault and Database Firewall protection policies for the Hadoop data.

B. Update the MySQL metadata for Hadoop to define access control lists.

C. Configure an /etc/sudoers file to restrict the Hadoop data.

D. Configure Apache Sentry policies to protect the Hadoop data.


Explanation/Reference:Apache Sentry is a new project that delivers fine grained access control; both Cloudera and Oracle are the project’s founding members. Sentry satisfies thefollowing three authorization requirements:


Secure Authorization: the ability to control access to data and/or privileges on data for authenticated users.Fine-Grained Authorization: the ability to give users access to a subset of the data (e.g. column) in a databaseRole-Based Authorization: the ability to create/apply template-based privileges based on functional roles.

Incorrect Answers:C: The file /etc/sudoers contains a list of users or user groups with permission to execute a subset of commands while having the privileges of the root user oranother specified user. The program may be configured to require a password.

References: https://blogs.oracle.com/datawarehousing/new-big-data-appliance-security-features

QUESTION 24You are working with a client who does not allow the storage of user or schema passwords in plain text.

How can you configure the Oracle Loader for Hadoop configuration file to meet the requirements of this client?

A. Store the password in an Access Control List and configure the ACL location in the configuration file.

B. Encrypt the password in the configuration file by using Transparent Data Encryption.

C. Configure the configuration file to prompt for the password during remote job executions.

D. Store the information in an Oracle wallet and configure the wallet location in the configuration file.


Explanation/Reference:In online database mode, Oracle Loader for Hadoop can connect to the target database using the credentials provided in the job configuration file or in an Oraclewallet. Oracle Wallet Manager is an application that wallet owners use to manage and edit the security credentials in their Oracle wallets. A wallet is a password-protectedcontainer used to store authentication and signing credentials, including private keys, certificates, and trusted certificates needed by SSL.

Note: Oracle Wallet Manager provides the following features:Wallet Password ManagementStrong Wallet EncryptionMicrosoft Windows Registry Wallet StorageBackward CompatibilityPublic-Key Cryptography Standards (PKCS) SupportMultiple Certificate SupportLDAP Directory Support

References: https://docs.oracle.com/cd/B28359_01/network.111/b28530/asowalet.htm#BABHEDIE


QUESTION 25Your customer needs the data that is generated from social media such as Facebook and Twitter, and the customer’s website to be consumed and sent to anHDFS directory for analysis by the marketing team.

Identify the architecture that you should configure.

A. multiple flume agents with collectors that output to a logger that writes to the Oracle Loader for Hadoop agent

B. multiple flume agents with sinks that write to a consolidated source with a sink to the customer's HDFS directory

C. a single flume agent that collects data from the customer's website, which is connected to both Facebook and Twitter, and writes via the collector to thecustomer's HDFS directory

D. multiple HDFS agents that write to a consolidated HDFS directory

E. a single HDFS agent that collects data from the customer's website, which ls connected to both Facebook and Twitter, and writes via the Hive to the customer'sHDFS directory


Explanation/Reference:Apache Flume - Fetching Twitter Data. Flume in this case will be responsible for capturing the tweets from Twitter in very high velocity and volume, buffer them inmemory channel (maybe do some aggregation since we're getting JSONs) and eventually sink them into HDFS.

References: https://www.tutorialspoint.com/apache_flume/fetching_twitter_data.htm

QUESTION 26What are the two advantages of using Hive over MapReduce? (Choose two.)

A. Hive is much faster than MapReduce because it accesses data directly.

B. Hive allows for sophisticated analytics on large data sets.

C. Hive does not require MapReduce to run in order to analyze data.

D. Hive is a free tool; Hadoop requires a license.

E. Hive simplifies Hadoop for new users.

Correct Answer: BESection: (none)Explanation

Explanation/Reference:E: A comparison of the performance of the Hadoop/Pig implementation of MapReduce with Hadoop/Hive.


Both Hive and Pig are platforms optimized for analyzing large data sets and are built on top of Hadoop. Hive is a platform that provides a declarative SQLlikelanguage whereas Pig requires users to write a procedural language called PigLatin.Writing MapReduce jobs in Java can be difficult, Hive and Pig has been developed and works as platforms on top of Hadoop. Hive and Pig allows users easyaccess to data compared to implementing their own MapReduce in Hadoop.

Incorrect Answers: A: Hive and Pig has been developed and works as platforms on top of Hadoop.

C: Apache Hive provides an SQL-like query language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez andSpark jobs.

D: Apache Hadoop is an open-source software framework, licensed through Apache License, Version 2.0 (ALv2), which is a permissive free software licensewritten by the Apache Software Foundation (ASF).

References: https://www.kth.se/social/files/55802074f2765474cb6f543c/7.pdf

QUESTION 27During a meeting with your customer’s IT security team, you are asked the names of the main OS users and groups for the Big Data Appliance.

Which users are created automatically during the installation of the Oracle Big Data Appliance?

A. flume, hbase, and hdfs

B. mapred, bda, and engsys

C. hbase, cdh5, and oracle

D. bda, cdh5, and oracle



QUESTION 28Which command should you use to view the contents of the HDFS directory, /user/oracle/logs?

A. hadoop fs –cat /user/oracle/logs

B. hadoop fs –ls /user/oracle/logs

C. cd /user/oracle


hadoop fs –ls logs

D. cd /user/oracle/logs hadoop fs –ls *

E. hadoop fs –listfiles /user/oracle/logs

F. hive> select * from /user/oracle/logs


Explanation/Reference:To list the contents of a directory named /user/training/hadoop in HDFS.#hadoop fs -ls /user/training/hadoop

Incorrect Answers:A: hadoop fs –cat displays the content of a file.

References: http://princetonits.com/blog/technology/33-frequently-used-hdfs-shell-commands/

QUESTION 29Your customer receives data in JSON format.

Which option should you use to load this data into Hive tables?

A. Python

B. Sqoop

C. a custom Java program

D. Flume

E. SerDe

Correct Answer: ESection: (none)Explanation

Explanation/Reference:SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization and also interpreting theresults of serialization as individual fields for processing.A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.


The JsonSerDe for JSON files is available in Hive 0.12 and later.

References: https://cwiki.apache.org/confluence/display/Hive/SerDe

QUESTION 30Your customer needs to move data from Hive to the Oracle database but does have any connectors purchased.

What is another architectural choice that the customer can make?

A. Use Apache Sqoop.

B. Use Apache Sentry.

C. Use Apache Pig.

D. Export data from Hive by using export/import.


Explanation/Reference:Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL,Oracle to Hadoop HDFS, and export from Hadoop file system to relational databases.

Incorrect Answers:B: Apache Sentry is an authorization module for Hadoop that provides the granular, role-based authorization required to provide precise levels of access to the rightusers and applications.

C: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure forevaluating these programs.

References: https://www.tutorialspoint.com/sqoop/

QUESTION 31Your customer is setting up an external table to provide read access to the Hive table to Oracle Database.

What does hdfs:/user/scott/data refer to in the external table definition for the Oracle SQL Connector for HDFS?


A. the default directory for the Oracle external table

B. the local file system location for the data

C. the location for the log directory

D. the location of the HDFS input data

E. the location of the Oracle data file for SALES_DP_XTAB


Explanation/Reference:hdfs:/user/scott/data/ is the location of the HDFS data.


QUESTION 32Your customer’s IT staff is made up mostly of SQL developers. Your customer would like you to design a system to analyze the spending patterns of customers inthe web store. The data resides in HDFS.

What tool should you use to meet their needs?

A. Oracle Database 12c

B. SQL Developer

C. Apache Hive

D. MapReduce

E. Oracle Data Integrator



Explanation/Reference:Oracle SQL Developer is one of the most common SQL client tool that is used by Developers, Data Analyst, Data Architects etc for interacting with Oracle and otherrelational systems. SQL Developer and Data Modeler (version 4.0.3) now support Hive andOracle Big Data SQL. The tools allow you to connect to Hive, use the SQL Worksheet toquery, create and alter Hive tables, and automatically generate Big Data SQL-enabled Oracle external tables that dynamically access data sources defined in theHive metastore.

Incorrect Answers:E: Oracle Data Integrator (ODI) is an Extract, load and transform (ELT) (in contrast with the ETL common approach) tool produced by Oracle that offers a graphicalenvironment to build, manage and maintain data integration processes in business intelligence systems.

References: https://blogs.oracle.com/datawarehousing/oracle-sql-developer-data-modeler-support-for-oracle-big-data-sql

QUESTION 33Which statement is true about the NameNode in Hadoop?

A. A query in Hadoop requires a MapReduce job to be run so the NameNode gets the location of the data from the JobTracker.

B. If the NameNode goes down and a secondary NameNode has not been defined, the cluster is still accessible.

C. When loading data, the NameNode tells the client or program where to write the data.

D. All data passes through the NameNode; so if it is not sized properly, it could be a potential bottleneck.


Explanation/Reference:Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a SecondaryNameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error.

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and theother is in a Standby state.

Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a SecondaryNameNode, CheckpointNode, or BackupNode in an HA cluster.

References: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html


QUESTION 34How does increasing the number of storage nodes and shards impact the efficiency of Oracle NoSQL Database?

A. The number of shards or storage nodes does not impact performance.

B. Having more shards reduces the write throughput because of increased node forwarding.

C. Having more shards results in reduced read throughput because of increased node forwarding.

D. Having more shards increases the write throughput because more master nodes are available for writes.


Explanation/Reference:The more shards that your store contains, the better your write performance is because the store contains more nodes that are responsible for servicing writerequests.

References: https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/introduction.html

QUESTION 35Your customer is worried that the redundancy of HDFS will not meet its needs. The customer needs to store certain files with higher levels of redundancy than otherfiles.

Which architectural feature of HDFS should you choose?

A. Apache Impala, which can be used to set the duplex level for each file

B. Automatic Storage Management, which can be used to multiplex the data

C. Apache Scala on top of MapReduce, which allows you to store files at various redundancies

D. the replication factor, which can be set at the file level


Explanation/Reference:You can change the replication factor on a per-file basis using the Hadoop FS shell.To set replication of an individual file to 4:./bin/hadoop dfs -setrep -w 4 /path/to/file

Incorrect Answers:A: Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache


Hadoop.

C: Apache Scala is a high-level programming language which is a combination of object-oriented programming and functional programming. It is highly scalablewhich is why it is called Scala.

References: https://sites.google.com/site/hadoopandhive/home/how-to-change-replication-factor-of-existing-files-in-hdfs

QUESTION 36Identify two valid steps for setting up a Hive-to-Hive transformation by using Oracle Data Integrator. (Choose two.)

A. Ensure that the Hive server is up and running.

B. Create a Logical Schema object.

C. Configure ODI by using the mammoth utility.

D. Ensure that Apache Sentry is configured.

Correct Answer: ABSection: (none)Explanation

Explanation/Reference:Setting Up the Hive Data Source The following steps in Oracle Data Integrator are required for connecting to a Hive system.

To set up a Hive data source (see step 6 and 8): 1. Place all required Hive JDBC jars into the Oracle Data Integrator user lib folder:$HIVE_HOME/lib/*.jar$HADOOP_HOME/hadoop-*-core*.jar,$HADOOP_HOME/Hadoop-*-tools*.jar

2. Create a DataServer object under Hive technology.3. Set the following locations under JDBC:JDBC Driver: org.apache.hadoop.hive.jdbc.HiveDriverJDBC URL: for example, jdbc:hive://BDA:10000/default4. Set the following under Flexfields:Hive Metastore URIs: for example, thrift://BDA:100005. Create a Physical Default Schema.As of Hive 0.7.0, no schemas or databases are supported. Only Default is supported. Enter default in both schema fields of the physical schema definition.6. Ensure that the Hive server is up and running.7. Test the connection to the DataServer.8. Create a Logical Schema object.9. Create at least one Model for the LogicalSchema.10. Import RKM Hive as a global KM or into a project.


11. Create a new model for Hive Technology pointing to the logical schema.12. Perform a custom reverse using RKM Hive.

References: https://docs.oracle.com/cd/E27101_01/doc.10/e27365/odi.htm

QUESTION 37What are two reasons that a MapReduce job is not run when the following command is executed in Hive? (Choose two.)

hive> create view v_consolidated_credit_products as selecthcd.customer_id, hcd.credit_card_limits, hcd.credit_balance, hmda.n_mortgages, hmda.mortgage_amountfrom mortgages_department_agg hmda join credit_department hcd onhcd.customer_id=hmda.customer_id;OKTime taken: 0.316 seconds

A. The MapReduce job is run when the view is accessed.

B. MapReduce is run; the command output is incorrect.

C. MapReduce is not run with Hive. Hive bypasses the MapReduce layer.

D. A view only defines the metadata in the RDBMS store.

E. MapReduce is run in the background.

Correct Answer: ADSection: (none)Explanation

Explanation/Reference:The CREATE VIEW statement lets you create a shorthand abbreviation for a more complicated query. The base query can involve joins, expressions, reorderedcolumns, column aliases, and other SQL features that can make a query hard to understand or maintain.

Because a view is purely a logical construct (an alias for a query) with no physical data behind it, ALTER VIEW only involves changes to metadata in the metastoredatabase, not any data files in HDFS.

References: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_create_view.html

QUESTION 38Your customer has had a security breach on its running Big Data Appliance and needs to quickly enable security for Hive and Impala.

What is the quickest way to enable the Sentry service?

A. Execute bdacli sentry enable.


B. Use the Mammoth utility during installation.

C. Sentry is enabled by default on the Big Data Appliance.

D. Use the Cloudera Manager.


Explanation/Reference:Sentry Setup with Hive.Before Sentry is setup we have to ensure that Kerberos is enabled in the cluster and users have specific access to folders and services. Follow the below steps tosetup Sentry with Kerberos:1. Open Cloudera Manager and go to settings and install new parcel http://archive.cloudera.com/sentry/parcels/latest/2. Set the user for hive warehouse folder in Hadoop to hive3. In Cloudera Manager go to HiveServer2 settings4. Under Configurations section, uncheck the HiveServer2 Enable Impersonation property.5. Create a new File sentry-provider.ini in /user/hive/sentry6. Go to Cloudera Manager MapReduce configuration and set the Minimum User ID for Job Submission to 0.7. In Hive settings go to Service-Wide category, Sentry section, check Enable Sentry Authorization.8. Save all changes and deploy client configuration.9. Restart cluster for changes to take effect.

Incorrect Answers:A: You use the command bdacli enable sentry, not bdacli sentry enable.

References: http://ngvtech.in/droidhub/hadoop-kerberos-security/

QUESTION 39Your customer has a Big Data Appliance and an Exadata Database Machine and would like to extend security. Select two ways that security works in Big DataSQL. (Choose two.)

A. On the Big Data Appliance, Hadoop's native security is used.

B. On the Exadata Database Machine, Oracle Advanced Security is used for fine-grained access control.

C. On the Big Data Appliance, Oracle Advanced Hadoop Security is used for fine grained access control.

D. On the Big Data Appliance, Oracle Identity Management is used.

E. On the Big Data Appliance, data is encrypted by using Oracle Transparent Data Encryption (TDE).

Correct Answer: BESection: (none)Explanation


Explanation/Reference:Transparent Data Encryption is a great way to protect sensitive data in large-scale Exadata scenarios. With Exadata, substantial crypto performance gains arepossible.

Oracle Exadata Database Machine with Oracle Advanced SecurityOracle Advanced Security transparent data encryption (TDE) protects sensitive data such as credit card numbers and email addresses from attempts to access atthe operating system level, on backup media or in database exports. No triggers, views or other costly changes to the application are required. TDE leveragesperformance and storage optimizations of the Oracle Exadata Database Machine, including the Smart Flash Cache and Hybrid Columnar Compression (EHCC).

References: http://www.oracle.com/us/products/database/exadata-db-machine-security-ds-401799.pdf

QUESTION 40Your customer wants you to set up ODI by using the IKM SQL to Hive module and overwrite existing Hive tables with the customer’s new clinical data.

What parameter must be set to true?

A. SQOOP_ODI_OVERWRITE

B. OVERWRITE_HIVE_TABLE

C. OVERWRITE_HDFS_TABLE

D. CREATE_HIVE_TABLE



QUESTION 41Which user in the following set of ACL entries has read, write and execute permissions?

user::rwxgroup::r-xother::r-xdefault:user::rwxdefault:user:bruce:rwx #effective:r-xdefault:group::r-xdefault:group:sales:rwx #effective:r-xdefault:mask::r-xdefault:other::r-x


A. sales

B. bruce

C. all users

D. hdfs

E. user

Correct Answer: ESection: (none)Explanation

Explanation/Reference:References: https://askubuntu.com/questions/257896/what-is-meant-by-mask-and-effective-in-the-output-from-getfacl

QUESTION 42How does the Oracle SQL Connector for HDFS access HDFS data from the Oracle database?

A. NoSQL tables

B. Data Pump files

C. external tables

D. Apache Sqoop files

E. non-partitioned tables


Explanation/Reference:Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in Hadoop in these formats:

Data Pump files in HDFSDelimited text files in HDFSHive tables


oracle.braindumps.1z0-449.v2018-11-29.by.anna · hdfs transparent encryption protects hadoop data...

Documents