Q.1 What is a Hive variable? What do we use it for?
Ans: A Hive variable is a named value created in the Hive environment that Hive scripts and queries can reference. Variables let you pass values into a query when it starts executing, for example with SET or the --hivevar command-line option. A script that references them can be run from the Hive shell with the source command.
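
As a minimal sketch of variable substitution, the snippet below sets a variable and references it in a query. The table `sales`, the column `sale_date`, the variable name `target_date`, and the script path are hypothetical and used only for illustration.

```sql
-- Define a variable in the hivevar namespace (name and value are illustrative).
SET hivevar:target_date=2020-01-01;

-- Hive substitutes the variable's text into the query before compiling it.
SELECT *
FROM sales                                   -- hypothetical table
WHERE sale_date = '${hivevar:target_date}';

-- The same value could instead be supplied from the command line:
--   hive --hivevar target_date=2020-01-01 -f report.hql
-- or the script could be run from inside the Hive shell:
--   SOURCE /path/to/report.hql;             -- hypothetical path
```

Because substitution is purely textual, string values still need their own quotes inside the query, as in the WHERE clause above.
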
Q.2 Explain the process to access subdirectories recursively in Hive queries.
Ans: Enable recursive reading of input directories with the two settings below, then create an external table and specify the root directory as its location:
hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;
LOCATION 'hdfs://.../data'
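
Put together, the steps might look like the sketch below. The table name `raw_logs`, its single-column schema, and the path `/data/logs` are assumptions made for illustration; substitute the real root directory.

```sql
-- Allow input paths to be scanned recursively.
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

-- External table rooted at the top-level directory (hypothetical path).
-- Files in nested subdirectories under it become visible to queries.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
  line STRING
)
LOCATION '/data/logs';

-- Reads files from the root directory and its subdirectories.
SELECT COUNT(*) FROM raw_logs;
```
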
Q.3 Which Java class handles the encoding of input records in the files that store Hive tables?
Ans: The 'org.apache.hadoop.mapred.TextInputFormat' class. It is the input format used for tables stored as text files.
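
For illustration, the class can be named explicitly when a table is created. This is a sketch with a hypothetical table `raw_events`; the output format shown is the one Hive normally pairs with text input.

```sql
-- Spell out the input/output format classes instead of using STORED AS TEXTFILE.
CREATE TABLE raw_events (        -- hypothetical table
  line STRING
)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- DESCRIBE FORMATTED lists the InputFormat and OutputFormat of an existing table.
DESCRIBE FORMATTED raw_events;
```
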
Q.4 What is a view in Hive?
Ans: A view is a logical construct that allows a query to be treated as a table. It stores no data of its own; the underlying query is evaluated whenever the view is referenced.
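
A short sketch of defining and querying a view follows. The table `orders`, its columns, and the view name `high_value_orders` are hypothetical.

```sql
-- Save a query under a name; no data is copied.
CREATE VIEW IF NOT EXISTS high_value_orders AS
SELECT order_id, customer_id, amount
FROM orders                      -- hypothetical base table
WHERE amount > 1000;

-- The view can then be queried like any table.
SELECT customer_id, COUNT(*) AS order_count
FROM high_value_orders
GROUP BY customer_id;
```
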
Q.5 What are the limitations of Hive?
Ans: The main limitations of Hive are as follows. It has no row-level inserts, updates, or deletes by default. It does not support ACID properties as an RDBMS does, and hence does not support transactions; newer versions do support transactions, but performance degrades when they are used. It is not designed for OLTP, so there is no real-time access to data. It has high latency. It makes no distinction between NULL and null values. It does not support triggers.
Q.6 What is HCatalog?
Ans: HCatalog is a table and storage management tool for Hadoop that exposes the tabular data of the Hive metastore to other Hadoop applications. It enables users of different data processing tools (such as Pig and MapReduce) to easily read and write data on the grid.
Q.7 What are the different components of a Hive architecture?
Ans: The Hive architecture has five components:
User Interface: Lets the user submit queries and other operations to the Hive system. The available interfaces are the Hive Web UI, the Hive command line, and Hive HDInsight.
Driver: Creates a session handle for the query and sends the query to the compiler to obtain an execution plan.
Metastore: Stores the structural information (metadata) about the tables and partitions in the warehouse.
Compiler: Parses the query, performs semantic analysis on the different query blocks and query expressions, and creates the execution plan.
Execution Engine: Executes the plan created by the compiler.
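Q.8 Can you avoid MapReduce on Hive?
Ans: Yes, for some queries. Setting the hive.exec.mode.local.auto property to 'true' lets Hive run small jobs in local mode instead of submitting a MapReduce job to the cluster. In addition, simple SELECT queries can be answered by a plain fetch task with no MapReduce job at all, which is controlled by the hive.fetch.task.conversion property.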
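
The settings involved might be applied as in the sketch below. `hive.fetch.task.conversion` is a related property that lets simple SELECT queries run as a fetch task without MapReduce; the table `orders` is hypothetical.

```sql
-- Let Hive decide automatically when a job is small enough to run locally.
SET hive.exec.mode.local.auto=true;

-- Allow simple SELECT / filter / LIMIT queries to run as a fetch task
-- without launching a MapReduce job.
SET hive.fetch.task.conversion=more;

-- A query of this shape can typically be served without MapReduce.
SELECT * FROM orders LIMIT 10;   -- hypothetical table
```
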
Q.9 What is bucketing in Hive?
Ans: Bucketing in Hive divides the data of a table or partition into a fixed number of files, known as buckets, based on the hash of a chosen column. This gives the data extra structure that can be used for more efficient queries, such as sampling and map-side joins.
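
A minimal sketch of a bucketed table follows. Hive assigns each row to a bucket by hashing the clustering column and taking the result modulo the bucket count. The tables `users` and `users_bucketed`, their columns, and the bucket count of 8 are assumptions for illustration.

```sql
-- Rows are assigned to one of 8 buckets by the hash of user_id.
CREATE TABLE users_bucketed (    -- hypothetical table
  user_id BIGINT,
  name    STRING
)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS ORC;

-- Hive versions before 2.0 also need: SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name FROM users; -- hypothetical source table

-- Sampling can read a single bucket instead of scanning the whole table.
SELECT * FROM users_bucketed TABLESAMPLE (BUCKET 1 OUT OF 8 ON user_id);
```
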
Q.10 What is the difference between partitioning and bucketing in Hive?
Ans: Partitioning groups data by the value of a partition column, typically a column whose values repeat throughout the dataset, and stores each value in its own directory. It is most effective when that column has few distinct values, because otherwise it can create too many small partitions and too many directories. Bucketing distributes rows across a fixed number of buckets by hashing a column, usually a primary key or other non-repeated value. Because bucketing yields roughly equal volumes of data in each bucket, map-side joins are quicker.
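
The two techniques can be combined in one table, as in this sketch. The table `page_views`, its columns, and the bucket count are hypothetical.

```sql
CREATE TABLE page_views (        -- hypothetical table
  user_id BIGINT,
  url     STRING
)
-- Partition column: repeated values, one directory per distinct date.
PARTITIONED BY (view_date STRING)
-- Bucket column: many distinct values, hashed into 16 files per partition.
CLUSTERED BY (user_id) INTO 16 BUCKETS
STORED AS ORC;

-- Filtering on the partition column lets Hive read only that directory.
SELECT url
FROM page_views
WHERE view_date = '2020-01-01';
```

Here the date column is a reasonable partition key because each value covers many rows, while `user_id` would create far too many directories as a partition key and is better suited to bucketing.
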