In this blog, I will discuss an easy way to run a Hive script file without Oozie or other Hadoop tools.
Nowadays, companies are moving from traditional Hadoop vendors like Cloudera, MapR, and Hortonworks to cloud solution providers like AWS and GCP, because these simplify the steps involved in data processing.
The following is an easier way to execute the script file through AWS EMR.
Create a cluster with a Step:
Note: The auto-terminate cluster option is also available when the script file is added as a step.
After adding the script in Steps:
Next, proceed with the required hardware, general cluster settings, and security pages. Then create the cluster.
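For readers who prefer scripting over the console, the same "cluster with a Step" setup can be sketched in Python in the shape that boto3 (the AWS SDK for Python) expects for its EMR `run_job_flow` call. This is a minimal sketch under stated assumptions, not the only way to do it: the cluster name, release label, instance types, instance count, and role names are placeholders, and the EMR client is passed in rather than created here so the request can be inspected before anything is launched.

```python
def build_cluster_request(script_s3_uri, log_s3_uri):
    """Build keyword arguments for the EMR run_job_flow call.

    Every concrete value below (name, release label, instance types,
    instance count, roles) is an illustrative assumption.
    """
    hive_step = {
        "Name": "Run Hive script",
        # Assumption: stop the cluster if the script fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", script_s3_uri],
        },
    }
    return {
        "Name": "hive-script-cluster",
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [{"Name": "Hive"}],
        "LogUri": log_s3_uri,
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # False means the cluster auto-terminates once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [hive_step],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(emr_client, request):
    """Submit the request and return the new cluster (job flow) id."""
    response = emr_client.run_job_flow(**request)
    return response["JobFlowId"]
```

With boto3 installed and credentials configured, this would be called as `launch_cluster(boto3.client("emr"), build_cluster_request("s3://my-bucket/scripts/job.hql", "s3://my-bucket/logs/"))`, where both S3 paths are hypothetical.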
If you are already running a cluster and want to run the script file, you can add a step to the running cluster.
The steps to run the script file after cluster creation are listed below:
- Upload the script file to an AWS S3 location.
- Start your AWS EMR cluster with the necessary configuration.
- Open the cluster and navigate to the Steps menu.
- Click on the Add step option and follow the steps below:
- In Step type, choose Hive program.
- In Script S3 location, select the exact script file location.
- Now click on the Add button.
- The step starts running and produces the desired output.
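The console steps above can also be sketched in Python. The two helpers below take boto3-style S3 and EMR clients as parameters; the step name, the `CONTINUE` failure action, and the helper names themselves are assumptions made for illustration, not something the console flow prescribes.

```python
def upload_script(s3_client, local_path, bucket, key):
    """Upload the Hive script to S3 and return its s3:// URI."""
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def add_hive_step(emr_client, cluster_id, script_s3_uri):
    """Add a Hive script step to a running cluster and return the step id."""
    step = {
        "Name": "Run Hive script",  # hypothetical step name
        # Assumption: keep the cluster running even if the script fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", script_s3_uri],
        },
    }
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

In practice the clients would come from `boto3.client("s3")` and `boto3.client("emr")`, and `cluster_id` is the `j-...` identifier shown on the cluster page.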
After the step completes, you can review the execution in the log file called stderr.
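If the cluster was created with an S3 log location, the stderr file can also be located from a script. The sketch below assumes the usual EMR layout of `<log URI>/<cluster id>/steps/<step id>/stderr.gz` and a boto3-style EMR client passed in as a parameter; the helper names are hypothetical, and the path is worth verifying against your own log bucket.

```python
def step_state(emr_client, cluster_id, step_id):
    """Return the step's current state, e.g. PENDING, RUNNING, COMPLETED, FAILED."""
    response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
    return response["Step"]["Status"]["State"]


def step_stderr_uri(log_uri, cluster_id, step_id):
    """Build the expected S3 location of the step's stderr log.

    Assumption: logs are archived under the cluster's log URI using the
    <cluster id>/steps/<step id>/ layout and are gzip-compressed.
    """
    return f"{log_uri.rstrip('/')}/{cluster_id}/steps/{step_id}/stderr.gz"
```

Logs are copied to S3 with some delay, so the file may not appear immediately after the step reaches a final state.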
- The script file only needs to be uploaded to S3 once.
- There is no need to add the step to every newly created cluster. We can simply clone an existing or terminated cluster to which the step was already added. This removes many of the steps involved in running a Hive script.
- The cluster can be set to terminate automatically after the Step completes. This avoids unnecessary cluster runtime and reduces cost.
- We can copy the AWS CLI export and run it to clone the cluster easily.
Through these simple steps,
- We can bypass uploading the script file into HDFS every time we create a new EMR cluster.
- We can skip creating an Oozie workflow and running it.
- We can skip the above-mentioned steps every time we create a cluster.