Gobblin CLI
The Gobblin distribution includes a CLI at bin/gobblin that provides access to various Gobblin applications and commands. To run it:
bin/gobblin <command>
where <command> specifies the application to run. Running bin/gobblin -h lists the available commands.
The special command bin/gobblin classpath is handled by the bash script itself and simply prints the full classpath that Gobblin uses.
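A long classpath is easier to inspect with one entry per line. The sketch below assumes the classpath is colon-separated, as is usual for Java on Unix-like systems; the helper name split_classpath is hypothetical and not part of Gobblin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Gobblin): read a colon-separated
# classpath on stdin and print one entry per line.
split_classpath() {
  tr ':' '\n'
}

# Demonstration with a literal classpath string.
printf '%s\n' 'lib/a.jar:lib/b.jar:conf' | split_classpath
```

With a Gobblin distribution at hand, the same helper could be fed from the real command, for example bin/gobblin classpath | split_classpath.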
If running from an IDE, the main class for the CLI is gobblin.runtime.cli.GobblinCli.
A note on Hadoop classpath
When running bin/gobblin, the script automatically determines the classpath for the job. Although Hadoop jars are included in the Gobblin distribution, if HADOOP_HOME is set in the environment, Gobblin will instead use the classpath provided by the local Hadoop installation.
An important side effect is that, depending on the application, paths that are not fully qualified (like /my/path) default to the local file system if HADOOP_HOME is not set, and to HDFS if it is set. When referring to local paths, it is always a good idea to use the fully qualified form (e.g. file:///my/path).
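One way to follow this advice in scripts is to qualify local paths before handing them to bin/gobblin. The function below is a minimal sketch; qualify_local_path is a hypothetical name, not a Gobblin utility, and it only adds the file:// scheme to paths that do not already carry one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Gobblin): make a local path explicit by
# prefixing the file:// scheme, so that it refers to the local file system
# whether or not HADOOP_HOME is set.
qualify_local_path() {
  local path="$1"
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                   # already has a scheme: leave as-is
    /*)    printf 'file://%s\n' "$path" ;;            # absolute local path
    *)     printf 'file://%s/%s\n' "$PWD" "$path" ;;  # relative: resolve against the cwd
  esac
}

qualify_local_path /my/path   # prints file:///my/path
```

Paths that already name a scheme, such as hdfs://..., pass through unchanged, so the helper can be applied to every path argument.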
Gobblin ingestion applications
Gobblin ingestion applications are accessed through the run command:
bin/gobblin run [listQuickApps] [<quick-app>] -jobName <jobName> [OPTIONS]
For usage information, run bin/gobblin run.
bin/gobblin run uses EmbeddedGobblin and its subclasses to run Gobblin ingestion jobs, giving CLI access to most of the functionality available through EmbeddedGobblin. For example, the following command runs a Hello World job (it prints "Hello World 1 !" somewhere in the logs):
bin/gobblin run -jobName helloWorld -setTemplate resource:///templates/hello-world.template
Having to know the path to a template and exactly which configurations to set is daunting. The alternative is to use a quick app. Running:
bin/gobblin run listQuickApps
prints a list of the available quick apps. To run a quick app:
bin/gobblin run <quick-app-name>
Quick apps may require additional arguments. For the usage of a particular app, run bin/gobblin run <quick-app-name> -h.
For example, consider the quick app distcp:
$ bin/gobblin run distcp -h
usage: gobblin run distcp [OPTIONS] <source> <target>
 -delete                         Delete files in target that don't exist
                                 on source.
 -deleteEmptyParentDirectories   If deleting files on target, also delete
                                 newly empty parent directories.
 -distributeJar <arg>
 -h,--help
 -l                              Uses log to print out erros in the base
                                 CLI code.
 -mrMode
 -setConfiguration <arg>
 -setJobTimeout <arg>
 -setLaunchTimeout <arg>
 -setShutdownTimeout <arg>
 -simulate
 -update                         Specifies files should be updated if
                                 they're different in the source.
 -useStateStore <arg>
This shows the usage for the distcp app and lists all of its available options. distcp could then be run as:
bin/gobblin run distcp file:///source/path file:///target/path
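When distcp is invoked from a script, it can help to assemble the command in one place and print it for review before running it. The following is a sketch under assumptions: build_distcp_command is a hypothetical helper, it only prints the command rather than executing it, and the option names passed to it are expected to come from the usage listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Gobblin): print a distcp quick app
# command following the usage string
#   gobblin run distcp [OPTIONS] <source> <target>
# It does not run anything; it only assembles the command line.
build_distcp_command() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_distcp_command <source> <target> [OPTIONS]" >&2
    return 1
  fi
  local source="$1" target="$2"
  shift 2
  printf 'bin/gobblin run distcp'
  local opt
  for opt in "$@"; do
    printf ' %s' "$opt"          # options go before the positional arguments
  done
  printf ' %s %s\n' "$source" "$target"
}

# Prints: bin/gobblin run distcp -update file:///source/path file:///target/path
build_distcp_command file:///source/path file:///target/path -update
```

Both paths are given in fully qualified form so that they refer to the local file system regardless of whether HADOOP_HOME is set.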
Developing quick apps for the CLI
It is very easy to convert a subclass of EmbeddedGobblin into a quick application for the Gobblin CLI. All that is needed is to implement an EmbeddedGobblinCliFactory that knows how to instantiate the EmbeddedGobblin from a CommandLine object, and to annotate it with the Alias annotation. There are two utility classes that make this very easy:
PublicMethodsGobblinCliFactory: this class will automatically infer CLI options from the public methods of a subclass of EmbeddedGobblin. All the developer has to do is implement the method constructEmbeddedGobblin(CommandLine) that calls the appropriate constructor of the desired EmbeddedGobblin subclass with parameters extracted from the CLI. Additionally, it is a good idea to override getUsageString() with the appropriate usage string. For an example, see gobblin.runtime.embedded.EmbeddedGobblinDistcp.CliFactory.
ConstructorAndPublicMethodsGobblinCliFactory: this class does everything PublicMethodsGobblinCliFactory does, but it additionally infers how to construct the EmbeddedGobblin object from a constructor annotated with EmbeddedGobblinCliSupport. For an example, see gobblin.runtime.embedded.EmbeddedGobblin.CliFactory.
Implementing new Gobblin commands
To implement a new Gobblin command that can be listed and executed using bin/gobblin, write a class that implements gobblin.runtime.cli.CliApplication and annotate it with the Alias annotation. The Gobblin CLI will automatically find the command, and users can invoke it by its Alias value.