...
- Run gobblin-standalone.sh:
- bin/gobblin-standalone.sh start --conf <path-to-job-config-file> --workdir /tmp
You should see the following lines in the Gobblin logs:
INFO [JobScheduler-2] org.apache.gobblin.source.extractor.hadoop.AvroFileSource 58 - Running ls command with input /data/wikipedia/org/apache/gobblin/example/wikipedia/WikipediaOutput
INFO [JobScheduler-2] org.apache.gobblin.source.extractor.filebased.FileBasedSource 257 - Will pull the following files in this run: [hdfs://localhost:9000/data/wikipedia/org/apache/gobblin/example/wikipedia/WikipediaOutput/20171207212140_append/LinkedIn/part.task_PullFromWikipedia_1512681699997_0_0.avro, hdfs://localhost:9000/data/wikipedia/org/apache/gobblin/example/wikipedia/WikipediaOutput/20171207212140_append/Wikipedia_Sandbox/part.task_PullFromWikipedia_1512681699997_1_0.avro]
INFO [JobScheduler-2] org.apache.gobblin.source.extractor.filebased.FileBasedSource 195 - Total number of work units for the current run: 2
Finally, upon successful execution, you should see the following lines in the Gobblin logs:
INFO [Task-committing-pool-0] org.apache.gobblin.runtime.fork.Fork 345 - Committing data for fork 0 of task <task_id>
INFO [Task-committing-pool-0] org.apache.gobblin.writer.AsyncWriterManager 441 - Commit called, will wait for commitTimeout : 60000 ms
INFO [Task-committing-pool-1] org.apache.gobblin.publisher.TaskPublisher 48 - All components finished successfully, checking quality tests
INFO [Task-committing-pool-1] org.apache.gobblin.publisher.TaskPublisher 50 - All required test passed for this task passed.
INFO [Task-committing-pool-1] org.apache.gobblin.publisher.TaskPublisher 52 - Cleanup for task publisher executed successfully.
INFO [Task-committing-pool-1] org.apache.gobblin.runtime.fork.Fork 345 - Committing data for fork 0 of task <task_id>
INFO [Task-committing-pool-1] org.apache.gobblin.writer.AsyncWriterManager 441 - Commit called, will wait for commitTimeout : 60000 ms
INFO [Task-committing-pool-0] org.apache.gobblin.writer.AsyncWriterManager 482 - Successfully committed 2 records.
INFO [Task-committing-pool-0] org.apache.gobblin.writer.AsyncWriterManager 424 - Close called
INFO [Task-committing-pool-0] org.apache.gobblin.writer.AsyncWriterManager 430 - Successfully done closing
INFO [Task-committing-pool-1] org.apache.gobblin.writer.AsyncWriterManager 482 - Successfully committed 10 records.
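Rather than scanning the log by eye, the commit lines can be totalled with a small script. The following is a minimal sketch, assuming the log holds one entry per line and that each writer reports its count in the form `Successfully committed N records.` as in the excerpt above. The `LOG_FILE` default and the `count_committed` helper are hypothetical names introduced here for illustration; point `LOG_FILE` at wherever your deployment actually writes its log.

```shell
#!/bin/sh
# Hypothetical default location; override LOG_FILE with your real Gobblin log path.
LOG_FILE="${LOG_FILE:-/tmp/logs/gobblin-current.log}"

# count_committed FILE
# Prints the sum of N over all "Successfully committed N records." lines in FILE.
count_committed() {
  sed -n 's/.*Successfully committed \([0-9][0-9]*\) records\..*/\1/p' "$1" |
    awk '{ total += $1 } END { print total + 0 }'
}

if [ -r "$LOG_FILE" ]; then
  echo "Committed records: $(count_committed "$LOG_FILE")"
else
  echo "Log file not found: $LOG_FILE" >&2
fi
```

For the run shown above, the two writers report 2 and 10 records, so the total would be 12.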
- You can also verify that the output was written to the Kafka topic WikipediaExample by consuming from the topic.
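One way to consume from the topic is Kafka's console consumer. This is a sketch under stated assumptions, not part of the original walkthrough: it assumes a broker reachable at `localhost:9092`, that `kafka-console-consumer.sh` from your Kafka distribution is on the `PATH`, and a Kafka version whose console consumer accepts `--bootstrap-server`. Adjust `BOOTSTRAP_SERVER` to match your setup.

```shell
#!/bin/sh
# Assumptions (not from the original text): broker address and consumer script on PATH.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"
TOPIC="${TOPIC:-WikipediaExample}"

if command -v kafka-console-consumer.sh >/dev/null 2>&1; then
  # Read the topic from the beginning and stop after 10 seconds without new messages.
  kafka-console-consumer.sh \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$TOPIC" \
    --from-beginning \
    --timeout-ms 10000
else
  echo "kafka-console-consumer.sh not found on PATH" >&2
fi
```

Whether the messages print as readable text depends on the serializer configured for the Kafka writer in the job configuration.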
MapReduce
This example runs Gobblin in MapReduce mode. It reads files from HDFS using the HadoopTextFileSource implementation in gobblin-example and writes the data to a single-partition Kafka topic called MRTest.
...