はじめてのCloverETL

CloverETLはフリーで使えるclover.ETLと有償なclover.GUIってのがあります。つまり実行環境はフリーでお絵かきツールはお金がいるということですね。でも、一応GUIもお試しはできるみたいなのでそれをつかって、見ていくことにしましょう。

clover.GUIのインストール

GUIEclipseプラグインとして提供されます。

流れとしては

  • ユーザ登録
  • Eclipseのupdate siteよりインストール

って感じです。

詳しくは下記サイトに書いてあるので見てください。
http://www.cloveretl.org/download/clover-gui/

何をする?

まずはこんな感じの例で試して見ます。

  • ファイル(kokyaku_in.csv)を読み込む
  • ソートする
  • ファイル(graph1_kokyaku_out.csv)へ出力する

するファイルは下記のとおりです。

  • kokyaku_in.csv
id,name
3,ほげほげほげ
1,ほげ
2,ほげほげ

で、出力結果はidの列でソートされた下記を想定します。

  • graph1_kokyaku_in.csv
id,name
1,ほげ
2,ほげほげ
3,ほげほげほげ

グラフ

clover.ETLでは変換するルールみたいなものをグラフというみたいです。(私もよくわかってないので違うかもしれませんが・・・)

で、今回はこのグラフをclover.GUIで書いてみました。とりあえず見てください。

この図の四角部分をclover.ETLではノードというみたいです。そして矢印をエッジと呼びます。ノードの種類によって違うのですが、基本的に入力のポートがひとつ、出力のポートが2つあるみたいです。ちなみに図の両サイドはそれぞれファイルの入力、出力を表すノードなので、入力のポートがなかったり出力のポートがなかったりします。

こんな感じでEclipseでぺたぺた配置したり、設定したものはgrfファイルとして出力されます。それはこんな感じです。

  • graph1.grf
<?xml version="1.0" encoding="UTF-8"?>
<Graph created="Wed Oct 03 22:45:31 JST 2007" guiVersion="1.8" licenseType="Evaluation license." modified="Wed Oct 03 23:46:44 JST 2007" name="graph1" revision="1.12">
<Global>
<Metadata id="Metadata0">
<Record name="kokyaku_in" recordDelimiter="" recordSize="-1" type="delimited">
<Field delimiter="," name="id" nullable="true" shift="0" type="integer"/>
<Field delimiter="\r\n" name="name" nullable="true" shift="0" type="string"/>
</Record>
</Metadata>
</Global>
<Phase number="0">
<Node charset="UTF-8" enabled="enabled" fileURL="C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv" guiHeight="0" guiName="kokyaku_in" guiWidth="0" guiX="91" guiY="170" id="DELIMITED_DATA_READER0" skipFirstLine="true" type="DELIMITED_DATA_READER"/>
<Node append="false" charset="UTF-8" enabled="enabled" fileURL="C:/dev/workspaces/workspace_cloverETL/sample_etl/destination/graph1_kokyaku_out.csv" guiHeight="0" guiName="kokyaku_out" guiWidth="0" guiX="468" guiY="170" id="DELIMITED_DATA_WRITER0" outputFieldNames="true" type="DELIMITED_DATA_WRITER"/>
<Node enabled="enabled" guiHeight="0" guiName="Sort" guiWidth="0" guiX="288" guiY="170" id="SORT0" sortKey="id" sortOrder="ascending" type="SORT"/>
<Edge fromNode="DELIMITED_DATA_READER0:0" guiBendpoints="" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="SORT0:0"/>
<Edge fromNode="SORT0:0" guiBendpoints="" id="Edge1" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="DELIMITED_DATA_WRITER0:0"/>
</Phase>
</Graph>

うーん、慣れれば?!かけるのかも知れませんが、やはりGUIを使ったほうがよさげですね。
SpringやStrutsXML書くのは慣れた方ならぜんぜんいけるでしょう?!

そうそう、この中ではこのgrfファイルの中に指定されちゃっていますが、ファイルやテーブルの構造を別ファイルにすることもできるみたいです。上のXMLのMetadataタグの中身ですね。

実行

Eclipse上では実行から簡単に実行可能です。
ま、ファイルがさくっと出来上がるだけなのですが、下記のようなメッセージが流れていきます。

  • コンソールのログ
***  CloverETL framework/transformation graph runner ver 2.3, (c) 2002-06 D.Pavlis, released under GNU Lesser General Public License  ***
 Running with framework version: 2.3 build#${build.number} compiled ${TODAY}

DEBUG [main] - Plugin org.jetel.bulkloader loaded.
	id - org.jetel.bulkloader
	version - 2.3.0
	provider-name - 
		component { type = ORACLE_DATA_READER; className = org.jetel.component.OracleDataReader;  }
		component { type = ORACLE_DATA_WRITER; className = org.jetel.component.OracleDataWriter;  }
		component { type = MYSQL_DATA_READER; className = org.jetel.component.MysqlDataReader;  }
		component { type = MYSQL_DATA_WRITER; className = org.jetel.component.MysqlDataWriter;  }
		component { type = DB2_DATA_WRITER; className = org.jetel.component.DB2DataWriter;  }
		component { type = INFORMIX_DATA_WRITER; className = org.jetel.component.InformixDataWriter;  }
		component { type = MS_SQL_DATA_WRITER; className = org.jetel.component.MsSqlDataWriter;  }

DEBUG [main] - Plugin org.jetel.component loaded.
	id - org.jetel.component
	version - 2.3.0
	provider-name - null
		component { type = DATA_READER; className = org.jetel.component.DataReader;  }
		component { type = DATA_WRITER; className = org.jetel.component.DataWriter;  }
		component { type = DELIMITED_DATA_READER; className = org.jetel.component.DelimitedDataReader;  }
		component { type = DELIMITED_DATA_WRITER; className = org.jetel.component.DelimitedDataWriter;  }
		component { type = SIMPLE_COPY; className = org.jetel.component.SimpleCopy;  }
		component { type = CONCATENATE; className = org.jetel.component.Concatenate;  }
		component { type = SIMPLE_GATHER; className = org.jetel.component.SimpleGather;  }
		component { type = REFORMAT; className = org.jetel.component.Reformat;  }
		component { type = DB_INPUT_TABLE; className = org.jetel.component.DBInputTable;  }
		component { type = SORT; className = org.jetel.component.Sort;  }
		component { type = DB_OUTPUT_TABLE; className = org.jetel.component.DBOutputTable;  }
		component { type = FIXLEN_DATA_WRITER; className = org.jetel.component.FixLenDataWriter;  }
		component { type = DEDUP; className = org.jetel.component.Dedup;  }
		component { type = FIXLEN_DATA_READER; className = org.jetel.component.FixLenDataReader;  }
		component { type = MERGE; className = org.jetel.component.Merge;  }
		component { type = MERGE_JOIN; className = org.jetel.component.MergeJoin;  }
		component { type = EXT_MERGE_JOIN; className = org.jetel.component.MergeJoin;  }
		component { type = SORTED_JOIN; className = org.jetel.component.MergeJoin;  }
		component { type = TRASH; className = org.jetel.component.Trash;  }
		component { type = DB_EXECUTE; className = org.jetel.component.DBExecute;  }
		component { type = HASH_JOIN; className = org.jetel.component.HashJoin;  }
		component { type = EXT_HASH_JOIN; className = org.jetel.component.HashJoin;  }
		component { type = CHECK_FOREIGN_KEY; className = org.jetel.component.CheckForeignKey;  }
		component { type = DBF_DATA_READER; className = org.jetel.component.DBFDataReader;  }
		component { type = EXT_FILTER; className = org.jetel.component.ExtFilter;  }
		component { type = EXT_SORT; className = org.jetel.component.ExtSort;  }
		component { type = PARTITION; className = org.jetel.component.Partition;  }
		component { type = DATA_INTERSECTION; className = org.jetel.component.DataIntersection;  }
		component { type = AGGREGATE; className = org.jetel.component.Aggregate;  }
		component { type = SYS_EXECUTE; className = org.jetel.component.SystemExecute;  }
		component { type = KEY_GEN; className = org.jetel.component.KeyGenerator;  }
		component { type = APROX_MERGE_JOIN; className = org.jetel.component.AproxMergeJoin;  }
		component { type = DBJOIN; className = org.jetel.component.DBJoin;  }
		component { type = XLS_READER; className = org.jetel.component.XLSReader;  }
		component { type = XLS_WRITER; className = org.jetel.component.XLSWriter;  }
		component { type = CLOVER_WRITER; className = org.jetel.component.CloverDataWriter;  }
		component { type = CLOVER_READER; className = org.jetel.component.CloverDataReader;  }
		component { type = STRUCTURE_WRITER; className = org.jetel.component.StructureWriter;  }
		component { type = NORMALIZER; className = org.jetel.component.Normalizer;  }
		component { type = DENORMALIZER; className = org.jetel.component.Denormalizer;  }
		component { type = JMS_READER; className = org.jetel.component.JmsReader;  }
		component { type = JMS_WRITER; className = org.jetel.component.JmsWriter;  }
		component { type = LOOKUP_JOIN; className = org.jetel.component.LookupJoin;  }
		component { type = LOOKUP_TABLE_READER_WRITER; className = org.jetel.component.LookupTableReaderWriter;  }
		component { type = DATA_GENERATOR; className = org.jetel.component.DataGenerator;  }
		component { type = MULTI_LEVEL_READER; className = org.jetel.component.MultiLevelReader;  }
		component { type = SEQUENCE_CHECKER; className = org.jetel.component.SequenceChecker;  }
		component { type = TEXT_TABLE_WRITER; className = org.jetel.component.TextTableWriter;  }
		component { type = XML_XPATH_READER; className = org.jetel.component.XmlXPathReader;  }
		component { type = PACEMAKER; className = org.jetel.component.Pacemaker;  }

DEBUG [main] - Plugin org.jetel.connection loaded.
	id - org.jetel.connection
	version - 2.3.0
	provider-name - null
		connection { type = JDBC; className = org.jetel.connection.DBConnection;  }
		connection { type = JMS; className = org.jetel.connection.JmsConnection;  }

DEBUG [main] - Plugin org.jetel.jdbc loaded.
	id - org.jetel.jdbc
	version - 2.3.0
	provider-name - null
		jdbcDriver { dbDriver = org.gjt.mm.mysql.Driver; urlHint = jdbc:mysql://hostname:3306/database; name = MySQL; database = MYSQL; driverLibrary = lib/mysql/mysql-connector-java-3.0.11-stable-bin.jar;  }

DEBUG [main] - Plugin org.jetel.lookup loaded.
	id - org.jetel.lookup
	version - 2.3.0
	provider-name - null
		lookup { type = simpleLookup; className = org.jetel.lookup.SimpleLookupTable;  }
		lookup { type = dbLookup; className = org.jetel.lookup.DBLookupTable;  }
		lookup { type = rangeLookup; className = org.jetel.lookup.RangeLookupTable;  }

DEBUG [main] - Plugin org.jetel.sequence loaded.
	id - org.jetel.sequence
	version - 2.3.0
	provider-name - null
		sequence { type = SIMPLE_SEQUENCE; className = org.jetel.sequence.SimpleSequence;  }
		sequence { type = PRIMITIVE_SEQUENCE; className = org.jetel.sequence.PrimitiveSequence;  }

DEBUG [main] - Plugin org.jetel.thirdparty loaded.
	id - org.jetel.thirdparty
	version - 2.3.0
	provider-name - null
		component { type = XML_EXTRACT; className = org.jetel.component.XMLExtract;  }
		component { type = FILTER; className = org.jetel.component.Filter;  }
		component { type = LDAP_READER; className = com.linagora.component.LdapReader;  }
		component { type = LDAP_WRITER; className = com.linagora.component.LdapWriter;  }

DEBUG [main] - Plugin org.jetel.tlfunction loaded.
	id - org.jetel.tlfunction
	version - 2.3.0
	provider-name - null
		tlfunction { className = org.jetel.interpreter.extensions.MathLib; function = sqrt,log,log10,exp,round,pow,pi,e,random; libraryName = math;  }
		tlfunction { className = org.jetel.interpreter.extensions.DateLib; function = today,dateadd,datediff,trunc; libraryName = date;  }
		tlfunction { className = org.jetel.interpreter.extensions.StringLib; function = concat,uppercase,lowercase,substring,left,right,trim,length,soundex,replace,split,char_at,is_blank,is_ascii,is_number,is_integer,is_date,remove_diacritic; libraryName = string;  }
		tlfunction { className = org.jetel.interpreter.extensions.ConvertLib; function = num2str,date2str,str2date,date2num,str2num; libraryName = convert;  }

Graph definition file: graph\graph1.grf
INFO  [main] - Checking graph configuration...
DEBUG [main] - Opening input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv
DEBUG [main] - Reading input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv
INFO  [main] - Starting WatchDog thread ...
INFO  [WatchDog] - Thread started.
INFO  [WatchDog] - Running on 2 CPU(s) max available memory for JVM 65088 KB
INFO  [WatchDog] - [Clover] Initializing phase: 0
DEBUG [WatchDog] -  initializing edges: 
DEBUG [WatchDog] -  all edges initialized successfully... 
DEBUG [WatchDog] -  initializing nodes: 
DEBUG [WatchDog] - Opening input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv
DEBUG [WatchDog] - Reading input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv
DEBUG [WatchDog] - 	DELIMITED_DATA_READER0 ...OK
DEBUG [WatchDog] - 	DELIMITED_DATA_WRITER0 ...OK
DEBUG [WatchDog] - 	SORT0 ...OK
INFO  [WatchDog] - [Clover] phase: 0 initialized successfully.
INFO  [WatchDog] - Starting up all nodes in phase [0]
DEBUG [WatchDog] - DELIMITED_DATA_READER0 ... started
DEBUG [WatchDog] - DELIMITED_DATA_WRITER0 ... started
DEBUG [WatchDog] - SORT0 ... started
INFO  [WatchDog] - Sucessfully started all nodes in phase!
INFO  [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 1
INFO  [WatchDog] - ----------------------** Final tracking Log for phase [0] **---------------------
INFO  [WatchDog] - Time: 03/10/07 23:46:49
INFO  [WatchDog] - Node                   Status     Port      #Records         #KB  Rec/s    KB/s
INFO  [WatchDog] - ----------------------------------------------------------------------------------
INFO  [WatchDog] - DELIMITED_DATA_READER0 FINISHED_OK    
INFO  [WatchDog] -  %cpu:..                          Out:0            3           0      0       0
INFO  [WatchDog] - DELIMITED_DATA_WRITER0 FINISHED_OK    
INFO  [WatchDog] -  %cpu:..                           In:0            3           0      0       0
INFO  [WatchDog] - SORT0                  FINISHED_OK    
INFO  [WatchDog] -  %cpu:..                           In:0            3           0      0       0
INFO  [WatchDog] -                                   Out:0            3           0      0       0
INFO  [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO  [WatchDog] - Forcing garbage collection ...
INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 FINISHED_OK                        1              1741
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO  [main] - WatchDog thread finished - total execution time: 6 (sec)
INFO  [main] - Graph execution finished successfully
Execution of graph successful !

うーん、いろいろな情報が出力されますね。この中身はおいおいわかればいいんでしょうかね。

ここまでやってみたのですが、意外と簡単にできることがわかりました。
また、うまくやると変換のルールもJavaでかけたりもするみたいなので、かなりこったこともできそうです。
今度はもう少しこったことでもしてみるとしましょう。