はじめてのCloverETL
CloverETLはフリーで使えるclover.ETLと有償なclover.GUIってのがあります。つまり実行環境はフリーでお絵かきツールはお金がいるということですね。でも、一応GUIもお試しはできるみたいなのでそれをつかって、見ていくことにしましょう。
clover.GUIのインストール
流れとしては
- ユーザ登録
- Eclipseのupdate siteよりインストール
って感じです。
詳しくは下記サイトに書いてあるので見てください。
http://www.cloveretl.org/download/clover-gui/
何をする?
まずはこんな感じの例で試して見ます。
するファイルは下記のとおりです。
- kokyaku_in.csv
id,name 3,ほげほげほげ 1,ほげ 2,ほげほげ
で、出力結果はidの列でソートされた下記を想定します。
- graph1_kokyaku_in.csv
id,name 1,ほげ 2,ほげほげ 3,ほげほげほげ
グラフ
clover.ETLでは変換するルールみたいなものをグラフというみたいです。(私もよくわかってないので違うかもしれませんが・・・)
で、今回はこのグラフをclover.GUIで書いてみました。とりあえず見てください。
この図の四角部分をclover.ETLではノードというみたいです。そして矢印をエッジと呼びます。ノードの種類によって違うのですが、基本的に入力のポートがひとつ、出力のポートが2つあるみたいです。ちなみに図の両サイドはそれぞれファイルの入力、出力を表すノードなので、入力のポートがなかったり出力のポートがなかったりします。
こんな感じでEclipseでぺたぺた配置したり、設定したものはgrfファイルとして出力されます。それはこんな感じです。
- graph1.grf
<?xml version="1.0" encoding="UTF-8"?> <Graph created="Wed Oct 03 22:45:31 JST 2007" guiVersion="1.8" licenseType="Evaluation license." modified="Wed Oct 03 23:46:44 JST 2007" name="graph1" revision="1.12"> <Global> <Metadata id="Metadata0"> <Record name="kokyaku_in" recordDelimiter="" recordSize="-1" type="delimited"> <Field delimiter="," name="id" nullable="true" shift="0" type="integer"/> <Field delimiter="\r\n" name="name" nullable="true" shift="0" type="string"/> </Record> </Metadata> </Global> <Phase number="0"> <Node charset="UTF-8" enabled="enabled" fileURL="C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv" guiHeight="0" guiName="kokyaku_in" guiWidth="0" guiX="91" guiY="170" id="DELIMITED_DATA_READER0" skipFirstLine="true" type="DELIMITED_DATA_READER"/> <Node append="false" charset="UTF-8" enabled="enabled" fileURL="C:/dev/workspaces/workspace_cloverETL/sample_etl/destination/graph1_kokyaku_out.csv" guiHeight="0" guiName="kokyaku_out" guiWidth="0" guiX="468" guiY="170" id="DELIMITED_DATA_WRITER0" outputFieldNames="true" type="DELIMITED_DATA_WRITER"/> <Node enabled="enabled" guiHeight="0" guiName="Sort" guiWidth="0" guiX="288" guiY="170" id="SORT0" sortKey="id" sortOrder="ascending" type="SORT"/> <Edge fromNode="DELIMITED_DATA_READER0:0" guiBendpoints="" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="SORT0:0"/> <Edge fromNode="SORT0:0" guiBendpoints="" id="Edge1" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="DELIMITED_DATA_WRITER0:0"/> </Phase> </Graph>
うーん、慣れれば?!かけるのかも知れませんが、やはりGUIを使ったほうがよさげですね。
SpringやStrutsでXML書くのは慣れた方ならぜんぜんいけるでしょう?!
そうそう、この中ではこのgrfファイルの中に指定されちゃっていますが、ファイルやテーブルの構造を別ファイルにすることもできるみたいです。上のXMLのMetadataタグの中身ですね。
実行
Eclipse上では実行から簡単に実行可能です。
ま、ファイルがさくっと出来上がるだけなのですが、下記のようなメッセージが流れていきます。
- コンソールのログ
*** CloverETL framework/transformation graph runner ver 2.3, (c) 2002-06 D.Pavlis, released under GNU Lesser General Public License *** Running with framework version: 2.3 build#${build.number} compiled ${TODAY} DEBUG [main] - Plugin org.jetel.bulkloader loaded. id - org.jetel.bulkloader version - 2.3.0 provider-name - component { type = ORACLE_DATA_READER; className = org.jetel.component.OracleDataReader; } component { type = ORACLE_DATA_WRITER; className = org.jetel.component.OracleDataWriter; } component { type = MYSQL_DATA_READER; className = org.jetel.component.MysqlDataReader; } component { type = MYSQL_DATA_WRITER; className = org.jetel.component.MysqlDataWriter; } component { type = DB2_DATA_WRITER; className = org.jetel.component.DB2DataWriter; } component { type = INFORMIX_DATA_WRITER; className = org.jetel.component.InformixDataWriter; } component { type = MS_SQL_DATA_WRITER; className = org.jetel.component.MsSqlDataWriter; } DEBUG [main] - Plugin org.jetel.component loaded. id - org.jetel.component version - 2.3.0 provider-name - null component { type = DATA_READER; className = org.jetel.component.DataReader; } component { type = DATA_WRITER; className = org.jetel.component.DataWriter; } component { type = DELIMITED_DATA_READER; className = org.jetel.component.DelimitedDataReader; } component { type = DELIMITED_DATA_WRITER; className = org.jetel.component.DelimitedDataWriter; } component { type = SIMPLE_COPY; className = org.jetel.component.SimpleCopy; } component { type = CONCATENATE; className = org.jetel.component.Concatenate; } component { type = SIMPLE_GATHER; className = org.jetel.component.SimpleGather; } component { type = REFORMAT; className = org.jetel.component.Reformat; } component { type = DB_INPUT_TABLE; className = org.jetel.component.DBInputTable; } component { type = SORT; className = org.jetel.component.Sort; } component { type = DB_OUTPUT_TABLE; className = org.jetel.component.DBOutputTable; } component { type = FIXLEN_DATA_WRITER; className = org.jetel.component.FixLenDataWriter; } component { type = DEDUP; className = org.jetel.component.Dedup; } component { type = FIXLEN_DATA_READER; className = org.jetel.component.FixLenDataReader; } component { type = MERGE; className = org.jetel.component.Merge; } component { type = MERGE_JOIN; className = org.jetel.component.MergeJoin; } component { type = EXT_MERGE_JOIN; className = org.jetel.component.MergeJoin; } component { type = SORTED_JOIN; className = org.jetel.component.MergeJoin; } component { type = TRASH; className = org.jetel.component.Trash; } component { type = DB_EXECUTE; className = org.jetel.component.DBExecute; } component { type = HASH_JOIN; className = org.jetel.component.HashJoin; } component { type = EXT_HASH_JOIN; className = org.jetel.component.HashJoin; } component { type = CHECK_FOREIGN_KEY; className = org.jetel.component.CheckForeignKey; } component { type = DBF_DATA_READER; className = org.jetel.component.DBFDataReader; } component { type = EXT_FILTER; className = org.jetel.component.ExtFilter; } component { type = EXT_SORT; className = org.jetel.component.ExtSort; } component { type = PARTITION; className = org.jetel.component.Partition; } component { type = DATA_INTERSECTION; className = org.jetel.component.DataIntersection; } component { type = AGGREGATE; className = org.jetel.component.Aggregate; } component { type = SYS_EXECUTE; className = org.jetel.component.SystemExecute; } component { type = KEY_GEN; className = org.jetel.component.KeyGenerator; } component { type = APROX_MERGE_JOIN; className = org.jetel.component.AproxMergeJoin; } component { type = DBJOIN; className = org.jetel.component.DBJoin; } component { type = XLS_READER; className = org.jetel.component.XLSReader; } component { type = XLS_WRITER; className = org.jetel.component.XLSWriter; } component { type = CLOVER_WRITER; className = org.jetel.component.CloverDataWriter; } component { type = CLOVER_READER; className = org.jetel.component.CloverDataReader; } component { type = STRUCTURE_WRITER; className = org.jetel.component.StructureWriter; } component { type = NORMALIZER; className = org.jetel.component.Normalizer; } component { type = DENORMALIZER; className = org.jetel.component.Denormalizer; } component { type = JMS_READER; className = org.jetel.component.JmsReader; } component { type = JMS_WRITER; className = org.jetel.component.JmsWriter; } component { type = LOOKUP_JOIN; className = org.jetel.component.LookupJoin; } component { type = LOOKUP_TABLE_READER_WRITER; className = org.jetel.component.LookupTableReaderWriter; } component { type = DATA_GENERATOR; className = org.jetel.component.DataGenerator; } component { type = MULTI_LEVEL_READER; className = org.jetel.component.MultiLevelReader; } component { type = SEQUENCE_CHECKER; className = org.jetel.component.SequenceChecker; } component { type = TEXT_TABLE_WRITER; className = org.jetel.component.TextTableWriter; } component { type = XML_XPATH_READER; className = org.jetel.component.XmlXPathReader; } component { type = PACEMAKER; className = org.jetel.component.Pacemaker; } DEBUG [main] - Plugin org.jetel.connection loaded. id - org.jetel.connection version - 2.3.0 provider-name - null connection { type = JDBC; className = org.jetel.connection.DBConnection; } connection { type = JMS; className = org.jetel.connection.JmsConnection; } DEBUG [main] - Plugin org.jetel.jdbc loaded. id - org.jetel.jdbc version - 2.3.0 provider-name - null jdbcDriver { dbDriver = org.gjt.mm.mysql.Driver; urlHint = jdbc:mysql://hostname:3306/database; name = MySQL; database = MYSQL; driverLibrary = lib/mysql/mysql-connector-java-3.0.11-stable-bin.jar; } DEBUG [main] - Plugin org.jetel.lookup loaded. id - org.jetel.lookup version - 2.3.0 provider-name - null lookup { type = simpleLookup; className = org.jetel.lookup.SimpleLookupTable; } lookup { type = dbLookup; className = org.jetel.lookup.DBLookupTable; } lookup { type = rangeLookup; className = org.jetel.lookup.RangeLookupTable; } DEBUG [main] - Plugin org.jetel.sequence loaded. id - org.jetel.sequence version - 2.3.0 provider-name - null sequence { type = SIMPLE_SEQUENCE; className = org.jetel.sequence.SimpleSequence; } sequence { type = PRIMITIVE_SEQUENCE; className = org.jetel.sequence.PrimitiveSequence; } DEBUG [main] - Plugin org.jetel.thirdparty loaded. id - org.jetel.thirdparty version - 2.3.0 provider-name - null component { type = XML_EXTRACT; className = org.jetel.component.XMLExtract; } component { type = FILTER; className = org.jetel.component.Filter; } component { type = LDAP_READER; className = com.linagora.component.LdapReader; } component { type = LDAP_WRITER; className = com.linagora.component.LdapWriter; } DEBUG [main] - Plugin org.jetel.tlfunction loaded. id - org.jetel.tlfunction version - 2.3.0 provider-name - null tlfunction { className = org.jetel.interpreter.extensions.MathLib; function = sqrt,log,log10,exp,round,pow,pi,e,random; libraryName = math; } tlfunction { className = org.jetel.interpreter.extensions.DateLib; function = today,dateadd,datediff,trunc; libraryName = date; } tlfunction { className = org.jetel.interpreter.extensions.StringLib; function = concat,uppercase,lowercase,substring,left,right,trim,length,soundex,replace,split,char_at,is_blank,is_ascii,is_number,is_integer,is_date,remove_diacritic; libraryName = string; } tlfunction { className = org.jetel.interpreter.extensions.ConvertLib; function = num2str,date2str,str2date,date2num,str2num; libraryName = convert; } Graph definition file: graph\graph1.grf INFO [main] - Checking graph configuration... DEBUG [main] - Opening input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv DEBUG [main] - Reading input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv INFO [main] - Starting WatchDog thread ... INFO [WatchDog] - Thread started. INFO [WatchDog] - Running on 2 CPU(s) max available memory for JVM 65088 KB INFO [WatchDog] - [Clover] Initializing phase: 0 DEBUG [WatchDog] - initializing edges: DEBUG [WatchDog] - all edges initialized successfully... DEBUG [WatchDog] - initializing nodes: DEBUG [WatchDog] - Opening input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv DEBUG [WatchDog] - Reading input file C:/dev/workspaces/workspace_cloverETL/sample_etl/source/kokyaku_in.csv DEBUG [WatchDog] - DELIMITED_DATA_READER0 ...OK DEBUG [WatchDog] - DELIMITED_DATA_WRITER0 ...OK DEBUG [WatchDog] - SORT0 ...OK INFO [WatchDog] - [Clover] phase: 0 initialized successfully. INFO [WatchDog] - Starting up all nodes in phase [0] DEBUG [WatchDog] - DELIMITED_DATA_READER0 ... started DEBUG [WatchDog] - DELIMITED_DATA_WRITER0 ... started DEBUG [WatchDog] - SORT0 ... started INFO [WatchDog] - Sucessfully started all nodes in phase! INFO [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 1 INFO [WatchDog] - ----------------------** Final tracking Log for phase [0] **--------------------- INFO [WatchDog] - Time: 03/10/07 23:46:49 INFO [WatchDog] - Node Status Port #Records #KB Rec/s KB/s INFO [WatchDog] - ---------------------------------------------------------------------------------- INFO [WatchDog] - DELIMITED_DATA_READER0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 3 0 0 0 INFO [WatchDog] - DELIMITED_DATA_WRITER0 FINISHED_OK INFO [WatchDog] - %cpu:.. In:0 3 0 0 0 INFO [WatchDog] - SORT0 FINISHED_OK INFO [WatchDog] - %cpu:.. In:0 3 0 0 0 INFO [WatchDog] - Out:0 3 0 0 0 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - Forcing garbage collection ... INFO [WatchDog] - -----------------------** Summary of Phases execution **--------------------- INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB) INFO [WatchDog] - 0 FINISHED_OK 1 1741 INFO [WatchDog] - ------------------------------** End of Summary **--------------------------- INFO [main] - WatchDog thread finished - total execution time: 6 (sec) INFO [main] - Graph execution finished successfully Execution of graph successful !
うーん、いろいろな情報が出力されますね。この中身はおいおいわかればいいんでしょうかね。
ここまでやってみたのですが、意外と簡単にできることがわかりました。
また、うまくやると変換のルールもJavaでかけたりもするみたいなので、かなりこったこともできそうです。
今度はもう少しこったことでもしてみるとしましょう。