<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hi! I&#039;m clsung &#187; ruby</title>
	<atom:link href="http://blog.dragon2.net/category/hacker/programming/ruby/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.dragon2.net</link>
	<description>clsung&#039;s blog site</description>
	<lastBuildDate>Mon, 06 Feb 2012 09:29:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Learning Ruby the Happy Way &#8211; Ferret Revisited</title>
		<link>http://blog.dragon2.net/2007/05/18/461.php</link>
		<comments>http://blog.dragon2.net/2007/05/18/461.php#comments</comments>
		<pubDate>Fri, 18 May 2007 02:56:37 +0000</pubDate>
		<dc:creator>clsung</dc:creator>
				<category><![CDATA[phd_student]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://blog.dragon2.net/2007/05/18/461.php</guid>
		<description><![CDATA[This is just a follow-up to an earlier post&#8230; Yesterday morning when I got to the lab, I found messages that b6s had left me the night before about Ferret. There were two points. First, lukhnos has implemented a way for Ferret to handle Chinese; see here. Handling Chinese matters, of course, but it was not the reason I dropped Ferret in favor of PyLucene. The other message from b6s was that Ferret's problem reading native Lucene .cfs index files has been solved (or soon will be). That is interesting, because if Ferret cannot read our lab's indexes, there is little point in using it. In the end, though, what got me to try Ferret again was still the first point: Chinese. So I modified lukhnos's GENERIC_ANALYSIS_REGEX to add Big5 and tested it on a Big5 file and a UTF-8 file [...]]]></description>
			<content:encoded><![CDATA[<p>This is just a follow-up to an earlier <a href="http://blog.dragon2.net/2006/11/14/392.php" title="Learning Python the Happy Way - Starting from Ferret and PyLucene">post</a>&#8230;<br />
<span id="more-461"></span><br />
Yesterday morning when I got to the lab, I found messages that <a href="http://b6s.blogspot.com/" title="Once in a blue moon">b6s</a> had left me the night before about <a href="http://ferret.davebalmain.com/" title="Ferret is a high-performance, full-featured text search engine library written for Ruby">Ferret</a>. There were two points. First, <a href="http://lukhnos.org/blog/zh/">lukhnos</a> has implemented a way for <a href="http://ferret.davebalmain.com/" title="Ferret is a high-performance, full-featured text search engine library written for Ruby">Ferret</a> to handle Chinese; see <a href="http://lukhnos.org/blog/zh/archives/501" title="acts_as_ferret: a quick start to Rails full-text search (with CJK support)">here</a>. Handling Chinese matters, of course, but it was not the reason I dropped Ferret in favor of <a href="http://pylucene.osafoundation.org/" title="PyLucene project">PyLucene</a>. The other message from <a href="http://b6s.blogspot.com/" title="Once in a blue moon">b6s</a> was that <a href="http://ferret.davebalmain.com/" title="Ferret is a high-performance, full-featured text search engine library written for Ruby">Ferret</a>'s problem reading native <a href="http://lucene.apache.org/java/docs/">Lucene</a> .cfs (compound file) index files has been solved (or soon will be). That is interesting, because if <a href="http://ferret.davebalmain.com/" title="Ferret is a high-performance, full-featured text search engine library written for Ruby">Ferret</a> cannot read our lab's indexes, there is little point in using it.</p>
<p>In the end, though, what got me to try <a href="http://ferret.davebalmain.com/" title="Ferret is a high-performance, full-featured text search engine library written for Ruby">Ferret</a> again was still the first point: Chinese.</p>
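<p>The regex in <a href="http://lukhnos.org/blog/zh/">lukhnos</a>'s original post handles ASCII, European-language and CJK characters encoded in UTF-8. I am not sure whether it also covers Vietnamese (CJKV?), but the UTF-8 regex is right there, so adapting it yourself is no problem. What about Big5? Fewer people use Big5 these days, but it is probably still the dominant encoding in Taiwan: <a href="http://udn.com/">United Daily News (udn.com)</a>, <a href="http://news.chinatimes.com/">China Times</a> and <a href="http://www.libertytimes.com.tw/">Liberty Times</a> all still use Big5. Although most of the software we use *now* converts such corpora to UTF-8 before analyzing them, that is no reason to toss Big5 aside<sup>(<a href="http://blog.dragon2.net/2007/05/18/461.php#footnote_0_461" id="identifier_0_461" class="footnote-link footnote-identifier-link" title="Honestly, I would love to. Getting garbled text every other moment is really&amp;#8230;">1</a>)</sup>.</p>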
<p>So I modified <a href="http://lukhnos.org/blog/zh/">lukhnos</a>'s GENERIC_ANALYSIS_REGEX to add Big5. For the byte ranges, see <a href="http://examples.oreilly.com/cjkvinfo/perl/svpm99-paper.pdf">this paper</a> from <a href="http://www.oreilly.com/">O&#8217;Reilly</a>:</p>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;">GENERIC_ANALYSIS_REGEX = <span class="sy0">/</span><span class="br0">&#40;</span><span class="br0">&#91;</span>a<span class="sy0">-</span>zA<span class="sy0">-</span>Z<span class="br0">&#93;</span><span class="sy0">|</span><span class="br0">&#91;</span>\xc0<span class="sy0">-</span>\xdf<span class="br0">&#93;</span><span class="br0">&#91;</span>\x80<span class="sy0">-</span>\xbf<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">+|</span><span class="br0">&#91;</span>0<span class="sy0">-</span>9<span class="br0">&#93;</span><span class="sy0">+|</span><span class="br0">&#91;</span>\xe0<span class="sy0">-</span>\xef<span class="br0">&#93;</span><span class="br0">&#91;</span>\x80<span class="sy0">-</span>\xbf<span class="br0">&#93;</span><span class="br0">&#91;</span>\x80<span class="sy0">-</span>\xbf<span class="br0">&#93;</span><span class="sy0">|</span><span class="br0">&#91;</span>\xa1<span class="sy0">-</span>\xfe<span class="br0">&#93;</span><span class="br0">&#91;</span>\x40<span class="sy0">-</span>\x7e\xa1<span class="sy0">-</span>\xfe<span class="br0">&#93;</span><span class="sy0">/</span></div>
</div>
<p>The newly added part is:</p>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;"><span class="br0">&#91;</span>\xa1<span class="sy0">-</span>\xfe<span class="br0">&#93;</span><span class="br0">&#91;</span>\x40<span class="sy0">-</span>\x7e\xa1<span class="sy0">-</span>\xfe<span class="br0">&#93;</span></div>
</div>
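<p>One note for newer Rubies: since Ruby 1.9 every string carries an encoding, so a byte-level regex like this one needs the n flag and has to be run against a binary copy of the text. Below is a minimal sketch of that, independent of Ferret; the tokens helper is hypothetical, and String#encode stands in for Iconv. It applies the same regex with String#scan to show how it splits UTF-8 and Big5 text.</p>

```ruby
# The post's regex: lukhnos's UTF-8 ranges plus the new Big5 branch.
# The /n flag makes it a binary (ASCII-8BIT) regexp, so \xNN means a raw byte.
GENERIC_ANALYSIS_REGEX =
  /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xa1-\xfe][\x40-\x7e\xa1-\xfe]/n

# Hypothetical helper: split a string into tokens by scanning its raw bytes.
# The block form of scan is used because the regex contains a capture group;
# without a block, scan would return the group contents instead of whole matches.
def tokens(str)
  found = []
  str.b.scan(GENERIC_ANALYSIS_REGEX) { found << Regexp.last_match(0) }
  found
end

p tokens('Chinese 2007')              # ["Chinese", "2007"]
p tokens('中文').size                 # 2: two 3-byte UTF-8 tokens
p tokens('中文'.encode('Big5')).size  # 2: two 2-byte Big5 tokens
```

<p>Because the regex only sees bytes, it cannot tell the encodings apart: a Big5 character with a lead byte in 0xC0 to 0xEF and a trail byte in 0xA1 to 0xBF may be picked up by one of the UTF-8 branches instead, so its token boundaries can differ from what the Big5 branch alone would give.</p>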
<p>Of course, it needs testing. First I created two separate text files with very simple contents:</p>
<ul>
<li>big5.txt<br />
<blockquote><p>中文<br />
大五碼中文<br />
Chinese</p></blockquote>
</li>
<li>utf8.txt<br />
<blockquote><p>中文<br />
八萬碼中文<br />
Chinese</p></blockquote>
</li>
</ul>
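<p>To reproduce the test without an editor that can save Big5, a sketch like the following writes both files. It assumes Ruby 1.9.3 or later (String#encode instead of Iconv); write_samples is a hypothetical helper, and the text/ directory name is taken from the output shown further down.</p>

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: write the post's two sample files into dir.
# Only the second line differs; big5.txt is transcoded to Big5 before
# writing, utf8.txt is written as UTF-8 bytes unchanged.
def write_samples(dir)
  FileUtils.mkdir_p(dir)
  File.binwrite(File.join(dir, 'big5.txt'), "中文\n大五碼中文\nChinese\n".encode('Big5'))
  File.binwrite(File.join(dir, 'utf8.txt'), "中文\n八萬碼中文\nChinese\n")
end

dir = File.join(Dir.mktmpdir, 'text')   # pass 'text' to write next to the script instead
write_samples(dir)
big5 = File.binread(File.join(dir, 'big5.txt'))
utf8 = File.binread(File.join(dir, 'utf8.txt'))
puts big5.bytesize, utf8.bytesize
```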
<p>Of course, that tells you nothing, so let's look at the actual bytes in vi:</p>
<ul>
<li>big5.txt<br />
<blockquote><p>\xa4\xa4\xa4\xe5<br />
\xa4j\xa4\xad\xbdX\xa4\xa4\xa4\xe5<br />
Chinese</p></blockquote>
</li>
<li>utf8.txt<br />
<blockquote><p>\xe4\xb8\xad\xe6\x96\x87<br />
\xe5\x85\xab\xe8\x90\xac\xe7\xa2\xbc\xe4\xb8\xad\xe6\x96\x87<br />
Chinese</p></blockquote>
</li>
</ul>
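<p>The same byte view can be produced in Ruby. A minimal sketch, assuming Ruby 1.9 or later, where String#encode replaces Iconv (Iconv was dropped from the standard library in Ruby 2.0): vi_bytes is a hypothetical helper that prints printable ASCII bytes as-is and every other byte as \xNN. That is also why the letters j and X appear inside the Big5 dump: 大 is 0xA4 0x6A and 碼 is 0xBD 0x58.</p>

```ruby
# Hypothetical helper: render bytes the way vi shows them, printable ASCII
# (0x20..0x7e) as the character itself and every other byte as \xNN.
def vi_bytes(str)
  str.bytes.map { |b| (0x20..0x7e).cover?(b) ? b.chr : format('\x%02x', b) }.join
end

big5 = ['中文', '大五碼中文'].map { |s| vi_bytes(s.encode('Big5')) }  # transcode first
utf8 = ['中文', '八萬碼中文'].map { |s| vi_bytes(s) }                  # source is UTF-8
puts big5, utf8
```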
<p>Now it is clear. Next, here is a fragment of the test program:</p>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;"><span class="kw3">require</span> <span class="st0">'rubygems'</span><br />
<span class="kw3">require</span> <span class="st0">'ferret'</span><br />
<span class="kw3">require</span> <span class="st0">'iconv'</span><br />
<span class="kw1">include</span> <span class="re2">Ferret</span><br />
<br />
GENERIC_ANALYSIS_REGEX = <span class="sy0">/</span><span class="br0">&#40;</span><span class="br0">&#91;</span>a<span class="sy0">-</span>zA<span class="sy0">-</span>Z<span class="br0">&#93;</span><span class="sy0">|</span><span class="br0">&#91;</span>\xc0<span class="sy0">-</span>\xdf<span class="br0">&#93;</span><span class="br0">&#91;</span>\x80<span class="sy0">-</span>\xbf<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">+|</span><span class="br0">&#91;</span>0<span class="sy0">-</span>9<span class="br0">&#93;</span><span class="sy0">+|</span><span class="br0">&#91;</span>\xe0<span class="sy0">-</span>\xef<span class="br0">&#93;</span><span class="br0">&#91;</span>\x80<span class="sy0">-</span>\xbf<span class="br0">&#93;</span><span class="br0">&#91;</span>\x80<span class="sy0">-</span>\xbf<span class="br0">&#93;</span><span class="sy0">|</span><span class="br0">&#91;</span>\xa1<span class="sy0">-</span>\xfe<span class="br0">&#93;</span><span class="br0">&#91;</span>\x40<span class="sy0">-</span>\x7e\xa1<span class="sy0">-</span>\xfe<span class="br0">&#93;</span><span class="sy0">/</span><br />
GENERIC_ANALYZER = <span class="re2">Analysis::RegExpAnalyzer</span>.<span class="me1">new</span><span class="br0">&#40;</span>GENERIC_ANALYSIS_REGEX, <span class="kw2">true</span><span class="br0">&#41;</span><br />
index = <span class="re2">Index::Index</span>.<span class="me1">new</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
index2 = <span class="re2">Index::Index</span>.<span class="me1">new</span><span class="br0">&#40;</span><span class="re3">:analyzer</span> <span class="sy0">=&gt;</span> GENERIC_ANALYZER<span class="br0">&#41;</span><br />
<br />
<span class="co1"># &#8230;</span><br />
<br />
<span class="me1">chinese_u</span> = <span class="st0">&quot;中文&quot;</span><br />
conv = <span class="kw4">Iconv</span>.<span class="me1">new</span><span class="br0">&#40;</span><span class="st0">'big5'</span>, <span class="st0">'utf-8'</span><span class="br0">&#41;</span> <span class="co1"># to Big5, from UTF-8</span><br />
chinese_b = conv.<span class="me1">iconv</span><span class="br0">&#40;</span>chinese_u<span class="br0">&#41;</span><br />
<br />
<span class="kw3">puts</span> <span class="st0">&quot;Search 'Chinese'...&quot;</span><br />
index.<span class="me1">search_each</span><span class="br0">&#40;</span><span class="st0">&quot;Chinese&quot;</span><span class="br0">&#41;</span> <span class="kw1">do</span> <span class="sy0">|</span>doc, score<span class="sy0">|</span><br />
&nbsp; <span class="kw3">puts</span> index<span class="br0">&#91;</span>doc<span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st0">'file'</span><span class="br0">&#93;</span><br />
<span class="kw1">end</span><br />
<span class="kw3">puts</span> <span class="st0">&quot;Search utf8 word of 'Chinese'...&quot;</span><br />
index.<span class="me1">search_each</span><span class="br0">&#40;</span>chinese_u<span class="br0">&#41;</span> <span class="kw1">do</span> <span class="sy0">|</span>doc, score<span class="sy0">|</span><br />
&nbsp; <span class="kw3">puts</span> index<span class="br0">&#91;</span>doc<span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st0">'file'</span><span class="br0">&#93;</span><br />
<span class="kw1">end</span><br />
<span class="kw3">puts</span> <span class="st0">&quot;Search big5 word of 'Chinese'...&quot;</span><br />
index.<span class="me1">search_each</span><span class="br0">&#40;</span>chinese_b<span class="br0">&#41;</span> <span class="kw1">do</span> <span class="sy0">|</span>doc, score<span class="sy0">|</span><br />
&nbsp; <span class="kw3">puts</span> index<span class="br0">&#91;</span>doc<span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st0">'file'</span><span class="br0">&#93;</span><br />
<span class="kw1">end</span></div>
</div>
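<p>The search code above does not include index2, because that part is identical except for the index being queried, so I left it out. As for whether the code is pretty: I am a beginner and this is only my second ruby program, so don't expect too much. I can barely write a loop statement.</p>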
<p>Output:</p>
<blockquote><p>Search &#8216;Chinese&#8217;&#8230;<br />
./text/big5.txt<br />
./text/utf8.txt<br />
Search utf8 word of &#8216;Chinese&#8217;&#8230;<br />
Search big5 word of &#8216;Chinese&#8217;&#8230;<br />
Indexer with GENERIC_ANALYZER<br />
Search &#8216;Chinese&#8217;&#8230;<br />
./text/big5.txt<br />
./text/utf8.txt<br />
Search utf8 word of &#8216;Chinese&#8217;&#8230;<br />
./text/utf8.txt<br />
Search big5 word of &#8216;Chinese&#8217;&#8230;<br />
./text/big5.txt</p></blockquote>
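<p>As you can see, index, which does not use GENERIC_ANALYZER, cannot find the Chinese strings at all, while index2, which does, behaves as expected: the UTF-8 query finds only utf8.txt and the Big5 query finds only big5.txt.</p>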
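<p>Why the analyzer makes this difference can be illustrated without Ferret. The sketch below is a toy, not Ferret's implementation: ToyIndex is a hypothetical in-memory inverted index that maps each token to the set of files containing it and answers a query with the files that contain all of the query's tokens. ASCII_ONLY is an assumed stand-in for an analyzer that only understands ASCII words, not Ferret's actual default analyzer. With it, a Chinese query yields no tokens and therefore no hits; with the post's regex, UTF-8 and Big5 queries become byte tokens that only occur in the file with the matching encoding.</p>

```ruby
require 'set'

# Assumed stand-in for an analyzer that only knows ASCII words and numbers.
ASCII_ONLY = /[a-zA-Z]+|[0-9]+/
# The post's GENERIC_ANALYSIS_REGEX; /n makes \xNN match raw bytes.
GENERIC = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xa1-\xfe][\x40-\x7e\xa1-\xfe]/n

# Hypothetical toy inverted index: token => set of file names.
class ToyIndex
  def initialize(regex)
    @regex = regex
    @postings = Hash.new { |hash, token| hash[token] = Set.new }
  end

  # Split the raw bytes of text into tokens (whole matches of the regex).
  def analyze(text)
    found = []
    text.b.scan(@regex) { found << Regexp.last_match(0) }
    found
  end

  def add(name, text)
    analyze(text).each { |token| @postings[token] << name }
  end

  # Files containing every token of the query; a query with no tokens finds nothing.
  def search(query)
    tokens = analyze(query)
    return [] if tokens.empty?
    tokens.map { |token| @postings.fetch(token, Set.new) }.reduce(:&).to_a.sort
  end
end

files = {
  'big5.txt' => "中文\n大五碼中文\nChinese\n".encode('Big5'),
  'utf8.txt' => "中文\n八萬碼中文\nChinese\n"
}
index  = ToyIndex.new(ASCII_ONLY)
index2 = ToyIndex.new(GENERIC)
files.each do |name, text|
  index.add(name, text)
  index2.add(name, text)
end

chinese_u = '中文'
chinese_b = chinese_u.encode('Big5')
p index.search('Chinese')   # ["big5.txt", "utf8.txt"]
p index.search(chinese_u)   # []
p index2.search(chinese_u)  # ["utf8.txt"]
p index2.search(chinese_b)  # ["big5.txt"]
```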
<p>The conclusion is that I can play with ruby again. I won't use it on the lab's existing corpora for now, but it is worth considering for my own experiments. As for why not <a href="http://pylucene.osafoundation.org/" title="PyLucene project">PyLucene</a>: gcj has problems on FreeBSD amd64 <img src='http://blog.dragon2.net/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
<ol class="footnotes"><li id="footnote_0_461" class="footnote">Honestly, I would love to. Getting garbled text every other moment is really&#8230;</li></ol>]]></content:encoded>
			<wfw:commentRss>http://blog.dragon2.net/2007/05/18/461.php/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Perl Is a Professional Programming Language</title>
		<link>http://blog.dragon2.net/2007/01/17/425.php</link>
		<comments>http://blog.dragon2.net/2007/01/17/425.php#comments</comments>
		<pubDate>Wed, 17 Jan 2007 13:59:52 +0000</pubDate>
		<dc:creator>clsung</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://blog.dragon2.net/2007/01/17/425.php</guid>
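		<description><![CDATA[Yes, you read that right. As the title says, "Perl is a professional programming language". The title alone shows that learning python and ruby lately has not made me toss Perl aside (1). Why is "Perl a professional programming language"? Simple. First: 阿西摩 wrote this, which contains a classic line: "Java is an industrial programming language". Second: thegiive published this post, announcing "From now on Ruby is officially a commercial programming language". And of course, on the python side there are master-class lectures and reflections here too. So, as a Perl user (2), I have to stand up and say something (I use php far too rarely :~). I have also decided to announce, starting today: "Perl is a professional programming language". Thank you all. (1) See how much I care: the "P" in Perl is even capitalized. (2) Really? But there is no Perl category]]></description>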
			<content:encoded><![CDATA[<p>Yes, you read that right. As the title says, "<strong>Perl is a <font color="red">professional</font> programming language</strong>". The title alone shows that learning <a href="http://blog.dragon2.net/2006/11/14/392.php" title="Learning Python the Happy Way - Starting from Ferret and PyLucene">python</a> and <a href="http://blog.dragon2.net/2007/01/12/421.php" title="[ruby] My first ruby program">ruby</a> lately has not made me toss Perl aside<sup>(<a href="http://blog.dragon2.net/2007/01/17/425.php#footnote_0_425" id="identifier_0_425" class="footnote-link footnote-identifier-link" title="See how much I care: the &quot;P&quot; in Perl is even capitalized">1</a>)</sup>.<br />
<span id="more-425"></span><br />
Why is "<strong>Perl a <font color="red">professional</font> programming language</strong>"? Simple. First: <a href="http://www.one18.com/">阿西摩</a> wrote <a href="http://www.one18.com/?p=109" title="JavaScript ≠ Java">this</a>, which contains a classic line: "Java is an industrial programming language". Second: <a href="http://lightyror.thegiive.net/">thegiive</a> published <a href="http://lightyror.thegiive.net/2007/01/blog-post_16.html" title="Commercial language?">this post</a>, announcing "From now on Ruby is officially a commercial programming language". And of course, on the python side there are master-class <a href="http://heaven.branda.to/~thinker/GinGin_CGI.py/show_id_doc/203" title="scripting language?">lectures</a> and <a href="http://blog.seety.org/everydaywork/archive/635/" title="Sometimes changing a term makes it feel quite different">reflections</a> too.</p>
<p>So, as a Perl user<sup>(<a href="http://blog.dragon2.net/2007/01/17/425.php#footnote_1_425" id="identifier_1_425" class="footnote-link footnote-identifier-link" title="Really? But there is no Perl category">2</a>)</sup>, I have to stand up and say something (I use php far too rarely :~).</p>
<p>I have also decided to announce, starting today: "<strong>Perl is a <font color="red">professional</font> programming language</strong>".</p>
<p>Thank you all.</p>
<ol class="footnotes"><li id="footnote_0_425" class="footnote">See how much I care: the "P" in Perl is even capitalized</li><li id="footnote_1_425" class="footnote">Really? But there is no Perl category</li></ol>]]></content:encoded>
			<wfw:commentRss>http://blog.dragon2.net/2007/01/17/425.php/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>[ruby] My first ruby program</title>
		<link>http://blog.dragon2.net/2007/01/12/421.php</link>
		<comments>http://blog.dragon2.net/2007/01/12/421.php#comments</comments>
		<pubDate>Fri, 12 Jan 2007 08:35:08 +0000</pubDate>
		<dc:creator>clsung</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://blog.dragon2.net/2007/01/12/421.php</guid>
		<description><![CDATA[When writing a paper is driving me crazy, I feel like doing something odd. This morning I came across a post by Tsung with a Y! dictionary script written in sed/perl, and my fingers started itching. Write it in python? That felt like no challenge, so I picked ruby and translated 彥明's script more or less line by line. The complete program is here. Since term-ansicolor was not in the ports tree, I submitted a port for it while I was at it. [...]]]></description>
			<content:encoded><![CDATA[<p>When writing a paper is driving me crazy, I feel like doing something odd.<br />
<span id="more-421"></span><br />
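This morning I came across a post by <a href="http://plog.longwin.com.tw/">Tsung</a> with a <a href="http://tw.dictionary.yahoo.com/">Y! dictionary</a> script written in <a href="http://www.gnu.org/software/sed/">sed</a>/<a href="http://www.perl.org/">perl</a>, and my fingers started itching. Write it in <a href="http://www.python.org/">python</a>? That felt like no challenge: I am already past the very first steps with python, so it would not bring the fun of learning. So I picked <a href="http://www.ruby-lang.org/">ruby</a>. It is basically a line-by-line translation of 彥明's script, and it looks roughly like this:</p>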
<p>The complete program is <a href="http://people.freebsd.org/~clsung/scripts/ydict.rb">here</a>. Oh, and since <a href="http://term-ansicolor.rubyforge.org/">term-ansicolor</a> was not in the <a href="http://freshports.org/">ports</a> tree, I <a href="http://www.freshports.org/devel/ruby-term-ansicolor/">submitted a port</a> for it while I was at it.</p>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;">require 'rubygems'<br />
require 'net/http'<br />
require 'cgi'<br />
require 'iconv'&nbsp; # Ruby 1.8/1.9; removed from the standard library in Ruby 2.0<br />
require 'term/ansicolor'<br />
<br />
include Term::ANSIColor&nbsp; # bold, yellow, red, cyan and reset return ANSI escape codes<br />
<br />
# Look up a word on the Yahoo! Taiwan dictionary and print a coloured result<br />
def ydict(word)<br />
&nbsp; &nbsp; raise ArgumentError, 'Must specify word' if word.nil?<br />
&nbsp; &nbsp; html = Net::HTTP.start('tw.dictionary.yahoo.com', 80) { |http|<br />
&nbsp; &nbsp; &nbsp; &nbsp; http.get('/search?p=' + CGI.escape(word)).body<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; # the page is UTF-8; convert it for Big5 terminals<br />
&nbsp; &nbsp; if ENV['LC_CTYPE'] =~ /Big5/<br />
&nbsp; &nbsp; &nbsp; &nbsp; html = Iconv.new('big5', 'utf-8').iconv(html)<br />
&nbsp; &nbsp; end<br />
&nbsp; &nbsp; # drop line breaks so every pattern below sees a single line<br />
&nbsp; &nbsp; html = html.gsub(/[\r\n]/, "")<br />
&nbsp; &nbsp; # a warning means the word was not found: follow the suggestion if there is one<br />
&nbsp; &nbsp; if html =~ /&lt;em class="warning"&gt;/<br />
&nbsp; &nbsp; &nbsp; &nbsp; if html =~ /&lt;em class="warning"&gt;.*?&gt;(\S+)&lt;.*?&lt;\/em&gt;/<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; suggestion = $1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; print bold, yellow, "\nERROR: ", word, " -&gt; ", suggestion, "\n", reset<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ydict(suggestion)<br />
&nbsp; &nbsp; &nbsp; &nbsp; else<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; print bold, yellow, "ERROR: ", word, "\n", reset<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return<br />
&nbsp; &nbsp; &nbsp; &nbsp; end<br />
&nbsp; &nbsp; end<br />
&nbsp; &nbsp; print bold, yellow, "\n", word, "\n", reset<br />
&nbsp; &nbsp; i = 0&nbsp; &nbsp; &nbsp; # numbering of the 'explain' lines<br />
&nbsp; &nbsp; color = ""<br />
&nbsp; &nbsp; # each entry is &lt;div class=pTYPE&gt;...&lt;/div&gt;; $' is the text after the match<br />
&nbsp; &nbsp; while html =~ /&lt;div class=p(\w+)&gt;(.*?)&lt;\/div&gt;/i<br />
&nbsp; &nbsp; &nbsp; &nbsp; type, line, html = $1, $2, $'<br />
&nbsp; &nbsp; &nbsp; &nbsp; close_seq = reset<br />
&nbsp; &nbsp; &nbsp; &nbsp; line = line.strip<br />
&nbsp; &nbsp; &nbsp; &nbsp; if type == 'cixin'&nbsp; &nbsp; &nbsp; # new group: bold red, restart the numbering<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i = 0<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; color = bold + red<br />
&nbsp; &nbsp; &nbsp; &nbsp; elsif type == 'chi' or type == 'eng'&nbsp; # indented, in cyan<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; color = cyan<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; close_seq = reset + color<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; line = "\t" + line<br />
&nbsp; &nbsp; &nbsp; &nbsp; elsif type == 'explain'&nbsp; &nbsp; # numbered<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i = i + 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; line = i.to_s + " " + line<br />
&nbsp; &nbsp; &nbsp; &nbsp; else<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; next&nbsp; # skip any other kind of entry<br />
&nbsp; &nbsp; &nbsp; &nbsp; end<br />
&nbsp; &nbsp; &nbsp; &nbsp; # &lt;b&gt; switches bold on; &lt;/b&gt; and any other tag switch it off (re-applying cyan on chi/eng lines)<br />
&nbsp; &nbsp; &nbsp; &nbsp; line = line.gsub(/&lt;b&gt;/, bold).gsub(/&lt;\/b&gt;/, close_seq).gsub(/&lt;[^&gt;]+&gt;/, close_seq)<br />
&nbsp; &nbsp; &nbsp; &nbsp; print color, line, "\n", reset<br />
&nbsp; &nbsp; end<br />
&nbsp; &nbsp; puts<br />
end<br />
<br />
ydict(ARGV[0])</div>
</div>
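<p>It looks like it could be simplified further (for example, I don't know whether Ruby has counterparts to "+=" or ".="), but a first <a href="http://www.ruby-lang.org/">ruby</a> program doesn't need to be polished. <img src='http://blog.dragon2.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>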
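<p>For the record, Ruby does have "+=": a += b is shorthand for a = a + b. There is no ".=" operator, but String#&lt;&lt; appends to a string in place, which is the closest equivalent to Perl's ".=" (s += t also works, but builds a new string). As one possible simplification, here is a sketch of the parsing loop rewritten with String#scan instead of the while/$' loop, with the colours left out. The HTML fragment is made up, its markup inferred from the script's regexes rather than checked against Yahoo!'s pages, and SAMPLE, parse_entries and format_entries are hypothetical names.</p>

```ruby
# Hypothetical page fragment; the class names are the ones ydict looks for.
SAMPLE = '<div class=pcixin> n. </div><div class=pexplain>first <b>meaning</b></div>' \
         '<div class=peng>An example.</div><div class=pexplain>second meaning</div>'

# [type, text] pairs in page order, trimmed, with inner tags removed.
def parse_entries(html)
  html.scan(/<div class=p(\w+)>(.*?)<\/div>/i).map do |type, text|
    [type, text.strip.gsub(/<[^>]+>/, '')]
  end
end

# Plain-text version of ydict's formatting: 'cixin' restarts the numbering,
# 'chi'/'eng' lines are indented, 'explain' lines are numbered, others skipped.
def format_entries(entries)
  n = 0
  out = String.new('')
  entries.each do |type, text|
    case type
    when 'cixin'
      n = 0
      out << text << "\n"
    when 'chi', 'eng'
      out << "\t" << text << "\n"
    when 'explain'
      n += 1                                  # same as n = n + 1
      out << n.to_s << ' ' << text << "\n"    # String#<< appends in place, like Perl's .=
    end
  end
  out
end

puts format_entries(parse_entries(SAMPLE))
```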
]]></content:encoded>
			<wfw:commentRss>http://blog.dragon2.net/2007/01/12/421.php/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

