C# Wrapper Library for the Google AJAX Search API

November 26th, 2009 Phanix

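I used to need Google Search in my programs only occasionally, so I would just send the search term straight to Google search and parse the returned page, a crude brute-force approach.

My usage has grown recently, though, so I went looking for the Google Search API, only to find that Google no longer accepts applications for the regular Search API and recommends the AJAX API instead.

After reading the documentation, it is indeed quite convenient for web pages, but not so handy for Windows / console programs… "Surely I won't have to write a library myself to wrap this AJAX Search API?"

Just as that thought popped into my head, another voice spoke up: "No, someone must have thought of this already and kindly built it."… Sure enough, a quick search turned up http://gapidotnet.codeplex.com/, a wrapper API… Awesome… XD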

The past few days since getting back to TW

July 5th, 2009 Phanix

I just woke up from a midday nap, and I think I should write down some day-to-day things.

Read the rest of this entry »

Using XPath to select nodes with a namespace in C#

May 9th, 2009 Phanix

When processing XML documents, selecting nodes with XPath is a common approach, but what do you do when the node has a namespace?
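The rest of the original entry is not shown here, so the following is only a minimal sketch of one common way to do this in .NET: register the namespace URI under a prefix with XmlNamespaceManager and use that prefix in the XPath expression. The Atom namespace, the element names, and the class and method names are assumptions chosen for illustration.

```csharp
using System.Xml;

public static class NamespaceXPathExample
{
    // Returns the text of the first /feed/entry/title element in the Atom
    // namespace, or null if no such node exists. (Hypothetical helper.)
    public static string FirstAtomTitle(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefix "a" is arbitrary; only the namespace URI has to match
        // the one declared in the document, even if the document uses a
        // default namespace with no prefix at all.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("a", "http://www.w3.org/2005/Atom");

        XmlNode node = doc.SelectSingleNode("/a:feed/a:entry/a:title", ns);
        return node == null ? null : node.InnerText;
    }
}
```

Without the namespace manager, an unprefixed XPath such as /feed/entry/title would match nothing in a document that declares a default namespace, which is usually the source of the confusion.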

Read the rest of this entry »

Abandoned World

April 10th, 2009 Phanix

Early this morning, a fiber-optic cable between Santa Cruz and the San Francisco area was severed.

Read the rest of this entry »

[murmur] A 囧rz meeting

January 30th, 2009 Phanix

“I am not sure how many sessions I should check to see whether each of them has a single information need or multiple ones.”

“So, how many sessions have you checked?”

“About 2000.”

“Wow… that’s enough, much more than necessary.”

“囧rz…”

“Another question is, how many sessions should I use for the experiment? I am not sure how many is enough.”

“How many do you have now?”

“About 50000. Including the data needed for checking information need.”

“Oh, that’s quite a lot. Actually, I think 50 or 100 are enough.”

“囧rz… 囧rz… 囧rz… 囧rz…”

[Note] Importing Data in SQL Server 2005

January 9th, 2009 Phanix

Remember to check the item “Integration Services” (SSIS) during installation, or the import will fail with an error message like “product level is insufficient for components” when importing data from txt, Excel, and similar files into a SQL Server 2005 database.

Related: product level is insufficient for components

[Memo] A roundup of things I have used in recent coding

December 4th, 2008 Phanix

So I don't have trouble finding them later. All of it is C# code.

Writing output to a TextBox from another thread

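// Delegate type used to marshal the call onto the UI thread.
delegate void SetTextCallback(TextBox tb, string text);

private void SetText(TextBox tb, string text)
{
    if (tb.InvokeRequired)
    {
        // Called from a non-UI thread: re-invoke this method on the UI thread.
        SetTextCallback d = new SetTextCallback(SetText);
        this.Invoke(d, new object[] { tb, text });
    }
    else
    {
        // Already on the UI thread: safe to touch the control directly.
        tb.Text = text;
    }
}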

Extract the text inside the <body> tag

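// requires: using System.IO; using System.Text; using System.Web;
string strpage = ""; // put the HTML source here
string strtext = ""; // stays empty until the first text fragment is written
string strtmp;       // text found between two consecutive tags
int ibodystart, ibodyend, j, k;
StreamWriter sw;

// only fetch text between <body> and </body>
ibodystart = strpage.ToLower().IndexOf("<body");
ibodyend = strpage.ToLower().IndexOf("</body>");

if (ibodystart < 0) return;
if (ibodyend < 0) ibodyend = strpage.Length;

// j is the position of the '>' that closes the current tag;
// k (set inside the loop) is the position of the next '<'
j = strpage.IndexOf(">", ibodystart);

sw = new StreamWriter([FILENAME], false, Encoding.UTF8);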

#region filter out html tags, css and scripts, and then just keep plaintext
while (j > 0 && j < ibodyend)
{
    // k is the position of the next '<' after j; the text between j and k is plain text
    k = strpage.IndexOf("<", j);
    
    
    // read text between tags, and store in strtmp
    if (k < 0)
    {
        strtmp = strpage.Substring(j + 1);
    }
    else
    {
        strtmp = strpage.Substring(j + 1, k - j - 1);
    }
    
    
    strtmp = HttpUtility.HtmlDecode(strtmp).Trim();
    
    // write strtmp to the file; every fragment after the first gets a leading space
    if (strtmp != "")
    {
        if (strtext == "")
        {
            sw.WriteLine(strtmp);
            strtext = strtmp;
        }
        else
        {
            sw.WriteLine(" " + strtmp);
        }
    }
    
    // find out next j
    if (k < 0)
    {
        j = -1;
    }
    else
    {
        // stop if too few characters remain for the Substring checks below;
        // otherwise skip over comments, scripts and styles
        if (strpage.Length - k <= 7)
        {
            j = -1;
        }
        else if (strpage.Substring(k, 4) == "<!--")
        {
            j = strpage.IndexOf("-->", k);
            if (j >= 0)
            {
                j = strpage.IndexOf(">", j);
            }
        }
        else if (strpage.ToLower().Substring(k, 7) == "<script")
        {
            j = strpage.ToLower().IndexOf("</script>", k);
            if (j >= 0)
            {
                j = strpage.IndexOf(">", j);
            }
        }
        else if (strpage.ToLower().Substring(k, 6) == "<style")
        {
            j = strpage.ToLower().IndexOf("</style>", k);
            if (j >= 0)
            {
                j = strpage.IndexOf(">", j);
            }
        }
        else
        {
            j = strpage.IndexOf(">", k);
        }
    }
}
#endregion

sw.Close();

Run another .exe with command-line parameters (without showing a window). This example uses WordNet.

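// requires: using System.Diagnostics;
Process p = new Process();
string strwn1, strwn2;

#region Call wn.exe for wordnet hypernym
p.StartInfo.UseShellExecute = false;       // required in order to redirect output
p.StartInfo.RedirectStandardOutput = true; // capture stdout instead of printing it
p.StartInfo.CreateNoWindow = true;         // do not show a console window

// word 1
p.StartInfo.FileName = @"C:\Program Files\WordNet\2.1\bin\wn.exe";
p.StartInfo.Arguments = w1 + " -hypen"; // w1 is a word

p.Start();

// read the output before waiting, so a full output pipe cannot block the child
strwn1 = p.StandardOutput.ReadToEnd();

p.WaitForExit();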

// word 2
p.StartInfo.FileName = @"C:\Program Files\WordNet\2.1\bin\wn.exe";
p.StartInfo.Arguments = w2 + " -hypen"; // w2 is the other word

p.Start();

strwn2 = p.StandardOutput.ReadToEnd();

p.WaitForExit();

#endregion
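Since the same start-up settings are repeated for each word, the snippet above could be folded into a small helper. This is only a sketch under that assumption; the class name HiddenProcessExample and the method names CreateHiddenStartInfo and RunAndCapture are hypothetical and not part of the original code.

```csharp
using System.Diagnostics;

public static class HiddenProcessExample
{
    // Builds start-up settings for a child process that shows no window
    // and whose standard output can be read by the caller.
    public static ProcessStartInfo CreateHiddenStartInfo(string fileName, string arguments)
    {
        return new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false,       // required for output redirection
            RedirectStandardOutput = true, // capture stdout
            CreateNoWindow = true          // no console window
        };
    }

    // Runs the program, waits for it to finish, and returns everything
    // it wrote to standard output.
    public static string RunAndCapture(string fileName, string arguments)
    {
        using (Process p = Process.Start(CreateHiddenStartInfo(fileName, arguments)))
        {
            // Read first, then wait, so a full pipe cannot block the child.
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            return output;
        }
    }
}
```

With such a helper, the two calls would reduce to something like strwn1 = HiddenProcessExample.RunAndCapture(@"C:\Program Files\WordNet\2.1\bin\wn.exe", w1 + " -hypen"); and the same for w2.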

William Liu’s talk at NCTU

February 21st, 2008 Phanix

Someone connected to my past… it really is a small world…

Read the rest of this entry »

Correlation Coefficient

January 21st, 2008 Phanix

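There are many kinds of correlation coefficients; below are some reference links and formulas. A correlation coefficient can be used to judge which feature is strongly related to the outcome.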

http://www.stat.nctu.edu.tw/subhtml/source/course/course92_2/SCM/SCM2.2.ppt

http://webclass.ncu.edu.tw/~tang0/Chap8/sas8.htm

http://163.29.37.193/webdesign/001/006/week2_introduction.htm

http://el.mdu.edu.tw/datacos//09410121035A/%E7%9B%B8%E9%97%9C%E4%BF%82%E6%95%B8%E7%A8%AE%E9%A1%9E.doc

http://cclearn.npue.edu.tw/tuition/ccchen-web/%E6%95%99%E8%82%B2%E7%B5%B1%E8%A8%88%E5%AD%B8/7.pdf
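As one concrete instance of the coefficients mentioned above, here is a minimal sketch of the Pearson correlation coefficient, r = Σ(x−x̄)(y−ȳ) / sqrt(Σ(x−x̄)² · Σ(y−ȳ)²), between one feature column and the outcome. The post does not say which coefficient was actually used, so picking Pearson is an assumption, and the class and method names are hypothetical.

```csharp
using System;

public static class CorrelationExample
{
    // Pearson correlation between a feature x and an outcome y.
    // Returns a value in [-1, 1]; returns NaN if either input is constant,
    // because the denominator is then zero.
    public static double Pearson(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("x and y must have the same length (at least 2).");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.Length; i++)
        {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.Length;
        meanY /= y.Length;

        // Sums of products of deviations from the means.
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }

        return sxy / Math.Sqrt(sxx * syy);
    }
}
```

To compare features, one would compute this value for each feature against the outcome and look at the ones whose absolute value is closest to 1.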

A Brand-New December

December 3rd, 2007 Phanix

So many things are new, whether I like it or not? T________T

Read the rest of this entry »