Discussions on zenkat

  1. wikipedia mining

    1. Hi Brian, I loved your talk on mining Wikipedia. I'd like to mine the soft data. Is there a smart way of parsing the unstructured stuff? I mean small, ad hoc things, e.g. for pages in [[category: racecar drivers]], find "sponsored by _____"?

      you're cool.

      I applied for an internship but didn't hear back, so I suppose I, and the idea, are out on a limb. Cheers

      1. If you're interested in doing deep parsing of Wikipedia, I'd suggest looking at WEX. It's a parsed, XML-formatted version of Wikipedia that we use for internal processing here at Metaweb. It's freely available here:

        http://download.freebase.com/wex/

        It works best with PostgreSQL, but it can also be processed with Hadoop or by local clients. Let us know if you have any questions!

        Brian
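
For the ad hoc "sponsored by _____" case in the question, a plain regular expression over the article text may be enough to get started. Here is a minimal sketch in Python. It assumes each page has already been loaded into a dict with `title`, `categories`, and `text` keys; that layout, the `find_sponsors` helper, and the sample pages are all made up for illustration and are not the WEX schema.

```python
import re

# Hypothetical pattern: capture whatever follows "sponsored by" up to the
# next period, comma, semicolon, parenthesis, or line break.
SPONSOR_RE = re.compile(r"sponsored by\s+([^.,;()\n]+)", re.IGNORECASE)


def find_sponsors(pages, category):
    """Return {title: [sponsor, ...]} for pages tagged with `category`."""
    wanted = category.lower()
    results = {}
    for page in pages:
        # Compare category names case-insensitively.
        cats = {c.lower() for c in page["categories"]}
        if wanted not in cats:
            continue
        hits = [m.group(1).strip() for m in SPONSOR_RE.finditer(page["text"])]
        if hits:
            results[page["title"]] = hits
    return results


# Made-up sample data, standing in for pages pulled from a dump.
pages = [
    {
        "title": "Driver A",
        "categories": ["Racecar drivers"],
        "text": "Driver A drove a car sponsored by Acme Oil. He later retired.",
    },
    {
        "title": "Painter B",
        "categories": ["Painters"],
        "text": "An exhibition sponsored by Some Bank.",
    },
]

print(find_sponsors(pages, "racecar drivers"))
```

A pattern like this will miss rephrasings ("backed by", "with sponsorship from") and will pick up false positives, so treat it as a first pass to see what the data looks like rather than a real parser.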


