This plugin will "include" other pages in this wiki. In the language of hypertext this is called transclusion. The pages will render in distinct tables. You can also load external pages in a more limited fashion with the FrameIncludePlugin. Examples:

Included from HomePage

(This site is a work in progress)

Welcome to the web site of the Web as Corpus Toolkit. This web site is based on a wiki to make maintenance easier and to bring you features such as RSS feeds, full-text search and so on.

What is the Web as Corpus Toolkit?

The Web as Corpus Toolkit is a collection of programs that can be used to create a (large) text corpus from a list of URLs. The corpus can then be used for linguistic purposes or for lexicography. While it is questionable whether you are allowed to distribute a corpus of web pages of which you are not the copyright holder, it is much easier to distribute only pointers to all those pages: a list of URLs.

The programs are easy to use and written entirely in Perl. Some extra Perl modules from CPAN are required. Linux is recommended; the toolkit has not been verified to work on other Un*x systems. A detailed manual is available. The tools are licensed under the terms of the GNU General Public License.

If you are now wondering what exactly a corpus is and what to do with it, we suggest a short excursion to Wikipedia.

If you wish to get a quick overview, try the Schematic graphic that shows the processing steps performed by the WaC Toolkit. There is also a Screenshots page.


Here is an incomplete list of the most important features of the latest release, not including upcoming features that are already present in the SVN version:

  • Parallel downloading from URLs to make full use of your internet connection
  • Runs with parallel processes wherever possible to take advantage of multi-CPU machines
  • Extensive reporting in log files: you can find out everything
  • Uses Unicode
  • A number of filter modules do everything for you. If they don't do enough, you can write your own
  • The common problems of web as corpus are addressed: wrong character set information, conversion to Unicode, boilerplate removal (navigation frames, etc.), sentence segmentation, tokenization, etc.
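
To make the processing steps in the list above more concrete, here is a minimal sketch in Python. It is not the toolkit's own code (the toolkit is written in Perl), and every name in it (`decode_page`, `split_sentences`, `tokenize`, `process_page`, `process_all`) is hypothetical. It assumes that the pages have already been downloaded as raw bytes, uses a simple charset fallback in place of real character set detection, uses naive regular expressions for sentence segmentation and tokenization, and uses a thread pool as a stand-in for the toolkit's parallel processes. Boilerplate removal is left out.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def decode_page(raw, declared="utf-8"):
    """Decode raw bytes to a Unicode string, tolerating a wrong declared charset."""
    for charset in (declared, "utf-8"):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    # latin-1 maps every byte to a character, so this last resort never fails.
    return raw.decode("latin-1")


def split_sentences(text):
    """Naive segmentation: split at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def process_page(page):
    """Turn one (raw_bytes, declared_charset) pair into a list of token lists."""
    raw, declared = page
    text = decode_page(raw, declared)
    return [tokenize(sentence) for sentence in split_sentences(text)]


def process_all(pages, workers=4):
    """Process several pages concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_page, pages))


if __name__ == "__main__":
    sample = [
        # Latin-1 bytes wrongly declared as UTF-8.
        ("Größe zählt. Really?".encode("latin-1"), "utf-8"),
        # Unknown charset name.
        (b"Hello, world!", "no-such-charset"),
    ]
    for sentences in process_all(sample):
        print(sentences)
```

A thread pool is used here only because it keeps the sketch short. For CPU-bound filtering on a multi-CPU machine, separate processes, as the toolkit is described as using, would be the more natural choice.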

More Information (Instead of a Menu)

Contacting Developers

You can find ways of contacting us on the People page.

Related Projects and Sites

The original contents of the virgin Wiki have been moved to the VirginWiki page.

Included from WabiSabi

Since wabi-sabi represents a comprehensive Japanese world view or aesthetic system, it is difficult to explain precisely in western terms. According to Leonard Koren, wabi-sabi is the most conspicuous and characteristic feature of what we think of as traditional Japanese beauty and it "occupies roughly the same position in the Japanese pantheon of aesthetic values as do the Greek ideals of beauty and perfection in the West."

"Wabi-sabi is a beauty of things imperfect, impermanent, and incomplete.

"It is the beauty of things modest and humble.

"It is the beauty of things unconventional."


The concepts of wabi-sabi correlate with the concepts of Zen Buddhism, as the first Japanese involved with wabi-sabi were tea masters, priests, and monks who practiced Zen. Zen Buddhism originated in India, traveled to China in the 6th century, and was first introduced in Japan around the 12th century. Zen emphasizes "direct, intuitive insight into transcendental truth beyond all intellectual conception." At the core of wabi-sabi is the importance of transcending ways of looking at and thinking about things/existence.

  • All things are impermanent
  • All things are imperfect
  • All things are incomplete

(also taken from WABI-SABI: FOR ARTISTS, DESIGNERS, POETS & PHILOSOPHERS, 1994, Leonard Koren):

Material characteristics of wabi-sabi:

  • suggestion of natural process
  • irregular
  • intimate
  • unpretentious
  • earthy
  • simple

For more about wabi-sabi, see


PhpWikiDocumentation WikiPlugin

Last edited on Tuesday 25 October 2005 12:48:12
