Technology
To ensure the authenticity and practicality of the "Electronic Version of Siku Quanshu (Wenyuange Edition)", and to guarantee that all of its content and functions are genuinely usable, we tested and upgraded the product continuously throughout development and production.
The Development Project, Phase 1:
1. Collecting Information
After confirming the product's positioning and target users, we began collecting all information relevant to production. We met with scholars, experts and experienced librarians to gather professional opinions and suggestions. In December 1997 we developed a first prototype with basic functions for trial use, which gave us a much more concrete idea of which designs and functions would really meet users' needs.
2. Programming
The "Electronic Version of Siku Quanshu (Wenyuange Edition)" uses over 30,000 Chinese characters, and neither the GBK nor the Big5 coding standard can fully handle such an enormous database. Our programming team therefore decided to build the scheme on Unicode, which enabled the program to run smoothly on Chinese (Traditional/Simplified), English, Japanese and Korean versions of Windows(R)98 or Windows(R)NT4.0 (or above). The alpha version was released in February 1998, and its multi-platform compatibility was a great convenience to both local and overseas users. In May 1998, an improved beta version was released with enhanced functions.
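The coverage gap between the legacy coding standards and Unicode can be illustrated with a short Python sketch. It relies on Python's built-in `big5`, `gbk` and `utf-16-le` codecs; the helper name `can_encode` and the two sample characters are assumptions chosen for illustration and are not taken from the product.

```python
def can_encode(ch: str, codec: str) -> bool:
    """Return True if the character has a code in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative samples only: a common character (U+66F8) and a rare one
# from the supplementary CJK range (U+20000).
samples = {"common": "\u66f8", "rare": "\U00020000"}

for label, ch in samples.items():
    coverage = {
        codec: can_encode(ch, codec)
        for codec in ("big5", "gbk", "utf-16-le")
    }
    print(label, f"U+{ord(ch):04X}", coverage)
```

The common character is encodable in all three codecs, while the rare one is rejected by both legacy codecs and accepted only by the Unicode encoding.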
3. Testing
To ensure the quality and accuracy of the product, we invited a group of scholars and researchers from Beijing, Shanghai and Hong Kong to take part in the different testing stages. The opinions collected at each stage were used to fine-tune, modify and enhance the product, making it more practical, user-friendly and reliable.
4. Innovating Technology
A path for digitalising classics & An expansion of Chinese Character Codes
Built on Unicode / ISO 10646, the Electronic Version of Siku Quanshu (Wenyuange Edition) has generated nearly 32,000 Chinese character codes, which help lay a strong foundation for the future digitalisation of Chinese classics.
Cross-platform
By adopting Microsoft's Single Binary (a cross-platform technology), the Electronic Version of Siku Quanshu can be viewed in a multilingual environment, be it Chinese (Traditional/Simplified), English, Japanese or Korean. Likewise, it can be operated on various platforms such as Windows(R)98, Windows(R)2000, Windows(R)NT4.0 and Windows(R)XP.
Fuzzy logic
With a high-speed scanner, image processing tools and Optical Character Recognition (OCR) technology, we were able to input and proofread the approximately 800 million Chinese characters contained in the original Siku Quanshu.
Large database for research purpose
To facilitate research, we have built a large, multi-part electronic database that can be accessed and searched in a variety of ways. The data is catalogued as follows:
- 1.8 million subtitles
- More than 3,400 book titles
- Information on nearly 3,000 authors
- The full text of approximately 800 million Chinese characters
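How one catalogue can be searched along several of these dimensions may be sketched as follows. This is a minimal in-memory illustration, not the product's search engine: the `Work` record, the `search` helper and the two placeholder entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Work:
    """One catalogue entry (hypothetical layout)."""
    title: str
    author: str
    subtitles: List[str]
    full_text: str


def search(works: List[Work], field: str, keyword: str) -> List[Work]:
    """Return the works whose chosen field contains the keyword."""
    hits = []
    for work in works:
        value = getattr(work, field)
        # Subtitles are a list; join them for a simple substring test.
        haystack = "\n".join(value) if isinstance(value, list) else value
        if keyword in haystack:
            hits.append(work)
    return hits


# Placeholder data, invented purely for demonstration.
catalogue = [
    Work("Sample Title A", "Author X", ["Chapter 1", "Chapter 2"], "text about rivers"),
    Work("Sample Title B", "Author Y", ["Preface"], "text about mountains"),
]

for field, keyword in [("author", "Author Y"), ("subtitles", "Chapter 2")]:
    print(field, keyword, [w.title for w in search(catalogue, field, keyword)])
```

A database of the size described here would need real indexes rather than a linear scan; the sketch only shows the idea of querying one data set by subtitle, title, author or full text.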
5. Partners Involved
Digital Vision Multimedia Limited
- Responsible for program design, editing and proofreading, project planning and management, technical support and testing, customer service.
Unihan Digital Development Company Limited
- Responsible for program design and programming, technical research and development, title extraction, editing and proofreading, data generation, quality control, technical support and testing.
Department of Computer Science & Technology of Tsinghua University
- Responsible for developing the OCR engine.
Founder Electronics Company Limited
- Responsible for developing the special-character database.
Microsoft (Beijing) Research & Development Centre
- Responsible for providing technical support for platform development.
The Development Project, Phase 2:
1. Complete Digitalisation
Several versions of the Electronic Version of Siku Quanshu were developed between 1999 and 2003. However, their content was not completely digitised because of the size limit of the Chinese character set, and the undigitised content could only be displayed as images of the original script. Moreover, tables and chronologies were not digitised because of technological restrictions. Technological advances in recent years have removed these obstacles. To meet the needs of the market and of users, we decided to complete the digitalisation of the content in 2005.
2. Adopting Character Set of International Unicode Standard
The new character set adopts about 72,000 characters from the latest international standard, ISO/IEC 10646:2003 / Unicode 4.1, together with about 10,000 specially made private-use characters, for a total of more than 82,000 characters. It includes the new font shapes for Mainland China users and the traditional font shapes for Taiwan and Hong Kong users. With the new character set, the previously undigitised content becomes searchable. In addition, the old database, which included 4,957 private-use character codes, has been converted into the same coding scheme as the newly digitised content.
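Converting old private-use codes into a new coding scheme is essentially a table-driven remapping. The Python sketch below shows the idea under stated assumptions: the mapping `OLD_TO_NEW` holds two invented placeholder entries (the product's actual table of 4,957 codes is not reproduced here), and the helper names are hypothetical. The Private Use Area ranges themselves are those defined by the Unicode Standard.

```python
# Hypothetical mapping from old private-use code points to code points in
# the new scheme. Both entries are placeholders, not the product's table.
OLD_TO_NEW = {
    0xE000: 0x3400,   # placeholder: old private-use code -> BMP code point
    0xE001: 0x20000,  # placeholder: old private-use code -> supplementary plane
}


def is_private_use(cp: int) -> bool:
    """True for code points in Unicode's Private Use Area ranges."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def convert(text: str) -> str:
    """Remap known old private-use characters; leave everything else as is."""
    return text.translate(OLD_TO_NEW)


def unmapped_private_use(text: str) -> list:
    """List private-use code points in the text that have no mapping yet."""
    return sorted(
        {ord(c) for c in text if is_private_use(ord(c)) and ord(c) not in OLD_TO_NEW}
    )


old_record = "A\ue000B\ue002"
print([f"U+{ord(c):04X}" for c in convert(old_record)])
print([f"U+{cp:04X}" for cp in unmapped_private_use(old_record)])
```

Reporting the unmapped code points separately makes it possible to check that every private-use code in the old data has an entry before the conversion is accepted.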
3. Innovating Technology
Building up the new big character set
Because the new character set exceeds 65,536 characters, the maximum that Microsoft Windows can support, font-linking is used to work around the limit.
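The idea behind font-linking can be modelled in a few lines of Python. This is a conceptual sketch only and does not call any Windows API: it assumes the 65,536 limit applies to each individual font, and the font names and code-point ranges in `FONT_CHAIN` are hypothetical.

```python
import math

MAX_GLYPHS_PER_FONT = 65_536  # assumption: the cited limit applies per font
CHARSET_SIZE = 82_000         # approximate size given in this document

# Minimum number of fonts needed to hold the whole character set.
fonts_needed = math.ceil(CHARSET_SIZE / MAX_GLYPHS_PER_FONT)

# Hypothetical chain: a base font followed by linked fonts, each covering
# some code-point ranges. Names and ranges are invented for illustration.
FONT_CHAIN = [
    ("BaseFont", [range(0x0000, 0x10000)]),
    ("LinkedFontA", [range(0x20000, 0x30000)]),
    ("LinkedFontB", [range(0xF0000, 0xFFFFE)]),
]


def pick_font(ch: str):
    """Return the first font in the chain that covers the character."""
    cp = ord(ch)
    for name, ranges in FONT_CHAIN:
        if any(cp in r for r in ranges):
            return name
    return None  # no font in the chain covers this character


print(fonts_needed)
for ch in ("A", "\U00020000", "\U000F0000"):
    print(f"U+{ord(ch):04X}", pick_font(ch))
```

Text is rendered with the base font wherever possible, and a linked font is consulted only for characters the base font lacks, so the application sees what behaves like one large font.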
Digitalisation of tables and chronologies
With the advancement of Optical Character Recognition (OCR) technology, we were able to digitise tables and chronologies in a great variety of patterns and to recover their original format.
4. Partners Involved
ITventures Limited
- Responsible for project planning and management.
Magically Asia Limited
- Responsible for system engineering.
TudorTech System Co., Ltd
- Responsible for search engine development.
Ilibo Digital Technology Co., Ltd.
- Responsible for content digitalisation.
Founder Electronics Company Limited
- Responsible for developing character sets.