
4.4.2 Summarizing SGML data

     

Starting with Harvest Version 1.2, it is possible to summarize documents that conform to the Standard Generalized Markup Language (SGML) [15], provided you have a Document Type Definition (DTD) for them. The World-Wide Web's Hypertext Markup Language (HTML) is actually a particular application of SGML, with a corresponding DTD. (In fact, the Harvest HTML summarizer now uses the HTML DTD and our SGML summarizing mechanism, which provides various advantages; see Section 4.4.2.) SGML is used in an increasingly broad variety of applications, for example as a storage format for data in a number of physical sciences. Because SGML allows documents to carry a good deal of explicit structure, Harvest can summarize SGML documents very effectively.

The SGML summarizer (SGML.sum) uses the sgmls program by James Clark to parse the SGML document. The parser needs both a DTD for the document and a Declaration file that describes the allowed character set. The SGML.sum program uses a table that maps SGML tags to SOIF attributes.
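
The idea of a table that maps SGML tags to SOIF attributes can be illustrated with a minimal Python sketch. This is not the SGML.sum implementation: the table contents, the function names summarize and format_soif, and the (tag, text) input format are all assumptions made for illustration, and the input stands in for whatever a parser such as sgmls would report. The output layout only approximates SOIF, in which each attribute name is followed by the byte length of its value.

```python
from collections import defaultdict

# Hypothetical tag-to-attribute table; the names are illustrative only
# and are not taken from any table shipped with Harvest.
TAG_TO_ATTRIBUTE = {
    "TITLE": "title",
    "H1": "headings",
    "H2": "headings",
    "ADDRESS": "address",
}


def summarize(elements, table=TAG_TO_ATTRIBUTE):
    """Collect element text under the attribute its tag maps to.

    `elements` is an iterable of (tag, text) pairs, standing in for the
    output of an SGML parser. Tags missing from the table are skipped.
    """
    summary = defaultdict(list)
    for tag, text in elements:
        attribute = table.get(tag.upper())
        if attribute is not None:
            # Collapse runs of whitespace so values stay on one line each.
            summary[attribute].append(" ".join(text.split()))
    return {name: "\n".join(values) for name, values in summary.items()}


def format_soif(url, summary, template="FILE"):
    """Render attribute-value pairs in a SOIF-like layout, where each
    attribute name is followed by the byte length of its value."""
    lines = ["@%s { %s" % (template, url)]
    for name, value in summary.items():
        size = len(value.encode("utf-8"))
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines)
```

In this sketch, several tags (here H1 and H2) may map to the same attribute, in which case their text is joined with newlines, and any tag absent from the table contributes nothing to the summary.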








Darren Hardy
Mon Apr 3 15:22:37 MDT 1995