Harvest Gatherers and Brokers communicate using an attribute-value stream protocol called the Summary Object Interchange Format (SOIF). Gatherers generate content summaries for individual objects in SOIF and serve these summaries to Brokers that wish to collect and index them. SOIF provides a means of bracketing collections of summary objects, which allows a Harvest Broker to retrieve SOIF content summaries for many objects from a Gatherer in a single, efficient compressed stream. Harvest Brokers support querying SOIF data with structured attribute-value queries and many other types of queries, as discussed in Section 5.3.
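To make the layout concrete, here is a minimal Python sketch of how a program might write summary objects and bracket them into one section. The layout (an "@TYPE { URL" header line, one "Name{size}:" line per attribute with a tab before the value, and a closing brace) is inferred from the gather output shown in this section. Two points are assumptions rather than anything stated here: that the newline after each value is not counted in {size}, and the helper names soif_object and soif_section, which are hypothetical and not part of Harvest. Compression of the stream is left out.

```python
from typing import Iterable, List, Tuple

# One summary object: (template type, URL, [(attribute name, raw value bytes), ...]).
# This tuple shape is a hypothetical choice for the sketch.
SummaryObject = Tuple[str, str, List[Tuple[str, bytes]]]


def soif_object(template_type: str, url: str, attrs: List[Tuple[str, bytes]]) -> bytes:
    """Serialize one summary object, e.g. '@FILE { URL', attribute lines, '}'."""
    out = bytearray(f"@{template_type} {{ {url}\n".encode("utf-8"))
    for name, value in attrs:
        # {size} is the length of the value in bytes, so the value itself may
        # contain newlines, braces, non-ASCII text, or binary data.
        out += f"{name}{{{len(value)}}}:\t".encode("utf-8")
        # Assumption: each value is followed by a newline that is not counted.
        out += value + b"\n"
    out += b"}\n"
    return bytes(out)


def soif_section(command: str, objects: Iterable[SummaryObject]) -> bytes:
    """Bracket many objects in one section, e.g. '@UPDATE { ... }'."""
    body = b"".join(soif_object(t, u, a) for t, u, a in objects)
    return f"@{command} {{\n".encode("ascii") + body + b"}\n"
```

For example, soif_object("FILE", url, [("Type", b"Compressed")]) emits the line "Type{10}:" followed by a tab and "Compressed", matching the Type{10} field in the sample output. Because sizes are taken from the encoded bytes, a UTF-8 value such as "Zürich" is counted as 7 bytes, not 6 characters.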
To see an example of a SOIF summary stream, you can run the gather client program, as discussed in Section 4. When you do, you'll see output like this:
@DELETE { }
@REFRESH { }
@UPDATE {
@FILE { ftp://ecrc.de/pub/ECRC_tech_reports/reports/ECRC-93-10.ps.Z
Time-to-Live{7}:	9676800
Last-Modification-Time{9}:	774988159
Refresh-Rate{7}:	2419200
Gatherer-Name{50}:	Computer Science Technical Reports - Selected Text
Gatherer-Host{21}:	bruno.cs.colorado.edu
Gatherer-Version{3}:	0.3
Type{10}:	Compressed
Update-Time{9}:	774988159
File-Size{6}:	164373
MD5{32}:	43193942d4d53f5a8e4a7b4bcff7a415
Embed<1>-Nested-Filename{13}:	ECRC-93-10.ps
Embed<1>-Type{10}:	PostScript
Embed<1>-File-Size{6}:	428233
Embed<1>-MD5{32}:	84c123582c3d0754a39a78a7e2fb6d23
Embed<1>-Keywords{105}:	technical report ECRC{93{10 Polymorphic Sorts and Types for Concurrent Functional Programs Bent Thomsen
}
@FILE { ftp://cml.rice.edu/pub/reports/9404.ps.Z
Time-to-Live{7}:	9676800
Last-Modification-Time{9}:	772872313
Refresh-Rate{7}:	2419200
Gatherer-Name{50}:	Computer Science Technical Reports - Selected Text
Gatherer-Host{22}:	powell.cs.colorado.edu
Gatherer-Version{3}:	1.0
Type{10}:	Compressed
File-Size{6}:	240015
Update-Time{9}:	772872313
MD5{32}:	1712ce5a973cfbb0508b405d6fef1669
Embed<1>-Nested-Filename{7}:	9404.ps
Embed<1>-Type{10}:	PostScript
Embed<1>-File-Size{6}:	488770
Embed<1>-MD5{32}:	84b748dbdda572a1fb1d9c3f67a2dda9
Embed<1>-Keywords{5135}:	/dsp/local/papers/spletter94/spletter94.dvi Submitted to: IEEE SP. Letters - May 1994 NONLINEAR WA VELET PROCESSING FOR ENHANCEMENT OF IMAGES J.E. Odegard, M. Lang, H. Guo, R.A. Gopinath, C.S. Burrus Department of Electrical and Computer Engineering, Rice University, Houston, TX-77251 CML TR94-04 May 1994 NONLINEAR WA VELET PROCESSING FOR ENHANCEMENT OF IMAGES J.E. Odegard, M. Lang, H. Guo, R.A. Gopinath, C.S. Burrus Department of Electrical and Computer Engineering, Rice University, Houston, TX-77251 CML TR94-04 May 1994 Abstract In this note we apply some recent results on nonlinear wavelet analysis to image processing.
In particular we illustrate how the (soft) thresholding algorithm due to Donoho [2] can successfully be used to remove speckle in SAR imagery. Furthermore, we also show that transform coding artifacts, such as blocking in the JPEG algorithm, can be removed to achieve a perceptually improved image by postprocessing the decompressed image. EDICS: SPL 6.2 Contact Address: Jan Erik Odegard Electrical and Computer Engineering - MS 366 Rice University, Houston, TX-77251-1892 Phone: (713) 527-8101 x3508 FAX: (713) 524-5237 email: odegard@rice.edu 1 Introduction We consider the problem of noise reduction by nonlinear wavelet processing. In particular we focus on two applications of the recently developed theory related to wavelet (soft) thresholding [2]. The model [...rest deleted...]
The "@DELETE", "@REFRESH", and "@UPDATE" commands are part of the Broker's Collector interface (described in Section 5.9), which adds a command level on top of SOIF. Currently, only the @UPDATE section is implemented. Within the @UPDATE section you can see individual SOIF objects, each of which contains a type (FILE in the example above), a Uniform Resource Locator (URL) [2], and a list of byte-count-delimited field-name/field-value pairs. Because each value is delimited by a byte count rather than by a terminating character, values can contain arbitrary binary data. Note also that SOIF allows Embed fields, which correspond to layers of unnesting when an object is summarized. In the example above, the Embed<1> fields describe the PostScript file obtained by unnesting the Compressed PostScript file.
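The structure just described can be read back with a small byte-oriented parser. The following Python sketch is an illustration, not Harvest's own Collector code: it assumes the layout seen in the example above (a tab after "Name{size}:", a newline after each value), and the function names parse_stream, parse_object, and unnest are hypothetical. The key point it shows is that a value is taken as exactly {size} bytes and never scanned for delimiters. The unnest helper treats the n in Embed<n>- as an index that groups the fields of one unnested layer, which is an interpretation of the example rather than a documented rule.

```python
import re
from typing import Dict, List, Tuple

# Collector-level commands that wrap SOIF objects (only @UPDATE carries data today).
COMMANDS = {b"DELETE", b"REFRESH", b"UPDATE"}
OPEN = re.compile(rb"\s*@([A-Za-z0-9-]+)[ \t]*\{[ \t]*")  # "@UPDATE {" or "@FILE { "
ATTR = re.compile(rb"\s*([^\s{}:]+)\{(\d+)\}:\t")           # "Embed<1>-Type{10}:\t"
CLOSE = re.compile(rb"\s*\}")
EMBED = re.compile(r"Embed<(\d+)>-(.+)")


def parse_object(data: bytes, pos: int, otype: bytes) -> Tuple[dict, int]:
    """Parse one object whose '@TYPE {' header ends at pos; return (object, next pos)."""
    eol = data.index(b"\n", pos)  # the URL runs to the end of the header line
    obj = {"type": otype.decode(), "url": data[pos:eol].strip().decode(), "attrs": {}}
    pos = eol + 1
    while True:
        end = CLOSE.match(data, pos)
        if end:
            return obj, end.end()
        m = ATTR.match(data, pos)
        if not m:
            raise ValueError(f"malformed attribute at byte {pos}")
        size, start = int(m.group(2)), m.end()
        if start + size > len(data):
            raise ValueError(f"truncated value at byte {start}")
        # Take exactly `size` bytes; the value may contain newlines, '}' or NULs.
        obj["attrs"][m.group(1).decode()] = data[start:start + size]
        pos = start + size


def parse_stream(data: bytes) -> Dict[str, List[dict]]:
    """Group the objects in a Collector stream by command (DELETE/REFRESH/UPDATE)."""
    sections: Dict[str, List[dict]] = {}
    pos = 0
    while True:
        m = OPEN.match(data, pos)
        if not m:
            break
        name, pos = m.group(1), m.end()
        if name not in COMMANDS:  # a bare object outside any command section
            obj, pos = parse_object(data, pos, name)
            sections.setdefault("", []).append(obj)
            continue
        objs = sections.setdefault(name.decode(), [])
        while not CLOSE.match(data, pos):
            inner = OPEN.match(data, pos)
            if not inner:
                raise ValueError(f"expected '@TYPE {{' at byte {pos}")
            obj, pos = parse_object(data, inner.end(), inner.group(1))
            objs.append(obj)
        pos = CLOSE.match(data, pos).end()
    if data[pos:].strip():
        raise ValueError(f"unexpected trailing data at byte {pos}")
    return sections


def unnest(attrs: Dict[str, bytes]) -> Dict[int, Dict[str, bytes]]:
    """Split attributes by Embed<n>- prefix; key 0 holds the outer object's own fields."""
    groups: Dict[int, Dict[str, bytes]] = {}
    for name, value in attrs.items():
        m = EMBED.fullmatch(name)
        n, key = (int(m.group(1)), m.group(2)) if m else (0, name)
        groups.setdefault(n, {})[key] = value
    return groups
```

Feeding it a trimmed copy of the example stream yields empty DELETE and REFRESH lists and one FILE object under UPDATE. A hypothetical attribute such as "Note{5}:" whose value is the five bytes "a", newline, "}", "b", NUL comes back intact, because the parser never looks inside the counted bytes for a closing brace.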
SOIF is based on a combination of the Internet Anonymous FTP Archives (IAFA) IETF Working Group templates [11] and BibTeX [17]. Unlike IAFA templates, SOIF templates support streams of objects as well as attribute values with arbitrary content, including values that span multiple lines or contain non-ASCII characters.
We plan to make available a specification for SOIF (and for all of Harvest) that defines a set of mandatory and recommended attributes for Harvest system components. For example, the attributes for a Broker describe the server's administrator, location, software version, and the type of objects it contains.