Abstract
This proposal would support research and development leading to the creation of an inexpensive Internet resource discovery, metadata generation and selected, full-text harvesting service. The service would be of use to all organizations and institutions serving the greater learning community that create and maintain Internet portals, subject directories, virtual libraries or library catalogs linking to collections of significant Internet resources. It would help these tools scale and meet three challenges: keeping up with the growing number of significant resources on the Internet; overcoming the relatively small size of most virtual library collections, where searches yielding very few or no results are common; and containing the high cost of expert, manually created metadata. Given the lack of objectivity in including and displaying Internet resources on the part of many commercial finding tools and search engines (e.g., paid results placement), library-built finding tools are becoming increasingly crucial to serious, 21st century learners. We have identified over 1,500 academic Internet portals alone and estimate that the other learning community portals and library catalogs which would benefit from the results of the proposed work number in the tens of thousands nationally. The software developed would have the advantage of being based upon the iVia open source portal system and would benefit, as a secondary product, the iVia system itself. Development of iVia has been funded by IMLS and carried out by INFOMINE, a virtual library service with close to a decade of experience and one of the first Web-based services offered by a library anywhere. Our already advanced foundation in iVia will be a useful one from which to build new machine learning based approaches to automated resource discovery, metadata generation and full-text harvesting.
Data Fountains would be a cooperative, self-sustaining service that would lower the barriers that prevent portals and related tools from cooperating and reaping the benefits of shared efforts and resource savings in collection development and resource discovery, indexing/cataloging or metadata creation, and systems development. In so doing, Data Fountains would enable these tools to scale better.
Narrative
I. Impact
The proposed Data Fountains system is a research and development project that will yield an efficient, inexpensive, national and international level information utility whose service and products include shared Internet resource discovery, metadata, and rich, full-text. It will be of benefit to most IPVLCs (Internet portals, virtual libraries and library catalogs with portal-like capabilities), ranging from very general to very specialized collection focuses, and their 21st century users. The Data Fountains system and service will be developed through research that will provide technological solutions to some of the major overall problems associated with the scalability of IPVLCs. As such, Data Fountains contributes to providing a foundation for technologically augmenting and bringing forward to the learning community the collection building craft, vision, discernment, experience, and standards of librarians in a new medium and millennium. The project research is based on applying machine learning techniques to automate a number of very laborious and costly IPVLC activities. Specific IPVLC scaling challenges that we are attempting to resolve include the following: the ever-increasing number of important Internet resources; redundant efforts among IPVLCs (both in content and systems building); tasks which are very labor-intensive, require significant expertise, and which are costly to achieve.
By inexpensively providing their universally needed raw materials, i.e., metadata and full-text representing important Internet resources, the proposed Data Fountains service will offer major support and resource savings to cooperating IPVLC participants that otherwise have strong ongoing commitments to their established institutional identity or “brand”, interface presentation or look, system and, more generally, “established ways of doing things”. The key to Data Fountains' viability and sustainability is that it provides a valuable, universally needed service and very “generic” products that do NOT require established IPVLCs to substantially change. Such change (e.g., in moving to new advanced, resource saving IPVLC systems), even if very beneficial in the long run, is often seen, correctly or not, as prohibitively expensive or time-consuming. Data Fountains greatly lowers the barriers for substantive cooperation and resource savings on the part of large numbers of IPVLCs. The core notions here, and why we think Data Fountains has great prospects for success, are that the products it creates will be universally useful to IPVLCs, that usage of these products does not require change on the part of IPVLCs, and that the service as a whole should function as a powerful tool for inducing greater cooperation very quickly among large numbers of IPVLCs. From the common ground of mutual interest in Data Fountains products, it is hoped that other commonalities and areas of naturally shared interests among IPVLCs can be more fully articulated and result in other substantive, cooperative efforts.
Appendix A supplements the Narrative, should this be desired, with more detail on the IMLS National Leadership grant criteria and goals met by this proposal.
II. Adaptability
The proposed project and work is adaptable, flexible and designed to evolve with changing technologies and changing IPVLC needs and uses.
A. Service Adaptability to All Major Library Finding Tools Covering Internet Resources - The Data Fountains service would be adaptable to and of use by all IPVLCs. Worth stressing is that it would be of use to library catalogs with Internet portal-like capabilities. Given that most catalogs are increasingly evolving towards including Internet content together with Internet portal-like capabilities, Data Fountains would be of benefit potentially to a great majority of library catalogs. Increasingly these are developing the means by which full MARC records coexist with more streamlined (and less expensive) records (e.g., Dublin Core and other types). Given that library Internet portals, virtual libraries and catalogs are co-evolving towards one another, Data Fountains will adapt to and benefit the full spectrum of library-based finding tools.
B. Service Adaptability Through Offering Multiple Products, Data Types and Data Formats
- Flexibility in Amount, Type, and Cost of Data Offered - Data Fountains will offer multiple levels of product and service geared to fit the needs of IPVLCs of differing sizes, subject needs and desired data “completeness” or depth (this being the amount and type of metadata and full-text needed per resource). Costs or shares charged to cooperating participants will depend on the number of records used and the amount of information in them (e.g., various amounts of metadata and full-text can be subscribed to as differing products). Costs could be paid directly or through mutually agreeable efforts to co-develop the service and system. This flexibility should appeal to a great number of IPVLCs.
- Flexibility in Providing Various Data Formats - Varying needs in the format desired will also be accommodated. Catalogs may want fuller metadata than virtual libraries, and so we will accommodate streamlined MARC format. On the other hand, most catalogs will not be able to handle full-text searching or need full-text data, though many virtual libraries increasingly will. Similarly, most virtual libraries will not be looking for MARC-like data so much as more minimal Dublin Core data. Some IPVLCs will want subjects via LCSH while others will want DDC. Yet others may want our key phrases and annotations but prefer to do their own subject application.
C. Open Source System Adaptability
The Data Fountains system is completely open for being adapted and improved by anyone who wants to do this as long as they adhere to open source guidelines. In this way the system can continue to develop and evolve with and/or independently of our efforts.
D. System Design Adaptability
Data Fountains systems architecture is modular. Separate elements of the system (e.g., the crawler or classifiers) could be developed further for other uses independently of the Data Fountains system. In addition, as technologies which the system is dependent upon advance, we and others can more easily swap out older modules and replace them with new modules. This is how the Data Fountains work will also be multi-purposed to improve the iVia portal software; its improvement being an important by-product of this project.
III. Design
A. Overall Description and Design of the Proposed Service and System
The Data Fountains would automatically supply varying levels of what represents the basic "ore" required by IPVLCs for collection building: significant Internet resources, metadata, and selected full-text. This ore would be available in both raw (relatively unprocessed) and refined products depending on the needs of the participating IPVLC and the degree to which the experts of each would be further refining or using the material received from the Data Fountains. It would do this through research and development leading to the creation and refinement of software as described under Specifics of System Design (B.), below.
- Products and Services Created - Data Fountains' multiple products and usage models support the building of a wide array of IPVLC collections with varying needs and approaches. By way of illustration, usage models of Data Fountains could take the following forms:
- Hybrid Collections Development and Support:
The first usage model, based on full automation, might entail the utilization of Data Fountains metadata and rich, full-text "as is", without review, to populate a large collection intended to undergird another, more primary and fully expert created collection that is generally more accurate but which is comparatively very labor intensive and expensive to create and maintain, much smaller and often limited in coverage. This is the INFOMINE model which we consider to be a “hybrid collection” because it features two distinct collections, with the automatically generated collection supporting, as a second tier of data, the expert built content in the primary collection.
- Internet Resource Discovery Service:
A second model would use Data Fountains solely as a large Internet resource discovery service where links and titles and other metadata are supplied but the expert reviews all and applies a considerable amount of new or different metadata. This is the least automated of the usage models.
- First-cut Metadata Records Intended for Expert Refinement:
A third approach, which is semi-automated, might involve using Data Fountains as both a discovery service and as a metadata record building service where employment of records from the Data Fountains data stream is selective but the foundation record is routinely retained and then augmented by the expert.
- Creme de la Data Fountains:
A fourth approach, a variation of the third, only utilizes Data Fountains records that have been automatically determined to be the most highly significant resources (e.g., the top 20%). As such, they are flagged for expert review with the Data Fountains metadata retained, as a base record, for expert post-processing and improvement.
- Metadata Records Plus Full-text:
A fifth approach is to use, either in addition to the metadata generated or by itself, the rich, full-text selectively identified and harvested from the Internet resource to populate a collection and greatly boost retrieval. Some collections may want to utilize their metadata solely but have Data Fountains perform the service of augmenting their records with full-text.
- Metadata, Standards and Full-text - Data Fountains would be standards based. The most popular format would probably be Dublin Core data, featuring Library of Congress Classifications (LCC) and Library of Congress Subject Headings (LCSH) as subject schema. In addition, a subject disciplines based schema (based in LCC and utilizing categories such as Biology > Botany > Plant Pathology), featuring a category hierarchy, would be available. As part of our work, we would be developing additional classifiers to apply additional subject schema. Participants may choose to fund the creation of new formats, subject schemas and categories which could be created to meet custom needs in collecting and classification.
Other important metadata that would be provided or created by Data Fountains would include: Title, Creators, Description (annotation-like construct), Key phrases and Resource Language, among others.
It is crucial to emphasize that in addition to fielded metadata, Data Fountains would be able to deliver selected rich-text harvested from the Internet resource. Such data is important for enhancing IPVLC retrieval capabilities, and user searching success, via the much greater data granularity provided for retrieval by full-text.
Data Fountains data would be accessed and transferred by participating IPVLC projects through our OAI-PMH server.
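As an illustration of how a participant might request records over OAI-PMH, the following minimal sketch builds a ListRecords request URL. The endpoint address and the set name are hypothetical, since the proposal does not name them; the verb and the metadataPrefix, set and from arguments are standard OAI-PMH request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the proposal does not give a server address.
BASE_URL = "https://datafountains.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix: "oai_dc" is the unqualified Dublin Core format.
    set_spec: optional set name used for selective harvesting (assumed here
              to correspond to a subject area).
    from_date: optional UTC datestamp (YYYY-MM-DD) so that only records
               added or changed since that date are returned.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Request Dublin Core records in a hypothetical "biology" set added since 1 January 2003.
    print(list_records_url(BASE_URL, set_spec="biology", from_date="2003-01-01"))
```

A harvester would fetch this URL, parse the XML response and follow any resumption token until the list is complete; those steps are omitted here.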
- Product Relevance Assurance - IPVLCs could determine and download resources of relevance to them automatically in batch mode via subject profiled searches created by and for each IPVLC in reflecting its particular interests, in the manner of an SDI search (Selective Dissemination of Information). These profiled searches would be stored and automatically executed on new incoming data at selected intervals.
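The stored, profile-driven matching described above might look something like the sketch below. The Profile structure, the record field names and the matching rule (any shared subject term or key phrase) are assumptions made for illustration; the proposal does not specify how profiles would be expressed.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A stored interest profile for one participating collection (hypothetical shape)."""
    owner: str
    subjects: set = field(default_factory=set)   # controlled subject terms of interest
    keywords: set = field(default_factory=set)   # natural language key phrases of interest


def matches(profile, record):
    """Return True if the record shares a subject term or key phrase with the profile."""
    record_subjects = {s.lower() for s in record.get("subjects", [])}
    record_phrases = {p.lower() for p in record.get("key_phrases", [])}
    wanted_subjects = {s.lower() for s in profile.subjects}
    wanted_phrases = {k.lower() for k in profile.keywords}
    return bool(wanted_subjects & record_subjects) or bool(wanted_phrases & record_phrases)


def run_profiles(profiles, new_records):
    """Run every stored profile against newly arrived records.

    Returns a dict mapping each profile owner to the ids of matching records.
    """
    return {p.owner: [r["id"] for r in new_records if matches(p, r)] for p in profiles}
```

In use, run_profiles would be called at each scheduled interval with only the records added since the previous run, and each owner's list would then be queued for batch download.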
In addition, IPVLCs could manually and interactively identify records which suit their needs for harvest by very selective, interactive searching and browsing of subject, key phrase, title, description, date and full-text data. The Data Fountains search/browse interface created to support this would allow very targeted, custom record identification and downloading which will enable the most general as well as the most subject specialized IPVLCs certainty in identifying and receiving only records that meet their subject criteria and needs. Data so identified would be transferred via OAI-PMH or other standard formats/means. Having such a flexible and powerful interface means that most IPVLCs, regardless of size or subject specialization, could benefit from the Data Fountains service.
- Data Usage and Accounting System - A Data Fountains accounting system would be developed. This would be used for determining the number and type of Data Fountains records downloaded by each IPVLC. It would be used to determine the "share" of Data Fountains usage and therefore the share of support owed to the service by the IPVLC. Such accounting would also be used to help eliminate duplicates. We would offer guidance to non-profit and educational organizations in setting up OAI-PMH on their ends when needed.
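A minimal sketch of how usage shares could be computed from a download log follows. The product names, their relative weights and the rule that a repeated download of the same record by the same participant is counted once are all assumptions for illustration, not figures or policies from this proposal.

```python
from collections import defaultdict

# Hypothetical relative weights per product level; actual costing is not set in the proposal.
PRODUCT_WEIGHTS = {"basic": 1.0, "all_metadata": 2.0, "full": 3.0}


def usage_shares(download_log, weights=PRODUCT_WEIGHTS):
    """Compute each participant's share of total weighted usage.

    download_log: iterable of (participant, record_id, product) tuples.
    Repeated downloads of the same record and product by the same
    participant are counted once (an assumed duplicate policy).
    Returns a dict mapping participant to a fraction; fractions sum to 1.
    """
    seen = set()
    totals = defaultdict(float)
    for participant, record_id, product in download_log:
        key = (participant, record_id, product)
        if key in seen:
            continue  # duplicate download, not charged again
        seen.add(key)
        totals[participant] += weights[product]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {p: t / grand_total for p, t in totals.items()}
```

Each participant's fraction could then be applied to the operational costs for the period to arrive at the support owed.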
B. Specifics of Systems Research, Design, Development and Features
The lion's share of the effort (and funding) would be spent on research and development leading to innovations and re-engineering in our work in focused crawling, subject classification and rich, full-text identification and extraction. These are areas that we have much experience in through iVia, an IMLS funded system that we've developed and with which we've had many successes. In order to retain its relevance, our system needs to evolve at a pace that can take advantage of and extend the rapid advances in computer and information sciences research. Developing and applying innovations in the following areas to Data Fountains will benefit the whole IPVLC community.
- Resource Discovery/Significant Resource Identification - Focused Crawling:
A number of crawling systems have been used in iVia. In this grant we concentrate on further developing and re-engineering one, the Focused Crawler (FC). Its name indicates the type of crawling or spidering done. Focused crawling makes possible the focused, accurate identification of significant Internet resources within specific communities of shared subject interest and represents an appropriately scaled approach for the tasks at hand. The proposed funding would support work to explore and implement new, more accurate, and more efficient approaches and algorithms (i.e., the logic or techniques that form the core of the system) upon which to base the FC. The result would be a greatly improved, high performance FC.
The FC is a program that crawls the Internet to find resources that are strongly inter-linked and which are part of, and contain content similar to, the same or related learning communities as those represented in INFOMINE and other significant IPVLCs. The high quality data from IPVLCs is often used as a seed set, or for training, in guiding the crawler. As the crawling progresses, an inter-linkage graph is developed of which resources link to one another (i.e., cite and co-cite). Good resources focused around a common topic often cite one another. Highly inter-linked resources are evaluated, differentiated and rated as to the degree to which they are linked to/from as well as for their capacities as authoritative resources (e.g., a primary resource such as a database which receives many in-links to it from other resources) or hubs (e.g., secondary sources such as virtual library collections which provide out-links to other, authoritative resources). After such assessments have been taken, a second automated process is then put into play which rates resources, as a second indirect measure of resource quality, by comparing for similarity of content (e.g., similarities in key words and vocabulary) between the potential new resources and resources already in the collection. The most linked to/from authorities and hubs, with terminology most similar to that in other high quality collections, thus become prime candidates for either adding to the collection as automatically created records or for expert review and refinement. There are numerous algorithms and approaches for detecting relevant resources through co-citation or linkage analysis and through text similarity analysis, among other approaches. These areas of inquiry and software development are rapidly changing frontiers in computer science research where great advances, which we hope to contribute to and capitalize on, are being made. 
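The two-stage rating described above can be sketched in a few lines. The link analysis below uses the widely known hubs-and-authorities iteration (HITS) as a stand-in, and the content comparison uses plain word-count cosine similarity; neither is claimed to be the algorithm used in the FC, and the function names are illustrative only.

```python
import math
from collections import Counter


def normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(links, iterations=20):
    """Hubs-and-authorities iteration over a link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns (authority, hub) dicts of scores.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hub = {p: sum(new_auth[t] for t in links.get(p, [])) for p in pages}
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

A candidate resource would then be ranked by combining its link scores with its similarity to text drawn from records already in the collection; how the two measures are weighted is left open here.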
The rewards, and why we are proposing this work, are much greater efficiencies and accuracy in the automated discovery of significant Internet resources and, in following, a Data Fountains system and service with a greatly improved capability to contribute to helping library-based IPVLCs scale.
Specific areas of research in focused crawling that we are targeting and which will result in new or improved work for us are specified in Appendix B. These will be used to replace or augment the approaches we currently use which are described in the article in Appendix C.
- Metadata Generation – Automated Classification:
Funding would also go to support far-reaching innovations and improvements in automated metadata generation including identifying and applying appropriate controlled subject terms (using Library-standard subject schema), keywords, and annotation-like constructs. Our automated classifier programs apply these and other metadata and are part of a suite of programs known in iVia as the Record Builder. In addition to developing new algorithms to improve the performance of our LCSH and LCC classifiers, the work here would also involve building entirely new classifiers to apply additional subject schema, DDC and UDC.
Controlled subject terminology applied currently includes LCSH and LCC. In assigning LCSH, a set of keywords and key phrases is derived which serve as a surrogate in representing each Internet resource and which summarize the resource’s content. Then, using a model that encapsulates the relationships between natural language key phrases and the set of controlled language terms making up LCSH, we assign the closest corresponding set of LCSHs. The model is learned from training datasets that consist of very large sets of records (24 million in one corpus) from library catalogs and virtual libraries where both LCSH and key phrase metadata are used to describe a given resource. With LCC our aim has been to assign one or more LCCs to a resource based on the set of Library of Congress Subject Headings (LCSH) associated with that resource. Other projects have explored similar territory though most of these are based on Information Retrieval techniques. Instead, we use Support Vector Machine algorithms. The work has been successful but accuracy could be significantly improved.
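To make the idea of learning a mapping from key phrases to controlled headings concrete, here is a deliberately simple sketch. It replaces the Support Vector Machine approach mentioned above with plain co-occurrence voting, purely for illustration; the function names and the training record shape are assumptions.

```python
from collections import Counter, defaultdict


def train(records):
    """Learn phrase-to-heading co-occurrence counts.

    records: iterable of (key_phrases, headings) pairs taken from records
    that carry both kinds of metadata.
    Returns a dict mapping each lowercased phrase to a Counter of headings.
    """
    model = defaultdict(Counter)
    for phrases, headings in records:
        for phrase in phrases:
            model[phrase.lower()].update(headings)
    return model


def suggest_headings(model, key_phrases, top_n=3):
    """Suggest the headings that co-occurred most often with the given key phrases."""
    votes = Counter()
    for phrase in key_phrases:
        votes.update(model.get(phrase.lower(), {}))
    return [heading for heading, _ in votes.most_common(top_n)]
```

For example, a model trained on catalog records in which "wheat rust" usually appears alongside the heading "Plant diseases" would suggest that heading for a new resource whose extracted key phrases include "wheat rust".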
Improvements we will pursue in classification for use in Data Fountains include the following:
- Refinement of our "aboutness" measure in identifying the most relevant pages in a resource or sections of a document which are intended by the author(s) to be "rich" in descriptive information about the topics within and the type of resource and from which accurate mining of more relevant key phrases can occur. Involved is improved detection of the type and boundaries of Internet object encountered and better determination of author created structures and conventions in document and resource layout (e.g., introductions, summaries, etc.). Better, more accurate natural language key phrase harvesting means more accurate automated application of, for example, controlled subject schema terms and annotations.
- Better mappings of natural language found on Internet resources to LCSH terms through increasing accuracy in detecting synonymy and background context and intent in word usage.
- The extension of our work from LCSH and LCC to DDC and possibly UDC by developing entirely new classifiers.
- Improved software tools for accurately identifying and extracting structured, author supplied metadata. While working well in iVia, the goal here is that these tools will evolve to handle new approaches and standards that will be used in building the "semantic Web". This author-supplied metadata is an important element, and weighed heavily by our classifiers, in determining closely related terms from the subject schemas.
- More accurate key phrase identification and extraction.
- Using expert input from participants to refine the classification. Rules reflecting the specific semantics of resources in each major subject area would be developed for crawls/classification done in that area.
The specific approaches and algorithms we will be researching that support these improvements are specified in Appendix B. They will be used to replace or augment the approaches we currently use (see article in Appendix C).
- Rich Text Identification and Harvest:
Important work would support the automated identification and extraction of selected, rich text by developing new approaches and algorithms and greatly improving current ones, and therefore improving accuracy. In addition to improving key phrase mining as mentioned above, rich full-text is also important from an information retrieval perspective because the natural language terminology it contains partially corrects for the limitations inherent in most metadata and subject schema approaches (e.g., new or specialized subject terminology is often slow to appear in library standard subject schemas, which also have the problem of being obscure or unfamiliar to average users). Refinement of our "aboutness" measure in identifying rich text is therefore an important task and would involve, as mentioned above, better detection of the type and boundaries of Internet object encountered and better determination of author created structures and conventions in document and resource layout. New approaches that would replace or augment those currently used (see article in Appendix C) are detailed in Appendix B.
- Development of an Architecture that Supports Multiple, Subject Specific, Federated Focused Crawlers and Classifiers:
Data Fountains would in reality operate at the systems level as an array of separate, though federated, focused crawlers and associated classifiers given the efficiency of this as compared with a more monolithic, single crawler, multiple subject approach. Instead of one broad, multiple subject, multiple audience Data Fountain which follows a broad shotgun approach to Internet resource discovery and classification, there would be several vertical, subject and/or audience specific, focused Data Fountains (hence the plurality in the name). A specific Data Fountain would exist for each distinctive, major subject area and the subject-specific IPVLCs in that area (e.g., biochemistry, chemistry, horticulture). The cumulative content of multiple Data Fountains would go towards supporting those IPVLCs (e.g., science) that are multiple subject in scope.
Several crawling and classification experts predict greatly increased importance for the distributed, topic focused approach we will follow (i.e., focused crawlers and classifiers which make focused, vertical subject portals possible). They see such an approach as perhaps the only way that Internet finding tools can continue to be relevant to the learning community and offer the accuracy and significant content needed by that community (see Chakrabarti, '02 and Menczer, '03 references under Overall Architecture in Appendix B). This is because the generalized, commercial search engines are coming up against serious technical limitations for their algorithms (e.g., they're easily manipulated by commercial and other non-objective interests) at the same time that they attempt to satisfy the almost infinitely complex task of providing for the Internet finding needs of all audiences in all subject areas. These problems are increasingly reducing their accuracy and relevance to academic and other learning communities. They too don't scale.
These limitations are not faced by focused crawlers and classifiers of the type Data Fountains would be developing. These are better able to develop targeted, more accurate approaches to their subjects because they are focused on more tightly defined, distinct and finite subject universes and intellectual communities which, in turn, allows them to take advantage of major new innovations in effective linkage and similarity (i.e., lexical) analysis (ibid.). These experts note that the future of Internet searching as a whole may lie in searching federated finding tools based in these techniques. Such a federation, as we see it, could be a Librarian's Web of high quality finding tools. Data Fountains is expressly designed to support such a Librarian's Web. The Librarian's Web could well become extremely important to 21st century learners as the technology underlying it makes great advances possible at the same time, perhaps, that the technology underlying large-scale commercial crawlers falters.
- Moving Advances Developed for Data Fountains into the iVia Portal System:
Given that Data Fountains and iVia work is mutually complementary and given that iVia (suitably modified and augmented to take advantage of developments in Data Fountains) could benefit by utilizing Data Fountains innovations, a secondary goal of ours would be to move appropriate advances in Data Fountains into iVia. Improving the iVia system (which provides much of the foundation for Data Fountains) in the areas of focused crawling and classification and rich text identification, as appropriate, is an important secondary goal of this project and should be easy to accomplish. The value of doing this is that we would offer not only the data products of Data Fountains but, for free as open source software, both the Data Fountains software and an improved, state-of-the-art portal system, i.e., iVia, with which to utilize these products. Sharing systems foundations and a number of components guarantees that these two sister systems would interoperate well and be mutually complementary.
C. Organizational and Service Building Design
Data Fountains is a service that is intended to support itself after start up. Access will be provided to IPVLC cooperators who invest in the service through purchasing various amounts and types of data and/or help develop systems components and/or help market the products and, by so doing, share in supporting its operational as well as some development costs. Investment in the service also results in shares in the governance and development of the service. Data Fountains would be set up organizationally as a non-profit, non-commercial, cooperative entity.
In providing for as much self support as possible, after initial development, augmentation and refinement through IMLS grant funding, Data Fountains would charge a small fee to participating cooperators for the data provided and for any services rendered (e.g., advising with OAI-PMH setup). Costs would be inexpensive, prorated according to the amount and type of data used by each IPVLC, and primarily be meant to offset operational costs incurred in running the Data Fountains. Specific costs to be offset would include: hardware and bandwidth costs, systems administration personnel, service management personnel, and systems development personnel (though the latter would be mostly grant funded).
Access to the Data Fountains would be passworded. Each subject access and download would be costed. Each level of metadata and full-text used would have differing costs based on the amount of data provided. Each would be considered a different "product". In order of increasing expense, types of records available would include the following: a.) Basic Metadata -- title, URL, author and keyword; b.) All Metadata -- the prior fields plus subject terminology (including LCC and LCSH) and description/annotation; and, c.) Full Record -- the prior fields plus selected, rich full-text. Different standard formats would be supported, e.g., Dublin Core and streamlined MARC, as well as various other unique formats desired by and funded by participants.
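The three product levels listed above amount to progressively larger field sets drawn from the same underlying record. The sketch below shows one way to express that; the tier and field names are illustrative assumptions rather than a defined schema.

```python
# Field sets per product level, following the a.) / b.) / c.) ordering above.
# Tier and field names are hypothetical.
BASIC_FIELDS = ["title", "url", "author", "keywords"]
ALL_METADATA_FIELDS = BASIC_FIELDS + ["lcc", "lcsh", "description"]
FULL_RECORD_FIELDS = ALL_METADATA_FIELDS + ["full_text"]

TIERS = {
    "basic": BASIC_FIELDS,
    "all_metadata": ALL_METADATA_FIELDS,
    "full": FULL_RECORD_FIELDS,
}


def project(record, tier):
    """Return only the fields of a record that the given product level includes."""
    return {name: record[name] for name in TIERS[tier] if name in record}
```

Because each level is a superset of the one before it, a single stored record can serve all three products, with the accounting system noting which level was delivered.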
As a cooperative project, Data Fountains would involve its participants in improvement and publicity for the service as well as in co-development, testing and general improvement of the system. Services so rendered could be considered in-kind payments and help partially defray data costs. Data Fountains would encourage its participants to federate and would be a first step towards more advanced forms of cooperation.
A Data Fountains Service Manager/Developer would be hired through the grant to work with the INFOMINE Team to establish the Data Fountains non-profit, cooperative organization. This person would also establish and normalize approaches and routines involving: costing structures; accounting procedures; recruitment and publicity procedures; and, governance among participants. The service manager would work with an Advisory Board of experienced IPVLC managers who would provide commentary and advice on technical, organizational and economic aspects of the project.
Data Fountains Cooperator Recruitment:
The service would be made visible through extensive publicity and outreach. A program of incentives and introductory offers would be developed. For example, the first several downloads of Data Fountains data would be free so that potential subscribers could determine for themselves the value of the service to their projects.
Other Organizations:
- We have been working with National Science Digital Library (NSDL) personnel on a pilot basis to share information...
- We would be surveying over 20 IPVLC managers to elicit their specific needs in regard to type of record, format, subject schema, appropriate costing and so on. It is our hope that some of them will become involved in catalyzing and publicizing the service.
IV. Management Plan
This project will be carried out by the INFOMINE Team of the Library of the University of California, Riverside. INFOMINE has provided close to a decade of IPVLC service and is one of the first Web mounted, library-based services of any type. Few have as much experience with this type of tool and Internet service provision as our group. Currently (1/03), INFOMINE has over 26,000 expert identified and described records and 80,000 plus crawler/classifier identified and described records linking to important resources of a scholarly or educational nature.
We have successfully completed grant work funded by the Fund for the Improvement of Post-Secondary Education and have successfully brought two recent IMLS National Leadership grants to completion. This work has developed, and continues to advance, our iVia open source Internet portal/virtual library system.
C. Advisory Board Personnel:
For this project, we will be creating an advisory board composed of volunteers experienced with Internet portals, virtual libraries and/or library catalogs with portal-like capabilities. Their duties will include contributing approximately three days of work to provide feedback on, and evaluation of, the design of the system, service, and products.
IX. Dissemination and Recruitment
Initial Survey of Needs and Intensive, Directed Contact of User Group - Our Data Fountains Service Manager/Developer will be surveying a sample of 20 portal directors at the beginning of the project to draw out details as to what they would desire in the Data Fountains service.
Articles, Presentations, News Releases - The Service Manager/Developer will also be developing articles and giving presentations on the project, as will most INFOMINE staff. These will address audiences and publications that will be naturally interested in: a.) the Data Fountains service as a whole (most libraries and IPVLC providers); b.) the underlying research that has occurred (computer and information scientists and other IPVLC-focused researchers); and, c.) the open source software itself (the community of open source software developers).
Software Publishing - The software will be published as open source. It will be made visible and disseminated to projects and software developers who wish to utilize and improve the system and who, by virtue of its open source license, will have their improvements in turn openly published for use by others (including ourselves).
X. Sustainability
Sustainability, in regard to keeping the service optimized and working well and the software evolving and benefiting from innovations of relevance, is a primary goal. Data Fountains is designed to be self-sustaining economically and should support itself after start up. Data Fountains would be set up organizationally as a non-profit, non-commercial, cooperative entity.
It is important to note that the software will be open source precisely because this is a strategy for sustainable software development. As open source, its continued development will be sustained not only through the IPVLCs that benefit from it but, in addition, through the sustained efforts of all parties that may want to further develop all or parts of the system for their own uses. This is the great benefit of releasing software under a copyleft-style open source license: such parties are required to license their work, in turn, as open source, thus making it available to all. A concern for sustainability will also guide the Data Fountains systems architecture. Realizing that rapid change will continue to be the norm, we are engineering Data Fountains so that all components are modular and can easily be swapped out and replaced by new modules representing new, more efficient approaches.
The Service Manager/Developer funded by the grant will develop, market, publicize and routinize all aspects of the service with assistance from permanent INFOMINE staff. It is anticipated that fees received from cooperators will fund any FTE required to sustain management of the service. Similarly, the system developed will be stable and easily maintained through permanent UCR Library systems personnel. While some post-grant systems development will be supported by cooperator fees, most new programmer/developer effort will be dependent on continuing development by the open source community and by grants.