XML? Digitization? Meta Data?: News Division Technology Spotlight

Moderator Justin Scroggs presented the first part of this session, highlighting the new forms of digitized content, methods to mold these into revenue generating vehicles, and the problems inherent in multiple media, rights and permissions. The second part of the program dealt with the Chicago Tribune project to retrospectively digitize large portions of its clipping and film collections dating back to 1849.

Scroggs postulated two business problems (and, by the way, Scroggs is always good to listen to, because he can be funny). The first problem is that content providers are challenged to create new revenue generating vehicles; the second problem is capturing content for multiple media, and the concomitant rights and permissions. Consumers expect content as print, on hand-helds, in e-books, and on screens. These multiple ways of delivering will require two factors not consistently found today: meta data and standards. The point Scroggs emphasized is that information will need to be available automatically, and convertible on the fly. It must also look "gorgeous". One overhead tells the story: text, photos, graphics, charts, maps, audio, video, must equal web, print, TV, radio, reprints, books, catalogs.

All of the above must contain implicit and explicit meta data, or information describing other information: content, structure, format, creation date, rights. However, there is a forest of standards out there. Why do we care? We care because a format such as XML, together with meta data, can offer management of rights, control of processes, efficient repurposing, standardization of searches, search query precision, and internationalization.

The managing of rights and permissions alone presents issues of geographic restrictions on use, time, language and market, format and alteration statements and indications of exclusive use. If this standardization and internationalization are to occur, the multiplicity of standards is a serious problem. We are offered a list that includes Dublin Core, DOI, <indecs>, EPICS, NITF, XMLNews, SMIL, DASL, ISO 112083, and on and on.
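As a rough illustration of why machine-readable rights meta data matters, the kinds of restrictions listed above could be recorded with each item and checked automatically before content is reused. The Python sketch below is hypothetical: the `Rights` record, its field names, and the `may_reuse` function are assumptions made for illustration and are not taken from any of the standards named in the session.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Rights:
    """Hypothetical rights record; the field names are illustrative only."""
    regions: set = field(default_factory=set)     # where use is permitted
    formats: set = field(default_factory=set)     # e.g. {"web", "print"}
    languages: set = field(default_factory=set)   # permitted languages
    valid_from: date = date.min                   # start of the licence period
    valid_until: date = date.max                  # end of the licence period
    may_alter: bool = False                       # alteration statement


def may_reuse(rights, region, fmt, language, on, altered=False):
    """Return True only if every recorded restriction allows this use."""
    return (
        region in rights.regions
        and fmt in rights.formats
        and language in rights.languages
        and rights.valid_from <= on <= rights.valid_until
        and (rights.may_alter or not altered)
    )


photo = Rights(regions={"US"}, formats={"web", "print"}, languages={"en"},
               valid_until=date(2001, 12, 31))
print(may_reuse(photo, "US", "web", "en", date(2000, 6, 1)))    # True
print(may_reuse(photo, "US", "radio", "en", date(2000, 6, 1)))  # False
```

A check of this kind only works across publishers if everyone agrees on the field names and values, which is the standardization problem Scroggs described.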

Scroggs asks, How do we avoid reinventing the wheel? Competing standards are not the answer, but PRISM (Publishing Requirements for Industry Standard Metadata) is an answer. PRISM is a working group of collaborating vendors and publishers. The question was asked above: Why do we care? "In the end it is always about the money" (Linda Burman, PRISM WG Chair).

The second part of the program, presented by John Jansson, Editor, Information Systems, Chicago Tribune, and John R. Yokley, President, PTFS (Progressive Technology Federal Systems), outlined a three-year project to digitize:

15 million news clippings dating from the early 1900s to 1985;

the images and text of all front-page stories from 1849 to the date when digitizing became a daily event; and

all obituaries, paid death notices, and other stories about deaths from 1849 until recent years when digital records began being kept.

The project purpose is to generate new revenue sources, preserve clippings, eliminate clipping loss, and facilitate simultaneous access. It is planned to take three years to complete and will generate a database of 1.5 terabytes consisting of:

56,000 front pages from 1849 to the present;

15 million clippings, 1900-1985, and

obituaries and death notices (3.84 billion characters), 1849 to 1985.

The immensity of this project is overshadowed, in my mind, by the technical challenges: the need for a high degree of automation in order to reduce cost, rekeying of substantial amounts of text, quality control, material handling, and key field indexing (meta data). Yokley described the processes developed and used for this project.

The actual work is being done in the Ohio prison system and in India. Yokley stressed that quality control is imperative at all stages. His company, PTFS, has developed the PTFS Custom 5 Pass Voting OCR, which he described. PTFS also developed the ArchivalWare Editor for further enhancing and manipulating the collection. Fuzzy searching is used for the OCR items, and clean full text is available, with a natural language search engine, for the re-keyed articles.
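The report does not record how the PTFS voting OCR works internally, so the following Python sketch shows only the general idea of voting across several OCR passes, under strong assumptions: the readings are already aligned and of equal length, and a simple per-character majority decides. The `vote` function and the sample readings are hypothetical and are not PTFS's method.

```python
from collections import Counter


def vote(readings):
    """Majority vote per character position over equal-length OCR readings.

    Returns the voted text and the positions where no character won an
    outright majority, which could be routed to a person for review.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("this sketch assumes pre-aligned, equal-length readings")
    voted, doubtful = [], []
    for pos, chars in enumerate(zip(*readings)):
        char, count = Counter(chars).most_common(1)[0]
        voted.append(char)
        if count * 2 <= len(readings):  # no strict majority at this position
            doubtful.append(pos)
    return "".join(voted), doubtful


# Five hypothetical passes over the same line, each with different errors.
passes = ["Chicago Tribune", "Chicago Tribune", "Chlcago Tribune",
          "Chicago Trihune", "Chicaqo Tribune"]
text, doubtful = vote(passes)
print(text, doubtful)  # Chicago Tribune []
```

Because different passes tend to make different mistakes, the vote can recover a clean line even when no single pass is error-free, and the list of doubtful positions gives quality control a place to start.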

The preparation for this task included lists of Chicago-centric words, slang and curse words, and hyphenated words, all to aid in spell-checking and in eliminating unwanted text. The list of hyphenated words was important because of the need to distinguish hyphens introduced when a word was broken across lines in print from words that are genuinely hyphenated.
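One way such a list of hyphenated words might be used is sketched below in Python. This is an assumption about the approach, not a description of the project's software: the `HYPHENATED_WORDS` sample and the `join_line_breaks` function are hypothetical. A hyphen at the end of a line is kept only when the rejoined word appears in the list; otherwise the two halves are closed up.

```python
import re

# Hypothetical sample; the project's real list is not reproduced in the report.
HYPHENATED_WORDS = {"front-page", "re-keyed", "full-text"}

# A hyphen at the end of one line, followed by the start of the next line.
LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\s*\n\s*(\w+)")


def join_line_breaks(text, keep=HYPHENATED_WORDS):
    """Rejoin words split across lines, keeping only genuine hyphens."""
    def fix(match):
        left, right = match.group(1), match.group(2)
        candidate = f"{left}-{right}"
        # Keep the hyphen only when the joined form is a known hyphenated word.
        return candidate if candidate.lower() in keep else left + right
    return LINE_BREAK_HYPHEN.sub(fix, text)


sample = "The front-\npage story was digi-\ntized last year."
print(join_line_breaks(sample))
# The front-page story was digitized last year.
```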

These collections will be accessible for internal distribution at the Chicago Tribune, as well as being marketed by NewsBank. This session, like all those I attended this year, ran up to the hour and allowed little time for questions. This session was an excellent "door opener" for the remainder of the meeting, where equally provocative and challenging issues were discussed.

Ethics in an Online World: Legal and Privacy Issues in Journalism and News Research

Jerry Bornstein is a librarian and professor at Baruch College. Until 1994, however, he worked for the National Broadcasting Company as a researcher. He began his presentation (available at test/sla) with an anecdote about being invited to do the research for a program based on the book The Inevitability of Patriarchy by Steven Goldberg. Although the program never came to fruition, NBC's ombudsman raised critical issues about the bogus nature of the book, and the fact that this information should have been brought to the attention of news, since news had paid for the research. This incident became a perfect example of the ethics involved in claims to professionalism and led into an examination of the various ethical codes that are available to the practicing researcher. Can the researcher lay claim to being a professional when professionalism is defined as engaging in continuing education, the existence of and participation in professional societies, scholarly research, and the existence of professional schools at the university level with ethical codes of conduct?

Library ethics are embodied in the code of the American Library Association. They speak to quality of service, opposition to censorship, protection of privacy, confidentiality, respect for intellectual property rights, avoidance of conflict of interest, and respect for professional development. The Special Libraries Association has no code of ethics.

A survey of 68 Special Libraries Association News Division librarians addressed various ethical issues. The broad categories included quality of service, access to information, conflict of interest, and confidentiality. Several ethical "no-nos" arose. These included delivering poor-quality research, obtaining information under false pretenses, copyright infringement, and passing along questionable data. Among the interesting results:

69 per cent responded that there could be conflict between professional loyalty and commitment to the client;

74.9 per cent favored a code of ethics for the Special Libraries Association;

52.9 per cent believed databases with personal information posed ethical issues; and

79.2 per cent observed research quality problems.

Bornstein concluded that his research in some cases produced "cognitive dissonance", since 14.7 per cent of those polled said they had never encountered ethical dilemmas, a claim contradicted in a number of cases by their earlier responses.

I was completely unaware that we in the Special Libraries Association operated without a code of ethics, since both the American Library Association and the American Association of Law Libraries function within written codes, copies of which were distributed by Bornstein. As more and more information is made available to the modern-day researcher, the issues raised in this research and presentation will surface more and more frequently.

Fred Mann, from Philly Online, opened his presentation, "Online Ethics and How They Can Pay off for You!," by pointing out that online ethics are rarely discussed and present more questions than answers. Do we care about online ethics in news and information sites? Do the rules apply to new media? Do the underlying principles of journalism get abandoned?

No. The core values of journalism say that the Web sites are credible, that they are believable, and that this makes people trust us. This is true not just for traditional news sites, but also for the top brands. Again, the same thread runs through all the discussion of online: money is the driver. It is not possible to replicate traditional news products, and in the last few years virtually none of the online issues have been resolved, although some are less important now. An example is linking: Who is responsible for offensive sites? Copyright, legalisms? Hands-off bulletin board policy? Unlike the traditional news format, the news hole is endless. Is there fairness and balance? Who decides which supplemental material to add? Who decides content generally? Three big issues exist: speed, accuracy, and fact checking. Is 80 per cent accuracy good enough? No. Other major issues include corrections, privacy (sending cookies and other means of information gathering on users), community publishing, business pressures (news versus advertisements, or can a profit goal support journalistic values and ethics), full disclosure of source of materials online, and general editorial versus advertorial pressures. Mann showed a brief video spoofing sports telecasting and the incessant endorsement of everything. It was funny and a sobering reminder of the influence that endorsement money can wield.

Both Bornstein and Mann offered clarity on an area that is discussed but for which there is little consensus about direction. Many of the issues involved are common to traditional print media sources also; however, the new media environment and the ephemeral nature of its content make resolving these issues even more pressing. These topics will be revisited, I am sure, at future meetings.

Hot Technologies: eFuture Concepts and Products

Tom Fleming, from Piper, Marbury, Rudnick & Wolfe LLP, opened this sponsored session with his shopping list of new products and concepts. The first item was a summary of Tom Peters' article from Time (May 22, 2000) on the changing nature of work. An audience member took issue with his quote of MIT's Michael Dertouzos that the USA estimates losing 50 million jobs to India. The audience member pointed out that:

there is an acute labor shortage in the computer industry in the USA, prompting many Indians to come here to work; and

there is also a large computer industry in India.

However, we are not losing jobs to India, she pointed out. The larger point made by Fleming is that the Internet has become a vehicle for e-commerce that saves procurement waste via business-to-business ventures and that time is compressing in terms of rate of change in how people conduct their affairs, e.g. 304 million people have Internet access.

Technology Review (May/June, 2000) questions whether the entire computer phenomenon will max out. There is also the issue of personal and professional life blurring, with the widespread use of cell phones as an example. Other phenomena include ASPs, or Application Service Providers, which deliver software and data over the Internet or leased lines, e.g. for litigation support.

Many changes are happening to the Internet itself, including the development of Internet2, and Adeline Net. Business to business (B2B), business to consumer (B2C), and especially consumer to consumer (C2C; eBay, email, and instant messaging), are becoming ubiquitous in our daily lives. Projects such as the Stanford Poynter Project are looking at how people view online news and who is actually reading it.

New technologies are allowing voice-activated access to services and mobile computing is getting easier with new types of equipment designed for portability and mobility. These include Internet payment services, billing through your ISP, customized portals, PDAs and PocketPCs with remote access, rather than wired. Wireless communication is the next big step and breakthrough in personal and business computing, as it becomes more accessible and available.

Above all, for us, are knowledge management and eLearning. Many new online learning/training programs are being instituted, such as IBM's Mindspan Solutions division to assist companies in setting up online training programs. But on the money side, it is eCommerce that is driving much of what is new, producing side products like mCommerce (mobile commerce). The Federal Communications Commission is requiring location of cell phones within 400 feet by the end of 2001, as well as televend products and instant advertising. And this brings us to new language being coined: cyberanthropology, the effects of cyber interaction.

Under Fleming's heading of intriguing are the following:

Are solar storms the same as y2k?

What will the breakup of Microsoft mean?

Can Linux replace Windows on the desktop PC?

Is there a life without Intel?

Web sites for the blind, that have smell, touch, feeling?

Fleming's "shopping list" was followed by a description of how to establish the virtual research environment. This summarized much of what has been happening for a number of years now, with some new technologies to update the mix. I continue to be interested in the possibilities Nathan Rosen described for the e-book, with its capacity to load a large quantity of information tailored to the individual user, giving the potential for the tailored mini-library. Unlike the PDA, the e-book has a screen of decent size; it is inexpensive, and you can make notes. Multiple copies can be made available inexpensively.

Rosen, from Credit Suisse First Boston Corporation, continued on to describe his experiences "wiring" his workplace. I liked his quote from Ms Frizzle in the children's book, The Magic School Bus: "Take chances. Make mistakes. Get messy." His experiences included telecommuting, which is slow via the Web, but possible and faster with other types of access. He also suggested the creation of simple intranets, secure spaces, bulletin boards, a favorite bookmark site, and e-mail for reference responses. Transparent telecommuting is a real plus for him. He also suggests turning to vendors such as Westlaw or Lexis for customized search pages, and he cautioned that we not forget that we are improving service, not inventing widgets.

One audience member pointed out that the Net as we know it all came about under Bill Clinton's presidential administration. A new administration could mean unknown change. I found this closing thought as intriguing as some of Tom Fleming's items.

Sammy Alzofon is Library Director, The Palm Beach Post, West Palm Beach, Florida.