Obsolescence: File Formats and Software

Introduction

Although some file format specifications are largely independent of specific software (for example, encoding schemes such as ASCII and Unicode), most are tied to individual or related groups of software. The software and its related file format specification usually evolve together and have fates that are tightly bound. Therefore it makes sense to discuss software obsolescence and file format obsolescence together.

What's in a File Format Specification?
What Factors Contribute to File Format Obsolescence?
Why are File Formats a Challenge to Digital Preservation?

Most software is upgraded on a regular basis. Although most applications can read files created with the previous version and perhaps the one before that, the ability to read older versions is often dropped. Files that have not been migrated may not be readable by the latest version of the software, and the older version of the software may no longer be available, or may not run on a current computer or under a current version of the operating system. Also, due to the complexity and dynamic nature of many file formats, it can be extremely difficult to determine whether a file moved from one format to another (or to a newer version of the same format) has retained all of its characteristics and functionality.

Are Some File Formats Less Vulnerable to Obsolescence than Others?

Since all software is subject to obsolescence, all file formats used by that software are also vulnerable. On the surface, it may seem that the files used by software that is more stable (i.e., not undergoing a lot of change) would be less subject to obsolescence, and that is true in the short term. But software that stands still inevitably also becomes obsolete, because it fails to adapt to the changing computing environment (e.g., CPU architectures, operating systems, encoding schemes, data transfer protocols) that it must operate in. So users must be watchful of file formats that either evolve rapidly or stagnate, since both are prone to obsolescence.

To decode an old file format, the format specification must be available. Therefore, the degree of control the creator of a format specification exerts over its publication has a significant impact on the format's vulnerability to obsolescence. Specifications tend to fall into one of three categories.

Proprietary and closed specifications represent some of the most enduring and successful software in use.
However, these also tend to evolve quickly and exist in many different versions for different platforms, with only limited backward compatibility provided. In fact, there is substantial commercial incentive to avoid good backward compatibility, since
the need to share files ultimately forces all users, including those
who'd prefer to keep using older versions, to upgrade to newer versions.
Commercial vendors must regularly release new versions of their software
with added features and functionality in order to entice users to
upgrade and provide a continued revenue stream.
Unfortunately, experience has shown that even very old specifications for versions of commercial file formats long ago pulled from the marketplace may never be released. Also, as one might expect, proprietary and closed file formats are interpreted with the highest accuracy by the manufacturer's own software. Therefore, such formats are the most vulnerable to obsolescence, since they face the dual risk of rapid specification change and being tied to a single product or company. Furthermore, today's wildly successful software can be tomorrow's also-ran or distant memory. There has been tremendous consolidation in the commercial software industry, and many products have disappeared following mergers and acquisitions. Others have succumbed to competition from superior or more cleverly marketed products.

Some proprietary formats carry a reduced risk because the specification has been publicly released, allowing other companies (and non-commercial entities) to produce software that can read them. However, commercial entities can and sometimes do change their minds about leaving specifications open. For example, the DjVu image format was an open specification for a while before its owner decided to make changes and not release them to the public.
Most proprietary but open specifications are still vulnerable to the whims of market forces. In addition to being subject to arbitrary withdrawal, they can be abandoned for commercial reasons.

In terms of guaranteed long-term availability, published specifications
produced by international standards bodies are the safest. Generally,
representatives from many different constituencies are involved in
creating the standard, helping to ensure that it balances the needs
of a wide variety of users and that it isn't beholden to any particular
commercial interest. Broad participation also helps provide incentive
for wide support once the standard is completed. Backward compatibility
with older, related standards is usually a priority and there are
no commercial pressures for rapid obsolescence. On the other hand, not every standard format should be assumed to be the best choice. A standard becomes less vulnerable to obsolescence only when it is widely adopted by both user and developer communities, and that doesn't always happen.
Choosing File Formats for Reduced Vulnerability to Obsolescence
Not all formats, especially those that are obsolete, can be migrated to newer, less risky formats without some loss of fidelity. If the original software is unavailable, it may be impossible to determine the degree of loss.

Resources for assessing the potential for migration are starting to appear. The PRONOM database can be helpful in determining whether a migration path exists for an old file format using a newer version or a specialized conversion tool. However, it does not yet provide much detail about invariance, i.e., the degree to which the migrated file will reproduce the appearance and functionality of the original. “Risk Management of Digital Information: A File Format Investigation” by Lawrence et al. is a study of the impact of migration on file integrity and can provide some guidance in assessing the migration process. The INFORM Methodology is an approach for measuring the preservation durability of digital formats.

Only by careful examination of what went in vs. what came out can an assessment of risk and loss be made. This proactive and aware form of risk management is likely to be safer than a passive “see what happens” approach that could lead to catastrophic loss.

In the absence of a software migration path, it may be possible to retrieve an old file using emulation, provided the original software is available but no longer runs on modern hardware. Emulators run on modern hardware but mimic an obsolete software environment, allowing old software to run. The file can at least be viewed, and may be converted to an interchange format from which it can be migrated forward.
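
The examination of what went in vs. what came out of a migration, described above, can be partly automated. The following is a minimal sketch in Python, assuming that the characteristics of each file (page count, fonts, and so on) have already been extracted by some other tool into simple dictionaries. The function name `compare_characteristics` and the property names are hypothetical, chosen only for illustration; they do not come from PRONOM, the INFORM Methodology, or the Lawrence et al. study.

```python
def compare_characteristics(original, migrated):
    """Compare two dicts of file characteristics (hypothetical property names).

    Returns a report listing properties that were lost, added, or changed
    between the original file and its migrated copy.
    """
    # Properties present in the original but missing after migration.
    lost = sorted(k for k in original if k not in migrated)
    # Properties that appear only in the migrated file.
    added = sorted(k for k in migrated if k not in original)
    # Properties present in both files whose values differ.
    changed = {
        k: (original[k], migrated[k])
        for k in original
        if k in migrated and original[k] != migrated[k]
    }
    return {"lost": lost, "added": added, "changed": changed}


# Illustrative values only; real values would come from inspecting the files.
original = {"pages": 12, "fonts": ["Times", "Courier"], "macros": 3, "title": "Report"}
migrated = {"pages": 12, "fonts": ["Times"], "title": "Report"}

report = compare_characteristics(original, migrated)
for category, details in report.items():
    print(category, details)
```

A report like this only covers the characteristics that were measured in the first place, so it can show that a loss occurred but cannot prove that none did.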