Project to digitise 1960s moon images turns to OCR
More than 92,000 images of the moon’s surface taken in the 1960s by Nasa’s Surveyor missions have been digitised in a project conducted by the University of Arizona's Lunar and Planetary Laboratory (LPL).
The film images and data from the Surveyor moon landers have been in storage for the last 50 years; there were five successful Surveyor missions between 1966 and 1968. Now the images have been scanned and digitised in an automated system using optical character recognition (OCR) software.
The goal was to create an archive for inclusion in the Nasa Planetary Data System (PDS), a collection of data products from Nasa planetary missions. The scientists wanted to produce a searchable archive that would outlast conventional physical media repositories.
The images were originally captured by focusing a 70mm film camera on a precision CRT display monitor and photographing the display onto special recording film.
Typical film image from Surveyor mission, with a CRT display (left) and associated data fields (right). Credit: University of Arizona’s LPL
The digitisation project began in February 2015 with the assembly of a Stokes scanning system, and it continues to process, catalogue, and data-mine the information contained within the images. Many frames contain legible text, but it is printed as a dot matrix using a 7 x 9 teletype-style character, which made it a challenge to find OCR software capable of reading the text fields.
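To give a feel for what reading a 7 x 9 dot-matrix character involves, the sketch below classifies one character cell by comparing it with stored templates and picking the closest match. It is only an illustration: the three glyph shapes and the names `TEMPLATES`, `hamming`, and `classify` are invented for this example, and nothing here is taken from the LPL scans or from the software the team eventually used.

```python
# Hypothetical 7-wide by 9-tall glyph templates ('#' = dot, '.' = blank).
# These shapes are made up for illustration, not taken from the Surveyor film.
TEMPLATES = {
    "I": ["#######"] + ["...#..."] * 7 + ["#######"],
    "L": ["#......"] * 8 + ["#######"],
    "T": ["#######"] + ["...#..."] * 8,
}


def hamming(cell, template):
    """Count the dot positions at which two 7 x 9 grids differ."""
    return sum(
        a != b
        for row_a, row_b in zip(cell, template)
        for a, b in zip(row_a, row_b)
    )


def classify(cell, templates=TEMPLATES):
    """Return the template name with the fewest differing dots."""
    return min(templates, key=lambda name: hamming(cell, templates[name]))


# A 'T' with one dot missing from its stem, as a faded dot might appear on film.
noisy_t = list(TEMPLATES["T"])
noisy_t[4] = "......."
best_match = classify(noisy_t)
```

Because the match is chosen by smallest distance rather than exact equality, a single faded or missing dot does not prevent the character from being recognised. Real dot-matrix OCR also has to find and segment the characters first, which this sketch does not attempt.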
In addition, even though there are sprocket perforations on the film stock, the original recording transport was sprocket-less, resulting in inconsistent frame spacing as well as frames drifting with respect to the edge perforations. The team at LPL were unable to determine a consistent film advance and, with each new roll of film, the spacing of the frames and the lateral positioning of the image shifted. As a result, text appeared in different places from image to image, and some images were tainted with artefacts. Moreover, the data fields contain human-readable text with a varying number of characters.
The team at the University of Arizona's LPL used an OCR solution from Matrox Imaging, which was able to read the dot matrix characters and cut the processing time to a few minutes per roll.
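One general way to cope with uneven frame spacing is to locate each frame from the scan itself rather than from an assumed film advance. The sketch below does this on a one-dimensional brightness profile along the film, treating any sufficiently long bright run as a frame. The function name `find_frames`, the threshold, and the synthetic profile are assumptions made for illustration; the article does not describe how the LPL system actually located frames.

```python
def find_frames(profile, threshold, min_length=3):
    """Return (start, end) index pairs for runs where profile exceeds threshold.

    `end` is exclusive. Runs shorter than `min_length` are treated as
    artefacts and dropped.
    """
    frames = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # a bright run begins
        elif value <= threshold and start is not None:
            if i - start >= min_length:
                frames.append((start, i))
            start = None  # the run has ended
    # Close a run that reaches the end of the profile.
    if start is not None and len(profile) - start >= min_length:
        frames.append((start, len(profile)))
    return frames


# Synthetic profile: three frames of equal size separated by unequal gaps.
profile = [0] * 4 + [9] * 6 + [0] * 2 + [9] * 6 + [0] * 7 + [9] * 6 + [0] * 3
frames = find_frames(profile, threshold=5)
gaps = [nxt[0] - cur[1] for cur, nxt in zip(frames, frames[1:])]
```

Here the gaps between detected frames come out unequal, which is the situation described above: each frame has to be found individually, and the text fields are then located relative to the detected frame rather than relative to the film perforations.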
Lorne Trottier, co-owner of Matrox, saw an article in Planetary Report about the Nasa PDS project and offered the LPL scientists assistance in the form of Matrox’s OCR software to read LPL’s text information.
The initial review of the Matrox OCR solution showed an excellent read rate from nearly 4,500 different image files. For example, for roll 1 of Mission 5, the Matrox OCR solution scanned 846 files, reading 15,191 individual fields at 99.77 per cent accuracy. Rolls 2 and 9 of Mission 5 yielded 99.92 per cent and 100 per cent accuracy rates respectively.
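To put the roll 1 figures in concrete terms, the short calculation below works out how many fields the quoted accuracy implies were misread, and how many fields were read per file on average. The function name `implied_misreads` is an invention for this example, and the results are derived from the rounded percentages quoted above, so they are approximate.

```python
def implied_misreads(total_fields, accuracy_percent):
    """Estimate the number of misread fields from a quoted accuracy figure."""
    return round(total_fields * (1 - accuracy_percent / 100))


files = 846             # files scanned for roll 1 of Mission 5
fields = 15_191         # individual fields read from those files
accuracy = 99.77        # quoted accuracy, per cent

misreads = implied_misreads(fields, accuracy)   # about 35 fields
fields_per_file = fields / files                # roughly 18 fields per file
recomputed = round(100 * (1 - misreads / fields), 2)
```

On these numbers, roughly 35 of the 15,191 fields on the roll would need manual correction, or about one field in every 24 files.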
To date, the Matrox software has helped tackle data from Surveyor 5, and will prove a valuable tool during the cataloguing and error checking of data from Surveyor 6 and 7, along with other mission materials from Nasa projects and explorations.
John Anderson, senior media technician at LPL, commented: ‘Compared with accuracy rates of 75 per cent to 85 per cent achieved with the original approach, there is no doubt as to the better result. Our project has been greatly enhanced and the progress of reading and cataloguing the data with high accuracy would not have been possible without the gracious assistance of the Matrox team.’