Friday, February 11, 2011

Signature files for document retrieval

Hi all, I was wondering if you know where I can find information on how to build a signature file for document retrieval.
Do you know if there is some code out there that I can use or look at?
I have to create a signature file in C++ on a Linux platform.

UPDATE: Sorry, I appreciate the help, but I was referring to signature files not as a way to validate documents but as a way of indexing documents.


http://en.wikipedia.org/wiki/Signature_files


Any help will be greatly appreciated.

Thanks,

  • md5sum might be what you are looking for. Source code for generating md5 signatures is available if you Google around.

    From Wikipedia:

    Because almost any change to a file will cause its MD5 hash to also change, the MD5 hash is commonly used to verify the integrity of files (i.e., to verify that a file has not changed as a result of file transfer, disk error, meddling, etc.). The md5sum program is installed by default in most Unix, Linux, and Unix-like operating systems or compatibility layers. BSD variants (including Mac OS X) have a similar utility called md5. Versions for Microsoft Windows do exist.

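  • Similarly to Adam's suggestion, if you're working with a very large number of documents, it might be a good idea to check out SHA1 and sha1sum. SHA-1 is a hash function like MD5, not encryption, but it produces a longer digest (160 bits versus MD5's 128), so accidental collisions are less likely.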

  • Firstly, let's clarify some terminology.

    A Digital Signature is intended to be equivalent to a handwritten signature (see http://en.wikipedia.org/wiki/Digital_signature for a better description and overview).

    When a digital signature is applied to a document you get a higher level of assurance of the authenticity of the document (you have a better idea if the document was forged or not).

    The answers from Adam and Robert both refer to methods for verifying document integrity (that the document is unchanged). While a digital signature also provides this, a checksum (hash) does not provide authenticity.

    So it's important that we establish the needs of your "Signature file". I will assume that you are talking about Digital Signatures rather than checksums, since the other answers already address checksums.

    You will want to compose a PKCS#7 detached signature (jargon: a standard-format signature that does not contain the data, so it can be stored separately). To achieve this I recommend you use a standard library such as OpenSSL (which is portable).

  • You might look at Semantic Hacker or Yahoo Term Extraction.
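For the indexing sense of "signature file" that the UPDATE describes, here is a minimal C++ sketch of one common approach, superimposed coding: each word is hashed to a few bit positions, the word signatures are OR-ed into one fixed-width document signature, and a query first tests the signatures and then checks the surviving documents against their text to weed out false matches. Everything in it is an assumption made for illustration, not code from any of the answers: the 256-bit width, the 3 bits per word, the whitespace tokenizer, the use of std::hash, and the hypothetical names wordSignature, documentSignature, and search.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

constexpr std::size_t kSignatureBits = 256;  // assumed signature width
constexpr int kBitsPerWord = 3;              // assumed number of bits set per word

using Signature = std::bitset<kSignatureBits>;

// Hash one word to up to kBitsPerWord bit positions (positions may coincide).
Signature wordSignature(const std::string& word) {
    Signature sig;
    for (int i = 0; i < kBitsPerWord; ++i) {
        // Salt the word with the index to derive several hash values from std::hash.
        std::size_t h = std::hash<std::string>{}(word + '#' + std::to_string(i));
        sig.set(h % kSignatureBits);
    }
    return sig;
}

// Superimpose (OR together) the signatures of all whitespace-separated words.
Signature documentSignature(const std::string& text) {
    Signature sig;
    std::istringstream in(text);
    std::string word;
    while (in >> word) sig |= wordSignature(word);
    return sig;
}

// Exact check against the text, used to discard false matches.
bool containsWord(const std::string& text, const std::string& word) {
    std::istringstream in(text);
    std::string w;
    while (in >> w) {
        if (w == word) return true;
    }
    return false;
}

// Return the indices of documents that contain the query word.
std::vector<std::size_t> search(const std::vector<std::string>& docs,
                                const std::vector<Signature>& sigs,
                                const std::string& word) {
    const Signature query = wordSignature(word);
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        // If any query bit is missing, the word is definitely not in the document.
        if ((sigs[i] & query) != query) continue;
        // All bits present: possible match, confirm against the actual text.
        if (containsWord(docs[i], word)) hits.push_back(i);
    }
    return hits;
}
```

To use it, call documentSignature once per document and keep the results in a vector in the same order as the documents; that vector is the signature file, and it could be written to disk as fixed-size records. A wider signature or fewer bits per word lowers the false-match rate at the cost of space or selectivity, so both constants would need tuning for a real collection.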
