Hachoir project
Hachoir is a Python library used to represent a binary file as a tree of Python objects. Each object has a type, a value, an address, etc. The goal is to be able to know the meaning of each bit in a file.
Why use slow Python code instead of fast hardcoded C code? Hachoir has many interesting features:
* Autofix: Hachoir is able to open invalid or truncated files.
* Lazy: opening a file is very fast since no information is read from the file up front; data are read and/or computed only when the user asks for them.
* Types: Hachoir has many predefined field types (integer, bit, string, etc.) and supports strings with a charset (ISO-8859-1, UTF-8, UTF-16, ...).
* Addresses and sizes are stored in bits, so flags are stored as classic fields.
* Endian: you set the endianness once, and numbers are then converted with the right endianness.
* Editor: using Hachoir's representation of the data, you can edit, insert or remove data and then save the result in a new file.
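The lazy, bit-addressed field model described above can be illustrated with a small standard-library sketch. The Field and FieldSet classes are hypothetical and are not Hachoir's API; they only show the idea: every field records a bit address and a bit size, endianness is chosen once for the whole set, and a value is decoded only the first time it is read.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass
class Field:
    """Hypothetical field: a name plus a bit address and bit size in a byte stream."""
    data: bytes
    name: str
    address: int  # offset from the start of the stream, in bits
    size: int  # length of the field, in bits
    endian: str = "big"  # "big" or "little"; only used for byte-aligned fields

    @cached_property
    def value(self) -> int:
        # Decoded on first access only, then cached (lazy evaluation).
        if self.address % 8 == 0 and self.size % 8 == 0:
            start = self.address // 8
            chunk = self.data[start:start + self.size // 8]
            return int.from_bytes(chunk, self.endian)
        # Sub-byte field: view the stream as one big-endian integer and
        # extract the bits MSB-first (a simplification for this sketch).
        total_bits = len(self.data) * 8
        whole = int.from_bytes(self.data, "big")
        shift = total_bits - self.address - self.size
        return (whole >> shift) & ((1 << self.size) - 1)


class FieldSet:
    """Lays out fields back to back; endianness is chosen once for the whole set."""

    def __init__(self, data: bytes, layout: list[tuple[str, int]], endian: str = "big"):
        self.fields: dict[str, Field] = {}
        address = 0
        for name, size in layout:
            # Creating a Field reads nothing; it only records where the bits are.
            self.fields[name] = Field(data, name, address, size, endian)
            address += size

    def __getitem__(self, name: str) -> int:
        return self.fields[name].value


header = FieldSet(
    bytes([0b1010_0000, 0x01, 0x00]),
    [("flag", 1), ("kind", 3), ("reserved", 4), ("count", 16)],
    endian="little",
)
print(header["flag"], header["kind"], header["count"])  # 1 2 1
```

Because the 1-bit flag is just another field with an address and a size, it needs no special handling; switching endian to "big" would make count decode as 256 instead of 1.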
hachoir-metadata extracts metadata from multimedia files (music, pictures, video) and also from archives. It supports the most common file formats:
* Archives: bzip2, gzip, zip, tar
* Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
* Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
* Misc: Torrent
* Program: EXE
* Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)
It tries to give as much information as possible. For some file formats it gives more information than libextractor; for example, the RIFF parser can extract the creation date, the software used to generate the file, etc. But hachoir-metadata cannot guess information: the most complex operation it performs is computing the duration of a music file from the frame size and the file size.
hachoir-metadata has three modes:
* classic mode: extract metadata; you can use --level=LEVEL to limit the quantity of information to display (and not to extract)
* --type: show the file format and the most important information on one line
* --mime: just display the file's MIME type
The command 'hachoir-metadata --mime' works like 'file --mime', and 'hachoir-metadata --type' like 'file'. However, the file command currently supports more file formats than hachoir-metadata.
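The duration computation mentioned above can be sketched as follows, assuming a constant-bitrate MPEG-1 Layer III stream (1152 samples per frame, frame length = 144 * bitrate / sample_rate bytes, plus one byte when the padding bit is set) and ignoring ID3 tags. The function names are hypothetical; this is not hachoir-metadata's own code.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def mp3_frame_length(bitrate_bps: int, sample_rate: int, padding: bool = False) -> int:
    """Length in bytes of one MPEG-1 Layer III frame."""
    return 144 * bitrate_bps // sample_rate + (1 if padding else 0)


def estimate_cbr_duration(audio_bytes: int, bitrate_bps: int, sample_rate: int) -> float:
    """Estimate the duration in seconds from the file size and the frame size.

    Assumes a constant bitrate, no padding bytes and no ID3 tags in audio_bytes.
    """
    frame_len = mp3_frame_length(bitrate_bps, sample_rate)
    n_frames = audio_bytes / frame_len
    return n_frames * SAMPLES_PER_FRAME / sample_rate


print(mp3_frame_length(128_000, 44_100))  # 417
print(estimate_cbr_duration(384_000, 128_000, 48_000))  # 24.0
```

At 48 kHz the frame length divides evenly (384 bytes), so the result matches the simpler estimate audio_bytes * 8 / bitrate; at 44.1 kHz the ignored padding bytes make the two estimates differ slightly.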
Desktop UI for the hachoir-metadata command-line tool.
hachoir-parser is a package of the most common file format parsers written for the Hachoir framework. Not all parsers are complete: some are very good and others are poor, for example parsing only the first level of the tree. A perfect parser has no "raw" field: with a perfect parser you are able to know the meaning of each bit. Some good (but not perfect ;-)) parsers:
* Matroska video
* Microsoft RIFF (AVI video, WAV audio, CDA file)
* PNG picture
* TAR and ZIP archive
What's new in hachoir-parser 1.2.1?
* Improve OLE2 and MS Office parsers:
  - support small blocks
  - fix the charset of the summary properties
  - summary property integers are unsigned
  - use TimedeltaWin64 for the TotalEditingTime field
  - create a minimal Word document parser
* Python parser: support magic numbers of Python 3000 with keyword-only arguments
* MPEG audio: reject files with neither a valid frame nor an ID3 header
* Skip subfiles in JPEG files
* Create Apple/NeXT Binary Property List (BPLIST) parser by Robert Xiao
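To make the "no raw field" idea concrete, here is a minimal sketch of a PNG chunk walker using only the standard library (it is not hachoir-parser's PNG parser, and the function names are hypothetical). It accounts for every byte after the 8-byte signature: each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC-32 computed over the type and data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_png_chunks(data: bytes) -> list[dict]:
    """Split a PNG byte string into chunks; tolerate truncation by stopping early."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = []
    offset = 8
    while offset + 8 <= len(data):
        # Chunk header: 4-byte big-endian length, then a 4-byte type code.
        length, ctype = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        crc_bytes = data[offset + 8 + length:offset + 12 + length]
        if len(body) < length or len(crc_bytes) < 4:
            break  # truncated chunk: keep what was parsed so far
        (crc,) = struct.unpack(">I", crc_bytes)
        chunks.append({
            "type": ctype.decode("latin-1"),
            "offset": offset,
            "length": length,
            "data": body,
            # The CRC-32 covers the type and data fields, not the length.
            "crc_ok": zlib.crc32(ctype + body) & 0xFFFFFFFF == crc,
        })
        offset += 12 + length
        if ctype == b"IEND":
            break
    return chunks


def png_dimensions(chunks: list[dict]) -> tuple[int, int]:
    """Read width and height from the IHDR chunk, which must come first."""
    if not chunks or chunks[0]["type"] != "IHDR":
        raise ValueError("missing IHDR chunk")
    width, height = struct.unpack(">II", chunks[0]["data"][:8])
    return width, height


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", zlib.crc32(ctype + body))


# Header-only 3x2 PNG (no IDAT, so not a decodable image): depth 8, colour type 2.
png = PNG_SIGNATURE + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 3, 2, 8, 2, 0, 0, 0)) + make_chunk(b"IEND", b"")
chunks = parse_png_chunks(png)
print([(c["type"], c["offset"], c["crc_ok"]) for c in chunks])  # [('IHDR', 8, True), ('IEND', 33, True)]
print(png_dimensions(chunks))  # (3, 2)
```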
A Python HTML parser/tokenizer based on the WHATWG HTML5 specification for maximum compatibility with major desktop web browsers.
This package contains several handy Python methods to clean up HTML markup or perform other common changes. The cleanup is strict enough to clean HTML pasted from MS Word or Apple Pages. This package also contains integration code for z3c.form to provide fields that automatically sanitize HTML on save. The implementation is based on the Cleaner class from lxml.
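The package builds on lxml's Cleaner; as a dependency-free illustration of the same allowlist idea, the sketch below swaps in the standard library's html.parser instead. The tag and attribute allowlists and the function names are hypothetical choices, not the package's defaults. Tags outside the allowlist (such as MS Word's o:p) are dropped while their text is kept, script and style content is removed entirely, and text and attribute values are re-escaped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for this sketch.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_CONTENT = {"script", "style"}  # remove these elements together with their content
VOID_TAGS = {"br"}  # never emit a closing tag for these


class AllowlistCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = [(k, v) for k, v in attrs
                if k in allowed and v is not None
                and not v.strip().lower().startswith("javascript:")]
        attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))
    # Comments (including MS Word conditional comments) are ignored by default.


def clean_html(markup: str) -> str:
    cleaner = AllowlistCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_html('<p class="MsoNormal">Hi<o:p></o:p></p>'))  # <p>Hi</p>
```

An allowlist rather than a blocklist is used so that unknown markup is removed by default.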
i18ndude performs various tasks related to ZPTs (Zope Page Templates), Python Scripts and i18n.
The iCalendar specification (RFC 2445) defines a calendaring format used by many applications (Zimbra, Thunderbird and others). This module is a parser/generator of iCalendar files for use with Python. It follows the RFC 2445 (iCalendar) specification; the aim is a package that is fully compliant with RFC 2445, well designed, simple to use and well documented.
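As an illustration of the RFC 2445 content-line rules such a generator has to follow (CRLF line endings, folding lines longer than 75 octets by inserting CRLF plus a single space, and backslash-escaping backslashes, semicolons, commas and newlines in TEXT values), here is a minimal standard-library sketch. It is not this module's API; the function names and the PRODID value are hypothetical.

```python
from datetime import datetime, timezone


def escape_text(value: str) -> str:
    """Escape a TEXT property value (backslash first, so escapes are not doubled)."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def fold(line: str, limit: int = 75) -> str:
    """Fold a content line so no physical line exceeds `limit` octets of UTF-8."""
    parts, current, size = [], "", 0
    for ch in line:
        n = len(ch.encode("utf-8"))
        # Continuation lines begin with a space, which counts toward the limit.
        budget = limit if not parts else limit - 1
        if size + n > budget:
            parts.append(current)
            current, size = "", 0
        current += ch
        size += n
    parts.append(current)
    return "\r\n ".join(parts)


def utc_stamp(moment: datetime) -> str:
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def make_calendar(uid: str, summary: str, start: datetime, end: datetime, stamp: datetime) -> str:
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//iCalendar sketch//EN",  # hypothetical product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{utc_stamp(stamp)}",
        f"DTSTART:{utc_stamp(start)}",
        f"DTEND:{utc_stamp(end)}",
        f"SUMMARY:{escape_text(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # Every content line ends with CRLF, including the last one.
    return "".join(fold(line) + "\r\n" for line in lines)


ics = make_calendar(
    "20240102-lunch@example.com",
    "Lunch; with Bob, Alice",
    datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 13, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(ics)
```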
API to access and modify XML files in the IMS Vocabulary Definition Exchange format.
The IMS Vocabulary Definition Exchange (VDEX) specification defines a grammar for the exchange of value lists of various classes: collections often denoted "vocabulary". Specifically, VDEX defines a grammar for the exchange of simple machine-readable lists of values, or terms, together with information that may aid a human being in understanding the meaning or applicability of the various terms. VDEX may be used, for example, to express valid data for use in instances of IEEE LOM, IMS Metadata, IMS Learner Information Package and ADL SCORM. In these cases, the terms are often not human language words or phrases but more abstract tokens. VDEX can also express strictly hierarchical schemes in a compact manner while allowing for looser networks of relationship to be expressed if required [CITVDEXSITE]_.
[CITVDEXSITE] Citation from IMS Global, the VDEX specification page.
This module takes the VDEX XML objects and offers an API to them. The VDEX Version 1 Final Specification is supported, except for VDEX references.
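A minimal sketch of reading a hierarchical VDEX vocabulary with the standard library's ElementTree. The namespace URI and the element names (vocabIdentifier, term, termIdentifier, caption/langstring) are assumed to follow the VDEX 1.0 XML binding and should be checked against the specification; the sample document is simplified, and the function names are hypothetical rather than this module's API.

```python
import xml.etree.ElementTree as ET

# Assumed VDEX 1.0 namespace; verify against the specification.
NS = {"v": "http://www.imsglobal.org/xsd/imsvdex_v1p0"}

SAMPLE = """
<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0">
  <vocabIdentifier>colours</vocabIdentifier>
  <term>
    <termIdentifier>warm</termIdentifier>
    <caption><langstring language="en">Warm colours</langstring></caption>
    <term>
      <termIdentifier>red</termIdentifier>
      <caption><langstring language="en">Red</langstring></caption>
    </term>
  </term>
</vdex>
"""


def caption_for(term, lang):
    """Return the caption text in the requested language, or None."""
    for langstring in term.findall("v:caption/v:langstring", NS):
        if langstring.get("language") == lang:
            return langstring.text
    return None


def parse_terms(parent, lang="en"):
    """Recursively map termIdentifier -> {caption, children} for nested terms."""
    return {
        term.findtext("v:termIdentifier", default="", namespaces=NS): {
            "caption": caption_for(term, lang),
            "children": parse_terms(term, lang),
        }
        for term in parent.findall("v:term", NS)  # direct child terms only
    }


def load_vocabulary(xml_text, lang="en"):
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.findtext("v:vocabIdentifier", default="", namespaces=NS),
        "terms": parse_terms(root, lang),
    }


vocab = load_vocabulary(SAMPLE)
print(vocab["terms"]["warm"]["children"]["red"]["caption"])  # Red
```

Nesting term elements is how this sketch models the strictly hierarchical case; looser relationship networks would need separate handling.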
Convenience uid/gid helper function used in Zope2.
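What such a uid/gid helper typically does can be sketched with the Unix-only pwd module: accept either a user name or a numeric uid and return the matching (uid, gid) pair. The resolve_ids name and its behaviour are assumptions for illustration, not the Zope2 function itself.

```python
import pwd


def resolve_ids(user):
    """Return (uid, gid) for a user name or a numeric uid (Unix only)."""
    if isinstance(user, int) or str(user).isdigit():
        entry = pwd.getpwuid(int(user))  # raises KeyError if the uid is unknown
    else:
        entry = pwd.getpwnam(user)  # raises KeyError if the name is unknown
    return entry.pw_uid, entry.pw_gid


print(resolve_ids("root"))  # typically (0, 0)
```

Names made only of digits are treated as numeric uids here, a simplifying assumption of this sketch.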