As undesirable as it might be, more often than not there is extremely useful information embedded in Word documents, PowerPoint presentations, PDFs, and similar files (so-called "dark data") that would be valuable for further textual analysis and visualization. While :ref:`several packages <supporting>` exist for extracting content from each of these formats on their own, this package provides a single interface for extracting content from any type of file, without any irrelevant markup.

textract currently supports a growing list of file types for text extraction. If you don't see your favorite file type here, please recommend it by either mentioning it on the issue tracker or by :ref:`contributing a pull request <contributing>`.

- .csv via python builtins
- .doc via antiword
- .docx via python-docx
- .eml via python builtins
- .epub via ebooklib
- .gif via tesseract-ocr
- .jpg and .jpeg via tesseract-ocr
- .json via python builtins
- .html and .htm via beautifulsoup4
- .mp3 via SpeechRecognition and sox
- .msg via msg-extractor
- .odt via python builtins
- .ogg via SpeechRecognition and sox
- .pdf via pdftotext (default) or pdfminer
- .png via tesseract-ocr
- .pptx via python-pptx
- .ps via ps2text
- .rtf via unrtf
- .tiff via tesseract-ocr
- .txt via python builtins
- .wav via SpeechRecognition
- .xlsx via xlrd
- .xls via xlrd
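
The "single interface" idea can be illustrated with a small dispatch table keyed on file extension. This is only a sketch under stated assumptions: it covers three formats using the standard library, and the names ``extract_text``, ``EXTRACTORS`` and the ``_extract_*`` helpers are hypothetical, not textract's own code.

```python
import csv
import json
import os


def _extract_txt(path):
    # Plain text needs no parsing at all.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def _extract_csv(path):
    # Join the cells of each row with spaces, one row per line.
    with open(path, newline="", encoding="utf-8") as handle:
        return "\n".join(" ".join(row) for row in csv.reader(handle))


def _flatten(value):
    # Walk nested JSON and yield only the string values.
    if isinstance(value, dict):
        for item in value.values():
            yield from _flatten(item)
    elif isinstance(value, list):
        for item in value:
            yield from _flatten(item)
    elif isinstance(value, str):
        yield value


def _extract_json(path):
    with open(path, encoding="utf-8") as handle:
        return "\n".join(_flatten(json.load(handle)))


# Hypothetical registry: one extractor per supported extension.
EXTRACTORS = {
    ".txt": _extract_txt,
    ".csv": _extract_csv,
    ".json": _extract_json,
}


def extract_text(path):
    """Pick an extractor from the file extension and return plain text."""
    extension = os.path.splitext(path)[1].lower()
    try:
        extractor = EXTRACTORS[extension]
    except KeyError:
        raise ValueError("unsupported file type: %r" % extension)
    return extractor(path)
```

Supporting a new format in a design like this means adding one function and one registry entry, so callers never change.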
Tornado is an open source, scalable, non-blocking web server and set of related tools. The framework is distinct from most mainstream web server frameworks (and certainly most Python frameworks) because it is non-blocking and reasonably fast. Because it is non-blocking and uses epoll, it can handle thousands of simultaneous standing connections, which makes it well suited to real-time web services.
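
To show why a non-blocking design copes with many standing connections, here is a minimal single-threaded event-loop sketch built on the standard library's ``selectors`` module (which uses epoll where available). It is not Tornado's API; ``open_connections`` and ``echo_ready`` are hypothetical helper names, and socket pairs stand in for real client connections.

```python
import selectors
import socket


def open_connections(selector, count):
    """Create `count` connected socket pairs and watch the server ends."""
    clients = []
    for _ in range(count):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        # Register interest in "readable" events only.
        selector.register(server_side, selectors.EVENT_READ)
        clients.append(client_side)
    return clients


def echo_ready(selector, timeout=1.0):
    """One event-loop pass: service only the connections with data waiting."""
    served = 0
    for key, _events in selector.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            # Small replies fit in the socket buffer; a fuller sketch
            # would also handle partial writes.
            conn.send(data.upper())
            served += 1
        else:
            # An empty read means the peer closed the connection.
            selector.unregister(conn)
            conn.close()
    return served
```

Idle connections cost no thread and no polling work here: the loop only touches sockets that the selector reports as ready.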
Tornado is an open source, scalable, non-blocking web server and set of related tools. This package contains some example applications.
This package contains a generic transaction implementation for Python, although it is mainly used by the ZODB.
A library used by various `Repoze <http://repoze.org>`_ packages for internationalization (i18n) duties related to translation. This package provides a *translation string* class, a *translation string factory* class, translation and pluralization primitives, and a utility that helps `Chameleon <http://chameleon.repoze.org>`_ templates use translation facilities of this package. It does not depend on `Babel <http://babel.edgewall.org>`_, but its translation and pluralization services are meant to work best when provided with an instance of the ``babel.support.Translations`` class.
This library provides a pure Python interface for the Twitter API. Twitter (http://twitter.com) provides a service that allows people to connect via the web, IM, and SMS. Twitter exposes a web services API (http://twitter.com/help/api), and this library is intended to make that API even easier for Python programmers to use.
Python port of Browserscope's user agent parser
This is a Python port of the Text::Unidecode Perl module. It provides a function, 'unidecode(...)', that takes Unicode data and tries to represent it in ASCII characters.
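
As a rough illustration of representing Unicode text in ASCII, the sketch below uses only the standard library's ``unicodedata`` module. It is a much cruder stand-in for the package's transliteration and is not its implementation; ``rough_ascii`` is a hypothetical name.

```python
import unicodedata


def rough_ascii(text):
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    # NFKD splits "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything that still is not ASCII (accent marks, other scripts) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Characters with no ASCII decomposition simply disappear in this sketch, which is the gap a transliteration table is meant to fill.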
unimr.memcachedlock implements a distributed "soft" locking mechanism using memcached. It provides factory functions and decorators for primitive locking, reentrant locking, and a special locking for zeo-clients. The native locking methods of Python's threading module support thread-safe locking and therefore provide full locking support only for single Zope installations. However, zeo-clients have no locking mechanism between each other for concurrent write operations on identical objects (e.g. Catalog) and are unnecessarily stressed to resolve ConflictErrors under heavy load. The reason for this problem is the optimistic concurrency control of the ZODB, which first prepares the changes to an object (in many cases expensive calculations) and only afterwards checks whether the object is still valid for the write. The higher the number of writes on the same object, the higher the risk that a concurrent zeo-client has already invalidated the object while another zeo-client still has it in use. The client with the invalidated object is forced to roll back its changes and to recalculate them based on the refreshed object. At worst, this state goes in circles and results in a ConflictError. Optimistic concurrency control is therefore a perfect fit only for concurrent write operations on distinct objects. Memcache locking overcomes this problem because it extends the regular concurrency mechanism with a lock shared between all involved zeo-clients, serializing the concurrent write operations before a ConflictError is provoked. This mechanism is also known as pessimistic concurrency control.
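
The pessimistic scheme described above can be sketched with memcached's atomic "add" operation, which stores a key only if it is not already present. Everything named here is an assumption for illustration: ``FakeCache`` is an in-process stand-in for a memcached client, and ``SoftLock`` is a hypothetical class, not the package's API.

```python
import threading
import time


class FakeCache:
    """In-process stand-in for a memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value):
        # Mirrors memcached "add": succeed only if the key is absent.
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


class SoftLock:
    """Lock shared by every client that talks to the same cache."""

    def __init__(self, cache, key, retry_delay=0.01, timeout=1.0):
        self.cache = cache
        self.key = key
        self.retry_delay = retry_delay
        self.timeout = timeout

    def acquire(self):
        # Retry the atomic add until it succeeds or the timeout passes.
        deadline = time.monotonic() + self.timeout
        while not self.cache.add(self.key, "locked"):
            if time.monotonic() >= deadline:
                return False
            time.sleep(self.retry_delay)
        return True

    def release(self):
        self.cache.delete(self.key)
```

The lock is "soft" in the sense that it lives in a cache: with a real memcached server an entry can expire or be evicted, so a sketch like this reduces conflicts rather than guaranteeing exclusion.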
unittest2 is a backport of the new features added to the unittest testing framework in Python 2.7 and onwards. It is tested to run on Python 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy.
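
A few of the features that arrived in the Python 2.7 version of unittest are shown in the sketch below, written against the standard library module. On an older interpreter the usual approach would presumably be to import the backport under the same name (an assumption here, not taken from the description above); ``FeatureDemo`` and ``run_demo`` are illustrative names.

```python
import unittest


class FeatureDemo(unittest.TestCase):
    def test_assert_raises_context_manager(self):
        # assertRaises can wrap a block instead of taking a callable.
        with self.assertRaises(ZeroDivisionError):
            1 / 0

    def test_membership_assertion(self):
        # assertIn reports both operands when the check fails.
        self.assertIn("b", "abc")

    @unittest.skip("demonstrates the skip decorator")
    def test_skipped(self):
        self.fail("never runs")


def run_demo():
    """Load and run the demo case, returning the TestResult object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FeatureDemo)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```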