The ZODB browser allows you to inspect persistent objects stored in the ZODB, view their attributes, and see the historical changes made to them.

Command-line options

Run bin/zodbbrowser --help to see a full and up-to-date list of command-line options:

    Usage: zodbbrowser [options] [FILENAME | --zeo ADDRESS]

    Open a ZODB database and start a web-based browser app.

    Options:
      -h, --help        show this help message and exit
      --zeo=ADDRESS     connect to ZEO server instead
      --listen=ADDRESS  specify port (or host:port) to listen on
      --rw              open the database read-write (allows creation of the
                        standard Zope local utilities if missing)

Help! Broken objects everywhere

If you don't want to see <persistent broken ...> everywhere, make sure your application objects are importable from the Python path. The easiest way of doing that is adding zodbbrowser to your application's buildout (or virtualenv, if you use virtualenvs). This way your application's (or Zope's) nice __repr__ will also be used.

Online help

There's a little 'help' link in the bottom-right corner of every page that describes the user interface in greater detail.

Usage as a plugin

Add zodbbrowser to the list of eggs (e.g. in buildout.cfg of your app) and add this to your site.zcml:

    <include package="zodbbrowser" />

Rerun bin/buildout, restart Zope, and append @@zodbbrowser to the end of the URL to start browsing, e.g. http://localhost:8080/@@zodbbrowser. Or, if you still use ZMI (the Zope Management Interface), look for a new menu item titled "ZODB Browser".
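As a rough sketch of the buildout change described above, assuming your deployment uses the plone.recipe.zope2instance recipe (the part name "instance" and the application egg "myapp" are hypothetical; only the zodbbrowser entries matter):

    [buildout]
    parts = instance

    [instance]
    recipe = plone.recipe.zope2instance
    eggs =
        myapp
        zodbbrowser
    zcml =
        zodbbrowser

The zcml option here is a convenience equivalent to adding the <include package="zodbbrowser" /> line to site.zcml by hand; if your setup doesn't use this recipe, edit site.zcml directly as shown above.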
Allows Python code to live in the ZODB
Document Overview

This document seeks to capture technical information about persistent modules to guide and document their design.

Goals

These goals largely come from Zope 3. It would be worthwhile considering other applications.

Persistent modules are used to support management of software using the ZODB. Software can be updated using network clients, such as web browsers and file-synchronization tools. Application-server clusters can be updated transactionally without requiring server restarts.

Persistent modules leverage a familiar model, modules, for managing Python software.

Persistent modules can be synchronized to a file system using the Zope file-system synchronization framework. Persistent modules are synchronized for purposes including:

  o Use of traditional tools such as editors and code-analysis tools

  o Revision control

Ideally, the file-system representation would consist of a Python source file.

Use cases

Create classes and functions that implement Zope 3 components:

  o Utility, Adapter, View, and service classes and factories.

  o Content components, which are typically persistent and/or pickleable.

Define interfaces, including schema.

Import classes, functions, and interfaces from other modules.

Import classes, functions, and interfaces from other persistent objects. For example, an adapter registration object might have a direct reference to a persistent-module-defined class.

Change module source:

  o Changes are reflected in module state.

  o Changes are reflected in objects imported into other modules.

Synchronize modules with a file-system representation.

Edge cases

???

Fundamental dilemma

Python modules were not designed to change at run time. The source of a Python module normally doesn't change while a Python program is running. There is a crude reload tool that allows modules to be manually reloaded to handle source changes. Python modules contain mutable state: a module has a dictionary that may be mutated by application code, and it may contain mutable data that is modified at run time. This is typically used to implement global registries. When a module is reloaded, it is re-executed with a dictionary that includes the results of the previous execution (a sketch illustrating this follows below).

Programs using the ZODB may be said to have logical lifetimes that exceed the lifetimes of individual processes. In addition, the program might exist as multiple individual processes with overlapping run times. The lifetime of a persistent program is long enough that it is likely that module source code will change during the lifetime of the program.
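To make the reload behavior concrete, here is a plain-Python illustration (the throwaway module reg_demo.py is hypothetical):

    import importlib
    import pathlib
    import sys

    sys.path.insert(0, '.')  # make the throwaway module importable from cwd

    # A module whose source rebinds `registry` each time it executes.
    pathlib.Path('reg_demo.py').write_text('registry = {}\n')
    import reg_demo

    reg_demo.registry['key'] = 'added at run time'       # mutate module state
    reg_demo.added_later = 'not assigned by the source'  # add a new name

    importlib.reload(reg_demo)

    # reload() re-executed reg_demo.py in the module's *existing* dictionary:
    print(reg_demo.registry)     # {} -- rebound by the source; run-time entry lost
    print(reg_demo.added_later)  # survives -- the source never rebinds this name

Names the source rebinds are reset on reload, while names it never assigns persist from the previous execution; this is the mixture of source-defined and run-time state the rest of this document wrestles with.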
Issues

Should the state of a module be represented solely by the module source? Consider the possibilities:

A. Module state is represented solely by its source.

   This would be a departure from the behavior of standard Python modules. Standard Python modules retain a module dictionary that is not overwritten by reloads. Python modules may be mutated from outside and may contain mutable data structures that are modified at run time. On the other hand, a regular module's state is not persistent or shared across processes. For standard Python modules, one could view the module source as an expression of the initial state of the module. (This isn't quite right either, since some modules are written in such a way that they anticipate module reloads.)

   Deleting variables from a module's source that have been imported by other modules or objects will cause the imported values to become disconnected from the module's source. Even if the variables are added back later, the previously-imported values will be disconnected.

   It is tempting to introduce a data structure to record imports made from a module. For example, suppose module M1 imports X from M2. It's tempting to record that fact in M2, so that we disallow M2 to be removed or to be changed in such a way that M2 no longer defines X. (A sketch of this disconnection appears after these alternatives.) Unfortunately, that would introduce state that isn't captured by M2's source.

   Persistent modules could only be used for software. You wouldn't be able to use them to store mutable data, such as registries or counters, that are updated outside of the execution of the module source.

B. Module state isn't represented solely by its source.

   It would become possible to allow mutable data, such as registries, in persistent modules. It could be very difficult to see what a module's state is. If a module contained mutable data, you'd need some way to get to that data so you could inspect and manipulate it. When a module is synchronized to the file system, you'd need to synchronize its source and you'd also need to synchronize its contents in some way. Synchronization of the contents could be done using an XML pickle, but management of the data using file-system-based tools would be cumbersome. You'd end up with data duplicated between the two representations, and it would be cumbersome to manage the duplicated data in a consistent way.

C. Module state is represented solely by its source, but additional metadata is allowed.

   This is the same as option A, except we support metadata management. The metadata could include dependency information: we'd keep track of external usage (import) of module variables to influence whether deletion of the module or defined variables is allowed, or whether to issue warnings when variables are deleted. Note that the management of the metadata need not be the responsibility of the module. This could be done via some application-defined facility, in which case the module facility would need to provide an API for implementing hooks for managing this information.
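The disconnection mentioned under option A can be seen with plain Python modules. A minimal sketch, using throwaway files for the M1/M2/X example above:

    import importlib
    import pathlib
    import sys

    sys.path.insert(0, '.')  # make the throwaway modules importable from cwd

    pathlib.Path('M2.py').write_text('X = 1\n')
    pathlib.Path('M1.py').write_text('from M2 import X\n')
    import M1
    import M2

    # Rebind X in M2's source and reload M2.  M1 imported the *value*,
    # not a live reference, so M1 keeps the object it got originally:
    pathlib.Path('M2.py').write_text('X = 2\n')
    importlib.reload(M2)

    print(M2.X)  # 2 -- the module reflects the new source
    print(M1.X)  # 1 -- the previously-imported value is disconnected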
Special cases

This section contains examples that may introduce challenges for persistent modules, or that might motivate or highlight issues described above.

Persistent classes

Persistent classes include data that are not represented by the class sources. A class caches slot definitions inherited from base classes. This is information that is only indirectly represented by its source. Similarly, a class manages a collection of its subclasses. This allows a class to invalidate cached slots in subclasses when a new slot definition is assigned (via a setattr). The cached slots and collection of subclasses are not part of a persistent class's state. They aren't saved in the database, but are recomputed when the class is loaded into memory or when its subclasses are loaded into memory.

Consider two persistent modules: M1, which defines class C1, and M2, which defines class C2. C2 subclasses C1. C1 defines a __getitem__ slot, which is inherited and cached by C2.
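In plain (non-persistent) Python, the two modules would look something like this sketch; the slot caching and subclass bookkeeping discussed below are internal to the persistent-class machinery and do not appear in the source:

    # M1.py -- defines C1 with a __getitem__ slot
    class C1:
        def __getitem__(self, key):
            return key

    # M2.py -- defines C2, which inherits (and, as a persistent
    # class, would cache) C1's __getitem__ slot
    from M1 import C1

    class C2(C1):
        pass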
Suppose we have a process, P1, which has M1 and M2 in memory. C2 in P1 has a (cached) __getitem__ slot filled with the definition inherited from C1 in P1, and C1 in P1 has C2 in its collection of subclasses. In P1, we modify M1 by editing and recompiling its source. When we recompile M1's source, we'll update the state of C1 by calling its __setstate__ method, passing the new class dictionary. The __setstate__ method will, in turn, use setattr to assign the values from the new dictionary. If we set a slot attribute, the __setattr__ machinery in C1 will notify each of its subclasses that the slot has changed. Now, suppose that we added a __len__ slot definition when we modified the source. When we set the __len__ attribute in C1, C2 will be notified that there is a new slot definition for __len__.

Suppose we have a process P2, which also has M1 and M2 loaded into memory. As in P1, C2 in P2 caches the __getitem__ slot, and C1 in P2 has C2 in P2 in its collection of subclasses. Now, when M1 in P1 is modified and the corresponding transaction is committed, an invalidation for M1 and all of the persistent objects it defines, including C1, is sent to all other processes. When P2 gets the invalidation for C1, it invalidates C1. It happens that persistent classes are not allowed to be ghosts: when a persistent class is invalidated, it immediately reloads its state, rather than converting itself into a ghost. When C1's state is reloaded in P2, we assign its attributes from the new class dictionary. When we assign slots, we notify its subclasses, including C2 in P2.

Suppose we have a process P3 that has only M1 in memory. In P3, M2 is not in memory, nor are any of its subobjects. In P3, C2 is not in the collection of subclasses of C1, because C2 is not in memory and the collection of subclasses is volatile data for C1. When we modify C1 in P1 and commit the transaction, the state of C1 in P3 will be updated, but the state of C2 is not affected in P3, because it's not in memory.

Finally, consider a process P4 that has M2, but not M1, in memory. M2 is not a ghost, so C2 is in memory. Now, since C2 is in memory, C1 must be in memory, even though M1 is not, because C2 has a reference to C1. Further, C1 cannot be a ghost, because persistent classes are not allowed to be ghosts. When we commit the transaction in P1 that updates M1, an invalidation for C1 is sent to P4 and C1 is updated. When C1 is updated, its subclasses in P4, including C2, are notified, so that their cached slot definitions are updated.

When we modify M1, all copies in memory of C1 and C2 are updated properly, even though the data they cache is not cached persistently. This works, and only works, because persistent classes are never ghosts. If a class could be a ghost, then invalidating it would have no effect, and non-ghost dependent classes would not be updated.

Persistent interfaces

Like classes, Zope interfaces cache certain information. An interface maintains a set of all of the interfaces that it extends. In addition, interfaces maintain a collection of all of their sub-interfaces. The collection of sub-interfaces is used to notify sub-interfaces when an interface changes. (Interfaces are a special case of a more general class of objects, called "specifications", that include both interfaces and interface declarations. Similar caching is performed for other specifications and related data structures. To simplify the discussion, however, we'll limit ourselves to interfaces.)

When designing persistent interfaces, we have alternative approaches to consider:

A. We could take the same approach as that taken with persistent classes: we would not save cached data persistently, but would compute it as objects are moved into memory. To take this approach, we'd need to also make persistent interfaces non-ghostifiable. This is necessary to properly propagate object changes. One could argue that the non-ghostifiability of classes is a necessary wart forced on us by details of Python classes that are beyond our control, and that we should avoid creating new kinds of objects that require non-ghostifiability.

B. We could store the cached data persistently. For example, we could store the set of extended interfaces and the set of sub-interfaces in persistent dictionaries. A significant disadvantage of this approach is that persistent interfaces would accumulate state that is not reflected in their source code. However, it's worth noting that, while the dependency and cache data cannot be derived from a single module source, it can be derived from the sources of all of the modules in the system. We can implement persistent interfaces in such a way that execution of module code causes all dependencies among module-defined interfaces to be recomputed correctly. (This is, to me, Jim, an interesting case: state that can be computed during deserialization from other serialized state. This should not be surprising, as we are essentially talking about cached data used for optimization purposes.)

Proposals

A module's state must be represented, directly or indirectly, by its source. The state may also include information, such as caching data, that is derivable from its source-represented state. It is unclear if or how we will enforce this; perhaps it will be just a guideline.

The module-synchronization adapters used in Zope will only synchronize the module source. If a module defines state that is not represented by or derivable from its source, then that data will be lost in synchronization. Of course, applications that don't use the synchronization framework would be unaffected by this limitation. Alternatively, one could develop custom module-synchronization adapters that handle extra module data; however, development of such adapters will be outside the scope of the Zope project.

Notes

When we invalidate a persistent class, we need to delete all of the attributes defined by its old dictionary that are not defined by the new class dictionary.
This application measures and compares the performance of various ZODB storages and configurations. It is derived from the RelStorage speedtest script, but this version allows arbitrary storage types and configurations, provides more measurements, and produces numbers that are easier to interpret.

Running zodbshootout

The zodbshootout script accepts the name of a database configuration file. The configuration file contains a list of databases to test, in ZConfig format. The script deletes all data from each of the databases, then writes and reads the databases while taking measurements. Finally, the script produces a tabular summary of objects written or read per second in each configuration. zodbshootout uses the names of the databases defined in the configuration file as the table column names.

Warning: Again, zodbshootout deletes all data from all databases specified in the configuration file. Do not configure it to open production databases!

The zodbshootout script accepts the following options:

* -n (--object-counts) specifies how many persistent objects to write or read per transaction. The default is 1000. An interesting value to use is 1, causing the test to primarily measure the speed of opening connections and committing transactions.

* -c (--concurrency) specifies how many tests to run in parallel. The default is 2. Each of the concurrent tests runs in a separate process to prevent contention over the CPython global interpreter lock. In single-host configurations, the performance measurements should increase with the concurrency level, up to the number of CPU cores in the computer. In more complex configurations, performance will be limited by other factors such as network latency.

* -p (--profile) enables the Python profiler while running the tests and outputs a profile for each test in the specified directory. Note that the profiler typically reduces database speed considerably. This option is intended to help developers isolate performance bottlenecks.

You should write a configuration file that models your intended database and network configuration. Running zodbshootout may reveal configuration optimizations that would significantly increase your application's performance.
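As a rough sketch of what such a configuration file might contain, here are two databases defined as named <zodb> sections in ZConfig format. The database names (fs_a, zeo_main), the file path, and the server address are hypothetical, and the exact set of available storage sections depends on which storages you have installed:

    <zodb fs_a>
      <filestorage>
        path /tmp/shootout-a.fs
      </filestorage>
    </zodb>

    <zodb zeo_main>
      <zeoclient>
        server localhost:8100
      </zeoclient>
    </zodb>

The section names (fs_a, zeo_main) would become the column headers in the results table.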
Interpreting the Results

The table below shows typical output of running zodbshootout with etc/sample.conf on a dual-core, 2.1 GHz laptop:

    "Transaction",                postgresql,   mysql, mysql_mc,  zeo_fs
    "Add 1000 Objects",                 6529,   10027,     9248,    5212
    "Update 1000 Objects",              6754,    9012,     8064,    4393
    "Read 1000 Warm Objects",           4969,    6147,    21683,    1960
    "Read 1000 Cold Objects",           5041,   10554,     5095,    1920
    "Read 1000 Hot Objects",           38132,   37286,    37826,   37723
    "Read 1000 Steamin' Objects",    4591465, 4366792,  3339414, 4534382

zodbshootout runs six kinds of tests for each database. For each test, zodbshootout instructs all processes to perform similar transactions concurrently, computes the average duration of the concurrent transactions, takes the fastest timing of three test runs, and derives how many objects per second the database is capable of writing or reading under the given conditions.

zodbshootout runs these tests:

* Add objects: zodbshootout begins a transaction, adds the specified number of persistent objects to a PersistentMapping, and commits the transaction (a sketch of this pattern appears after this list). In the sample output above, MySQL was able to add 10027 objects per second to the database, almost twice as fast as ZEO, which was limited to 5212 objects per second. Also, with memcached support enabled, MySQL write performance took a small hit due to the time spent storing objects in memcached.

* Update objects: in the same process, without clearing any caches, zodbshootout makes a simple change to each of the objects just added and commits the transaction. The sample output above shows that MySQL and ZEO typically take a little longer to update objects than to add new objects, while PostgreSQL is faster at updating objects in this case. The sample tests only history-preserving databases; you may see different results with history-free databases.

* Read warm objects: in a different process, without clearing any caches, zodbshootout reads all of the objects just added. This test favors databases that use either a persistent cache or a cache shared by multiple processes (such as memcached). In the sample output above, this test with MySQL and memcached runs more than ten times faster than ZEO without a persistent cache. (See fs-sample.conf for a test configuration that includes a ZEO persistent cache.)

* Read cold objects: in the same process as was used for reading warm objects, zodbshootout clears all ZODB caches (the pickle cache, the ZEO cache, and/or memcached), then reads all of the objects written by the update test. This test favors databases that read objects quickly, independently of caching. The sample output above shows that cold read time is currently a significant ZEO weakness.

* Read hot objects: in the same process as was used for reading cold objects, zodbshootout clears the in-memory ZODB caches (the pickle cache) but leaves the other caches intact, then reads all of the objects written by the update test. This test favors databases that have a process-specific cache. In the sample output above, all of the databases have that type of cache.

* Read steamin' objects: in the same process as was used for reading hot objects, zodbshootout once again reads all of the objects written by the update test. This test favors databases that take advantage of the ZODB pickle cache. As can be seen from the sample output above, accessing an object from the ZODB pickle cache is around 100 times faster than any operation that requires network access or unpickling.
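The "add objects" pattern referenced above looks roughly like this in plain ZODB code. This is a minimal sketch against a local FileStorage (the path is hypothetical); zodbshootout's actual harness wraps the equivalent work in concurrency and timing:

    import transaction
    from ZODB import DB
    from ZODB.FileStorage import FileStorage
    from persistent.mapping import PersistentMapping

    # Open a throwaway database (zodbshootout opens whatever the config names).
    db = DB(FileStorage('/tmp/demo.fs'))
    conn = db.open()
    root = conn.root()

    # Begin a transaction, add N persistent objects to a mapping, and commit.
    root['data'] = data = PersistentMapping()
    for i in range(1000):
        data[i] = PersistentMapping()  # one persistent object per entry
    transaction.commit()

    conn.close()
    db.close()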
Update ZODB class references for moved or renamed classes.
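A minimal sketch of how rename rules are typically supplied to zodbupdate: packages publish a dictionary of renames through a zodbupdate entry point, with keys and values in a space-separated "module Class" form. The package name mypackage and the module paths below are hypothetical, and the entry-point group name and key format should be checked against the zodbupdate release you have installed:

    # setup.py (excerpt) -- register the rename rules with zodbupdate:
    from setuptools import setup

    setup(
        name='mypackage',
        entry_points={
            'zodbupdate': ['renames = mypackage.updates:rename_dict'],
        },
    )

    # mypackage/updates.py -- map old 'module Class' paths to new ones:
    rename_dict = {
        'mypackage.oldmod OldClass': 'mypackage.newmod NewClass',
    }

With the rules registered, running the zodbupdate script against the database (for example, against a FileStorage via its file option) rewrites the affected records in place.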