A parser generator library based on OMeta, and other useful parsing tools. Parsley is a parsing library for people who find parsers scary or annoying. I wrote it because I wanted to parse a programming language, and tools like PLY, ANTLR or Bison were very hard to understand and integrate into my Python code. Most parser generators are based on LL or LR parsing algorithms that compile to big state-machine tables. It was like I had to wake up a different section of my brain to understand or work on grammar rules. Parsley, like pyparsing and ZestyParser, uses the PEG algorithm, so each expression in the grammar rules works like a Python expression. In particular, alternatives are evaluated in order, unlike table-driven parsers such as yacc, Bison or PLY. Parsley is an implementation of OMeta, an object-oriented pattern-matching language developed by Alessandro Warth. For more, see Warth's thesis, which describes OMeta in detail: http://www.vpri.org/pdf/tr2008003_experimenting.pdf
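To illustrate what "alternatives are evaluated in order" means, here is a minimal sketch of PEG-style combinators in plain Python. It is not Parsley's API: lit, seq, choice, eof and full_match are hypothetical helpers written only for this example. Ordered choice commits to the first alternative that succeeds, so listing the shorter literal first makes the longer keyword unreachable.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos) on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match each parser in turn; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: try alternatives left to right and commit to the first success."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def eof(text: str, pos: int):
    """Succeed only at the end of the input."""
    return (None, pos) if pos == len(text) else None


def full_match(p: Parser, text: str):
    """Return p's value if p consumes the whole input, else None."""
    result = seq(p, eof)(text, 0)
    return None if result is None else result[0][0]


short_first = choice(lit("if"), lit("ifdef"))
long_first = choice(lit("ifdef"), lit("if"))

print(full_match(short_first, "ifdef"))  # None: "if" wins, then eof fails
print(full_match(long_first, "ifdef"))   # ifdef
```

Under PEG semantics the fix for short_first is simply to list the longer alternative first, which is exactly what long_first does.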
pygtrie is a Python library implementing a trie data structure. A trie is also known as a prefix tree.
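As a rough illustration of the data structure (not pygtrie's own classes or method names), here is a minimal trie in plain Python. Each key is stored as a path of characters, and a value sits at the node where the key ends, which makes prefix queries cheap. Trie, TrieNode, has_prefix and items are hypothetical names chosen for this sketch.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


class TrieNode:
    __slots__ = ("children", "value", "has_value")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.value: Any = None
        self.has_value = False  # distinguishes "no value" from a stored None


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def __setitem__(self, key: str, value: Any) -> None:
        node = self.root
        for ch in key:  # walk or create one node per character
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
        node.has_value = True

    def _find(self, prefix: str) -> Optional[TrieNode]:
        node: Optional[TrieNode] = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __getitem__(self, key: str) -> Any:
        node = self._find(key)
        if node is None or not node.has_value:
            raise KeyError(key)
        return node.value

    def has_prefix(self, prefix: str) -> bool:
        """True if some stored key starts with prefix (or prefix is a path in the trie)."""
        return self._find(prefix) is not None

    def items(self, prefix: str = "") -> Iterator[Tuple[str, Any]]:
        """Yield (key, value) pairs under prefix in lexicographic order."""
        node = self._find(prefix)
        if node is None:
            return
        stack = [(prefix, node)]
        while stack:
            key, current = stack.pop()
            if current.has_value:
                yield key, current.value
            # Push children in reverse order so the smallest character is visited first.
            for ch in sorted(current.children, reverse=True):
                stack.append((key + ch, current.children[ch]))


t = Trie()
t["foo"] = 1
t["foobar"] = 2
t["bar"] = 3
print(list(t.items("foo")))  # [('foo', 1), ('foobar', 2)]
```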
Python bindings for the libsolv library. Python 3 version.
Goals:
- Convert UNIX timestamps to and from RFC3339.
- Either produce RFC3339 strings with a UTC offset (Z) or with the offset that the C time module reports as the local timezone offset.
- Be simple, with minimal dependencies/libraries.
- Avoid timezones as much as possible.
- Be very strict and follow RFC3339.
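A minimal standard-library sketch of the first two goals follows. The function names are hypothetical and this is not the package's API. The parser accepts only whole-second timestamps ending in Z or a +HH:MM/-HH:MM offset, a deliberately strict subset of RFC3339 (fractional seconds and the lowercase t/z that RFC3339 also permits are rejected). The local offset is taken from time.localtime, which reflects what the C library reports.

```python
import re
import time
from datetime import datetime, timedelta, timezone

# Strict subset of RFC3339: YYYY-MM-DDTHH:MM:SS followed by Z or +HH:MM / -HH:MM.
_RFC3339_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})", re.ASCII)


def timestamp_to_rfc3339_utc(ts: int) -> str:
    """Format a UNIX timestamp as RFC3339 in UTC, using the Z suffix."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def timestamp_to_rfc3339_local(ts: int) -> str:
    """Format a UNIX timestamp with the local offset reported by the C time module."""
    offset = time.localtime(ts).tm_gmtoff  # seconds east of UTC
    local = datetime.fromtimestamp(ts, tz=timezone(timedelta(seconds=offset)))
    sign = "+" if offset >= 0 else "-"
    hours, minutes = divmod(abs(offset) // 60, 60)
    return local.strftime("%Y-%m-%dT%H:%M:%S") + f"{sign}{hours:02d}:{minutes:02d}"


def rfc3339_to_timestamp(s: str) -> int:
    """Parse the strict RFC3339 subset above back into a UNIX timestamp."""
    if not _RFC3339_RE.fullmatch(s):
        raise ValueError(f"not a strict RFC3339 timestamp: {s!r}")
    # Since Python 3.7, %z accepts both "Z" and "+HH:MM".
    return int(datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z").timestamp())


print(timestamp_to_rfc3339_utc(0))  # 1970-01-01T00:00:00Z
```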
This package provides a drop-in alternative to subprocess.run that captures the output while still printing it in real time, just as tee does.
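The core idea can be sketched with the standard library: read the child's stdout line by line, echo each line immediately, and keep a copy. tee_run is a hypothetical name, not this package's API. Unlike a full replacement for subprocess.run, this sketch handles only stdout (stderr goes straight to the terminal) and only line-oriented text output.

```python
import subprocess
import sys
from typing import List


def tee_run(args: List[str]) -> subprocess.CompletedProcess:
    """Run args, echo stdout live, and return the captured output (hypothetical helper)."""
    captured: List[str] = []
    with subprocess.Popen(args, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # echo in real time, like tee
            sys.stdout.flush()
            captured.append(line)   # keep a copy for the caller
    # Leaving the with-block closes the pipe and waits, so returncode is set here.
    return subprocess.CompletedProcess(args, proc.returncode, stdout="".join(captured))


result = tee_run([sys.executable, "-c", "print('hello')"])
```

How "real time" the echo is also depends on the child flushing its own output; many programs buffer more aggressively when writing to a pipe than to a terminal.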
url-normalize
Exact nearest neighbor search (k-nearest-neighbor, or KNN) is prohibitively expensive at higher dimensions, because approaches that segment the search space and work well in 2D or 3D, like quadtrees or k-d trees, devolve to linear scans at higher dimensions. This is one aspect of what is called “the curse of dimensionality.” With larger datasets, it is almost always more useful to get an approximate answer in logarithmic time than the exact answer in linear time. This is abbreviated as ANN (approximate nearest neighbor) search.

There are two broad categories of ANN index:
- Partition-based indexes, like LSH or IVF or SCANN
- Graph indexes, like HNSW or DiskANN

Graph-based indexes tend to be simpler to implement and faster, but more importantly they can be constructed and updated incrementally. This makes them a much better fit for a general-purpose index than partitioning approaches, which only work on static datasets that are completely specified up front. That is why all the major commercial vector indexes use graph approaches.

JVector is a graph index in the DiskANN family tree.
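A toy sketch of the two ideas in pure Python: an exact KNN linear scan, and a greedy beam search over a proximity graph, the basic query procedure that graph indexes in the HNSW/DiskANN family build on. This is not JVector's API or its construction algorithm. Here the graph is built by brute force (each point linked to its nearest neighbors, with reverse edges added), which is quadratic and suitable only for illustration. All names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def exact_knn(data: List[Vector], query: Vector, k: int) -> List[int]:
    """Exact KNN by linear scan: compare the query with every point."""
    return sorted(range(len(data)), key=lambda i: math.dist(data[i], query))[:k]


def build_knn_graph(data: List[Vector], degree: int) -> Dict[int, List[int]]:
    """Toy construction: link each point to its `degree` nearest points, plus reverse edges."""
    graph = {i: set() for i in range(len(data))}
    for i, v in enumerate(data):
        others = sorted((j for j in range(len(data)) if j != i),
                        key=lambda j: math.dist(data[j], v))
        for j in others[:degree]:
            graph[i].add(j)
            graph[j].add(i)  # symmetric edges improve reachability
    return {i: sorted(nbrs) for i, nbrs in graph.items()}


def greedy_search(data: List[Vector], graph: Dict[int, List[int]], query: Vector,
                  k: int, beam_width: int, entry: int = 0) -> List[int]:
    """Best-first beam search: keep the beam_width closest nodes found so far."""
    d0 = math.dist(data[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated distances) of the current beam
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= beam_width and d > -best[0][0]:
            break  # nearest unexpanded node is worse than everything in a full beam
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(data[nb], query)
            if len(best) < beam_width or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)  # drop the current worst
    return [i for _, i in sorted((-nd, i) for nd, i in best)][:k]


points = [(float(i),) for i in range(20)]
g = build_knn_graph(points, degree=2)
print(greedy_search(points, g, (13.2,), k=3, beam_width=4))  # [13, 14, 12]
```

The beam width trades recall for speed: a larger beam visits more nodes and is more likely to return the true nearest neighbors, while the search touches only a small part of the graph rather than scanning every point.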