Another module that's packaged with the stdlib that's immensely useful is itertools. I especially find takewhile, cycle, and chain to be incredibly useful building blocks for list-related functions. I highly recommend a quick read.
EDIT: functools is also great! Fantastic module for higher-order functions on callable objects.
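Here is a quick sketch of the three itertools functions mentioned above, using made-up data. `islice` is only there to stop `cycle`, which is infinite.

```python
from itertools import chain, cycle, islice, takewhile

# takewhile: yield items until the predicate first fails, then stop for good
small = list(takewhile(lambda n: n < 10, [1, 4, 9, 16, 2]))
print(small)  # [1, 4, 9]

# chain: iterate over several iterables as one, without building a combined list first
merged = list(chain([1, 2], (3, 4), range(5, 7)))
print(merged)  # [1, 2, 3, 4, 5, 6]

# cycle: repeat an iterable forever; islice bounds it so this terminates
colours = list(islice(cycle(["red", "green", "blue"]), 5))
print(colours)  # ['red', 'green', 'blue', 'red', 'green']
```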
import sqlite3
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')
conn = sqlite3.connect('/companydata')
cursor = conn.cursor()
cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
for emp in map(EmployeeRecord._make, cursor.fetchall()):
    print(emp.name, emp.title)
You could of course accomplish the same with a dictionary comprehension, but I find this to be less noisy. Also, namedtuples have `_asdict()` should you want the contents as a dict.
[0]: https://docs.python.org/3/library/collections.html#collectio...
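Here is a self-contained variant of the snippet above. It assumes an in-memory SQLite database with one made-up employee row, standing in for `/companydata`. The table contents are hypothetical.

```python
import sqlite3
from collections import namedtuple

EmployeeRecord = namedtuple("EmployeeRecord", "name, age, title, department, paygrade")

# In-memory stand-in for '/companydata', with one hypothetical row
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, age INTEGER, title TEXT, department TEXT, paygrade TEXT)"
)
conn.execute("INSERT INTO employees VALUES ('Ada', 36, 'Engineer', 'R&D', 'P3')")

cursor = conn.execute("SELECT name, age, title, department, paygrade FROM employees")
records = list(map(EmployeeRecord._make, cursor.fetchall()))

emp = records[0]
print(emp.name, emp.title)  # Ada Engineer
print(emp._asdict())  # {'name': 'Ada', 'age': 36, 'title': 'Engineer', 'department': 'R&D', 'paygrade': 'P3'}
conn.close()
```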
import sqlite3
from dataclasses import asdict, dataclass

@dataclass
class EmployeeRecord:
    name: str
    age: int
    title: str
    department: str
    paygrade: str

conn = sqlite3.connect("/companydata")
cursor = conn.cursor()
cursor.execute("SELECT name, age, title, department, paygrade FROM employees")
for emp in (EmployeeRecord(*row) for row in cursor.fetchall()):
    print(emp.name, emp.title)
    print(asdict(emp))
This is valid (as in, it will run) but highly unidiomatic code:
quicksort = lambda arr: [pivot:=arr[0], left:= [x for x in arr[1:] if x < pivot], right := [x for x in arr[1:] if x >= pivot], quicksort(left) + [pivot] + quicksort(right)][-1] if len(arr) > 1 else arr
print(quicksort([1, 33, -4, -2, 110, 5, 88]))
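For contrast, here is a sketch of the same algorithm written as an ordinary function. It uses the first element as the pivot and is not in place, like the one-liner. It is only meant to make the one-liner's logic readable, not to be a tuned sort.

```python
def quicksort(arr):
    """First element as pivot; builds new lists rather than sorting in place."""
    if len(arr) <= 1:
        return arr
    pivot, rest = arr[0], arr[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

print(quicksort([1, 33, -4, -2, 110, 5, 88]))  # [-4, -2, 1, 5, 33, 88, 110]
```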
def scan(items, f, initial):
    x = initial
    return (x := f(x, y) for y in items)
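For comparison, here is the `scan` above next to the stdlib's `itertools.accumulate`. With `initial=` (Python 3.8+), `accumulate` also yields the seed value itself, so it has to be dropped to match. The example numbers are made up.

```python
from itertools import accumulate
from operator import add

def scan(items, f, initial):
    x = initial
    # the walrus rebinds x in scan's own scope, so each step sees the previous result
    return (x := f(x, y) for y in items)

print(list(scan([1, 2, 3, 4], add, 0)))  # [1, 3, 6, 10]

# accumulate yields the seed first when initial= is given, so skip it
print(list(accumulate([1, 2, 3, 4], add, initial=0))[1:])  # [1, 3, 6, 10]
```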
There are lots of other short ways to write `scan`, but I don't think any of them map so clearly to a naive definition of what it's supposed to do.
levenshtein_distance = lambda s1, s2: [
    # row 0 and column 0 hold the distance from the empty prefix: i + j when either index is 0
    matrix := [[i + j if i * j == 0 else 0 for j in range(len(s2) + 1)]
               for i in range(len(s1) + 1)],
    [
        [
            (matrix[i].__setitem__(j, min(matrix[i-1][j] + 1,
                                          matrix[i][j-1] + 1,
                                          matrix[i-1][j-1] + (0 if s1[i-1] == s2[j-1] else 1))),
             matrix[i][-1])[1]
            for j in range(1, len(s2) + 1)
        ]
        for i in range(1, len(s1) + 1)
    ],
    matrix[-1][-1],
][-1]
for name, age, title in cursor.fetchall():
    print(name, age, title)
Of course you have to come up with different variable names, but it still seems more elegant to just unpack.
from collections import namedtuple

def namedtuple_factory(cursor, row):
    fields = [column[0] for column in cursor.description]
    cls = namedtuple("Row", fields)
    return cls._make(row)
to the fetchall(), to automatically keep the names in sync with those in the SQL query string.
These days I use attrs and cattrs, and I’m much happier. Everything feels a lot more straightforward.
attrs is what Python’s dataclasses were based on, but they kept on improving it, so attrs just feels like standard Python with a little bit extra.
Just today I reached for Kotlin to process something semi-complex, even though we're a Python shop, because I wanted to bash my head in after a few attempts in Python. A DS could probably solve it in minutes with pandas or something, but again: stringly typed and lots of guesswork.
(It was actually a friendly algorithmic competition at work, I won, and even found a bug in the organizer's code that went undetected exactly because of this)
The docs are here [0].
Some simple motivating applications:
- Look up names in Python locals before globals before built-in functions: `pylookup = ChainMap(locals(), globals(), vars(builtins))`
- Get config variables from various sources in priority order: `var_map = ChainMap(command_line_args, os.environ, defaults)`
- Simulate layered filesystems
- etc
[0] https://docs.python.org/3/library/collections.html#collectio...
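Here is a minimal sketch of the config-layering use case. The three source dicts and their keys are hypothetical, and `environment` stands in for `os.environ` so the output is predictable. It shows lookup order, where writes land, and `new_child()` for pushing a temporary layer.

```python
from collections import ChainMap

# Hypothetical config sources, highest priority first
command_line_args = {"user": "cli-user"}
environment = {"user": "env-user", "debug": "1"}  # stands in for os.environ
defaults = {"user": "guest", "debug": "0", "color": "red"}

config = ChainMap(command_line_args, environment, defaults)
print(config["user"], config["debug"], config["color"])  # cli-user 1 red

# Writes (and deletes) only ever touch the first mapping
config["color"] = "blue"
print(command_line_args)  # {'user': 'cli-user', 'color': 'blue'}
print(defaults["color"])  # red

# new_child() pushes a fresh layer on top, much like a nested scope
scope = config.new_child({"user": "temp"})
print(scope["user"], config["user"])  # temp cli-user
```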
And if you’re using a FIFO cache, threading a regular dict through a separate FIFO (whether a linked list or a deque) is more efficient, in my experience of implementing both S3 and Sieve.
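Here is a minimal sketch of that structure: a plain dict for lookups plus a `deque` recording insertion order. It assumes simple FIFO eviction and is not S3-FIFO or SIEVE, which add extra queues or visited bits. The class and method names are hypothetical.

```python
from collections import deque

class FIFOCache:
    """Hypothetical minimal FIFO cache: dict for O(1) lookups, deque for eviction order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.order = deque()

    def get(self, key, default=None):
        # Plain FIFO: a hit does not change the eviction order
        return self.data.get(key, default)

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value  # update in place, keep original queue position
            return
        if len(self.data) >= self.capacity:
            oldest = self.order.popleft()
            del self.data[oldest]
        self.data[key] = value
        self.order.append(key)

cache = FIFOCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # a hit does not protect "a" from eviction
cache.put("c", 3)   # evicts "a", the oldest insertion
print(sorted(cache.data))  # ['b', 'c']
```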
the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.
[0] https://docs.python.org/3.7/whatsnew/3.7.html
I would argue that OrderedDict has more chance of being deprecated than dict has of becoming unordered again, since there is now little value in keeping OrderedDict around (and the methods currently specific to OrderedDict could be added to dict).
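For reference, here are the behaviours that currently remain specific to `OrderedDict`: order-sensitive equality, `move_to_end()`, and `popitem(last=False)`. The keys are arbitrary examples.

```python
from collections import OrderedDict

# Equality: OrderedDict-vs-OrderedDict comparison is order-sensitive, plain dicts are not
print(dict(a=1, b=2) == dict(b=2, a=1))                # True
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False

od = OrderedDict.fromkeys("abc")
od.move_to_end("a")            # reorder without deleting and re-inserting
print(list(od))                # ['b', 'c', 'a']
print(od.popitem(last=False))  # ('b', None): pop from the front, FIFO-style
```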
But seriously: It’s no longer an implementation detail that dictionaries are ordered in Python. It’s a specification of how Python works.
Makes you think what other parts of Python have become obsolete.
He thought that one cycle of "no ordering assumptions" would give a smoother transition. All 3.6 implementations would have dict ordering, but it was safer to not have people rely on it right away.
Jython never released a Python 3 version, so it's irrelevant; IronPython has yet to progress beyond 3.4, so it's also irrelevant.
It's also been frustrating with the lack of tooling support. I mean, I get it – it's hideously EOL'd – but I can't use Poetry, uv, pytest... at least it still has type hints.
from collections import defaultdict

# a[star_name][instrument] = set of (seed, planet index) of visited planets
a = defaultdict(lambda: defaultdict(set))
for row in rows:
    a[row.star][row.inst].add((row.seed, row.planet))
This is a dict-of-dict-of-set that is accumulating from a stream of rows, and I don't know in advance which stars and instruments will be present.
Another related tool is Counter (https://docs.python.org/3/library/collections.html#collectio...)
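Counter fits the same pattern of accumulating from a stream without knowing the keys up front. Here is a minimal sketch on a made-up word list.

```python
from collections import Counter

words = "the cat sat on the mat the end".split()
counts = Counter(words)
print(counts["the"])          # 3
print(counts["dog"])          # 0: missing keys count as zero, no KeyError
print(counts.most_common(1))  # [('the', 3)]

counts.update(["cat", "cat"])  # update() adds to existing counts instead of replacing them
print(counts["cat"])           # 3
```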
Word to the wise... as of Python 3.7, the regular dictionary data structure guarantees order. Declaring an OrderedDict can still be worthwhile for readability (to let code reviewers/maintainers know that order is important) but I don't know of any other reason to use it anymore.
Another reason, I think, is that the 3.7 behavior is just a CPython implementation detail; other interpreters may not honor it.
If dict already guarantees order, nothing is gained by using both dict and OrderedDict. Just use dict.
Which means you still should use it if you might run on 3.6 or earlier.
And Python <=3.7 is already end-of-life anyways: https://devguide.python.org/versions/
https://docs.python.org/3/library/stdtypes.html#dict:~:text=....
https://mail.python.org/pipermail/python-dev/2017-December/1...
https://docs.python.org/3/library/array.html
https://github.com/python/cpython/blob/main/Modules/arraymod...
And you can use struct for heterogeneous data =) It has a neat DSL for packing/unpacking the data, reminiscent of the "little languages" from the classic book The Practice of Programming. Python is actually pretty nice for working with binary data.
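Here is a small sketch of that format DSL. It uses a made-up record layout mixing an unsigned int, a short, a double and four raw bytes, and round-trips it through `pack`/`unpack`.

```python
import struct

# '<' little-endian (no padding), 'I' uint32, 'h' int16, 'd' float64, '4s' 4 raw bytes
record_format = "<Ihd4s"
print(struct.calcsize(record_format))  # 18

packed = struct.pack(record_format, 42, -7, 3.5, b"ABCD")
print(len(packed))                           # 18
print(struct.unpack(record_format, packed))  # (42, -7, 3.5, b'ABCD')
```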
It really is! I’ve been working on a project to generate large amounts of synthetic data, and it calls out to C for various shared libraries to do the heavy lifting *. Instead of encoding and decoding back and forth, I can just ship bytes around, and then directly write them out to a file. Saves a lot of time.
*: yes, I should just rewrite it into a faster language entirely. I intend to, but for the time being it’s been “how fast can I make Python without anything but stdlib,” as long as you accept ctypes as being included in that definition.
I’m not positive on why lists are faster to create than arrays, though. Retrieval makes sense (lists already store the Python object, arrays have to convert it back), but creation I’m unsure about. I’ll check dis.dis.
EDIT: from a sibling comment above [0], maybe because array reallocs are done much more granularly than lists, so as it grows, it’ll have to do so more frequently compared to lists?
[0]: https://github.com/python/cpython/blob/main/Modules/arraymod...
fractions.Fraction(numerator=1, denominator=3)
fractions.Fraction(1) / 3
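Here is a short sketch of why that matters: `Fraction` arithmetic stays exact where binary floats round. The values are arbitrary examples.

```python
from fractions import Fraction

third = Fraction(1, 3)
print(third == Fraction(1) / 3)        # True
print(third * 3 == 1)                  # True: exact rational arithmetic
print(sum([Fraction(1, 10)] * 3))      # 3/10
print(0.1 + 0.1 + 0.1 == 0.3)          # False with binary floats
print(Fraction(0.5), Fraction("0.1"))  # 1/2 1/10
```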
ChainMap is maybe better described (and used) as inheritance for dicts, where something like `settings = ChainMap(instance_settings, region_settings, global_settings)` would give you one object to look in.
The stdlib is full of goodies.
I've always appreciated the batteries-included philosophy in Python. But I noticed this week that LLMs diminish that need. It's so easy to prompt for small utilities, and it saves you from pulling in entire libraries for a few tools.
And the AI can create docs and tests for them just as quickly.
So while I was really enthusiastic when things like pairwise() were added to itertools, it's not as revolutionary as it used to be.
Pardon the hyperbole but it’s a bit like lauding an IDE for automatically generating thousands of Java class stubs.
They were super useful, but not included in the stdlib, despite being a few lines long.
We also had more-itertools, boltons, and others, to bridge that gap.
Now, there was always a tension between adding more stuff to the stdlib, or letting 3rd party libs handle it. Remember the saying: the stdlib is where projects go to die.
And of course, there was tension about installing full-blown third-party libs just for a few functions.
The result was that many people copy-pasted a lot of small utilities, and there were endless debates on python-ideas about including a few more.
I think this is going to slow down. Now if you want "def first_true(iterable, default=False, predicate=None)", you ask ChatGPT, and you don't care.
The cost of adding those to the project is negligible.
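For scale, the `first_true` mentioned above is a one-line recipe. The itertools documentation lists one along these lines, built on `next(filter(...), default)`. Treat this as a sketch rather than a verbatim copy of the docs.

```python
def first_true(iterable, default=False, predicate=None):
    """Return the first true value in iterable, or default if there is none.

    With predicate=None, filter() keeps truthy items; otherwise it keeps
    the items for which predicate(item) is true.
    """
    return next(filter(predicate, iterable), default)

print(first_true([0, None, "", 7, 8]))  # 7
print(first_true([1, 3, 5], default="none", predicate=lambda n: n % 2 == 0))  # none
```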
It's nowhere near generating thousands of class stubs. It's actually the opposite: very targeted, specific code needs get filled instead of haunting Python debates or your venv.
But to stimulate your anxiety a bit, I do think code gen is also going to make a big comeback with LLMs :)
In addition to the extra boilerplate and reduced readability, that also sounds like an easy way to introduce subtle bugs. Standard library functions have been exhaustively field tested, a similar looking LLM generated function could easily include a footgun.
It's not a fun process.
Writing the code is the easy part.
And installing more-itertools for one function is a bit silly.
https://docs.python.org/3/library/types.html#types.MappingPr...
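Assuming the truncated link above points at `types.MappingProxyType`, here is a minimal sketch of a read-only view over a dict. The `_settings` dict is a made-up example.

```python
from types import MappingProxyType

_settings = {"debug": False}
settings = MappingProxyType(_settings)  # read-only view, not a copy

print(settings["debug"])  # False
try:
    settings["debug"] = True
except TypeError as exc:
    print("read-only:", exc)

_settings["debug"] = True  # changes to the underlying dict show through the view
print(settings["debug"])   # True
```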
python -m http.server
Pass -h for more options.
Last year I ran http.server with -h to remind myself of something, and the --cgi flag caught my eye... funnily enough, there's built-in support in the web server for running CGI scripts. Alas, it's deprecated and will be removed in 3.13 later this year, but when I discovered it I couldn't resist the opportunity to write a CGI script for the first time in 20-something years: https://github.com/drien/python-httpserver-upload
A good example of the latter use case is the statistics module.
There is a price to pay, though: it's about 10x slower than numpy, so it's mostly useful when the calculation is not a bottleneck.
The benefit is that you're good to go (batteries included) without any virtual environments, pip, etc.
Nice! When you need it, you need it. It's nice not to have to implement it oneself.
Also, topological sort is like five lines of code... so, it doesn't matter if the function is there.
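For reference, the stdlib function in question is presumably `graphlib.TopologicalSorter` (Python 3.9+). Here is a minimal sketch with a made-up dependency graph. A cycle would raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key maps to the set of nodes it depends on (its predecessors)
graph = {"app": {"lib", "config"}, "lib": {"base"}, "config": {"base"}}

order = list(TopologicalSorter(graph).static_order())
print(order)  # e.g. ['base', 'lib', 'config', 'app']; 'lib' and 'config' may swap
```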
You have to encode the file name!
import os
import urllib.parse

file_url = 'file://' + urllib.parse.quote(os.path.realpath('test.html'))
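An alternative worth knowing is `pathlib.Path.as_uri()`, which does the percent-encoding itself and also handles Windows drive paths. It requires an absolute path. Here is a sketch with a hypothetical file name containing a space.

```python
import os
import urllib.parse
from pathlib import Path

path = os.path.realpath("my test.html")  # hypothetical file; it need not exist

manual = "file://" + urllib.parse.quote(path)
via_pathlib = Path(path).as_uri()  # as_uri() raises ValueError for relative paths
print(manual)
print(via_pathlib)
```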
That said, Go has those things, so it has crept into my quick programming a little, but I'll always love Python.
export PYTHONPATH="package.zip"
python3 -m packagename
will work just fine.
(PS. I document this technique in one of my python-template projects: https://git.sr.ht/~tpapastylianou/python-self-contained-runn...)
I suppose, if the intent is to package something in a manner that attempts to make it newbie-proof, then requiring a PYTHONPATH incantation before the python part might be one step too far ... but then again, one could argue the same about people not quite knowing what to do with a .pyz file and getting stuck.
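Here is a sketch of the PYTHONPATH-zip technique, driven from Python so it is reproducible. It builds a throwaway zip with `packagename/__main__.py` and runs `python -m packagename` with that zip on `PYTHONPATH`. The package name, zip name and printed message are placeholders.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package.zip containing packagename/__main__.py (hypothetical names)
    zip_path = os.path.join(tmp, "package.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("packagename/__init__.py", "")
        zf.writestr("packagename/__main__.py", "print('hello from inside the zip')\n")

    # Equivalent of: PYTHONPATH=package.zip python3 -m packagename
    env = dict(os.environ, PYTHONPATH=zip_path)
    result = subprocess.run(
        [sys.executable, "-m", "packagename"],
        env=env, capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # hello from inside the zip
```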
A bootstrap boilerplate that allows the shiv to be able to run as an interpreter. I think zipapp can only be given code.interact as main.
Unpacking wheels into ~/.shiv, which might be faster. I can’t remember if this permits running compiled C, which is not possible from within a zipapp.
I don't think this is true. It lets you specify and calculate parameters for normal distributions, which allows you to jury-rig a naive Bayes classifier, as shown in a docs example. That's not the same as providing a built-in classifier.
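Here is roughly the idea behind that docs example, reduced to one feature. It is a minimal sketch with made-up samples and equal class priors, not a full classifier.

```python
from statistics import NormalDist  # Python 3.8+

# Hypothetical height samples (cm) for two classes
class_a = NormalDist.from_samples([160, 165, 170, 175, 180])
class_b = NormalDist.from_samples([180, 185, 190, 195, 200])

x = 178
# With equal priors, a naive-Bayes-style decision just picks the higher likelihood
likelihoods = {"a": class_a.pdf(x), "b": class_b.pdf(x)}
print(class_a.mean, class_b.mean)             # 170.0 190.0
print(max(likelihoods, key=likelihoods.get))  # a
```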
Nowadays GNU `uniq` can sort and `sort` can unique, because there are performance benefits. Assume the same is true in Python: if you're worried about performance, `groupby(sorted(...))` might not be the best choice.
The other thing that is a bit odd is it returns iterators. It's up to you to build concrete groups if that's what you need.
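Here is a sketch of both points on made-up words. `groupby` only merges consecutive equal keys, hence the `sorted()` first, and each group is a lazy iterator you have to materialise. A single-pass `defaultdict(list)` is shown as the usual no-sort alternative.

```python
from collections import defaultdict
from itertools import groupby

words = ["apple", "bob", "avocado", "banana", "cherry"]

def key(w):
    return w[0]

# Sort first so equal keys are adjacent; list(g) materialises each lazy group
# before groupby advances and invalidates it
groups = {k: list(g) for k, g in groupby(sorted(words, key=key), key=key)}
print(groups)  # {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry']}

# Single O(n) pass, no sort; groups keep their original encounter order
buckets = defaultdict(list)
for w in words:
    buckets[key(w)].append(w)
print(dict(buckets))  # {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry']}
```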
I assume the `namedtuple` syntax is more pleasing to functionally minded programmers, but this makes me wonder whether the stdlib should choose one of them.
* generate source code from a template string.
* eval generated code.
* call constructor.
This is woefully slow and wasteful compared to a sensible solution: writing it in C. But, nobody really cares.
Anyways. The reason I cared is because I was working on a Protobuf parser, where named tuple was supposed to play a key role: the message class. Imagine my disappointment when I started to run benchmarks.
My other favourite parts of the stdlib are functools and itertools. They are both full of stuff that gives you superpowers. I always find it a shame when I see developers do an ad hoc reimplementation of something in functools/itertools.
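Here are a few functools building blocks as a quick taste. The examples are illustrative, not taken from the comment.

```python
import operator
from functools import lru_cache, partial, reduce

@lru_cache(maxsize=None)  # memoise: each fib(n) is computed only once
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040

int_from_binary = partial(int, base=2)  # pre-fill a keyword argument
print(int_from_binary("1011"))  # 11

print(reduce(operator.mul, [1, 2, 3, 4, 5]))  # 120
```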
Everyone these days is using ast [1], but there might be room for dis instead in some cases.
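Here is a small side-by-side sketch of the two modules on one made-up statement. `ast` shows the syntax tree, and `dis` shows the bytecode CPython actually runs. Exact output differs between Python versions, so none is shown.

```python
import ast
import dis

source = "total = price * qty + 1"

# ast: the parsed syntax tree, close to the source text
tree = ast.parse(source)
print(ast.dump(tree.body[0].value))

# dis: CPython bytecode; opcode names vary between versions
code = compile(source, "<example>", "exec")
for instr in dis.get_instructions(code):
    print(instr.opname, instr.argrepr)
```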
import antigravity
and import braces
separately. from __future__ import braces
Also, just to make sure I understand the joke: it's basically saying that they'll never add braces to the Python syntax, right?
But I know what you meant, and yeah...they'll never use braces as block delimiters. IMO, that's a good thing. Whitespace-as-syntax means you're FORCED to have a minimal level of decent code formatting or it doesn't work.
That also makes it more funny. :)
import this
a['foo'] = 20
a['bar'] = 9
where you want to be able to do: a.foo = 20
a.bar = 9
Otherwise there are tons of tricks to get what you want. To add to the list posted in sibling:
a = vars(a) # readonly
print(a.foo)
or class Obj: pass;
a = Obj()
a.foo = 20
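One more stdlib trick along those lines is `types.SimpleNamespace`, which gives attribute access over keyword arguments. Here is a sketch reusing the `foo`/`bar` keys from above.

```python
from types import SimpleNamespace

a = {"foo": 20, "bar": 9}
ns = SimpleNamespace(**a)  # attribute access over a copy of the dict's items
print(ns.foo, ns.bar)      # 20 9

ns.foo = 21                # writable, unlike a namedtuple
print(vars(ns))            # {'foo': 21, 'bar': 9}
print(a["foo"])            # 20: the original dict is unaffected
```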
header = Header._make(unpack(header_format, f.read(64)))
print(header)
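Here is a self-contained version of the header-parsing pattern. Note that `struct.unpack` takes the format first and the buffer second. It assumes a hypothetical 16-byte layout (magic, version, flags, payload length) and an in-memory file, and reads exactly `calcsize(format)` bytes rather than a hard-coded count.

```python
import io
import struct
from collections import namedtuple

# Hypothetical header layout: 4-byte magic, uint16 version, uint16 flags, uint64 length
Header = namedtuple("Header", "magic version flags length")
header_format = "<4sHHQ"  # '<' = little-endian, no padding
HEADER_SIZE = struct.calcsize(header_format)  # 16

f = io.BytesIO(struct.pack(header_format, b"DEMO", 1, 0, 1024) + b"payload...")

header = Header._make(struct.unpack(header_format, f.read(HEADER_SIZE)))
print(header)  # Header(magic=b'DEMO', version=1, flags=0, length=1024)
```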