Daniel's Musings: programming


Sunday, February 28, 2021

Learning Rust: Some Thoughts

Back in September I wrote a post about some utilities written in the Rust language, and mentioned I was toying with the idea of trying to learn a bit of it. With two weeks of vacation around Christmas, I decided to take the plunge and have been reading and working through the examples in the Rust Book off and on as I get a chance.

I was a bit tentative going in, but I find I'm really enjoying it. Prior to this, for reference, I taught myself Python a decade ago while working as an undergraduate research assistant and have dabbled a tiny bit in JavaScript and Lua, but that's the extent of my programming language coverage: I'm still essentially monolingual. I managed to get through my undergraduate time without ever taking any sort of computer science class, and it's left me a little self-conscious of my self-taught status when it comes to Python. Rust is a very different language to Python, and I was worried that perhaps I wouldn't be able to pick up a new language as easily as I could ten years ago if I've unconsciously generalized Python-specific quirks to programming languages as a whole.

However, contrary to my worries I've been finding it an interesting and fun experience. I suspect my experience studying various far-flung human languages is helping me to generalize better between different programming languages. And Python and Rust are opposites in some rather key ways: Python is an interpreted language, which means that programs are (and need to be) compiled and interpreted by an interpreter program at run-time. Rust on the other hand is a compiled language, which means it needs to be compiled before running (but can be run afterwards without needing any external program). Python is a dynamically-typed language where variables can be created with ease as needed and rebound to values of different types without oversight by the language. Rust is statically-typed, with very strict enforcement of variable types, all checked at compile time. These are not minor differences, but almost diametrically opposed paradigms, like the difference between a case-based language and one which relies on word order.
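To make the Python half of that contrast concrete, here's a minimal sketch (the names value and increment are just made up for illustration). The same name gets rebound to three different types without complaint, and a type mismatch only surfaces when the offending line actually runs, which is exactly the sort of thing a statically-typed language rejects before the program ever starts.

```python
# One name, three different types in a row; Python raises no objection.
value = 42
seen_types = [type(value).__name__]
value = "forty-two"
seen_types.append(type(value).__name__)
value = [4, 2]
seen_types.append(type(value).__name__)
print(seen_types)  # ['int', 'str', 'list']


def increment(number):
    """Add one to a number; nothing here declares what type `number` is."""
    return number + 1


# The mistake below is only discovered at run time, when the call executes.
try:
    increment("forty-two")
    caught = None
except TypeError as error:
    caught = type(error).__name__
print(caught)  # TypeError
```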

One of the major unique features of Rust is its concept of ownership. In essence, this is a requirement (checked and enforced at compile time) that only one “part” of a program can change a variable's value at a given time. A value can be “borrowed” any number of times for use as long as it is not changed, and the system responsible for enforcing this is known as the borrow checker. Colloquially people joke about spending much of their time learning Rust “fighting the borrow checker,” as the concept of ownership is a novel one and wrapping one's head around it takes some time.

Thankfully, Rust has some really good error messages. I've managed to write a few small programs on my own so far, and the error message output usually contains both an explanation of what I've done wrong, and a suggestion for how to fix it. In fact it's generally gone so well so far that I'm a bit suspicious; when is the other shoe going to drop? I've had errors, sure, but I've been able to figure them out quickly and get what I want to happen (within my still-limited understanding of the language as a whole). Granted, I'm not exactly writing complicated programs, just simple ones to find prime numbers or convert temperatures between scales, but still; it's been a pretty pleasant learning experience so far. I've most certainly got a lot to learn left, but it's fun to be seriously learning a new programming language again with no stress about the outcome.

I don't know if this will ever be useful in a job down the line (although a number of large companies are starting to use Rust because its ownership model removes even the possibility of whole classes of costly memory errors found in languages like C or C++), but even if not I'm sure the experience of learning a new, and very different, language will have benefits for my Python knowledge, in the same way learning other languages has helped me reason about English*. And you never know, it just might come in handy down the line somehow. A hui hou!

*Interestingly, the upcoming Python 3.10 is getting a “match” system for comparing multiple cases which is extremely similar to the one found in Rust, so it might prove to be more useful than I'd initially thought!

Saturday, February 29, 2020

Fun with Python Decorators on February 29

Happy leap year, everyone. Despite having been writing this blog for a decade now, it turns out I didn't think to write a post on February 29 either of the previous two leap years it's been around for. So I'm rectifying that oversight this year! Of course, there's a purpose to this post beyond merely taking the opportunity of having one up on February 29 (though that was a motivating factor). This week I wrote some rather interesting Python code, and thought I'd share it.

I was working on some code to generate synthetic data sets to test various line-fitting code, and wanted a way to add some random noise to the output, but potentially using different functions to generate the noise, which might take their own variable number of parameters. I turned to the concept of decorators in Python, which are, essentially, functions which operate on other functions. For a mathematical analogy, consider the following situation: \[f(x)=g(h(x)).\] Here we have a function, g, which operates on another function \(h(x)\), and we define this resulting function as \(f(x)\). In Python, the function g would be a decorator, as it takes another function and returns a modified form of it.
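As a minimal sketch of the simplest case (a decorator that takes no extra parameters), here's a made-up example; the names double_output and add_one are purely illustrative and aren't from the code I was writing. The decorator plays the role of g: it receives a function and hands back a modified version of it.

```python
import functools


def double_output(func):
    """Decorator: return a version of `func` whose output is doubled."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        return 2 * func(*args, **kwargs)
    return wrapper


@double_output
def add_one(x):
    """Return x + 1 (before decoration)."""
    return x + 1


print(add_one(3))        # 8, i.e. 2 * (3 + 1)
print(add_one.__name__)  # add_one, thanks to functools.wraps
```

The @double_output line is just shorthand for writing add_one = double_output(add_one) after the definition.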

Of course, we can extend this idea further; what if, in addition to a function, g also takes additional parameters? Perhaps something like \[f(x)=g(h(x), a, b, c).\] It turns out we can do this in Python as well, though it's somewhat abstract and I don't fully understand it. I read some tutorials on the subject, I guessed at how to extend what they said into code which works, but I'd be lying if I said I truly understood it at this point. Though I'll still take a stab at explaining it. The process involves a triply-nested function to handle passing arbitrary functions and arguments to the decorator. I've embedded the entire decorator function below:

 import functools


 def add_noise(noise_func, *noise_args, **noise_kwargs):
     """Add noise from a given function to the output of the decorated function.

     Parameters
     ----------
     noise_func : callable
         A function to add noise to a data set. Should take as input a 1-D array
         (and optionally additional parameters) and return the same.
     *noise_args, **noise_kwargs
         Additional arguments passed to this decorator will be passed on through
         to `noise_func`.

     Returns
     -------
     callable
         A decorator which wraps a function so that noise from the given
         `noise_func` is added to its output.
     """
     def decorator_noise(func):
         @functools.wraps(func)
         def wrapper_noise(*args, **kwargs):
             # Create the values from the function wrapped:
             y = func(*args, **kwargs)
             # Now generate noise to add to those values from the noise function
             # provided to the decorator:
             noise = noise_func(y, *noise_args, **noise_kwargs)
             return y + noise
         return wrapper_noise
     return decorator_noise

The main action happens in the innermost function, where the decorated function is used to create a set of data points, y, the given noise function is used to create a variable noise, and then y and noise are added together to give the output result. You can then use this decorator on a function at definition time like so (assuming you already have a function called gaussian which takes the data array plus a parameter sigma and does the appropriate calculations):

 @add_noise(gaussian, sigma=10)  
 def generate_line_1d(x, m, b):  
   ... code ...

This would mean any time you called the generate_line_1d function its output would be modified by the addition of Gaussian noise drawn from a distribution with a standard deviation of 10. If you instead wanted to define multiple instances of generate_line_1d with, say, different values of the sigma parameter, you could do the following:

 gaussian_10 = add_noise(gaussian, sigma=10)(generate_line_1d)  
 gaussian_20 = add_noise(gaussian, sigma=20)(generate_line_1d)  
 ...

And so on and so forth. These various returned functions would be analogous to the \(f(x)\) defined above. You could also switch out the gaussian function for another function, which could itself take an arbitrary number of arguments.
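To make that concrete, here's a small self-contained sketch. The gaussian and constant_offset noise functions and the body of generate_line_1d are stand-ins I've assumed for illustration (the real ones aren't shown in this post), and they work on plain floats rather than arrays so that only the standard library is needed. The seed parameter is likewise an assumption, added so the result is repeatable.

```python
import functools
import random


def add_noise(noise_func, *noise_args, **noise_kwargs):
    """Decorator factory: add noise_func(y, ...) to the decorated function's output."""
    def decorator_noise(func):
        @functools.wraps(func)
        def wrapper_noise(*args, **kwargs):
            y = func(*args, **kwargs)
            return y + noise_func(y, *noise_args, **noise_kwargs)
        return wrapper_noise
    return decorator_noise


def gaussian(y, sigma, seed=None):
    """Hypothetical noise function: one Gaussian draw with standard deviation sigma."""
    return random.Random(seed).gauss(0.0, sigma)


def constant_offset(y, offset):
    """Hypothetical, fully deterministic 'noise': always the same offset."""
    return offset


def generate_line_1d(x, m, b):
    """Assumed body for illustration: the value of the line y = m*x + b."""
    return m * x + b


# Same underlying function, different noise functions and parameters.
noisy_10 = add_noise(gaussian, sigma=10, seed=1)(generate_line_1d)
shifted = add_noise(constant_offset, offset=0.5)(generate_line_1d)

print(generate_line_1d(2.0, 3.0, 1.0))  # 7.0
print(shifted(2.0, 3.0, 1.0))           # 7.5
print(noisy_10(2.0, 3.0, 1.0))          # 7.0 plus one seeded Gaussian draw
```

Because the extra arguments are simply forwarded, swapping gaussian for constant_offset needed no change to the decorator itself.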

Looking at it now, it feels less useful in the specific context I'm using it in than it seemed when I was writing it, but I'm still proud of it—it's basically more flexible and abstract than I really need, but it's still a pretty neat trick of abstraction. Come the middle of this year I'll have been using Python for a decade now, and I'm still learning new tricks and features. And hey, I might not necessarily need it now, but you never know when it might come in handy down the road! A hui hou!

Saturday, November 23, 2019

Hawaiian Names in Astronomy, or, Fun with Regular Expressions!

I'm currently working on a post for Astrobites about the pronunciation of Hawaiian names due to the fact that there are a few Hawaiian names already used in astronomy and will likely be more in the future, so I figured it would be good for astronomers to be able to pronounce them correctly. I already knew of a few Hawaiian names used in astronomy, but it got me wondering if there were any I didn't know about. For reference, the ones I knew about beforehand were:

Haumea, a likely dwarf planet on the outskirts of the Solar System (and its two moons Hiʻiaka and Nāmaka),
Laniakea, the name of the galaxy supercluster to which our own Milky Way belongs,
and ʻOumuamua, the first known interstellar interloper.

I wondered, while writing, if there were additional Hawaiian names among the named minor bodies in the Solar System (besides Haumea), so I decided to search for them using the power of Python and regular expressions.

Regular expressions, in computing terms, are a sort of meta-language used for matching specific patterns in text. There's no official standard for them, but there are a few informal standard “dialects” which many different languages (including Python) adhere to. At their simplest, a regular expression can be a literal search—for instance, in the sentence:

“The quick brown fox jumps over the lazy dog,”

I could make a regular expression to match the single, literal word “fox.” You'll be familiar with this if you've ever tried to search for something in a text file. However, the power of regular expressions comes from their ability to search for more abstract combinations of letters (or numerals, or punctuation). For instance, I could also make a more complicated regular expression that matches “any group of three letters, surrounded by whitespace or punctuation, where the middle letter is a vowel and the outer letters are consonants,” which would match fox and dog, but not the.
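For the curious, here's roughly what those two searches could look like with Python's built-in re module. This is only a sketch of one way to write them; the consonant ranges are my own spelling-out of "any letter that isn't a vowel."

```python
import re

sentence = "The quick brown fox jumps over the lazy dog,"

# The simplest case: a literal search for the word "fox".
literal = re.search(r"fox", sentence)
print(literal.group())  # fox

# Consonant, vowel, consonant, with a word boundary (\b) on either side.
consonant = "[b-df-hj-np-tv-z]"
three_letter = re.compile(rf"\b{consonant}[aeiou]{consonant}\b", re.IGNORECASE)
print(three_letter.findall(sentence))  # ['fox', 'dog']
```

"The" and "the" are skipped because their middle letter, h, isn't a vowel.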

The specific syntax is fairly complicated, so I won't go into it here. Instead, I'll walk through the conceptual process of how I used regular expressions to find Hawaiian names in the IAU list of named minor Solar System bodies.

I first visited this IAU webpage, which has a conveniently-alphabetized list of all 21,922 named minor bodies, saved the list to a text file, and read its contents using Python.
I then took advantage of the fact that Hawaiian orthography is quite regular (no pun intended). All words in Hawaiian are made of one or more syllables, which are composed of exactly one vowel or diphthong (two vowels), and which may optionally have exactly one consonant at the start. We can be smart about it by only looking for consonants which actually appear in Hawaiian (h, k, l, m, n, p, w, ʻ), and we can also exclude any words which have the same vowel repeated twice in a row, since those don't show up in Hawaiian words. (They do show up in Hawaiian words when the ʻokina isn't written, but I figured any Hawaiian names in the list would be recent enough that the person naming them would have taken care to use the correct letters, so I decided not to worry about potentially missing some this way.)
This returned 288 matches, and is also where I ran out of clever tricks. Unfortunately, there are a lot of names in the list which could be Hawaiian names, but aren't (as far as I know). For instance, the first result, by numerical order on the list, is asteroid 32 Pomona. This could be a perfectly fine Hawaiian name, but it's not; it's the name of a Roman goddess of fruit trees (at least, I know it was named for the Roman goddess; I can't actually say that Pomona doesn't occur as a name in Hawaiian somewhere). At this point I sifted through the remaining names in the list and checked each one on the IAU website, which, conveniently enough, includes an explanation of where most of the names come from or who they're named for.
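The filtering step described above can be sketched along these lines. To be clear, this is a reconstruction from the description rather than the exact code I ran, and the helper name looks_hawaiian and the short list of candidate names are made up for illustration.

```python
import re

CONSONANTS = "hklmnpwʻ"  # the only consonants that appear in Hawaiian
VOWELS = "aeiou"

# A syllable: an optional consonant followed by one or two vowels.
SYLLABLE = f"[{CONSONANTS}]?[{VOWELS}]{{1,2}}"
HAWAIIAN_LIKE = re.compile(f"(?:{SYLLABLE})+", re.IGNORECASE)

# The same vowel twice in a row (e.g. "aa") rules a name out.
DOUBLED_VOWEL = re.compile(r"([aeiou])\1", re.IGNORECASE)


def looks_hawaiian(name):
    """Return True if `name` is built purely from Hawaiian-style syllables."""
    return (HAWAIIAN_LIKE.fullmatch(name) is not None
            and DOUBLED_VOWEL.search(name) is None)


candidates = ["Pomona", "Haumea", "Ceres", "Kona", "Vesta", "Kaala", "ʻAkepa"]
matches = [name for name in candidates if looks_hawaiian(name)]
print(matches)  # ['Pomona', 'Haumea', 'Kona', 'ʻAkepa']
```

Note that Pomona sails through, which is exactly the false-positive problem: the pattern can only say a name is spelled like a Hawaiian word, not that it is one.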

Anyway, to make a long story short, I found 15 names of definite Hawaiian origin, in addition to Haumea. (I discarded one, 197708 Kalipona, which didn't have an explanation and which I wasn't sure about.) Here they are, in numerical order:

2202 Pele: named after the Hawaiian goddess of volcanoes. Name given in 1972, which is the earliest I could definitively find.
7613 ʻAkikiki: a critically-endangered honeycreeper native to Kauaʻi.
14764 Kīlauea: named for Kīlauea volcano on Hawaiʻi island. (I notice that although the names I found correctly use the ʻokina, they don't seem to use the kahakō that indicate long vowels, so I'll add them as appropriate when I'm aware of them.)
88297 Huikilolani: the name of the Hawaiian Astronomical Society, “Hui Kilolani,” which translates to “club of sky watchers.”
115801 Punahou: a school in Honolulu.
123290 Mānoa: a valley and residential district on Oʻahu. (Where the University of Hawaii at Manoa is located, I presume.)
136108 Haumea, which we've already seen, but it serves as a good consistency check!
171183 Haleakalā: the largest and tallest volcano on Maui, where a number of observatories reside.
284891 Kona: named for the region on the west side of Hawaiʻi.
342431 Hilo: my favorite place to live!
361267 ʻIʻiwi: a species of brilliant scarlet honeycreeper found in Hawaii.
374710 ʻŌʻō: an extinct genus of Hawaiian birds.
378002 ʻAkialoa: an extinct genus of Hawaiian honeycreepers. (Noticing a pattern yet?)
388282 ʻAkepa: a type of crossbill bird endemic to the Hawaiian islands, though likely extinct on all but Hawaiʻi. (ʻAkepa means “agile” in Hawaiian.)
469219 Kamoʻoalewa: an unusual asteroid which is currently the most stable quasi-satellite of Earth; it orbits the Sun with nearly the same orbit, and never gets too far away. One of two asteroids named by the new A Hui He Inoa program, an initiative for helping get more Hawaiian names into astronomy.
514107 Kaʻepaokaʻāwela: an unusual asteroid which orbits in a 1:1 resonance with Jupiter, but in a retrograde orbit. The other asteroid named by A Hui He Inoa.

So there we have it, a list of all the Hawaiian names I can find currently used in astronomy. If you know of any more, I'd be interested to hear of them—it's possible I could've missed some. It's certainly a lot more than I knew of before I started! A hui hou!

Sunday, October 27, 2019

A Slug and a Shell

In news which will be of no surprise to those following my painting progress, I've gone back and reworked a painting of mine due to dissatisfaction with how it turned out. The painting in question is my blue glaucus painting, which I already reworked to add a shadow. Turns out I was never satisfied with the way the shadow looked, as it came out very gray and solid, so I decided to try redoing it with a thin black glaze today. I used some ivory black, which is the only black I have listed as anything less than “opaque” (it's “semi-opaque”). I think it worked out fairly well; you can judge for yourself below:

The old version…

…and the newly-blackened shadow.

This way the shadow looks a lot more distinct from the slug casting it, which is also quite silvery-gray. It'd probably have been good to make the shadow more out-of-focus from the start, but at this point I think I'm happy with it.

And what's the shell mentioned in the title, you may be asking? In computer terms, a shell is (to massively simplify, since I'm not even sure I understand it completely) essentially a program that you can use to send commands to the computer by typing them into a terminal window. These commands are pretty low-level, and are immensely powerful, to the point where you can easily save yourself hundreds or thousands of actions if you know what you're doing.

Shells are comparatively ancient in computer history terms, so there are a few common ones and several less common variants floating around out there. The most common one overall is called Bash, which came out in 1989, the year I was born, and stands for Bourne Again SHell (as it was written to be a free software replacement for the then-popular Bourne shell, which came out a decade earlier). It comes as the default shell in most Linux distributions and macOS, and it's the one I've got the most experience with.

A few days ago I came across another shell called “xonsh” (pronounced “konsh” in a play on “conch shell;” perhaps the ‘x’ is meant to represent a Greek χ?). It had the intriguing premise of being written in Python, the language I've been using at work for quite a few years now, and being able to execute Python code directly in the shell while still retaining access to familiar Bash routines. In Bash you need to open the Python interpreter directly before being able to type and evaluate Python code, and you lose access to Bash's lower-level functions like being able to change directories or list their contents while doing so (technically Bash simply calls little executable programs for that, but you can't do so directly in the Python interpreter). Xonsh offers an alluring alternative where Python code and lower-level functions can be called and mixed freely, so I installed it on my desktop at home this weekend and intend to give it a try. I actually do run into situations in my PhD where I'd love to be able to execute some Python code at the terminal while using Bash, so it'll be interesting to see if this goes anywhere.
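For comparison, plain Python does ship rough equivalents of those two commands in its standard library, though they're clunkier than typing cd and ls. The sketch below uses a throwaway temporary directory and made-up file names, and has nothing to do with how xonsh itself works.

```python
import os
import tempfile


def list_directory(path="."):
    """Rough stand-in for `ls`: the sorted names of the entries in `path`."""
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as scratch:
    # Create two empty files so there is something to list.
    for name in ("notes.txt", "data.csv"):
        with open(os.path.join(scratch, name), "w"):
            pass

    original = os.getcwd()
    os.chdir(scratch)                # rough stand-in for `cd`
    try:
        listing = list_directory()   # rough stand-in for `ls`
    finally:
        os.chdir(original)           # step back out before the directory is removed

print(listing)  # ['data.csv', 'notes.txt']
```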

The caveat of any non-Bash shell, of course, is adoption; any other shell is not as popular, and therefore won't have as many resources about it online or people familiar with it to ask. I doubt I'll be ditching Bash anytime soon, but maybe I'll be surprised and find I can really replace it entirely. We'll see! A hui hou!

Saturday, May 18, 2019

A Birthday, Collaboration, and the Open Source Process

Yesterday I turned thirty, and this past month I got my first and second “real” pull requests accepted, into the Astroquery module of Astropy.

If you don't understand what I just said, I'm going to need to do some explaining. Let's start with the concept of “open source” mentioned in the title: open source, as used in computing, refers to computer programs where the source code for the program is available somehow for inspection. An open-source program is one where anyone can come along and look at the underlying code, and usually (though it depends on the license) take it, modify it, and use it themselves. Typically it also involves an idea of open collaboration, where anyone can suggest improvements to the code for the benefit of all users.

A “pull request” is one such way to suggest an improvement, using the popular version control software Git (originally written by Linus Torvalds, also the creator of the original Linux kernel). The website GitHub.com hosts vast numbers of Git repositories (the name for a collection of all the source code for a project) and makes it easy to coordinate collaboration from many people around the world. A pull request is a request to the maintainer of a repository to merge (or “pull in”) some changes from another source.

Around a month and a half ago I started using the Astroquery module of the Astropy project (which is a collection of Python code for use in astronomy). The Astroquery module allows you to query various astronomical databases that don't have official APIs; I use it for searching for information about atomic transitions from the National Institute of Standards and Technology (NIST) Atomic Spectra Database (ASD). Anyway, I discovered that there was some information being returned that wasn't being parsed into the returned results, so I made a one-line addition to my local copy of the code (after a little experimentation) which made it work. I figured it might be of interest to other people, so I made a pull request to the maintainers of the package, and after going through the review process it got accepted!

This was more of a feature addition than anything, but a week or so later I discovered an actual bug in the handling of certain Unicode characters present in the database. (The dagger character [†] was being written as an HTML multi-character code which broke the fixed-width formatting that was being performed on the query results.) This required a little more detective work to figure out, and some back-and-forth with the package maintainers on what a good fix would look like, but I found a simple, effective fix and submitted a pull request for that as well. This time the process was slightly more involved, as I wrote an automated test to cover the situation and a change log entry for the issue I'd raised regarding the bug, but after another week or so this one got accepted as well.
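To illustrate the kind of breakage involved, here's a toy reconstruction. The row contents, the column widths, and the split_fixed_width helper are all invented for the example, and the html.unescape call is just one possible way to handle it, not necessarily the fix that went into Astroquery.

```python
import html

WIDTHS = (8, 6)  # made-up column widths: an 8-character label, a 6-character value


def split_fixed_width(line, widths):
    """Cut `line` into consecutive fields of the given widths, stripping padding."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# What the row is meant to look like, and what arrives when the dagger
# is sent as the eight-character HTML code "&dagger;" instead.
clean_row = f"{'3d5†':<8}{'12.5':>6}"
raw_row = clean_row.replace("†", "&dagger;")

broken = split_fixed_width(raw_row, WIDTHS)
fixed = split_fixed_width(html.unescape(raw_row), WIDTHS)

print(broken)  # ['3d5&dagg', 'er;']
print(fixed)   # ['3d5†', '12.5']
```

One extra-wide code is enough to push every later column out of position, which is why a single character caused so much trouble.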

I've long admired the idea of open source, of people around the world giving of their time and creativity to improve software freely available to everyone, and it's a great feeling to finally be part of it myself. A person's contributions to open source projects can also look good on a résumé (they show you can code and work as part of a team), so there are practical benefits too. I don't know what form future contributions might take, but I'd definitely like to continue contributing as my knowledge and skill allow. A hui hou!