IronPython Bytecode Interpreter

One of the things that people would really like to have on IronPython is support for pre-compiled Python (.pyc) files. These files are pre-parsed and converted into the bytecode format that the Python interpreter actually runs. This speeds execution up for a couple of reasons:

  1. The application does not need to parse the source code prior to execution
  2. The operations are usually at least minimally optimized already by the compiler
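
As a rough illustration of what a .pyc file saves, the sketch below uses CPython's standard compile function and marshal module to turn source into a code object once, serialize it, and later run it without parsing again. The file name and source text are made up for the example, and a real .pyc file also carries a small header in front of the marshalled code, which is left out here.

```python
import marshal

# Hypothetical module source, used only for illustration.
source = "answer = 6 * 7\n"

# Parse and compile once; this is the work a .pyc file caches.
code = compile(source, "example.py", "exec")

# A .pyc stores the marshalled code object after a short header
# (the header is omitted in this sketch).
payload = marshal.dumps(code)

# Later: load the bytecode and run it with no parsing step.
namespace = {}
exec(marshal.loads(payload), namespace)
print(namespace["answer"])
```

A bytecode interpreter picks up at the last step: it receives the unmarshalled code object and executes its instructions directly.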

I was intrigued by Python bytecode; it looked like a fun little thing to play around with, so I developed very minimal, early support for bytecode interpretation in the IronPython codebase. You can see my commits at my fork of the IronPython repository on GitHub. There is still a LOT of work that needs to be done, but there are a lot of smarter people out there who can take it and run with it once the basic framework is in place.

The commits above also add support for the ever-important __hello__ and __phello__ imports, one more thing that makes IronPython more compatible with CPython.

The Greatest Classical Concert of All Time

This is not an historical account of any concert I have ever been to. It is more like a wishlist for the perfect concert of classical music that I can think of. These pieces are in no particular order.

Respighi – Pines of the Appian Way

This conductor is a bit overboard, but fun to watch.

Dvorak – New World Symphony

I can listen to this full piece over and over and over. The Largo (movement 2) is one of my favorite pieces of music ever written.

Tchaikovsky – 1812 Overture

I played this piece in high school symphony, and it has left a lasting impression on me. There are several parts where the trombones rest for what seems like hundreds of measures, so the other trombone players and I came up with a nice story about a little Russian woman whose village is razed while she is away and who goes on a quest for vengeance. Along the way she meets up with an Indian snake charmer and some other interesting people. The cannons at the end are her triumphal attack on the city of the people who killed her family. Don’t judge me, it was a long time to rest.

Tchaikovsky – Marche Slave

This is a piece that was done by the same symphony above the year before I was in high school. My sister was in choral groups and all the musical groups did a big concert, so I got to hear this piece and have loved it since.

Edvard Grieg – In the Hall of the Mountain King

One of the first classical pieces I remember being exposed to. I am not sure where or when, but I’ve always liked it.

Aaron Copland – Fanfare for the Common Man

This piece has been used in movies a lot, but I don’t think it’s been overused. It’s still a moving piece to me. I love the combination of the percussion and brass. Simple.

There are other pieces I could add to this perfect concert, but I’m not going to.

Prototyping and Testing Groovy Email Templates

One of the features people have requested for the Jenkins email-ext plugin is an easier way to test their templates for Groovy-generated email content (see JENKINS-9594). The email-ext plugin supports Groovy’s SimpleTemplateEngine for generating the email body (and other areas). I haven’t had the chance to implement this feature yet, but I found a fairly easy way to test templates against a previous build from the Jenkins Script Console. The Groovy code below gets a project that you specify by name, creates a copy of it, and then performs the build step for the ExtendedEmailPublisher with the previous build you want to test with. It even prints the build log output from the ExtendedEmailPublisher run. You can change anything you need to on the ExtendedEmailPublisher before calling perform. This is not the final solution; I still plan on implementing this feature when I get the time to look into it more.

import hudson.model.AbstractBuild
import hudson.model.StreamBuildListener
import hudson.plugins.emailext.ExtendedEmailPublisher
import java.io.ByteArrayOutputStream
import jenkins.model.Jenkins

def projectName = "SomeProject"
def project = Jenkins.instance.getItem(projectName)

// declared outside the try block so the finally block can see it
def testing = null
try {
  // work on a copy so the real job's configuration is never modified
  testing = Jenkins.instance.copy(project, "$projectName-Testing")
  def build = project.lastBuild
  // or def build = project.lastFailedBuild
  // see the javadoc for the Job class for other ways to get builds:
  // http://javadoc.jenkins-ci.org/hudson/model/Job.html#getLastBuild()

  // capture the publisher's log output in memory
  def baos = new ByteArrayOutputStream()
  def listener = new StreamBuildListener(baos)

  testing.publishersList.each { p ->
    println(p)
    if(p instanceof ExtendedEmailPublisher) {
      // modify the properties as necessary here
      p.recipientList = 'me@me.com' // set the recipient list while testing

      // run the publisher
      p.perform((AbstractBuild<?,?>)build, null, listener)
      // print out the build log from ExtendedEmailPublisher
      println(new String( baos.toByteArray(), "UTF-8" ))
    }
  }
} finally {
  if(testing != null) {
    // cleanup the test job
    testing.delete()
  }
}

Update: Thanks to Josh Unger for a few updates to the above to make it more robust.

Groovy ‘def’ Jam

As a way to blow off some steam, I like to contribute to open source software. You might ask why a software developer who spends his entire day writing code would want to spend his free time writing more software. I honestly don’t know the answer to that question. I find something enjoyable in giving something back, or something along those lines.

Anywho, one of the projects that I contribute to and have talked about on here before is the Jenkins continuous integration server. Jenkins uses an MVC model for displaying webpages and interacting with the API of the application, and the views are created using Jelly. I will be completely honest, I hate creating and using Jelly views. Whoever thought up “executable XML”…

I found out that you could also do views using Groovy, so basically you just use scripting when you want and use the tag libraries like you would from Jelly, but get this, it doesn’t suck!

I wrote a little utility to convert Jelly views to Groovy views, because when I found out I could convert the views in the email-ext plugin to Groovy from Jelly, I wanted to do it immediately, but who wants to convert by hand! We’re software developers, we don’t do things by hand. We’ll spend twice as long to write a tool as it would take to do it by hand, but then, by George, if we have to do it again, it will take milliseconds!

So, the tool is strangely called jelly2groovy, because you have to use that naming format when writing a tool like this, just in case you didn’t know that. I was converting the views in the email-ext plugin from Jelly to Groovy and the following code was generated.

f.entry(title:_("Default Subject"), help: "/plugin/email-ext/help/projectConfig/defaultSubject.html") {
    if(instance.configured) {
        input(name: "project_default_subject", value: instance.defaultSubject, class: "setting-input", type: "text")
    } else {
        input(name: "project_default_subject", value: "$DEFAULT_SUBJECT", class: "setting-input", type: "text")
    }
}
}

The thing to notice in that code is the double curly brace at the end. I only have one place that outputs a closing curly brace in the conversion script.

if( doOutput && (elem.children().size() > 0) ) {
    out.writeLine("${' ' * indent}}")
}

I set doOutput to either true or false depending on if the tag I am rendering needs it or not (the Jelly choose tag doesn’t need to be rendered, just the when/otherwise children).

So, somehow, doOutput was getting set to true, even though I set it to false inside the check for the ‘choose’ tag element.

Wha?!

The code is basically like this:

doOutput = true
...
if(tag == 'choose') {
    doOutput = false
}
...
// iterate over children by calling the current method recursively
if(doOutput...) {
    // generate the closing curly brace
}

Not very complex. It turns out, though, that there is a subtle issue with the way I wrote the code, and it all lies in a three-letter keyword: ‘def’.

You can read a full description of the meaning of ‘def’ if you would like to do so, but it boils down to the following: NOT putting ‘def’ in front of a variable definition in a Groovy script makes the variable behave almost as if it were global, while putting ‘def’ in front of the declaration restricts its scope to be local. Without the ‘def’ in front of doOutput, every call to the method shared the same variable. When I called the method recursively and the recursive call set doOutput to true, it still held that value once control returned from the recursive call, and so the ending curly brace was rendered.
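
The effect is easy to reproduce outside Groovy. The sketch below is only an analogy in Python, not the jelly2groovy code: a module-level flag accessed with global stands in for a Groovy script variable written without ‘def’, and a plain local variable stands in for one declared with ‘def’. The node helper, the function names and the sample tree are all invented for the example.

```python
# Analogy only: this is not the jelly2groovy source.
do_output = True  # shared flag, like a Groovy script variable without 'def'


def node(tag, *children):
    """Build a tiny stand-in for a parsed Jelly element (hypothetical shape)."""
    return {"tag": tag, "children": list(children)}


def close_braces_shared(elem, out):
    global do_output
    do_output = elem["tag"] != "choose"   # 'choose' itself renders nothing
    for child in elem["children"]:
        close_braces_shared(child, out)   # overwrites the shared flag
    if do_output and elem["children"]:
        out.append("}")
    return out


def close_braces_local(elem, out):
    do_output = elem["tag"] != "choose"   # local, like 'def doOutput'
    for child in elem["children"]:
        close_braces_local(child, out)    # cannot touch the caller's flag
    if do_output and elem["children"]:
        out.append("}")
    return out


tree = node("entry",
            node("choose",
                 node("when", node("input")),
                 node("otherwise", node("input"))))

print(len(close_braces_shared(tree, [])))  # 4: one brace too many, for 'choose'
print(len(close_braces_local(tree, [])))   # 3: when, otherwise, entry
```

In the shared version the children of ‘choose’ flip the flag back to true before ‘choose’ gets to check it, which is the same path that produced the extra closing brace.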

Once I figured that out, I added ‘def’ in front of some key variables, and things worked perfectly.

jelly2groovy is now working on several tags and does a good job of converting things over. Obviously there are still tags I don’t handle and things I haven’t tried yet (taglibs!), but it’s coming along nicely.

Jenkins – Standalone Build Generator

I’ve blogged before about how we use Jenkins at work for our continuous integration solution. One thing that our previous CI solution had was the ability for developers to run a standalone version of the tool on their development PCs to check out large-scale changes that might break several applications. This is much more difficult to do with Jenkins, mainly because we are using Rational ClearCase for SCM. We could use the Jenkins server to build from individual development streams if we wanted to, but that would require all the files to be checked in before they could be built.

I came up with a Groovy script that runs after each Nightly Build, collects the jobs that are currently in Jenkins and generates a standalone zip file. Developers can download the zip file and launch a local instance of Jenkins to do a build from their view. The script is used as a Groovy build step.

Jenkins – Jelly to Groovy

At work we use Jenkins for our continuous integration setup. As I have mentioned previously I really, really like Jenkins (I blogged about Hudson previously, but we moved with the fork to Jenkins since it is more community driven).

I took over as the maintainer for the email-ext plugin, which allows you to configure the emails sent for failures and other build results to a much higher level than the default Mailer. You can have different triggers for different statuses, you can include various pieces of information in your email templates. You can even use scripts to generate the emails. See the wiki page above if you are interested in more information.

Jenkins uses MVC for displaying web pages and interacting with the system. You create a view template in either Jelly, which is an “executable” XML format, or Groovy, which is a scripting language for the Java Virtual Machine (JVM). The Jelly format is VERY painful to debug when something goes wrong. Errors are not easy to track down and it is VERY painful to do some things (like call methods on objects and define variables and conditionals and…well pretty much everything). Groovy, on the other hand, is very nice to work with. You have basically a full scripting language to use to your advantage. Conditionals, object creation and variables are all just as easy as if you were writing a simple script (which you are!).

I wanted to start migrating the email-ext plugin to use Groovy views, because I think it gives a lot of power when trying to do things with the Jenkins API. I hand ported one view and it didn’t really take that long, but as most software people realize at some point when dealing with XML, the computer could be doing this for me! I spent about 20 minutes or so writing this initial version of a Jelly to Groovy converter. For very simple views, it works great. Feel free to fork it on GitHub and send me a pull request with updates. Hopefully it will be useful to someone else.

https://github.com/slide/jelly2groovy

Migrate CodePlex Issues to GitHub Issues

The IronPython project is looking at moving to GitHub and using it for all project information: downloads, issues, wiki, etc. The main problem is that IronPython currently resides on CodePlex and CodePlex, sadly, does not provide an API for accessing anything. This means we need to use screen scraping to get the job done on the CodePlex side. On the GitHub side, they have a wonderful API that is very well documented and has libraries for many languages. BeautifulSoup is a library I have previously used for screen scraping from Python and it was a great experience; it’s a simple-to-use library.

Some goals for the script based on feedback from the project:

  1. Maintain history (comments) as much as possible
  2. Maintain component notations
  3. Maintain releases
  4. Migrate both open and closed issues
  5. Migrate attachments if possible

When doing screen scraping, I really like to use the developer tools from whatever browser I am using (usually Chrome) in order to make viewing the source and finding patterns in the HTML easier. I decided to scrape the information I needed in a couple different steps. I could get some of the information from the list of issues, but then I would also need to go to each individual issue page and scrape information from there.

I decided to use the Advanced view for the bug tracker on CodePlex because it had a lot of information that I could pull out right from the get go.

[Screenshot: IronPython advanced view for issues on CodePlex]

You can see that we can get information like ID, Title, Status, Type, Priority and last update (though the last update wasn’t really useful). It was also possible to grab the link for the specific issue for use later.

One thing I did when writing the script was set up the filters and sorting the way I wanted prior to grabbing the soup, and then I used the direct link that can be found on the page to get the issues in the order I really wanted.

As you can see from the screenshot below, if you use the “Inspect Element” in Chrome it will show you the structure for each row in the list of issues.

[Screenshot: row information from the advanced view]

Each row of the advanced view has several pieces that we can pull out, and each row starts with “row_checkbox_”, which makes it very easy to loop through each row using BeautifulSoup.
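
To give a feel for the row loop, here is a minimal sketch. It swaps in the standard library's html.parser for BeautifulSoup so that it runs without any extra packages, and it assumes that “row_checkbox_” is the prefix of the id attribute on each tr element. The sample HTML and the RowCollector class are invented for the example and are not taken from CodePlex or from the migration script.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the ids of table rows whose id starts with a given prefix."""

    def __init__(self, prefix="row_checkbox_"):
        super().__init__()
        self.prefix = prefix
        self.row_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        element_id = dict(attrs).get("id") or ""
        if tag == "tr" and element_id.startswith(self.prefix):
            self.row_ids.append(element_id)


# Hypothetical markup, shaped only loosely like an issue list.
sample = """
<table>
  <tr id="header"><th>ID</th><th>Title</th></tr>
  <tr id="row_checkbox_101"><td>101</td><td>First issue</td></tr>
  <tr id="row_checkbox_102"><td>102</td><td>Second issue</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(sample)
print(collector.row_ids)
```

With BeautifulSoup the same selection would probably be a single find_all call with a regular expression on the id, which is what makes the prefix so convenient.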

Each row can have information about who the issue is assigned to and whether it’s currently closed, as well as a link to the individual issue page that we will need later. I grabbed all this info and put it into a SQLite database so that I could update it once I parsed the individual issue page.

GitHub treats the severity and type of issue as labels, so I add the severity and type to the issue_to_label table with a foreign key into the issues table, which makes it easier to add all the necessary labels later. CodePlex will only show up to 100 items per page, so I regenerate the direct link with info on which page I want and parse each page to get all the issues.
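
The two tables could look something like the sketch below. Only the table names issues and issue_to_label come from the script; the column names, the sample row and the in-memory database are assumptions made for illustration.

```python
import sqlite3

# In-memory database for the sketch; the real script would use a file.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Column names are hypothetical; only the table names are from the post.
conn.executescript("""
CREATE TABLE issues (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    status TEXT,
    link   TEXT
);
CREATE TABLE issue_to_label (
    issue_id INTEGER NOT NULL REFERENCES issues(id),
    label    TEXT NOT NULL
);
""")

# One made-up issue with its severity and type stored as labels.
conn.execute(
    "INSERT INTO issues VALUES (?, ?, ?, ?)",
    (101, "Example issue", "Open", "https://example.invalid/101"),
)
conn.executemany(
    "INSERT INTO issue_to_label VALUES (?, ?)",
    [(101, "High"), (101, "Issue")],
)

# Later, when creating the GitHub issue, fetch every label for it at once.
labels = [
    row[0]
    for row in conn.execute(
        "SELECT label FROM issue_to_label WHERE issue_id = ? ORDER BY label",
        (101,),
    )
]
print(labels)
```

Keeping labels in their own table means an issue can carry any number of them without changing the issues table.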

Now that I have all the issues in a database, I select them all and iterate through them to parse the individual issue pages to grab all the information.

One thing to note here is that I actually used a different HTML parser for BeautifulSoup in different parts of the script. When parsing the Advanced View, I used “html5lib”, but while parsing the individual issues, I used “html.parser”. The reason is that each parser treats uncompleted tags differently: html5lib adds tags to make up for missing ones, much as a browser would, while html.parser does not. The HTML generated by CodePlex had some weirdness in the area of the issue descriptions, so using “html.parser” cleared some of those problems up and made the soup easier to work with.

While parsing each issue, there were four main areas that I wanted to get information from:

  1. Description
  2. Attachments
  3. Comments
  4. Metadata

[Screenshot: areas of interest on a CodePlex issue page]

The description was pretty straightforward. I looked at the HTML for that area and found the following:

[Screenshot: HTML for the CodePlex issue description area]

This was pretty easy to grab from the soup, but then I had the problem that the description content can contain markup (bold, italics, etc.). So, I decided to use the html2text module to convert the description into valid Markdown that could be used directly on GitHub.

The attachments were also pretty easy; each one had a specific id that could be pulled out using BeautifulSoup.

Comments were a little trickier, since they had several bits of information that I was interested in. I wouldn’t be able to keep the original commenter as the author of the comment on GitHub, but I wanted to record when the comment was made and who made it, and add these items as comments on the GitHub issues in the order they were made on CodePlex.

As you can see, once you understand how the HTML is put together, it is pretty easy to pull out the information you are interested in. Even though CodePlex doesn’t have an API, they do put a lot of information into the HTML of the issues that can be parsed out.

The metadata area was also fairly well structured. It is just a table contained within a div with the id “right_side_table”, so looping through the tr elements and pulling out the info is a piece of cake again.
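
A sketch of that loop follows, again using the standard library's html.parser in place of BeautifulSoup so that it has no dependencies. The div id “right_side_table” is from the page; the sample markup, the field names in it and the MetadataParser class are assumptions for illustration only.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the cell text of every table row inside one target div."""

    def __init__(self, div_id="right_side_table"):
        super().__init__()
        self.div_id = div_id
        self.div_depth = 0       # > 0 while inside the target div
        self.in_cell = False
        self.current_row = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # enter the target div, or track a nested div inside it
            if self.div_depth or dict(attrs).get("id") == self.div_id:
                self.div_depth += 1
        elif self.div_depth and tag == "tr":
            self.current_row = []
        elif self.current_row is not None and tag in ("td", "th"):
            self.in_cell = True
            self.current_row.append("")

    def handle_data(self, data):
        if self.in_cell and self.current_row:
            self.current_row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.current_row is not None:
            self.rows.append(self.current_row)
            self.current_row = None


# Hypothetical markup; the real page has more fields and more nesting.
sample = """
<div id="other"><table><tr><td>ignored</td></tr></table></div>
<div id="right_side_table">
  <table>
    <tr><td>Status</td><td>Active</td></tr>
    <tr><td>Component</td><td>Compiler</td></tr>
  </table>
</div>
"""

parser = MetadataParser()
parser.feed(sample)
metadata = {row[0]: row[1] for row in parser.rows if len(row) == 2}
print(metadata)
```

Turning the rows into a dictionary makes it simple to pick out the fields that update the issue and to append the rest to the description.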

Some of the metadata was used to update fields for the issue itself, but the rest were added to the description under a header “Work Item Details” to maintain the history of the information when the issue was moved from CodePlex to GitHub.

Once all the data was put into the database, it was pretty easy to import into GitHub using the PyGithub module. The one bad thing about this module is the lack of good documentation. I had to figure a few things out by just looking at the source code as well as looking at the GitHub API documentation to see what was possible with the different API calls.

Since the GitHub part is really easy to comprehend and the majority of this article was to talk about screen scraping, I will just provide the code for the script in the gist below.

The end result of the imported issues list can be seen below on a practice run on GitHub.

[Screenshot: issues after being imported to GitHub]

The severity (high, medium, etc.), the type (task, feature, etc.) and the component are all turned into labels with nice color coding on some of them.

The script migrates any plaintext attachments over as Gists and then puts a link to the Gist in the description area. Binary attachments are left on CodePlex and linked to directly. It would be better to have everything in one place, but GitHub doesn’t really have a good way of attaching binary items to tickets (or any attachments at all in fact).

The full script can be seen here, feel free to fork and make improvements. I’d love to see any improvements you have made via pull requests.