26/10/2011

Weather in iGoogle

I've been using iGoogle for a while now. For those not in the know, this personalises the Google homepage and allows you to add "gadgets". Since I've started using Chrome as my browser I tend to Google search from the address bar anyway, but I still find iGoogle pretty handy.

I use it as a way to amass all the stuff I tend to look at in one session. Currently I have three different tabs:

  • News - Local and world affairs and specific technology news
  • Comics - Webcomics that I like to read
  • Weather - Local weather forecasts so I know if I need a coat
These are all areas that I tend to want to update myself on regularly - approximately once a day. So instead of trawling through various bookmarked pages I can accumulate all the info in one convenient page.

This works out really well and I've found there are specific gadgets for most of the pages that I want to visit. Some of the news-related ones didn't have gadgets, but I was able to use their RSS feeds to display them in an RSS reader gadget.

The only real pain was that some of the weather forecast sites that I wanted to look at didn't have gadgets - most notably Metcheck.com. I really like the way they display the forecast so I set about trying to get it onto an iGoogle page. The 'official' way to do this, I guess, would be to make a specific gadget; however that seemed like it would be a lot of effort and learning time. So I went with a different option.

I used a gadget that allows the inclusion of html. Then I wrote some HTML to include a portion of another website. This allowed me to trim the site down to just the specific window I wanted. For example:
<style type="text/css">
  /* collapse the container so only the offset window onto the page shows */
  #container { width: 0px; height: 0px; }
  /* size the window; the negative margins shift the wanted region into view */
  #container iframe { width: 780px; height: 590px; margin: -180px 0px 0px -240px; }
</style>
<div id="container">
  <iframe scrolling="no" width="0" height="0" src="https://www.netweather.tv/index.cgi?action=uk7dayx7;page=1;ct=19121~Manchester;sess=#forecast"></iframe>
</div>

This includes the portion of netweather.tv for my specific region. I was previously using Metcheck but have recently switched, as their results page has been regularly giving errors.

There might be an easier way of doing things, and for other websites the numbers in the HTML will need fiddling with so that the right portion of the page is in the window, but this technique seems to work pretty well for now.

28/09/2011

Future documentation methods

Coming towards the end of the first year of my PhD and spending some time writing up my progress so far has led me to muse over the nature of report writing and ask the question: "surely there must be something better"...

Whilst the old adage "if it ain't broke don't fix it" might well apply here, I can't help but think that in the 21st century of immersive 3D virtual reality game playing, home 3D printing, and everyone carrying at least one state of the art electronic device about their person at all times, the concept of a paper report seems a little dated.

Here are a few examples to try to illustrate my point:
  • If there is a book and a film of the same story (or even a webpage and a youtube video) I will inevitably look at the film first, as it will convey the information to me far faster, and with less effort, than reading the book.
  • If I have a choice between a photo or drawing of an object and a 3D model (either manipulable on screen or available to touch) I would get a better understanding of it from the 3D model.
  • If I want information on a specific subject then I turn to Google/Wikipedia before I head off to the library.
I don't think these are examples of me being weird, they are simply illustrations of modern life making information more readily accessible. I'm sure you could argue over semantics ("the book contains more detail than the film", "library books have a more systematic review process than google hits", etc) but I hope you can accept my general point.

It therefore seems strange to me that a piece of work, perhaps costing thousands of pounds and many hundreds of man hours, should be presented in such a one dimensional format as a printed report. Here is a summary of what I see as the limitations to a printed report:
  1. One dimensionality - sure pictures might take this up to 2D, but all too often there aren't enough of these!
  2. Lack of user interaction - I can't search for a keyword, interactively link to source, or request further detail on a topic.
  3. Page constrained format - diagrams need to fit within a certain width, zooming in is limited by your eyes and printer resolution, and page breaks artificially chop things up.
  4. Visual sense only - My other senses are put on hold, and only serve as a distraction.
So what have people done to improve on this? Here're a few examples that I can think of:
  • Video - a good recent example is this guy's youtube CV
  • Hyperlinking - within sections of a document or out to other documents or web sites
  • Wiki formats - taking linking between sections to the extreme and making progress through the information less linear
  • 3D graphics - starting to be seen more in web pages, an excellent example is Google body
  • Powerpoint - a format often used in place of a standard document. It has many of the same issues; however, users often seem to feel a little less constrained in terms of layout (perhaps this is only due to convention?)
  • Computable document format - this is a really exciting new development that reflects a lot of what I'm describing here
This last concept may or may not take off but I can see what they are hoping it will achieve. Some of its functionality can already be achieved in a pdf (details of how to achieve a lot of them through LaTeX are here) and almost all of it could also be done through HTML and javascript. An interesting discussion on this is given here. In fact the recently developed HTML5, in combination with javascript programming, offers a whole mass of interesting possibilities for the presentation of information. A step towards using HTML5 for what I'm talking about here is Tangle. This is a javascript library that supports the production of "reactive documents", allowing a reader to play with the content of the document.

Another alternative format with a lot of capability is Flash animation; these animations are typically web-based and often allow user interaction. Some basic options for creating these are given here. Although it is a very widely used format, it requires a good level of experience to code. It has also faced quite widespread criticism recently, the most high-profile of which came from Apple, and there is therefore speculation about whether HTML5 will ultimately replace it.

An obvious downside to these types of advanced documentation method is the length of time it takes to actually produce a document. Even when the author has a good knowledge of the specific tool they're using I think it's safe to say that nothing I've mentioned above will be as quick to produce as a simple text document. In fact the more advanced the documentation method - the longer it's likely to take to produce.

I'd love to be able to round off with a recommendation of the ultimate tool or combination of tools that can be used to create the perfect document, but as far as I've seen it doesn't yet exist. Lots of things seem to offer at least part of the solution I'm looking for, but none pull it all together into one great package. So instead I'll do two things, firstly I'll make a few plain points in summary/prediction, then I'll put together a set of use cases that I'd like to see available to the end user of my "ultimate document".

Summary/Predictions

  • The plain printed document is currently in the process of being overtaken by more electronic forms of documentation, which inherently bring a lot more potential to the document itself (hyperlinking and embedded video being two major examples). I would expect this to be a continuing trend (that may eventually even reach formal engineering reporting or even academia!).
  • There is the potential for this to go a lot further than the type of electronic documents seen today with the addition of 3D effects, audio tracks and similar.
  • HTML5 currently seems to offer the most potential for supporting this type of advanced documentation (although the computable document format may also be a candidate if it manages to pick up much of a user base).
  • Very little progress towards this end goal will be achieved until there are good tools for authoring the type of document I'm discussing here.
  • It seems highly likely that viewing of any document of this type will be through a web browser or similar.


Use cases - scenarios that I, as an end 'reader', would like to see supported in this ultimate document format.

  1. User managed detail level - I'd like to be able to look in more detail at sections I'm interested in or know little about, whilst invisibly skipping over the mundane or tedious stuff.
  2. Unconstrained document flow - If I want to read the summary, then the contents, then the conclusions, then the methods, it should be easy for me to work through in that order.
  3. Recommended document flow - If I simply want to be guided through the document, ensuring that I pick up all the important information, then this should also be easy.
  4. User interaction - Where more information could be made available then I should be able to access it. For example I should be able to zoom in on a waveform or rotate a 3D model.
  5. Multiple sense stimulation - practically this is likely to be limited to visual and audio currently (at least until we develop smell-o-vision and feel-o-vision...)
  6. Portability - I want this document to be viewable in as many places as possible, consequently it must be compact and easily openable on a variety of devices (laptops, mobiles, touchpads, e-readers, etc). This might even extend to alternative language/disability support and (somewhat ironically) the ability to print onto plain old paper.


So what have I missed? I'd love to discuss this topic and related areas more so please leave me a comment.
I'd also love to have the time, skills and supervisor buy-in to try presenting my thesis in the manner I've outlined; however I suspect that will remain a pipe-dream...

18/08/2011

Creating Gantt charts

I've had this issue before and it annoyed me. It's still annoying me now!
Gantt charts are one of the types of chart that I have come to accept as a part of life that's not going away, and actually they're not all that bad. They are however difficult to draw.

I (like plenty of other people I've seen) have tried kludging together a Gantt chart in an Excel spreadsheet and it always seems to come out looking pretty horrific. I've also seen them drawn freehand in paint or Powerpoint or similar. Obviously these aren't really a proper solution, so I had a look round to try to find something that would let me manage a simple Gantt chart to plan and track my PhD progress.

Two options looked promising:
Gnome Planner - part of the gnome desktop but also usable under Windows
GanttProject - another open source cross platform piece of software

I started out with the latter and then switched to the former, but to be honest I don't find either that useful. I think the primary issue is that I'm not looking for a tool to perform PERT or CPM or do resource levelling for me. All I want is something that will accept all of my task details in a well contained format and produce a vector graphic of the chart that I can include in a LaTeX document. It's this last part where both of these tools fall down.

The output of these tools is much more of a formal report than it is a nice image. They also seem to really struggle with the multi-year duration of my work. With planner I've had to resort to taking screenshots of the window and including those in reports!

There are packages for LaTeX that support drawing of Gantt charts, but they're a bit too fiddly for the kind of quick changes that I'd like to be able to make. As Planner saves in XML format there must be a method of auto-converting to the LaTeX code...

Any solutions or alternatives out there that I'm missing?

UPDATE: In the end I went with drawing it out using one of the Latex packages. This used all the same data that the Planner XML file had in it, but in a different format - I was sorely tempted to put together a script to do the conversion for me, but in the end I couldn't justify the time required and did it manually. I also had to meddle around a bit to get the Gantt chart to fit on an A4 page - in the end I made it landscape and adjusted the page margins. I think the output from Latex looked a lot more professional, and it would be really useful to have a conversion tool. If there is any interest in how I put the Latex code together then leave a comment and I'll write a bit more about it...
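In the meantime, here's a minimal sketch of the sort of code involved, using the pgfgantt package (one of several Latex Gantt packages; I'm not claiming this is exactly what I used, and the task names are purely illustrative):

\documentclass{article}
\usepackage{pgfgantt}
\begin{document}
% one chart slot per month, twelve months across
\begin{ganttchart}[hgrid, vgrid]{1}{12}
  \gantttitle{Year 1 (months)}{12} \\
  \ganttbar{Literature review}{1}{4} \\
  \ganttbar{Simulation work}{3}{10} \\
  \ganttmilestone{End of year report}{12}
\end{ganttchart}
\end{document}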

16/08/2011

Using a remote Bibtex file

As mentioned previously:
  1. I am using version control software to backup my work
  2. I'm using LaTeX
  3. I don't like duplicating files around my system
  4. I'm working between both Windows and Linux
So I've found myself today writing an end of year report (housed within its own folder in my filesystem) and wanting to pull in some references from my bibtex file (housed within the literature review folder in my filesystem). I could use the copying hack that I previously devised for figures, but that wasn't very elegant.

There are plenty of questions about this out there in internetland, mostly with the following suggestions:
  • Move your bibtex file to within the tex installation root - I can't really do this because of 1. above.
  • Use symbolic links to the original in the local folder - I can't do this because of 4. (Windows doesn't support them)
  • Give a relative filename reference e.g. '\bibliography{../literatureReview/references}'
  • Adding the location to the BIBINPUTS environment variable as described here.
Both of the latter two methods seemed to work for me. I went with the relative referencing for portability between systems. Odd that such a straightforward solution works for references but not for figures!?
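For the record, a minimal document using the relative reference looks something like this (the citation key is just a placeholder):

\documentclass{article}
\begin{document}
Some text citing a source \cite{someKey2011}.
\bibliographystyle{plain}
% relative path to the .bib file, extension omitted
\bibliography{../literatureReview/references}
\end{document}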

27/07/2011

Making a bibtex file from a folder of pdf files

The issue
As I'm going to be writing some big documents with lots of references, I'd be a fool to try to manage these manually; I therefore needed to pick a piece of reference management software. After some browsing I settled on JabRef because: it's free, it's open source, it's lightweight, it's cross-platform and it handles bibtex format natively (which is what I need for it to integrate with latex). It should also link nicely into the Sciplore mind mapping software which I'm using (more about that some other time).

JabRef is basically a database management tool for references that stores its database in bibtex format. It looks like it will work rather well, but unfortunately my first stumbling block is that I already have a folder full of my references in pdf format (~200). This means I'm immediately faced with the big task of going through and adding the details of each pdf individually. There must be a better way...

Someone else asked the same question here. The answer seemed to be that there was no easy way in JabRef, but it could be done in some other reference management software - such as Mendeley. So I could install that as well and export from there for use in JabRef; that seemed like a pain though, especially as you need login details and all sorts for Mendeley.

The solution
Somewhere else cb2Bib was suggested. This looks like an awesome piece of software, almost to the point that I could use it instead of JabRef, although I don't think it does quite the same job. It's designed as a bibtex database manager; however it is more tailored towards reference entry than editing or final use (e.g. citations) - although it can do those too. Its method of adding a new reference is based on what's currently in the clipboard - that's whatever you most recently 'cut' or 'copied' in your operating system. This can either be a piece of text or a pdf file.

Files from the system can also be queued up to be added to the clipboard for addition to the bibtex database - in this manner a folder's worth of pdf files can be added. Once the file is in the clipboard the software interrogates it to try to extract the right details for the bibtex reference entry. It is also able to do some other clever things, like searching the web for a matching reference using only some of the pieces of data it has extracted. There is also the option to manually edit the fields, or to set off a whole run of files to add automatically.

My implementation
In practice the software took a little while to get used to; the buttons aren't in quite the locations I'd expect, there seem to be about 3 different windows that are independent but interrelated, and the method of specifying a bibtex file and then successively saving additions to it felt a little odd (rather than running through to create a file and then saving it all at once). But once I was used to it at that level it all worked.

When I came to actually try to add all of my pre-saved pdfs however, I hit problems. Whilst automatic extraction usually managed to pull out a few nuggets of useful data, it rarely found enough for a complete entry. Hitting the button to search the web didn't seem to give much assistance. So it was time to dig a little deeper.

Probing through the website, there is quite a lot of useful information on how to configure the software to do what you want. What I needed to do was look into where on the web it was searching for my articles. This is all set up in a configuration file located at:
C:\Program Files\cb2bib\data\netqinf.txt (Windows)
or
/usr/share/cb2bib/data/netqinf.txt (Linux) (you'll need permissions, or to be root, to edit)

Wading in there, you can find out where is being searched and in what order. What would have been ideal for me would have been a search of the IEEE Xplore site, as that would have turned up most of my papers. Unfortunately it was not in there. Second best was Google Scholar, sitting at the bottom of the list of options. The documentation in the file wasn't brilliant, but with a bit of trial and error I was able to work out what was going on.

The major change I made to the file was to add this at the top of the queries list:

# QUERY INFO FOR Google Scholar
journal=
query=http://scholar.google.com/scholar?hl=en&lr=&ie=UTF-8&q=<<title>>&btnG=Search
capture_from_query=info:(.+):scholar
referenceurl_prefix=http://scholar.google.com/scholar.bib?hl=en&lr=&ie=UTF-8&q=info:
referenceurl_sufix=:scholar.google.com/&output=citation&oe=ASCII&oi=citation
pdfurl_prefix=
pdfurl_sufix=
action=


journal=
query=http://scholar.google.com/scholar?hl=en&lr=&ie=UTF-8&q=<<excerpt>>&btnG=Search
capture_from_query=info:(.+):scholar
referenceurl_prefix=http://scholar.google.com/scholar.bib?hl=en&lr=&ie=UTF-8&q=info:
referenceurl_sufix=:scholar.google.com/&output=citation&oe=ASCII&oi=citation
pdfurl_prefix=
pdfurl_sufix=
action=

The important changes here are the <<title>> and <<excerpt>> search strings, and the change from capture_from_query=info:(\w+):scholar in the existing scholar searches to capture_from_query=info:(.+):scholar in my search. I'm not too sure what the latter change did (presumably Scholar's info keys can contain characters, such as '-', that \w doesn't match), but its effect was that it found the details - where previously it was often missing them!

The other change I made was to untick the option "Set 'title' in double braces" box in the configuration window. After I'd made these changes it worked a lot more consistently.

Some of the time it still pulled out the wrong details if it mis-extracted the article title; however, as I'd named all my pdfs with the title of the paper, it was simply a case of copying and pasting the filename into the title field and rerunning. It would have been really nice to be able to use the filename of my pdf as part of the search, but unfortunately I couldn't find a way of doing that.

The only other issue I'm having is that although cb2bib adds in the link to the pdf file, JabRef won't understand it, as it uses a very slightly different bibtex format for it. The cb2bib format seems to be:
file = {location}
whereas the JabRef format seems to be:
file = {description:location:type}
There is a comment here by a Mendeley admin that suggests that there is no prescribed format for this aspect of a bibtex file, so I guess it's to be expected. I should be able to work around it with a bit of clever find/replace, but it's an annoyance.
ACTUALLY - this seems to be working under Windows! It looks like a different version of JabRef has gotten around this issue.

UPDATE: After a couple of months of getting used to cb2bib and using it to produce a document I'm not really finding the need to use JabRef at all! The 'citer' facility of cb2bib is actually really good.

UPDATE: I hadn't previously gotten round to extracting from IEEE Xplore, as almost everything is on Google Scholar. However I've just tried to set it up and found that the IEEE pages use javascript buttons to produce the citation. This makes it difficult to fully automate.


If you add the following to netqinf.txt then it should search IEEE Xplore for the title; you can then manually click the "download citation" button, select BibTeX format, and copy the BibTeX citation into cb2bib:

# QUERY INFO FOR IEEEXplore
journal=
query=http://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=<<title>>&x=35&y=7
capture_from_query=arnumber=(\d+)&contentType
referenceurl_prefix=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=
referenceurl_sufix=
pdfurl_prefix=
pdfurl_sufix=
action=browse_referenceurl

14/07/2011

Latex \psfragfig of figures in other folders

What I've spent part of today wrestling with...

I've detailed previously how I'm exporting Matlab plots and including them in Latex documents. This works very well for figure files in the same directory - which is kind of the way Latex is set up. Unfortunately I'd like to maintain my folder structure in a different manner, for example, I have a "results" folder and a "thesis" folder both at the same level. Within "results" I have my Matlab plotting scripts and the images that are automatically saved by them. In my "thesis" folder I have my Latex files, in which I would like to include my results images.

Ideally I would be able to simply reference the image in my Latex document but that's not quite how it worked out. Let me explain what I tried...

Using \graphicspath
This is the most obvious way of doing things as noted here, and it works fine for graphics inserted with the \includegraphics command. I was also able to reference relatively like this: \graphicspath{{../results/}} and apparently it would also look in all the subdirectories if I put a double slash on the end: \graphicspath{{../results//}}.

This would be a great solution if I were using the \includegraphics command - but I'm not. I'm using \psfragfig, and that doesn't respect \graphicspath - I guess because it's not set up to?

There is also a suggestion here that this method shouldn't be used, in favour of adding the images directory within your Latex compiler setup - although I'm not sure this would be appropriate for what I was trying to do.

Using \input, \include or \import
These all do similar things, in that they allow you to bring in latex documents from elsewhere in a folder structure. Only some of them will allow relative linking via higher level directories (rather than just subdirectories). \import seemed to be what I needed (or rather the \subimport variant of it did), so I set up a .tex file containing only my \psfragfig command within the "results" folder and imported that into my document. This got the figure into the document, but it didn't do all the nice text replacement that it was supposed to. In this respect it was no better than the previous technique.

After some experimentation it seems that \psfragfig only looks locally to the master document for the .tex files that it uses to include text with the figures.

Using \write18
So it appeared that the only way I was going to get the \psfragfig command to work properly was by having both the "figure.eps" file and the "figure.tex" file local to the master document. I'm not keen on the unnecessary duplication involved in this, so at the very least I decided to make it automatic. I therefore set it up so that these two files are automatically copied to the local directory during compile. This means that I always have the most up to date version of them in my document, and everything stays nice and automatic.

I think the only way to copy a file around between directories from within a Latex document is by issuing a system level command using \write18. This is usually disabled by default in Latex as it has the possibility of really mucking things up in your system if you were compiling 3rd party code. Therefore I had to compile my code with the extra argument '--shell-escape' as detailed here.

So I want to use the Latex \write18 command to execute a DOS copy command, that's fine; but there is also the tricky issue of needing to use backslashes for the path in the DOS copy command - these are obviously reserved in Latex. Therefore I had to use the technique outlined here to get around this.

So finally I managed to put together a Latex command that works, and it looks like this:
\def\psfragfigremote{\begingroup\catcode 92 12 \execB}% make '\' an ordinary character so the path argument can contain backslashes
\def\execB#1#2% #1 = source directory (with trailing backslash), #2 = figure name
{\immediate
\write18{copy #1#2.eps}% copy the figure's .eps into the current directory
\write18{copy #1#2.tex}% copy the matching .tex file of psfrag labels
\psfragfig*{#2}% include the now-local figure
\endgroup
}

So after including that in my header I can reference figures in other directories in my document like so:
\psfragfigremote{..\results\}{figure}
This copies both figure.eps and figure.tex (generated from Matlab with the matlabfrag command) from the results folder into the current folder and then includes them in the document.

It works fairly well, but not the first time it is run. I'm not sure why this is; possibly it hasn't finished copying the .tex file before starting to execute the next command. Therefore two compiles may be necessary to get all the labels sorted out.

I'm sure there must be a better way of doing this, possibly modifying the \graphicspath and \psfragfig commands so that they play well together would be a neater solution, but this works ok for now.

I hope this is of use to someone. Any improvements or suggestions then please comment.

UPDATE: Replacing the \write18 commands with:
\write18{robocopy #1 . #2.tex #2.eps}%
and removing the star after \psfragfig will only copy the files if they have changed, speeding up processing significantly.
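Putting that update together with the original definition gives something like this (a sketch - I haven't re-tested this exact combination):

\def\psfragfigremote{\begingroup\catcode 92 12 \execB}
\def\execB#1#2%
{\immediate
\write18{robocopy #1 . #2.tex #2.eps}% copies only if the source files have changed
\psfragfig{#2}% unstarred, so unchanged figures aren't reprocessed
\endgroup
}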

11/07/2011

Google Chrome Extensions

A quick post to note down some of the extensions I'm using for Google Chrome. I switched over to Chrome as my browser of choice around a year ago (from Firefox) and I'm still very much enjoying the experience. The extensions I use are probably what make that experience good, so I thought I would give some kudos to them.

AdBlock - Blocks adverts that I don't want to see, brilliant
Add to Amazon Wish List - Allows me to keep track of stuff I'd like to buy
Backspace As Back/Forward for Linux - Does what it says, though not 100% of the time - better than nothing.
Better Gmail (unofficial) - Cleans up my gmail inbox a bit
Context Menu Search - This is awesome, it allows me to highlight text and search for it in the websites I usually use. I use it most for Google Maps and internet shopping (Amazon, eBay and Google Products).
Facebook - Lets me keep on top of that all important social networking.
Google Calendar Checker - Counts down the time to my next event and lets me go straight to my calendar with one click.
Google Calendar ebay reminder - I had to hack this one a bit to make it work, but it adds a link to eBay auction pages allowing me to add the auction end as an event in my calendar.
Mail Checker Plus - An icon showing how many unread messages I have and a quick link to my inbox.
Use HTTPS - Keeps my browsing secure
KB SSL Enforcer - More security stuff
Shareaholic - Lets me email a link to the current page, or makes a shortened URL for it.

As you can see from this list, most of them just cut down the number of buttons I need to press to get stuff done - what can I say, I'm lazy!

Anything else that I might find useful?

07/06/2011

Portable Version Control

Background
I've mentioned previously that I'm version controlling my work using distributed version control software - specifically Bazaar. I explained that part of the reason I'm doing this is because I work at a number of different locations, and I need to be able to manage various different versions of my work and keep them all in sync.
One of the locations I work at is a laptop provided by my sponsor. Unfortunately their corporate system does not allow the installation of unauthorised software, and Bazaar is not on their approved list. I did briefly look into using Subversion (as it was used elsewhere in the company, and linking between it and Bazaar is possible); however it looked like it was going to be too much hassle to justify getting it installed. I've therefore decided to attempt to run my version control purely from flash memory. This will not only allow me to version control my work on my sponsor's laptop, but also on any other machine I decide to use.

Incidentally, it's worth mentioning that I'm storing all my work on the flash memory (micro-SD) card of my phone. I upgraded to an 8 GB card, which should be plenty (at least initially). I always carry my phone anyway, so all I need to retrieve my work at any time or location is a USB cable (or bluetooth if I'm desperate!). I don't treat this as a working copy, but I backup to it regularly and use it to transfer versions between machines. In version control terminology this means I have a 'branch' on each machine I work at and another 'branch' on my phone. I 'push' or 'pull' changes between the phone branch and the local copy branch to backup/transfer the work.

Bazaar is coded in the Python programming language. This is a modern and, by most accounts, pretty handy programming language; however it's not something that I'm particularly familiar with. The main benefit to me is that there is a variant of it available called Portable Python, which is designed to run from a USB storage device (like my phone). This means that it is possible for me to install Portable Python on my phone, and then run Bazaar from that.

I was quite surprised not to be able to find any instructions for this setup on the net anywhere, as it seems like quite a handy proposition (certainly for someone in my situation). The only references that I could find to it were suggestions of using it here, and here. I'm far from an expert on this type of thing, but I've sort of got it working - so I thought I'd publish the steps I took in case anyone has any feedback, or it is of use to someone else.

Unfortunately Portable Python is only Windows-based; however this is where I'm most likely to need it. (There's also a chance that it will work under Linux through Wine, but I haven't really experimented with that yet.)

Python Installation
The current version of Python is 3.2; however I don't think any of the Bazaar source is built on that version, so I went with 2.6 (I'm not sure whether I could have gotten away with a newer version or not?). I downloaded Portable Python from here and copied it to a "python" folder on my phone. I then ran the installer and told it to install to the same folder on the phone.

Bazaar Installation
I tried installing Bazaar from some executables; unfortunately these complained that they needed Python 2.6 in the registry, and gave me no option for manually specifying a Python path. So instead I downloaded a tarball of the Bazaar source from here, which I extracted to the Python "App" directory. I then opened a command prompt, navigated to the Python "App" directory, and ran:
python bzr-2.4b3\setup.py install
(Actually I was still using Linux at this point, and instead ran SPE-Portable.exe under Wine and then ran setup.py through that - I'm pretty sure it had the same effect though.)
This ran for a while and terminated without any errors. Within the bazaar directory a "build" folder had now appeared.

From the command prompt I was then able to run:
python bzr-2.4b3\bzr status C:\test
where C:\test was a directory under version control. This gave the correct response, but it also gave me this warning that some extensions couldn't be loaded. I don't think this is a problem, so I ignored it (the link gives a method for turning off the warning if you're bothered about it).

Bazaar Explorer Installation
I wasn't keen on typing everything through the command line, so I looked into getting bazaar explorer running. The official instructions for this are here.
So I changed to the plugins directory of bazaar:
cd F:\python\PortablePython_1.1_py2.6.1\App\bzr-2.4b3\bzrlib\plugins

and ran bzr with the command to download from Launchpad:
F:\python\PortablePython_1.1_py2.6.1\App\python F:\python\PortablePython_1.1_py2.6.1\App\bzr-2.4b3\bzr branch lp:bzr-explorer explorer
This warned me that I hadn't given a Launchpad ID, but I don't think that matters as I'm not intending to write anything back to Launchpad. Then it went about downloading and building explorer. This process took a little while.

I then tried to run it, but there were a few dependencies that needed fulfilling. Firstly it told me I needed QBzr. So, still in the plugins directory, I ran:
F:\python\PortablePython_1.1_py2.6.1\App\python F:\python\PortablePython_1.1_py2.6.1\App\bzr-2.4b3\bzr branch lp:qbzr
This then went through the same process as with explorer.

On the next try it requested Qt libraries (ERROR: No module named PyQt4). Some googling reveals this, which suggests installing it in a "non-portable" install and then copying selected files across. (I did try a proper install before this but got nowhere*.) I downloaded "PyQt-Py2.6-gpl-4.5.4-1.exe" from here, and installed to C:\Python26. This created a folder in C:\Python26\Lib called "site-packages"; I copied the contents of this to the Lib\site-packages folder of my portable Python install.

I then tried running:
python bzr-2.4b3\bzr explorer
and up it popped!

Unfortunately a lot of the buttons bring up a deprecation warning for commands in plugins\explorer\lib\app_suite.py. This is something I haven't been able to fix yet. My guess is that it's due to a new version of explorer being used with an older version of Bazaar, but that's just a guess. If anyone can help with it then I'd love to hear from you!
It's still useful as a GUI for inspecting the changes and looking at differences; however any major commands need to be made from the command prompt.

Also, this all took a while working from the phone flash card; next time I might consider doing it locally on the hard drive and then copying it across.

Please let me know if you know of any better methods for getting this working, or where I've gone wrong with getting explorer working. Hope this is of interest to someone.
I've also just come across another (easier looking) solution here.



* The copying method sounded like it might be a bit flaky, and I saw some sites indicating that it could be done in a more "conventional" manner, so I got the files for it from here. I also needed "SIP" (I tried it without, but it said no). SIP can be obtained from the same place here, and extracted to the python/App directory. I then changed to the python directory and ran:
python sip-4.12.3\configure.py
After this had created a sip module Makefile, it errored with "unable to open siplib\siplib.sbf" - not sure what this means, so I then tried:
python PyQt-win-gpl-4.8.4\configure.py
This said: "make sure you have a working qt v4 qmake on your path". I was a bit lost then, so I gave up and tried the technique of copying across!

03/06/2011

Simulink to eps

I've mentioned previously that I've developed a process to export plots from Matlab and include them as (eps format) figures in a Latex document. Well today I spent a load of time trying to get a similar method working for exporting Simulink model diagrams from Matlab. This hasn't proved quite as easy, but I think I'm there now.

My main issue was that Simulink doesn't allow you to print in the same way as a normal Matlab figure does. The "print" command can be used, but it works in a very specific way. It is fairly easy to use this command to print in eps format (see my previous posting for why this is a good thing); however it doesn't allow you the same freedom to set up the page. More specifically, it will only allow you to use a pre-defined page size (A4, US legal, etc). It rescales and centres the figure to fill the page but, unless the model has the exact same aspect ratio as a pre-set paper type, you end up with white borders at either the top/bottom or left/right.

EDIT: This seems to have been corrected in newer versions of Matlab now!

I've tried all sorts to get round this, including calling Ghostscript from within Matlab to post-process the eps file. This should work, as it ought to be able to find the right "bounding box" for the figure and resize the page accordingly. However I had no luck with this - it would either not trim it down at all, or over-trim it, removing some of the image.

I also tried exporting in a few other formats (namely normal postscript and pdf) and then converting with Ghostscript. This worked a little better and I was able to get the right bounding box through a process of:
  1. Simulink export to pdf, 
  2. Ghostscript convert pdf-to-eps, 
  3. Ghostscript convert eps-to-eps. 
I have no idea why that last step was necessary! Unfortunately although the output from this was pretty good, the text becomes part of the image (I think during the second step), meaning it doesn't render properly at high zoom levels. I don't know why that happened, I'm pretty sure it shouldn't have, but I thought I could do better.

So an alternative process I finally came up with is:
  1. Find out the ratio of the figure (I did this by exporting as a raster image and then reading back in - I'm not very happy with this technique as it's pretty dirty, but at least it works)
  2. Simulink export to eps, with the model at the very bottom of the page
  3. Automatically edit the resulting eps file to adjust its bounding box information
It's a bit hacky, and if I was being picky the margins aren't perfectly even in the resulting file, but it seems to work. I'm sure there must also be a better way of achieving step 1, but it's sufficient for now. (If anyone has any suggestions then I'd love to hear them!?)

There's nothing very clever in the code, but if anyone wants a copy of my function I'll happily forward it to them. Hope this helps somebody!

EDIT: Code for my function is available here.

19/05/2011

Professional plotting

I've written and read plenty of technical reports in my time, and one major feature that almost all technical reports have in common is the inclusion of figures. Pick up any ten technical documents and I reckon you'll find at least 5 different ways of including figures; and of these 5 only one will actually look any good. So as I start to produce figures for my PhD I've been starting to look into the best way of managing this initially simple sounding task.

When I say "figure" I'm generally thinking of some kind of plot, or set of plots, usually in 2D; however what I'm going to discuss should apply to most other "technical" "pictures", but probably won't extend to photo-type images (for reasons that will hopefully become obvious).

The Issues
I won't bother going into all the different ways a figure and a report might come together, except to say that at the very bottom of the scale would be a 'picture'/screenshot pasted into a Word document - this just looks awful. What I'll work up to will hopefully be the top of the scale.

What I will list is what I see as some major stumbling blocks in figure/report preparation:
  1. Difficulty updating the figure.
  2. Inability to resize the figure to suit the report.
  3. Images (+ text) looking crappy when zoomed in.
  4. Disconnect between the figure labels and the report text.
  5. Difficulty regenerating the figure at a later date.
A lot of these are irrelevant when we're looking at a printed out finished article, so what we really need to understand are the details of how the information is stored and processed on the computer.

Background Details
One of the first important distinctions is the difference between types of image. Most images that one would typically come across on a computer are raster images; these are stored as a set of pixels - imagine a piece of squared paper with each square filled in a different colour. From a distance, or if all the squares are very small, this looks great; however if we zoom in (i.e. make each square larger) then we start to see the joins between edges and everything starts looking "blocky". Most programs handle this by blurring the pixels together slightly, which can help up to a point, but often we just end up with a blurred mess - not what we want in a precise technical report.

The alternative is vector graphics. These are saved more as a description of what needs to be drawn, rather than a point-for-point record of what is on the screen. This means that zooming is purely a mathematical operation, and all the lines will still appear as perfect lines. The same also works for text, which is stored within a vector graphic as actual text, rather than as a picture of it.

There are plenty of graphics explaining this along with a good description in the Wiki pages linked above. But if you're still not sure then try this simple experiment: type a word into a paint program (e.g. Microsoft Paint) and zoom in, and then do the same in a word processing program (e.g. Microsoft Word) - the difference should be pretty obvious.

In summary, unless what you are working with is an actual picture (in which case converting it to vector graphics would be impossible) you will get the best quality by maintaining it in a vector format. There are plenty of these formats to choose from; however I find them to be surprisingly unsupported in a lot of applications. As my final target format is pdf (as mentioned elsewhere in this blog) I'm going to be working with eps and pdf formats. These both rely on postscript as an underlying process and are therefore fairly compatible.

My process (overview)
With all of the above as my aims I've worked out a basic process for generating figures. It seems to be working fairly well so far, so I'll outline it here:

1) Write a script to produce the figure and save it as an eps file. This means that I can always go back and see how each figure was produced (what the original data was, how it was scaled, etc, etc). If the data changes then I can simply rerun the script and a new figure will be produced. If I need the figure in a different ratio or with different colours (or a different label, etc, etc) then I can make some minor changes to the script and rerun it. I keep the script under version control, but not the eps file it produces (as I can always reproduce this if necessary). I use Matlab for this process as it is what I am most familiar with (although I often store the raw data in an Excel or csv file and read this into Matlab). I suspect I could use Gnuplot or something similar instead.

2) Include the eps file in my LaTeX script. This means that when I regenerate the pdf output from my LaTeX it always includes the most recent version of the figure. As it remains as vector graphics throughout the process I get nice clean professional results.

This process solves all of the problems outlined above, except point 4. It is still possible to produce an eps figure from Matlab with "Courier" font and then include it in a Latex document using "Times" font; I find that this looks out of place. I get around it by using a function called matlabfrag, in combination with the pstool package for LaTeX. This means that the final figure picks up the font used in the rest of the document. It also allows me to use full LaTeX expressions in my figures.

My process (in detail)
This may get more refined as time goes by, but currently this is the detailed version of how I would produce a figure:
1a) Write a Matlab script to plot the figure as normal, using standard Matlab commands to plot and label axes, etc.
1b) Within the script include a call to 'saveFigure.m'. This is a function I have created which accepts a file name and optionally a figure size (otherwise some default values are used), resizes the figure and then calls matlabfrag to save it as an eps file (and an associated tex file including all the labels).
2a) In the LaTeX preamble include '\usepackage{pstool}'. This allows the use of the psfragfig command.
2b) Within my LaTeX include the figure in the normal way. However, instead of using the latex '\includegraphics' command, I replace it with the '\psfragfig' command (see the sketch below).
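Put together, the LaTeX side of the process looks something like this ('myResultsPlot' being a placeholder for whatever filename was passed to saveFigure.m):

\documentclass{article}
\usepackage{pstool}% provides \psfragfig; compile with --shell-escape enabled
\begin{document}
\begin{figure}
  \centering
  % picks up myResultsPlot.eps and myResultsPlot.tex as produced by matlabfrag
  \psfragfig{myResultsPlot}
  \caption{A plot generated by the Matlab script.}
\end{figure}
\end{document}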

Notes
I can make my 'saveFigure.m' function available to anyone interested, but it doesn't do much more than I have described above!
I have created a slightly different version of this process for including Simulink models in documents, which I can discuss if anyone is interested?
I spent a little time trying to get psfrag to play well with eps files produced from other packages (e.g. Google Sketchup); however I don't think I've quite got to the bottom of it yet.

17/05/2011

Selecting the correct format

Does format matter?
Today I received an invite to a party in ".doc" format (i.e. a Microsoft Word document) via email. Whilst I was happy to be invited to the party, and the invite very much served its purpose, I can't help thinking that it could have been presented better. Here're some comments I would make:

  1. ".doc" is a proprietary format, which although popular and therefore supported on most peoples computers, can lead to inconsistent formatting or in the worst case a user not being able to open it at all.
  2. It is also what I would consider to be an "editing format", which means it is fine for producing a document, or passing over to someone else for meddling with, but not (in my opinion) for presenting to a recipient. This is for several reasons:
    1. The possibility of the user accidentally editing the document. Say for example the last thing the sender changes is the date of the party, "3/6/2011" - it would be all too easy for me to open the document and then accidentally nudge the zero key, only to turn up to a distinct lack of party on "30/6/2011".
    2. Access to details the sender didn't wish to share. Formats such as this provide for a great deal of version history to be saved with the document; so unless specific steps are taken to ensure that this is not included in what I receive, there is every chance that I would be able to view previous versions of it, or comments about its contents.
    3. It does not necessarily open in an easy to view format - either on the wrong page, or at the wrong zoom level, or with certain formatting visible (for example a nice red line under all the spelling mistakes). Microsoft did try to improve on this with the introduction of their "reading view" in Word 2003; however I don't think this really helped and only served to confuse the majority of users.
    4. Because of all the extra information the format contains, often the files are far, far larger than they need to be.
  3. It was sent attached to an email. Email is already a perfectly good format for presenting information, with a variety of different effects available (providing html format is used), so it seems a little unnecessary to attach a file with the information in.


How about an Analogy?

To make the closest analogy possible - if this were an invitation sent by good old-fashioned snail mail, it would be:

  • a handwritten letter;
  • with all the comments and corrections scribbled in the margins;
  • with some spelling and grammatical mistakes highlighted but not corrected;
  • spread over multiple pages - but folded open somewhere in the middle of the document;
  • posted in a large, heavy and cumbersome box which some recipients lack the tools or knowledge to open;
  • that is itself housed within a larger box.

Now that is maybe a worst-case scenario, but not a particularly exaggerated one in my experience; and whilst you might be a bit surprised to receive that as a party invite, you would be pretty disgusted to receive it as a Masters-level degree thesis submission, and even more appalled if it was presented as the final report for a multi-thousand pound project contract! Yet this is exactly the sort of thing that gets done in Word every day. That's not to say a lot of these issues don't crop up in the use of other programs; however Word seems to be the most common object of misuse.

So what's the alternative?
A lot of the fuss I've made above can be avoided through proper use of the Microsoft tools. Provided you remove any hidden data, properly spell-check your work, and set the display up before you finally save it, things should come out looking ok. You can even protect the document to avoid accidental editing. However, to completely negate these issues, I prefer to use a totally separate "display format" for presenting information.

For anything that is disseminated wider than myself (or my immediate team) I am very keen on the use of common, open standards, the most common of which I have found to be pdf. Most people have a pdf reader installed on their computer, no matter what their operating system; in fact most modern phones can display pdf files. Many programs are able to save to pdf as a built-in feature, and those that can't are invariably able to print to one of the many pdf conversion programs available.

There are also some very neat features of pdf files that are not often exploited, but can be used to produce some very useful effects - e.g. opening by default in full screen mode, embedding other files within them, etc. (Perhaps I'll cover this at a later date; a small taster is below.)
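As that taster, both of those effects can be produced from Latex - something like this should work (data.csv being a placeholder for whatever file you want to embed):

\documentclass{article}
\usepackage[pdfpagemode=FullScreen]{hyperref}% the pdf opens in full screen mode by default
\usepackage{attachfile}% lets other files be embedded within the pdf
\begin{document}
\textattachfile{data.csv}{Click here to extract the attached raw data.}
\end{document}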

By far my most important reason for trying to use pdf format for dissemination, though, is that it is a format that is difficult to edit (granted, editors do exist, but 'accidental editing' is almost impossible). This means that if I send a file to someone, and they choose to send it to someone else, I can be fairly confident that the final recipient will see what I want them to see (and nothing else!).

05/05/2011

Pie charts

I feel like I'm often moaning about pie charts and then having to explain why I hate them, so I thought I should post about it here, so that I can simply refer people to this page for an explanation.
But when I Googled the subject it turns out that everyone else hates them too.

So there's really not much more I can say on the subject except for supplying the best link I found, which is a 2007 document by Stephen Few. It does a really good job of explaining how bad they are. It's pretty readable, but for the lazy you can get most of what you need to know from just the illustrations and their explanation.

Also here is the best quote I found on the use of pie charts:

"Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps. Anyone who suggests their use should be instinctively slapped."

Document version control

The Problem
Imagine you start writing a document in Microsoft Word and save it as "notes.doc".
If Word crashes (or there is a power cut, or Windows crashes, or your friend accidentally unplugs your PC, or...) then you lose everything. For a short letter to a friend this is annoying, for a days report writing this is more than just annoying. Therefore:
  • You save your work regularly
I think most people do this automatically these days.
If you're making quite large changes to your document, like altering the formatting or trying out a completely different paragraph order, then maybe you:
  • Resave your work with a version in the filename
This is again fairly common and allows you to go back to your previous version if you don't like the changes or something goes wrong. But what to pick as a new filename? A common approach seems to be to append the title with an identifier, so you end up with "notes_2.doc". (What I would strongly advise against is ever choosing to title a version "final.doc", as inevitably you then end up with a "final2.doc", and so on)
After a few iterations of this you decide to send "notes_5.doc" to a colleague to proof-read. Maybe they're savvy enough to use the built-in Word 'track changes' feature, or maybe they just make the corrections in the document. If you're unlucky, perhaps they just print it out and scrawl on it in unintelligible handwriting. Either way, they then send it back to you, and you need to keep it separate from your version, so either they or you:
  • Save it with a different filename/version
You might be able to get away with moving to "notes_6.doc", but if there have been any changes to your version (which might still be "notes_5") then there is a chance that changes could be missed. So maybe you go for "notes_5a.doc" or appending the reviewers initials as "notes_5_pcf.doc".
You might then decide to make some corrections whilst working from home, but unfortunately you've only got "notes_4.doc" with you. What do you save it as then? How do you ensure your changes are properly imported into "notes_6.doc"?
For anything more than quite a straightforward document this can all get out of hand quite quickly. Before long you have a folder of really quite large files, all with slightly different names, and you're not really sure which one is the most recent version. This is a problem that happens far too often and really infuriates me. Whilst I would agree that all the above steps are sensible and I would encourage them (in lieu of a better alternative - see below), one thing I have also started doing is:

  • Creating a "archive" subfolder to dump all previous versions in. 

That way when I navigate to a folder with three different documents in it, I only see three documents, rather than 30 revisions of three documents. This helps a lot with finding things, but not with the underlying issue of manual naming and keeping track of things.

The Solution
This is not a new problem; in fact it's one that was identified, and solved, many years ago in computer science. Indeed, their problem is much more complicated, as it often requires that two people be editing the same document (in their case a computer program) simultaneously.
The problem is solved by having a piece of software do the version control for you. This allows the user(s) to have a single copy of the work on their computer, with a simple title (e.g. "notes.doc") and then they can leave all the complicated stuff to the software. All they need to do instead of resaving a version with a new name is to 'check in' a version to the software.
This will work for virtually any type of file, with one important caveat: the software is almost always expecting the file to be a plain text document, not a binary file (all programming source code is plain text; some file types, such as Word documents, are not). That's not to say it won't work for binary files - in fact it still works pretty well - however not all of the version control software's functionality will be available. In particular, some of the more clever functions, such as file differencing and merging, won't work.
So what can a piece of version control software do that will be of use for our documents?
  • Maintain the current version - as a simple set of files with simple names
  • Allow access to previous versions - either specific files on their own or a whole folders worth from a particular date
  • View the differences between files (only for plain text) - so that all the changes since a previous version are highlighted
  • Merge different versions (only for plain text) - so that two separate versions are combined into one
This is all pretty useful stuff, not that difficult to set up, and I've found that so far it has really helped keep my work tidy and avoided confusion.

My Setup
I'm using a distributed version control system called Bazaar with a single repository containing all of my documents and code fragments. As I'm trying to use plain text file formats wherever possible (Latex predominantly) I'm able to use the difference and merge functions.
I'm checking in everything except:
  • Autogenerated files - such as pdfs and plots (likely to be the subject of a later post)
  • Results files - as these are not expected to change, are only used once to produce results plots, and can be quite large files
I'm hoping that a check out of my repository will be a complete record of everything I've done in my PhD. 
I tend to check in any changes to files once a day (or every few days if the changes are minor) and backup by 'pushing' a copy of the repository to my external drive every week or so. I can go into more technical detail on my setup if anyone is interested.
I'm also planning to set up a portable version of the software on my external drive so that I can plug into third-party machines (on which I'm not allowed to install software - such as my sponsor's laptop) and still use version control. I think this should be possible by using Portable Python, as Bazaar is coded in Python, but I haven't got very far with it yet...

26/04/2011

Documentation

Within my work, as with many (most?) other professions, I have considerable need to produce documents. For me these usually take the form of:

  1. Ongoing notes - for my own use and occasionally for discussion with my supervisors
  2. Communication - invariably via email
  3. Recording results - as they are produced from experiments
  4. Reporting on progress - monthly/yearly formal reports
  5. Papers - Eventually for publication in journals or similar
  6. Thesis - Finally for submission for my PhD
Years ago my only option would be to handwrite all of these, but thankfully times have changed. This is particularly fortunate for me (or perhaps it is because of this), as my handwriting is appalling. So, although I still keep a handwritten log book for immediate scribbling, I am making an effort to digitise my notes as far as possible.
As I've mentioned before, I'm very much a believer in using the right tool for the job; to this end I thought I would note down what I'm using and try to justify my choices.

Plain text - I'm trying to use this for emails. It can get difficult if I need to include equations or images; in those cases I will generally try to include them as an attachment.

Open Office Writer - I'm using to produce short reports. I have chosen not to use Latex (see below) for these as I am producing them as quick, short documents for dissemination to my supervisors; hence I find the WYSIWYG interface easier than going through Latex. I have considered setting up a Latex template but I don't think the extra hassle will be worth it in the end.
These documents I export to pdf format prior to dissemination (see a forthcoming post for reasoning). I also maintain these documents within my version control system (again see a forthcoming separate post).

Handwritten Notes - I keep a log book for scribbling down results and doing rough sketches; however I'm trying to digitise the important points from these as soon as possible, as I can never find the page I wanted (no search facility!), or when I do find it I can't read my own writing (yes, it is that bad!).

Latex - (pronounced "lay-tec") I'm trying to use for most other work. My main reasons for use being:

  • The source is stored as plain text - this means it can be easily version controlled and differenced. (see a forthcoming separate post on this...).
  • It can compile directly to pdf with lots of nice extras included (menus, interactive contents, hyperlinks, etc).
  • It separates the content from the formatting - allowing me to concentrate on one or the other, rather than worrying about items jumping between pages or links not working.
  • It is very flexible and can produce beautiful documents (especially where equations or complex formatting is involved).
I'll probably be posting more on Latex, as I have played with it to produce some interesting documents in the past. To give a flavour, a minimal example of what a Latex source file looks like is below.
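The formatting decisions live in the preamble, while the content is just marked-up plain text (the title here is only a placeholder):

\documentclass{article}% formatting choices live here and in the preamble
\usepackage{hyperref}% gives the pdf clickable contents and hyperlinks
\title{Monthly Progress Report}
\begin{document}
\maketitle
\tableofcontents% becomes an interactive contents page in the pdf
\section{Introduction}
The content itself is plain text with markup, which suits version control.
\end{document}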

Do you think the way I'm doing things is sensible? Any questions, comments or arguments then please add them below!

21/04/2011

Tool List

6 months in I've already used quite a few different tools. I'll initially start a list here of what I'm using and then post about separate issues later on.


  • Jabref - Reference managing software
  • Sciplore - Mind mapping software
  • Latex (TeXworks) - Typesetting
  • Foxit - pdf reader
  • Open Office - General word processing and Spreadsheeting
  • Bazaar - Version control
  • Matlab/Simulink - Data analysis and Simulation
  • Pdf conversion software
  • Planner - gantt chart tool
  • Gmail + Google calendar - Email and Calendar
  • Winmerge - document merging and differencing
  • zbar - splits the Windows taskbar between two monitors
  • Irfanview - image viewer
  • Ghostscript - document viewer for postscript
  • Google sketchup - 3D CAD software
If there are any specific requests for posts on any of these tools then let me know in the comments, otherwise I will post about them as and when I have something interesting to say.