
02/08/2012

Adding files to bzr through cb2bib

I recently described my referencing process, where I simultaneously hold reference pdf files in a version control system and keep track of their details in a bibtex file. I have also described a problem I've been having with my version control system of choice, Bazaar.

Based on these posts I decided that there might be a better (more automated) way of adding my references. After a very useful email exchange with the author of cb2bib, who confirmed that this should be possible, and a day of messing around and learning about various command-line tools that I hadn't used before, I think I've got it working. Here's how...

First I created a new batch file that I called "bzrAddRef.bat". Here are its contents:


@echo OFF

rem --------------------------------------------------------------------------
rem cb2Bib Tools addition by J. Welford
rem --------------------------------------------------------------------------


echo   bzrAddRef:
echo   cb2Bib script for adding BibTeX files to a Bazaar repository
echo.
echo   Using sed and xargs utilities from:
echo   http://gnuwin32.sourceforge.net
echo.
echo   Path below may need changing to be within the repository
echo.


echo   Processing:
echo.
cd "C:\Users\welf\Documents\literatureReview"
sed -n -e "s|.*file.*=.*{\(.*\)}.*|\"\1\"|p" %1 > bzrRefs.tmp
xargs bzr add <bzrRefs.tmp
del bzrRefs.tmp

Only the last four commands really matter. The first sets the working directory; I don't think it matters exactly which directory you use, as long as you have write access and it is within the repository you want to add to. The second runs through the current bibtex file and extracts the location of every reference into a temporary file. The third adds all of those files to the Bazaar repository, and the final command deletes the temporary file. (Maybe I could have used a pipe between the commands so that the temporary file was not required?)
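If pipes behave the same way here, the middle two commands could probably be collapsed into one line, removing the need for the temporary file altogether (an untested sketch):

sed -n -e "s|.*file.*=.*{\(.*\)}.*|\"\1\"|p" %1 | xargs bzr add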


A couple of utilities are used that will need to be installed: sed.exe (and its dependencies: regex2.dll, libintl3.dll and libiconv2.dll) and xargs.exe, both available from the gnuwin32 site mentioned in the script.

Within the settings of cb2bib, this batch file can now be pointed at under "Configure BibTeX" - "External BibTeX Postprocessing" - "Command:". Once that is done, simply hitting "Alt"+"p" in the cb2bib window should run the batch file and add all the references to version control.

I hope this helps someone! (I presume it could be altered for other version control systems or be called from other applications.)

31/07/2012

My referencing process

I suspect everyone has a subtly different way of doing this, depending on the tools they prefer and the way they like to work. I thought I would document my process as I think it's pretty efficient and has some real advantages, though I'm sure there is still plenty of room for improvement.

Overall aim

As an academic I often need to refer to work done by others; this is normally done by noting their published work as a list of references or a bibliography at the end of my documents. The most basic way of achieving this would be to have a stack of published books and papers in my drawer that I can refer to, typing in the reference details at the end of each document I want to write. I'm sure that plenty of people do work this way, but from my point of view it's pretty inefficient (lots of printing, lots of sorting, lots of typing, not very portable, an awful pain to change reference styles, etc.).

I manage my references entirely in software using a few specific tools. I've mentioned most of them in other posts but I'll go into a bit more detail here on the actual process I use. 

I'm going to split referencing into two separate processes. Firstly, as I'm researching a topic, I tend to gather references to get an idea what I'm doing. Secondly, when I come to document my work I need to search and cite the references I've found.

Tools

Google Scholar is generally my primary resource for finding papers and documents. I also rely on standard Google searches a massive amount and I have a Google alert set up to email me when a few key phrases appear in new articles added to the web.
IEEExplore generally has most of the published work (in terms of papers, Journal articles, etc) that I need to refer to. The University has a subscription to this that allows me to download what I need (otherwise I'd have to pay!).
Pdf format is generally how almost all papers are delivered and the format that I keep them in. I use Foxit reader to read pdf documents. I keep all my reference documents in one big folder rather than worrying about any kind of complex filing system.
I use Bazaar to version control my folder of pdf documents. I've previously discussed how this works and how it allows me to work between different computers, even without installing any software on them.
I use a tool called cb2bib to maintain a list of references in bibtex format. I started off just using this to add references to my bibtex file, but I've found it to also be really good for browsing references and citing whilst writing a document. I changed some of the default setup to help it retrieve data from the net.
I use Latex to typeset most of my work. Within this I can simply point it at the bibtex file for all the details of the references. I have previously mentioned how to use a bibtex file that is not in the same place as the rest of the latex document.
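(As an aside, pointing at a bibtex file elsewhere is just a matter of giving the bibliography command a path - this one is made up:

\bibliography{../literatureReview/references}

relative to wherever the latex document is compiled.)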

Gathering references

  1. Search for the document that I need to find using Google, Google Scholar or general web browsing.
  2. Open and read the document to see if it looks relevant and useful. Assuming that it does...
  3. Download the document to my big folder of "third party" references. I tend to use the full title of the work as the save name of the document - this can lead to quite long file names, but it makes it a lot easier to find things!
  4. Add the saved file to my bazaar version control system. This only takes a couple of clicks through TortoiseBzr menus in windows explorer.
  5. Add the document to my bibtex file using cb2bib. This is a pretty straightforward process:
    1. Open cb2bib, I have a keyboard shortcut setup for this (it should also remember what bibtex file is being used)
    2. Click "import from pdf file"
    3. Click "select files"
    4. Select pdf files saved previously (hold ctrl to select a bunch at once)
    5. Click "process"
    6. The software will try to extract as much info as possible from the pdf file, this probably won't be enough so...
    7. Click "Network query" to retrieve all the info about the file from the web (this usually works fine, but it's worth checking the results; you may need to give it the right title to start it off)
    8. Click save to add the reference to the bibtex file
  6. The changed bibtex file and added references will need to be "committed" to the version control repository.
UPDATE: I have now made a batch file that effectively takes the place of step 4 and runs after step 5 - once a whole set of references has been input through cb2bib, hitting "Alt"+"p" adds them all to my version control repository.

Citing references

  1. Start with a Latex document that has a pointer to the bibtex file within it.
  2. Open cb2bib citer (I use a keyboard shortcut) and select the reference(s) required:
    1. [optional] Select the way I want my reference list displayed by pressing "a" (author), "j" (journal), "t" (title) or "y" (year). I find author is usually best.
    2. [optional] Filter the reference list by pressing "f" and then typing what you want to search for ("d" clears the search)
    3. Click on a chosen reference
    4. [optional] Press "o" to open the reference and read it
    5. Press "enter" to cite the reference, a small pen marker will appear next to it (in author view it will appear next to each author for the same paper). Multiple references may be cited by selecting them and pressing "enter". "delete" clears all the selected references.
    6. Press "c" once all the references for citing are selected, this will close the citer window and copy the latex text for the references to the clipboard.
  3. Paste the text into the latex file to include the references at that point in the document.
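For the curious, the text that lands on the clipboard is just a standard Latex citation command, something like the following (the key here is made up - it is whatever cb2bib generated for the entry):

\cite{Welford2012a}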
Although those might seem like a lot of steps it's really pretty straightforward once you get used to it. The only real difficulty is remembering the keyboard commands for cb2bib citer. With this setup I can take all my references with me between machines (even between Linux and Windows) and use the same process everywhere.

28/09/2011

Future documentation methods

Coming towards the end of the first year of my PhD and spending some time writing up my progress so far has led me to muse over the nature of report writing and ask the question: "surely there must be something better"...

Whilst the old adage "if it ain't broke don't fix it" might well apply here, I can't help but think that in the 21st century of immersive 3D virtual reality game playing, home 3D printing, and everyone carrying at least one state of the art electronic device about their person at all times, the concept of a paper report seems a little dated.

Here are a few examples to try to illustrate my point:
  • If there is a book and a film of the same story (or even a webpage and a youtube video) I will inevitably look at the film first, as it will convey the information to me far faster and with less effort than reading the book.
  • If I have a choice between a photo or drawing of an object and a 3D model (either manipulable on screen or available to touch) I would get a better understanding of it from the 3D.
  • If I want information on a specific subject then I turn to Google/Wikipedia before I head off to the library.
I don't think these are examples of me being weird, they are simply illustrations of modern life making information more readily accessible. I'm sure you could argue over semantics ("the book contains more detail than the film", "library books have a more systematic review process than google hits", etc) but I hope you can accept my general point.

It therefore seems strange to me that a piece of work, perhaps costing thousands of pounds and many hundreds of man hours, should be presented in such a one dimensional format as a printed report. Here is a summary of what I see as the limitations to a printed report:
  1. One dimensionality - sure pictures might take this up to 2D, but all too often there aren't enough of these!
  2. Lack of user interaction - I can't search for a keyword, interactively link to source, or request further detail on a topic.
  3. Page constrained format - diagrams need to fit within a certain width, zooming in is limited by your eyes and printer resolution and page breaks artificially chop things up.
  4. Visual sense only - My other senses are put on hold, and only serve as a distraction.
So what have people done to improve on this? Here're a few examples that I can think of:
  • Video - a good recent example is this guy's youtube CV
  • Hyperlinking - within sections of a document or out to other documents or web sites
  • Wiki formats - taking linking between sections to the extreme and making progress through the information less linear
  • 3D graphics - starting to be seen more in web pages, an excellent example is Google body
  • Powerpoint - a format often used in place of a standard document, it has many of the same issues, however users often seem to feel a little less constrained in terms of layout (perhaps this is only due to convention?)
  • Computable document format - this is a really exciting new development that reflects a lot of what I'm describing here
This last concept may or may not take off but I can see what they are hoping it will achieve. Some of its functionality can already be achieved in a pdf (details of how to achieve a lot of them through LaTeX are here) and almost all of it could also be done through HTML and javascript. An interesting discussion on this is given here. In fact the recently developed HTML5, in combination with javascript programming, offers a whole mass of interesting possibilities for the presentation of information. A step towards using HTML5 for what I'm talking about here is Tangle. This is a javascript library that supports the production of "reactive documents", allowing a reader to play with the content of the document.

Another alternative format with a lot of capability is Flash animation; these animations are typically web-based and often allow user interaction. Some basic options for creating these are given here. Although it is a very widely used format, it requires a good level of experience to code. It has also faced quite widespread criticism recently, most prominently from Apple, and there is therefore speculation about whether HTML5 will ultimately replace it.

An obvious downside to these types of advanced documentation method is the length of time it takes to actually produce a document. Even when the author has a good knowledge of the specific tool they're using, I think it's safe to say that nothing I've mentioned above will be as quick to produce as a simple text document. In fact, the more advanced the documentation method, the longer it's likely to take to produce.

I'd love to be able to round off with a recommendation of the ultimate tool or combination of tools that can be used to create the perfect document, but as far as I've seen it doesn't yet exist. Lots of things seem to offer at least part of the solution I'm looking for, but none pull it all together into one great package. So instead I'll do two things, firstly I'll make a few plain points in summary/prediction, then I'll put together a set of use cases that I'd like to see available to the end user of my "ultimate document".

Summary/Predictions

  • The plain printed word document is currently in the process of being overtaken by more electronic forms of documentation, inherently bringing a lot more potential to the document itself (hyperlinking and embedded video being two major additions). I would expect this to be a continuing trend (one that may eventually reach formal engineering reporting, or even academia!).
  • There is the potential for this to go a lot further than the type of electronic documents seen today with the addition of 3D effects, audio tracks and similar.
  • HTML5 currently seems to offer the most potential for supporting this type of advanced documentation (although the computable document format may also be a candidate if it manages to pick up much of a user base).
  • Very little progress towards this end goal will be achieved until there are good tools for authoring the type of document I'm discussing here.
  • It seems highly likely that viewing of any document of this type will be through a web browser or similar.


Use cases - scenarios that I, as an end 'reader', would like to see supported in this ultimate document format.

  1. User managed detail level - I'd like to be able to look in more detail at sections I'm interested in or know little about, whilst invisibly skipping over the mundane or tedious stuff.
  2. Unconstrained document flow - If I want to read summary, then the contents, then the conclusions, then methods, it should be easy for me to work through that way.
  3. Recommended document flow - If I simply want to be guided through the document, ensuring that I pick up all the important information, then this should also be easy.
  4. User interaction - Where more information could be made available then I should be able to access it. For example I should be able to zoom in on a waveform or rotate a 3D model.
  5. Multiple sense stimulation - practically this is likely to be limited to visual and audio currently (at least until we develop smell-o-vision and feel-o-vision...)
  6. Portability - I want this document to be viewable in as many places as possible, consequently it must be compact and easily openable on a variety of devices (laptops, mobiles, touchpads, e-readers, etc). This might even extend to alternative language/disability support and (somewhat ironically) the ability to print onto plain old paper.


So what have I missed? I'd love to discuss this topic and related areas more so please leave me a comment.
I'd also love to have the time, skills and supervisor buy-in to try presenting my thesis in the manner I've outlined; however I suspect that will remain a pipe-dream...

27/07/2011

Making a bibtex file from a folder of pdf files

The issue
As I'm going to be writing some big documents with lots of references, I'd be a fool to try to manage them manually; I therefore needed to pick a piece of reference management software. After some browsing I settled on JabRef because it's free, open source, lightweight and cross-platform, and it handles bibtex format natively (which is what I need for it to integrate with latex). It should also link nicely into the Sciplore mind mapping software which I'm using (more about that some other time).

JabRef is basically a database management tool for references that stores its database in bibtex format. It looks like it will work rather well, but unfortunately my first stumbling block is that I already have a folder full of my references in pdf format (~200). This means that I'm immediately faced with the big task of going through and adding the details of each pdf individually. There must be a better way...

Someone else asked the same question here. The answer seemed to be that there was no easy way in JabRef, but it could be done in some other reference management software - such as Mendeley. So I could install that as well and export from there to use JabRef, that seemed like a pain though, especially as you need log in details and all sorts for Mendeley.

The solution
Somewhere else cb2Bib was suggested. This looks like an awesome piece of software, almost to the point that I could use it instead of JabRef, although I don't think it does quite the same job. It's designed as a bibtex database manager; however, it is more tailored towards reference entry than editing or final use (e.g. citations) - although it can do this. Its method of adding a new reference is based on what's currently in the clipboard - that's whatever you most recently 'cut' or 'copied' in your operating system. This can either be a piece of text or a pdf file.

Files from the system can also be queued up to be added to the clipboard for addition to the bibtex database - in this manner a folder's worth of pdf files can be added. Once the file is in the clipboard the software interrogates it to try to extract the right details for the bibtex reference entry. It is also able to do some other clever things, like searching the web to find a matching web reference from only one of the pieces of data it has extracted. There is also the option to manually edit the fields or to set off a whole run of files to add automatically.

My implementation
In practice the software took a little while to get used to; the buttons aren't in quite the locations I'd expect, there seem to be about 3 different windows that are independent but interrelated and the method of specifying a bibtex file and then successively saving additions to it felt a little odd (rather than running through to create a file and then saving it all at once). But once I was used to it at that level it all worked.

When I came to actually try to add all of my pre-saved pdfs however, I hit problems. Whilst automatic extraction usually managed to pull out a few nuggets of useful data, it rarely found enough for a complete entry. Hitting the button to search the web didn't seem to give much assistance. So it was time to dig a little deeper.

Probing through the website, there is quite a lot of useful information on how to configure the software to do what you want. What I needed to do was look into where on the web the software was searching for my articles. This is all set up in a configuration file located at:
C:\Program Files\cb2bib\data\netqinf.txt (windows)
or
/usr/share/cb2bib/data/netqinf.txt (linux) (you'll need permissions or to be root to edit)

Wading in there, you can find out what is being searched and in what order. What would have been ideal for me would have been a search of the IEEE Xplore site, as that would have turned up most of my papers. Unfortunately it was not in there. Second best was google scholar, sitting at the bottom of the list of options. The documentation in the file wasn't brilliant, but with a bit of trial and error I was able to work out what was going on.

The major change I made to the file was to add this at the top of the queries list:

# QUERY INFO FOR Google Scholar
journal=
query=http://scholar.google.com/scholar?hl=en&lr=&ie=UTF-8&q=<<title>>&btnG=Search
capture_from_query=info:(.+):scholar
referenceurl_prefix=http://scholar.google.com/scholar.bib?hl=en&lr=&ie=UTF-8&q=info:
referenceurl_sufix=:scholar.google.com/&output=citation&oe=ASCII&oi=citation
pdfurl_prefix=
pdfurl_sufix=
action=


journal=
query=http://scholar.google.com/scholar?hl=en&lr=&ie=UTF-8&q=<<excerpt>>&btnG=Search
capture_from_query=info:(.+):scholar
referenceurl_prefix=http://scholar.google.com/scholar.bib?hl=en&lr=&ie=UTF-8&q=info:
referenceurl_sufix=:scholar.google.com/&output=citation&oe=ASCII&oi=citation
pdfurl_prefix=
pdfurl_sufix=
action=

The important changes here are the <<title>> and <<excerpt>> search strings, and the change from capture_from_query=info:(\w+):scholar in the existing scholar searches to capture_from_query=info:(.+):scholar in mine. I'm not entirely sure why the latter change helps, but \w only matches letters, digits and underscores, so presumably some of Google's info keys contain other characters that (.+) will match; either way, its effect was that it found the details where previously it was often missing them!

The other change I made was to untick the "Set 'title' in double braces" box in the configuration window. After I'd made these changes it worked a lot more consistently.

Some of the time it still pulled out the wrong details if it mis-extracted the article title; however, as I'd named all my pdfs with the title of the paper, it was simply a case of copying and pasting the filename into the title field and rerunning. It would have been really nice to be able to use the title of my pdf as part of the search, but unfortunately I couldn't find a way of doing that.

The only other issue I'm having is that although cb2bib adds in the link to the pdf file, JabRef won't understand it, as it uses a very slightly different bibtex format for it. The cb2bib format seems to be:
file = {location}
whereas the JabRef format seems to be:
file = {description:location:type}
There is a comment here by a Mendeley admin suggesting that there is no prescribed format for this aspect of a bibtex file, so I guess it's to be expected. I should be able to work around it with a bit of clever find/replace, but it's an annoyance.
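For the record, my find/replace plan was a one-line sed command along these lines (untested, and the empty description field plus the "PDF" type are guesses at what JabRef expects):

sed -i "s|file = {\(.*\)}|file = {:\1:PDF}|" references.bib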
ACTUALLY - this seems to be working under windows! It looks like a different version of JabRef has gotten around this issue.

UPDATE: After a couple of months of getting used to cb2bib and using it to produce a document I'm not really finding the need to use JabRef at all! The 'citer' facility of cb2bib is actually really good.

UPDATE: I hadn't previously gotten round to extracting from IEEE Xplore, as almost everything is on Google Scholar. However I've just tried to set it up and found that the IEEE pages use javascript buttons to produce the citation. This makes it difficult to fully automate.


If you add the following to netqinf.txt then it should search IEEE Xplore for the title, you can then manually click the "download citation" button, select BibTeX format and then copy the BibTeX citation into cb2bib:

# QUERY INFO FOR IEEEXplore
journal=
query=http://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=<<title>>&x=35&y=7
capture_from_query=arnumber=(\d+)&contentType
referenceurl_prefix=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=
referenceurl_sufix=
pdfurl_prefix=
pdfurl_sufix=
action=browse_referenceurl

03/06/2011

Simulink to eps

I've mentioned previously that I've developed a process to export plots from Matlab and include them as (eps format) figures in a Latex document. Well today I spent a load of time trying to get a similar method working for exporting Simulink model diagrams from Matlab. This hasn't proved quite as easy, but I think I'm there now.

My main issue was that Simulink doesn't allow you to print in the same way as a normal Matlab figure does. The "print" command can be used, but it works in a very specific way. It is fairly easy to use this command to print in eps format (see my previous posting for why this is a good thing); however it doesn't allow you the same freedom to setup the page. More specifically it will only allow you to use a pre-defined page size (A4, US legal, etc). It rescales and centres the figure to fill the page, but, unless the model is the exact same aspect ratio as a pre-set paper type, you end up with white borders at either the top/bottom or left/right.
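For reference, the basic export command looks something like this, with the -s option naming the Simulink system (the model name here is made up; -deps gives black and white output, -depsc colour):

print('-smyModel', '-depsc', 'myModel.eps')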

EDIT: This seems to have been corrected in newer versions of Matlab now!

I've tried all sorts to get round this, including calling Ghostscript from within matlab to post-process the eps file. This should work as it ought to be able to find the right "bounding-box" for the figure and resize the page accordingly. However I had no luck with this - it would either not trim it down at all, or over-trim it removing some of the image.

I also tried exporting in a few other formats (namely normal postscript and pdf) and then converting with Ghostscript. This worked a little better and I was able to get the right bounding box through a process of:
  1. Simulink export to pdf, 
  2. Ghostscript convert pdf-to-eps, 
  3. Ghostscript convert eps-to-eps. 
I have no idea why that last step was necessary! Unfortunately although the output from this was pretty good, the text becomes part of the image (I think during the second step), meaning it doesn't render properly at high zoom levels. I don't know why that happened, I'm pretty sure it shouldn't have, but I thought I could do better.
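For anyone trying to reproduce this, the Ghostscript calls were along these lines (a sketch from memory; epswrite was the eps output device in the version I was using, and -dEPSCrop trims eps input down to its bounding box):

gs -dNOPAUSE -dBATCH -sDEVICE=epswrite -sOutputFile=model.eps model.pdf
gs -dNOPAUSE -dBATCH -sDEVICE=epswrite -dEPSCrop -sOutputFile=modelTrimmed.eps model.eps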

So an alternative process I finally came up with is:
  1. Find out the ratio of the figure (I did this by exporting as a raster image and then reading back in - I'm not very happy with this technique as it's pretty dirty, but at least it works)
  2. Simulink export to eps, with the model at the very bottom of the page
  3. Automatically edit the resulting eps file to adjust its bounding box information
It's a bit hacky, and if I was being picky the margins aren't perfectly even in the resulting file, but it seems to work. I'm sure there must also be a better way of achieving step 1, but it's sufficient for now. (If anyone has any suggestions then I'd love to hear them!?)
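To give a flavour of the function, its core looks roughly like this (a sketch rather than my actual code - the model name is made up, and the real function works out the page sizes rather more carefully):

% 1) throwaway raster export to find the model's aspect ratio
print('-smyModel', '-dpng', '-r72', 'tmp.png');
img = imread('tmp.png');
delete('tmp.png');
ratio = size(img, 2) / size(img, 1);   % width / height

% 2) export the model as eps, with it sitting at the bottom of the page
print('-smyModel', '-depsc', 'myModel.eps');

% 3) rewrite the %%BoundingBox comment to trim the page to the model
txt = fileread('myModel.eps');
w = 595;                               % A4 width in points
h = round(w / ratio);
txt = regexprep(txt, '%%BoundingBox:[^\n]*', ...
    sprintf('%%%%BoundingBox: 0 0 %d %d', w, h), 'once');
fid = fopen('myModel.eps', 'w');
fwrite(fid, txt);
fclose(fid);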

There's nothing very clever in the code, but if anyone wants a copy of my function I'll happily forward it to them. Hope this helps somebody!

EDIT: Code for my function is available here.

19/05/2011

Professional plotting

I've written and read plenty of technical reports in my time, and one major feature that almost all technical reports have in common is the inclusion of figures. Pick up any ten technical documents and I reckon you'll find at least 5 different ways of including figures; and of these 5 only one will actually look any good. So as I start to produce figures for my PhD I've been starting to look into the best way of managing this initially simple sounding task.

When I say "figure" I'm generally thinking of some kind of plot, or set of plots, usually in 2D; however what I'm going to discuss should apply to most other "technical" "pictures", but probably won't extend to photo-type images (for reasons that will hopefully become obvious).

The Issues
I won't bother going into all the different ways a figure and a report might come together, except to say that at the very bottom of the scale would be a 'picture'/screenshot pasted into a word document - this just looks awful. What I'll work up to will hopefully be the top of the scale.

What I will list is what I see as some major stumbling blocks in figure/report preparation:
  1. Difficulty updating the figure.
  2. Inability to resize the figure to suit the report.
  3. Images (+ text) looking crappy when zoomed in.
  4. Disconnect between the figure labels and the report text.
  5. Difficulty regenerating the figure at a later date.
A lot of these are irrelevant when we're looking at a printed out finished article, so what we really need to understand are the details of how the information is stored and processed on the computer.

Background Details
One of the first important distinctions is the difference between types of image. Most images that one would typically come across on a computer are raster images; these are stored as a set of pixels - imagine a piece of squared paper with each square filled in a different colour. From a distance, or if all the squares are very small, this looks great; however if we zoom in (e.g. make each square larger) then we start to see the joins between edges and everything starts looking "blocky". Most programs usually handle this by blurring the pixels together slightly, which can help up to a point, but often we just end up with a blurred mess - not what we want in a precise technical report.

The alternative is vector graphics. These are saved more as a description of what needs to be drawn, rather than a point-for-point record of what is on the screen. This means that zooming is purely a mathematical operation, and all the lines will still appear as perfect lines. The same also works for text, which is stored within a vector graphic as actual text, rather than as a picture of it.

There are plenty of graphics explaining this along with a good description in the Wiki pages linked above. But if you're still not sure then try this simple experiment: type a word into a paint program (e.g. Microsoft Paint) and zoom in, and then do the same in a word processing program (e.g. Microsoft Word) - the difference should be pretty obvious.

In summary, unless what you are working with is an actual picture (in which case converting it to vector graphics would be impossible), you will get the best quality by maintaining it in a vector format. There are plenty of these formats to choose from; however I find them to be surprisingly unsupported in a lot of applications. As my final target format is pdf (as mentioned elsewhere in this blog) I'm going to be working with eps and pdf formats. These both rely on postscript as an underlying process and are therefore fairly compatible.

My process (overview)
With all of the above as my aims I've worked out a basic process for generating figures. It seems to be working fairly well so far, so I'll outline it here:

1) Write a script to produce the figure and save it as an eps file. This means that I can always go back and see how each figure was produced (what the original data was, how it was scaled, etc, etc). If the data changes then I can simply rerun the script and a new figure will be produced. If I need the figure in a different ratio or with different colours (or a different label, etc, etc) then I can make some minor changes to the script and rerun it. I keep the script under version control, but not the eps file it produces (as I can always reproduce this if necessary). I use Matlab for this process as it is what I am most familiar with (although I often store the raw data in an excel or csv file and read this into Matlab). I suspect I could use Gnuplot or something similar instead.
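As a trivial example, one of these scripts might look something like this (the names are made up):

% settlingTime.m - regenerates the settling time figure from the raw data
data = csvread('settlingTime.csv');    % raw data kept alongside the script
plot(data(:,1), data(:,2), 'k-');
xlabel('Time (s)');
ylabel('Displacement (mm)');
print('-depsc2', 'settlingTime.eps');  % vector output, ready for LaTeX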

2) Include the eps file in my LaTeX script. This means that when I regenerate the pdf output from my LaTeX it always includes the most recent version of the figure. As it remains as vector graphics throughout the process I get nice clean professional results.

This process solves all of the problems outlined above, except point 4. It is still possible to produce an eps figure from Matlab with "Courier" font and then include it in a Latex document using "Times" font; I find that this looks out of place. I get around this by using a function called matlabfrag, in combination with the pstool package for LaTeX. This means that the final figure picks up the font used in the rest of the document. It also allows me to use full LaTeX expressions in my figures.

My process (in detail)
This may get more refined as time goes by, but currently this is the detailed version of how I would produce a figure:
1a) Write a Matlab script to plot the figure as normal, using standard matlab commands to plot, label axes, etc.
1b) Within the script include a call to 'saveFigure.m'. This is a function I have created which accepts a file name and optionally a figure size (otherwise some default values are used), resizes the figure and then calls matlabfrag to save it as an eps file (and an associated tex file including all the labels).
2a) In the LaTeX preamble include '\usepackage{pstool}'. This allows the use of the psfragfig command.
2b) Within my LaTeX include the figure in the normal way. However, instead of using the latex '\includegraphics' command, I replace it with the '\psfragfig' command.
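Put together, the whole chain is only a few lines (a sketch of the shape of it; 'saveFigure' is my own wrapper and the file name is made up). On the Matlab side:

plot(t, y);
ylabel('Response, $y(t)$');    % full LaTeX expressions work in the labels
saveFigure('stepResponse');    % resizes, then calls matlabfrag('stepResponse')

and on the LaTeX side, with \usepackage{pstool} in the preamble:

\psfragfig{stepResponse}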

Notes
I can make my 'saveFigure.m' function available to anyone interested, but it doesn't do much more than I have described above!
I have created a slightly revised process for including Simulink models in documents, which is a little different - I can discuss it if anyone is interested.
I spent a little time trying to get psfrag to play well with eps files produced by other packages (e.g. Google Sketchup); however, I don't think I've quite got to the bottom of it yet.

17/05/2011

Selecting the correct format

Does format matter?
Today I received an invite to a party in ".doc" format (i.e. a Microsoft Word document) via email. Whilst I was happy to be invited to the party, and the invite very much served its purpose, I can't help thinking that it could have been presented better. Here're some comments I would make:

  1. ".doc" is a proprietary format, which although popular and therefore supported on most peoples computers, can lead to inconsistent formatting or in the worst case a user not being able to open it at all.
  2. It is also what I would consider to be an "editing format", which means it is fine for producing a document, or passing over to someone else for meddling with, but not (in my opinion) for presenting to a recipient. This is for several reasons:
    1. The possibility of the user accidentally editing the document. Say for example the last thing the sender changes is the date of the party "3/6/2011" - it would be all too easy for me to open the document and then accidentally nudge the zero key, only to turn up to a distinct lack of party on "30/6/2011".
    2. Access to details the sender didn't wish to share. Formats such as this provide for a great deal of version history to be saved with the document; so unless specific steps are taken to ensure that this is not included in what I receive, there is every chance that I would be able to view previous versions of it, or comments about its contents.
    3. It does not necessarily open in an easy to view format - either on the wrong page, or at the wrong zoom level, or with certain formatting visible (for example a nice red line under all the spelling mistakes). Microsoft did try to improve on this with the introduction of their "reading view" in Word 2003; however I don't think this really helped and only served to confuse the majority of users.
    4. Because of all the extra information the format contains, often the files are far, far larger than they need to be.
  3. It was sent attached to an email. Email is already a perfectly good format for presenting information, with a variety of different effects available (providing html format is used), so it seems a little unnecessary to attach a file with the information in.


How about an Analogy?

To make the closest analogy possible - if this were an invitation sent by good old-fashioned snail mail, it would be:

  • a handwritten letter;
  • with all the comments and corrections scribbled in the margins;
  • with some spelling and grammatical mistakes highlighted but not corrected;
  • spread over multiple pages - but folded open somewhere in the middle of the document;
  • posted in a large, heavy and cumbersome box which some recipients lack the tools or knowledge to open;
  • that is itself housed within a larger box.

Now that is maybe a worst-case scenario, but not particularly exaggerated in my experience; and whilst you might be a bit surprised to receive that as a party invite, you would be pretty disgusted to receive it as a Masters level degree thesis submission. And even more appalled if it was presented as the final report for a multi-thousand pound project contract! Yet this is exactly the sort of thing that gets done in Word every day. That's not to say a lot of those issues don't crop up in the use of other programs; however Word seems to be the most common object of misuse.

So what's the alternative?
A lot of the fuss I've made above can be avoided through proper use of the Microsoft tools. Provided you remove any hidden data, properly spell check your work and set the display up before you finally save it then things should come out looking ok. You can even protect the document to avoid accidental editing. However, to completely negate these issues, I prefer to use a totally separate "display format" for presenting information.

For anything that is disseminated wider than myself (or my immediate team) I am very keen on the use of common, open standards, the most common of which I have found to be pdf. Most people have a pdf reader installed on their computer, no matter what their operating system. In fact most modern phones can display pdf files. Many programs are able to save to pdf as a built-in feature, and those that can't are invariably able to print to one of the many pdf conversion programs available.

There are also some very neat features of pdf files that are not often exploited, but can be used to produce some very useful effects. e.g. opening by default in full screen mode, embedding other files within them, etc. (perhaps I'll cover this at a later date)
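For example, through LaTeX both of the effects just mentioned only take a line or two (a sketch; the file name is made up and package options vary):

\usepackage{attachfile}
\hypersetup{pdfpagemode=FullScreen}   % needs hyperref; opens the pdf in full-screen mode
...
\textattachfile{rawData.csv}{click for the raw data}   % embeds a file within the pdf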

By far my most important reason for trying to use pdf format for dissemination, though, is that it is a format that is difficult to edit (granted, editors do exist, but 'accidental editing' is almost impossible). This means that if I send a file to someone, and they choose to send it to someone else, I can be fairly confident that the final recipient will see what I want them to see (and nothing else!).

26/04/2011

Documentation

Within my work, as with many (most?) other professions, I have considerable need to produce documents. For me these usually take the form of:

  1. Ongoing notes - for my own use and occasionally for discussion with my supervisors
  2. Communication - invariably via email
  3. Recording results - as they are produced from experiments
  4. Reporting on progress - monthly/yearly formal reports
  5. Papers - Eventually for publication in journals or similar
  6. Thesis - Finally for submission for my PhD
Years ago my only option would have been to handwrite all of these, but thankfully times have changed. This is particularly fortunate for me (or perhaps it is because of this), as my handwriting is appalling. So, although I still keep a handwritten log book for immediate scribbling, I am making an effort to digitise my notes as far as possible.
As I've mentioned before, I'm very much a believer in using the right tool for the job; to this end I thought I would note down what I'm using and try to justify my choices.

Plain text - I'm trying to use this for emails; it can get difficult if I need to include equations or images, in which case I will generally try to include them as an attachment.

Open Office Writer - I'm using to produce short reports. I have chosen not to use Latex (see below) for these as I am producing them as quick, short documents for dissemination to my supervisors; hence I find the WYSIWYG interface easier than going through Latex. I have considered setting up a Latex template but I don't think the extra hassle will be worth it in the end.
These documents I export to pdf format prior to dissemination (see a forthcoming post for reasoning). I also maintain these documents within my version control system (again see a forthcoming separate post).

Handwritten Notes - I keep a log book for scribbling down results and doing rough sketches; however, I'm trying to digitise the important points from these as soon as possible, as I can never find the page I want (no search facility!), or, when I do find it, I can't read my own writing (yes, it is that bad!).

Latex - (pronounced "lay-tec") I'm trying to use for most other work. My main reasons for use being:

  • The source is stored as plain text - this means it can be easily version controlled and differenced. (see a forthcoming separate post on this...).
  • It can compile directly to pdf with lots of nice extras included (menus, an interactive table of contents, hyperlinks, etc).
  • It separates the content from the formatting - allowing me to concentrate on one or the other, rather than worrying about items jumping between pages or links not working.
  • It is very flexible and can produce beautiful documents (especially where equations or complex formatting is involved).
I'll probably be posting more on Latex, as I have played with it to produce some interesting documents in the past. 

Do you think the way I'm doing things is sensible? Any questions, comments or arguments then please add them below!