technical.allofe.com – New Blog

Our blog has moved to technical.allofe.com. Be sure to visit it for the latest AllofE technical blogs!

Posted in Uncategorized | Leave a comment

Page Encoding Revisited

In a previous post Zach described a method for determining a page’s encoding and converting strings to that encoding in PHP. The code he gave is example code, but it is also a good starting point for showing how simple example code can be extended into robust code you could actually use in a production environment. With that, let’s take a look at the original function:

function convert_utf8_content_to_iso_8859_1_if_page_default($in_string) {
    // $the_page_url (the URL of the current page) and get_charset_from_page_header()
    // are assumed to be defined elsewhere; the original post does not show them.
    global $the_page_url;
    // Get headers as an assoc array
    $page_headers = get_headers($the_page_url, 1);
    $page_encoding = "ISO-8859-1";
    if(!empty($page_headers["Content-Type"])) {
        $page_encoding = get_charset_from_page_header($page_headers["Content-Type"]);
    }
    if($page_encoding == "UTF-8") {
        return $in_string;
    }
    return mb_convert_encoding($in_string, "ISO-8859-1", "UTF-8");
}

This function works as is (except for one bug when the page’s encoding is neither UTF-8 nor ISO-8859-1), but a few things keep it from being an ideal solution for converting encodings. Part of it is the functions used within it, and part of it is the scope of the problem this function tries to solve. First let’s state what this code does:

  1. It makes a request to the server for the headers of the page at $the_page_url (this is the PHP page we are calling the function in).
  2. It determines the server’s encoding from the headers.
  3. If the server’s encoding is UTF-8, it simply returns the string.
  4. Otherwise it converts the string from UTF-8 to ISO-8859-1 and returns that.

The first issue we should resolve is the request to the server. Any time a request is made to the server your code resides on, resources are used: Apache ties up one of its processes, which is one less request your server can handle for another user, and so on. So we definitely want to do this differently. This code makes the request only to read the encoding from the headers Apache sends out. That encoding is defined in Apache’s configuration, and once we know what it is it will not change (unless we explicitly change it in the Apache configs). That leaves two options: we could set PHP’s default encoding to the same encoding as Apache, or if needed we can simply set a define with Apache’s encoding (this seems like a rarely needed case). In Zach’s previous post he updated the above function to use a helper function that caches the server’s encoding. While this is better than making a request every time our conversion function gets called, the best solution is to set Apache and PHP to use the same encoding.

Technically, if PHP’s encoding and Apache’s encoding are the same, then this function is not needed. But what we are really trying to solve here is the case where we have a string that is not encoded with the same encoding as our output. Perhaps this string came from an RSS feed on another machine, or it came from user input (think copy and paste). So we do need a function like this to help us with those situations.

We could simply rename our function to convert_utf8_content_to_iso_8859_1(), but if we decide later to support other encodings we’ll have to write and then call a different function for every encoding pairing. It is also pretty clear that the call to mb_convert_encoding does not need to have hard-coded encoding names. The mbstring extension also has a function called mb_detect_encoding which can tell us the encoding of a string. Given this, we could write our conversion function like this:

define("DEFAULT_ENCODING", ini_get("default_charset")); //php.ini has this set to 'UTF-8'
function convert_to_page_encoding($string) {
    $encoding = mb_detect_encoding($string,'ASCII,UTF-8,ISO-8859-1');
    if ($encoding !== false) {
        if(DEFAULT_ENCODING == $encoding) return $string;
        return mb_convert_encoding($string, DEFAULT_ENCODING, $encoding);
    }
    die ("Could not detect encoding of string!");
}

And that’s all there is to it! We can now easily convert any string to our page’s encoding. Well, not really. Working with string encodings can be quite difficult, and the weak link in the above function is the call to mb_detect_encoding. While it works for the cases we run into most, it’s not robust enough to handle detection of any encoding. In fact ISO-8859-1 will get returned for any string encoded in one of the other ISO-8859-* encodings. You also have to be very specific about the order in which encodings are tried, and even then you may get a false positive. However, for many cases the above function will work, and more importantly it’ll gladly let us know when there is an issue we need to fix should we run into it later on.
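To see why detection order matters and why false positives happen, here is a minimal sketch in JavaScript rather than PHP. It does not use mb_detect_encoding; the function names are made up for illustration. It relies on two facts: every byte sequence is valid ISO-8859-1, while only some byte sequences are valid UTF-8.

```javascript
// Decode bytes as ISO-8859-1: every byte maps directly to the code point
// with the same number, so this can never fail.
function decodeLatin1(bytes) {
  return Array.from(bytes, function (b) { return String.fromCharCode(b); }).join("");
}

// Decode bytes as strict UTF-8; returns null when the bytes are not valid UTF-8.
function decodeUtf8Strict(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return null;
  }
}

// Try the stricter encoding first and fall back to ISO-8859-1.
// Checking ISO-8859-1 first would "match" every input.
function guessEncoding(bytes) {
  return decodeUtf8Strict(bytes) !== null ? "UTF-8" : "ISO-8859-1";
}

var utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]); // "café" in UTF-8
var latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);     // "café" in ISO-8859-1

console.log(guessEncoding(utf8Bytes));   // UTF-8
console.log(guessEncoding(latin1Bytes)); // ISO-8859-1
console.log(decodeLatin1(utf8Bytes));    // the UTF-8 bytes also "decode" as ISO-8859-1, giving mojibake
```

The last line is the false positive in miniature: nothing about the bytes themselves says they were not meant to be ISO-8859-1, so a detector can only rule encodings out, never prove one in.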

I mostly wanted to write on this topic to bring up a point about writing code in general. Almost everything in coding can be done with several different approaches. Programmers are constantly pulled between getting something that works and coming up with the ideal solution. Sometimes when you get into bug fixing mode it is easy to do something that fixes the specific bug but does not fix the general problem the bug points to. It’s also easy to get bogged down trying to write the ideal solution when a halfway approach makes more sense (in terms of time and effort for value gained). This is the process we often follow when writing code. We start out getting something that works, but when we step back and look at it we see there are improvements that can be made. And often after those improvements we see there are more to be made. Oddly enough, that’s part of the fun of coding.

Posted in PHP | Leave a comment

Setting Up a Mobile Development Environment

Introduction
Accessing the Internet on cell phones is becoming increasingly popular. It’s pretty easy to remember a time when all cell phones did was make phone calls, but now they are becoming a concern when making websites. Along with testing in the usual browsers like Firefox, Internet Explorer, and Chrome, websites should also be tested in different web browsers on mobile phones. This blog will help you set up a development environment for testing websites on mobile phone browsers.

Blackberry Simulator
Research in Motion has created a huge library of different Blackberry simulators. They allow you to see exactly how things will look when viewed on a Blackberry without having to own one. The simulators are completely free of charge. Installing them and using them is also pretty easy.

First head over to this page and grab yourself a simulator. I chose 8530 with OS 5.0 since that’s what I own. Fill in some of your information, hit download, then install it wherever you like.

In order to allow your Blackberry simulator to access the Internet, you’ll need to install the Blackberry MDS Simulator. It can be downloaded here. To install the MDS Simulator, you may also have to download the Java SDK.

The only problem I’ve run into is not being able to run the MDS Simulator in administrator mode. I had to create a batch file that changes to the MDS install folder and runs run.bat there, then run my batch file as an administrator to get it to work. My batch file looked like this:

cd /
cd "Program Files"
cd "Research in Motion"
cd "Blackberry Email and MDS Services Simulators 4.1.4"
cd "MDS"
run.bat

To get your Blackberry simulation running, first run the simulator for the device you downloaded then run your MDS batch file. You should now be able to browse the Internet on your Blackberry simulation. Along with the Blackberry browser, also download Opera Mini for your device. Those are the two most popular browsers for Blackberries, and should give you the best feel for how your mobile website will show up on a Blackberry.

Android Simulator
The Android simulator is much easier to install than the Blackberry. Simply download the Android SDK and install to your desired directory. You will eventually get to a screen where you can manage Virtual Devices. Click “Add,” then type “myemulator” for the name, then select the newest version of Android that you can. Save, then exit out of the setup. Create a shortcut to emulator.exe in your Android SDK install directory under tools/. Open up the properties of the shortcut and add to the target “-avd myemulator” or whatever you chose to name your emulator. Open it up and let Android boot.

Palm Pre Simulator
The Palm Pre simulator is even easier to install than the others. Download the Palm Pre SDK and you’re off and running. It’ll let you know if you need to install any dependencies and will download them for you.

Windows Mobile Emulator
The Windows Mobile SDK can be downloaded from this link on Microsoft’s website. Be sure you select “Windows Mobile 6.5 Professional Developer Tool Kit (USA).msi”. (I was confused my first time on the page and downloaded the “ESP” version, which I thought was “ENG”.)

After that, you need to download Virtual PC 2007. This allows your emulator to access the Internet. You can download it from here.

Install both of them, then boot up the emulator that came with your SDK. Once it finishes booting, go to File -> Configure -> Network. Click “Enable NE2000 PCMCIA network adapter and bind to:” and select your primary ethernet card. Click OK, then go to File -> Clear Saved State and File -> Reboot -> Hard. Once it boots back up, within the emulated device go to Settings -> Connections -> Network Cards. Select “Internet” for “My network card connects to”. Click Ok, then after a few seconds it should connect to the Internet.

The emulator itself works pretty well. The only issue I’ve had is that after opening it up it doesn’t automatically reconnect. After I boot it up, I always go to File -> Clear Saved State, then File -> Configure -> Network -> Ok, then in the emulator Settings -> Network -> Connections -> OK. If that doesn’t work, then go to File -> Reset -> Hard.

Conclusion
Setting up your own mobile development environment isn’t as hard as I first thought it was. The iPhone and iPad use Safari as their browser, so testing your website in Safari (or Chrome, which is also based on WebKit) should give you a sufficient preview of your website on those two platforms. Aside from the iPhone and iPad, being able to test your website in the Blackberry, Android, and Palm Pre simulators will give you an extremely accurate view of your website on the most popular mobile platforms.

Posted in Uncategorized | Leave a comment

How to Handle Page Encoding in PHP

Introduction
One issue that I’ve had pop up from time to time is weird characters showing up when I retrieve data from a database or from files I retrieve off other servers. Often it’ll be in the form of a diamond with a question mark in the middle, or sometimes the apostrophe or double quote characters will be slanted. It wasn’t until yesterday that I traced it back to the issue of different encodings. I did a lot of research on different ways to handle encoding, so this blog will be a summary of those issues and how you as a programmer can handle encoding automatically.

Problem Setup – RSS Feed
Let’s say that you want to display an RSS feed from a website like wordpress.com. You may even want to use this blog’s feed, which has the URL http://allofetechnical.wordpress.com/feed/. If you look at the first few lines of the XML that is returned from that URL, you’ll see the following:

<?xml version="1.0" encoding="UTF-8"?>...

This indicates that this feed is encoded in UTF-8. If you wanted to run some processing on the feed to display it on your website by creating a SimpleXMLElement, you may run into a problem with encoding if you don’t take into account the feed’s encoding. Let’s take a look at how your script might be set up:

(bootstrap.php, which is loaded by all your PHP files)
...
header("Content-Type: text/html; charset=ISO-8859-1");
...
(rss_page.php)
...
$content = file_get_contents("http://allofetechnical.wordpress.com/feed/");
$xml = new SimpleXMLElement($content);
print_r($xml);
...

The encoding that PHP is sending out with the header() call is different than the encoding of the RSS feed you just read in, so strange characters will start showing up in the feed. Your script is telling the browser that the encoding is ISO-8859-1, but the encoding of your feed is UTF-8, so it is no wonder that the browser is confused.

Problem Setup – Database Results
Another way this encoding problem comes about is when you don’t translate the data you retrieve from your database to the page’s encoding. By default, MySQL stores data in the latin1 character set (with the “latin1_swedish_ci” collation), which is roughly equivalent to ISO-8859-1. Your script may do something like the following:

(bootstrap.php)
...
header("Content-Type: text/html; charset=UTF-8");
...
(database_functions.php)
...
function get_database_data($in_database, $in_query) {
    $results = mysql_query($in_query, $in_database);
    return mysql_fetch_assoc($results);
}
...

Since your data is stored in ISO-8859-1 but your page is telling the browser it is encoded in UTF-8, strange characters will show up in the data you pull from your database.

Order of Operations – How a Browser Determines Encoding
Up till now, the only way I’ve shown to change the page’s encoding is by using PHP’s header() function. This unfortunately may not be the encoding the browser sees. Here is the way a browser determines the encoding of a page:

  1. If the user sets the encoding of their browser manually (in FF: View->Character Encoding), then use that.
  2. If that’s not set, look for a charset in the Content-Type HTTP header. A call to PHP’s header() function sets it explicitly.
  3. If header() was not called, PHP sends the header using its value of default_charset in php.ini.
  4. If that is not set, Apache sends the header using its value of AddDefaultCharset in httpd.conf.
  5. If no charset arrives in the HTTP header, look for a metatag in the page containing “Content-Type”. (e.g. <META http-equiv="Content-Type" content="text/html; charset=XYZ" />)
  6. If that is not set, then have the browser auto detect what the encoding of the page is.

Problem Solution
The crux of the problem is being able to detect what the encoding of the current page is before you display content that may be encoded. There are a few different ways that PHP has of detecting the encoding of a page. The most straightforward way is by using PHP’s get_headers() function. Here is an example of how you could encode UTF-8 content to ISO-8859-1 if that is the page’s encoding:


function convert_utf8_content_to_iso_8859_1_if_page_default($in_string) {
	$page_encoding = get_page_encoding();

	if($page_encoding == "UTF-8") {
		return $in_string;
	}

	return mb_convert_encoding($in_string, "ISO-8859-1", "UTF-8");
}

// Note: some_function_to_get_the_current_page_url() and get_charset_from_page_header()
// are placeholders that this post does not define.
function get_page_encoding() {
	if(!empty($_SERVER["page_encoding"])) { // Reuse the value cached earlier in this request
		return $_SERVER["page_encoding"];
	}

	$page_encoding = ini_get("default_charset"); // Try to get the charset from PHP first
	if(!$page_encoding) { // If the charset isn't stored in PHP, get it from the current page
		$the_page_url = some_function_to_get_the_current_page_url(); // Get the current page URL
		$page_headers = get_headers($the_page_url, 1); // Get headers as an assoc array
		$page_encoding = "ISO-8859-1";
		if(!empty($page_headers["Content-Type"])) {
			$page_encoding = get_charset_from_page_header($page_headers["Content-Type"]);
		}
	}

	$_SERVER["page_encoding"] = $page_encoding; // Cache for later calls in this request

	return $page_encoding;
}

The example may be very simple, but it should make it clear that it is possible in PHP to tell what a page’s encoding is, and that content encoded in a different charset can be converted to the page’s easily.
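The helper get_charset_from_page_header() is never shown in this post. As a rough sketch of what such a helper has to do, here is one possible version written in JavaScript rather than PHP. The name charsetFromContentType and the ISO-8859-1 fallback are assumptions made for illustration, not the original implementation.

```javascript
// Pull the charset parameter out of a Content-Type header value such as
// "text/html; charset=UTF-8". Returns the fallback when no charset is present.
function charsetFromContentType(headerValue, fallback) {
  var defaultCharset = fallback || "ISO-8859-1";
  if (typeof headerValue !== "string") {
    return defaultCharset;
  }
  // Match `; charset=value`, allowing optional whitespace and optional quotes.
  var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue);
  // Upper-case the result so comparisons like === "UTF-8" are predictable.
  return match ? match[1].toUpperCase() : defaultCharset;
}

console.log(charsetFromContentType("text/html; charset=utf-8")); // UTF-8
console.log(charsetFromContentType("text/html"));                // ISO-8859-1
```

Normalizing the case matters because the conversion function compares the result against the literal "UTF-8", while servers are free to send "utf-8".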

Conclusion
The only downside to this approach is that it doesn’t take into account #1 on the order of operations, which is the user-set default encoding. There’s no way I know of in Javascript or PHP to detect the user’s browser encoding, so it may be an unsolvable aspect until further functionality becomes available. But for right now, detecting a page’s encoding and switching the encoding of a string to the page’s by using mb_convert_encoding() is the best strategy I can think of.

Further reading:
Character Sets / Character Encoding Issues on phpwact.org

Posted in Browsers, Languages, PHP | Leave a comment

Git Resources

While doing research for our upcoming move to git I’ve come across quite a few resources for learning how to use git. In particular the two books Pro Git and Git Community Book have been the most helpful in getting me familiar with all the different parts of git. They are pretty short and very easy to read. Here are some of the most useful places I’ve found to learn about git:

  • git-scm.org – This is git’s main website, and it has links to other useful sites
  • Git Community Book – The community book for git
  • Pro Git – Another clear and helpful book
  • Git Ready – A collection of tips, and also more links to other sites with git info
  • Stack Overflow: Git – All Stack Overflow questions related to git
  • A Git Workflow for Agile Teams – a pretty easy to follow workflow for developing with git. Doing a search for “git workflow” brings up several useful sites.
  • GitHub and Gitorious – These are for hosting git repositories online. If anything it’s useful for personal learning and to see how other projects work with git.

For developers the most important part of learning git is learning to do commits, branching and merging. Learning about other workflows and the other things you can do with git is also useful but not as important, since a lot of it will be set up once and changed only occasionally.

Posted in Uncategorized | Leave a comment

Caching Javascript Files

Javascript Cache issues

As HTML pages have evolved and more and more dynamic content has been added, they have come to rely heavily on Javascript.  There are two main ways that developers include Javascript in their pages.  The first is to include an external script using:
<script type="text/javascript" src="/path/to/some/javascript/file.js"></script>

The second is to include the script directly in the page, such as:
<script type="text/javascript">function helloWorld() {alert("Hello World!");}</script>

The ideal situation is to use the former because browsers typically will cache the external .js files so that as visitors come back to the site the file can be read from their local machine instead of having to re-download the file.  This will usually make the site quicker to load and reduces bandwidth.  This discussion will focus on this situation of caching the external .js files.

Although caching of .js files offers advantages for the end user, there are some drawbacks.  As developers make changes to files it can be difficult to test those changes because their browser will cache earlier versions.  There are ways to circumvent this, such as setting the browser to not cache or clearing the browser’s cache after each change is made.  This becomes tedious, and disabling the cache completely means you lose the advantage of the browser caching the files.  Even worse, once a developer completes the changes and pushes them to a production server, end users’ browsers may still be using a cached version, resulting in errors and undesired functionality/bugs.  It is not easy to explain what is going on to end users so that they understand why their data didn’t get saved (for example).  There is a solution, however.

One way to help circumvent this is to add a different parameter to the src include each time the file is changed.  For example:
<script type="text/javascript" src="/path/to/some/javascript/file.js?temp=123"></script>

Each time the file is updated the parameter of temp=123 should be changed to a different variable/value combination.  The browser sees this as a request for a different file that it doesn’t have in the cache, so it re-downloads the file.  Problem solved, right?  Well, not quite.  Although this works, it means that when a developer makes changes they have to remember to change the temp=123 to something unique each time to ensure that the file isn’t pulled from the browser’s cache.  One way to circumvent this is to create a centralized function which adds the extra parameter for you.
For example:

function include_javascript($js_file) {
    $temp_data = "?temp=123";
    echo "<script type='text/javascript' src='".$js_file.$temp_data."'></script>";
}

What this does is make a central place for a developer to make the change, so they don’t have to search through code for every location where a script file is being included and change the temp=123 to something new.  So now we are done, right?  Well, we can still make it better.  The issue with this solution is that when we change the value in the central function, every js file we include gets a new parameter, which means all js files will be re-downloaded when really only one may have changed.  So how can we make the browser re-download only the files that have changed instead of all of them?  The way to make this work is to use the file’s mtime (the modification time of the file), which PHP exposes through filemtime().  This changes each time a file is saved or copied over an existing file.  So the function would become:

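function include_javascript($js_file) {
    // filemtime() needs a filesystem path; this assumes $js_file is a URL path
    // relative to the web server's document root.
    $mtime = filemtime($_SERVER['DOCUMENT_ROOT'].$js_file);
    $temp_data = "?var".$mtime."=".$mtime;
    echo "<script type='text/javascript' src='".$js_file.$temp_data."'></script>";
}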

Now the temp variable tacked onto the js file will only change if the file has actually changed.  This allows the browser to still use the cached file for .js files that haven’t changed and download the new versions for ones that have.

With this solution we are able to take advantage of the browser’s cache and not have to worry that a client won’t get the latest version of a file as updates are pushed from development to production.
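The same idea carries over to other server-side languages. As a minimal sketch, assuming a Node.js server instead of PHP, the helper could look like the following. The function name versionedScriptTag and the single v parameter are choices made for this illustration, and the caller is assumed to know both the URL path and the on-disk path of the file.

```javascript
const fs = require("fs");

// Build a script tag whose query string changes only when the file changes.
// urlPath is what the browser requests; filePath is where the file lives on disk.
function versionedScriptTag(urlPath, filePath) {
  // mtimeMs is the modification time in milliseconds; whole seconds are enough here.
  const mtimeSeconds = Math.floor(fs.statSync(filePath).mtimeMs / 1000);
  return '<script type="text/javascript" src="' + urlPath + "?v=" + mtimeSeconds + '"></script>';
}
```

Keeping the URL path and the filesystem path as separate arguments avoids guessing how one maps to the other, which depends on how the server is configured.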

Posted in Browsers | Leave a comment

IE v FF: How to Handle Bad DOM Style Assignments

Introduction
It seems that nowadays most of my Javascript bug fixing comes down to differences between IE and FF. I’ve learned a lot about how to create code that works in both browsers, but there are still little aspects that I haven’t fully mastered. One of them is the different ways that they handle bad DOM style assignments. It’s a tricky bug that doesn’t show up in FF as a Javascript error but in IE it makes itself known.

Problem Setup
Let’s say that you want to adjust a DOM style property of an element. You may want to adjust the height to be 200px. Here is a generic script for doing that:


function adjustHeight(inElement, inHeight) {
	inElement.style.height = inHeight + "px";
}

The problem comes when your variable inHeight is set to the special value of NaN. It indicates that the value is not a number. A lot of programmers use it as an initial value for variables that will have meaningful values later. If you tried to do something like the following:


adjustHeight(document.getElementById("myDiv"), NaN);

FF wouldn’t complain, but IE would throw an “Invalid argument” error.

Problem Analysis
When setting an element’s style, both IE and FF do error checking on your input. The difference is in how they handle errors. FF, per the spirit of the W3C’s suggestion on handling errors in stylesheets, ignores errors and doesn’t throw any visible errors. IE on the other hand will raise an error when the invalid assignment happens in Javascript but not in a stylesheet.

Problem Avoidance
If you wanted to create a function that checked your input and made sure that the current browser will accept it, you’d have a pretty long and hard to update function. There are some simple tests that you can do to check for cases where inHeight isn’t valid. For instance, the function isNaN() returns true when given a variable that is assigned to NaN. You could also do some testing to make sure that your units (e.g. “px”, “em”, etc.) are valid units.
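As a minimal sketch of those simple tests, the version below guards the assignment instead of letting the browser decide. The name adjustHeightSafely, the optional unit argument, and the short list of accepted units are assumptions made for this example; a real list of CSS units would be longer.

```javascript
// Units this sketch accepts; extend as needed.
var VALID_UNITS = ["px", "em", "pt", "%"];

function isValidUnit(unit) {
  for (var i = 0; i < VALID_UNITS.length; i++) {
    if (VALID_UNITS[i] === unit) {
      return true;
    }
  }
  return false;
}

// Sets the height only when the value is a finite, non-negative number and the
// unit is recognized. Returns true if the style was assigned, false otherwise.
function adjustHeightSafely(inElement, inHeight, inUnit) {
  var unit = inUnit || "px";
  // isFinite() rejects NaN as well as Infinity and -Infinity.
  if (typeof inHeight !== "number" || !isFinite(inHeight) || inHeight < 0) {
    return false;
  }
  if (!isValidUnit(unit)) {
    return false;
  }
  inElement.style.height = inHeight + unit;
  return true;
}
```

With this in place, adjustHeightSafely(document.getElementById("myDiv"), NaN) returns false and leaves the element untouched in both browsers, so the caller can decide how to handle the bad value.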

Conclusion
Requiring the programmer to know what your browser treats as valid is a pretty big burden for IE to place on a developer. IE should treat bad DOM style assignments the same way it treats bad stylesheet definitions: ignore them.

Posted in Uncategorized | Leave a comment