
Tuesday, September 28, 2010

Exchange 2010 + iOS4 = Fail?

So we recently rolled out an Exchange 2010 cluster at one of our larger clients, and for the most part it went smoothly. But it seemed that slowly, more and more iPhone users were complaining their iPhones couldn't send mail.

The issue seemed really intermittent and difficult to reproduce, but certainly getting steadily worse.

We didn't realize it at the time but we were rolling out Exchange 2010 at about the same time Apple released iOS4.

Although you won't find any official statement from Apple that the problem even exists, it seems they kind of borked their implementation of Exchange ActiveSync 14 (EAS14) in iOS4. EAS14 implements some fancy new "smart" features that I'm assuming Apple tried to take advantage of.

This problem manifests itself as users who can't send email from their Exchange account (the message just sits there and spins forever) and as rapidly decreasing battery life on their iPhones.

If you check out the Apple message boards you'll find a dozen or so pages full of people screaming about this problem, and if you look carefully you'll find a link to JimGoings.com, where he posts a .mobileconfig file that essentially disables support for ActiveSync 14. (In case you're wondering, Microsoft skipped version 13... you know, bad luck.)

We have rolled this out to around 100 users and it seems to be the miracle cure for iOS4.

Check it out! BlackListExchange2010.mobileconfig

Monday, July 26, 2010

Exchange 2010 Multi-site clustering... almost.

If you've ever implemented an Exchange 2003 cluster you know just how far Exchange high-availability clustering has come. Even if you haven't, the capabilities that an Exchange 2010 HA cluster promises sound pretty exciting.

I thought so too, and started on my multi-site HA Exchange 2010 cluster. After all the frustration I've endured and the major lack of documentation on the internet, I've decided to contribute what I've learned in my project so far...

Background
I work with a client that has a unique set of challenges. They are unable to get a reliable Internet provider in their area, but their e-mail is absolutely critical to them. Their most recent solution has been a combination of two wireless Wi-Fi-based links and a T1 connection, configured to load balance and fail over. For the last year and a half or so they have been hosting their e-mail off-site, using three IPsec tunnels with RIP routing between them for high availability. The problem is they are growing incredibly fast, and the bandwidth required to sustain client connections across the WAN is becoming kind of silly. Oh... and did I mention, 80% of the clients are Macs? Yuck.

The client's awesome new Dell R510, filled with 15k RPM SAS drives and tons of RAM, showed up, and the fun began. The R510 will be their new on-site Exchange server, which they will primarily access. Their backup off-site server will be a virtual machine hosted on VMware ESX4.

Getting started
Getting started was fairly easy: setting up the server hardware, installing the OS, and the Exchange prerequisites, all pretty standard stuff. One of the nice things about Exchange 2010 clustering is that you can set up your Database Availability Group (DAG) with only a single member, which was really nice since it allowed me to set up the cluster way ahead of time, before the physical server hardware even arrived.
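For reference, from the Exchange Management Shell the single-member DAG setup looks roughly like the sketch below. The DAG, witness, and server names are placeholders, not the actual configuration:

```powershell
# Create the DAG (witness server and directory names are hypothetical)
New-DatabaseAvailabilityGroup -Name DAG1 `
    -WitnessServer witness01.example.com `
    -WitnessDirectory C:\DAG1Witness

# Add the first (and, for now, only) member
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXSERVER1

# Later, once the new hardware arrives, add the second member
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXSERVER2
```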

One of the first things you might notice is that an Exchange 2010 cluster doesn't really appear to be a cluster at all... well, not in the traditional sense, anyway. Exchange 2010 uses the Windows cluster framework to cluster the Exchange mailbox role, but even then, it does not show up in the Windows Cluster Manager.

I found myself asking a few obvious questions.

First, how do I set my cluster IP address?
Set-DatabaseAvailabilityGroup -Identity <DAG NAME> -DatabaseAvailabilityGroupIpv4Addresses <IP ADDRESS>,<IP ADDRESS>

Second, what happens to the cluster IP when I fail it over?
Microsoft documentation recommends configuring multiple cluster IP addresses, one in each AD site, so it stands to reason that there would be one active cluster IP address per AD site, corresponding to that site's subnet. My initial testing did not indicate this was the case, but I think I was just being impatient. The DNS record for the cluster is updated with the cluster IP address corresponding to the AD site that the active mailbox database belongs to... it just takes a while sometimes.

So I followed the rest of the MS documentation to get my DAG members set up and, ta-da, it works. Awesome, that was easy; now let's test it. Naturally my first test was to replicate a service failure. I tried killing IIS first on my active cluster node, assuming IIS would fail over to my passive node. https://excluster/owa ... page not found. That didn't work...

Client Access Servers
The CAS was the first of my many WTF moments when configuring my multi-site Exchange 2010 cluster. I had originally made the assumption that the CAS role is part of the same cluster magic as the mailbox role, but no... it's not. In fact, the CAS role is basically not clustered at all.

Microsoft's recommended configuration for an HA CAS is to create a CAS array. A CAS array is a fancy name for an NLB cluster of your Client Access Servers. OK, fine, I can deal with that. But wait: CAS arrays are not supported across AD sites. Bah, now what?!
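For comparison, within a single AD site the CAS array object itself is just one cmdlet (the name, FQDN, and site below are placeholders); the NLB cluster behind it is configured separately in Windows:

```powershell
# The FQDN should resolve to the NLB cluster's virtual IP
New-ClientAccessArray -Name "CASArray1" -Fqdn outlook.example.com -Site "PrimarySite"
```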

So after brainstorming for a while I realized that a fully automated failover was neither wise nor possible anymore. Still, something very easy, like a few config line changes, would be optimal. The rest of my project is based around the idea of manual but simple failover.

I originally thought it would be best to just change the DNATs on my firewall to direct traffic to the CAS across the IPsec tunnel to the virtual machine node. However, after screwing around with the routing changes and SNATs/DNATs needed to make it all work, I decided there were too many complex routing changes necessary to make the failover happen, especially if it is a serious disaster and I'm already flustered about a broken Exchange server. I needed something simpler.

We've used Nginx as a reverse proxy load balancer before with great success so that was my natural first attempt. Nginx was mostly a success except for a few caveats.
  • Mac Mail didn't work
  • EWS didn't work because it requires NTLM
  • Nginx completely mangles the WWW-Authenticate header and causes devices to default to NTLM authentication, even though Nginx doesn't even support NTLM auth
  • Disabling NTLM in IIS on the CAS fixed EWS/Mac Mail, but naturally breaks Outlook clients
After wasting an embarrassing amount of time on Nginx I turned to Apache. After a very brief amount of time configuring Apache, I had a working reverse proxy that could coexist with Outlook, Mac Mail, and ActiveSync, and all I need to do to fail over my CAS is uncomment two lines in my Apache config and restart Apache. The neat part is that since all the traffic is proxied, I don't need to make my firewall do any circus tricks.

I did run into one small hiccup configuring Apache as a reverse proxy, though. In order to get ActiveSync to work properly I had to add the following lines to my config:

SetEnv proxy-sendcl 1
RequestHeader edit Transfer-Encoding Chunked chunked early

I then uninstalled Nginx, since I didn't need it anymore, and promptly broke Apache. After uninstalling and reinstalling Apache, life was good again and my reverse proxy still worked.
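Putting the pieces together, a minimal sketch of what such a vhost could look like is below. Hostnames are placeholders and the certificate directives are omitted; the commented-out ProxyPass lines are the two you would swap to fail over to the off-site CAS:

```apache
<VirtualHost *:443>
    ServerName mail.example.com
    SSLEngine On          # certificate/key directives omitted
    SSLProxyEngine On     # the CAS back end is also HTTPS

    # The ActiveSync workaround from above
    SetEnv proxy-sendcl 1
    RequestHeader edit Transfer-Encoding Chunked chunked early

    # Active CAS (on-site)
    ProxyPass        / https://cas1.example.com/
    ProxyPassReverse / https://cas1.example.com/

    # Failover: comment out the two lines above, uncomment these, restart Apache
    #ProxyPass        / https://cas2.example.com/
    #ProxyPassReverse / https://cas2.example.com/
</VirtualHost>
```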

OWA quirk
I'm still not sure if this one is my imagination or not, but I'm fairly certain that if your CAS is not in the same AD site as your active mailbox server, you tend to have trouble logging in to OWA. I've been quite exhausted at every point that I've come to this conclusion, though, so I wouldn't take it as fact.

RpcClientAccessServer
In the process of troubleshooting some of the strange authentication issues I was having with Nginx, I was fortunate enough to break my EWS virtual directory so badly it was easier to just blow it away. Rebuilding Exchange virtual directories is not something that's new to me, but this one didn't go so smoothly. After doing Remove-WebServicesVirtualDirectory and New-WebServicesVirtualDirectory, EWS was still missing and I was looking at an error message that made no sense. After hacking around in ADSI Edit and the IIS metabase, I decided I was best off uninstalling IIS and the CAS role and reinstalling both.

The point of that little story is this: I discovered that the LegacyExchangeDN gets totally borked when you do that; fortunately, Microsoft has a really nice KB article about it. But I also realized that, for some reason, my CAS that didn't exist anymore was still trying to be used by my Outlook clients. excluster (the cluster DNS name) resolved to exserver1, the Exchange server that I broke, when it should have been resolving to exserver2.

Enter reason #2 why Exchange 2010 HA clustering is very incomplete when used in multiple site format.

When you create a mailbox database (yes even that DAG database) the client access server is statically configured for that mailbox database with the RpcClientAccessServer attribute.

If you do
Get-MailboxDatabase databasename | fl
you'll see the current value of your RpcClientAccessServer.

So, when you go to fail over your Exchange database, you'll need to make sure you change RpcClientAccessServer to point to your active CAS, on all of your nodes, for all of the DAG databases that you need access to:

Get-MailboxDatabase databasename | Set-MailboxDatabase -RpcClientAccessServer servername1.domainname.com
Here is a great post on networkworld.com that explains it in a little more detail RpcClientAccessServer Madness!

The RpcClientAccessServer setting is really not a big deal if you're using a CAS array in a single AD site, since you can just point your mailbox databases at the CAS array and forget about it. But it's not so simple for multi-site setups.
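For what it's worth, that failover step could be collapsed into a one-liner along these lines (the CAS name here is a placeholder, not from my actual setup):

```powershell
# Point every mailbox database at the now-active CAS in one pass
Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer cas2.example.com
```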

Hub Transport
As you've probably guessed by now, the Hub Transport role is not really clustered either. But this is one that I don't really care about. Hub Transport is a piece of cake to make redundant: just make sure both servers can send and receive mail through the firewalls, and set up separate send and receive connectors (since the servers are in separate AD sites, they can't share one).
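A hedged sketch of what the per-site connectors might look like, assuming server and connector names that aren't from my actual setup:

```powershell
# One Internet-facing receive connector per server/site for inbound mail
New-ReceiveConnector -Name "Internet - Site A" -Server EXSERVER1 `
    -Usage Internet -Bindings 0.0.0.0:25

# One send connector per site, each sourced from that site's Hub Transport
New-SendConnector -Name "Outbound - Site A" -AddressSpaces "*" `
    -SourceTransportServers EXSERVER1
New-SendConnector -Name "Outbound - Site B" -AddressSpaces "*" `
    -SourceTransportServers EXSERVER2
```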

My configuration utilizes a Barracuda for relaying incoming and outgoing mail, but if I didn't have the 'cuda I could just as easily configure different-priority MX records for each server.
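Without the Barracuda, the MX-priority approach would look something like this (zone and host names are placeholders; the lower preference value is tried first):

```
example.com.    IN  MX  10  exserver1.example.com.
example.com.    IN  MX  20  exserver2.example.com.
```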

Wrapping it up...
At this point I'm mostly finished testing my Exchange 2010 cluster and am ready to put one of the CASes into production as a front-end CAS to redirect to my back-end Exchange 2007 for the duration of the migration.

I will probably be posting a followup before long about the rest of the migration... like how ginormous the database gets without SIS :-o

-Steve