Tuesday, September 28, 2010

Exchange 2010 + iOS4 = Fail?

We recently rolled out an Exchange 2010 cluster at one of our larger clients, and for the most part it's gone smoothly. Slowly, though, more and more iPhone users were complaining that their iPhones couldn't send mail.

The issue was intermittent and difficult to reproduce, but it was certainly getting steadily worse.

We didn't realize it at the time but we were rolling out Exchange 2010 at about the same time Apple released iOS4.

Although you won't find any official statement from Apple that the problem even exists, it seems they kind of borked their implementation of Exchange ActiveSync 14 in iOS4. EAS14 implements some fancy new "smart" features that I'm assuming Apple tried to take advantage of.

The problem manifests itself in two ways: users can't send email from their Exchange account (the message just sits there and spins and spins forever), and battery life on the iPhone drops rapidly.

If you check out the Apple message boards you'll find a dozen or so pages full of people screaming about this problem. If you look carefully you'll also find a link to JimGoings.com, where he posts a .mobileconfig file that essentially disables support for ActiveSync 14 (in case you're wondering, Microsoft skipped version 13... you know, bad luck).

We have rolled this out to around 100 users and it seems to be the miracle cure for iOS4.

Check it out! BlackListExchange2010.mobileconfig

Tuesday, August 3, 2010

Exchange 2010 CAS + Apache 2 Reverse Proxy

As I mentioned in a previous post, I implemented an Apache 2 reverse proxy to proxy Exchange 2010 CAS traffic to my cluster node CAS servers to make failovers easy. It took a little bit of tweaking to make it all work properly with ActiveSync, OWA, and EWS, but here it is!

Hopefully this will make your reverse proxy implementation a bit easier.
LoadModule  proxy_module         modules/mod_proxy.so
LoadModule  proxy_http_module    modules/mod_proxy_http.so
LoadModule  headers_module       modules/mod_headers.so
LoadModule  deflate_module       modules/mod_deflate.so
LoadFile    /usr/lib/libxml2.so
LoadModule  ssl_module           modules/mod_ssl.so
LoadModule  proxy_html_module    modules/mod_proxy_html.so

# NOTE: the container tags (<VirtualHost>, <Proxy>, <Directory>, <Location>,
# <FilesMatch>) are assumed here, following the stock Debian default-ssl layout.
# Adjust them to match your own setup.

# *.DOMAIN.NET
<VirtualHost *:443>
    ProxyRequests Off
    SetEnv proxy-sendcl 1

    ServerName *.domain.net:443
    ServerAlias *.domain.net:443

    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    # CAS Server
    ProxyPass / https://10.176.0.100/
    ProxyPassReverse / https://10.176.0.100/

    ProxyPreserveHost On
    ProxyVia Full
    RequestHeader edit Transfer-Encoding Chunked chunked early

    ErrorLog /var/log/apache2/error.log

    LogLevel info

    CustomLog /var/log/apache2/ssl_access.log combined

    Alias /doc/ "/usr/share/doc/"
    <Directory "/usr/share/doc/">
        Options Indexes MultiViews FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
        Allow from 127.0.0.0/255.0.0.0 ::1/128
    </Directory>

    #   SSL Engine Switch:
    #   Enable/Disable SSL for this virtual host.
    SSLEngine on
    SSLProxyEngine on
    SSLCertificateFile    /etc/ssl/mail2.domain.net+gd_bundle.crt
    SSLCertificateKeyFile /etc/ssl/mail2.domain.net.key
    SSLCertificateChainFile /etc/ssl/mail2.domain.net+gd_bundle.crt

    <Location />
        RequestHeader unset Accept-Encoding
        #SSLRequire (    %{SSL_CIPHER} !~ m/^(EXP|NULL)/ \
        #            and %{SSL_CLIENT_S_DN_O} eq "Snake Oil, Ltd." \
        #            and %{SSL_CLIENT_S_DN_OU} in {"Staff", "CA", "Dev"} \
        #            and %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 \
        #            and %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 20       ) \
        #           or %{REMOTE_ADDR} =~ m/^192\.76\.162\.[0-9]+$/
    </Location>

    <FilesMatch "\.(cgi|shtml|phtml|php)$">
        SSLOptions +StdEnvVars
    </FilesMatch>

    <Directory /usr/lib/cgi-bin>
        SSLOptions +StdEnvVars
    </Directory>

    BrowserMatch ".*MSIE.*" \
        nokeepalive ssl-unclean-shutdown \
        downgrade-1.0 force-response-1.0
</VirtualHost>

Monday, July 26, 2010

Exchange 2010 Multi-site clustering... almost.

If you've ever implemented an Exchange 2003 cluster, you know just how far Exchange High Availability clustering has come. Even if you haven't, the capabilities that an Exchange 2010 HA cluster promises sound pretty exciting.

I thought so too and started my multi-site HA Exchange 2010 cluster. After all the frustration I've endured and the major lack of documentation on the internet, I've decided to contribute what I've learned in my project so far...

Background
I work with a client that has a unique set of challenges. They are unable to get a reliable Internet provider in their area, but their e-mail is absolutely critical to them. Their most recent solution has been a combination of two Wi-Fi based wireless technologies and a T1 connection that are configured to load balance and fail over. For the last year and a half or so they have been hosting their e-mail off-site and using three IPSEC tunnels with RIP routing between them for high availability. The problem is they are growing incredibly fast, and the bandwidth required to sustain client connections across the WAN is becoming kind of silly. Oh... and did I mention, 80% of the clients are Macs? Yuck.

The client's awesome new Dell R510, filled with 15k RPM SAS drives and tons of RAM, showed up and the fun began. The R510 will be their new on-site Exchange server, which they will primarily access. Their backup off-site server will be a virtual machine hosted on VMware ESX 4.

Getting started
Getting started was fairly easy: setting up the server hardware, installing the OS and the Exchange prerequisites, all pretty standard stuff. One of the nice things about Exchange 2010 clustering is that you can set up your database availability groups (DAGs) with only a single member, which allowed me to set up the cluster well before the physical server hardware even arrived.

One of the first things you might notice is that Exchange 2010 clusters don't really appear to be clusters at all... well, they are, just not in the traditional sense. Exchange 2010 uses the Windows cluster framework to cluster the Exchange mailbox role, but even then it does not show up in the Windows Cluster manager.

I found myself asking a few obvious questions:

First, how do I set my cluster IP Address?
Set-DatabaseAvailabilityGroup <DAG NAME> -DatabaseAvailabilityGroupIpAddresses <IP ADDRESS>,<IP ADDRESS>

Second, what happens to the cluster IP when I fail it over?
Microsoft documentation recommends configuring multiple cluster IP addresses, one in each AD site, so it stands to reason that there would be one active cluster IP address per AD site, corresponding to that site's subnet. My initial testing did not indicate this was the case, but I think I was just being impatient. The DNS record for the cluster is updated with the cluster IP address corresponding to the AD site that the active mailbox database belongs to... it just takes a while sometimes.
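
If you'd rather not sit there hitting refresh after a failover, a small poller can tell you when the cluster name finally points at the other site. This is only a sketch in Python: the function name, the retry counts, and the example IP in the comment are made up for illustration, and it asks the local resolver, so any DNS caching on the machine you run it from still applies.

```python
import socket
import time


def wait_for_dns(name, expected_ips, resolve=None, attempts=30, delay=10.0,
                 sleep=time.sleep):
    """Poll until `name` resolves to one of `expected_ips`.

    Returns the number of attempts used, or None if it never matched.
    """
    if resolve is None:
        # Default resolver: every IPv4 address the local resolver returns.
        def resolve(host):
            return socket.gethostbyname_ex(host)[2]

    expected = set(expected_ips)
    for attempt in range(1, attempts + 1):
        try:
            current = set(resolve(name))
        except OSError:
            current = set()  # name not resolvable right now; keep waiting
        if current & expected:
            return attempt
        sleep(delay)
    return None


# Example (hypothetical IP for the cluster address in the second AD site):
# wait_for_dns("excluster", ["10.0.1.50"])
```

The resolver and sleep function are parameters only so the loop can be exercised without touching real DNS.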

So I followed the rest of the MS documentation to get my DAG members set up and tada, it works. Awesome, that was easy, now let's test it. Naturally my first test was to replicate a service failure. I tried killing IIS first on my active cluster node, assuming IIS would fail over to my passive node. https://excluster/owa ... page not found. That didn't work...

Client Access Servers
The CAS was the first of my many WTF moments when configuring my multi-site Exchange 2010 cluster. I had originally assumed that the CAS role is part of the same cluster magic as the mailbox role, but no... it's not. In fact, the CAS role is basically not clustered at all.

Microsoft's recommended configuration for HA CAS is to create a CAS array. A CAS array is a fancy name for an NLB cluster of your client access servers. Ok fine, I can deal with that, but wait. CAS arrays are not supported across AD sites. Bah, now what?!

So after brainstorming for a while I realized that a fully automated failover is neither wise nor possible anymore. Something very easy, like changing a few config lines, would really be optimal though. The rest of my project is based around the idea of a manual but simple failover.

I originally thought it would be best to just change the DNATs on my firewall to direct traffic to the CAS across the IPSEC tunnel to the virtual machine node. However, after screwing around with the routing changes and SNATs/DNATs needed to make it all work, I decided there would be too many complex routing changes necessary to make the failover happen, especially if it is a serious disaster and I'm already flustered about a broken Exchange server. I needed something simpler.

We've used Nginx as a reverse proxy load balancer before with great success, so that was my natural first attempt. Nginx was mostly a success except for a few caveats:
  • Mac Mail didn't work
  • EWS didn't work because it requires NTLM
  • Nginx completely mangles the WWW-Authenticate header and causes devices to default to NTLM authentication even though Nginx doesn't even support NTLM auth
  • Disabling NTLM in IIS on the CAS fixed EWS/Mac Mail but naturally broke Outlook clients
After wasting an embarrassing amount of time on Nginx I turned to Apache. After a very brief amount of time configuring Apache I had a working reverse proxy that could co-exist with Outlook, Mac Mail, and ActiveSync. All I need to do to fail over my CAS is uncomment two lines in my Apache config and restart Apache. The neat part is that since all the traffic is proxied, I don't need to make my firewall do any circus tricks.

I did run into one small hiccup configuring Apache as a reverse proxy though. In order to get ActiveSync to work properly I had to add the following lines to my config:

SetEnv proxy-sendcl 1
RequestHeader edit Transfer-Encoding Chunked chunked early
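
For anyone wondering what that second line actually does: RequestHeader edit runs a regular expression search-and-replace on the header value, so a Transfer-Encoding of "Chunked" gets rewritten to lowercase "chunked" before the proxy looks at it (presumably because some ActiveSync clients send it capitalized). The first line, proxy-sendcl, tells mod_proxy to send the request body with a Content-Length instead of chunked encoding. Here is a tiny Python imitation of the edit step only, as an illustration of the idea rather than of Apache's internals; the function name is made up.

```python
import re


def edit_header(value, pattern, replacement):
    """Mimic `RequestHeader edit`: replace the first regex match in the value."""
    return re.sub(pattern, replacement, value, count=1)


print(edit_header("Chunked", "Chunked", "chunked"))  # chunked
print(edit_header("chunked", "Chunked", "chunked"))  # chunked (no match, unchanged)
```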

I then uninstalled Nginx since I didn't need it anymore and promptly broke Apache. After uninstalling and reinstalling Apache, life is good again and my reverse proxy still works.

OWA quirk
I'm still not sure if this one is my imagination or not, but I'm fairly certain that if your CAS is not in the same AD site as your active mailbox server, you tend to have trouble logging in. I've been quite exhausted every time I've come to this conclusion, though, so I wouldn't take it for fact.

RpcClientAccessServer
In the process of troubleshooting some of the strange authentication issues I was having with Nginx I was fortunate enough to break my EWS directory so badly it was easier to just blow it away. Rebuilding Exchange virtual directories is not something that's new to me, but this one didn't go so smoothly. After doing Remove-WebServicesVirtualDirectory and New-WebServicesVirtualDirectory, EWS was still missing and I was looking at an error message that made no sense. After hacking around in ADSIEDIT and the IIS Metabase I decided I was best off uninstalling IIS and CAS and reinstalling both.

The point of that little story is this: I discovered that the LegacyExchangeDN gets totally borked when you do that. Fortunately Microsoft has a really nice KB article about it. But what I also realized is that, for some reason, my Outlook clients were still trying to use the CAS that didn't exist any more. excluster (the cluster DNS name) resolved to exserver1, the Exchange server that I broke, when it should have been resolving to exserver2.

Enter reason #2 why Exchange 2010 HA clustering is very incomplete when used in a multi-site configuration.

When you create a mailbox database (yes even that DAG database) the client access server is statically configured for that mailbox database with the RpcClientAccessServer attribute.

If you do
get-mailboxdatabase databasename | fl
you'll see the current value of your RpcClientAccessServer.

So, when you go to fail over your Exchange database, you'll need to make sure you change RpcClientAccessServer to point to your active CAS, on all of your nodes, for all of the DAG databases that you need access to:

get-mailboxdatabase databasename | set-mailboxdatabase -RpcClientAccessServer servername1.domainname.com
Here is a great post on networkworld.com that explains it in a little more detail: RpcClientAccessServer Madness!
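
Since the change has to be made for every DAG database, it helps to have the commands ready before you need them. A rough Python sketch that just prints one Set-MailboxDatabase line per database for pasting into the Exchange Management Shell; the database names below are hypothetical placeholders, and the server name is the same placeholder used above.

```python
def failover_commands(databases, active_cas):
    """Build one Set-MailboxDatabase command per database name."""
    return [
        f"Set-MailboxDatabase -Identity '{db}' -RpcClientAccessServer {active_cas}"
        for db in databases
    ]


# Hypothetical database names; substitute the ones Get-MailboxDatabase reports.
for line in failover_commands(["DAG-DB01", "DAG-DB02"], "servername1.domainname.com"):
    print(line)
```

Keeping a printed list like this in your failover notes means there is nothing to think about when the primary site is down.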

The RpcClientAccessServer setting is really not a big deal if you're using a CAS Array in a single AD site since you can just set your mailbox database to your CAS array and forget about it. But it's not so simple for multiple site setups.

Hub Transport
As you've probably guessed by now, the Hub Transport role is not really clustered either. But this is one that I don't really care about. Hub Transport is a piece of cake to make redundant: just make sure both servers can send and receive mail through the firewalls, and set up separate send and receive connectors, since the servers are in separate AD sites and can't share one.

My configuration uses a Barracuda for relaying incoming and outgoing mail, but if I didn't have the 'cuda I could just as easily configure different priority MX records for each server.

Wrapping it up...
At this point I'm mostly finished testing my Exchange 2010 cluster and am ready to put one of the CAS in production as a front end CAS to redirect to my back end Exchange 2007 for the duration of the migration.

I will probably be posting a followup before long about the rest of the migration... like how ginormous the database gets without SIS :-o

-Steve

Monday, March 1, 2010

Symantec Endpoint is a JOKE

I wanted to try to keep my posts more useful and educational rather than me just sitting here typing snarky comments about things I don't like, but I really feel like I need to get this one off my chest...

WHAT ARE YOU THINKING SYMANTEC?!

Symantec Endpoint Protection is absolutely some of the worst software I have ever encountered. I honestly don't know why I continue to use it, maybe that pretty little shield on my taskbar makes me feel safe.

The management side of Symantec is like some programmer's worst nightmare (or best joke) realized. Two different web servers on the same machine talking to each other and the clients for management? Really? WHY?! There must be a better way. I'm not a programmer, but this is ridiculous. I don't have enough fingers and toes to count the number of times I've had to hack some config file to tell the Endpoint manager to use different ports because the ones that it randomly chose (apparently without checking) were in use.

In no particular order, let's talk about some other reasons Symantec Endpoint Protection totally sucks.

-The management server is bulky, poorly designed, and breaks if you look at it wrong.
-The client isn't good at catching viruses
-Deploying clients from the management console is painful
-Did I mention the console breaks constantly?
-The Network Threat Protection and Proactive Threat Protection parts of the client break pretty much any machine you install them on. For real.
-You can't just uninstall the two pieces of the client that break stuff if you accidentally install them or didn't know better... you have to completely uninstall, reboot, and reinstall without the junk
-Updates break stuff... like the management console
-The clients break stuff - the clients break random stupid things on the computers they are installed on
-The clients totally cripple performance
-If your client isn't broken now... you're lucky. But it won't last long.
-Certain versions of the client create MASSIVE amounts of logfiles on the client... enough to fill up your whole disk
-For all the random stupid things that the clients do, like creating tons of log files, Symantec has all these nifty tools to fix that stupid crap. But guess what, you have to CALL SUPPORT to get them.
-If you don't have a support contract you will have to fight with the tech support guys to get these utilities
-But it doesn't matter, because even after you fix that problem, it will break again next time you update it.
-Symantec releases Maintenance Releases of Endpoint (5 so far!) that fix a bunch of these stupid little issues. Awesome! Except a few releases later they break it again....

THANKS SYMANTEC for taking a pretty decent product in Symantec Corporate AV and turning it into the 8-headed mutant piece of garbage that is Symantec Endpoint Protection.

START OVER, IT SUCKS

Wednesday, February 10, 2010

Citrix Printing Fun

So, I suppose the most natural place to start writing my blog is to talk about the challenges I've most recently been dealing with.

Part of my job is supporting a Citrix environment for our hosted clients. These clients essentially host all their desktops and applications in our datacenter. It is really a great value for most of our customers as it allows their IT to be completely "hands off". Most of these environments are very simple 1 or 2 virtualized server environments that run quite well. One of our hosted clients started to develop a very nasty print spooler crashing problem: they were calling the helpdesk 10+ times a day because their printers were missing. This Citrix environment had historically been problematic in nearly every other way, and it was just a matter of time before these printer issues surfaced.

The previous Citrix admin for this organization set a very good example of how not to do things. I learned quite a bit from this ordeal. These printers are mostly running great now, but it took some doing.

Citrix Policies
The Citrix printing policies are the first place to start building how you want your Citrix printing environment to work. The policies that ended up working well in my environment were the following.

Client printer auto-creation - I set this to "auto-create all client printers". I initially thought I would like this to be set to auto-create all client printers except for network printers, and in an environment that has scripted installation of network printers I would. But in this environment network printers are all manually mapped as needed (I know... it's not optimal). So, in this case I wanted to map my network printers through the ICA session.

Native printer driver auto-install - I ended up setting this to "Do not automatically install drivers". Essentially, with auto-install enabled, if you connect a printer that Windows already has a native driver available for, the server will go ahead and install and use that driver. We want to monitor as closely as we possibly can which drivers are installed on our Citrix servers (and you do too!), so we do not allow native drivers to be automatically installed.

Universal Driver - I set this to "Use universal driver only if requested driver is unavailable". THIS DOES NOT MEAN that if your printer doesn't match a print driver on the server that it WILL match the universal driver. The universal driver works well for a lot of printers, but it certainly does not work for everything.

There are a bunch of other Citrix Printing policies, but these are, in my opinion, the most important.

Print Drivers
Print drivers are like a lot of other things: you want to keep it short and sweet. Install as few drivers as you can to do the job. A bad print driver will wreak havoc on your farm, and it is a lot easier to weed out a bad driver amongst 25 drivers than it is amongst 150 drivers.

I have compiled a list of drivers that tend to work really well in my environment. My base list came from this Brian Madden blog. But I have expanded upon it.

  • Canon Bubble-Jet BJC-210
  • Canon Bubble-Jet BJC-4000
  • Generic / Text Only
  • Epson Stylus COLOR ESC/P 2
  • HP Color LaserJet
  • HP Color LaserJet 4500
  • HP DeskJet
  • HP DeskJet 550C
  • HP DeskJet 850C
  • HP DeskJet 855C
  • HP LaserJet
  • HP LaserJet Series II
  • HP LaserJet 1100 (MS)
  • HP LaserJet 2100
  • HP LaserJet 4
  • HP LaserJet 5
  • HP LaserJet 6P
  • HP LaserJet 4000 Series PCL
  • HP LaserJet 8100 Series PCL
  • HP OfficeJet
  • Lexmark Optra
  • Lexmark 1020 Color Jetprinter
  • HP Laserjet 2035n
  • HP Laserjet 2015dn
  • HP Laserjet 6L
  • HP Laserjet 4M Plus
  • HP Universal Printing PCL5e
  • HP Universal Printing PCL6
  • HP Universal Printing PS

I know what you're thinking as you get to the end of the list: "HP Universal driver... really? ... that seems like a bad idea." I thought so too! But then I found this fantastic HP document that outlines all the drivers they support on Citrix in different operating systems. It lists the Universal Printing drivers, and so far I haven't had issues with them.

It is important to note that although this document lists a few host-based printers that HP considers compatible with a XenApp environment, it is generally considered bad practice to implement a host-based printer in either a Citrix or terminal services environment.

 
Remapping confused print drivers

So when I first realized I should be able to run my whole farm on a couple dozen drivers I was really excited, but of course skeptical. How could a Laserjet 4050n know it is supposed to use a Laserjet 6L driver? The good folks at Citrix worked that out for you a long time ago!

In the Citrix Advanced Configuration console, if you right click on printers you will see the "Mapping..." button. This allows you to map the name of the driver your printer "thinks it wants" to the one you tell it it wants. But the name you enter must be exactly what the Citrix client is looking for or your mapping won't work. Laserjet 4050n PCL6 is different from Laserjet 4050n PCL 6.

To find out what drivers your printer wants to use, take a look in the Application log on your Citrix server. You will see an error 1007 logged for any printer that fails to map because it can't find a matching driver. The name of the event is "Metaframe Events". It will say Client Driver: drivername. That is the name you want to use in the Advanced Configuration console to remap to a driver that is installed on your server.
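
If you export those 1007 events to text, a few lines of Python can pull out the driver names and show which ones still have no mapping. Treat this as a sketch: the only thing assumed about the event text is that it contains "Client Driver: drivername" on a line, the sample text is a simplified stand-in rather than a real event, and the mapping table is a hypothetical example. The lookup is an exact string match, which is the point: one extra space and the name is a different driver.

```python
import re

# Assumption: each failed mapping leaves text containing "Client Driver: <name>".
CLIENT_DRIVER = re.compile(r"Client Driver:\s*(.+?)\s*$", re.MULTILINE)

# Hypothetical mapping table: client driver name -> driver installed on the server.
DRIVER_MAP = {"Laserjet 4050n PCL6": "HP Laserjet 6L"}


def client_drivers(event_text):
    """Return the distinct client driver names found in the text, in order seen."""
    seen = []
    for name in CLIENT_DRIVER.findall(event_text):
        if name not in seen:
            seen.append(name)
    return seen


def unmapped(event_text, driver_map=DRIVER_MAP):
    """Names that have no exact-match entry in the mapping table."""
    return [name for name in client_drivers(event_text) if name not in driver_map]


sample = "Client Driver: Laserjet 4050n PCL6\nClient Driver: Laserjet 4050n PCL 6\n"
print(unmapped(sample))  # ['Laserjet 4050n PCL 6']
```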

Printing Bandwidth!!
I rarely see this in any article when people are talking about Citrix printing problems. Citrix documents how important it is in pretty good detail, but I think a lot of admins neglect to think about it. I know I did at first.

By default, a Citrix session will let you use as much bandwidth as you please to move that print job across the pipe to the printer. Have users wondering why their machine/app seems to lock up when they send a print job? This is probably why.

If you right click on your Citrix servers in the Advanced Configuration console and click properties you will see a tab to restrict printing bandwidth for each server. I would love it if this was a farm level setting, or at least zone level, since I think that makes more sense.... but it's not. I started my servers out at 200 kbps per server and that seemed to help out a lot. I knocked it back to 60 kbps recently and nobody has noticed. I think I will keep it there.
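
To get a feel for what a cap means for a big print job, here is some back-of-the-envelope arithmetic in Python. It assumes kbps means kilobits per second and that a single job gets the whole cap to itself; the 5 MB job size is just a hypothetical example.

```python
def transfer_seconds(job_bytes, limit_kbps):
    """Seconds to move a print job through a cap of `limit_kbps` kilobits per second."""
    return (job_bytes * 8) / (limit_kbps * 1000)


job = 5 * 1000 * 1000  # hypothetical 5 MB spool file
for limit in (200, 60):
    print(f"{limit} kbps: {transfer_seconds(job, limit):.0f} s")
# 200 kbps: 200 s
# 60 kbps: 667 s
```

So a lower cap trades slower large print jobs for sessions that stay responsive while the job is in flight.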

Keeping the print spooler and Citrix Print Manager services in check
So you're being a good Citrix admin and you've trimmed your print driver inventory down significantly, cut back your printing bandwidth, and even set up a few print driver mappings. Life is much better, but you still have what seems to be crashing spoolers...

You probably still have a funky driver. It's not uncommon to have a bad driver. But you don't really want to just start blowing away drivers.

In almost any troubleshooting situation I find it helpful to define the scope of the problem.

Is it one user? Do they have a certain printer?
Is it happening on all of my servers?
Is it the spooler or the Citrix Print Manager service crashing?

None of these are very easy answers to find. I wrote a little script to help me out. All it does is write to a log file on a network share when a service stops and then restart the service.

rem Log the crash to a network share (UNC path), then restart the service.
echo The print spooler service has crashed on %computername% at %time% on %date% ....restarting the spooler >> \\servername\sharename\spoolerlog.log
net start spooler

I then attached this script file to the spooler and Citrix Print Manager services by configuring the services to "run a program" on failure.

Note - the name to start the Citrix Print Manager service through net start is cpsvc. The Citrix Print Manager service does not automatically restart itself by default when it fails!

Hopefully this type of logging will allow you to cross reference other entries in your event logs to determine what is breaking your spooler/print manager.
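
Once the log has a few days of entries in it, counting crashes per server is a quick way to answer the "is it happening on all of my servers?" question. A minimal Python sketch, assuming the log lines follow the wording of the echo command above (with or without the quote marks); the server names in the sample are hypothetical.

```python
import re
from collections import Counter

# Matches the echo output whether or not the quote marks ended up in the log.
LINE = re.compile(
    r'crashed on"?\s+(?P<host>\S+)\s+"?at"?\s+(?P<time>\S+)\s+"?on"?\s+'
    r'(?P<date>.+?)\s+"?\.{4}restarting'
)


def crashes_per_server(lines):
    """Count logged spooler crashes per computer name."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[match.group("host").upper()] += 1
    return counts


sample = [
    '"The print spooler service has crashed on" CTX01 "at" 14:02:11.37 "on" Tue 02/09/2010 "....restarting the spooler"',
    "The print spooler service has crashed on CTX02 at  9:15:40.02 on Tue 02/09/2010 ....restarting the spooler",
    "The print spooler service has crashed on CTX01 at 16:44:03.90 on Tue 02/09/2010 ....restarting the spooler",
]
print(crashes_per_server(sample).most_common())  # [('CTX01', 2), ('CTX02', 1)]
```

To run it against the real file, pass an open file handle for the log on the share instead of the sample list.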

In the event you find it is a driver that is working fine on another server, you can use the Advanced Configuration console to replicate the driver from the working server over the top of the one on the broken server.

I wrote some other neat scripts that do stuff like manually remove print drivers and automatically install my list of native print drivers. I will post those up at some point.

Reworking Citrix printing in a poorly set up environment can be a nightmare, so do it right the first time and it will be a breeze.