Crying Cloud

Mike DeLuca

The Concept of Cloud Adjacency

datacenter.jpg

Several years ago, when we first started theorizing about hybrid cloud, it was clear that to do our definition of hybrid cloud properly, you needed a fast, low-latency connection between your private cloud and the public clouds you were using. After talking to major enterprise customers, we identified five cases where this type of arrangement made sense and actually accelerated cloud adoption.

  1. Moving workloads but not data - If you expected to move workloads between cloud providers, and between public and private to take advantage of capacity and price moves, moving the workloads quickly and efficiently meant, in some cases, not moving the data.
  2. Regulatory and Compliance Reasons - For regulatory and compliance reasons, say the data is sensitive and must remain under corporate physical ownership while at rest, but you still want to use the cloud for applications around that data (including possibly front-ending it).
  3. Application Support - A case where you have a core application that can’t move to cloud due to support requirements (EPIC Health is an example of this type of application), but you have other surround applications that can move to cloud and need a fast, low-latency connection back to this main application.
  4. Technical Compatibility - A case where there is a technical requirement, say real-time storage replication to a DR site or very high IOPS, that Azure can’t, for one reason or another, handle. In this case the data can be placed on traditional storage infrastructure and presented to the cloud over the high-bandwidth, low-latency connection.
  5. Cost – There are some workloads that today are significantly more expensive to run in cloud than on existing large on-premises servers. These are mostly very large compute platforms that scale up rather than out, especially in cases where a customer already owns this type of equipment and it isn't fully depreciated. Again, it may make sense to run surround systems in public cloud on commodity two-socket boxes, while retaining the large four-socket, high-memory instances until they are fully depreciated.

 

All of these scenarios require a low-latency, high-bandwidth connection to the cloud, as well as a solid connection back to the corporate network. In these cases you have a couple of options for getting this type of connection.

  1. Place the equipment into your own datacenter and buy high-bandwidth Azure ExpressRoute and, if needed, AWS Direct Connect circuits from an established carrier like AT&T, Level3, Verizon, etc.
  2. Place the equipment into a carrier colocation facility and create a high-bandwidth connection from there back to your premises. This places the equipment closer to the cloud but introduces latency between you and the equipment. This works because most applications assume latency between the user and the application, but are far less tolerant of latency between the application tier and the database tier, or between the database and its underlying storage. Additionally, you can place a traditional WAN accelerator (Riverbed or the like) on the link between the colocation facility and your premises.

 

For simply connecting your network to Azure, both options work equally well. If you have one of the five scenarios above, however, the latter option (colocation) is better. Depending on the colocation provider, the latency will be low to very low (2-40 ms depending on location). All of the major carriers offer a colocation arrangement that is close to the cloud edge. At Avanade, I run a fairly large hybrid cloud lab environment, which allows us to perform real-world testing and demonstrations of these concepts. In our case we’ve chosen to host the lab with one of these colocation providers, Equinix. Equinix has an interesting story in that they are a carrier-neutral facility. In other words, they run the datacenter and provide a brokerage (the Cloud Exchange), but don’t actually sell private line circuits. You connect through their brokerage directly to the other party. This is interesting because it means I don’t pay a 10 Gb local loop fee; I simply pay for access to the exchange. Right now I buy two 10 Gb ports to the exchange, and over those I have 10 Gb of ExpressRoute and 10 Gb of AWS Direct Connect delivered to my edge routers. The latency between a VM running within my private cloud in the lab and a VM in AWS or Azure is sub-2 ms. This is incredibly fast. Fast enough to allow me to showcase each of the above scenarios.
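If you want to sanity-check that latency figure in your own environment, a quick ping sample from a VM in the colocation space toward a cloud-side VM is enough. This is a minimal sketch; the target address is a placeholder for one of your cloud VMs, and it assumes ICMP is allowed across the connection.

[powershell]
# Rough round-trip latency check; replace the address with a reachable cloud-side VM.
Test-Connection -ComputerName "10.1.0.4" -Count 20 |
    Measure-Object -Property ResponseTime -Average -Maximum |
    Select-Object Average, Maximum
[/powershell]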

 

We routinely move workloads between providers without moving the underlying data by presenting the data disks over iSCSI from our NetApp FAS to the VM running at the cloud provider. When we move the VM, we migrate the OS and system state, but the data disk is simply detached and reattached at the new location. This is possible due to the low-latency connection and is a capability that NetApp sells as NetApp Private Storage (NPS). This capability is also useful for the regulatory story of keeping data at rest under our control: the data doesn't live in the cloud provider and we can physically pinpoint its location at all times. It also addresses part of the technical compatibility scenario. Because my data is backed by a capable SAN with proper SAN-based replication and performance characteristics, I can run workloads that I may not otherwise have been able to run in cloud due to IOPS, cost-for-performance, or feature challenges with cloud storage.
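To make the reattach step concrete, here is a minimal sketch of connecting an iSCSI-presented data disk from a Windows VM running at the cloud provider, using the built-in iSCSI initiator cmdlets. The portal address and target name filter are placeholders, and a real NPS setup would also involve the NetApp-side export and multipath configuration.

[powershell]
# Start the iSCSI initiator service and keep it running across reboots.
Start-Service -Name MSiSCSI
Set-Service -Name MSiSCSI -StartupType Automatic

# Register the storage array's portal (placeholder address) and connect the target.
New-IscsiTargetPortal -TargetPortalAddress "10.0.0.50"
$target = Get-IscsiTarget | Where-Object { $_.NodeAddress -like "*payroll-data*" }
Connect-IscsiTarget -NodeAddress $target.NodeAddress -IsPersistent $true

# The data disk now shows up alongside the VM's local disks.
Get-Disk | Where-Object { $_.BusType -eq "iSCSI" }
[/powershell]

Moving the workload is then largely a matter of disconnecting the target at the old location and repeating the connect at the new one.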

 

Second, I have compute located in this adjacent space. Some of this compute consists of very large quad-socket machines with multiple terabytes of RAM running very large line-of-business applications. In this case those LOB applications have considerable surround applications that can run just fine on public cloud compute, but the big one isn’t cost-effective to run on public cloud since I've already invested in this hardware, or perhaps I’m not allowed to move it to public cloud due to support constraints. Another example of this is non-x86 platforms. Let's say I have an IBM POWER or mainframe platform. I don’t want to retire it or re-platform it due to cost, but I have a number of x86-based surround applications that I’d like to move to cloud. I can place my mainframe within the cloud-adjacent space and access those resources as if they too were running in the public cloud.

 

As you can see, cloud-adjacent space opens up a number of truly hybrid scenarios. We’re big supporters of the concept when it makes sense, and while there are complexities, it can unlock moving additional workloads to public cloud.

Azure Resource Tagging Best Practices

Taggs.png

Out in the field, we are frequently asked to help customers understand how tags should be used. Many organizations are worried that tags, being inherently unstructured, will cause confusion. As a result, we’ve come up with a structured way of thinking about and applying tags across your subscription. You can use Azure Resource Manager policy to enforce this tagging philosophy.

What is a tag?

Tags provide a way to logically organize resources with properties that you define. Tags can be applied to resource groups or to resources directly, and can then be used to select resources or resource groups from the console, web portal, PowerShell, or the API. Tags are also particularly useful when you need to organize resources for billing or management.
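As a quick illustration, here is a minimal sketch of creating a resource group with tags and reading them back, assuming the AzureRM module (the exact -Tag syntax has varied between AzureRM releases; recent builds accept a plain hashtable). The names and values are placeholders.

[powershell]
# Create a resource group with a couple of tags (placeholder names and values).
New-AzureRmResourceGroup -Name "PayrollTestEnv1-rg" -Location "East US" `
    -Tag @{ EnvironmentType = "Test"; BillingIdentifier = "34821" }

# Read the tags back.
(Get-AzureRmResourceGroup -Name "PayrollTestEnv1-rg").Tags
[/powershell]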

Key Info

  • You can only apply tags to resources that support Resource Manager operations
    • VMs, Virtual Networks and Storage created through the classic deployment model must be re-deployed through Resource Manager to support tagging
      • A good way around this is to tag the resource group they belong to instead.
    • All ARM resources support tagging
  • Each resource or resource group can have a maximum of 15 tags.
  • Tags are key/value pairs, name is limited to 512 characters, value is limited to 256 characters
  • Tags are free-form text, so consistent, correct spelling is very important
  • Tags defined on Resource Groups exist only on the group object and do not flow down to the resources under them
    • Through this relationship you can easily find resources by filtering on the tagged resource group (see the sketch after this list)
    • We recommend keeping tags at the resource group level unless they are resource-specific.
  • Each tag is automatically added to the subscription-wide taxonomy
    • Application or resource specific tags will "pollute" the tag list for the entire subscription.
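Because tags on a resource group don't flow down, the typical pattern is to filter resource groups by tag and then enumerate the resources underneath them. A minimal sketch, assuming a reasonably recent AzureRM module and placeholder tag values:

[powershell]
# Find resource groups tagged as Test, then list the resources they contain.
Get-AzureRmResourceGroup -Tag @{ EnvironmentType = "Test" } | ForEach-Object {
    Get-AzureRmResource -ResourceGroupName $_.ResourceGroupName
}
[/powershell]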

Best Practices

We think it's important for a customer to leverage at least some tags in a structured way. Given the limit on the number of tags, we recommend tagging at the resource group level. We don't feel there is currently a need to set tags on individual resources, since you can easily trace down from the resource group.

Primarily, we recommend that a Service/Application/Environment hierarchy, along with an environment type and a billing identifier, be reflected in the tags. To do this, we recommend spending the time to define an application hierarchy that spans everything in your subscription. This hierarchy is a key component of both management and billing and allows you to organize resource groups logically. It's also important that this hierarchy contain additional metadata about the application, like its importance to the organization and the contact in case there is an issue. By storing this outside the tag, say in a traditional CMDB structure, you cut down on the number of tags you need to use and the complexity of tag enforcement, and you reduce the risk of tag drift.

Once a taxonomy is agreed on, create the tags for Service/Application and Environment and set them on each resource group. Then set an Environment Type tag to Dev, Test, or Production so you can target all dev, all test, or all production resources later in policy and through automation.

For the billing identifier, we recommend some type of internal charge code or order number that corresponds to a General Ledger (GL) line item to bill that resource group's usage to.  Just these few tags would enable you to determine billing usage for VMs running in production for the finance department.
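As a concrete (and entirely hypothetical) version of that finance/production slice, assuming the tags described in this post are in place and a recent AzureRM module:

[powershell]
# Production resource groups whose application taxonomy falls under Finance,
# along with the billing identifier their usage should be charged back to.
Get-AzureRmResourceGroup -Tag @{ EnvironmentType = "Prod" } |
    Where-Object { $_.Tags["AppTaxonomy"] -like "*\Finance\*" } |
    Select-Object ResourceGroupName,
        @{ Name = "BillingIdentifier"; Expression = { $_.Tags["BillingIdentifier"] } }
[/powershell]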

There are several ways to retrieve billing information and the corresponding tags. More information can be found here: https://azure.microsoft.com/en-us/documentation/articles/resource-group-using-tags/#tags-and-billing

Recommended Tags

To be prescriptive, we recommend these tags be set on each resource group:

 

| Tag Name | State | Description | Tag Value | Example |
| --- | --- | --- | --- | --- |
| AppTaxonomy | Required | Provides information on who owns the resource group and what purpose it serves within their application | Org\App\Environment | USOPS\Finance\Payroll\PayrollTestEnv1 |
| MaintenanceWindow | Optional | Provides a window during which patching and other impactful maintenance can be performed | Window in UTC, "day:hour:minute-day:hour:minute" | Tue:04:00-Tue:04:30 |
| EnvironmentType | Required | Provides information on what the resource group is used for (useful for maintenance, policy enforcement, chargeback, etc.) | Dev, Test, UAT, Prod | Test |
| BillingIdentifier | Required | Provides a charge code or cost center to attribute the bill for the resources to | Cost center | 34821 |
| ExpirationDate | Optional | Provides a date when the environment is expected to be removed, so that reporting can confirm whether an environment is still needed | Expiration date in UTC | 2016-06-15T00:0 |
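To apply the recommended set to a resource group, something like the following works, again assuming the AzureRM module. The group name and values are placeholders, and note that Set-AzureRmResourceGroup -Tag replaces the existing tag collection rather than appending to it.

[powershell]
Set-AzureRmResourceGroup -Name "PayrollTestEnv1-rg" -Tag @{
    AppTaxonomy       = "USOPS\Finance\Payroll\PayrollTestEnv1"
    EnvironmentType   = "Test"
    BillingIdentifier = "34821"
    MaintenanceWindow = "Tue:04:00-Tue:04:30"
}
[/powershell]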

 

Over time we’ll have additional posts, scripts and other items that will build on this tagging structure.

New! You may find this post with a PowerShell script useful for reporting on resource group tags.

Azure Stack TP1 POC Stable Install notes

azurestack1.png

I thought about writing yet another detailed step-by-step guide for installing TP1, but figured there are enough of those out there. If you need one you can google here. At Azure Field Notes, we're about sharing things we’ve learned through our experience in the field, so I decided to hit the high points based on the notes from one of our most stable POC installs to date. Now, this is a fully supported POC, meaning it's running on the supported hardware and we’re not modifying any of the install scripts here. We will post other articles soon that cover some of those tricks and tweaks. With that, let's get going:

 

Hardware:

Dell PowerEdge R630
  • Dual Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 12 cores each (24 cores total)
  • 384GB DDR4 Registered ECC RAM
  • Network card: Emulex OCm14104-U1-D 10Gb rNDC (supported for the POC install, not supported for Storage Spaces Direct)
  • Physical disks: PERC H730 with 1GB cache, configured with 6x 300GB SAS drives: 2 disks in a RAID 1 mirror for the OS, the other 4 disks as pass-through

(Note: this box is supported for the POC, but likely won’t match any of the production-supported hardware, so don’t go buy a fleet of these expecting to run this stuff in prod.) Production-supported hardware will only be sold as a pre-configured, pre-integrated system, per this blog post from Mike Neil.

 

Next the process:

  • Perform a complete firmware update of all components
  • Configure the array with a single RAID 1 mirror and 4 non-RAID disks set to pass-through mode
  • Set the BIOS to boot from UEFI
  • Ensure the time zone and time are set correctly in the BIOS
  • Load a servicing OS with boot-from-VHD support (2016 TP5 works well). Ensure you install to the mirror
  • Set up a static IP on the NIC
  • Copy the stack bits locally
  • Copy any drivers needed for the machine locally
  • Copy the TP4 VHD from the Stack bits to the root
  • Use BCDEdit to change the boot vhd to the TP4 stack vhd.
  • Boot the machine, copy any drivers from the local drive to win\inf and reboot
  • Enable RDP and set the NIC IP
  • Reboot and install Chrome or some non-Edge browser and ensure it is the default (Edge doesn't like the built-in admin account by default. You can obviously choose to change its settings or use IE, but Chrome comes in handy).
  • Install update KB3124262 if needed
  • Disable all except the primary NIC port and ensure you have no more than 4 raw disks in Disk Management (these should match the 4 SAS disks that are passed through)
  • Disable Windows Update
  • Disable Defender
  • Double-check the system time zone and time and ensure they are correct. (If you later get an AAD auth error about verifying the message, your time zone is off; check it again.)
  • Install Stack. Use an AAD account, set the NAT VM static IP to something on the same subnet as your host that isn't in use, set the static gateway to your gateway, set the admin password, etc.

Script:

[powershell]
$secpasswd = ConvertTo-SecureString "urpassword" -AsPlainText -Force
$adminpwd = ConvertTo-SecureString "uradminpassword" -AsPlainText -Force
$mycreds = New-Object System.Management.Automation.PSCredential ("uraadaccount@yourtenant.com", $secpasswd)
.\DeployAzureStack.ps1 -Verbose -NATVMStaticIP 172.20.40.31/24 -NATVMStaticGateway 172.20.40.1 -adminpassword $adminpwd -AADCredential $mycreds
[/powershell]

    • Wait a bit and check for errors. I had none after I did all of the above. If you do run into errors, I highly recommend blowing away the TP4 VHD and starting over. Re-running the installer appears to work, but we’ve had instability later, with random failures, when it doesn't complete in one pass.
    • Log in to the client VM, install Chrome, and set it as the default browser (see above)
    • Go through and disable Windows Update and Defender on all Stack VMs (look for a post on how to do this in an unsupported way as part of the install; some boxes aren't domain joined, so a simple GPO won't do it. Update: Matt's post is live HERE. A minimal per-VM sketch also follows the TIP-test script below.)
    • Turn off the TIP tests once you think things are working properly (from the client VM). (If it's around 12am the TIP tests will be running; wait until they are complete and have cleaned up everything.)

[powershell] Disable-ScheduledTask -TaskName AzureStackSystemvalidationTask [/powershell]
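For the Windows Update/Defender step referenced above, this is a minimal, unsupported per-VM sketch. The VM name and credential are placeholders, and since some of the Stack VMs aren't domain joined you may need local credentials, or to run the commands from the Hyper-V console instead of over remoting.

[powershell]
# Placeholder VM name and credential; repeat for each Azure Stack VM.
$cred = Get-Credential
Invoke-Command -ComputerName "PortalVM" -Credential $cred -ScriptBlock {
    # Stop and disable the Windows Update service.
    Stop-Service -Name wuauserv -Force
    Set-Service -Name wuauserv -StartupType Disabled
    # Turn off Defender real-time scanning.
    Set-MpPreference -DisableRealtimeMonitoring $true
}
[/powershell]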

Probably not required, but seems to speed up demos and such:

  • Shutdown the environment
  • Up the memory and CPU on all VMs: memory to a 16 GB minimum (100%), and update cores to 4, 8, or 12 depending on the original setting (see the sketch after this list)
  • Start it back up, wait about 10 minutes and then run the validation script to make sure all the services are online properly.
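One way to script that bump is the rough sketch below, run on the Azure Stack host once the VMs are shut down. The 16 GB and 4-core values are placeholders; match or scale each VM's original settings rather than applying one number blindly.

[powershell]
# With the environment shut down, raise memory and cores on the Stack VMs.
Get-VM | Where-Object { $_.State -eq "Off" } | ForEach-Object {
    Set-VMMemory    -VMName $_.Name -StartupBytes 16GB   # raise -MaximumBytes too if dynamic memory is on
    Set-VMProcessor -VMName $_.Name -Count 4             # adjust per VM (4, 8, or 12)
}
[/powershell]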

 

Some additional notes:

  • The MUXVM and BGPBM seem to hang occasionally when connected via the Hyper-V console. This appears to be a Hyper-V-on-TP4 issue. During these hangs they also seem to stop responding on the network.
  • Sometimes rebooting a VM causes the host to blue screen and reboot (this also appears to be a TP4 issue)
  • Once you're up and running, screens in Stack that show a user picker seem to sit and spin for a while, especially group pickers. Other things are simply not done (new buttons on some of the resource RPs), or need some time to JIT the first time they’re accessed (the quota menus when creating a plan, for example).
  • If doing scripted testing, creating and tearing down an empty resource group through PowerShell seems pretty repeatable. I’d start with deploying a template containing nothing but a resource group if trying to test toolchains. The next least impactful seems to be storage accounts. Compute/network seems to be the heaviest, and also the most inconsistent as to whether it will succeed on any given attempt.
  • You’ll notice we’re not installing any additional Resource Providers. We’ve had significant stability problems with the current builds and so are waiting until new builds are available before loading them in anything but our most bleeding edge environments.
  • Finally, remember folks, it's TP1, and according to Mike Neil’s blog post, we’re a year away.

How to Connect to Azure Stack via Powershell

azurestack.png

Sometimes, when a new product is introduced to the market, some of the basic things are either missing or not concisely documented. In this case, since Azure and Azure Stack use the same API and share the same cmdlets, you connect the same way, right? The answer is yes, in theory. What's missing from that answer is that for Azure Stack you need to set up your PowerShell connection using the Add-AzureRMEnvironment command to point at the Stack-specific AAD endpoints, Resource Manager API endpoint, gallery endpoint, etc. Since this is a mix of Azure and Azure Stack specific information, we thought a nice, concise script was in order. We’re assuming that your Stack instance is running TP1 (this should work on TP2 when it ships, but we obviously haven’t tested it yet).

Step 1: Ensure you have the right PowerShell cmdlets on the machine you’ll be running from. For simplicity this is likely the client VM, but you could be running from your VPN-connected workstation or, if you're a little creative, someplace else. We’re assuming you're connected.

We’ve found at present this version works well with TP1.

Step 2: Ensure you have some stuff setup in Azure Stack to see:

  • Start by opening a browser to portal.azurestack.local, log in with the admin AAD account used during setup, and create a plan and an offer.
  • Next, connect to AAD and create a test AAD user. No special rights are needed.
  • Log in with the test user and create a subscription using the offer you created. Take note of the subscription name (hint: by default it's named after the offer).
  • You should now be able to create some resources. We recommend creating a simple resource group with a single VM and/or a storage account (depending on whether your VM resource provider is behaving itself). The idea here is to create some objects you can see in PowerShell.
  • Now hang out and wait for that to complete.

Step 3: Fire up PowerShell on the client VM; copy, paste, and edit the script below, and then run it.

In the script you need to set the following:

  • $AADUserName value to the UPN of the test account you used in AAD
  • $AADPassword value to the password of the test account you used in AAD
  • $AADTenantID value to the domain name used by your AAD tenant.
  • If you're the creative type and aren’t running a stock instance of TP1, you might need to adjust some DNS names in the script from azurestack.local to match your environment.
  • On the Get-AzureRMSubscription command, set the –subscriptionName value to the name of the subscription you created.

That should do it. Any questions/issues, leave a comment.

[powershell]
$AADUserName = 'YourAADAccount@Yourdomain'
$AADPassword = 'YourAADPassword' | ConvertTo-SecureString -Force -AsPlainText
$AADCredential = New-Object PSCredential($AADUserName, $AADPassword)
$AADTenantID = "YourAADDomain"

Add-AzureRmEnvironment -Name "Azure Stack" `
    -ActiveDirectoryEndpoint ("https://login.windows.net/$AADTenantID/") `
    -ActiveDirectoryServiceEndpointResourceId "https://azurestack.local-api/" `
    -ResourceManagerEndpoint ("https://api.azurestack.local/") `
    -GalleryEndpoint ("https://gallery.azurestack.local:30016/") `
    -GraphEndpoint "https://graph.windows.net/"

$env = Get-AzureRmEnvironment 'Azure Stack'
Add-AzureRmAccount -Environment $env -Credential $AADCredential -Verbose
Get-AzureRmSubscription -SubscriptionName "youroffer" | Select-AzureRmSubscription
Get-AzureRmResource
[/powershell]

Welcome to Azure Field Notes

wereback.jpg

We’re back! Several years ago a group of us used to post to TheCloudBuilderBlog.com. There we focused primarily on System Center based private clouds: Orchestrator, SCVMM, SCSM, and SCOM. Over the past several years the marketplace has changed dramatically. System Center is moving to the cloud through tools like OMS and App Insights, Microsoft’s public cloud has reached a level of maturity in the marketplace, and everyone is talking about cloud. We’ve changed as well. We’ve worked on a number of different enterprise cloud projects since then, both public and private, both Microsoft and non-Microsoft. As a result, we’ve decided to share our knowledge with the wider community and re-launch. With the re-launch and shift in topics comes a new name: Azure Field Notes. This refers to our experiences coming from the field, and to the focus on Azure and Azure Stack. (That’s right, we’re going to cover Azure Stack as well.) We’ll still touch on System Center topics occasionally, especially with regard to integrations, but the focus of this blog will be the Microsoft cloud. We’ll cover deep technical topics, scripts, and how-to guides, but we’ll also cover higher-level strategic thoughts: where we see the market going and what we believe is important for an enterprise to think about as part of the journey to cloud. And of course, as always, any posts, opinions, or information on this site are our own and do not represent the positions of Microsoft or our employers.