This was an assignment for an optimization (COMP 671) course I took in 2012. Word's translation from .docx to .html looked OK, but oh my, the HTML itself was ugly. This is being transcribed manually from Word, PowerPoint, and the original R output into markdown (and trying out the newly installed $math \KaTeX math$ capabilities).

COMP671_2012_Slide02-1

Problem Description

The primary data center for Qualcomm Engineering is reaching its maximum capacity. There is a demand for more compute resources, which must be physically located somewhere. The problem I am trying to address is which of the following strategies to propose:

  1. A central data center to serve all (domestic) engineering requirements. If so, which location would be optimal in terms of meeting requirements?
  2. A regional data center strategy: Asia-Pac, Europe, US-West, US-East, et al.
  3. Local data centers at each engineering site to minimize the latency in network traffic.
  4. Outsource all data center activities (e.g. AWS or colocation).

Scope:

Only the first strategy, a central data center, will be considered for this exercise as a means of using different optimization techniques.

Solution:

The goal is to determine the optimum location of a single domestic engineering datacenter that is sensitive to latency standards and power expense considerations.

Two different optimization techniques will be used for a two-phase approach:

  1. Optimize data center location for all domestic sites county by county
  2. Determine the optimal location within a county

OBJECTIVE FUNCTION:

The two main objective functions for decision making are a Latency Function and Power (cost) Function.

Latency Function

%math \begin{aligned} F(N,d) &= \log_{10}(1+N)\,(1+d)^{-0.2}\left(1-\dfrac{d}{2000}\right) \\ N &= \small\textsf{number of engineers at site} \\ d &= \small \textsf{distance from site to potential datacenter location} \end{aligned} math%

The objective is to maximize F(N, d) subject to the constraint d ≤ 2000; that is, anything over 2,000 miles is unacceptable: %math F(N, d \geq 2000) = 0 math%

Another constraint is that the datacenter must be within the Continental United States. This is achieved by only providing input data within the desired regions.

This function is an approximation based on experimental data. Some details on the experiment are in the appendix. This isn't necessarily the same function that would be used "in production."
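As a quick illustration (not part of the original assignment), here is a minimal R sketch of the Latency Function; it follows the appendix script's latency.dist(), which uses the natural log via log1p rather than log10:

F_latency <- function(N, d, maxrad = 2000) {
  # zero beyond the 2000-mile constraint, otherwise the latency score described above
  ifelse(d > maxrad, 0, log1p(N) * (1 + d)^(-0.2) * (1 - d / maxrad))
}
F_latency(8000, 0)     # a large site (e.g. San Diego) right next to the data center
F_latency(8000, 1000)  # the same site 1000 miles away scores much lower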

Power Function
Consider the Power Function to be:

%math \begin{aligned} F(P)&=C\dfrac{Q_{maxR}-Q_{rate}}{Q_{maxR}-Q_{minR}}\\ C&=\small\textsf{scaling factor} \\ Q_{maxR}&=\small\textsf{Maximum Power Rate amongst all the counties within United States} \\ Q_{minR}&=\small\textsf{Minimum Power Rate amongst all the counties within United States} \\ Q_{rate}&=\small\textsf{Power Rate for a particular county where F(P) is being evaluated} \end{aligned} math%
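A minimal R sketch of this rescaling; the default scaling factor C here is an assumption (the appendix script uses the maximum latency value, q.maxL, as C):

F_power <- function(rate, rate_max, rate_min, C = 1) {
  # higher score for cheaper power; rate_max/rate_min are the nationwide extremes
  C * (rate_max - rate) / (rate_max - rate_min)
}
F_power(0.06, rate_max = 0.12, rate_min = 0.04)  # hypothetical county rate against a hypothetical nationwide range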

Distances between locations "1" and "2" are calculated using the spherical law of cosines:

%math \begin{aligned} d&=R\arccos\bigl(\sin\phi_1\sin\phi_2+\cos\phi_1\cos\phi_2\cos(\lambda_2-\lambda_1)\bigr) \\ R&=\small\textsf{radius of Earth} \\ \phi&=\small\textsf{latitude} \\ \lambda&=\small\textsf{longitude} \end{aligned} math%
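For reference, a standalone R version of this distance calculation (the appendix script implements the same formula as dm()); the coordinates in the example are the ATL and AUS rows from qcomoffices_locations.csv:

gc_miles <- function(lat1, lon1, lat2, lon2, R = 3956.6) {
  # spherical law of cosines: degrees in, miles out
  to_rad <- pi / 180
  lat1 <- lat1 * to_rad; lon1 <- lon1 * to_rad
  lat2 <- lat2 * to_rad; lon2 <- lon2 * to_rad
  R * acos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon2 - lon1))
}
gc_miles(33.636667, -84.428056, 30.194444, -97.670000)  # Atlanta to Austin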

Remove from consideration any counties that do not meet constraints.

The solution is to maximize $math F(P) + F(N,d) math$, subject to the constraint d ≤ 2000.
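Putting the pieces together for a single hypothetical candidate county (the distances and rates below are made up for illustration), reusing the two sketches above:

d_to_sites <- c(1200, 1350, 150)   # hypothetical distances from the candidate county to three sites
n_at_sites <- c(8000, 1500, 250)   # engineers at those sites
f_lat <- sum(F_latency(n_at_sites, d_to_sites))
# a county that violates d <= 2000 for any site scores zero, as F_sum does in the appendix script
f_sum <- if (all(d_to_sites <= 2000)) f_lat + F_power(0.06, 0.12, 0.04) else 0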

PHASE 1: Optimize the data center location for all domestic sites by county


For the purpose of this study, consider only the 11 largest engineering locations.

COMP671_2012_Slide06

| code | site       | state | engineers |
|------|------------|-------|-----------|
| AUS  | Austin     | TX    | 250       |
| BOS  | Boxborough | MA    | 150       |
| CHD  | Chandler   | AZ    | 75        |
| NUQ  | Sunnyvale  | CA    | 500       |
| OCF  | Ocala      | FL    | 100       |
| ORL  | Orlando    | FL    | 50        |
| RDU  | RTP        | NC    | 1000      |
| SAN  | San Diego  | CA    | 8000      |
| SJC  | San Jose   | CA    | 1500      |
| SNA  | Irvine     | CA    | 200       |
| WBU  | Boulder    | CO    | 400       |

COMP671_2012_Slide07

The Latency Function plotted for sites of different sizes. The goal is to ensure that smaller sites are not neglected or overwhelmed by large sites.
COMP671_2012_Slide08

This is the Latency Function plotted for the center of each county and each of the 11 engineering sites, without constraints.
COMP671_2012_Slide09
The influence of San Diego, the Bay Area, Boulder, Austin, and RTP can be seen. It's interesting to note how the algorithm favors the two Florida sites while Boxborough (near Boston) has negligible impact, even though the two areas have the same total number of engineers.

Taking into consideration the constraint that the data center cannot be more than 2,000 miles from any engineering site, due to the effect on latency, our solution space is restricted to the area shown below.
COMP671_2012_Slide10

The area within all constraints:
COMP671_2012_Slide11

Recomputing the Latency Function for counties within the constrained area results in the following map:
COMP671_2012_Slide12

The greyed-out counties in the above graph indicate failure to meet the constraint d ≤ 2000 for all sites.


In the next step I consider the industrial power rate across all the counties in the United States and plot the Power Function. ![COMP671_2012_Slide14](https://s3-us-west-2.amazonaws.com/com.altgnat.www.blog/2017/12/COMP671_2012_Slide14.png)

Assume the worst case (most expensive rate) for any county where the power rate was not available.
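In the appendix script this is a one-line fill on the merged county data frame (powcou), replacing missing rates with the most expensive rate nationwide:

powcou[is.na(powcou$RATE), ]$RATE <- max(powcou$RATE, na.rm = TRUE)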
COMP671_2012_Slide15

Sum the latency and power functions to form the objective function for all counties that fall within the constraint d ≤ 2000 miles.
COMP671_2012_Slide17

Listing of the Top Ten counties ranked by the objective function:
COMP671_2012_Slide19
See the text below for an explanation of the red/green color coding.

Several practical considerations should be taken into account at this stage. The first is that among the Top Ten contenders there is very little variation in the objective function $math F_{sum}=F(P)+F(L) math$. This means that factors not included in this analysis will be the deciding factor. For instance, counties or states may provide tax incentives to attract investment within a particular region. There may be existing QCOM offices we'd want to be co-located with, large airports, vendors or outsourcing agents we already have relationships with, or abandoned infrastructure we could acquire that would save time and money.

Also not considered is the source of power. For instance, in Illinois it is quite likely that most of the power would be generated by coal. The company may not want to be seen as a large consumer of high-carbon energy, which would mean we would likely need to invest in "green technology" to offset these emissions/perceptions. An example of this is a recent datacenter built by Apple in North Carolina, where Apple invested $50M in solar arrays amid protests by Greenpeace.

Having multiple viable candidate counties is a good problem to have - we are not locked into any single location. In order to proceed with this analysis, though, we must pick a winner with the data we have.

Highlighted in the above table is some data, pulled from the input data provided by the U.S. Energy Information Administration on 13 Nov 2012, that may influence the viability of the selected counties. Some "red flags", highlighted in red, include a small number of industrial customers (Consumers) for Hill and Macoupin Counties. The small number of customers is also reflected in low Sales and Revenue numbers for the power suppliers of those counties. The reason this is red-flagged is that a large datacenter may tax the existing infrastructure, leading to downtime or to large capital expenditures to provide upgraded power reliability. The "non-technical" issues around data center power consumption should not be taken lightly, as evidenced by the experience of companies that built large datacenters in the Columbia River region.

Another red flag is the relatively high power price in Boulder, Colorado. This may be an opportunity, though, to negotiate a lower price, as Microsoft did in Washington State. Boulder is also home to a large Qualcomm engineering site, is close to multiple internet backbones, and has a large airport. It has a large county population and a large high-tech presence, meaning a solid labor pool is available to hire from. Boulder also had the best Latency Function among all the contenders, so any further study should investigate the ability to negotiate lower power rates from the utility companies or local governments.

"Green flags" among the top contenders involve large customer bases for Fremont and Park Counties in Colorado and particularly low power rates for the counties located in Illinois.

To continue this optimization with the data we have, let's proceed with Hill County, Texas.

CONCLUSION PHASE 1:

Hill County is optimal when it comes to maximizing the Power and Latency functions.
COMP671_2012_Slide20

PHASE 2: Optimize intra-county

In this phase, determine the optimal location of the data center within Hill County. For the purpose of the study, ignore constraints such as where facilities, towns, networks, etc., are located. The Power Function is assumed constant across the county, so it may be ignored.

Objective:

Search for the optimal location within the county, keeping the maximum-latency constraint and the requirement that the solution lie within county limits, for which we "discretize" the county borders. Use the built-in Solver function within Excel to determine the optimal location using the nonlinear generalized reduced gradient (GRG) method, with the approximated county boundary as input and the constraint that the distance from the data center site to each of the 11 engineering sites does not exceed 2,000 miles.
COMP671_2012_Slide23
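The Excel spreadsheet isn't included here, but a rough R equivalent of the same intra-county search is a grid evaluation over the county bounding box. This is a sketch, assuming the eng data frame and latency.coord() from the appendix script and the approximate Hill County bounds used there; Excel Solver's GRG method searches the continuous space rather than a grid:

grid <- expand.grid(lat = seq(31.75, 32.23, length.out = 50),
                    lon = seq(-97.30, -96.85, length.out = 50))
grid$F <- sapply(seq_len(nrow(grid)), function(i)
  sum(latency.coord(eng$engineers, eng$lat, eng$long, grid$lat[i], grid$lon[i])))
grid[which.max(grid$F), ]   # best grid point; the Power Function is constant within the county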

CONCLUSION PHASE 2:

After solving the optimization problem in Excel Solver, we arrive at the optimal data center location: the bottom-left corner point shown in the graph below.
COMP671_2012_Slide24
The graphical solution confirms the solver output as well: the graphic shows the solution favors the bottom-left area in green. Again, this is the optimization based on the input data. If Hill County were determined to be the best county to explore after the first phase, the next step would likely be to enlist an agent familiar with datacenters who knows the region. They would know where facilities within the county are located, such as existing power and network grids, available real estate, etc. These would likely be more influential in our decision-making process than the purely analytic optimization given here.

Appendix:

Attachments:

There were three primary systems used in this assignment. The power data was drawn from several Excel worksheets and imported into R. Several database joins were performed to get the data into a usable format. The output of this was put in a single flat file "power_grid.csv". The first ten lines of the file are:

COUNTY	STATE	CONSUMERS	REVENUES	SALES	RATE	POPULATION	LAT	LONG
Autauga	AL	5186	1307665	21680815	0.0603144	54571	32.5364	-86.6445
Barbour	AL	12952	1587822	25981954	0.0611125	27457	31.8707	-85.4055
Bullock	AL	5192	1315591	21764771	0.0604459	10914	32.1018	-85.7173
Chilton	AL	5186	1307665	21680815	0.0603144	43643	32.8541	-86.7266
Cleburne	AL	28379	1761949	29265866	0.0602049	14972	33.672	-85.5161
Coffee	AL	14470	2926811	46104752	0.0634818	49948	31.4022	-85.9892
Colbert	AL	29	43308	561328	0.0771528	54428	34.7031	-87.8015
Conecuh	AL	5190	1309636	21705444	0.0603368	13228	31.4283	-86.992
Coosa	AL	5186	1307665	21680815	0.0603144	11539	32.9314	-86.2435

In addition, information about the QUALCOMM offices was collected into two CSV files.

qcomoffices_locations.csv

code,name,city,country,country_code,lat,long
"ATL","Hartsfield-jackson Atlanta International","Atlanta, GA","United States","US",33.636667,-84.428056
"AUS","Austin-bergstrom International","Austin","United States","US",30.194444,-97.670000
"BOS","Logan International","Boston","United States","US",42.363056,-71.000000
"CHD","Williams Gateway Airport","Mesa","United States","US",33.307778,-111.655556
"DCA","Ronald Reagan Washington National Airport","Washington, DC","United States","US",38.851944,-77.037778
"DTW","Detroit Metropolitan Wayne County","Detroit, MI","United States","US",42.212500,-83.353333
"JBK","Berkeley","Berkeley","United States","US",37.950000,-122.300000
"JFK","John F Kennedy Intl","New York","United States","US",40.633333,-73.783333
"LAS","Mc Carran Intl","Las Vegas","United States","US",36.083333,-115.166667

qcomoffices_size.csv

code,site,engineers
"ATL","Atlanta, GA",0
"AUS","Austin, TX",250
"BOS","Boxborough, MA",150
"CHD","Chandler, AZ",75
"DCA","Washington, DC",0
"DTW","Detroit, MI",0
"JBK","Berkeley, CA",0
"JFK","New York, NY",0
"LAS","Las Vegas",0

These input files were used by the program R, along with the following R script, to create the above graphics. The Excel spreadsheet is available upon request.

R Libraries:

doit.r

# Keith P Jolley
# 12 Nov 2012
#
library(ggplot2)
library(ggmap)
library(maps)
library(plyr)
library(rgl)

savepdf <- T
showplt <- T

bg_fill <- "grey"
myred   <- "#BB2020"
myyellow<- "#FFFFBF"
mygreen <- "#1B7837"
myblue  <- "#1F78B4"

pgrid   <- read.csv("power_grid.csv", sep="\t")
left    <- round(min(pgrid$LONGITUDE) - 5)  # x axis
right   <- round(max(pgrid$LONGITUDE) + 5)
top     <- round(max(pgrid$LATITUDE)  + 2)  # y axis
bottom  <- round(min(pgrid$LATITUDE)  - 2)

qcloc      <- read.csv("qcomoffices_locations.csv")
qcsize     <- read.csv("qcomoffices_size.csv")
qcom       <- merge(qcloc, qcsize)
state      <- cbind(state.name, state.abb)
pgrid      <- merge(pgrid, state, by.x="STATE", by.y="state.abb")
pgrid      <- cbind(pgrid, tolower(paste(pgrid$state.name, pgrid$COUNTY))) # concatenate state and county names into a single column
colnames(pgrid)[length(pgrid)] <- "SC"  # rename the last column something sensible

counties   <- map_data("county") # add the same columns as i just did to pgrid
counties   <- cbind(counties, tolower(paste(counties$region, counties$subregion))) # concatenate state and county names into a single column
colnames(counties)[length(counties)] <- "SC"  # rename the last column something sensible

# see "ggplot2" by hadley wickham, springer, 2009, page 78 for more info
#merge the pgrid data and the county data
powcou        <- merge(counties, pgrid, by="SC", all=TRUE)
powcou        <- powcou[order(powcou$order), ]
powcou.hi     <- max(powcou$RATE,na.rm=TRUE)  # nationwide values
powcou.lo     <- min(powcou$RATE,na.rm=TRUE)
powcou.mean   <- mean(powcou$RATE,na.rm=TRUE)
#powcou.HI     <- na.omit(ddply(powcou, .(STATE), summarise, max=max(RATE))) # state by state max
#names(powcou)[names(powcou) == "breaks"] <- "Industrial Power ($/MW)"  # rename the last column something sensible

# now, start merging all these into a single dataframe.  create a type for each
#airports$type   <- "airport"
#big_cities$type <- "city"
#powcou$type     <- "county"
qcom$type  <- ifelse(qcom$engineers > 0, "Engineering", "Business")
#mydata          <- merge(airports, big_cities, all=TRUE)
#mydata          <- merge(mydata,   qcom,       all=TRUE)

p <- ggplot(qcom, aes(long, lat))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill=bg_fill, colour="white")
p <- p + theme(legend.position="bottom")
if (showplt) print(p)
if (savepdf) {
  pdf(file="01_USA.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}
p1 <- p  + scale_colour_manual(values=c(mygreen, myblue), name="Office Type: ")
p1 <- p1 + geom_point(aes(color=type), size=3.0, shape=16)
if (showplt) print(p1)
if (savepdf) {
  pdf(file="02_AllOffices.pdf", width=10, height=7.5)
  print(p1)
  dev.off()
}

eng <- subset(qcom, type == "Engineering")
eng$vjust<--0.3
eng$hjust<-0.5
                                  eng[eng$code=="CHD",]$hjust<-0.1   # Chandler
                                  eng[eng$code=="NUQ",]$hjust<-0.6   # Sunnyvale
eng[eng$code=="ORL",]$vjust<-1.3                                     # Orlando
eng[eng$code=="SJC",]$vjust<-1.3; eng[eng$code=="SJC",]$hjust<-0.5   # San Jose
eng[eng$code=="SAN",]$vjust<-1.3; eng[eng$code=="SAN",]$hjust<-0.7   # San Diego
                                  eng[eng$code=="SNA",]$hjust<-0.7   # Irvine


p1 <- p  + scale_colour_manual(values=c(myblue, myblue), name="Office Type: ")
p1 <- p1 + geom_point(data=eng, aes(long,lat,color=eng$type), size=3.0, shape=16)
if (showplt) print(p1)
if (savepdf) {
  pdf(file="03_EngOffices.pdf", width=10, height=7.5)
  print(p1)
  dev.off()
}

p2 <- p1 + geom_text(data=eng, aes(long,lat,label=site), vjust=eng$vjust, hjust=eng$hjust)
if (showplt) print(p2)
if (savepdf) {
  pdf(file="04_EngOfficeName.pdf", width=10, height=7.5)
  print(p2)
  dev.off()
}

p2 <- p1 + geom_text(data=eng, aes(long,lat,label=engineers), vjust=eng$vjust, hjust=eng$hjust)
prad <- p2  # save this for later
if (showplt) print(p2)
if (savepdf) {
  pdf(file="05_EngOfficeSize.pdf", width=10, height=7.5)
  print(p2)
  dev.off()
}

# power rates with missing counties == grey
p <- ggplot(powcou, aes(long, lat, group=group, fill=RATE))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("Industrial Power Rate\n($/MW-hr)", low=mygreen, mid=myyellow, high=myred, limits=c(powcou.lo, powcou.hi), midpoint=powcou.mean)
if (showplt) print(p)
if (savepdf) {
  pdf(file="06_CountyPower.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# set all unknown county power rates to the highest (most expensive) rate nationwide
powcou[is.na(powcou$RATE),]$RATE <- powcou.hi
p <- ggplot(powcou, aes(long, lat, group=group, fill=RATE))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("Industrial Power Rate\n($/MW-hr)", low=mygreen, mid=myyellow, high=myred, limits=c(powcou.lo, powcou.hi), midpoint=powcou.mean)
if (showplt) print(p)
if (savepdf) {
  pdf(file="07_CountyPowerFilled.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# latency rings - big
rmax<-350
p <- ggplot(eng, aes(long, lat))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill=bg_fill, colour="white")
p <- p + theme(legend.position="none")
p <- p + scale_colour_brewer(type="div", palette="Paired")
p <- p + geom_point(aes(color=code), size=3.0, shape=16)
p2 <- p
p <- p + geom_point(aes(color=code), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
  pdf(file="08_Influence_A.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# this was a good idea trying to get out
p <- p2
imax <- 150
for (i in 1:imax) { p <- p + geom_point(aes(color=code), size=rmax/i, shape=16, alpha=I(4.0/imax)) }
p <- p + geom_point(aes(color=code), size=3.0, shape=16)
p <- p + geom_point(aes(color=code), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
  pdf(file="08_Influence_B.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# try again
p <- p2
imax <- 7
for (i in 1:imax) { p <- p + geom_point(aes(color=code), size=i*rmax/imax, shape=1, alpha=I(0.4*(imax-i)/imax)) }
p <- p + geom_point(aes(color=code), size=rmax/imax/2, shape=1, alpha=I(0.4))
#p2 <- p2 + geom_point(aes(color=code), size=3.0, shape=16)
#p2 <- p2 + geom_point(aes(color=code), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
  pdf(file="08_Influence_C.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

eng$far <- 0.01
eng[eng$code=="NUQ" | eng$code=="BOS",]$far <- 1

p <- ggplot(eng, aes(long, lat, color=code))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill=bg_fill, colour="white")
p <- p + theme(legend.position="none")
p <- p + scale_colour_brewer(type="div", palette="Paired")
p <- p + geom_point(alpha=I(eng$far/5), size=rmax, shape=16)
p <- p + geom_point(size=3.0,  shape=16)
p <- p + geom_point(alpha=I(eng$far), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
  pdf(file="09_Venn.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

rad <- function (deg) {
  return(deg * 2.0 * pi / 360.0)
}

# return the distance in miles between two positions
# http://www.movable-type.co.uk/scripts/latlong.html
dm <- function(lat1, lon1, lat2, lon2) {
  R <- 3956.6 # miles - earth radius
  # convert input from degrees to radians
  lat1 <- rad(lat1)
  lon1 <- rad(lon1)
  lat2 <- rad(lat2)
  lon2 <- rad(lon2)
  return(acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R);
}

#  d <- log1p(pop) * (1.0+r)**(-0.2) * (1.0-r/maxrad)

# latency as a function of distance (miles)
latency.dist  <- function(pop, r) {
  maxrad<-2000.0 # this function goes to zero at "maxrad"
  d <- ifelse (r>maxrad, 0, log1p(pop) * (1.0+r)**(-0.2))
  d <- d * (1.0-r/maxrad)
  return(d)
}

# latency as a function of relative positions (degrees)
latency.coord <- function(pop, lat1, lon1, lat2, lon2) {
  r <- dm(lat1, lon1, lat2, lon2)
  return(latency.dist(pop, r))
}

df <- expand.grid(population=10**(0:4), distance=0:2500)
df$latency <- latency.dist(df$population, df$distance)
p <- ggplot(df,aes(x=distance,y=latency,group=population, colour=sprintf("%d",population))) + geom_line()
p <- p + scale_colour_hue("Engineers at Site") + ylab("latency penalty ($/yr)") + xlab("distance (miles)")
if (showplt) print(p)
if (savepdf) {
  pdf(file="10_latencyVdist.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

j <- expand.grid(unique(counties$SC),eng$code) # a list of each county and each office
names(j) <- c("sc", "code")
j <- merge(j, eng[c("code", "lat", "long", "engineers")])
j <- merge(j, ddply(powcou, .(SC), summarize, lat2=mean(lat), lon2=mean(long)), by.x=c("sc"), by.y=("SC"), all=F)
names(j) <- c("sc", "code", "site_lat", "site_lon", "engineers", "county_lat", "county_lon")

# here's the contribution of each site
j$F_latency <- with(j, latency.coord(engineers, site_lat, site_lon, county_lat, county_lon))

# summarize for a total at each county
t <- na.omit(ddply(j, .(sc), summarise, sum=sum(F_latency))) # county by county total
names(t) <- c("sc", "F_latency")
q <- merge(powcou, t, by.x="SC", by.y="sc", all.x=TRUE)

rm(t)

q.hiL    <- max(q$F_latency,na.rm=TRUE)  # nationwide values
q.loL    <- min(q$F_latency,na.rm=TRUE)
q.meanL  <- mean(q$F_latency,na.rm=TRUE)

# set counties with no computed latency value to zero
#p <- ggplot(powcou, aes(long, lat, group=group, fill=F_latency))
q[is.na(q$F_latency),]$F_latency <-0
q <- q[order(q$order), ]
p <- ggplot(q, aes(long, lat, group=group, fill=F_latency))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("F_latency", low=myred, mid=myyellow, high=mygreen, limits=c(q.loL, q.hiL), midpoint=q.meanL)
if (showplt) print(p)
if (savepdf) {
  pdf(file="11_map_z.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# zero counties - these are counties outside the venn diagram
zc <-j[j$F_latency==0,]$sc
q[q$SC %in% zc,]$F_latency <-0 
t <- na.omit(ddply(j, .(sc), summarise, sum=sum(F_latency))) # county by county total

q.maxL   <- max(q$F_latency,na.rm=TRUE)  # nationwide values
q.minL   <- min (q[q$F_latency>0,]$F_latency,na.rm=TRUE)
q.meanL  <- mean(q[q$F_latency>0,]$F_latency,na.rm=TRUE)

q <- q[order(q$order), ]
p <- ggplot(q, aes(long, lat, group=group, fill=F_latency))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("F_latency", low=myred, mid=myyellow, high=mygreen, limits=c(q.minL, q.maxL), midpoint=q.meanL)
if (showplt) print(p)
if (savepdf) {
  pdf(file="12_eligible_counties.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# rescale and plot the power
q$F_power <- 0
q.maxR    <- max(q$RATE,  na.rm=TRUE)
q.minR    <- min(q$RATE,  na.rm=TRUE)
q.meanR   <- mean(q$RATE, na.rm=TRUE)

q$F_power <- q.maxL * (q.maxR - q$RATE)/(q.maxR - q.minR)
q$F_sum   <- ifelse(q$F_latency > 0, q$F_power + q$F_latency, 0)

q.maxS    <- max(q$F_sum, na.rm=TRUE)
q.minS    <- min(q[q$F_sum>0,]$F_sum, na.rm=TRUE)
q.meanS   <- mean(q[q$F_sum>0,]$F_sum, na.rm=TRUE)

q <- q[order(q$order), ]
p <- ggplot(q, aes(long, lat, group=group, fill=F_sum))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("F_sum", low=myred, mid=myyellow, high=mygreen, limits=c(q.minS, q.maxS), midpoint=q.meanS)
if (showplt) print(p)
if (savepdf) {
  pdf(file="13_Sum.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# best ten counties
bt <- factor(unique(q[q$F_sum>=head(tail(sort(unique(q$F_sum)),n=10),n=1),]$SC))
qq <- merge(as.data.frame(bt), q, by.y="SC", by.x="bt")
qq <- unique(qq[c("bt", "group", "order", "COUNTY", "state.name", "POPULATION", "LATITUDE", "LONGITUDE", "CONSUMERS", "REVENUES", "SALES", "RATE", "F_latency", "F_power", "F_sum")])
names(qq) <- c("sc", "group", "order", "County", "State", "Co_Pop", "lat", "long", "Consumers", "Revenue", "Sales", "Ind_Rate", "F_latency", "F_power", "F_sum")
qq <- qq[order(qq$F_sum, decreasing=TRUE),]
options(width=200)
print(unique(qq[4:length(qq)]))
qq$labels <- with(qq, sprintf("%s\n%s", County, State))
p <- ggplot(qq,aes(x=labels))
p <- p + geom_bar(aes(y=F_sum),fill="#888888", color="black")
p <- p + geom_bar(aes(y=F_power),fill="#777777",color="black")
p <- p + xlab(c("Region")) + ylab(c("F_power + F_latency"))
p <- p + theme(legend.position="bottom")
if (showplt) print(p)
if (savepdf) {
  pdf(file="14_totals.pdf", width=10, height=3.0)
  print(p)
  dev.off()
}

winner <- sprintf("%s", unique(qq[max(qq$F_sum)==qq$F_sum,]$sc)) # now r is just being silly
wdf <- q[q$SC == winner,]

l <- min(wdf[wdf$SC == winner, ]$long)
r <- max(wdf[wdf$SC == winner, ]$long)
t <- max(wdf[wdf$SC == winner, ]$lat)
b <- min(wdf[wdf$SC == winner, ]$lat)

mymap <- get_map(location=c(left=l,bottom=b,right=r,top=t))
p <- ggmap(mymap)
p <- p + geom_polygon(data=wdf, aes(long, lat, group=group), fill="orange", colour="darkred", alpha=I(0.15), size=1)
p <- p + xlab("") + ylab("") + coord_map()
if (showplt) print(p)
if (savepdf) {
  pdf(file="15_hilltx.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# show the bounding box around the county
min_long <- -97.30
max_long <- -96.85
min_lat  <-  31.75
max_lat  <-  32.23
p.org <- p
p <- p + geom_rect(xmin=min_long, xmax=max_long, ymin=min_lat, ymax=max_lat, fill=mygreen, colour=mygreen, alpha=I(0.1), size=1)
if (showplt) print(p)
if (savepdf) {
  pdf(file="16_hilltxAppx.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

# break the bounding box into parts
gridsize <- 100
lons <- seq(max_long, min_long, length.out=gridsize)
lats <- seq(min_lat,  max_lat,  length.out=gridsize)
J <- with(eng, expand.grid(code, lats, lons))
names(J) <- c("code", "lat2", "lon2")
J <- merge(eng[c("code", "engineers", "lat", "long")], J)
J$z <- with(J, latency.coord(engineers, lat, long, lat2, lon2))
J <- merge(J, ddply(J, .(lat2, lon2), summarize, sum=sum(z)))

# optimal point according to excel optimizer
longO <- -97.30
latO  <-  31.75

minS  <- min(J$sum)
maxS  <- max(J$sum)
meanS <- mean(J$sum)
strS  <- sprintf("Optimal location: [%3.2f, %3.2f]", longO, latO)

# p.org has the picture of the county on it
p <- p.org
# add the latency data, set the gradient, draw a box around it, and put the legend at the bottom of the picture
p <- p + geom_tile(data=J, aes(x=lon2, y=lat2, fill=sum), alpha=I(0.5), linetype=0)
p <- p + scale_fill_gradient2("F_latency", low=myred, mid=myyellow, high=mygreen, limits=c(minS, maxS), midpoint=meanS)
p <- p + geom_rect( xmin=min_long, xmax=max_long, ymin=min_lat, ymax=max_lat, fill=NA, colour=mygreen, size=1)
p <- p + theme(legend.position="bottom")
# add some shading behind the text, then text
p <- p + geom_rect( xmin=min_long, xmax=max_long, ymin=latO-0.045, ymax=latO-0.015, fill="grey80", alpha=I(0.4), colour=NA)
p <- p + geom_text( x=longO, y=latO, label=strS, color="black", vjust=2, hjust=0)
# pretty little marker
p <- p + geom_point(x=longO, y=latO, shape=19, size=5, color="black")
p <- p + geom_point(x=longO, y=latO, shape=8,  size=5, color="yellow")
p <- p + geom_point(x=longO, y=latO, shape=19, size=2, color="red")
if (showplt) print(p)
if (savepdf) {
  pdf(file="17_optimized.pdf", width=10, height=7.5)
  print(p)
  dev.off()
}

Latency Testing Procedure

The latency function listed here is a simplification of the results from a separate study led by the Engineering Compute team at Qualcomm. The first round of testing was inconclusive, so the test procedure was completely redone by the Qualcomm Human Factors team.

The testing environment was a small compute environment consisting of a few production workstations but no local resources such as file or license servers. This LAN was connected to Qualcomm's intranet through a "bump-in-the-wire" device that injected selected amounts of latency and/or dropped packets. Bandwidth was not a limiting factor. From the networking team we had detailed knowledge of the expected network performance between the various sites.

The engineers running the tests were self-selected from teams across Qualcomm, though Layout and RF designers were our primary concern. The goal was to have those best able to represent their team's interests, and this project's worst critics, doing the evaluations. The selections were vetted by engineering management as qualified to represent their particular function. Each team devised their own set of tests that they would run in the lab in order to capture the most frequent, demanding, and latency-sensitive tasks within that work function.

The users then ran their test suite multiple times, each time with different levels of latency. During each test the users recorded their subjective assessment of the system's usability. The ratings were scored from "no noticeable latency" to "some latency noticeable but little to no impact", up to "unacceptable."

latency pain assessment tool
Not the actual scoring system.

Screen capture and video of the users recorded objective performance measurements. Users were encouraged to run as many test suites as they felt necessary, and the testing was "double-blind": neither the test subjects nor the lab admins knew what the actual latency was during the testing.

Different engineering functions had different results, but results were consistent across users within each function. This was very different from the results we got before we updated our test procedures: prior to going "double-blind" there was no correlation between usability and latency. As expected, Layout and RF designers were the most sensitive to increasing latency.

For latency testing we did not use any devices other than whatever was the current standard issue for that engineering function. A different project tested display technologies in a similar manner.

This may be obvious, but it is important to note that, as a first-pass filter, the test results give the maximum distance between an engineering function and its datacenter and, by extension, the maximum distance between two engineering functions that share a datacenter.

The actual testing results are company proprietary and I no longer have access to them.

Miscellaneous

Latency Function without constraints. X-Y coordinates are latitude and longitude.
COMP671_2012_Slide26