This was an assignment for an optimization (COMP 671) course.

## Problem Description

The primary data center for Qualcomm Engineering is reaching its maximum capacity. Demand for more compute resources continues to grow, and those resources must be physically located somewhere. The problem I am trying to address is which of the following strategies to propose:

1. A central data center to serve all (domestic) engineering requirements. If so, which location would best meet those requirements?
2. A regional data center strategy: Asia-Pac, Europe, US-West, US-East, etc.
3. Local data centers at each engineering site to minimize network latency.
4. Outsource all data center activities (e.g. AWS or colocation).

## Scope:

Only the first strategy, a central data center, will be considered for this exercise as a means of using different optimization techniques.

## Solution:

The goal is to determine the optimal location for a single domestic engineering datacenter, balancing latency requirements against power costs.

Two different optimization techniques will be used in a two-phase approach:

1. Optimize the data center location across all domestic sites, county by county
2. Determine the optimal location within the selected county

### OBJECTIVE FUNCTION:

The two main objective functions for decision making are a Latency Function and a Power (cost) Function.

Latency Function

$$\begin{aligned} F(N,d) &= \log_{10}(1+N)\,(1+d)^{-0.2}\left(1-\dfrac{d}{2000}\right) \\ N &= \small\textsf{number of engineers at site} \\ d &= \small\textsf{distance (miles) from site to potential datacenter location} \end{aligned}$$

The objective is to maximize $F(N, d)$ subject to the constraint $d \le 2000$. That is, any site more than 2,000 miles away is unacceptable: $$F(N, d \ge 2000) = 0$$

Another constraint is that the datacenter must be within the Continental United States. This is achieved by only providing input data within the desired regions.

This function is an approximation based on experimental data. Some details on the experiment are in the appendix. This isn't necessarily the same function that would be used "in production."
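As a concrete sketch, the simplified form of this function that the appendix's R script actually implements (natural log via `log1p`, without the $(1-d/2000)$ taper) together with the hard cutoff looks like this:

```r
# Minimal sketch of the latency objective as implemented in the appendix
# script: log of site size, discounted by distance, zero at or beyond
# the 2000-mile cutoff.
maxrad <- 2000  # miles

F_latency <- function(N, d) {
  ifelse(d >= maxrad, 0, log1p(N) * (1 + d)^(-0.2))
}

F_latency(8000, 100)   # a large site fairly close by
F_latency(150, 100)    # a small site at the same distance
F_latency(8000, 2500)  # beyond the constraint, contributes nothing
```

Note the logarithm in $N$: an 8000-engineer site does not drown out a 150-engineer site by a factor of ~53, only by the ratio of their logs (roughly 1.8x), which is what keeps the small sites from being neglected.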

Power Function
Consider the Power Function to be:

$$\begin{aligned} F(P)&=C\,\dfrac{Q_{maxR}-Q_{rate}}{Q_{maxR}-Q_{minR}}\\ C&=\small\textsf{scaling factor} \\ Q_{maxR}&=\small\textsf{maximum power rate among all counties within the United States} \\ Q_{minR}&=\small\textsf{minimum power rate among all counties within the United States} \\ Q_{rate}&=\small\textsf{power rate for the particular county where F(P) is being evaluated} \end{aligned}$$
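A minimal sketch of this rescaling (the rates below are made up for illustration; the real $Q$ values come from the EIA county data, and the appendix script uses the maximum latency score as $C$ so the two objectives are on comparable scales):

```r
# Rescale county power rates onto [0, C]: the cheapest county scores C,
# the most expensive scores 0.
F_power <- function(Q_rate, Q_maxR, Q_minR, C = 1) {
  C * (Q_maxR - Q_rate) / (Q_maxR - Q_minR)
}

rates <- c(0.060, 0.077, 0.095)         # $/kWh, illustrative only
F_power(rates, max(rates), min(rates))  # 1 for cheapest, 0 for most expensive
```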

Distances between locations "1" and "2" are calculated using the spherical law of cosines:

$$\begin{aligned} d&=R\arccos\bigl(\sin(\phi_1)\sin(\phi_2)+\cos(\phi_1)\cos(\phi_2)\cos(\lambda_2-\lambda_1)\bigr) \\ R&=\small\textsf{radius of Earth} \\ \phi&=\small\textsf{latitude} \\ \lambda&=\small\textsf{longitude} \end{aligned}$$
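A sketch of this distance calculation in R, converting degrees to radians first (the Earth radius of 3956.6 miles matches the appendix script; the San Diego and Boston coordinates below are approximate):

```r
deg2rad <- function(deg) deg * pi / 180

# great-circle distance in miles via the spherical law of cosines
dist_miles <- function(lat1, lon1, lat2, lon2) {
  R  <- 3956.6  # miles
  p1 <- deg2rad(lat1); p2 <- deg2rad(lat2)
  dl <- deg2rad(lon2 - lon1)
  # pmin() guards against acos() of 1 + epsilon for coincident points
  R * acos(pmin(1, sin(p1) * sin(p2) + cos(p1) * cos(p2) * cos(dl)))
}

dist_miles(32.72, -117.16, 42.36, -71.06)  # San Diego to Boston, ~2580 miles
```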

Remove from consideration any counties that do not meet constraints.

The solution is to maximize $F(P) + F(N,d)$, subject to the constraint $d \le 2000$.
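Putting the pieces together, Phase 1 amounts to the following sketch. All site and county values here are illustrative stand-ins for the real inputs, and county "B" is deliberately placed in the Pacific Northwest so it violates the 2000-mile constraint:

```r
deg2rad <- function(deg) deg * pi / 180
dist_miles <- function(lat1, lon1, lat2, lon2) {
  3956.6 * acos(pmin(1, sin(deg2rad(lat1)) * sin(deg2rad(lat2)) +
                        cos(deg2rad(lat1)) * cos(deg2rad(lat2)) *
                        cos(deg2rad(lon2 - lon1))))
}

# two illustrative sites (San Diego- and RTP-sized) and two candidate counties
sites    <- data.frame(engineers = c(8000, 1000),
                       lat = c(32.7, 35.9), lon = c(-117.2, -78.9))
counties <- data.frame(name = c("A", "B"),
                       lat  = c(32.0, 47.6), lon = c(-97.1, -122.3),
                       rate = c(0.061, 0.075))

score <- sapply(seq_len(nrow(counties)), function(i) {
  d <- dist_miles(sites$lat, sites$lon, counties$lat[i], counties$lon[i])
  if (any(d >= 2000)) return(0)  # constraint: every site within 2000 miles
  F_lat <- sum(log1p(sites$engineers) * (1 + d)^(-0.2))
  F_pow <- (max(counties$rate) - counties$rate[i]) /
           (max(counties$rate) - min(counties$rate))
  F_lat + F_pow
})
counties$name[which.max(score)]  # "A": the only county meeting the constraint
```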

## PHASE 1: Optimize the data center location for all domestic sites by county

For the purpose of this study only consider the largest 11 engineering locations:

| code | site       | state | engineers |
|------|------------|-------|-----------|
| AUS  | Austin     | TX    | 250       |
| BOS  | Boxborough | MA    | 150       |
| CHD  | Chandler   | AZ    | 75        |
| NUQ  | Sunnyvale  | CA    | 500       |
| OCF  | Ocala      | FL    | 100       |
| ORL  | Orlando    | FL    | 50        |
| RDU  | RTP        | NC    | 1000      |
| SAN  | San Diego  | CA    | 8000      |
| SJC  | San Jose   | CA    | 1500      |
| SNA  | Irvine     | CA    | 200       |
| WBU  | Boulder    | CO    | 400       |

The Latency Function plotted for sites of different sizes. The goal is to ensure that smaller sites are neither neglected nor overwhelmed by large sites.

This is the Latency Function plotted for the center of each county against each of the 11 engineering sites, without constraints. The influence of San Diego, the Bay Area, Boulder, Austin, and RTP can be seen. It's interesting to note how the algorithm favors the two Florida sites while Boxborough (near Boston) has negligible impact, even though the two areas have the same total number of engineers.

The constraint that no engineering site may be more than 2000 miles from the data center, due to the effect on latency, restricts the solution space. The area within all constraints: Recomputing the Latency Function for counties within the constrained area results in the following map. The greyed-out counties in the graph indicate failure to meet the constraint that d ≤ 2000 for all sites.

In the next step I consider the industrial power rate across all the counties in the United States and plot the Power Function. ![COMP671_2012_Slide14](https://s3-us-west-2.amazonaws.com/com.altgnat.www.blog/2017/12/COMP671_2012_Slide14.png)

Assume the worst case (most expensive) for any county where the power rate was not available. Sum the Latency and Power Functions to form the objective function for all counties which fall within the constraint d ≤ 2000 miles. Listing of the Top Ten counties by objective function: See below for an explanation of the red/green color coding.
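The worst-case imputation can be sketched in a line of R (with an illustrative rate vector, not the real county data):

```r
# Impute missing county power rates with the worst (most expensive) observed rate.
rates <- c(0.061, NA, 0.077, 0.095, NA)  # $/kWh, illustrative only
rates[is.na(rates)] <- max(rates, na.rm = TRUE)
rates  # the two NA entries are now 0.095
```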

Several practical considerations should be taken into account at this stage. The first is that among the Top Ten contenders there is very little deviation in the objective function $F_{sum}=F(P)+F(L)$. This means that factors not included in this analysis will be the deciding factor. For instance, counties or states may provide tax incentives to attract investment within a particular region. There may be existing QCOM offices we'd want to be co-located with, large airports, existing vendors or outsourcing agents with whom we have relationships, or abandoned infrastructure that we could acquire to save time and money.

Also not considered is the source of power. For instance, in Illinois it is quite likely that most of the power would be generated by coal. The company may not want to be seen as a large consumer of high-carbon energy, which would mean we would likely need to invest in "green technology" to offset these emissions/perceptions. An example of this is a recent datacenter built by Apple in North Carolina, where Apple invested $50M in solar arrays amid protest by Greenpeace.

Having multiple viable candidate counties is a good problem to have - we are not locked into any single location. In order to proceed with this analysis, though, we must pick a winner with the data we have. Highlighted in the above table is data provided by the U.S. Energy Information Administration on 13 Nov 2012 that may influence the viability of the selected counties. Some "red flags", highlighted in red, include a small number of industrial customers (Consumers) for Hill and Macoupin Counties. The small number of customers is also reflected in low Sales and Revenue numbers for the power suppliers of those counties. This is flagged because a large datacenter may tax the existing infrastructure, leading to downtime or large capital expenditures to provide upgraded power reliability.
The "non-technical" issues around data center power consumption should not be taken lightly, as evidenced by the experience of companies that built large datacenters in the Columbia River region. Another red flag is the relatively high power price in Boulder, Colorado. This may be an opportunity, though, to negotiate a lower price, as Microsoft did in Washington State. Boulder is also home to a large Qualcomm engineering site, is close to multiple internet backbones and a large airport, and has a large county population with a large high-tech presence, meaning a solid labor pool is available to hire from. Boulder also had the best Latency Function among all the contenders, so any further study should investigate the ability to negotiate lower power rates from the utility companies or local governments. "Green flags" among the top contenders include large customer bases for Fremont and Park Counties in Colorado and particularly low power rates for the counties located in Illinois.

To proceed with this optimization, using the data we have, let's go with Hill County, Texas.

### CONCLUSION PHASE 1:

Hill County is optimal when it comes to maximizing the Power and Latency Functions.

## PHASE 2: Optimize intra-county

In this phase we determine the optimal location of the data center within Hill County. For the purpose of the study, ignore constraints such as where facilities, towns, networks, etc., are located. The Power Function is assumed constant across the county, so it may be ignored.

### Objective:

Search for the optimal location within the county, keeping the maximum latency constraints and requiring that the solution lie within county limits, for which we "discretize" the county border.

Use the built-in Solver function within Excel to determine the optimal location, based on the nonlinear generalized reduced gradient (GRG) method, with the approximated county boundary as input and the constraint that the distance from the data center to each of the 11 engineering sites does not exceed 2000 miles.

### CONCLUSION PHASE 2:

After solving the optimization problem in Excel Solver, we arrive at the optimal data center location: the bottom-left corner point shown in the graph below. The graphical solution confirms the Solver output as well - the graphic shows the solution favors the bottom-left area, in green.

Again, this is the optimization based on the input data. If Hill County were determined to be the best county to explore after the first phase, then the next step would likely be to enlist an agent familiar with datacenters who knows the region. They would know where facilities within the county are located - existing power and network grids, available real estate, etc. These would likely be more influential in our decision-making process than the purely analytic optimization given here.

## Appendix:

### Attachments:

There were three primary systems used in this assignment. The power data was drawn from several Excel worksheets and imported into R. Several database joins were performed to get the data into a usable format. The output of this was put into a single flat file, "power_grid.csv".
The first ten lines of the file are:

    COUNTY   STATE CONSUMERS REVENUES SALES    RATE      POPULATION LAT     LONG
    Autauga  AL    5186      1307665  21680815 0.0603144 54571      32.5364 -86.6445
    Barbour  AL    12952     1587822  25981954 0.0611125 27457      31.8707 -85.4055
    Bullock  AL    5192      1315591  21764771 0.0604459 10914      32.1018 -85.7173
    Chilton  AL    5186      1307665  21680815 0.0603144 43643      32.8541 -86.7266
    Cleburne AL    28379     1761949  29265866 0.0602049 14972      33.672  -85.5161
    Coffee   AL    14470     2926811  46104752 0.0634818 49948      31.4022 -85.9892
    Colbert  AL    29        43308    561328   0.0771528 54428      34.7031 -87.8015
    Conecuh  AL    5190      1309636  21705444 0.0603368 13228      31.4283 -86.992
    Coosa    AL    5186      1307665  21680815 0.0603144 11539      32.9314 -86.2435

In addition, information about the QUALCOMM offices was collected into two CSV files.

qcomoffices_locations.csv:

    code,name,city,country,country_code,lat,long
    "ATL","Hartsfield-jackson Atlanta International","Atlanta, GA","United States","US",33.636667,-84.428056
    "AUS","Austin-bergstrom International","Austin","United States","US",30.194444,-97.670000
    "BOS","Logan International","Boston","United States","US",42.363056,-71.000000
    "CHD","Williams Gateway Airport","Mesa","United States","US",33.307778,-111.655556
    "DCA","Ronald Reagan Washington National Airport","Washington, DC","United States","US",38.851944,-77.037778
    "DTW","Detroit Metropolitan Wayne County","Detroit, MI","United States","US",42.212500,-83.353333
    "JBK","Berkeley","Berkeley","United States","US",37.950000,-122.300000
    "JFK","John F Kennedy Intl","New York","United States","US",40.633333,-73.783333
    "LAS","Mc Carran Intl","Las Vegas","United States","US",36.083333,-115.166667

qcomoffices_size.csv:

    code,site,engineers
    "ATL","Atlanta, GA",0
    "AUS","Austin, TX",250
    "BOS","Boxborough, MA",150
    "CHD","Chandler, AZ",75
    "DCA","Washington, DC",0
    "DTW","Detroit, MI",0
    "JBK","Berkeley, CA",0
    "JFK","New York, NY",0
    "LAS","Las Vegas",0

These input files were used by the program R, along with the following R script, to create the above graphics. The Excel spreadsheet is available upon request.

doit.r:

# Keith P Jolley
# 12 Nov 2012
#
library(ggplot2)
library(ggmap)
library(maps)
library(plyr)
library(rgl)

savepdf <- T
showplt <- T
bg_fill  <- "grey"
myred    <- "#BB2020"
myyellow <- "#FFFFBF"
mygreen  <- "#1B7837"
myblue   <- "#1F78B4"

pgrid  <- read.csv("power_grid.csv", sep="\t")
left   <- round(min(pgrid$LONGITUDE) - 5)  # x axis
right  <- round(max(pgrid$LONGITUDE) + 5)
top    <- round(max(pgrid$LATITUDE)  + 2)  # y axis
bottom <- round(min(pgrid$LATITUDE) - 2)

qcloc  <- read.csv("qcomoffices_locations.csv")
qcsize <- read.csv("qcomoffices_size.csv")
qcom   <- merge(qcloc, qcsize)

state <- cbind(state.name, state.abb)
pgrid <- merge(pgrid, state, by.x="STATE", by.y="state.abb")
pgrid <- cbind(pgrid, tolower(paste(pgrid$state.name, pgrid$COUNTY))) # concatenate state and county names into a single column
colnames(pgrid)[length(pgrid)] <- "SC" # rename the last column something sensible

counties <- map_data("county")
# add the same columns as i just did to pgrid
counties <- cbind(counties, tolower(paste(counties$region, counties$subregion))) # concatenate state and county names into a single column
colnames(counties)[length(counties)] <- "SC" # rename the last column something sensible

# see "ggplot2" by hadley wickham, springer, 2009, page 78 for more info
# merge the pgrid data and the county data
powcou <- merge(counties, pgrid, by="SC", all=TRUE)
powcou <- powcou[order(powcou$order), ]

powcou.hi   <- max(powcou$RATE, na.rm=TRUE)  # nationwide values
powcou.lo   <- min(powcou$RATE, na.rm=TRUE)
powcou.mean <- mean(powcou$RATE, na.rm=TRUE)
#powcou.HI <- na.omit(ddply(powcou, .(STATE), summarise, max=max(RATE))) # state by state max
#names(powcou)[names(powcou) == "breaks"] <- "Industrial Power ($/MW)"  # rename the last column something sensible

# now, start merging all these into a single dataframe.  create a type for each
#airports$type <- "airport"
#big_cities$type <- "city"
#powcou$type <- "county"
qcom$type <- ifelse(qcom$engineers > 0, "Engineering", "Business")
#mydata <- merge(airports, big_cities, all=TRUE)
#mydata <- merge(mydata, qcom, all=TRUE)

p <- ggplot(qcom, aes(long, lat))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill=bg_fill, colour="white")
p <- p + theme(legend.position="bottom")
if (showplt) print(p)
if (savepdf) {
pdf(file="01_USA.pdf", width=10, height=7.5)
print(p)
dev.off()
}

p1 <- p + scale_colour_manual(values=c(mygreen, myblue), name="Office Type: ")
p1 <- p1 + geom_point(aes(color=type), size=3.0, shape=16)
if (showplt) print(p1)
if (savepdf) {
pdf(file="02_AllOffices.pdf", width=10, height=7.5)
print(p1)
dev.off()
}

eng <- subset(qcom, type == "Engineering")
eng$vjust <- -0.3
eng$hjust <- 0.5
eng[eng$code=="CHD",]$hjust <- 0.1 # Chandler
eng[eng$code=="NUQ",]$hjust <- 0.6 # Sunnyvale
eng[eng$code=="ORL",]$vjust <- 1.3 # Orlando
eng[eng$code=="SJC",]$vjust <- 1.3; eng[eng$code=="SJC",]$hjust <- 0.5 # San Jose
eng[eng$code=="SAN",]$vjust <- 1.3; eng[eng$code=="SAN",]$hjust <- 0.7 # San Diego
eng[eng$code=="SNA",]$hjust <- 0.7 # Irvine

p1 <- p + scale_colour_manual(values=c(myblue, myblue), name="Office Type: ")
p1 <- p1 + geom_point(data=eng, aes(long,lat,color=eng$type), size=3.0, shape=16)
if (showplt) print(p1)
if (savepdf) {
pdf(file="03_EngOffices.pdf", width=10, height=7.5)
print(p1)
dev.off()
}

p2 <- p1 + geom_text(data=eng, aes(long,lat,label=site), vjust=eng$vjust, hjust=eng$hjust)
if (showplt) print(p2)
if (savepdf) {
pdf(file="04_EngOfficeName.pdf", width=10, height=7.5)
print(p2)
dev.off()
}

p2 <- p1 + geom_text(data=eng, aes(long,lat,label=engineers), vjust=eng$vjust, hjust=eng$hjust)
prad <- p2  # save this for later
if (showplt) print(p2)
if (savepdf) {
pdf(file="05_EngOfficeSize.pdf", width=10, height=7.5)
print(p2)
dev.off()
}

# power rates with missing counties == grey
p <- ggplot(powcou, aes(long, lat, group=group, fill=RATE))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("Industrial Power Rate\n($/MW-hr)", low=mygreen, mid=myyellow, high=myred, limits=c(powcou.lo, powcou.hi), midpoint=powcou.mean)
if (showplt) print(p)
if (savepdf) {
pdf(file="06_CountyPower.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# set all unknown county power rates to the highest nationwide rate
powcou[is.na(powcou$RATE),]$RATE <- powcou.hi
p <- ggplot(powcou, aes(long, lat, group=group, fill=RATE))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("Industrial Power Rate\n($/MW-hr)", low=mygreen, mid=myyellow, high=myred, limits=c(powcou.lo, powcou.hi), midpoint=powcou.mean)
if (showplt) print(p)
if (savepdf) {
pdf(file="07_CountyPowerFilled.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# latency rings - big
rmax<-350
p <- ggplot(eng, aes(long, lat))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill=bg_fill, colour="white")
p <- p + theme(legend.position="none")
p <- p + scale_colour_brewer(type="div", palette="Paired")
p <- p + geom_point(aes(color=code), size=3.0, shape=16)
p2 <- p
p <- p + geom_point(aes(color=code), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
pdf(file="08_Influence_A.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# this was a good idea trying to get out
p <- p2
imax <- 150
for (i in 1:imax) { p <- p + geom_point(aes(color=code), size=rmax/i, shape=16, alpha=I(4.0/imax)) }
p <- p + geom_point(aes(color=code), size=3.0, shape=16)
p <- p + geom_point(aes(color=code), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
pdf(file="08_Influence_B.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# try again
p <- p2
imax <- 7
for (i in 1:imax) { p <- p + geom_point(aes(color=code), size=i*rmax/imax, shape=1, alpha=I(0.4*(imax-i)/imax)) }
p <- p + geom_point(aes(color=code), size=rmax/imax/2, shape=1, alpha=I(0.4))
#p2 <- p2 + geom_point(aes(color=code), size=3.0, shape=16)
#p2 <- p2 + geom_point(aes(color=code), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
pdf(file="08_Influence_C.pdf", width=10, height=7.5)
print(p)
dev.off()
}

eng$far <- 0.01
eng[eng$code=="NUQ" | eng$code=="BOS",]$far <- 1

p <- ggplot(eng, aes(long, lat, color=code))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill=bg_fill, colour="white")
p <- p + theme(legend.position="none")
p <- p + scale_colour_brewer(type="div", palette="Paired")
p <- p + geom_point(alpha=I(eng$far/5), size=rmax, shape=16)
p <- p + geom_point(size=3.0, shape=16)
p <- p + geom_point(alpha=I(eng$far), size=rmax, shape=1)
if (showplt) print(p)
if (savepdf) {
pdf(file="09_Venn.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# convert degrees to radians
d2r <- function(deg) {
return(deg * 2.0 * pi / 360.0)
}

# return the distance in miles between two positions
# http://www.movable-type.co.uk/scripts/latlong.html
dm <- function(lat1, lon1, lat2, lon2) {
R <- 3956.6 # miles - earth radius
# convert input from degrees to radians
lat1 <- d2r(lat1); lon1 <- d2r(lon1)
lat2 <- d2r(lat2); lon2 <- d2r(lon2)
return(acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R)
}

#  d <- log1p(pop) * (1.0+r)**(-0.2) * (1.0-r/maxrad)

# latency as a function of distance (miles)
maxrad <- 2000  # miles; beyond this the latency contribution is zero
latency.dist  <- function(pop, r) {
d <- ifelse(r > maxrad, 0, log1p(pop) * (1.0+r)**(-0.2))
return(d)
}

# latency as a function of relative positions (degrees)
latency.coord <- function(pop, lat1, lon1, lat2, lon2) {
r <- dm(lat1, lon1, lat2, lon2)
return(latency.dist(pop, r))
}

df <- expand.grid(population=10**(0:4), distance=0:2500)
df$latency <- latency.dist(df$population, df$distance)
p <- ggplot(df, aes(x=distance, y=latency, group=population, colour=sprintf("%d",population))) + geom_line()
p <- p + scale_colour_hue("Engineers at Site") + ylab("latency penalty ($/yr)") + xlab("distance (miles)")
if (showplt) print(p)
if (savepdf) {
pdf(file="10_latencyVdist.pdf", width=10, height=7.5)
print(p)
dev.off()
}

j <- expand.grid(unique(counties$SC),eng$code) # a list of each county and each office
names(j) <- c("sc", "code")
j <- merge(j, eng[c("code", "lat", "long", "engineers")])
j <- merge(j, ddply(powcou, .(SC), summarize, lat2=mean(lat), lon2=mean(long)), by.x=c("sc"), by.y=("SC"), all=F)
names(j) <- c("sc", "code", "site_lat", "site_lon", "engineers", "county_lat", "county_lon")

# here's the contribution of each site
j$F_latency <- with(j, latency.coord(engineers, site_lat, site_lon, county_lat, county_lon))

# summarize for a total at each county
t <- na.omit(ddply(j, .(sc), summarise, sum=sum(F_latency))) # county by county total
names(t) <- c("sc", "F_latency")
q <- merge(powcou, t, by.x="SC", by.y="sc", all.x=TRUE)
rm(t)

q.hiL   <- max(q$F_latency, na.rm=TRUE)  # nationwide values
q.loL   <- min(q$F_latency, na.rm=TRUE)
q.meanL <- mean(q$F_latency, na.rm=TRUE)

# set all unknown county latency values to zero
#p <- ggplot(powcou, aes(long, lat, group=group, fill=F_latency))
q[is.na(q$F_latency),]$F_latency <-0
q <- q[order(q$order), ]
p <- ggplot(q, aes(long, lat, group=group, fill=F_latency))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("F_latency", low=myred, mid=myyellow, high=mygreen, limits=c(q.loL, q.hiL), midpoint=q.meanL)
if (showplt) print(p)
if (savepdf) {
pdf(file="11_map_z.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# zero counties - these are counties outside the venn diagram
zc <- j[j$F_latency==0,]$sc
q[q$SC %in% zc,]$F_latency <- 0
t <- na.omit(ddply(j, .(sc), summarise, sum=sum(F_latency))) # county by county total
q.maxL <- max(q$F_latency, na.rm=TRUE)  # nationwide values
q.minL   <- min (q[q$F_latency>0,]$F_latency,na.rm=TRUE)
q.meanL  <- mean(q[q$F_latency>0,]$F_latency,na.rm=TRUE)

q <- q[order(q$order), ]
p <- ggplot(q, aes(long, lat, group=group, fill=F_latency))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("F_latency", low=myred, mid=myyellow, high=mygreen, limits=c(q.minL, q.maxL), midpoint=q.meanL)
if (showplt) print(p)
if (savepdf) {
pdf(file="12_eligible_counties.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# rescale and plot the power
q$F_power <- 0
q.maxR  <- max(q$RATE, na.rm=TRUE)
q.minR  <- min(q$RATE, na.rm=TRUE)
q.meanR <- mean(q$RATE, na.rm=TRUE)
q$F_power <- q.maxL * (q.maxR - q$RATE)/(q.maxR - q.minR)
q$F_sum   <- ifelse(q$F_latency > 0, q$F_power + q$F_latency, 0)
q.maxS <- max(q$F_sum, na.rm=TRUE)
q.minS    <- min(q[q$F_sum>0,]$F_sum, na.rm=TRUE)
q.meanS   <- mean(q[q$F_sum>0,]$F_sum, na.rm=TRUE)

q <- q[order(q$order), ]
p <- ggplot(q, aes(long, lat, group=group, fill=F_sum))
p <- p + scale_x_continuous(limits=c(left,right))
p <- p + scale_y_continuous(limits=c(bottom,top))
p <- p + xlab("") + ylab("") + coord_map()
p <- p + borders("state", size=0.5, fill="grey", colour="white")
p <- p + geom_polygon(colour="white", size=0.15)
p <- p + theme(legend.position="bottom")
p <- p + scale_fill_gradient2("F_sum", low=myred, mid=myyellow, high=mygreen, limits=c(q.minS, q.maxS), midpoint=q.meanS)
if (showplt) print(p)
if (savepdf) {
pdf(file="13_Sum.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# best ten counties
bt <- factor(unique(q[q$F_sum>=head(tail(sort(unique(q$F_sum)),n=10),n=1),]$SC))
qq <- merge(as.data.frame(bt), q, by.y="SC", by.x="bt")
qq <- unique(qq[c("bt", "group", "order", "COUNTY", "state.name", "POPULATION", "LATITUDE", "LONGITUDE", "CONSUMERS", "REVENUES", "SALES", "RATE", "F_latency", "F_power", "F_sum")])
names(qq) <- c("sc", "group", "order", "County", "State", "Co_Pop", "lat", "long", "Consumers", "Revenue", "Sales", "Ind_Rate", "F_latency", "F_power", "F_sum")
qq <- qq[order(qq$F_sum, decreasing=TRUE),]
options(width=200)
print(unique(qq[4:length(qq)]))
qq$labels <- with(qq, sprintf("%s\n%s", County, State))
p <- ggplot(qq,aes(x=labels))
p <- p + geom_bar(aes(y=F_sum),fill="#888888", color="black")
p <- p + geom_bar(aes(y=F_power),fill="#777777",color="black")
p <- p + xlab(c("Region")) + ylab(c("F_power + F_latency"))
p <- p + theme(legend.position="bottom")
if (showplt) print(p)
if (savepdf) {
pdf(file="14_totals.pdf", width=10, height=3.0)
print(p)
dev.off()
}

winner <- sprintf("%s", unique(qq[max(qq$F_sum)==qq$F_sum,]$sc)) # now r is just being silly
wdf <- q[q$SC == winner,]

l <- min(wdf[wdf$SC == winner, ]$long)
r <- max(wdf[wdf$SC == winner, ]$long)
t <- max(wdf[wdf$SC == winner, ]$lat)
b <- min(wdf[wdf$SC == winner, ]$lat)

mymap <- get_map(location=c(left=l,bottom=b,right=r,top=t))
p <- ggmap(mymap)
p <- p + geom_polygon(data=wdf, aes(long, lat, group=group), fill="orange", colour="darkred", alpha=I(0.15), size=1)
p <- p + xlab("") + ylab("") + coord_map()
if (showplt) print(p)
if (savepdf) {
pdf(file="15_hilltx.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# show the bounding box around the county
min_long <- -97.30
max_long <- -96.85
min_lat  <-  31.75
max_lat  <-  32.23
p.org <- p
p <- p + geom_rect(xmin=min_long, xmax=max_long, ymin=min_lat, ymax=max_lat, fill=mygreen, colour=mygreen, alpha=I(0.1), size=1)
if (showplt) print(p)
if (savepdf) {
pdf(file="16_hilltxAppx.pdf", width=10, height=7.5)
print(p)
dev.off()
}

# break the bounding box into parts
gridsize <- 100
lons <- seq(max_long, min_long, length.out=gridsize)
lats <- seq(min_lat,  max_lat,  length.out=gridsize)
J <- with(eng, expand.grid(code, lats, lons))
names(J) <- c("code", "lat2", "lon2")
J <- merge(eng[c("code", "engineers", "lat", "long")], J)
J$z <- with(J, latency.coord(engineers, lat, long, lat2, lon2))
J <- merge(J, ddply(J, .(lat2, lon2), summarize, sum=sum(z)))

# optimal point according to excel optimizer
longO <- -97.30
latO  <- 31.75
minS  <- min(J$sum)
maxS  <- max(J$sum)
meanS <- mean(J$sum)
strS  <- sprintf("Optimal location: [%3.2f, %3.2f]", longO, latO)

# p.org has the picture of the county on it
p <- p.org
# add the latency data, set the gradient, draw a box around it, and put the legend at the bottom of the picture
p <- p + geom_tile(data=J, aes(x=lon2, y=lat2, fill=sum), alpha=I(0.5), linetype=0)
p <- p + scale_fill_gradient2("F_latency", low=myred, mid=myyellow, high=mygreen, limits=c(minS, maxS), midpoint=meanS)
p <- p + geom_rect( xmin=min_long, xmax=max_long, ymin=min_lat, ymax=max_lat, fill=NA, colour=mygreen, size=1)
p <- p + theme(legend.position="bottom")
p <- p + geom_rect( xmin=min_long, xmax=max_long, ymin=latO-0.045, ymax=latO-0.015, fill="grey80", alpha=I(0.4), colour=NA)
p <- p + geom_text( x=longO, y=latO, label=strS, color="black", vjust=2, hjust=0)
# pretty little marker
p <- p + geom_point(x=longO, y=latO, shape=19, size=5, color="black")
p <- p + geom_point(x=longO, y=latO, shape=8,  size=5, color="yellow")
p <- p + geom_point(x=longO, y=latO, shape=19, size=2, color="red")
if (showplt) print(p)
if (savepdf) {
pdf(file="17_optimized.pdf", width=10, height=7.5)
print(p)
dev.off()
}


### Latency Testing Procedure

The latency function listed here is a simplification of the results from a separate study led by the Engineering Compute team at Qualcomm. The first round of testing was inconclusive so the test procedure was completely redone by the Qualcomm Human Factors team.

The testing environment included a small compute environment consisting of a few production workstations but no local resources such as file or license servers. This LAN was connected to Qualcomm's intranet through a "bump-in-the-wire" device that injected selected amounts of latency and/or dropped packets. Bandwidth was not a limiting factor. From the networking team we had detailed knowledge of the expected network performance between the various sites.

The engineers running the tests were self-selected from teams across Qualcomm though Layout and RF Designers were our primary concern. The goal was to have those best able to represent their team's interests, and this project's worst critics, doing the evaluations. The selections were vetted by engineering management as qualified to represent their particular function. Each team devised their own set of tests that they would run in the lab in order to capture the most frequent, demanding, and latency sensitive tasks within that work function.

The users then ran their test suite multiple times, each time with a different level of latency. During each test the users recorded their subjective assessment of the system's usability. The ratings ranged from "no noticeable latency" through "some latency noticeable but little to no impact" up to "unacceptable" (not the actual scoring system).

Screen captures and video of each session recorded objective performance measurements. Users were encouraged to run as many test suites as they felt necessary, and the testing was "double-blind": neither the test subjects nor the lab admins knew the actual latency during the testing.

Different engineering functions had different results, but results were consistent across users within each function. This was very different from the results we got before we updated our test procedures: prior to going "double-blind" there was no correlation between usability and latency. As expected, Layout and RF designers were the most sensitive to increasing latency.

For latency testing we used only the current standard-issue hardware for each engineering function. A different project tested display technologies in a similar manner.

This may be obvious but it is important to note as a first-pass filter that the test results give the maximum distance between an engineering function and its datacenter, and by extension, the maximum distance between two engineering functions that share a datacenter.

The actual testing results are company proprietary and I no longer have access to them.

### Miscellaneous

Latency Function without constraints. X-Y coordinates are latitude and longitude. 