Application servers need stronger CPUs to handle complex business logic.
Database servers need faster disks and more memory to speed up disk retrieval and data caching.
File servers likewise depend mainly on disk.
Note: 80% of business access falls on 20% of the data.
Database reads and writes are separated (read/write splitting).
CDN & Reverse Proxy
Both CDN and reverse proxy are built on caching.
The CDN serves the user from the nearest network provider, while the reverse proxy sits in the site's central data center.
- Distributed file system and distributed database.
- As the business grows more complex, the demands on data retrieval rise, so consider non-database query technology such as a search engine.
A large site grows into its architecture; it is not rebuilt or created from scratch.
The real driving force is business development.
Business makes the technology, as a career makes the man.
Blindly pursuing the solutions of large sites.
Technology for technology's sake (instead of for the business).
Technology is sometimes not the real issue (e.g. 12306).
Stratification (horizontal): application, service, data.
Advantage: as long as the interfaces stay stable, each layer can focus on its own work.
Disadvantage: layer boundaries and interfaces must be planned carefully.
Segmentation (vertical): divide by function and business.
Distributed: both of the above serve this aim.
distributed application and business
distributed static resources
distributed data and storage
- Server clustering
Several servers deploy the same application and serve requests behind a load balancer.
Cached data is only valid for a short time.
Data with unbalanced access hot spots should be put in the cache.
Improve the site's response speed.
Flatten concurrent access peaks.
Failover
Failure recovery
Base layer: supports database, storage, cache, search and other infrastructure technology.
Middle layer: platform services and application services.
Top layer: APIs, third-party services and the Sina business layer.
MPSS: today the usual solution is to virtualize the physical machine. That way the instances can even use the same port, which MPSS cannot.
What is architecture?
The highest level of planning; decisions that are difficult to change
Keep their balance
Browser: caching, page compression, fewer cookies in transit, sensible page layout.
Server: CDN, local and distributed cache, asynchronous message queue
Code: multithreading, memory management
Database: indexes, caching, SQL optimization, NoSQL
Application servers must not store session information.
Storage servers should be backed up in real time.
Measure: check whether the whole system still works when some servers go down.
Application cluster: add new servers behind the load balancer.
Cache cluster: cache routing algorithm
Database cluster: routing and partitioning
Event-driven Architecture: Message queue
Distributed service: split the business, reuse services, and call them through a distributed service framework
Different views of site performance
User view: the speed
Mostly front end: optimize HTML and CSS, CDN, reverse proxy, cache strategy
Caching speeds up data access; processing is distributed
Use clusters to improve read and write capacity.
Asynchronous messaging speeds up the response.
Server hardware configuration
Data center network architecture
Performance test metrics
Response time: measured by timing the tested segment
Concurrency: tested with multiple threads
Throughput capacity: TPS, QPS, HPS
Performance counters: system load (top command)
top - 18:38:43 up 5 days, 8:33, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1883860 total, 106724 free, 456832 used, 1320304 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1115392 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 43076 3528 2408 S 0.0 0.2 0:06.31 systemd
Performance test function
- Performance test report example
Reduce HTTP requests (merge images, CSS and JS).
Use the browser cache (headers: Cache-Control, Expires; update icons one at a time; change the referenced file name instead of the file content, e.g. the JS file referenced from the HTML).
Turn on gzip.
CSS in the head, JS at the end of the body.
Reduce cookie transfer (especially for static resources).
CDN & Reverse Proxy
- loading speed, fewer requests, security; load balancing, cache
- Distributed cache
Use the cache rationally
Frequently updated data
No hot-spot access
Inconsistent data and dirty reads
Cache availability
Cache penetration (keys that do not exist should also be cached, with a null value)
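The null-value idea can be sketched as a small read-through cache. This is only an illustration under assumptions: `NullCachingStore`, `load_from_db` and `db_hits` are hypothetical names, and a real cache would also give the null entries a short expiry.

```python
class NullCachingStore:
    """Read-through cache that also remembers keys the database does not have."""

    _MISSING = object()  # sentinel stored for keys that do not exist

    def __init__(self, load_from_db):
        self._load = load_from_db  # hypothetical loader: key -> value or None
        self._cache = {}
        self.db_hits = 0           # counts how often the database was asked

    def get(self, key):
        if key in self._cache:
            cached = self._cache[key]
            return None if cached is self._MISSING else cached
        # Cache miss: ask the database once and remember the answer,
        # even when the answer is "no such key".
        self.db_hits += 1
        value = self._load(key)
        self._cache[key] = self._MISSING if value is None else value
        return value
```

Repeated lookups of a missing key then reach the database only once instead of on every request.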
Distributed cache architecture
JBoss cache and Memcached
Memory management: slab, chunk, LRU
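LRU (least recently used) eviction can be illustrated with a tiny in-process cache. This is a sketch of the policy only, not of how Memcached implements it; the class name `LRUCache` and its methods are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used
```

With capacity 2, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was touched more recently.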
Multithreaded code
Threads to start = [task execution time / (task execution time - IO wait time)] * number of CPU cores
Concurrent access to shared resources with locks
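The thread-count formula above can be worked through in a few lines. The function name `threads_to_start` and the choice to round up are assumptions for this sketch; the notes only give the ratio.

```python
import math


def threads_to_start(task_time, io_wait_time, cpu_cores):
    """Threads = task_time / (task_time - io_wait_time) * cpu_cores, rounded up."""
    if not 0 <= io_wait_time < task_time:
        raise ValueError("io_wait_time must be >= 0 and smaller than task_time")
    cpu_time = task_time - io_wait_time  # time the task really occupies a core
    return math.ceil(task_time / cpu_time * cpu_cores)
```

For example, a task that takes 100 ms and waits 50 ms of that on IO gives `threads_to_start(100, 50, 4) == 8` on 4 cores, while a purely CPU-bound task gives one thread per core.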
Data Structure & Algorithm
Stack: function args, local variables
Heap: create & delete object & garbage collection
SSD (B+ potential)
B+ tree VS LSM tree
RAID VS HDFS
- HDFS: name node & data node(Map reduce)
Session replication (small clusters)
Record the session in a cookie
Session server (the best method)
CAP (usually keep A and P and relax C)
Failure detection: keep-alive (heartbeat), reports of failed access
Failover: recompute the route to find a working server
Failure recovery: restore the number of data replicas
User behaviors collection
Server logs collection
Client browser log collection via JS (tool: Storm for log analysis)
Server Performance Monitor
- Load, memory, disk IO, network IO (tool: Ganglia)
Different functions are physically separated
A single function is scaled out by a cluster
- HTTP redirect (302, but bad for SEO)
- Reverse Proxy
- Data link layer (direct routing) [Linux tool: LVS]
Weighted round robin
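One way to picture weighted round robin is the "smooth" variant sketched below, which spreads picks of the heavy server across the cycle. It is an illustration only, not LVS's scheduler; the class `WeightedRoundRobin` and the server names are hypothetical.

```python
class WeightedRoundRobin:
    """Smooth weighted round robin over a {server: weight} mapping."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one server, all weights positive")
        self._weights = dict(weights)
        self._current = {server: 0 for server in self._weights}
        self._total = sum(self._weights.values())

    def next_server(self):
        # Every server gains its weight, the leader is picked,
        # and the leader is then pushed back by the total weight.
        for server, weight in self._weights.items():
            self._current[server] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen
```

Over one full cycle (as many picks as the sum of the weights) each server is chosen exactly as often as its weight, so with weights 5, 1, 1 the first server gets 5 of every 7 requests.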
- Memcached model
When: the distributed cache cluster needs to scale out
Load balancing design requirement: cache
Distributed Cache Hash Algorithm
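A common choice for this hash algorithm is consistent hashing with virtual nodes, sketched below. The notes do not spell out the algorithm, so treat this as an assumption; `ConsistentHashRing`, the `replicas` default and the use of SHA-256 are all illustrative choices.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas  # virtual nodes per physical server
        self._ring = []            # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

When a server is added, the only keys that change owner are the ones that now belong to the new server, so most cached data stays reachable. With plain `hash(key) % n`, most keys would move.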
Relational database scalability (Cobar, Greenplum)
NoSQL Database(Apache HBase)
Module Coupling Decoupling
Event Driven Architecture
Distributed Message Queue
- ESB SOA
Compiling and deploying is difficult
Code patch management is difficult
Database connections are exhausted
Adding new business is difficult
Vertical: various applications
Horizontal: distributed business
Web Service & Enterprise Service
Server[WSDL] -> Service Broker[UDDI] <- SOAP [Client]
Bloated registration and discovery management
Inefficient XML serialization
Expensive HTTP connections
Complex deployment and maintenance
Distributed service requirements and features
Load balancing, failover, efficient remote communication
Heterogeneous systems, minimal intrusion into applications
Version control, real-time monitoring
Distributed Service Framework Design
Filter escape character
- Open source, error echo, blind injection, filtering and escaping, parameter binding (OS injection)
- Form token, CAPTCHA, Referer check
- Error codes, HTML comments, file upload, path traversal
Web application firewall
Web security scanner
One-way hash encryption
- MD5, SHA
- DES, RC
- Https, RSA
Secret key management
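For the one-way hash case, a minimal password-storage sketch is shown below. It swaps in salted PBKDF2-HMAC-SHA256 from Python's standard library rather than a bare MD5 or SHA digest; the function names and the iteration count are assumptions for the example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest); a fresh random salt is made when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=100_000):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

Only the salt and the digest are stored. The password cannot be read back from them, and the per-user salt means identical passwords produce different digests.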
Trie (base array: storage, check array: status)
Multilevel hash match
- Bayes (advanced) -> TAN -> ARCS
- Bloom Filter
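A Bloom filter answers "possibly seen" or "definitely not seen" using a fixed bit array. The sketch below is a minimal version under assumptions: the class name, the default sizes and the way the hash functions are derived from SHA-256 are illustrative choices.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self._size = size_bits
        self._k = num_hashes
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one hash with the index i.
        for i in range(self._k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, item):
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

An added item always tests as present. An item that was never added usually tests as absent, but can collide with bits set by other items, which is the price for the small memory footprint.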
- Account, Sellers, Buyers, Trade
2004, eBay: PHP->Java, MySQL->Oracle, MVC: Webx, ORM: iBatis, Manage: antx, Server: Weblogic
Note: Taobao chose free software at the beginning and paid software once the site was growing fast. Both were the right decisions.
Abandon EJB, adopt Spring; JBoss (later Jetty) instead of Weblogic.
From this point Taobao began to make real progress; much of the technology it is built on dates from this period.
- GeoDNS, LVS, Squid, Lighttpd, PHP, Memcached, Lucene, MySQL
replace strtr function
- Instant, temporary, permanent
Normal status access
High availability solutions for instant faults
- High availability solutions for temporary faults
- High availability solutions for permanent faults
A machine cannot distinguish between temporary and permanent faults.
So they have to be identified manually.
Impact on existing business
Application and database load
Static pages (fewer requests)
Rent bandwidth (CDN)
Dynamically generate a random URL for the order page
- Spike (flash sale) button control
- Spike process & Architecture Design
Log output level: global debug
The application's own logs and third-party logs should be configured separately.
Set the log level to at least warn, and check that each logging call in the code matches its real severity.
Turn off unneeded third-party logs (most of them are error logs).
The home page should not access the database.
The home page should preferably be static.
Start JBoss first, then request it with curl; on success, start Apache.
- Tiny files should have their own storage instead of sharing the distributed big-file storage system.
- Access to the production environment should be regulated (DBA).
Diff before you push the code
Strengthen code review
Check for null when you are not sure of the state of an input object
Null object pattern
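The null object pattern replaces "return nothing and let the caller check" with an object that is safe to call. The classes and the `find_user` helper below are hypothetical names made up for this sketch.

```python
class User:
    def __init__(self, name):
        self.name = name

    def display_name(self):
        return self.name


class NullUser(User):
    """Stand-in returned when no real user exists; safe to call like a User."""

    def __init__(self):
        super().__init__("")

    def display_name(self):
        return "guest"


def find_user(users, user_id):
    # Callers never receive None, so they need no null check.
    return users.get(user_id, NullUser())
```

Code such as `find_user(users, 42).display_name()` then works for unknown ids as well, at the cost of hiding the "not found" case, so it fits best where a sensible default behavior exists.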
- Design, Fire Fighting, Sermon, Geek
- Sherpa, Spartan, VIP
- Productions, Basic service, Basic equipment
- Function, Not function(Performance & others), Team organization, Production Future, Production operative
- Best, good, normal, bad, worst
- Normal, Literature, 1+1
- Cache, reduce Http, page compress
Static resources should be stored in their own server cluster
Images (not the logo, but user uploads such as avatars)
- Own servers and a subdomain
Make dynamic pages static
Browser data collection
Server business data collection
Server performance collection
Web: only static html
CGI produces dynamic page content
Process: the server forwards the request to the CGI program, which computes and generates the HTML.
CGI typically uses Perl; Java servlets are invoked inside a web container.
PHP (ASP, JSP) eased the coupling between business code and page markup that CGI caused
MVC (combine cgi and web server)