Wt embedded

Version 26 (Frans Meulenbroeks, 02/01/2012 08:28 am)

h1. Wt embedded

{{toc}}

This page collects information on running Wt on resource-constrained embedded systems: performance, code size, memory usage, and related topics.

h2. General

Wt can easily be built for and deployed on embedded POSIX systems, such as embedded Linux.

h3. Cross-building

Using CMake with a cross-compilation environment: to be completed...

Instructions for cross-compiling with CMake can be found on the "CMake Wiki":http://www.cmake.org/Wiki/CMake_Cross_Compiling.

Wt user Alistair of QuickForge has written about cross-compiling for ARM on Windows, using Wt as the example, in his blog post "Exploration of Cross-Compiling on Windows for ARM Linux Distributions":http://blog.quickforge.co.uk/2011/10/exploration-of-cross-compiling-on-windows-for-arm-linux-distributions/.

h3. Optimizing executable size

Points to consider when optimizing the executable size.

For building Boost:
* Use a static build of Boost, which allows the linker to strip away unused symbols
* Use the following compile flags for Boost:
** @-fvisibility=hidden -fvisibility-inlines-hidden@: to avoid exporting symbols in the executable
** @-ffunction-sections -fdata-sections@: to allow fine-grained garbage collection of unused functions/data

For building Wt:
* Choose build type @MinSizeRel@
* Extra compile flags (@CMAKE_CXX_FLAGS@)
** @-fvisibility=hidden -fvisibility-inlines-hidden@: to avoid exporting symbols in the executable
** @-ffunction-sections -fdata-sections@: to allow fine-grained garbage collection of unused functions/data
** @-DHAVE_GNU_REGEX@: to avoid the dependency on libboost_regex, when building on a system that is based on glibc or uClibc
** @-DWT_NO_LAYOUT@: to avoid pulling in Wt's layout managers, if you are not using any WLayout classes
** @-DWT_NO_SPIRIT@: to avoid depending on Boost.Spirit to parse locale and cookies (if you don't need that)
** @-DWT_NO_XSS_FILTER@: to avoid the extra (runtime) overhead of XSS filtering, usually not relevant for a trusted embedded platform
* Build static libraries (for libwt.a and libwthttp.a)
** in CMake: @SHARED_LIBS:BOOL=OFF@
* Disable build options that you do not need and that introduce extra dependencies (for example libz or OpenSSL)
* Further tune your linker command:
** Append @-v@ to the linker command used by CMake to see the raw @collect2@ command line.
** By default, the choice between shared and static libraries is all-or-nothing with CMake. However, you probably want to use the system-wide versions of libstdc++, libm and libc, depending on the other applications on your device.
*** Use @-Bdynamic@ in front of libraries you wish to link against dynamically
** Some other flags are needed to make sure the linker does not keep unused symbols:
*** Remove @-export-dynamic@
*** Add @--gc-sections@
* Strip your binary using @strip -s@.
* Optionally, when available for your platform, you may want to compress your binary using the "Ultimate Packer for eXecutables (upx)":http://upx.sourceforge.net/. This typically reduces executable size by a further 60-70%, without noticeable run-time performance hits.
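
The Wt build options listed above can be collected into a single CMake invocation. The following is a minimal sketch in Python that only assembles the command line; it does not run CMake. The helper name @wt_cmake_command@, the source path and the toolchain file argument are hypothetical, and passing @--gc-sections@ through @CMAKE_EXE_LINKER_FLAGS@ as @-Wl,--gc-sections@ is an assumption about how your project links its executables.

```python
import shlex

# Compiler flags taken from the list above; they apply to any GCC-based build.
SIZE_FLAGS = [
    "-fvisibility=hidden",
    "-fvisibility-inlines-hidden",
    "-ffunction-sections",
    "-fdata-sections",
]


def wt_cmake_command(source_dir, extra_defines=(), toolchain_file=None):
    """Return the cmake argument list for a size-optimized, static Wt build.

    extra_defines: names such as "HAVE_GNU_REGEX" or "WT_NO_LAYOUT",
    given without the leading -D.
    toolchain_file: optional path to a CMake toolchain file (hypothetical).
    """
    cxx_flags = SIZE_FLAGS + ["-D" + name for name in extra_defines]
    args = [
        "cmake",
        "-DCMAKE_BUILD_TYPE=MinSizeRel",
        "-DSHARED_LIBS:BOOL=OFF",
        "-DCMAKE_CXX_FLAGS=" + " ".join(cxx_flags),
        # Assumption: the linker is driven through the compiler, so the
        # linker option is wrapped in -Wl,
        "-DCMAKE_EXE_LINKER_FLAGS=-Wl,--gc-sections",
    ]
    if toolchain_file is not None:
        args.append("-DCMAKE_TOOLCHAIN_FILE=" + toolchain_file)
    args.append(source_dir)
    return args


if __name__ == "__main__":
    command = wt_cmake_command("../wt", ["HAVE_GNU_REGEX", "WT_NO_LAYOUT"])
    # Quote each argument so the line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in command))
```

Only add the @-D@ defines that match your application: for example, @HAVE_GNU_REGEX@ is only appropriate on glibc- or uClibc-based systems.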

h3. Measuring performance

To report the run-time performance of Wt on a particular embedded platform, connect to the device over a local area network (through at most one switch) and measure the time between transmission and reception of packets using a packet sniffer. For the measurements, we use two examples that are included in the Wt distribution: "hello":http://www.webtoolkit.eu/wt/examples/hello/hello.wt (as an example of a minimal application), and "composer":http://www.webtoolkit.eu/wt/examples/composer/composer.wt (as an example of a simple, yet functional, application).

We propose measuring two things: the time to create a new session, and the time to handle a small event.

h4. Runtime: new session

Wt starts a new session by serving a small page to determine browser capabilities, and then triggers a second call to get the "main page", which has all visible content. To compare the relative performance for a particular platform, you should measure this "load" time as the total duration of these two requests, that is, the time from sending the first request to sending the third request. The third request is either a GET request for auxiliary content (CSS or images), a GET request to a Wt resource, or a POST request to load invisible content in the background.
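
As an illustration, the load time can be computed from the request timestamps exported by the packet sniffer. This is a minimal sketch; the function name @session_load_time@ is hypothetical, and it assumes you have already extracted the send times (in seconds) of the client requests that belong to one new session.

```python
def session_load_time(request_times):
    """Return the new-session load time in seconds.

    request_times: send times (in seconds) of the client's HTTP requests
    for one new session, as read from a packet capture. The load time is
    the interval between the first and the third request.
    """
    times = sorted(request_times)
    if len(times) < 3:
        raise ValueError("need at least three request timestamps")
    return times[2] - times[0]
```

Any requests after the third one are ignored, since they fetch auxiliary or background content.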

h4. Runtime: event

We estimate the time needed to process a small event, such as a click on the "Greet me" button in hello, and "Save now" in composer, by measuring the total time for the packet exchange triggered by such an event.

h4. Memory usage: basis

Measuring memory usage is tricky, since code and read-only data memory used by shared libraries is effectively shared between processes, while writable data segments are private to each process.

Therefore, we use @pmap@ to study the memory in the different segments. The basis RAM usage is divided between read-only segments and writable segments. Only the latter are really constrained by physical RAM. We get the total writable size by summing the sizes of all writable segments, which pmap marks with a *w*. The total size reported by pmap and top, minus the size of all writable segments, is then the read-only RAM usage. This number includes shared libraries and therefore overestimates actual RAM usage.
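
The split into writable and read-only usage can be automated. The sketch below assumes the plain output format of the procps @pmap@ (one segment per line: address, size in KB with a @K@ suffix, permission string, mapping name); other implementations, such as the BusyBox one, may print a different layout. The function name @summarize_pmap@ is hypothetical, and the total is computed by summing the segment lines rather than by reading pmap's own total line.

```python
import re

# One segment line, e.g. "00008000   1132K r-x--  /root/hello.wt"
# (assumed procps layout: address, size in KB, permissions, mapping).
SEGMENT = re.compile(r"^\s*[0-9a-fA-F]+\s+(\d+)K?\s+([r-][w-][x-]\S*)")


def summarize_pmap(text):
    """Sum segment sizes from pmap output, split by writability (in KB)."""
    writable = 0
    total = 0
    for line in text.splitlines():
        match = SEGMENT.match(line)
        if match is None:
            continue  # skips the "pid: command" header and the total line
        size_kb = int(match.group(1))
        total += size_kb
        if "w" in match.group(2):
            writable += size_kb
    return {
        "writable_kb": writable,
        "read_only_kb": total - writable,
        "total_kb": total,
    }
```

Feed it the captured output of @pmap <pid>@ for the running application; @read_only_kb@ then corresponds to the read-only basis figure, including shared libraries.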

h4. Memory usage: per session

Compare the memory usage after starting 10 sessions with base memory usage, and divide the difference by 10 to estimate the memory used by a single session.
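
Written out as code, the estimate is a simple difference quotient. This is a minimal sketch with a hypothetical function name; both inputs are assumed to be writable-memory totals in KB measured the same way (for example with pmap).

```python
def per_session_kb(base_kb, loaded_kb, sessions=10):
    """Estimate the memory used by one session, in KB.

    base_kb: memory usage with no sessions open.
    loaded_kb: memory usage after starting `sessions` sessions.
    """
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return (loaded_kb - base_kb) / sessions
```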

h2. Platforms

h3. ARM926EJ-S

h4. Processor features

* Clock speed: 200 MHz
* Linux BogoMIPS: 89.70
* Caches: 8K instructions, 8K data

Configurations are ordered chronologically, latest first.

h4. Configuration 3: minimal (15/12/2010)

h5. Setup

* *Wt version:* Git (15/12/2010, > Wt 3.1.7)
* *Target system:* Linux uClibc 2.6.23
* *Build environment:* buildroot, arm-linux-gcc 4.2.1
* *Options:* without multi-threading, libz and OpenSSL
* *Build type:* full static build, except for: libstdc++, libc, and libm
* *Runtime settings:* @./app.wt --docroot . --http-address 0.0.0.0 --no-compression@

h5. Performance results

*Runtime performance*
|_. Program |_. New session (http) |_. Event (http) |
| hello | 0.19 s | 0.06 s |
| composer | 0.60 s | 0.07 s |

h4. Configuration 2: minimal (16/03/2010)

h5. Setup

* *Wt version:* Git (16/03/2010, >= Wt 3.1.1)
* *Target system:* Linux uClibc 2.6.23
* *Build environment:* buildroot, arm-linux-gcc 4.2.1
* *Options:* without multi-threading, libz and OpenSSL
* *Build type:* full static build, except for: libstdc++, libc, and libm
* *Runtime settings:* @./app.wt --docroot . --http-address 0.0.0.0 --no-compression@

h5. Performance results

*Code size and RAM usage (in KBytes)*
|_. Program |_. Code size (strip) |_. Code size (strip + upx) |_. RAM: basis † (read-only) |_. RAM: basis (writable) |_. RAM: per session |
| hello | 1214 | 362 | 2544 | 228 | 14.8 |
| composer | 1462 | 420 | 2796 | 232 | 83.6 |

† Includes shared libraries.

*Runtime performance*
|_. Program |_. New session (http) |_. Event (http) |
| hello | 0.26 s | 0.07 s |
| composer | 0.69 s | 0.08 s |

h4. Configuration 1: minimal (18/03/2008)

h5. Setup

* *Wt version:* CVS snapshot 18/03/08
* *Target system:* Linux uClibc 2.6.23
* *Build environment:* buildroot, arm-linux-gcc 4.2.1
* *Options:* with multi-threading, but without libz and OpenSSL
* *Build type:* full static build, except for: libc, libpthread, libdl, libstdc++, and libm
* *Build settings:* @MinSizeRel@, @-DHAVE_GNU_REGEX@
* *Runtime settings:* @./app.wt --docroot . --http-address 0.0.0.0 --threads=2 --no-compression@

h5. Performance results

*Code size and RAM usage (in KBytes)*
|_. Program |_. Code size (strip) |_. Code size (strip + upx) |_. RAM: basis † (read-only) |_. RAM: basis (writable) |_. RAM: per session |
| hello | 1130 | 304 | 2580 | 372 | 28 |
| composer | 1265 | 332 | 2712 | 372 | 126 |

† Includes shared libraries.

*Runtime performance*
|_. Program |_. New session (http) |_. Event (http) |
| hello | 0.58 s | 0.15 s |
| composer | 1.8 s | 0.15 s |