In January 1999, while I was upgrading my server's operating system from Slackware 3.4 to Slackware 3.6, an idea occurred to me: what if I took the server logs for 1998 and tried to extract something useful from them before archiving them to CD or deleting them? The logs contain a lot of information, but this article deals with a single but important aspect: which browsers were used to view my pages. Technically, this is called the 'User Agent Log' (a browser is an agent from the server's point of view).
The total size of the logfile is about 100MB uncompressed (this is the pure User-Agent log, without the Access, Referrer or Error logs). It contains 2,757,324 records. Of these, at least 400,000 belong to various web indexers, robots and mass downloaders, so we discard them from the start. The remaining 2,350,000 or so entries are treated as real browser data, and all further percentages are given relative to this number.
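A minimal sketch of setting robots aside before counting. The article does not say how the filtering was actually done, so the sample log and the pattern list below are assumptions for illustration; the names are taken from agent strings quoted elsewhere in this article.

```shell
# Hypothetical sample standing in for the real agent_log.
log=$(mktemp)
cat > "$log" <<'EOF'
Mozilla/4.04 [en] (Win95; I)
Slurp/2.0
Mozilla/2.02 (OS/2; I)
Teleport Pro/1.28
ArchitextSpider
Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
EOF

# Assumed robot and mass-downloader signatures; extend as needed.
robots='slurp|architextspider|scooter|googlebot|teleport pro|webzip'

# grep -c '' counts all lines; -v -c counts the lines that do NOT match.
total=$(grep -c '' "$log")
browsers=$(grep -v -i -c -E "$robots" "$log")
echo "total=$total browsers=$browsers discarded=$((total - browsers))"
rm -f "$log"
```

A case-insensitive match on a short list like this will never catch every robot, which is presumably why the article says "at least" 400,000.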
Every line of the log contains a string that identifies the user agent (which is not necessarily a WWW browser). Sometimes it is as simple as

    Mozilla/2.02 (OS/2; I)

but sometimes it is as complex as

    Mozilla/2.01S (X11; I; IRIX 5.3 IP22) via proxy gateway CERN-HTTPD/3.0 libwww/2.17

The initial processing was to run

    sort agent_log | uniq -c | sort -r >agent_log.stats

This produced a listing in which every distinct agent string appears on its own line, together with the number of hits it generated. This file is only 970KB. However, every version of a browser is on its own line, which is inconvenient for some kinds of queries. To count all hits from all versions of a browser, it is much easier to use a simple fgrep:

    fgrep -i lynx agent_log | wc -l
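The same two steps on a tiny made-up log, as a sketch. Note two substitutions that are not the commands used for the article: `sort -rn` compares the counts numerically instead of relying on the column padding that `uniq -c` produces, and `grep -c -i -F` does the work of `fgrep -i ... | wc -l` in a single process.

```shell
# Hypothetical sample standing in for the real agent_log.
log=$(mktemp)
cat > "$log" <<'EOF'
Lynx/2.8rel.2 libwww-FM/2.14
Mozilla/3.0 (Win95; I)
Mozilla/2.02 (OS/2; I)
Mozilla/3.0 (Win95; I)
Lynx/2.7.1 libwww-FM/2.14
Mozilla/3.0 (Win95; I)
EOF

# One line per distinct agent string, most frequent first.
stats=$(sort "$log" | uniq -c | sort -rn)
echo "$stats"

# All hits from every version of one browser, ignoring case.
lynx_hits=$(grep -c -i -F lynx "$log")
echo "lynx: $lynx_hits"
rm -f "$log"
```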
The 30 most frequent agent strings:

    121277 Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
    119546 Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
    119033 Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
    110693 Mozilla/3.0 (compatible; MSIE 3.0)
     97114 Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows 95)
     94184 Mozilla/4.04 [en] (Win95; I)
     79309 Mozilla/3.0 (Win95; I)
     61271 Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)
     61214 Teleport Pro/1.28
     58402 Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)
     55084 Mozilla/4.05 [en] (Win95; I)
     39879 Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)
     39485 Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)
     32486 Arkanavt/1.02.015 (compatible; Win16; I)
     31868 Mozilla/4.03 [en] (Win95; I)
     29425 Teleport Pro/1.29
     27537 Mozilla/4.01 [en] (Win95; I)
     25959 Mozilla/3.01 (Win95; I)
     25100 Mozilla/3.01Gold (Win95; I)
     21457 Mozilla/2.02 (OS/2; I)
     20410 Mozilla/3.01 (WinNT; I)
     19758 WebZIP/2.32 (http://www.spidersoft.com)
     18604 Mozilla/3.0 (Win95; I; HTTPClient 1.0)
     17280 IBM-WebExplorer-DLL/v1.2
     17027 Mozilla/4.0 [en] (Win95; I)
     17010 Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)
     16915 Mozilla/2.0 (compatible; MSIE 3.02; AK; Windows 95)
     16897 Mozilla/3.0
     16082 Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
     15107 Mozilla/4.04 [en] (WinNT; I)

As you probably know, many browsers (including MSIE) identify themselves as Mozilla, which is an internal name for Netscape Navigator. Apparently this is done to fool overly clever web designers who check the browser id string and generate different output for different browsers, or direct them to different pages. Browsers not from Netscape Corp. usually have the keyword "compatible" in their id string, to distinguish them from genuine Netscape software. MSIE also includes its own signature, for example: Mozilla/2.0 (compatible; MSIE 3.02; Windows 95). This is how we can tell Navigator from IE and some others (Opera, BeOS NetPositive).
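The rule just described can be sketched as a small awk classifier. This is an assumption about how such a split could be done, not the script used for the article; the function name `classify` and the category labels are made up for the example. The order of the tests matters: MSIE is checked before the generic "compatible" keyword.

```shell
# classify: read agent strings (from files or stdin), print "count category".
classify() {
  awk '
    !/^Mozilla\//  { n["non-Mozilla"]++;   next }  # does not claim to be Mozilla
    /MSIE/         { n["MSIE"]++;          next }  # carries the MSIE signature
    /compatible/   { n["other Mozilla"]++; next }  # Mozilla-compatible, not MSIE
                   { n["Netscape"]++ }             # assumed genuine Navigator
    END { for (k in n) printf "%d %s\n", n[k], k }
  ' "$@" | sort -rn
}

# Sample strings taken from the listings in this article.
classify <<'EOF'
Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Mozilla/4.04 [en] (Win95; I)
Mozilla/2.0 (compatible; QNX Voyager 1.0 ;Photon)
IBM-WebExplorer-DLL/v1.2
Mozilla/3.0 (Win95; I)
EOF
```

A classifier this simple will misfile some strings. For example, the WebTV agent string contains "MSIE 2.0" and would be counted as MSIE here.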
    2164967  92.1%  Mozilla (total)
    1067083  45.4%  Netscape
    1077591  45.8%  MSIE
      20293   0.9%  Mozilla (neither Netscape nor MSIE)

In other words, the two most popular browsers are tied in this race.
The most touted alternative browser is Opera. It produced 4734 hits (0.2%), which is not a big number. Arkanavt for Windows 3.1 collected about 32,500 hits (1.3%); it is little known outside Russia. Teleport Pro is an 'offline browser', which means it is closer to robots than to real browsers (robots differ in that they download more pages than a human would ever want to read). None of the versions and direct derivatives of Windows Mosaic reached 500 hits:
    314 SPRY_Mosaic/v8.24 (Windows 16-bit) SPRY_package/v4.00
    111 SPRY_Mosaic/v8.32 (Windows 16-bit) SPRY_package/v4.00
     39 SPRY_Mosaic/v8.08 (Windows 16-bit) SPRY_package/v4.00
     29 Spyglass_Mosaic/2.10 Windows Datastorm/3.00
     18 SPRY_Mosaic/v8.25 (Windows 16-bit) SPRY_package/v3.10
     12 SPRY_Mosaic/v7.36 (Windows 16-bit) SPRY_package/v4.00
     11 Multilingual_Mosaic/1.0e Win32 Accent/69
      6 Multilingual Mosaic/1.0c Win32 Accent/0024

All these numbers tell us that the two giants (Netscape and IE) dwarf any competing products. Need I remind you that their combined share is about 91%?
However, a good Web writer should always keep in mind that even though Windows is popular, it is not the only game in town. Linux is quickly gaining share, loyal users of commercial Unixes won't give up their boxes, and OS/2, while proclaimed dead for years, is still in use. The most cross-platform browser is undoubtedly Lynx, but since it is text-mode only, not many people use it.
     8687  0.37%  Lynx (virtually any OS)
    17541  0.75%  IBM WebExplorer (OS/2)
      959  0.04%  NetPositive (BeOS)

WebExplorer is an older browser for OS/2. IBM stopped its development in approximately 1994; it supports HTML 2.0 and tables. Its hit count is probably not representative because it is my primary browser. BeOS does not seem to have a user base of any significance, and I really wonder why this number is so low. Do these people tend to visit only Be-specific sites? Or maybe most BeOS users are software developers who don't have enough time to wander around aimlessly? :-)
Hits by major version for the two leading browsers (percentages are relative to each browser's total):

       7154   0.6%  Netscape 1.x
      69622   6.5%  Netscape 2.x
     443537  41.5%  Netscape 3.x
     544036  51.0%  Netscape 4.x (current)
         93   0.0%  Netscape 5 (alpha; Open source Mozilla project)
    1067083         Netscape total

          4   0.0%  MSIE 1.x
      17455   1.6%  MSIE 2.x
     532351  49.4%  MSIE 3.x
     520922  48.3%  MSIE 4.x (current)
       6799   0.6%  MSIE 5.x (beta)
    1077591         MSIE total

The data tells us that roughly half of the installed browser base is still on a previous version. This is a rather shaky generalization, and it apparently depends on when new versions are released. Nevertheless, MSIE users seem to upgrade more often; at least the number of MSIE 1.x and 2.x users is negligible, unlike the corresponding counts for Navigator. The number of hits from Netscape 1 (which does not even support frames) is rather surprising.
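A sketch of how a per-version breakdown like this could be pulled out of the raw log. It is an assumption, not the article's actual method, and the function name `versions` is made up. The key point is that MSIE's real version comes from the MSIE token, not from the Mozilla number it advertises (MSIE 3.02 reports itself as Mozilla/2.0).

```shell
# versions: read agent strings, print "count browser major.x", sorted by name.
versions() {
  awk '
    /MSIE [0-9]/ {
      v = $0
      sub(/.*MSIE /, "", v)      # keep what follows the MSIE token
      sub(/[^0-9].*/, "", v)     # keep only the leading digits
      n["MSIE " v ".x"]++
      next
    }
    /^Mozilla\/[0-9]/ && !/compatible/ {
      v = $0
      sub(/^Mozilla\//, "", v)   # keep what follows "Mozilla/"
      sub(/[^0-9].*/, "", v)     # keep only the leading digits
      n["Netscape " v ".x"]++
    }
    END { for (k in n) printf "%d %s\n", n[k], k }
  ' "$@" | sort
}

# Sample strings taken from the listings in this article.
versions <<'EOF'
Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows 95)
Mozilla/4.04 [en] (Win95; I)
Mozilla/3.01Gold (Win95; I)
Mozilla/3.0 (Win95; I)
Mozilla/2.0 (compatible; QNX Voyager 1.0 ;Photon)
EOF
```

Strings that are "compatible" but carry no MSIE token, such as the QNX Voyager line, are deliberately skipped.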
8.5% of users of Windows 98 have replaced their built-in IE4 with Navigator:
     13649    8.5%  Netscape on Win98
    159645  100.0%  Windows 98 (total)
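A sketch of how such a share could be counted. It assumes that Netscape reports this system as "Win98" (by analogy with the "Win95" strings in the log) while MSIE writes "Windows 98", and that any Windows 98 hit without the "compatible" keyword is Netscape. The sample log, including the Netscape 4.5 line, is made up for the example.

```shell
# Hypothetical sample standing in for the real agent_log.
log=$(mktemp)
cat > "$log" <<'EOF'
Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Mozilla/4.5 [en] (Win98; I)
Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Mozilla/4.04 [en] (Win95; I)
EOF

# All Windows 98 hits, then the subset that is not a "compatible" browser.
win98=$(grep -c -E 'Win98|Windows 98' "$log")
netscape=$(grep -E 'Win98|Windows 98' "$log" | grep -v -c -F compatible)

awk -v n="$netscape" -v d="$win98" \
  'BEGIN { printf "%d of %d Windows 98 hits (%.1f%%) are Netscape\n", n, d, 100 * n / d }'
rm -f "$log"
```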
Hits from web indexing robots:

    89880 Slurp/2.0 (firstname.lastname@example.org; http://www.inktomi.com/slurp.html)
    33227 AltaVista Intranet V1.0 msu.ru email@example.com
    33191 ArchitextSpider
    30277 InfoSeek Sidewinder/0.9
    12545 Aport
     8113 Scooter/1.0 firstname.lastname@example.org
     6229 Googlebot/1.0 email@example.com http://googlebot.com/
     4047 Scooter/2.0 G.R.A.B. X2.0
     4044 StackRambler/1.4
     3776 Scooter/2.0 G.R.A.B. V1.1.0
     3376 ia_archiver/1.6
     2909 Openfind Robot/1.1A2
      631 WebCrawler/3.0 Robot libwww/5.0a
      564 ArchitextSpider/ libwww/5.0a
      472 Lycos_Spider_(T-Rex)

The most active are Inktomi (Slurp), Excite (ArchitextSpider) and Infoseek. The second line tells me that somebody wants to index the Moscow State University pages. Google Search is a newcomer; personally, I like its clean and uncluttered interface. I hope they won't pollute it with ad banners too soon. Google usually gives very relevant links (unlike many other search engines). Aport and Rambler are Russian search engines. AltaVista (Scooter) is rather quiet, WebCrawler is forgotten, and how am I supposed to trust Lycos after that number of hits? Perhaps they buy datasets from others?
Some people prefer to alter their browser id string to look funny:
    343 Nutscrape/1.0 (CP/M; 8-bit)
     36 Lord Vishnu/Transcendental (Vaikuntha;Supreme Personality of Godness)
     21 TakSebeProxy/0.0 (CP/M; 1-bit) -- you have to be Russian to understand this
     14 Nutscrape/1.0 (CP/M; 8-bit) via NetCache version NetApp Release 3.2.1: Thu May 21 16:33:01 PDT 1998
     11 Microsoft Pocket Internet Explorer/0.6 -- hmmm... will it explore my pockets???
      9 James_Bond/007 (CP/M; 8-bit)
      8 Nutscrape/1.001beta (CP/M; 8-bit)
      6 Psychoscape/1.11 (MaSaDoSa;12-bit)
      6 None/1.0 (CP/M; 8-bit)
      6 MFC_Tear_Sample -- what is that???
      5 Nutscrape/1.1 (DRDOS; 8-bit)
      5 InterNetscape Operasaic Browser, Zeta Release (The one that don't crash) -- doesn't crash???
      4 This is not text/Warpzilla 0.005
      4 Nutscrape-1.0 (CP/M; 8-bit)
      4 I am not a number, I am a human being!
      2 Boomscape/1.0 (Dog; 4-bit)
      2 Bidon/1.0 (CP/M; 12-bit)
      1 The Knights Who Say Ni (Highlander; M; There Can Be Only One)
      1 Nutscrape/1.0 (DRDOS; 8-bit)
      1 Nutscrape/1.0 (CP/M; 8-bit)
      1 MSIE/4.0 (not Windows,really,AWEB for Amiga!); (Spoofed by Amiga-AWeb/3.2)
      1 Fake Browser #2
      1 BPFTP/1.07/ILLEGAL COPY! THIS PROGRAM HAS BEEN MODIFIED!

The top line reflects a hint in the Squid configuration file about using the fake_user_agent directive.
There are browsers for rather obscure operating systems or devices:
    7269 Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)
     342 Nokia-Communicator-WWW-Browser/1.0 (Geos 3.0 Nokia-9000)
      48 xChaos_Arachne/1.20;overlaid;beta 7 (DOS x86; 640x480,16c; firstname.lastname@example.org; http://www.naf.cz/arachne/)
      42 Charlotte/2.1.0 VM_ESA/2.2.0 CMS/13
       1 ArcWeb/1.91 (Acorn RISC OS; StrongARM)
       1 Mozilla/1.22 (compatible; NetBox/1.0 R77; NEOS 5.15)
       1 Mozilla/2.0 (compatible; QNX Voyager 1.0 ;Photon)
       1 Mozilla/2.0 EasyRider-XT/22.214.171.124 (ARM; 32bit; compatible; MSIE 2.0; IA-PAL) libwww/2.17 modified
    Windows 95      1372044  63.9%
    Windows NT       184982   8.6%
    Windows 98       159645   7.4%
    Windows 3.1       96809   4.5%
    Win32 ?          126123   5.8%
    Windows total   1939603  90.4%

    Linux             25601   1.2%
    Solaris/SunOS     17906   0.8%
    FreeBSD            6981   0.3%
    HP-UX              4287   0.2%
    IRIX               3516   0.16%
    BSD/OS             2298   0.11%
    AIX                1868   0.09%
    OSF/1              1534   0.07%
    Unix total        63991   3.0%

    Macintosh         78189   3.6%
    OS/2              54433   2.5%
    WebTV              7288   0.3%
    Amiga              1297   0.06%
    BeOS                959   0.04%
    ------------------------------------
    Total           2145760  100%

Since not all browsers indicate which system they run on, the total does not match the total number of hits (the difference is about 8.5%). You can draw your own conclusions from this table.
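A sketch of how agent strings might be mapped to operating systems for a table like this. The patterns and the function name `os_of` are assumptions covering only a few of the rows; the article does not give the rules actually used. Strings that name no system fall into "unknown", which is the gap between the table total and the total hit count.

```shell
# os_of: read agent strings, print one OS label per input line.
os_of() {
  awk '
    /Win95|Windows 95/   { print "Windows 95";    next }
    /Win98|Windows 98/   { print "Windows 98";    next }
    /WinNT|Windows NT/   { print "Windows NT";    next }
    /Win16|Windows 3\.1/ { print "Windows 3.1";   next }
    /Linux/              { print "Linux";         next }
    /SunOS/              { print "Solaris/SunOS"; next }
    /IRIX/               { print "IRIX";          next }
    /OS\/2/              { print "OS/2";          next }
    /Macintosh|Mac_/     { print "Macintosh";     next }
                         { print "unknown" }
  ' "$@"
}

# Sample strings taken from the listings in this article; tally per OS.
os_of <<'EOF' | sort | uniq -c | sort -rn
Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Mozilla/4.04 [en] (Win95; I)
Mozilla/3.0 (Win95; I)
Mozilla/2.02 (OS/2; I)
Mozilla/2.01S (X11; I; IRIX 5.3 IP22) via proxy gateway CERN-HTTPD/3.0 libwww/2.17
Mozilla/3.0 (compatible; MSIE 3.0)
EOF
```

The last sample line, "Mozilla/3.0 (compatible; MSIE 3.0)", names no operating system at all and lands in "unknown".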
Further reading: list of browsers on BrowserWatch, web server survey on NetCraft.
Copyright (C) 1999 Sergey Ayukov. No part of this text can be reproduced
20 March 1999