Chamblon Systems Inc. TerminologyExtractor Example 2

Example 2 - Internet RFC

Internet RFCs (Requests for Comments) are documents that contain the specifications for the Internet. We analyzed RFC 1716, which is 186 pages long, using TerminologyExtractor. It produced a list of about 2,500 collocations and 2,600 words.

The list of words with their frequency produced by TerminologyExtractor starts as follows:

router 1356
route 825
ip 723
network 482
address 458
protocol 452
packet 409
requirement 331
RFC 325
internet 244
layer 242
host 242
destination 236
option 229
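
To make the idea of a word frequency list concrete, here is a minimal Python sketch. It is not how TerminologyExtractor works internally (the product's method is not described here); the function name `word_frequencies`, the small stop-word set, and the sample sentence are all assumptions made for illustration. Unlike the list above, the sketch does not reduce words to a base form.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real tool would use a larger one.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "to", "and", "is"}

def word_frequencies(text):
    """Count how often each non-stop word occurs, ignoring case."""
    # Lowercase the text and treat every run of letters as one word.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

if __name__ == "__main__":
    sample = "The router forwards the packet. The router drops a bad packet."
    # Print the words from most frequent to least frequent.
    for word, count in word_frequencies(sample).most_common():
        print(word, count)
```

Sorting by descending count, as `most_common()` does, gives the same kind of ranking as the list above.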

The word list gives a first impression of the document's vocabulary. However, it is by looking at the collocations produced by TerminologyExtractor that we get a real feel for what the document is about. The collocations that appear with a high frequency are clearly terms, and they should be managed as such:

ip router 203
requirement for ip router 192
Almquist * Kastenholz 192
link layer 95
ip address 93
route protocol 87
request for comment 79
destination address 76
source route 66
directed broadcast 58
ip header 54
source address 46
ip datagram 45
subnet mask 42
route option 42
route information 41
ICMP error 38
layer protocol 36
error message 36
type of service 35
gateway protocol 35
internet protocol 34
ICMP error message 33
static route 31
destination unreachable 31
autonomous system 31
logical interface 29
ip multicast 29
ip broadcast 29
configuration option 29
network * number 28
describe in section 28
broadcast address 28
source route option 27
physical interface 27
subnet * directed 26
record route 26
subnet * directed broadcast 24
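
One simple way to approximate a collocation list like the one above is to count short word sequences that neither begin nor end with a function word. The Python sketch below does that. It is an assumption-laden illustration, not TerminologyExtractor's actual algorithm: the name `collocation_frequencies`, the stop-word set, and the `max_len` parameter are hypothetical, and the sketch handles neither base forms nor the `*` entries shown above.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real tool would use a larger one.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "to", "and", "is", "has"}

def collocation_frequencies(text, max_len=4):
    """Count word sequences of 2..max_len words as candidate collocations."""
    # Lowercase the text and treat every run of letters as one word.
    # Note: this simple version lets sequences run across sentence boundaries.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip sequences that start or end with a stop word, so that
            # "type of service" is kept but "of service" is not.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts

if __name__ == "__main__":
    sample = "The IP router checks the IP header. Each IP router has an IP address."
    # Show only sequences that occur more than once.
    for phrase, count in collocation_frequencies(sample).most_common():
        if count > 1:
            print(phrase, count)
```

In practice a frequency threshold, as in the usage lines, is what separates recurring terms from one-off word sequences.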

It took TerminologyExtractor no more than 15 seconds to create these two lists. If this document had to be translated into several languages, how much time do you think it would save?