The written form of language is contained in printed documents, such as newspapers, magazines and books, and in handwritten matter, such as that found in notebooks and personal letters. Given the importance of written language in human transactions, its automatic recognition has practical significance.
Fundamental characteristics of the written language are:
1. it consists of artificial graphical marks on a surface
2. its purpose is to communicate something
3. its purpose is achieved by virtue of the marks' conventional relation to language
Although speech is a sign system that is more natural than writing to humans, writing is considered to have made possible much of culture and civilization.
Different writing systems, or scripts, represent linguistic units (words, syllables and phonemes) at different levels. In alphabetic scripts, the letters of the alphabet are the primitive elements, or characters, which are used to represent words.
Several languages, such as English, Dutch and French, share the Latin script. The Devanagari script, which represents syllables as well as alphabetic letters, is used by several Indian languages, including Hindi. The Chinese script, which consists of ideograms, is an alternative to alphabets. The Japanese script consists of Chinese ideograms and syllabic characters.
There are roughly two dozen different scripts in use today (ignoring minor differences in orthography, as between English and French). Each script has its own set of icons, known as characters or letters, that have certain basic shapes. Each script has its rules for combining the letters to represent the shapes of higher level linguistic units. For example, there are rules for combining the shapes of individual letters so as to form cursively written words in the Latin alphabet.
In addition to linguistic symbols, each script has a representation for numerals, such as the Hindu-Arabic digits used in conjunction with the Latin alphabet. There are also icons for special symbols found on keyboards.
History of the written language:
Since the invention of the printing press in the fifteenth century by Johannes Gutenberg (an invention whose principal elements included movable type, an alloy for letter faces, a printing mechanism and oil-based ink), most archived written language has been in the form of printed paper documents. In such documents, text is presented as a visual image on a high-contrast background, where the shapes of characters belong to families of type fonts.
Paper documents, which are an inherently analogue medium, can be converted into digital form by a process of scanning and digitization.
More recently, it has become possible to store and view electronically prepared documents as formatted pages on a computer graphics screen, where the scanning and recognition process is eliminated. However, the elimination of printed paper documents is unlikely, owing to the convenience and high contrast they offer.
Written language is also encountered in the form of handwriting inscribed on paper or registered on an electronically sensitive surface. Handwriting data is converted to digital form either by scanning the writing on paper or by writing with a special pen on an electronic surface.
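The two capture routes described above lead to two different data representations. The following is a minimal sketch, assuming that pen input arrives as time-ordered (x, y) samples grouped into strokes and that a scanned page is a grid of binary pixels. The names `Stroke` and `rasterize` and the sample data are hypothetical and serve only as illustration.

```python
from typing import List, Tuple

# Pen input ("online" data): time-ordered (x, y) samples grouped into strokes,
# where one stroke runs from pen-down to pen-up. This layout is an assumption.
Stroke = List[Tuple[int, int]]


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Turn pen samples into a scanned-style binary image (1 = ink, 0 = paper).

    Only the sampled points are marked; gaps between samples are not filled in.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke:
            # Ignore samples that fall outside the writing surface.
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
    return image


# Hypothetical data: a vertical stroke and a horizontal stroke forming a "+".
strokes = [
    [(2, 0), (2, 1), (2, 2), (2, 3), (2, 4)],
    [(0, 2), (1, 2), (2, 2), (3, 2), (4, 2)],
]
image = rasterize(strokes, width=5, height=5)
for row in image:
    print("".join("#" if pixel else "." for pixel in row))
```

The stroke list keeps the order in which the marks were made, whereas the image keeps only where the ink ended up, which is all that scanning a paper document can provide.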
The most important model for written language recognition is the lexicon of words. The lexicon is in turn determined by linguistic constraints; for example, in recognizing running text, the lexicon for each word is constrained by the syntax, semantics and pragmatics of the sentence.
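One way to picture the role of the lexicon is as a filter on a recognizer's raw output. The sketch below assumes that the raw output is a character string and that the best lexicon entry is the one with the smallest edit distance to it. The text does not specify a matching method, so the edit-distance choice, the function names and the sample lexicon are all illustrative assumptions.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def constrain_to_lexicon(raw: str, lexicon: Sequence[str]) -> str:
    """Return the lexicon entry closest to the raw recognizer output."""
    return min(lexicon, key=lambda word: edit_distance(raw.lower(), word.lower()))


# Hypothetical lexicon: the words that the sentence context allows at this position.
lexicon = ["clear", "dear", "clean", "cheer"]
best = constrain_to_lexicon("c1ear", lexicon)
print(best)
```

The smaller the lexicon that syntax, semantics and pragmatics leave for a given word position, the fewer candidates compete, so a noisy reading such as "c1ear" is more likely to resolve to the intended word.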
Principles of written communication:
- Clear aim
- Logical structure
- Clear layout
- Appropriate style
- Avoid unexplained abbreviations
Types of written communication:
- Layout, conventions, style and register, acceptable abbreviations, opening and closing statements
- Reports:
· preparation: defining the report's aim, type and structure; planning the introduction, body, evaluation (if required) and conclusion; deciding on a logical order and layout
· structure: an introduction summarising the content; a logical sequence for the body of the report; evaluation (if required) and conclusion; page numbering, topic headings, an index and appendices
· style: simple, with varying sentence lengths and use of layout to break up dense text
- Memos:
· purpose: an efficient method of communication within an organisation, which gives or requests action or information
· structure: simple, comprising the names of the sender(s) and recipient(s), date, subject heading, information and action proposed
· style: simple, fairly short sentences; no clichés or slang; concise, informal but businesslike, and easy to understand
- E-mail:
· purpose: a low-cost and quick method of communication within and outside an organisation, which gives or requests action or information
· structure: very simple, comprising the recipient's accurate e-mail address, subject heading, information and action proposed
· style: simple, short sentences and paragraphs; no clichés or slang; informal, concise and easy to understand
- Letters:
· aim: to be identified before starting to write the letter
· structure: subject heading; a clear and concise statement of the reason for writing the letter; a brief step-by-step explanation of the context; a summary of the action proposed
· style: appropriate to the receiver; clarity through short words, sentences and paragraphs
When representing the knowledge of academic meaning systems we have a tendency to prefer particular ways of organising this knowledge. The informational layer of text deals with the way that information is structured or organised. There are various set patterns for the organisation of text in academic writing. In some genres or types of text, headings are used which indicate what each section contains. But even within these headings it is important to arrange information according to certain patterns and these include:
- The distribution of old (given) and new information: In a large proportion of sentences, new and old information are arranged in a particular way. The reader is generally introduced to what he/she already knows before new information is provided. Typically, information introduced in one sentence is presented first in the next sentence and is then followed by the new information. This is a very common pattern and one that enhances readability; that is, it makes what you are writing easy to understand because it follows the patterns which readers of English are used to.
- General to specific order: Another way in which we arrange information is according to a general to specific pattern. This is a linear pattern, with general ideas being introduced first and details which expand or elaborate on these general ideas following.
- Chronological order: Another way of organising information within a text is to follow a chronological order, that is, to follow the time in which the events that you are describing occurred.
At the level of individual sentences, we frequently arrange information in such a way that given or already known information precedes new information, and what is new in one sentence becomes given or old information in the next sentence, where it in turn precedes further new information. We might change these patterns when we want to change the focus of our writing or introduce a new topic.
- Chronological Sequence: Questions to consider:
· What happened?
· What is the sequence of events?
· What are the substages?
- Comparison/Contrast: Questions to consider:
· What are the similar and different qualities of these things?
· What qualities of each thing correspond to one another? In what way?
- Description: Questions to consider:
· What are you describing?
· What are its qualities?
- Point of View: Questions to consider:
· What are the various perspectives?
· How do they impact behaviour?
· What contributed to their development?
- Problem/Solution: Questions to consider:
· What is the problem?
· What are the possible solutions?
· Which solution is best?
· How will you implement this solution?
- Process/Cause and Effect: Questions to consider:
· What are the causes and effects of this event?
· What might happen next?
1. Text purpose. The purpose or aim of a text depends on the relative roles of the writer, the reader, the signal (the linguistic product) and reality. When the main emphasis is on the writer, the aim becomes expressive, intending to convey emotion, individuality, and aspirations. When the focus is on the reader, the aim becomes persuasive, seeking to elicit a particular stance or reaction from him or her. An emphasis on the signal results in a literary purpose, aiming for an appreciation of the language of the text. When reality is dominant, a referential aim evolves in which the author writes exploratory, informative, and scientific texts.
2. Unity of focus. Unity refers to the writer’s ability to convey and maintain the purpose or aim throughout the text.
3. Organizational structure. In order to organize a text, a writer must include both a macrostructure (a network of main ideas) and a microstructure (supporting details) that provide the foundation for the main ideas. The ideas cast in the form of words, sentences, and paragraphs should be cohesive or well linked.
4. Development and validity of ideas. Writers achieve development by providing sufficient explanation, depth, and proof, often in the form of primary or secondary sources that include anecdotes, quotations, dialogue, observations, and philosophical principles. Validity refers to the truth or accuracy of the writer’s ideas. This factor is critical when students are learning information from a text or undergoing conceptual changes.
5. Stylistic expression. Stylistic expression is often evident in the clarity, variation, and uniqueness of words, phrases, and clauses used in a fashion appropriate for the desired aim or purpose of the text. For example, texts that are more referential in aim may include fewer emotionally charged words. Stylistic considerations also include decisions about whether a word is commonly known, how concrete or abstract the language should be, and the use of a variety of sentence types and patterns.
6. Correct mechanics. This concerns the surface features of the text, including standard conventions of language usage such as grammatical correctness and proper punctuation.