# miknaaz

**Repository Path**: mirrors_linuxscout/miknaaz

## Basic Information

- **Project Name**: miknaaz
- **Description**: Generate an Arabic gold standard corpus for morphology and stemming
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-10-22
- **Last Updated**: 2026-05-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Miknaaz مكناز

## Description

Generate an Arabic gold standard corpus for morphology and stemming.

### Citation

If you cite Miknaaz in academic work, please use the following citation:

	Taha Zerrouki, Miknaaz, http://github.com/linuxscout/miknaaz, 2023

or in BibTeX format:

	@misc{zerrouki2018miknaaz,
	  title={Miknaaz: Generate arabic golden standard},
	  author={Zerrouki, Taha},
	  url={http://github.com/linuxscout/miknaaz},
	  year={2018}
	}

## Usage

* Build word features for linguistic corpus building:

```python
from miknaaz.corpus_builder import CorpusBuilder

text = "إلى البيت"
lemmer = CorpusBuilder()
# split the input text into words
words = lemmer.tokenize(text)
for word in words:
    # get the morphological suggestions for each word
    result = lemmer.morph_suggestions(word, True)
    print(result)
```
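To collect the suggestions for a whole text instead of printing them, the loop above can be wrapped in a small helper. This is a sketch, not part of the miknaaz API: the helper name `collect_suggestions` is hypothetical, and it relies only on the `tokenize` and `morph_suggestions` methods shown above. The second argument is passed through unchanged because its meaning is not documented here.

```python
from typing import Any, List, Tuple


def collect_suggestions(builder: Any, text: str, flag: bool = True) -> List[Tuple[str, Any]]:
    """Hypothetical helper: pair each token of `text` with its suggestions.

    `builder` is assumed to behave like miknaaz's CorpusBuilder, i.e. to
    provide `tokenize(text)` and `morph_suggestions(word, flag)`.
    """
    pairs = []
    for word in builder.tokenize(text):
        # Keep the raw result as returned; its structure is not assumed here.
        pairs.append((word, builder.morph_suggestions(word, flag)))
    return pairs
```

With miknaaz installed, `collect_suggestions(CorpusBuilder(), "إلى البيت")` returns one `(word, result)` pair per token produced by `tokenize`, so the results stay linked to the words they describe.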


* Extract individual features (lemmas, roots, word types and wazns):

  ```python
  from miknaaz.corpus_builder import CorpusBuilder

  text = "إلى البيت"
  lemmer = CorpusBuilder()
  words = lemmer.tokenize(text)
  # each result below contains objects
  # get lemmas
  for word in words:
      result = lemmer.get_lemmas(word)
      print(result)
  # get roots
  for word in words:
      result = lemmer.get_roots(word)
      print(result)
  # get word types
  for word in words:
      result = lemmer.get_word_type(word)
      print(result)
  # get wazns (morphological patterns)
  for word in words:
      result = lemmer.get_wazns(word)
      print(result)
  ```
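The four loops above can also be merged into a single pass that gathers every feature of a word into one record. This is again a hedged sketch: `word_features` and `text_features` are hypothetical helpers, the dictionary keys are chosen for illustration, and each value is whatever the corresponding `CorpusBuilder` method returns.

```python
from typing import Any, Dict, List


def word_features(builder: Any, word: str) -> Dict[str, Any]:
    """Hypothetical helper: gather the separate features of one word in a dict."""
    return {
        "word": word,
        "lemmas": builder.get_lemmas(word),
        "roots": builder.get_roots(word),
        "word_type": builder.get_word_type(word),
        "wazns": builder.get_wazns(word),
    }


def text_features(builder: Any, text: str) -> List[Dict[str, Any]]:
    """Apply word_features to every token returned by builder.tokenize(text)."""
    return [word_features(builder, word) for word in builder.tokenize(text)]
```

Visiting each word once and keeping all of its features together gives one record per word, which is a convenient shape for writing corpus rows later.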