Overview
Comment: tools/smaz: implement the new forced words feature
SHA1: e252a053ef7243e091e720850ca88902
User & Date: nat on 2017-05-09 20:45:21
Context
2017-05-10
20:59 tools/smaz: add a new action for dedicated dictionary modification (check-in: 9ab0cc7cbf, user: nat, tags: trunk)
2017-05-09
20:45 tools/smaz: implement the new forced words feature (check-in: e252a053ef, user: nat, tags: trunk)
2017-05-08
21:48 tools/smaz: add a command-line option for a list of forced words (check-in: 95e42d23fe, user: nat, tags: trunk)
Changes
Modified tools/smaz.adb from [390ec3a8af] to [e6de65592c].
︙
Lines 285-298 (old) → 285-304 (new):
                Hash_Package_Name : in String := "") is <>;
       with function Remove_Element
         (Dict : in Dictionary;
          Element : in Dictionary_Entry)
         return Dictionary;
+      with function Replace_Element
+        (Dict : in Dictionary;
+         Element : in Dictionary_Entry;
+         Value : in String)
+        return Dictionary;
+
       Score_Encoded, Score_Frequency, Score_Gain
         : in access function (D : in Dictionary;
                               C : in Dictionary_Counts;
                               E : in Dictionary_Entry)
         return Score_Value;
︙
Lines 321-348 (old) → 327-365 (new):
          Method : in Methods)
         return Dictionary_Entry;

    package Dictionary_Subprograms is

       package Holders is new Ada.Containers.Indefinite_Holders (Dictionary);

+      function Adjust_Dictionary
+        (Handler : in Callback'Class;
+         Dict : in Dictionary;
+         Corpus : in String_Lists.List;
+         Method : in Methods)
+        return Dictionary;
+         -- Adjust the given dictionary according to info in Handle
+
       procedure Evaluate_Dictionary
         (Job_Count : in Natural;
          Dict : in Dictionary;
          Corpus : in String_Lists.List;
          Compressed_Size : out Ada.Streams.Stream_Element_Count;
          Counts : out Dictionary_Counts);
         -- Dispatch to parallel or non-parallel version of
         -- Evaluate_Dictionary depending on Job_Count.

       function Image
         (Dict : in Dictionary;
          Code : in Dictionary_Entry)
         return Natools.S_Expressions.Atom;
         -- S-expression image of Code

+      function Is_In_Dict (Dict : Dictionary; Word : String) return Boolean;
+         -- Return whether Word is in Dict (inefficient)
+
       procedure Optimization_Round
         (Dict : in out Holders.Holder;
          Score : in out Ada.Streams.Stream_Element_Count;
          Counts : in out Dictionary_Counts;
          Pending_Words : in out String_Lists.List;
          Input_Texts : in String_Lists.List;
︙
Lines 392-405 (old) → 409-481 (new):
         -- Convert the input into a dictionary given the option in Handler

    end Dictionary_Subprograms;


    package body Dictionary_Subprograms is

+      function Adjust_Dictionary
+        (Handler : in Callback'Class;
+         Dict : in Dictionary;
+         Corpus : in String_Lists.List;
+         Method : in Methods)
+        return Dictionary is
+      begin
+         if Handler.Forced_Words.Is_Empty or else Corpus.Is_Empty then
+            return Dict;
+         end if;
+
+         Add_Forced_Words :
+         declare
+            Actual_Dict : constant Dictionary := Activate_Dictionary (Dict);
+            Counts : Dictionary_Counts;
+            Discarded_Size : Ada.Streams.Stream_Element_Count;
+            Replacement_Count : String_Count;
+            Current : Holders.Holder := Holders.To_Holder (Actual_Dict);
+         begin
+            Evaluate_Dictionary
+              (Handler.Job_Count, Actual_Dict, Corpus, Discarded_Size, Counts);
+
+            Replacement_Count := Counts (Counts'First);
+            for I in Counts'Range loop
+               if Replacement_Count < Counts (I) then
+                  Replacement_Count := Counts (I);
+               end if;
+            end loop;
+
+            for Word of Handler.Forced_Words loop
+               if not Is_In_Dict (Actual_Dict, Word) then
+                  declare
+                     Worst_Index : constant Dictionary_Entry
+                       := Worst_Element (Actual_Dict, Counts, Method);
+                     New_Dict : constant Dictionary
+                       := Replace_Element (Current.Element, Worst_Index, Word);
+                  begin
+                     Ada.Text_IO.Put_Line
+                       (Ada.Text_IO.Current_Error,
+                        "Removing" & Counts (Worst_Index)'Img & "x "
+                        & Natools.String_Escapes.C_Escape_Hex
+                            (Dict_Entry (Actual_Dict, Worst_Index), True)
+                        & " at" & Worst_Index'Img & ", replaced by "
+                        & Natools.String_Escapes.C_Escape_Hex (Word, True));
+
+                     Current := Holders.To_Holder (New_Dict);
+                     Counts (Worst_Index) := Replacement_Count;
+                  end;
+               end if;
+            end loop;
+
+            return Current.Element;
+         end Add_Forced_Words;
+      end Adjust_Dictionary;
+
+
       procedure Evaluate_Dictionary
         (Job_Count : in Natural;
          Dict : in Dictionary;
          Corpus : in String_Lists.List;
          Compressed_Size : out Ada.Streams.Stream_Element_Count;
          Counts : out Dictionary_Counts)
︙
Lines 420-433 (old) → 496-521 (new):
         (Dict : in Dictionary;
          Code : in Dictionary_Entry)
         return Natools.S_Expressions.Atom is
       begin
          return Compress (Dict, Dict_Entry (Dict, Code));
       end Image;


+      function Is_In_Dict (Dict : Dictionary; Word : String) return Boolean is
+      begin
+         for Code in Dictionary_Entry'First .. Last_Code (Dict) loop
+            if Dict_Entry (Dict, Code) = Word then
+               return True;
+            end if;
+         end loop;
+
+         return False;
+      end Is_In_Dict;
+
+
       procedure Optimization_Round
         (Dict : in out Holders.Holder;
          Score : in out Ada.Streams.Stream_Element_Count;
          Counts : in out Dictionary_Counts;
          Pending_Words : in out String_Lists.List;
          Input_Texts : in String_Lists.List;
︙
Lines 638-644 (old) → 726-744 (new):
       procedure Process
         (Handler : in Callback'Class;
          Word_List : in String_Lists.List;
          Data_List : in String_Lists.List;
          Method : in Methods)
       is
          Dict : constant Dictionary := Activate_Dictionary
+           (Adjust_Dictionary
+              (Handler,
+               To_Dictionary (Handler, Word_List, Method),
+               Data_List,
+               Method));
          Sx_Output : Natools.S_Expressions.Printers.Canonical
            (Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Current_Output));
          Ada_Dictionary : constant String
            := Ada.Strings.Unbounded.To_String (Handler.Ada_Dictionary);
          Hash_Package : constant String
            := Ada.Strings.Unbounded.To_String (Handler.Hash_Package);
       begin
︙
Lines 1030-1043 (old) → 1122-1136 (new):
          Decompress => Natools.Smaz_256.Decompress,
          Dict_Entry => Natools.Smaz_256.Dict_Entry,
          Evaluate_Dictionary => Tools_256.Evaluate_Dictionary,
          Evaluate_Dictionary_Partial =>
            Tools_256.Evaluate_Dictionary_Partial,
          Filter_By_Count => Natools.Smaz_Tools.Filter_By_Count,
          Last_Code => Last_Code,
          Remove_Element => Tools_256.Remove_Element,
+         Replace_Element => Tools_256.Replace_Element,
          Score_Encoded => Tools_256.Score_Encoded'Access,
          Score_Frequency => Tools_256.Score_Frequency'Access,
          Score_Gain => Tools_256.Score_Gain'Access,
          Simple_Dictionary => Natools.Smaz_Tools.Simple_Dictionary,
          Simple_Dictionary_And_Pending =>
            Natools.Smaz_Tools.Simple_Dictionary_And_Pending,
          To_Dictionary => Tools_256.To_Dictionary,
︙
Lines 1061-1074 (old) → 1154-1168 (new):
          Decompress => Natools.Smaz_4096.Decompress,
          Dict_Entry => Natools.Smaz_4096.Dict_Entry,
          Evaluate_Dictionary => Tools_4096.Evaluate_Dictionary,
          Evaluate_Dictionary_Partial =>
            Tools_4096.Evaluate_Dictionary_Partial,
          Filter_By_Count => Natools.Smaz_Tools.Filter_By_Count,
          Last_Code => Last_Code,
          Remove_Element => Tools_4096.Remove_Element,
+         Replace_Element => Tools_4096.Replace_Element,
          Score_Encoded => Tools_4096.Score_Encoded'Access,
          Score_Frequency => Tools_4096.Score_Frequency'Access,
          Score_Gain => Tools_4096.Score_Gain'Access,
          Simple_Dictionary => Natools.Smaz_Tools.Simple_Dictionary,
          Simple_Dictionary_And_Pending =>
            Natools.Smaz_Tools.Simple_Dictionary_And_Pending,
          To_Dictionary => Tools_4096.To_Dictionary,
︙
Lines 1092-1105 (old) → 1186-1200 (new):
          Decompress => Natools.Smaz_64.Decompress,
          Dict_Entry => Natools.Smaz_64.Dict_Entry,
          Evaluate_Dictionary => Tools_64.Evaluate_Dictionary,
          Evaluate_Dictionary_Partial =>
            Tools_64.Evaluate_Dictionary_Partial,
          Filter_By_Count => Natools.Smaz_Tools.Filter_By_Count,
          Last_Code => Last_Code,
          Remove_Element => Tools_64.Remove_Element,
+         Replace_Element => Tools_64.Replace_Element,
          Score_Encoded => Tools_64.Score_Encoded'Access,
          Score_Frequency => Tools_64.Score_Frequency'Access,
          Score_Gain => Tools_64.Score_Gain'Access,
          Simple_Dictionary => Natools.Smaz_Tools.Simple_Dictionary,
          Simple_Dictionary_And_Pending =>
            Natools.Smaz_Tools.Simple_Dictionary_And_Pending,
          To_Dictionary => Tools_64.To_Dictionary,
︙
Lines 1123-1136 (old) → 1218-1232 (new):
          Dict_Entry => Natools.Smaz.Dict_Entry,
          Evaluate_Dictionary => Natools.Smaz.Tools.Evaluate_Dictionary,
          Evaluate_Dictionary_Partial =>
            Natools.Smaz.Tools.Evaluate_Dictionary_Partial,
          Filter_By_Count => Natools.Smaz.Tools.Filter_By_Count,
          Last_Code => Last_Code,
          Remove_Element => Natools.Smaz.Tools.Remove_Element,
+         Replace_Element => Natools.Smaz.Tools.Replace_Element,
          Score_Encoded => Natools.Smaz.Tools.Score_Encoded'Access,
          Score_Frequency => Natools.Smaz.Tools.Score_Frequency'Access,
          Score_Gain => Natools.Smaz.Tools.Score_Gain'Access,
          Simple_Dictionary => Natools.Smaz.Tools.Simple_Dictionary,
          Simple_Dictionary_And_Pending =>
            Natools.Smaz.Tools.Simple_Dictionary_And_Pending,
          To_Dictionary => Natools.Smaz.Tools.To_Dictionary,
︙
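The forced-words logic in Adjust_Dictionary relies on the tool's generic formals (Worst_Element, Replace_Element, Evaluate_Dictionary), so it cannot be run on its own. What follows is a minimal, self-contained Ada sketch of the same replacement strategy. It is not the tool's code, and it rests on simplifying assumptions:

- The package Forced_Words_Sketch, its types Word_Array and Count_Array, and the function Adjust are hypothetical names.
- A dictionary is modeled as an array of Unbounded_String with one usage count per entry.
- The "worst" entry is simply the one with the lowest count, whereas the tool's Worst_Element depends on the selected Method.

As in the diff, each replaced slot receives the highest observed count so that it is not chosen again for the next forced word.

```ada
--  Hypothetical sketch, not the code of tools/smaz
with Ada.Strings.Unbounded;

package Forced_Words_Sketch is
   use Ada.Strings.Unbounded;

   type Word_Array is array (Positive range <>) of Unbounded_String;
   type Count_Array is array (Positive range <>) of Natural;

   function Is_In_Dict (Dict : Word_Array; Word : String) return Boolean;
   --  Linear scan over every entry

   function Adjust
     (Dict   : Word_Array;
      Counts : Count_Array;
      Forced : Word_Array)
     return Word_Array
     with Pre => Dict'First = Counts'First and Dict'Last = Counts'Last;
   --  Replace the least-used entries with the forced words missing from Dict
end Forced_Words_Sketch;

package body Forced_Words_Sketch is

   function Is_In_Dict (Dict : Word_Array; Word : String) return Boolean is
   begin
      for Entry_Word of Dict loop
         if To_String (Entry_Word) = Word then
            return True;
         end if;
      end loop;
      return False;
   end Is_In_Dict;

   function Adjust
     (Dict   : Word_Array;
      Counts : Count_Array;
      Forced : Word_Array)
     return Word_Array
   is
      Result    : Word_Array := Dict;
      Working   : Count_Array := Counts;
      Max_Count : Natural := 0;
   begin
      if Dict'Length = 0 then
         return Result;
      end if;

      --  Highest observed count, given to each replaced slot
      for C of Counts loop
         Max_Count := Natural'Max (Max_Count, C);
      end loop;

      for Word of Forced loop
         --  Checked against Result, so a word listed twice is added once
         --  (the diff checks the original dictionary instead)
         if not Is_In_Dict (Result, To_String (Word)) then
            declare
               Worst : Positive := Working'First;
            begin
               --  Lowest count wins; ties go to the first such entry
               for I in Working'Range loop
                  if Working (I) < Working (Worst) then
                     Worst := I;
                  end if;
               end loop;

               Result (Worst) := Word;
               Working (Worst) := Max_Count;
            end;
         end if;
      end loop;

      return Result;
   end Adjust;

end Forced_Words_Sketch;

with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Forced_Words_Sketch;   use Forced_Words_Sketch;

procedure Forced_Words_Demo is
   Dict   : constant Word_Array :=
     (To_Unbounded_String ("the"), To_Unbounded_String ("qz"),
      To_Unbounded_String ("ing"), To_Unbounded_String ("xj"));
   Counts : constant Count_Array := (40, 1, 25, 3);
   Forced : constant Word_Array :=
     (To_Unbounded_String ("ing"), To_Unbounded_String ("tion"));
   Result : constant Word_Array := Adjust (Dict, Counts, Forced);
begin
   for Word of Result loop
      Ada.Text_IO.Put_Line (To_String (Word));
   end loop;
end Forced_Words_Demo;
```

The file holds three compilation units. After splitting it with gnatchop and building with gnatmake forced_words_demo, the demo prints the, tion, ing and xj on separate lines. "ing" is already present and is skipped, while "tion" takes the slot of the least-used entry "qz".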